(Last updated: 04 April 2022)
This is the companion website to the Saraga collections. Saraga collections are currently the largest annotated open data collections available for computational research on Indian Art Music. They comprise audio, editorial metadata, and manual and automatically extracted annotations for different aspects of melody, rhythm and structure. The Saraga collections were conceived and built as a part of the CompMusic project and are currently maintained by the Music Technology Group, Universitat Pompeu Fabra, Barcelona.
This page provides up-to-date statistics of the collections, detailed documentation on the organization of the collections, the available annotations and their formats, and example scripts to access the collections through the PyCompMusic API.
This website accompanies the following paper, which describes the Saraga collections in detail:
A. Srinivasamurthy, S. Gulati, R. Caro, X. Serra, "Saraga: Open Datasets for Research on Indian Art Music", Empirical Musicology Review, vol. 16, no. 1, pp. 85-98, 2021.
If you use the Saraga data collections in your work, please cite the above paper and include a link to this webpage (https://mtg.github.io/saraga/).
As of 03 Oct 2020, here are the coverage and completeness statistics of the Saraga collections.
Coverage statistics report the number of unique entities of each kind (for example, recordings and artists) in the data collections.
|Total recordings in multi-track
|Total artists (lead+accompanying)
Completeness statistics describe how complete the available data and metadata are, by reporting the total number of recordings in the collections that contain the associated data or metadata. Detailed information is provided along with the description of file formats; a summary is provided below. The last two columns show the number of recordings in the Hindustani music collection (HM) and the Carnatic music collection (CM) that contain the corresponding data/metadata.
-NA indicates “Not Available”.
|Audio file (mp3)
|raga/s of the music piece
|taala/s of the music recording
|laya/s of the music recording
|Music form of the recording
|The tonic of the recording
|F0 extracted from mixed stereo track
|F0 extracted from the vocal track
|Lead vocal track
|Secondary vocal track
|Mridangam bass drum track
|Mridangam treble drum track
|Manually annotated melodic phrases
|Time-aligned sama annotations
|Tempo related annotations
|Section boundaries and section names
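The counting behind a completeness table can be sketched as follows. This is only an illustration under stated assumptions: the recording identifiers and item names are hypothetical placeholders, not the actual MusicBrainz IDs or file types in Saraga, and the sketch does not reproduce how the published statistics were computed.

```python
from collections import Counter

# Hypothetical example data: placeholder recording IDs mapped to the set of
# data/metadata items available for each recording. These are NOT real
# Saraga recordings or item names.
recordings = {
    "rec-1": {"audio", "raga", "tonic"},
    "rec-2": {"audio", "raga"},
    "rec-3": {"audio"},
}


def completeness(recordings):
    """Count, for each item, how many recordings contain it."""
    counts = Counter()
    for items in recordings.values():
        # Each recording contributes at most once per item, since items is a set.
        counts.update(items)
    return dict(counts)


stats = completeness(recordings)
```

Running the same count separately on the Hindustani and Carnatic recordings would give the two per-collection columns described above.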
The data in the Saraga collections are organized by individual music culture (Carnatic and Hindustani), and further organized into releases and recordings. For each recording, the associated audio, editorial metadata and annotations are identified by the MusicBrainz ID of the recording. Further details are here: Organization and file formats
There are two main ways of accessing the Saraga data collections.
The data in the collections can be accessed through the PyCompMusic API built to access Dunya collections. In addition, the Saraga repository provides a dump of all editorial metadata and some annotations. Larger files such as audio and pitch annotations are not suited to storage in a git repository, and hence a zip archive of the data is available. For tracking and versioning, the Saraga repository stores the MD5 checksum of the file for comparison with the files in the zip archive. Further details are here: Access to data and example scripts
We illustrate some of the applications that use Saraga collections for music education or exploration:
The Musical Bridges project aims at bringing together different cultures through music understanding. The tools in Musical Bridges offer interactive visualizations synchronized to recordings from the Saraga collections and facilitate the comprehension of some of the key elements of Carnatic and Hindustani music traditions.
Saraga App: We have built an Android mobile app, a music exploration tool that provides an enhanced listening experience with a subset of the Saraga Hindustani and Carnatic collections. The app is free; we invite you to download it from the Play Store and try it out.
The audio, metadata and annotations in the collections, and the code in this repository, are released under different open licenses. Please see the LICENSE file for more details.
Several people and institutions have contributed to the Saraga data collections. Please see the list of contributors for the complete list.
Saraga collections are envisioned to be living and growing collections of data. Community contributions to the collections in terms of more audio data, editorial metadata, manual annotations and automatic extractors are welcome. Research problems and innovative applications using Saraga data collections can be showcased here. If you have audio data and annotations that align well with Saraga collections in content, quality and distribution licenses, please contact any of us below to contribute them to the data collections.
The editorial metadata for the audio recordings that already exist in the collection can be added or updated on MusicBrainz. New manual annotations, or updates to the existing ones, can be submitted through a pull request to the git repository, which will be merged after due verification by community experts. Automatic annotations can be added to the recordings in the collection by writing an extractor in PyCompMusic tools, which will then extract automatic annotations to be added as derived files.
If you have any questions or comments about the data collections, or if you wish to contribute to the collections, please contact any of us below:
Ajay Srinivasamurthy (firstname.lastname@example.org)
Sankalp Gulati (email@example.com)
Rafael Caro (firstname.lastname@example.org)
Prof. Xavier Serra (email@example.com)
We would be glad to know more about how you have used the Saraga data collections in your work!