The collections are organized into Carnatic and Hindustani sub-collections and are stored in the dataset folder of the repository. Within each folder, the collections are organized into releases and then into recordings, following the `<music_tradition>/<release>/<recording_name>` format, where the release and recording names are human-readable for easy browsing. The release name is of the form “release_name by lead_artist”, where release_name and lead_artist correspond to the names in the editorial metadata for the release in MusicBrainz. The `recording_name` also corresponds to the name of the recording in MusicBrainz.
Within each `recording_name` folder, all the editorial metadata, source files, and derived files of a recording are stored. The source and derived files for each recording are stored within the folder `<music_tradition>/<release>/<recording_name>/` as `<recording_name>.<extension>`. The `extension` depends on the type of the source or derived file corresponding to the recording. For each recording, the mapping from the MusicBrainz ID (MBID) to the file paths is provided in the following mapping files:
The format of each source and derived file, along with its file extension and slug (a machine-readable identifier for that type of file, used for API access), is listed in the following table. The last two columns show the number of recordings in the Hindustani music collection (HM) and the Carnatic music collection (CM) that contain the corresponding source/derived files.
File type | File extension | Slug | Format | Source/Derived | Description | HM | CM |
---|---|---|---|---|---|---|---|
metadata | `.json` | `json` | JSON | Metadata | A json containing the editorial metadata about the recording | 108 | 249 |
mp3 | `.mp3.mp3` | `mp3` | Audio file format | Source | Audio file of a stereo mix of the recording | 108 | 249 |
multitrack-vocal | `.multitrack-vocal.mp3` | `multitrack-vocal` | Audio file format | Source | Audio file corresponding to the lead vocal track (if available) | - | 168 |
multitrack-vocal-s | `.multitrack-vocal-s.mp3` | `multitrack-vocal-s` | Audio file format | Source | Audio file corresponding to the secondary vocal track (if available) | - | 24 |
multitrack-violin | `.multitrack-violin.mp3` | `multitrack-violin` | Audio file format | Source | Audio file corresponding to the violin track (if available) | - | 168 |
multitrack-mridangam-left | `.multitrack-mridangam-left.mp3` | `multitrack-mridangam-left` | Audio file format | Source | Audio file corresponding to the mridangam bass drum track (if available) | - | 168 |
multitrack-mridangam-right | `.multitrack-mridangam-right.mp3` | `multitrack-mridangam-right` | Audio file format | Source | Audio file corresponding to the mridangam treble drum track (if available) | - | 168 |
multitrack-ghatam | `.multitrack-ghatam.mp3` | `multitrack-ghatam` | Audio file format | Source | Audio file corresponding to the ghatam track (if available) | - | 46 |
tonic | `.ctonic.txt` | `ctonic` | Tonic file format | Derived | A file with the tonic of the recording in Hz | 108 | 249 |
pitch | `.pitch.txt` | `pitch` | Pitch file format | Derived | A file with predominant melody (F0) extracted from mixed stereo track using Melodia algorithm in Essentia | 108 | 249 |
pitch-vocal | `.pitch-vocal.txt` | `pitch-vocal` | Pitch file format | Derived | A file with predominant melody (F0) extracted from the vocal track using Melodia algorithm in Essentia | - | 168 |
mphrases-manual | `.mphrases-manual.txt` | `mphrases-manual` | Melodic phrases file format | Source | A text file with manually annotated melodic phrases | 53 | 117 |
sama-manual | `.sama-manual.txt` | `sama-manual` | Sama file format | Source | A text file with timestamps of sama in the audio recording | 75 | 141 |
tempo-manual | `.tempo-manual.txt` | `tempo-manual` | Tempo file format | Source | This file shows different tempo related annotations derived from sama timestamps | 75 | 133 |
sections-manual | `.sections-manual-p.txt` | `sections-manual-p` | Sections file format | Source | A text file containing the section boundaries and section names | 75 | 119 |
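The folder layout and the file extensions in the table can be combined to build the expected path of any file of a recording. The following is a minimal sketch, not part of the repository: the function name `recording_file_path` and all folder and recording names in the usage note are hypothetical, and it assumes the extension is passed with its leading dot, exactly as listed in the table.

```python
from pathlib import PurePosixPath


def recording_file_path(dataset_root, music_tradition, release, recording_name, extension):
    """Build <root>/<music_tradition>/<release>/<recording_name>/<recording_name><extension>.

    `extension` is assumed to include its leading dot, e.g. ".pitch.txt".
    """
    folder = PurePosixPath(dataset_root) / music_tradition / release / recording_name
    return folder / f"{recording_name}{extension}"
```

For example, `recording_file_path("dataset", "carnatic", "Some Release by Some Artist", "Some Recording", ".ctonic.txt")` would point at the tonic file of that (made-up) recording.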
All audio files (both stereo mixes and multitracks) in the collection are stored as 128 kbps mp3 audio sampled at 44.1 kHz. Since large audio files are not suitable for storing in a git repository, the audio files can instead be accessed through the API or through the tarball of audio recordings. To enable versioning of the audio files, we store the md5 checksum of each audio file in the repository (as `.md5` files) so that it can be used to verify and reconcile the audio files.
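A downloaded file can be checked against its stored checksum along the following lines. This is a sketch under assumptions: the internal layout of the `.md5` files is not described here, so the code assumes the hex digest is the first whitespace-separated token in the file; the function names are hypothetical.

```python
import hashlib


def md5_of_chunks(chunks):
    """Return the md5 hex digest of an iterable of byte chunks."""
    digest = hashlib.md5()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def md5_of_file(path, chunk_size=1 << 20):
    """Hash a file in fixed-size chunks so large audio files are never loaded whole."""
    with open(path, "rb") as handle:
        return md5_of_chunks(iter(lambda: handle.read(chunk_size), b""))


def checksum_matches(actual_hex, md5_file_text):
    """Compare a digest with the text of a .md5 file.

    Assumption: the digest is the first whitespace-separated token of the file.
    """
    expected = md5_file_text.split()[0].lower()
    return actual_hex.lower() == expected
```

Reading in chunks keeps memory use constant regardless of the size of the audio or pitch file being verified.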
The tonic file is a text file with a single float value corresponding to the tonic of the recording expressed in Hz.
The pitch files in the collection are derived from the pitch extractor in PyCompMusic tools, which internally uses the Melodia algorithm to extract predominant melody. The pitch is typically computed over short audio frames with a hop size of 4.4 ms. The file is stored as a text file with the format
timestamp pitch_value
in each line of the file, e.g.
11.2400000 154.6684418
11.2444444 156.4656219
11.2488889 158.2836609
The `timestamp` values are expressed in seconds and the `pitch_value` values in Hz. A pitch value of zero indicates an unvoiced region.
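As an illustration of how the pitch and tonic files fit together, the sketch below parses the text of a pitch file and converts a pitch value to cents above the tonic. The function names are hypothetical, and the conversion to cents is a common convention rather than something prescribed by the dataset; unvoiced frames (zero pitch) are mapped to `None`.

```python
import math


def parse_pitch(text):
    """Parse 'timestamp pitch_value' lines into (seconds, Hz) tuples."""
    track = []
    for line in text.splitlines():
        if not line.strip():
            continue  # tolerate blank lines
        timestamp, pitch_value = line.split()[:2]
        track.append((float(timestamp), float(pitch_value)))
    return track


def hz_to_cents(pitch_hz, tonic_hz):
    """Cents above the tonic, or None for an unvoiced frame (pitch of zero)."""
    if pitch_hz <= 0:
        return None
    return 1200.0 * math.log2(pitch_hz / tonic_hz)
```

With a tonic read from the tonic file via `float(open(path).read())`, applying `hz_to_cents` to each frame gives a tonic-normalized melody that can be compared across recordings.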
Since pitch files are large text files that are not suitable for storing in a git repository, they can instead be accessed through the API or through the tarball of audio recordings. To enable versioning of the pitch files, we store the md5 checksum of each pitch file in the repository (as `.md5` files) so that it can be used to verify and reconcile the pitch files.
The manually annotated melodic phrases are stored as a text file. Each line of the file corresponds to one instance of an annotated phrase and is stored in the format
start_time flag duration notes
e.g.
19.159183673 1 2.253061224 pdnspmg
28.816326530 2 2.089795918 pngpm
`start_time` indicates the start time of the phrase annotation in the audio file, expressed in seconds. `flag` takes two values: 1 indicates a representative phrase of the recording, and 2 indicates a representative phrase of the rāga, which also implies a representative phrase of the recording. `duration` indicates the duration of the phrase, expressed in seconds. `notes` denotes the sequence of notes played/sung in the phrase.
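The phrase format can be read with a few lines of code. The following is a minimal sketch with hypothetical names (`Phrase`, `parse_phrases`); it assumes the four fields are separated by whitespace, as in the example lines.

```python
from typing import NamedTuple


class Phrase(NamedTuple):
    start_time: float  # seconds
    flag: int          # 1: representative of the recording, 2: of the raga
    duration: float    # seconds
    notes: str         # note sequence, e.g. "pdnspmg"


def parse_phrases(text):
    """Parse 'start_time flag duration notes' lines into Phrase tuples."""
    phrases = []
    for line in text.splitlines():
        fields = line.split(maxsplit=3)
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        start, flag, duration, notes = fields
        phrases.append(Phrase(float(start), int(flag), float(duration), notes.strip()))
    return phrases
```

Filtering with `[p for p in phrases if p.flag == 2]` then keeps only the phrases marked as representative of the rāga.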
The sama file is a text file that contains a non-decreasing series of time instants corresponding to successive sama of the tāla cycles in the music piece, aligned with the audio recording. Each line of the file is one timestamp of a sama (expressed in seconds), e.g.
18.426
23.187
27.944
32.683
37.390
42.055
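Because consecutive timestamps mark successive sama, the difference between neighbouring lines is the duration of one tāla cycle. The sketch below (hypothetical function names) computes these durations; taking their median is one plausible way to summarize the cycle length, and is shown here as an illustration rather than as the exact procedure used to produce the tempo files.

```python
from statistics import median


def parse_sama(text):
    """Parse one sama timestamp (in seconds) per line."""
    return [float(line) for line in text.splitlines() if line.strip()]


def cycle_durations(sama_times):
    """Durations between successive sama, i.e. one value per tala cycle."""
    return [later - earlier for earlier, later in zip(sama_times, sama_times[1:])]


sama_times = parse_sama("18.426\n23.187\n27.944\n32.683\n37.390\n42.055\n")
durations = cycle_durations(sama_times)
median_cycle = median(durations)  # roughly 4.739 s for the example above
```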
The section file stores the different structural sections in a recording. The sections in Carnatic music are lyrical, while Hindustani music sections are based on different bandishes in the same rāg with different tāl and lay. The file formats are different for Carnatic and Hindustani music collections.
For Hindustani music recordings, the section file has one section annotation per line with the following format:
start_time,section_number,duration,section_name
e.g.
0.0,1,29.694,Ālāp
29.694,2,198.953,Khyāl (vilambit ēktāl)
228.647,3,104.179122449,Tarānā (dr̥t tīntāl)
`start_time` denotes the start of the section (in seconds) in the audio recording. `duration` indicates the duration of the section (in seconds) and `section_name` is the name of the section. `section_number` indicates the serial number of the section in the recording. The section name also includes the name of the tāl if available.
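A Hindustani section file can be parsed as sketched below. The function name is hypothetical; the code splits on the first three commas only, on the assumption that any further commas belong to the section name.

```python
def parse_hindustani_sections(text):
    """Parse 'start_time,section_number,duration,section_name' lines into dicts."""
    sections = []
    for line in text.splitlines():
        if not line.strip():
            continue
        # Split on the first three commas; the rest of the line is the name.
        start, number, duration, name = line.split(",", 3)
        sections.append({
            "start_time": float(start),
            "section_number": int(number),
            "duration": float(duration),
            "section_name": name.strip(),
        })
    return sections
```

In the example above, adding `duration` to `start_time` for the first section gives 29.694, which is the start time of the second section.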
For Carnatic music recordings, the section file has one section annotation per line with the following format:
start_time ignore_flag duration section_name
e.g.
50.808163265 1 75.069387755 Anupallavi
125.87755102 1 137.697959184 Caraṇam
`start_time` denotes the start of the section (in seconds) in the audio recording. `duration` indicates the duration of the section (in seconds) and `section_name` is the name of the section. `ignore_flag` does not capture any useful information and can be ignored.
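The Carnatic variant differs only in its separator and in the ignored second field. A minimal sketch, assuming whitespace-separated fields with the section name as everything after the third field (the function name is hypothetical):

```python
def parse_carnatic_sections(text):
    """Parse 'start_time ignore_flag duration section_name' lines into dicts."""
    sections = []
    for line in text.splitlines():
        fields = line.split(maxsplit=3)
        if len(fields) < 4:
            continue  # skip blank or incomplete lines
        start, _ignore_flag, duration, name = fields  # the flag carries no information
        start, duration = float(start), float(duration)
        sections.append({
            "start_time": start,
            "duration": duration,
            "end_time": start + duration,  # derived here for convenience
            "section_name": name.strip(),
        })
    return sections
```

The `end_time` key is not in the file; it is computed here so that section boundaries can be compared directly.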
The tempo file stores different tempo related annotations derived from sama timestamps for each section of the audio recording. The file format is different for Carnatic and Hindustani music collections to capture different information.
For Hindustani music recordings, since the sections are related to rhythmic changes (laya), the tempo file stores the tempo information for each timestamped section of the recording. Each line of the tempo file is stored as:
tempo, matra_interval, sama_interval, matras_per_cycle, start_time, end_time
e.g. if there are three sections in an audio recording, the tempo file might look like:
-1, -1, -1, -1, 0.000, 29.000
26, 2.323, 27.879, 12, 29.694, 224.114
243, 0.247, 3.955, 16, 228.647, 317.696
`tempo` stores the median tempo for the section in mātrās per minute (MPM). `matra_interval` is the same tempo expressed as the duration of one mātrā in seconds (essentially 60 divided by `tempo`). `sama_interval` is the median duration of one tāl cycle in the section. `matras_per_cycle` is an indicator of the structure of the tāl, showing the number of mātrās in a cycle of the tāl of the recording. The last two columns, `start_time` and `end_time`, are expressed in seconds and correspond to the start time and end time of the section in the audio recording.
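The following sketch parses the Hindustani tempo format and maps the -1 placeholder to `None`. All names are hypothetical, and it assumes `matras_per_cycle` is always written as an integer, as in the example.

```python
from typing import NamedTuple, Optional


class SectionTempo(NamedTuple):
    tempo: Optional[float]            # matras per minute
    matra_interval: Optional[float]   # seconds
    sama_interval: Optional[float]    # seconds
    matras_per_cycle: Optional[int]
    start_time: float                 # seconds
    end_time: float                   # seconds


def _or_none(value):
    """Map the -1 placeholder (no rhythmic content) to None."""
    return None if value == -1 else value


def parse_hindustani_tempo(text):
    """Parse one comma-separated six-field line per section."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue
        fields = [field.strip() for field in line.split(",")]
        tempo, matra, sama = (float(value) for value in fields[:3])
        rows.append(SectionTempo(
            _or_none(tempo), _or_none(matra), _or_none(sama),
            _or_none(int(fields[3])),
            float(fields[4]), float(fields[5]),
        ))
    return rows
```

In the example rows, `matras_per_cycle * matra_interval` comes out within a few milliseconds of `sama_interval` (12 × 2.323 ≈ 27.879 and 16 × 0.247 ≈ 3.955), which can serve as a quick sanity check after parsing.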
For Carnatic music recordings, since the sections are lyrical, they are not typically associated with a change of tempo. Hence the tempo files store the tempo information on a single line, treating the entire audio file as one section, with the following format:
tempo_apm, tempo_bpm, sama_interval, beats_per_cycle, subdivisions
e.g.
340, 170, 2.471, 14, 2
`tempo_apm` and `tempo_bpm` store the median tempo of the recording in aksharas per minute (APM) and beats per minute (BPM), respectively. `sama_interval` is the median duration (in seconds) of one tāla cycle in the recording. The last two columns capture the structure of the tāla, with `beats_per_cycle` storing the number of beats in one cycle of the tāla of the recording, and `subdivisions` storing the number of aksharas per beat of the tāla of the recording (called the naḍe in Carnatic music terminology).
In both Carnatic and Hindustani recordings, a value of -1 for a tempo-related field indicates a section/recording without rhythmic content (such as a melodic improvisation).
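A Carnatic tempo file can be read in the same spirit. The sketch below uses hypothetical names, reads the first non-empty line, and again treats -1 as "no rhythmic content" by returning `None` for that field.

```python
from typing import NamedTuple, Optional


class RecordingTempo(NamedTuple):
    tempo_apm: Optional[float]      # aksharas per minute
    tempo_bpm: Optional[float]      # beats per minute
    sama_interval: Optional[float]  # seconds
    beats_per_cycle: Optional[int]
    subdivisions: Optional[int]     # aksharas per beat (nade)


def _or_none(value):
    """Map the -1 placeholder (no rhythmic content) to None."""
    return None if value == -1 else value


def parse_carnatic_tempo(text):
    """Parse the single 'tempo_apm, tempo_bpm, sama_interval, beats_per_cycle, subdivisions' line."""
    first_line = text.strip().splitlines()[0]
    fields = [field.strip() for field in first_line.split(",")]
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    return RecordingTempo(
        _or_none(float(fields[0])),
        _or_none(float(fields[1])),
        _or_none(float(fields[2])),
        _or_none(int(fields[3])),
        _or_none(int(fields[4])),
    )
```

For the example line, `tempo_apm` equals `tempo_bpm * subdivisions` (340 = 170 × 2), consistent with `subdivisions` being the number of aksharas per beat.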