# Automatic raga recognition

Automatic raga recognition aims to identify the raga performed in an Indian Art Music recording. Tradition-specific handcrafted features [GSIS16] or distributional representations of raga grammar [GR18] have been used to identify and relate ragas. DL has also been used to automatically recognize ragas on top of pitch curves using Long Short-Term Memory (LSTM) networks, given their ability to capture sequential information [MC19]. In a recently published paper, included in the list of ISMIR 2022 accepted papers, the authors propose a novel approach for classifying ragas in a multimodal scenario using pitch curves and tonic (audio domain), and hand movements (video domain) [CRS+22].
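To make the LSTM-based approach concrete, here is a minimal sketch of a network that maps a pitch curve to raga logits. This is an illustrative toy model, not the architecture from [MC19] or any other cited work: the class name, layer sizes, and dummy input are assumptions, and PyTorch is assumed to be installed.

import torch
import torch.nn as nn

class PitchCurveLSTM(nn.Module):
    """Toy LSTM classifier mapping a pitch curve to raga logits."""
    def __init__(self, hidden_size=64, num_ragas=10):
        super().__init__()
        # One pitch value per frame, hence input_size=1
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden_size, batch_first=True)
        self.classifier = nn.Linear(hidden_size, num_ragas)

    def forward(self, pitch_curve):
        # pitch_curve: (batch, num_frames, 1)
        _, (hidden, _) = self.lstm(pitch_curve)
        # The final hidden state summarizes the whole sequence
        return self.classifier(hidden[-1])

toy_model = PitchCurveLSTM()
dummy_pitch = torch.randn(1, 5000, 1)  # a fake pitch curve of 5000 frames
print(toy_model(dummy_pitch).shape)  # torch.Size([1, 10])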

Most of the research on this topic builds on top of the Dunya Carnatic and Hindustani corpora [SKG+14].

## Install compiam (if not already installed) and import it
import importlib.util
if importlib.util.find_spec('compiam') is None:
    ## Bear in mind this will only run in a Jupyter notebook / Colab session
    %pip install compiam
import compiam

# Import extras and suppress warnings to keep the tutorial clean
import numpy as np
from pprint import pprint
import warnings
warnings.filterwarnings('ignore')

Let’s first list the available tools for the task of raga recognition.

compiam.melody.raga_recognition.list_tools()
['DEEPSRGM*']

## DEEPSRGM

We can see that the available tool appears with * at the end of its name, which indicates that a pre-trained instance may be loaded using the compiam.load_model() wrapper. Looking at the entry for this tool in the compiam documentation, we note that an optional dependency is needed to properly load it. Let’s install it:

%pip install torch==1.8.0

DEEPSRGM [MC19] is a DL-based model that uses melodic features to automatically identify the raga from an Indian Art Music recording. In the original paper, the authors use the Indian Art Music Raga Dataset [GSG+16], and report accuracy higher than 95%.

# This tool has * at the end: we can load the pre-trained instance
deepsrgm = compiam.load_model("melody:deepsrgm")

Important

Deep Learning model checkpoints tend to be large, so storing them within compiam may become unsustainable. For that reason, we store the checkpoints in the cloud and download them when the user initializes a model using the wrapper. Note that you can specify the location to which the checkpoint should be downloaded through the data_home argument of load_model().
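For instance, a call such as the following would store the checkpoint under a custom folder (the path below is just a placeholder):

# Download the checkpoint to a custom location (the path is a placeholder)
deepsrgm = compiam.load_model("melody:deepsrgm", data_home="./models/")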

# Showing raga mapping
pprint(deepsrgm.mapping)
{0: 'Bhairav',
 1: 'Madhukauns',
 2: 'Mōhanaṁ',
 3: 'Hamsadhvāni',
 4: 'Varāḷi',
 5: 'Dēś',
 6: 'Kamās',
 7: 'Yaman kalyāṇ',
 8: 'Bilahari',
 9: 'Ahira bhairav'}

Therefore, the provided pre-trained model has been trained to target the ragas included in this mapping. Let’s select an example tagged with one of the ragas in the default mapping to evaluate this tool and showcase how it works.
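As a quick illustrative check (not part of the original workflow), we can build a set from the mapping values to test whether a given raga is covered by the pre-trained model:

# Build the set of ragas supported by the pre-trained model
supported_ragas = set(deepsrgm.mapping.values())
print("Bilahari" in supported_ragas)  # True, given the mapping above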

# Initialize saraga Carnatic on our example audio folder
saraga_carnatic = compiam.load_dataset(
    "saraga_carnatic", data_home="./../audio/mir_datasets/"
)

Note

Please note that to save space, we are not loading the entire dataset but just a single recording. However, we can still use the dataset loader, which will work properly in this case as long as we only retrieve the data for this particular recording.

The recording we have included for tutorial purposes is Sri Raghuvara Sugunaalaya, performed by Sanjay Subrahmanyan at Vani Mahal. We will load this track by passing its identifier, which we have looked up beforehand in the mirdata dataloader, namely: 109_Sri_Raghuvara_Sugunaalaya.

# We first load the {ids: tracks} dict for Saraga Carnatic
saraga_tracks = saraga_carnatic.load_tracks()

# Now we can get a specific track
track_data = saraga_tracks["109_Sri_Raghuvara_Sugunaalaya"]
print("This recording includes raaga:", track_data.metadata["raaga"][0]["name"])
print("This raaga has unique Dunya ID:", track_data.metadata["raaga"][0]["uuid"])
This recording includes raaga: Bhairavi
This raaga has unique Dunya ID: 123b09bd-9901-4e64-a65a-10b02c9e0597

Nice! Raga Bhairavi is performed in this recording and is included in the mapping of DEEPSRGM, and was therefore considered during the training process. We first need to extract the features from the audio recording.

# Computing features for the given recording
feat = deepsrgm.get_features(track_data.audio_path)
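Before running inference, it may be useful to take a quick look at what get_features() returned. The exact type and shape depend on the implementation, so treat this as a generic inspection snippet:

# Inspect the computed features (type and shape are implementation-dependent)
print(type(feat))
print(np.shape(feat))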

Finally, we can pass the computed features through the pre-trained model to run inference and attempt to automatically recognize the raga performed in the recording.

deepsrgm.predict(feat)

Oops! Why is this line of code not being executed? The .predict() method runs inference with the DEEPSRGM model using the given features. As you may know, DL models tend to require a significant amount of memory to run, especially during training. Some lightweight models can run on a conventional laptop, but in other cases you may need a machine with enough resources to run your models and perform inference.
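If you prefer the notebook to fail gracefully on a memory-constrained machine, a defensive pattern like the one below may help. The exception types caught here are an assumption; PyTorch typically raises RuntimeError when it runs out of memory:

# Run inference defensively instead of letting the session crash
try:
    prediction = deepsrgm.predict(feat)
    print("Predicted raga:", prediction)
except (MemoryError, RuntimeError) as err:
    print("Inference could not run on this machine:", err)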

Tip

As mentioned in the introduction, Google Colab offers GPU and TPU access, which may allow you to run certain models that are too big for your machine.

## LOAD THIS NOTEBOOK INTO A GOOGLE COLAB SESSION
## THEN, UNCOMMENT AND RUN THIS LINE:

# deepsrgm.predict(feat)