Melodic pattern discovery#
Melodic pattern discovery is a core task within the melodic analysis of Indian Art Music, with most approaches building on top of pitch time series [GSIS15a, GSIS16, GSIS15b, IDBM13, RRG+14]. In fact, [VK18] reviews the melodic recognition task in Carnatic Music and concludes that using pitch data leads to better performance. Along the same lines, a musically relevant statistical analysis of melodic patterns is proposed in [VAM17], aiming to provide a more accessible representation of this important aspect of Carnatic and Hindustani Music.
More recently, the task of melodic pattern discovery has been approached using DL techniques, combining the features learnt by a complex autoencoder with an attention-based vocal pitch extraction model [NPRPS22a]. Multiple design choices throughout this process are informed by characteristics of the tradition.
## Install compiam (if not already installed) and import it into the project
import importlib.util
if importlib.util.find_spec('compiam') is None:
## Bear in mind this will only run in a Jupyter notebook / Colab session
%pip install compiam
import compiam
# Import extras and suppress warnings to keep the tutorial clean
import os
import numpy as np
from pprint import pprint
import warnings
warnings.filterwarnings('ignore')
Sañcara search#
In this walkthrough we demonstrate the task of repeated melodic motif identification in Carnatic Music. The methodologies used are presented in [NPRPS22a] and [PRNP+23], for which the compIAM repository serves as the most up-to-date and well-maintained implementation.
In the compIAM documentation we can see that the tool we showcase on this page has torch as a dependency for the DL models.
%pip install torch
1. Data#
This notebook works with a performance from the Saraga Carnatic Dataset. The performance audio is not provided alongside this notebook as part of the compIAM package, but it can be downloaded manually following the instructions in the Access section of the provided link. We also encourage the reader to try with their own Carnatic performance audio.
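Alternatively, the data can be fetched programmatically. Below is a minimal sketch, assuming the Saraga Carnatic loader is exposed through compiam.load_dataset() and follows the mirdata download() convention:
# Hedged sketch: programmatic access to Saraga Carnatic. The loader
# identifier and the mirdata-style download() call are assumptions.
saraga_carnatic = compiam.load_dataset("saraga_carnatic")
saraga_carnatic.download()  # fetch audio and annotations to the default data home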
audio_path = os.path.join("..", "audio", "pattern_finding", "Koti Janmani.multitrack-vocal.mp3") # path to audio
raga = 'ritigowla' # carnatic raga
2. Pitch processing#
2.1 Predominant pitch extraction#
Owing to the coarticulation (merging) of svaras through gamakas, musically salient units in Carnatic Music are often better characterised by continuous pitch time series than by transcription to symbolic notation [NPRPS22b, Pea16].
However, Carnatic Music constitutes a difficult case for vocal pitch extraction – although performances place strong emphasis on a monophonic melodic line from the soloist singer, heterophonic melodic elements also occur, for example from the accompanying violinist who shadows the melody of the soloist, often at a lag and with variation. In addition, there are the sounds of the tanpura (plucked lute that creates an oscillating drone) and pitched percussion instruments [PRNP+23].
Here we use a pretrained FTA-Net model for the task. This is provided with the compIAM package via the compiam.load_model()
function. The model is an attention-based network that leverages and fuses information from the frequency and periodicity domains to capture the correct pitch values for the predominant source. It learns to focus on this source by using an additional branch that helps reduce the false alarm rate (detecting pitch values that do not correspond to the source we target) [YSYL21].
This FTANet instance is trained on the Saraga Carnatic Melody Synth (SCMS) dataset, which includes more than 1000 minutes of time-aligned, continuous vocal melody annotations for the Carnatic music tradition [PRNP+23]. See also the FTANet-Carnatic walkthrough in this tutorial.
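Before loading, we can list the identifiers of the pretrained models shipped with compIAM; the listing call below is assumed from the compIAM documentation, and "melody:ftanet-carnatic" should appear among the results.
# Assumed API: list the pretrained model identifiers available in compIAM
pprint(compiam.list_models())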
ftanet = compiam.load_model("melody:ftanet-carnatic")
Extracting the vocal pitch track:
pitch_track = ftanet.predict(audio_path)
pitch = pitch_track[:,1] # Pitch in Hz
time = pitch_track[:,0] # Time in seconds
timestep = time[2]-time[1] # time in seconds between elements of pitch track
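As a quick, illustrative sanity check (our own addition, not part of the original pipeline), we can inspect the shape of the returned track and the proportion of unvoiced frames, which the model reports as 0 Hz:
# Illustrative check: unvoiced frames are reported as 0 Hz and are handled
# by the interpolation step below.
print(f"Track shape: {pitch_track.shape}, timestep: {timestep:.4f} s")
print(f"Proportion of unvoiced (0 Hz) frames: {np.mean(pitch == 0):.1%}")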
We can interpolate small silences to account for minor errors in the pitch extraction process, typically caused by glottal sounds and sudden decreases of pitch salience in gamakas [NPRPS22b].
from compiam.utils.pitch import interpolate_below_length
pitch = interpolate_below_length(
pitch, # track to interpolate
0, # value to interpolate
350*0.001/timestep # maximum gap to interpolate, in sequence elements (350 ms here)
)
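To make the behaviour concrete, here is a toy example of our own (assuming the function fills short runs of the target value by linear interpolation and leaves longer runs untouched):
# Toy illustration (assumed semantics): with a maximum gap of 3, the
# 2-element run of zeros is filled, while the 5-element run is preserved.
toy = np.array([100., 0., 0., 104., 0., 0., 0., 0., 0., 110.])
print(interpolate_below_length(toy, 0, 3))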
2.2 Visualising predominant pitch#
We can plot our pitch track using the visualisation tools in compiam.visualisation.
from compiam.visualisation.pitch import plot_pitch, flush_matplotlib
from compiam.utils import ipy_audio
We want to accompany pitch plots with audio, so we also load the raw audio.
# Load the raw audio for playback
import librosa
sr = 44100 # sampling rate
audio, _ = librosa.load(audio_path, sr=sr)
t1 = 304 # in seconds
t2 = 324 # in seconds
t1s = round(t1/timestep) # in sequence elements
t2s = round(t2/timestep) # in sequence elements
this_pitch = pitch[t1s:t2s]
this_time = time[t1s:t2s]
plot_pitch(this_pitch, this_time, title='Excerpt of Koti Janmani by The Akkarai Sisters')
ipy_audio(audio, t1, t2, sr=sr)
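When plotting several excerpts in a row, it can help to reset the figure state between plots; judging by its name, flush_matplotlib (imported above) is assumed to do exactly that.
# Assumed usage: clear matplotlib figure state before the next plot
flush_matplotlib()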