Rhythm analysis

Percussion Transcription

Mnemonic Transcription

Note

REQUIRES: torch

class compiam.rhythm.transcription.mnemonic_transcription.MnemonicTranscription(syllables, feature_kwargs={'hop_length': 256, 'n_mfcc': 13, 'win_length': 1024}, model_kwargs={'algorithm': 'viterbi', 'n_components': 7, 'n_iter': 100, 'n_mix': 3, 'params': 'mcw'}, sr=44100)[source]

Bōl or solkattu transcription from audio, based on the model presented in [1].

[1] Gupta, S., Srinivasamurthy, A., Kumar, M., Murthy, H., & Serra, X. (2015, October). Discovery of Syllabic Percussion Patterns in Tabla Solo Recordings. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR 2015) (pp. 385–391). Malaga, Spain.
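
A minimal instantiation sketch, assuming <syllables> accepts an iterable of bōl/solkattu strings; the vocabulary below is a placeholder example, not one prescribed by the library:

    from compiam.rhythm.transcription.mnemonic_transcription import MnemonicTranscription

    # Placeholder bōl vocabulary; use whichever syllables appear in your annotations
    bol_syllables = ["dha", "dhin", "ta", "tin", "na", "ge", "ke"]

    # Feature and GMM-HMM hyperparameters keep the documented defaults unless overridden
    mt = MnemonicTranscription(syllables=bol_syllables, sr=44100)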

extract_features(audio, sr=None)[source]

Convert input audio to MFCC features

Parameters:
  • audio (np.array) – time series representation of audio

  • sr (int) – sampling rate of the input audio (default <self.sr>)

Returns:

array of features

Return type:

np.array
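
A short feature-extraction sketch, continuing from the instantiation sketch above; the audio path is a placeholder and librosa is assumed only as a convenient loader:

    import librosa

    y, sr = librosa.load("tabla_solo.wav", sr=44100)  # placeholder path
    feats = mt.extract_features(y, sr=sr)             # np.array of MFCC features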

get_sample_ix(annotations, audio, syl)[source]

Convert input onset annotations to list of in/out points for a specific bōl/solkattu syllable, <syl>

Parameters:
  • annotations (list/iterable) – onset annotations of the form [(timestamp in seconds, bōl/solkattu),… ]

  • audio (np.array) – time series representation of audio

  • syl (str) – bōl/solkattu syllable to extract

Returns:

list of [(t1, t2), ...] where t1 and t2 correspond to the in and out points of individual bōls/solkattus

Return type:

list
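
A sketch of retrieving in/out points for one syllable, continuing from the sketches above; the annotation path and syllable are placeholders, and annotations are loaded with load_annotations (documented next):

    annotations = mt.load_annotations("tabla_solo.csv")     # placeholder annotation path
    dha_segments = mt.get_sample_ix(annotations, y, "dha")  # [(t1, t2), ...] for each "dha"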

load_annotations(annotation_path)[source]

Load onset annotations from <annotation_path>

Parameters:

annotation_path (str) – path to onset annotations for one recording of the form (timestamp in seconds, bōl/solkattu syllable)

Returns:

list of onset annotations (timestamp seconds, bōl/solkattu syllable)

Return type:

list
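
A sketch of loading annotations; the file name and the contents shown in the comment are illustrative only:

    # tabla_solo.csv (CSV, no header), one onset per row, e.g.:
    #   0.12,dha
    #   0.48,tin
    annotations = mt.load_annotations("tabla_solo.csv")
    # -> [(0.12, "dha"), (0.48, "tin"), ...]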

map(a)[source]

Map input bōl/solkattu <a> to the reduced bōl/solkattu vocabulary

Parameters:

a (str) – bōl/solkattu string (must exist in self.mapping)

Returns:

mapped bōl/solkattu label

Return type:

str

predict(file_paths, onsets=None, sr=None)[source]

Predict bōl/solkattu transcriptions for the input audio(s) at <file_paths>.

Parameters:
  • file_paths (list or string) – Either one file_path or list of file_paths to audios to predict on

  • onsets (list or None) – list of onsets for the input audios. If None, compiam.rhythm.akshara_pulse_tracker is used to automatically identify bōl/solkattu onsets. If passed, <onsets> should be a list of onset annotations, each being a list of bōl/solkattu onsets in seconds, with one set of onset annotations for each file_path in <file_paths>

  • sr (int) – sampling rate of the audio to predict on (default <self.sr>)

Returns:

if <file_paths> is a list, return a list of transcriptions, each of the form [(timestamp in seconds, bōl/solkattu), …]; if <file_paths> is a single file path string, return a single transcription

Return type:

list
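
A batch-prediction sketch on a trained instance (see train below), continuing from the sketches above; file paths are placeholders and onsets are left as None so the akshara pulse tracker locates them automatically:

    paths = ["tabla_solo_1.wav", "tabla_solo_2.wav"]  # placeholder paths
    transcriptions = mt.predict(paths)                # one [(time_s, bōl), ...] list per path
    first = mt.predict("tabla_solo_1.wav")            # a single transcription for one path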

predict_sample(sample)[source]

Predict one sample using internal models. One sample should correspond to one bōl/solkattu

Parameters:

sample (np.array) – Numpy array features corresponding to <sample> (extracted using self.extract_features)

Returns:

bōl/solkattu label

Return type:

str
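
A sketch of classifying one isolated stroke, continuing from the feature-extraction sketch above; the slice standing in for a single bōl is a placeholder:

    stroke_audio = y[0:11025]                         # placeholder slice standing in for one bōl
    feats = mt.extract_features(stroke_audio, sr=sr)
    label = mt.predict_sample(feats)                  # e.g. "dha"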

predict_single(file_path, onsets=None, sr=None)[source]

Predict bōl/solkattu transcription for a single audio recording at <file_path>.

Parameters:
  • file_path (str) – File path to audio to analyze

  • onsets (list or None) – If None, compiam.rhythm.akshara_pulse_tracker is used to automatically identify bōl/solkattu onsets. If passed, <onsets> should be a list of bōl/solkattu onsets in seconds

  • sr (int) – sampling rate of the audio to predict on (default <self.sr>)

Returns:

bōl/solkattu transcription of form [(time in seconds, syllable),… ]

Return type:

list
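
A sketch passing pre-computed onsets instead of relying on automatic onset detection; the onset times and file path are illustrative only:

    onsets = [0.10, 0.42, 0.77, 1.05]                 # illustrative onset times in seconds
    transcription = mt.predict_single("tabla_solo_1.wav", onsets=onsets)
    # -> [(0.10, "dha"), (0.42, "tin"), ...]  (labels depend on the trained models)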

save(model_path)[source]

Save model to <model_path> as a .pkl file

Parameters:

model_path (str) – Path to save the model to
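
A one-line sketch with a placeholder output path:

    mt.save("models/bol_transcriber.pkl")  # placeholder path; the model is pickled to disk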

train(file_paths_audio, file_paths_annotation, sr=None)[source]

Train one Gaussian mixture model hidden Markov model (GMM-HMM) for each syllable passed at initialisation, using the audios and annotations passed via <file_paths_audio> and <file_paths_annotation>. Training hyperparameters are configured upon initialisation and can be accessed/changed via self.model_kwargs. A short usage sketch follows this parameter list.

Parameters:
  • file_paths_audio (list) – List of file_paths to audios to train on

  • file_paths_annotation (list) – List of file_paths to annotations to train on. Annotations should be in CSV format with no header, of the form (timestamp in seconds, <syllable>). Annotated syllables that do not correspond to syllables passed at initialisation will be ignored. One annotation path should be passed for each audio path

  • sr (int) – sampling rate of the audio to train on (default <self.sr>)
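
A training sketch, continuing from the instantiation sketch above; all paths are placeholders and each CSV follows the (timestamp in seconds, syllable) format described above:

    audio_paths = ["solo_1.wav", "solo_2.wav"]    # placeholder audio paths
    annot_paths = ["solo_1.csv", "solo_2.csv"]    # one annotation file per audio file
    mt.train(audio_paths, annot_paths, sr=44100)  # fits one GMM-HMM per syllable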

Meter tracking

Akshara Pulse Tracker

class compiam.rhythm.meter.akshara_pulse_tracker.AksharaPulseTracker(Nfft=4096, frmSize=1024, Fs=44100, hop=512, fBands=array([[10, 110], [110, 500], [500, 3000], [3000, 5000], [5000, 10000], [0, 22000]]), songLenMin=600, octCorrectParam=0.25, tempoWindow=8, stepSizeTempogram=0.5, BPM=array([40., 40.5, 41., ..., 599., 599.5, 600.]), minBPM=120, octTol=20, theta=0.005, delta=1000000, maxLen=0.6, binWidth=0.01, thres=0.05, ignoreTooClose=0.6, decayCoeff=15, backSearch=[5.0, 0.5], alphaDP=3, smoothTime=2560, pwtol=0.2)[source]

Akshara onset detection. CompMusic Rhythm Extractor.

extract(input_data, input_sr=44100, verbose=True)[source]

Run extraction of akshara pulses from the input audio (file path or signal array)

Parameters:
  • input_data – path to an audio file, or the audio signal as a numpy array

  • input_sr – sampling rate of the input signal; only relevant when the input is an array of data rather than a file path

  • verbose – verbosity level

Returns:

dict containing estimations of sections, matra period, akshara pulses, and tempo curve
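
A minimal sketch of running the pulse tracker on a file path or on an already-loaded signal; the path is a placeholder and librosa is assumed only for loading:

    from compiam.rhythm.meter.akshara_pulse_tracker import AksharaPulseTracker
    import librosa

    apt = AksharaPulseTracker()
    pulses = apt.extract("concert_excerpt.wav")  # placeholder path

    # or, from an in-memory signal:
    y, sr = librosa.load("concert_excerpt.wav", sr=44100)
    pulses = apt.extract(y, input_sr=sr)
    # dict with sections, matra period, akshara pulses and tempo curve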