Rhythm analysis

Percussion Transcription

Mnemonic Transcription

Note

REQUIRES: torch

class compiam.rhythm.transcription.mnemonic_transcription.MnemonicTranscription(syllables, feature_kwargs={'hop_length': 256, 'n_mfcc': 13, 'win_length': 1024}, model_kwargs={'algorithm': 'viterbi', 'n_components': 7, 'n_iter': 100, 'n_mix': 3, 'params': 'mcw'}, sr=44100)[source]

bōl or solkattu transcription from audio. Based on the model presented in [1].

[1] Gupta, S., Srinivasamurthy, A., Kumar, M., Murthy, H., & Serra, X. (2015, October). Discovery of Syllabic Percussion Patterns in Tabla Solo Recordings. In Proceedings of the 16th International Society for Music Information Retrieval Conference (ISMIR 2015) (pp. 385–391). Malaga, Spain.

extract_features(audio, sr=None)[source]

Convert input audio to MFCC features

Parameters:
  • audio (np.array) – time series representation of audio

  • sr (int) – sampling rate of audio (default <self.sr>)

Returns:

array of features

Return type:

np.array
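The feature_kwargs passed at initialisation fix the geometry of the returned feature array. A small sketch of the resulting shape, assuming librosa-style centered framing (an assumption about the backend; the signal here is an illustrative stand-in):

```python
import numpy as np

sr = 44100
hop_length, win_length, n_mfcc = 256, 1024, 13  # as in feature_kwargs
audio = np.zeros(sr)  # one second of silence as a stand-in signal

# with centered framing, the frame count is 1 + floor(len(audio) / hop_length)
n_frames = 1 + len(audio) // hop_length
features_shape = (n_mfcc, n_frames)
```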

get_sample_ix(annotations, audio, syl)[source]

Convert input onset annotations to list of in/out points for a specific bōl/solkattu syllable, <syl>

Parameters:
  • annotations (list/iterable) – onset annotations of the form [(timestamp in seconds, bōl/solkattu),… ]

  • audio (np.array) – time series representation of audio

  • syl (str) – bōl/solkattu syllable to extract

Returns:

list of [(t1, t2), …] where t1 and t2 correspond to the in and out points of single bōls/solkattus

Return type:

list
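The conversion can be sketched in pure Python. One assumption for illustration: the out point of a stroke is taken as the next onset (or the end of the audio for the final stroke); the library's exact boundary rule may differ:

```python
def sample_spans(annotations, syl, audio_dur):
    """Collect (t1, t2) in/out points for every occurrence of <syl>.

    annotations: [(timestamp in seconds, syllable), ...] sorted by time.
    """
    spans = []
    for i, (t, s) in enumerate(annotations):
        if s == syl:
            # out point: next onset, or end of audio for the last stroke
            t2 = annotations[i + 1][0] if i + 1 < len(annotations) else audio_dur
            spans.append((t, t2))
    return spans

annotations = [(0.0, "ta"), (0.5, "ki"), (1.0, "ta")]
spans = sample_spans(annotations, "ta", audio_dur=1.4)
```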

load_annotations(annotation_path)[source]

Load onset annotations from <annotation_path>

Parameters:

annotation_path (str) – path to onset annotations for one recording of the form (timestamp in seconds, bōl/solkattu syllable)

Returns:

list of onset annotations (timestamp seconds, bōl/solkattu syllable)

Return type:

list
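The annotation format is a headerless two-column CSV. A minimal parsing sketch with the standard library (file contents are illustrative, read from a string rather than a path):

```python
import csv
import io

# a headerless (timestamp in seconds, syllable) CSV, contents illustrative
raw = "0.00,ta\n0.52,ki\n1.04,dhin\n"

annotations = [
    (float(timestamp), syllable.strip())
    for timestamp, syllable in csv.reader(io.StringIO(raw))
]
```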

map(a)[source]

Map input bōl/solkattu <a> to a reduced bōl/solkattu vocabulary

Parameters:

a (str) – bōl/solkattu string (that must exist in self.mapping)

Returns:

mapped bōl/solkattu label

Return type:

str
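The reduction is a dictionary lookup. A sketch with an illustrative mapping (the actual vocabulary reduction used by the model may differ; here unknown syllables pass through unchanged, whereas the library requires <a> to exist in self.mapping):

```python
# illustrative reduced-vocabulary mapping, analogous to self.mapping
mapping = {"dheem": "dhin", "dhin": "dhin", "ta": "ta", "taa": "ta"}

def map_syllable(a, mapping):
    # collapse variant spellings/strokes onto one canonical label
    return mapping.get(a, a)

label = map_syllable("dheem", mapping)
```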

predict(file_paths, onsets=None, sr=None)[source]

Predict bōl/solkattu transcription for list of input audios at <file_paths>.

Parameters:
  • file_paths (list or string) – Either one file_path or list of file_paths to audios to predict on

  • onsets (list or None) – list representing onsets in audios. If None, compiam.rhythm.akshara_pulse_tracker is used to automatically identify bōl/solkattu onsets. If passed should be a list of onset annotations, each being a list of bōl/solkattu onsets in seconds. <onsets> should contain one set of onset annotations for each file_path in <file_paths>

  • sr (int) – sampling rate of audio (default <self.sr>)

Returns:

if <file_paths> is a list, return a list of transcriptions, each of the form [(timestamp in seconds, bōl/solkattu), …]; if <file_paths> is a single file path string, return a single transcription.

Return type:

list
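The single-path vs. list-of-paths behaviour can be sketched as a simple dispatch (the stub below stands in for per-file prediction; this is an illustration of the return-shape contract, not the library's internals):

```python
def predict(file_paths, predict_single):
    # single path in -> single transcription out; list in -> list out
    if isinstance(file_paths, str):
        return predict_single(file_paths)
    return [predict_single(fp) for fp in file_paths]

# stub standing in for per-file transcription
stub = lambda fp: [(0.0, "ta")]
single = predict("a.wav", stub)
many = predict(["a.wav", "b.wav"], stub)
```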

predict_sample(sample)[source]

Predict one sample using internal models. One sample should correspond to one bōl/solkattu

Parameters:

sample (np.array) – Numpy array features corresponding to <sample> (extracted using self.extract_features)

Returns:

bōl/solkattu label

Return type:

str
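One model is trained per syllable, so classifying a sample amounts to scoring it under each model and keeping the best-scoring syllable (assumed here to be a log-likelihood argmax; plain callables stand in for the trained GMM-HMMs):

```python
# per-syllable scorers standing in for trained GMM-HMMs;
# higher score = better fit (illustrative values)
models = {
    "ta": lambda feats: -3.1,
    "dhin": lambda feats: -0.7,
    "ki": lambda feats: -5.2,
}

def predict_sample(sample, models):
    # return the syllable whose model scores the sample highest
    return max(models, key=lambda syl: models[syl](sample))

label = predict_sample(None, models)
```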

predict_single(file_path, onsets=None, sr=None)[source]

Predict bōl/solkattu transcription for a single audio file at <file_path>

Parameters:
  • file_path (str) – File path to audio to analyze

  • onsets (list or None) – If None, compiam.rhythm.akshara_pulse_tracker is used to automatically identify bōl/solkattu onsets. If passed <onsets> should be a list of bōl/solkattu onsets in seconds

  • sr (int) – sampling rate of audio (default <self.sr>)

Returns:

bōl/solkattu transcription of form [(time in seconds, syllable),… ]

Return type:

list

save(model_path)[source]

Save model at path as .pkl

Parameters:

model_path (str) – Path to save model to

train(file_paths_audio, file_paths_annotation, sr=None)[source]

Train one Gaussian mixture model hidden Markov model (GMM-HMM) for each syllable passed at initialisation, using the audios and annotations passed via <file_paths_audio> and <file_paths_annotation>. Training hyperparameters are configured upon initialisation and can be accessed/changed via self.model_kwargs.

Parameters:
  • file_paths_audio (list) – List of file_paths to audios to train on

  • file_paths_annotation (list) – List of file_paths to annotations to train on. Annotations should be in CSV format with no header, of the form (timestamp in seconds, <syllable>). Annotated syllables that do not correspond to syllables passed at initialisation will be ignored. One annotation path should be passed for each audio path

  • sr (int) – sampling rate of audio to train on (default <self.sr>)

Meter tracking

Akshara Pulse Tracker

class compiam.rhythm.meter.akshara_pulse_tracker.AksharaPulseTracker(Nfft=4096, frmSize=1024, Fs=44100, hop=512, fBands=array([[10, 110], [110, 500], [500, 3000], [3000, 5000], [5000, 10000], [0, 22000]]), songLenMin=600, octCorrectParam=0.25, tempoWindow=8, stepSizeTempogram=0.5, BPM=array([40., 40.5, ..., 599.5, 600.]), minBPM=120, octTol=20, theta=0.005, delta=1000000, maxLen=0.6, binWidth=0.01, thres=0.05, ignoreTooClose=0.6, decayCoeff=15, backSearch=[5.0, 0.5], alphaDP=3, smoothTime=2560, pwtol=0.2)[source]

Akshara onset detection. CompMusic Rhythm Extractor.

extract(input_data, input_sr=44100, verbose=True)[source]

Run extraction of akshara pulses from input audio file

Parameters:
  • input_data – path to audio file, or numpy array containing the audio signal

  • input_sr – sampling rate of the input array (only relevant when the input is an array of data rather than a file path)

  • verbose – verbose level

Returns:

dict containing estimation for sections, matra period, akshara pulses, and tempo curve
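Given the estimated akshara pulses, the matra period and tempo follow from the inter-pulse intervals. A small numpy sketch with illustrative, evenly spaced pulse times:

```python
import numpy as np

# illustrative akshara pulse times in seconds (evenly spaced for clarity)
pulses = np.array([0.0, 0.5, 1.0, 1.5, 2.0])

matra_period = float(np.diff(pulses).mean())  # seconds per pulse
bpm = 60.0 / matra_period                     # pulses per minute
```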

TCN Carnatic

Note

REQUIRES: torch

class compiam.rhythm.meter.tcn_carnatic.TCNTracker(post_processor='joint', model_version=42, model_path=None, download_link=None, download_checksum=None, gpu=-1)[source]

TCN beat tracker tuned to Carnatic Music.

download_model(model_path=None, force_overwrite=True)[source]

Download pre-trained model.

load_model(model_path)[source]

Load pre-trained model weights.

predict(input_data: str, sr: int = 44100, min_bpm=55, max_bpm=230, beats_per_bar=[3, 5, 7, 8]) → Dict[source]

Run inference on input audio file.

Parameters:
  • input_data – path to audio file, or numpy array containing the audio signal.

  • sr – sampling rate of the input audio signal (default: 44100).

  • min_bpm – minimum BPM for beat tracking (default: 55).

  • max_bpm – maximum BPM for beat tracking (default: 230).

  • beats_per_bar – list of possible beats per bar for downbeat tracking (default: [3, 5, 7, 8]).

Returns:

a 2-D list with beats and beat positions.
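Each entry of that 2-D list pairs a beat time with its position within the bar, so downbeats can be recovered by filtering on position 1 (values below are illustrative, not real model output):

```python
# illustrative output: (beat time in seconds, beat position within the bar),
# with position 1 marking the downbeat
beats = [[0.50, 1], [1.00, 2], [1.50, 3], [2.00, 1], [2.50, 2]]

downbeats = [t for t, pos in beats if pos == 1]
```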

preprocess_audio(input_data: str, input_sr: int) → ndarray[source]

Preprocess input audio file to extract features for inference.

Parameters:
  • input_data – path to the input audio file.

  • input_sr – sampling rate of the input audio file.

Returns:

Preprocessed features as a numpy array.

static save_pitch(data, output_path)[source]

Calls the write_csv function in compiam.io to write the output beat track to a file

Parameters:
  • data – the data to write

  • output_path – the path where the data is going to be stored

Returns:

None
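Writing a beat track as CSV can be sketched with the standard library (an in-memory buffer stands in for the file at output_path; the exact column layout produced by compiam.io.write_csv is an assumption):

```python
import csv
import io

beats = [[0.50, 1], [1.00, 2]]  # illustrative beat track

# a real call would use open(output_path, "w", newline="") instead
buf = io.StringIO()
csv.writer(buf).writerows(beats)
```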

select_gpu(gpu='-1')[source]

Select the GPU to use for inference.

Parameters:

gpu – Id of the available GPU to use (-1 by default, to run on CPU), use string: ‘0’, ‘1’, etc.

Returns:

None