Melodic analysis
Tonic Identification
Tonic Indian Art Music (Multipitch approach)
Note
REQUIRES: essentia
- class compiam.melody.tonic_identification.tonic_multipitch.TonicIndianMultiPitch(bin_resolution=10, frame_size=2048, harmonic_weight=0.8, hop_size=128, magnitude_compression=1, magnitude_threshold=40, max_tonic_frequency=375, min_tonic_frequency=100, num_harmonics=20, ref_frequency=55, sample_rate=44100)[source]
MultiPitch approach to extract the tonic from IAM music signals.
- extract(input_data, input_sr=44100)[source]
Extract the tonic from a given audio file or signal.
- Parameters:
input_data – path to audio file or numpy array-like audio signal
input_sr – sampling rate of the input array of data (if any). This variable is only relevant if the input is an array of data instead of a filepath.
- Returns:
a floating point number representing the tonic of the input recording.
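Example (a minimal usage sketch; "my_recording.wav" is a placeholder path, and librosa is assumed available for loading arrays):

    from compiam.melody.tonic_identification import TonicIndianMultiPitch

    tonic_extractor = TonicIndianMultiPitch()  # default parameters

    # From a file path
    tonic = tonic_extractor.extract("my_recording.wav")

    # Or from an already-loaded signal, passing its sampling rate
    import librosa
    y, sr = librosa.load("my_recording.wav", sr=None)
    tonic = tonic_extractor.extract(y, input_sr=sr)
    print(f"Estimated tonic: {tonic:.2f} Hz")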
Pitch Extraction
Melodia
Note
REQUIRES: essentia
- class compiam.melody.pitch_extraction.melodia.Melodia(bin_resolution=10, filter_iterations=3, frame_size=2048, guess_unvoiced=False, harmonic_weight=0.8, hop_size=128, magnitude_compression=1, magnitude_threshold=40, max_frequency=20000, min_duration=100, min_frequency=80, num_harmonics=20, peak_distribution_threshold=0.9, peak_frame_threshold=0.9, pitch_continuity=27.5625, reference_frequency=55, sample_rate=44100, time_continuity=100, voice_vibrato=False, voicing_tolerance=0.2)[source]
Melodia predominant melody extraction.
- extract(input_data, input_sr=44100, out_step=None)[source]
Extract the melody from a given audio file or signal.
- Parameters:
input_data – path to audio file or numpy array-like audio signal
input_sr – sampling rate of the input array of data (if any). This variable is only relevant if the input is an array of data instead of a filepath.
out_step – time-step duration for the output, if a particular one is needed
- Returns:
a 2-D list with time-stamps and pitch values per timestamp.
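Example (a minimal usage sketch; the audio path is a placeholder):

    from compiam.melody.pitch_extraction import Melodia

    melodia = Melodia()  # default parameters
    pitch_track = melodia.extract("my_recording.wav")
    # each row holds (time-stamp in seconds, pitch in Hz)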
- static normalise_pitch(pitch, tonic, bins_per_octave=120, max_value=4)[source]
Normalise pitch given a tonic.
- Parameters:
pitch – a 2-D list with time-stamps and pitch values per timestamp.
tonic – recording tonic to normalize the pitch to.
bins_per_octave – number of frequency bins per octave.
max_value – maximum value to clip the normalized pitch to.
- Returns:
a 2-D list with time-stamps and pitch values per timestamp, normalised to the given tonic.
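Example (continuing the sketch above; the tonic value is a placeholder, e.g. as estimated by TonicIndianMultiPitch):

    # normalise_pitch is static, so it may be called on the class directly
    normalised_track = Melodia.normalise_pitch(pitch_track, tonic=196.0)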
FTANet-Carnatic
Note
REQUIRES: tensorflow
- class compiam.melody.pitch_extraction.FTANetCarnatic(model_path=None, download_link=None, download_checksum=None, sample_rate=8000, gpu='-1')[source]
FTA-Net melody extraction tuned to Carnatic Music.
- static FTA_Module(x, shape, kt, kf)[source]
Frequency-temporal attention (FTA) module. Implementation taken from https://github.com/yushuai/FTANet-melodic
- Parameters:
x – input tensor.
shape – the shape of the input tensor.
kt – kernel size for time attention.
kf – kernel size for frequency attention.
- Returns:
the resized input, the time-attention map, and the frequency-attention map.
- static SF_Module(x_list, n_channel, reduction, limitation)[source]
Selection and fusion module. Implementation taken from https://github.com/yushuai/FTANet-melodic
- Parameters:
x_list – list of tensor inputs.
n_channel – number of feature channels.
reduction – rate by which the data is compressed.
limitation – lower limit applied when compressing.
- Returns:
a tensor with the fused and selected feature map.
- static normalise_pitch(pitch, tonic, bins_per_octave=120, max_value=4)[source]
Normalise pitch given a tonic.
- Parameters:
pitch – a 2-D list with time-stamps and pitch values per timestamp.
tonic – recording tonic to normalize the pitch to.
bins_per_octave – number of frequency bins per octave.
max_value – maximum value to clip the normalized pitch to.
- Returns:
a 2-D list with time-stamps and normalised to a given tonic pitch values per timestamp.
- predict(input_data, input_sr=44100, hop_size=80, batch_size=5, out_step=None, gpu='-1')[source]
Extract melody from input_data. Implementation taken (and slightly adapted) from https://github.com/yushuai/FTANet-melodic.
- Parameters:
input_data – path to audio file or numpy array-like audio signal.
input_sr – sampling rate of the input array of data (if any). This variable is only relevant if the input is an array of data instead of a filepath.
hop_size – hop size between frequency estimations.
batch_size – number of seconds per batch passed through the model (defaulted to 5; increase if you have enough computational power, reduce if needed).
out_step – time-step duration for the output, if a particular one is needed
gpu – ID of the GPU to use ('-1' by default, to run on CPU); pass a string: '0', '1', etc.
- Returns:
a 2-D list with time-stamps and pitch values per timestamp.
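Example (a minimal usage sketch; pre-trained weights are fetched through the compiam model loader, and the audio path is a placeholder):

    import compiam

    ftanet = compiam.load_model("melody:ftanet-carnatic")
    pitch_track = ftanet.predict("my_recording.wav")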
FTAResNet-Carnatic
Note
REQUIRES: torch
- class compiam.melody.pitch_extraction.FTAResNetCarnatic(model_path=None, download_link=None, download_checksum=None, sample_rate=44100, gpu='-1')[source]
FTA-ResNet melody extraction tuned to Carnatic Music.
- static normalise_pitch(pitch, tonic, bins_per_octave=120, max_value=4)[source]
Normalise pitch given a tonic.
- Parameters:
pitch – a 2-D list with time-stamps and pitch values per timestamp.
tonic – recording tonic to normalize the pitch to.
bins_per_octave – number of frequency bins per octave.
max_value – maximum value to clip the normalized pitch to.
- Returns:
a 2-D list with time-stamps and normalised to a given tonic pitch values per timestamp.
- predict(input_data, input_sr=44100, hop_size=441, time_frame=128, out_step=None, gpu='-1')[source]
Extract melody from input_data. Implementation taken (and slightly adapted) from https://github.com/yushuai/FTANet-melodic.
- Parameters:
input_data – path to audio file or numpy array-like audio signal.
input_sr – sampling rate of the input array of data (if any). This variable is only relevant if the input is an array of data instead of a filepath.
hop_size – hop size between frequency estimations.
time_frame – number of time frames per input window passed through the model (defaulted to 128).
out_step – time-step duration for the output, if a particular one is needed
gpu – ID of the GPU to use ('-1' by default, to run on CPU); pass a string: '0', '1', etc.
- Returns:
a 2-D list with time-stamps and pitch values per timestamp.
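Example (a minimal usage sketch; the model key is an assumption and may differ between compiam versions, so check compiam.list_models() for the exact identifier):

    import compiam

    ftaresnet = compiam.load_model("melody:ftaresnet-carnatic")  # assumed key
    pitch_track = ftaresnet.predict("my_recording.wav")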
Melodic Pattern Discovery
CAE-Carnatic (Wrapper)
Note
REQUIRES: torch
- class compiam.melody.pattern.sancara_search.CAEWrapper(model_path, conf_path, spec_path, download_link, download_checksum, device='cpu')[source]
Wrapper for the Complex Autoencoder found at https://github.com/SonyCSLParis/cae-invar#quick-start, specifically for the task of embedding audio into learnt CAE features. This wrapper is inference-only and is not trainable. Please initialize it using compiam.load_model()
- extract_features(file_path, sr=None)[source]
Extract CAE features using self.model on audio at <file_path>
- Parameters:
file_path (str) – path to audio
sr (int) – sampling rate of audio at <file_path>, if None, use self.sr
- Returns:
amplitude vector, phases vector
- Return type:
np.ndarray, np.ndarray
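Example (a minimal usage sketch; the wrapper is initialized through compiam.load_model() as noted above, and the audio path is a placeholder):

    import compiam

    cae = compiam.load_model("melody:cae-carnatic")
    amplitudes, phases = cae.extract_features("my_recording.wav")
    cqt = cae.get_cqt("my_recording.wav")  # CQT under the wrapper's configuration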
- get_cqt(file_path, sr=None)[source]
Extract CQT representation from audio at <file_path> according to parameters specified in conf at self.conf_path
- Parameters:
file_path (str) – path to audio
sr (int) – sampling rate of audio at <file_path>, if None, use self.sr
- Returns:
cqt representation
- Return type:
np.ndarray
- load_conf(path, spec)[source]
Load the .ini configuration file at <path>
- Parameters:
path (str) – path to .ini configuration file
spec (str) – path to .cfg configuration spec file
- Returns:
dict of parameters
- Return type:
dict
- load_model(model_path, conf_path, spec_path)[source]
Load model at <model_path>. Expects model parameters to correspond to those found in self.params (loaded from self.conf_path).
- Parameters:
model_path (str) – path to model
conf_path (str) – path to .ini configuration file
spec_path (str) – path to .cfg configuration spec file
Self-similarity matrix
Note
REQUIRES: torch
- compiam.melody.pattern.sancara_search.extraction.self_sim.convert_mask(arr, mask, timestep, hop_length, sr)[source]
Get mask of excluded regions with the same dimensions as array <arr>
- Parameters:
arr (np.ndarray) – array corresponding to features extracted from audio
mask (np.ndarray) – Mask indicating whether element should be excluded (different dimensions to <arr>)
timestep (float) – time in seconds between each element in <mask>
hop_length (int) – how many frames of audio correspond to each element in <arr>
sr (int) – sampling rate of audio from which <arr> was computed
- Returns:
array of mask values (0/1 indicating whether each element is masked), equal in length to one dimension of <arr>
- Return type:
np.ndarray
- compiam.melody.pattern.sancara_search.extraction.self_sim.create_ss_matrix(feats, mode='cosine')[source]
Compute self similarity matrix between features in <feats> using distance measure, <mode>
- Parameters:
feats (np.ndarray) – array of features
mode (str) – name of distance measure (recognised by scipy.spatial.distance)
- Returns:
self similarity matrix
- Return type:
np.ndarray
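Example (a minimal sketch; <amplitudes> is assumed to be a feature array, e.g. the CAE amplitudes extracted above):

    from compiam.melody.pattern.sancara_search.extraction.self_sim import create_ss_matrix

    ss_matrix = create_ss_matrix(amplitudes, mode="cosine")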
- compiam.melody.pattern.sancara_search.extraction.self_sim.get_conversion_mappings(mask)[source]
Before reducing an array to include only elements that do not correspond to <mask>, we want to record the relationship between indices in the new (sparse) array and the old (orig) array.
- Parameters:
mask (np.ndarray) – mask of 0/1 indicating whether each element is to be excluded
- Returns:
orig_sparse_lookup – dict of {index in orig array: index of same element in sparse array}
sparse_orig_lookup – dict of {index in sparse array: index of same element in orig array}
boundaries_orig – list of boundaries between wanted and unwanted regions in orig array
boundaries_sparse – list of boundaries between formerly separated wanted regions in sparse array
- Return type:
(dict, dict, list, list)
- compiam.melody.pattern.sancara_search.extraction.self_sim.get_param_hash_filepath(out_dir, *params)[source]
Build a filepath in <out_dir> unique to the input <params>
- Parameters:
out_dir – directory path
params – arguments, any type
- Returns:
filepath unique to input params in <out_dir>
- Return type:
str
- compiam.melody.pattern.sancara_search.extraction.self_sim.get_report_paths(out_dir)[source]
Get dictionary of filepaths relevant to progress plots in extract_segments()
- Parameters:
out_dir – directory path to save plots in
- Returns:
dict of filepaths
- Return type:
dict
- compiam.melody.pattern.sancara_search.extraction.self_sim.normalise_self_sim(matrix)[source]
Normalise self similarity matrix: invert and convolve.
- Parameters:
matrix (np.ndarray) – self similarity matrix
- Returns:
matrix normalized, same dimensions
- Return type:
np.ndarray
- compiam.melody.pattern.sancara_search.extraction.self_sim.save_matrix(X, filepath)[source]
If <filepath> is passed, save <X> at <filepath>
- Parameters:
X (np.ndarray) – matrix to save
filepath (str or None) – filepath
- class compiam.melody.pattern.sancara_search.extraction.self_sim.segmentExtractor(X, window_size, sr=44100, cache_dir=None)[source]
Manipulate and extract segments from self similarity matrix
- emphasize_diagonals(bin_thresh=0.025, gauss_sigma=None, cont_thresh=None, etc_kernel_size=10, binop_dim=3, image_report=False, verbose=False)[source]
Emphasize diagonals in the self similarity matrix, self.X, using a series of image processing steps.
- Parameters:
bin_thresh (float) – Threshold for binarization of the self similarity array. Values below this threshold are set to 0 (not significant); those above or equal to it are set to 1. This is a very important parameter.
gauss_sigma (float or None) – If not None, sigma for diagonal gaussian blur to apply to matrix
cont_thresh (float or None) – Only applicable if <gauss_sigma> is passed. This binary threshold is re-applied after the gaussian blur to ensure a matrix of 0s and 1s. If None, equal to <bin_thresh>
etc_kernel_size (int) – Kernel size for morphological closing
binop_dim (int) – square dimension of binary opening structure (square matrix of zeros with 1 across the diagonal)
image_report (str or None) – folder in which to save progress images, if passed.
verbose (bool) – Display progress
- Returns:
list of segments in the form [((x0,y0),(x1,y1)),..]
- Return type:
list
- extract_segments(etc_kernel_size=10, binop_dim=3, perc_tail=0.5, bin_thresh_segment=None, min_diff_trav=0.5, min_pattern_length_seconds=2, boundaries=None, lookup=None, break_mask=None, timestep=None, verbose=False)[source]
From the processed self similarity matrix, <self.X_proc>, return a list of segments, each corresponding to two regions of the input axis.
- Parameters:
etc_kernel_size (int) – Kernel size for morphological closing
binop_dim (int) – square dimension of binary opening structure (square matrix of zeros with 1 across the diagonal)
perc_tail (int) – Percentage either side of a segment along its trajectory considered for the lower significance threshold
bin_thresh_segment (float) – Reduced <bin_thresh> threshold for areas neighbouring identified segments. If None, use 0.5*<bin_thresh>
min_diff_trav (float) – Minimum time difference in seconds between two segments for them to be joined into one.
min_pattern_length_seconds (float) – Minimum length of any returned pattern in seconds
boundaries (list or None) – list of boundaries in <X> corresponding to breaks due to sparsity
lookup (dict) – Lookup of sparse index (in X): non-sparse index
break_mask (array) – any segment that traverses a non-zero element in <break_mask> is broken into two according to this non-zero value
timestep (float or None) – Time in seconds between each element in <break_mask>
verbose (bool) – Display progress
- Returns:
list of segments in the form [((x0,y0),(x1,y1)),..]
- Return type:
list
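Example (a minimal sketch; parameter values are illustrative, and emphasize_diagonals is assumed to process the matrix into self.X_proc before extract_segments is called):

    from compiam.melody.pattern.sancara_search.extraction.self_sim import segmentExtractor

    se = segmentExtractor(ss_matrix, window_size=1988, sr=44100)  # illustrative window_size
    X_proc = se.emphasize_diagonals(bin_thresh=0.025)
    segments = se.extract_segments(min_pattern_length_seconds=2)
    # if the matrix was computed with an exclusion mask, indices can be
    # mapped back to the original axis with sparse_to_original() (see below)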
- compiam.melody.pattern.sancara_search.extraction.self_sim.self_similarity(features, exclusion_mask=None, timestep=None, hop_length=None, sr=44100)[source]
Compute self similarity matrix between features in <features>. If an <exclusion_mask> is passed, regions corresponding to that mask will be excluded from the computation and the returned matrix will correspond only to those regions marked as 0 in the mask.
- Parameters:
features (np.ndarray) – array of features extracted from audio
exclusion_mask (np.ndarray or None) – array of 0s and 1s indicating whether each element should be masked [Optional]
timestep (float or None) – time in seconds between elements of <exclusion_mask>. Only required if <exclusion_mask> is passed.
hop_length (int or None) – number of audio frames corresponding to one element in <features>. Only required if <exclusion_mask> is passed.
sr (int or None) – sampling rate of audio corresponding to <features>. Only required if <exclusion_mask> is passed.
- Returns:
- if <exclusion_mask> is passed, return:
matrix – self similarity matrix
orig_sparse_lookup – dict of {index in orig array: index of same element in sparse array}
sparse_orig_lookup – dict of {index in sparse array: index of same element in orig array}
boundaries_orig – list of boundaries between wanted and unwanted regions in orig array
boundaries_sparse – list of boundaries between formerly separated wanted regions in sparse array
- else return:
matrix – self similarity matrix
- Return type:
(np.ndarray, dict, dict, list, list) or np.ndarray
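Example (a minimal sketch; <features> and <mask> are placeholder arrays):

    from compiam.melody.pattern.sancara_search.extraction.self_sim import self_similarity

    # Without a mask, a single matrix is returned
    matrix = self_similarity(features, sr=44100)

    # With an exclusion mask, the lookups and boundaries needed to map
    # sparse indices back to original indices are also returned
    matrix, orig_sparse, sparse_orig, bounds_orig, bounds_sparse = self_similarity(
        features, exclusion_mask=mask, timestep=0.01, hop_length=512, sr=44100
    )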
- compiam.melody.pattern.sancara_search.extraction.self_sim.sparse_to_original(all_segments, boundaries_sparse, lookup)[source]
Convert indices corresponding to segments in <all_segments> to their non-sparse form using mapping in <lookup>
- Parameters:
all_segments (list) – list of segments, [((x0,y0),(x1,y1)),..]
boundaries_sparse (list) – list of indices in sparse array corresponding to splits in original array
lookup (dict) – dict of sparse_index:non-sparse index
- Returns:
<all_segments> with indices replaced according to lookup
- Return type:
list
Raga Recognition
DEEPSRGM
Note
REQUIRES: torch
- class compiam.melody.raga_recognition.deepsrgm.DEEPSRGM(model_path=None, download_link=None, download_checksum=None, rnn='lstm', mapping_path=None, sample_rate=44100, device=None)[source]
DEEPSRGM model for raga classification. This DEEPSRGM implementation has been kindly provided by Shubham Lohiya and Swarada Bharadwaj.
- get_features(input_data=None, input_sr=44100, pitch_path=None, tonic_path=None, from_mirdata=False, track_id=None, k=5)[source]
Compute features for prediction with DEEPSRGM.
- Parameters:
input_data – path to audio file or numpy array-like audio signal.
input_sr – sampling rate of the input array of data (if any). This variable is only relevant if the input is an array of data instead of a filepath.
pitch_path – path to pre-computed pitch file (if available).
tonic_path – path to pre-computed tonic file (if available).
from_mirdata – boolean to indicate if the features are parsed from the mirdata loader of Indian Art Music Raga Recognition Dataset (must be specifically this one).
track_id – track id for the Indian Art Music Raga Recognition Dataset if from_mirdata is set to True.
k – precision of the pitch feature.
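Example (a minimal usage sketch; the model is initialized through the compiam loader, the audio path is a placeholder, and the predict() call is assumed from the broader DEEPSRGM interface):

    import compiam

    deepsrgm = compiam.load_model("melody:deepsrgm")
    feats = deepsrgm.get_features("my_recording.wav")
    predicted_raga = deepsrgm.predict(feats)  # assumed classification entry point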
- load_mapping(selection=None)[source]
Load raga mapping for DEEPSRGM
- Parameters:
selection – Selection of ragas for the DEEPSRGM model. A default selection is initialized in compiam v1.0. Flexible selection and training of this model are under development and will be available in the next release.
- load_model(model_path, rnn='lstm')[source]
Load weights for DEEPSRGM
- Parameters:
model_path – path to model.
rnn – lstm (default) or gru.
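Example (a minimal sketch; the weights path is a placeholder):

    # Swap in custom weights and reload the default raga mapping
    deepsrgm.load_model("path/to/weights.pth", rnn="gru")
    deepsrgm.load_mapping()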