Melodic analysis
Tonic Identification
Tonic Indian Art Music (Multipitch approach)
Note
REQUIRES: essentia
- class compiam.melody.tonic_identification.tonic_multipitch.TonicIndianMultiPitch(bin_resolution=10, frame_size=2048, harmonic_weight=0.8, hop_size=128, magnitude_compression=1, magnitude_threshold=40, max_tonic_frequency=375, min_tonic_frequency=100, num_harmonics=20, ref_frequency=55, sample_rate=44100)[source]
MultiPitch approach to extract the tonic from IAM music signals.
- extract(input_data, input_sr=44100)[source]
Extract the tonic from a given audio file or signal.
- Parameters:
input_data – path to audio file or numpy array-like audio signal
input_sr – sampling rate of the input array of data (if any). This variable is only relevant if the input is an array of data instead of a filepath.
- Returns:
a floating point number representing the tonic of the input recording.
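Example (a minimal usage sketch; "my_recording.wav" is a placeholder path, and librosa is assumed available for loading arrays):

    from compiam.melody.tonic_identification import TonicIndianMultiPitch

    tonic_extractor = TonicIndianMultiPitch()  # default parameters

    # From a file path
    tonic = tonic_extractor.extract("my_recording.wav")

    # Or from an already-loaded signal, passing its sampling rate
    import librosa
    y, sr = librosa.load("my_recording.wav", sr=None)
    tonic = tonic_extractor.extract(y, input_sr=sr)
    print(f"Estimated tonic: {tonic:.2f} Hz")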
Pitch Extraction
Melodia
Note
REQUIRES: essentia
- class compiam.melody.pitch_extraction.melodia.Melodia(bin_resolution=10, filter_iterations=3, frame_size=2048, guess_unvoiced=False, harmonic_weight=0.8, hop_size=128, magnitude_compression=1, magnitude_threshold=40, max_frequency=20000, min_duration=100, min_frequency=80, num_harmonics=20, peak_distribution_threshold=0.9, peak_frame_threshold=0.9, pitch_continuity=27.5625, reference_frequency=55, sample_rate=44100, time_continuity=100, voice_vibrato=False, voicing_tolerance=0.2)[source]
Melodia predominant melody extraction.
- extract(input_data, input_sr=44100, out_step=None)[source]
Extract the melody from a given audio file or signal.
- Parameters:
input_data – path to audio file or numpy array-like audio signal
input_sr – sampling rate of the input array of data (if any). This variable is only relevant if the input is an array of data instead of a filepath.
out_step – time-step duration for the output, if a particular one is needed
- Returns:
a 2-D list with time-stamps and pitch values per timestamp.
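Example (a minimal usage sketch; the audio path is a placeholder):

    from compiam.melody.pitch_extraction import Melodia

    melodia = Melodia()  # default parameters
    pitch_track = melodia.extract("my_recording.wav")
    # each row holds (time-stamp in seconds, pitch in Hz)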
- static normalise_pitch(pitch, tonic, bins_per_octave=120, max_value=4)[source]
Normalise pitch given a tonic.
- Parameters:
pitch – a 2-D list with time-stamps and pitch values per timestamp.
tonic – recording tonic to normalize the pitch to.
bins_per_octave – number of frequency bins per octave.
max_value – maximum value to clip the normalized pitch to.
- Returns:
a 2-D list with time-stamps and pitch values per timestamp, normalised to the given tonic.
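Example (continuing the sketch above; the tonic value is a placeholder, e.g. as estimated by TonicIndianMultiPitch):

    # normalise_pitch is static, so it may be called on the class directly
    normalised_track = Melodia.normalise_pitch(pitch_track, tonic=196.0)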
FTANet-Carnatic
Note
REQUIRES: tensorflow
- class compiam.melody.pitch_extraction.FTANetCarnatic(model_path=None, download_link=None, download_checksum=None, sample_rate=8000, gpu='-1')[source]
FTA-Net melody extraction tuned to Carnatic Music.
- static FTA_Module(x, shape, kt, kf)[source]
Frequency-temporal attention (FTA) module. Implementation taken from https://github.com/yushuai/FTANet-melodic
- Parameters:
x – input tensor.
shape – the shape of the input tensor.
kt – kernel size for time attention.
kf – kernel size for frequency attention.
- Returns:
the resized input, the time-attention map, and the frequency-attention map.
- static SF_Module(x_list, n_channel, reduction, limitation)[source]
Selection and fusion module. Implementation taken from https://github.com/yushuai/FTANet-melodic
- Parameters:
x_list – list of tensor inputs.
n_channel – number of feature channels.
reduction – rate by which the data is compressed.
limitation – lower limit applied when compressing.
- Returns:
a tensor with the fused and selected feature map.
- static normalise_pitch(pitch, tonic, bins_per_octave=120, max_value=4)[source]
Normalise pitch given a tonic.
- Parameters:
pitch – a 2-D list with time-stamps and pitch values per timestamp.
tonic – recording tonic to normalize the pitch to.
bins_per_octave – number of frequency bins per octave.
max_value – maximum value to clip the normalized pitch to.
- Returns:
a 2-D list with time-stamps and normalised to a given tonic pitch values per timestamp.
- predict(input_data, input_sr=44100, hop_size=80, batch_size=5, out_step=None, gpu='-1')[source]
Extract melody from input_data. Implementation taken (and slightly adapted) from https://github.com/yushuai/FTANet-melodic.
- Parameters:
input_data – path to audio file or numpy array-like audio signal.
input_sr – sampling rate of the input array of data (if any). This variable is only relevant if the input is an array of data instead of a filepath.
hop_size – hop size between frequency estimations.
batch_size – number of seconds per batch passed through the model (defaulted to 5; increase if you have enough computational power, reduce if needed).
out_step – time-step duration for the output, if a particular one is needed
gpu – ID of the GPU to use ('-1' by default, to run on CPU); pass a string: '0', '1', etc.
- Returns:
a 2-D list with time-stamps and pitch values per timestamp.
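Example (a minimal usage sketch; pre-trained weights are fetched through the compiam model loader, and the audio path is a placeholder):

    import compiam

    ftanet = compiam.load_model("melody:ftanet-carnatic")
    pitch_track = ftanet.predict("my_recording.wav")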
FTAResNet-Carnatic
Note
REQUIRES: torch
- class compiam.melody.pitch_extraction.FTAResNetCarnatic(model_path=None, download_link=None, download_checksum=None, sample_rate=44100, gpu='-1')[source]
FTA-ResNet melody extraction tuned to Carnatic Music.
- static normalise_pitch(pitch, tonic, bins_per_octave=120, max_value=4)[source]
Normalise pitch given a tonic.
- Parameters:
pitch – a 2-D list with time-stamps and pitch values per timestamp.
tonic – recording tonic to normalize the pitch to.
bins_per_octave – number of frequency bins per octave.
max_value – maximum value to clip the normalized pitch to.
- Returns:
a 2-D list with time-stamps and normalised to a given tonic pitch values per timestamp.
- predict(input_data, input_sr=44100, hop_size=441, time_frame=128, out_step=None, gpu='-1')[source]
Extract melody from input_data. Implementation taken (and slightly adapted) from https://github.com/yushuai/FTANet-melodic.
- Parameters:
input_data – path to audio file or numpy array-like audio signal.
input_sr – sampling rate of the input array of data (if any). This variable is only relevant if the input is an array of data instead of a filepath.
hop_size – hop size between frequency estimations.
time_frame – number of time frames per input window passed through the model (defaulted to 128).
out_step – time-step duration for the output, if a particular one is needed
gpu – ID of the GPU to use ('-1' by default, to run on CPU); pass a string: '0', '1', etc.
- Returns:
a 2-D list with time-stamps and pitch values per timestamp.
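Example (a minimal usage sketch; the model key is an assumption and may differ between compiam versions, so check compiam.list_models() for the exact identifier):

    import compiam

    ftaresnet = compiam.load_model("melody:ftaresnet-carnatic")  # assumed key
    pitch_track = ftaresnet.predict("my_recording.wav")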
Melodic Pattern Discovery
CAE-Carnatic (Wrapper)
Note
REQUIRES: torch
- class compiam.melody.pattern.sancara_search.CAEWrapper(model_path, conf_path, spec_path, download_link, download_checksum, device='cpu')[source]
Wrapper for the Complex Autoencoder found at https://github.com/SonyCSLParis/cae-invar#quick-start, specifically for the task of embedding audio into learnt CAE features. This wrapper is inference-only and is not trainable. Please initialize it using compiam.load_model()
- extract_features(file_path, sr=None)[source]
Extract CAE features using self.model on audio at <file_path>
- Parameters:
file_path (str) – path to audio
sr (int) – sampling rate of audio at <file_path>, if None, use self.sr
- Returns:
amplitude vector, phases vector
- Return type:
np.ndarray, np.ndarray
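Example (a minimal usage sketch; the wrapper is initialized through compiam.load_model() as noted above, and the audio path is a placeholder):

    import compiam

    cae = compiam.load_model("melody:cae-carnatic")
    amplitudes, phases = cae.extract_features("my_recording.wav")
    cqt = cae.get_cqt("my_recording.wav")  # CQT under the wrapper's configuration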
- get_cqt(file_path, sr=None)[source]
Extract CQT representation from audio at <file_path> according to parameters specified in conf at self.conf_path
- Parameters:
file_path (str) – path to audio
sr (int) – sampling rate of audio at <file_path>, if None, use self.sr
- Returns:
cqt representation
- Return type:
np.ndarray
- load_conf(path, spec)[source]
Load the .ini configuration file at <path>
- Parameters:
path (str) – path to .ini configuration file
spec (str) – path to .cfg configuration spec file
- Returns:
dict of parameters
- Return type:
dict
- load_model(model_path, conf_path, spec_path)[source]
Load model at <model_path>. Expects model parameters to correspond to those found in self.params (loaded from self.conf_path).
- Parameters:
model_path (str) – path to model
conf_path (str) – path to .ini configuration file
spec_path (str) – path to .cfg configuration spec file
Self-similarity matrix
Note
REQUIRES: torch
- compiam.melody.pattern.sancara_search.extraction.self_sim.convert_mask(arr, mask, timestep, hop_length, sr)[source]
Get mask of excluded regions with the same dimensions as array <arr>
- Parameters:
arr (np.ndarray) – array corresponding to features extracted from audio
mask (np.ndarray) – Mask indicating whether element should be excluded (different dimensions to <arr>)
timestep (float) – time in seconds between each element in <mask>
hop_length (int) – how many frames of audio correspond to each element in <arr>
sr (int) – sampling rate of audio from which <arr> was computed
- Returns:
array of mask values (0/1 indicating whether each element is masked), equal in length to one dimension of <arr>
- Return type:
np.ndarray
- compiam.melody.pattern.sancara_search.extraction.self_sim.create_ss_matrix(feats, mode='cosine')[source]
Compute self similarity matrix between features in <feats> using distance measure, <mode>
- Parameters:
feats (np.ndarray) – array of features
mode (str) – name of distance measure (recognised by scipy.spatial.distance)
- Returns:
self similarity matrix
- Return type:
np.ndarray
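Example (a minimal sketch; <amplitudes> is assumed to be a feature array, e.g. the CAE amplitudes extracted above):

    from compiam.melody.pattern.sancara_search.extraction.self_sim import create_ss_matrix

    ss_matrix = create_ss_matrix(amplitudes, mode="cosine")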
- compiam.melody.pattern.sancara_search.extraction.self_sim.get_conversion_mappings(mask)[source]
Before reducing an array to include only elements that do not correspond to <mask>, we want to record the relationship between indices in the new (sparse) array and the old (orig) array.
- Parameters:
mask (np.ndarray) – mask of 0/1 indicating whether each element is to be excluded
- Returns:
orig_sparse_lookup – dict of {index in orig array: index of same element in sparse array}
sparse_orig_lookup – dict of {index in sparse array: index of same element in orig array}
boundaries_orig – list of boundaries between wanted and unwanted regions in orig array
boundaries_sparse – list of boundaries between formerly separated wanted regions in sparse array
- Return type:
(dict, dict, list, list)
- compiam.melody.pattern.sancara_search.extraction.self_sim.get_param_hash_filepath(out_dir, *params)[source]
Build a filepath in <out_dir> unique to the input <params>
- Parameters:
out_dir – directory path
params – arguments, any type
- Returns:
filepath unique to input params in <out_dir>
- Return type:
str
- compiam.melody.pattern.sancara_search.extraction.self_sim.get_report_paths(out_dir)[source]
Get dictionary of filepaths relevant to progress plots in extract_segments()
- Parameters:
out_dir – directory path to save plots in
- Returns:
dict of filepaths
- Return type:
dict
- compiam.melody.pattern.sancara_search.extraction.self_sim.normalise_self_sim(matrix)[source]
Normalise self similarity matrix: invert and convolve.
- Parameters:
matrix (np.ndarray) – self similarity matrix
- Returns:
matrix normalized, same dimensions
- Return type:
np.ndarray
- compiam.melody.pattern.sancara_search.extraction.self_sim.save_matrix(X, filepath)[source]
If <filepath> is passed, save <X> at <filepath>
- Parameters:
X (np.ndarray) – matrix to save
filepath (str or None) – filepath
- class compiam.melody.pattern.sancara_search.extraction.self_sim.segmentExtractor(X, window_size, sr=44100, cache_dir=None)[source]
Manipulate and extract segments from self similarity matrix
- emphasize_diagonals(bin_thresh=0.025, gauss_sigma=None, cont_thresh=None, etc_kernel_size=10, binop_dim=3, image_report=False, verbose=False)[source]
Emphasize diagonals in the self similarity matrix, self.X, using a series of image processing steps.
- Parameters:
bin_thresh (float) – Threshold for binarization of the self similarity array. Values below this threshold are set to 0 (not significant); those above or equal to it are set to 1. This is a very important parameter.
gauss_sigma (float or None) – If not None, sigma for diagonal gaussian blur to apply to matrix
cont_thresh (float or None) – Only applicable if <gauss_sigma> is passed. This binary threshold is re-applied after the gaussian blur to ensure a matrix of 0s and 1s. If None, equal to <bin_thresh>
etc_kernel_size (int) – Kernel size for morphological closing
binop_dim (int) – square dimension of binary opening structure (square matrix of zeros with 1 across the diagonal)
image_report (str or None) – folder in which to save progress images, if passed.
verbose (bool) – Display progress
- Returns:
list of segments in the form [((x0,y0),(x1,y1)),..]
- Return type:
list
- extract_segments(etc_kernel_size=10, binop_dim=3, perc_tail=0.5, bin_thresh_segment=None, min_diff_trav=0.5, min_pattern_length_seconds=2, boundaries=None, lookup=None, break_mask=None, timestep=None, verbose=False)[source]
From the processed self similarity matrix, <self.X_proc>, return a list of segments, each corresponding to two regions of the input axis.
- Parameters:
etc_kernel_size (int) – Kernel size for morphological closing
binop_dim (int) – square dimension of binary opening structure (square matrix of zeros with 1 across the diagonal)
perc_tail (int) – Percentage either side of a segment along its trajectory considered for the lower significance threshold
bin_thresh_segment (float) – Reduced <bin_thresh> threshold for areas neighbouring identified segments. If None, use 0.5*<bin_thresh>
min_diff_trav (float) – Minimum time difference in seconds between two segments for them to be joined into one.
min_pattern_length_seconds (float) – Minimum length of any returned pattern in seconds
boundaries (list or None) – list of boundaries in <X> corresponding to breaks due to sparsity
lookup (dict) – Lookup of sparse index (in X): non-sparse index
break_mask (array) – any segment that traverses a non-zero element in <break_mask> is broken into two according to this non-zero value
timestep (float or None) – Time in seconds between each element in <break_mask>
verbose (bool) – Display progress
- Returns:
list of segments in the form [((x0,y0),(x1,y1)),..]
- Return type:
list
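Example (a minimal sketch; parameter values are illustrative, and emphasize_diagonals is assumed to process the matrix into self.X_proc before extract_segments is called):

    from compiam.melody.pattern.sancara_search.extraction.self_sim import segmentExtractor

    se = segmentExtractor(ss_matrix, window_size=1988, sr=44100)  # illustrative window_size
    X_proc = se.emphasize_diagonals(bin_thresh=0.025)
    segments = se.extract_segments(min_pattern_length_seconds=2)
    # if the matrix was computed with an exclusion mask, indices can be
    # mapped back to the original axis with sparse_to_original() (see below)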
- compiam.melody.pattern.sancara_search.extraction.self_sim.self_similarity(features, exclusion_mask=None, timestep=None, hop_length=None, sr=44100)[source]
Compute self similarity matrix between features in <features>. If an <exclusion_mask> is passed, regions corresponding to that mask will be excluded from the computation and the returned matrix will correspond only to those regions marked as 0 in the mask.
- Parameters:
features (np.ndarray) – array of features extracted from audio
exclusion_mask (np.ndarray or None) – array of 0s and 1s indicating whether each element should be masked [Optional]
timestep (float or None) – time in seconds between elements of <exclusion_mask>. Only required if <exclusion_mask> is passed.
hop_length (int or None) – number of audio frames corresponding to one element in <features>. Only required if <exclusion_mask> is passed.
sr (int or None) – sampling rate of audio corresponding to <features>. Only required if <exclusion_mask> is passed.
- Returns:
- if <exclusion_mask> is passed, return:
matrix – self similarity matrix
orig_sparse_lookup – dict of {index in orig array: index of same element in sparse array}
sparse_orig_lookup – dict of {index in sparse array: index of same element in orig array}
boundaries_orig – list of boundaries between wanted and unwanted regions in orig array
boundaries_sparse – list of boundaries between formerly separated wanted regions in sparse array
- else return:
matrix – self similarity matrix
- Return type:
(np.ndarray, dict, dict, list, list) or np.ndarray
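Example (a minimal sketch; <features> and <mask> are placeholder arrays):

    from compiam.melody.pattern.sancara_search.extraction.self_sim import self_similarity

    # Without a mask, a single matrix is returned
    matrix = self_similarity(features, sr=44100)

    # With an exclusion mask, the lookups and boundaries needed to map
    # sparse indices back to original indices are also returned
    matrix, orig_sparse, sparse_orig, bounds_orig, bounds_sparse = self_similarity(
        features, exclusion_mask=mask, timestep=0.01, hop_length=512, sr=44100
    )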
- compiam.melody.pattern.sancara_search.extraction.self_sim.sparse_to_original(all_segments, boundaries_sparse, lookup)[source]
Convert indices corresponding to segments in <all_segments> to their non-sparse form using mapping in <lookup>
- Parameters:
all_segments (list) – list of segments, [((x0,y0),(x1,y1)),..]
boundaries_sparse (list) – list of indices in sparse array corresponding to splits in original array
lookup (dict) – dict of sparse_index:non-sparse index
- Returns:
<all_segments> with indices replaced according to lookup
- Return type:
list
Raga Recognition
DEEPSRGM
Note
REQUIRES: torch
- class compiam.melody.raga_recognition.deepsrgm.DEEPSRGM(model_path=None, download_link=None, download_checksum=None, rnn='lstm', mapping_path=None, sample_rate=44100, device=None)[source]
DEEPSRGM model for raga classification. This DEEPSRGM implementation has been kindly provided by Shubham Lohiya and Swarada Bharadwaj.
- get_features(input_data=None, input_sr=44100, pitch_path=None, tonic_path=None, from_mirdata=False, track_id=None, k=5)[source]
Compute features for prediction with DEEPSRGM.
- Parameters:
input_data – path to audio file or numpy array-like audio signal.
input_sr – sampling rate of the input array of data (if any). This variable is only relevant if the input is an array of data instead of a filepath.
pitch_path – path to pre-computed pitch file (if available).
tonic_path – path to pre-computed tonic file (if available).
from_mirdata – boolean to indicate if the features are parsed from the mirdata loader of Indian Art Music Raga Recognition Dataset (must be specifically this one).
track_id – track id for the Indian Art Music Raga Recognition Dataset if from_mirdata is set to True.
k – precision of the pitch feature.
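Example (a minimal usage sketch; the model is initialized through the compiam loader, the audio path is a placeholder, and the predict() call is assumed from the broader DEEPSRGM interface):

    import compiam

    deepsrgm = compiam.load_model("melody:deepsrgm")
    feats = deepsrgm.get_features("my_recording.wav")
    predicted_raga = deepsrgm.predict(feats)  # assumed classification entry point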
- load_mapping(selection=None)[source]
Load raga mapping for DEEPSRGM
- Parameters:
selection – Selection of ragas for the DEEPSRGM model. A default selection is initialized in compiam v1.0. Flexible selection and training of this model are under development and will be available in the next release.
- load_model(model_path, rnn='lstm')[source]
Load weights for DEEPSRGM
- Parameters:
model_path – path to model.
rnn – lstm (default) or gru.
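Example (a minimal sketch; the weights path is a placeholder):

    # Swap in custom weights and reload the default raga mapping
    deepsrgm.load_model("path/to/weights.pth", rnn="gru")
    deepsrgm.load_mapping()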