Util functions

Data augmentation utils

compiam.utils.augment.attack_remix(in_path, out_dir, gain_factors=0.3, winDur=46.4, hopDur=5, sr=16000, n_jobs=4)

Modify the relative levels of the attack and decay regions of an audio signal.

Parameters:

in_path (str) – Path to input audio
out_dir (str) – Directory to write the augmented audio files to
gain_factors (float or list) – Gain factor (or list of gain factors) by which to scale the attack portion
winDur (float) – Window size in milliseconds
hopDur (float) – Hop size in milliseconds
sr (float) – Sampling rate of input audio
n_jobs (int) – Number of jobs for parallelization

compiam.utils.augment.spectral_shape(in_path, out_dir, gain_factors=(0.6, 2, 0.2), winDur=46.4, hopDur=0.005, sr=16000, n_jobs=4)

Augment data by perturbing ‘nuisance attributes’ that are unimportant for the specific discrimination task.

Parameters:

in_path (str) – Path to input audio
out_dir (str) – Directory to write the augmented audio files to
gain_factors (float or list) – 3-tuple (or list of 3-tuples) of gain factors for remixing. The tuple entries correspond to the bass, treble, and damped components, respectively.
winDur (float) – Window size in milliseconds
hopDur (float) – Hop size in milliseconds
sr (float) – Sampling rate of input audio
n_jobs (int) – Number of jobs for parallelization

compiam.utils.augment.stroke_remix(in_path, out_dir, gain_factors=(0.6, 2, 0.2), templates='/home/runner/work/compIAM/compIAM/compiam/utils/augment/augmentation/templates.npy', winDur=46.4, hopDur=0.005, sr=16000, n_jobs=4)

Simulate the expected variations in the relative strengths of the drums in a mix using non-negative matrix factorization (NMF).

Parameters:

in_path (str) – Path to input audio
out_dir (str) – Directory to write the augmented audio files to
gain_factors (float or list) – 3-tuple (or list of 3-tuples) of gain factors for remixing. The tuple entries correspond to the bass, treble, and damped components, respectively.
templates (str) – Path to the saved NMF templates
winDur (float) – Window size in milliseconds
hopDur (float) – Hop size in milliseconds
sr (float) – Sampling rate of input audio
n_jobs (int) – Number of jobs for parallelization
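The NMF-based remix described above can be illustrated on a magnitude spectrogram. This is a minimal sketch, not compiam's implementation: it assumes the templates form a non-negative matrix W of shape (n_bins, 3), with one column each for the bass, treble, and damped components, keeps W fixed while estimating activations with the standard KL-divergence multiplicative update, and rescales each component through a soft mask. The STFT/ISTFT steps and the winDur/hopDur framing are omitted, and nmf_activations and remix_magnitude are hypothetical names.

```python
import numpy as np


def nmf_activations(V, W, n_iter=200, eps=1e-10):
    """Estimate activations H >= 0 with V ~= W @ H, keeping templates W fixed.

    Sketch only: uses the KL-divergence multiplicative update for H.
    """
    rng = np.random.default_rng(0)
    H = rng.random((W.shape[1], V.shape[1])) + eps
    col_sums = W.sum(axis=0)[:, None] + eps  # W^T applied to an all-ones matrix
    for _ in range(n_iter):
        WH = W @ H + eps
        H *= (W.T @ (V / WH)) / col_sums
    return H


def remix_magnitude(V, W, gains, n_iter=200, eps=1e-10):
    """Scale each component's share of the magnitude spectrogram V by its gain.

    Assumption: W has one column per component (bass, treble, damped), and
    gains is a 3-tuple in the same order.
    """
    H = nmf_activations(V, W, n_iter, eps)
    WH = W @ H + eps
    out = np.zeros_like(V, dtype=float)
    for k, g in enumerate(gains):
        # Soft mask: fraction of each time-frequency bin explained by component k
        mask_k = np.outer(W[:, k], H[k]) / WH
        out += g * mask_k * V
    return out
```

With all gains equal to 1 the masks sum to (almost exactly) one, so the input magnitude is returned unchanged. Calling remix_magnitude once per 3-tuple mirrors how a list of gain_factors yields several augmented versions of one recording.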

Pitch-related utils

compiam.utils.pitch.cents_to_pitch(c, tonic)

Convert a cents value, <c>, to pitch in Hz.

Parameters:

c (float/int) – Pitch value in cents above <tonic>
tonic (float) – Tonic value in Hz

Returns:

Pitch value <c> converted to Hz

Return type:

float
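The conversion follows the standard definition of the cent (1200 cents per octave): pitch = tonic * 2^(c / 1200). A minimal sketch, using the hypothetical name cents_to_hz so it is not mistaken for the library function:

```python
def cents_to_hz(c: float, tonic: float) -> float:
    """Return the frequency in Hz lying c cents above tonic (given in Hz)."""
    # 1200 cents span one octave, i.e. a doubling of frequency
    return tonic * 2 ** (c / 1200)


print(cents_to_hz(1200, 220.0))   # 440.0
print(cents_to_hz(-1200, 220.0))  # 110.0
```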

compiam.utils.pitch.extract_stability_mask(pitch, min_stab_sec, hop_sec, var, timestep)

Extract a boolean array aligned with <pitch> that indicates, for each point, whether it belongs to a region of “stable” pitch.

A window is passed along the pitch track, <pitch>, and the minimum and maximum values in each window are compared to the window's mean. Windows whose extremes deviate from the mean by more than <var> are marked as unstable; the remaining windows are marked as stable. Consecutive stable regions lasting more than <min_stab_sec> seconds are annotated with 1 (stable). Regions that are not stable, or that are stable but shorter than <min_stab_sec>, are annotated with 0 (not stable).

Parameters:

pitch (np.ndarray) – Pitch values in Hz or cents
min_stab_sec (float) – Stable regions of at least <min_stab_sec> seconds in length are annotated as stable. Shorter stable regions are annotated as not stable.
hop_sec (float) – Hop length of the window, in seconds
var (float) – If the maximum/minimum pitch in a window deviates from its mean by more than this value, the window is considered unstable. Choose this value according to whether the input is in cents or Hz.
timestep (float) – Time difference, in seconds, between consecutive elements of <pitch>

Returns:

Boolean array of the same length as <pitch>, indicating whether each point lies in a stable region

Return type:

np.ndarray
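The windowing procedure above can be sketched with NumPy. This is an illustrative reading of the description, not compiam's code: it assumes the window spans hop_sec seconds and advances one element at a time, that every element covered by a stable window counts as stable, and that silent or unvoiced frames need no special treatment. stability_mask_sketch is a hypothetical name.

```python
import numpy as np


def stability_mask_sketch(pitch, min_stab_sec, hop_sec, var, timestep):
    """Boolean mask of points lying in stable runs of at least min_stab_sec."""
    pitch = np.asarray(pitch, dtype=float)
    n = len(pitch)
    if n == 0:
        return np.zeros(0, dtype=bool)
    # Assumption: the window covers hop_sec seconds of the track
    win = max(1, int(round(hop_sec / timestep)))
    stable = np.zeros(n, dtype=bool)
    for start in range(max(1, n - win + 1)):
        seg = pitch[start:start + win]
        mu = seg.mean()
        # A window is stable if both extremes stay within var of its mean
        if seg.max() < mu + var and seg.min() > mu - var:
            stable[start:start + win] = True
    # Keep only runs of stable points lasting at least min_stab_sec
    min_len = int(round(min_stab_sec / timestep))
    mask = np.zeros(n, dtype=bool)
    run_start = None
    for i, is_st in enumerate(np.append(stable, False)):  # sentinel closes the last run
        if is_st and run_start is None:
            run_start = i
        elif not is_st and run_start is not None:
            if i - run_start >= min_len:
                mask[run_start:i] = True
            run_start = None
    return mask


# 50 steady frames at 100 Hz, 10 frames jumping between 100 and 300 Hz,
# then only 10 steady frames at 200 Hz (too short to count as stable)
track = np.concatenate([np.full(50, 100.0), np.tile([100.0, 300.0], 5), np.full(10, 200.0)])
mask = stability_mask_sketch(track, min_stab_sec=0.2, hop_sec=0.05, var=5, timestep=0.01)
print(mask.sum())  # 51
```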

compiam.utils.pitch.interpolate_below_length(arr, val, gap)

Interpolate gaps of value <val> in <arr> whose length is less than or equal to <gap>.

Parameters:

arr (np.array) – Array to interpolate
val (number) – Value expected in gaps to interpolate
gap (number) – Maximum gap length to interpolate; gaps of <val> longer than <gap> are not interpolated

Returns:

Interpolated array

Return type:

np.array
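A minimal NumPy sketch of the gap filling, assuming gap is counted in array elements, that filling is linear between the neighbouring values, and that runs touching either end of the array are left unchanged because they lack a neighbour on one side. interpolate_short_gaps is a hypothetical name.

```python
import numpy as np


def interpolate_short_gaps(arr, val, gap):
    """Linearly fill runs of val no longer than gap elements (sketch only)."""
    arr = np.asarray(arr, dtype=float).copy()
    is_gap = arr == val
    n = len(arr)
    i = 0
    while i < n:
        if not is_gap[i]:
            i += 1
            continue
        # Find the end of this run of val: it occupies indices [i, j)
        j = i
        while j < n and is_gap[j]:
            j += 1
        # Fill only short runs that have a neighbour on both sides
        if j - i <= gap and i > 0 and j < n:
            arr[i:j] = np.interp(np.arange(i, j), [i - 1, j], [arr[i - 1], arr[j]])
        i = j
    return arr


print(interpolate_short_gaps([1, 0, 0, 4, 0, 0, 0, 0, 9], val=0, gap=2))
# values: 1, 2, 3, 4, 0, 0, 0, 0, 9 (the run of four zeros exceeds gap)
```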

compiam.utils.pitch.is_stable(seq, max_var)

Compute whether a sequence of values is stable, given an input tolerance.

Parameters:

seq – Sequence of values to study
max_var – Maximum tolerance for the sequence to be considered stable

Returns:

1 (stable) or 0 (not stable)
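One reading of this tolerance, consistent with the description of var in extract_stability_mask, treats a sequence as stable when both its maximum and its minimum lie within max_var of its mean. A minimal sketch under that assumption (is_stable_sketch is a hypothetical name; NaN values are ignored):

```python
import numpy as np


def is_stable_sketch(seq, max_var):
    """Return 1 if the extremes of seq stay within max_var of its mean, else 0."""
    seq = np.asarray(seq, dtype=float)
    mu = np.nanmean(seq)
    within = np.nanmax(seq) < mu + max_var and np.nanmin(seq) > mu - max_var
    return int(within)


print(is_stable_sketch([100, 101, 99], max_var=5))   # 1
print(is_stable_sketch([100, 120, 100], max_var=5))  # 0
```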

compiam.utils.pitch.normalisation(pitch, tonic, bins_per_octave=120, max_value=4)

Normalize pitch given a tonic.

Parameters:

pitch – a 2-D list with time-stamps and pitch values per timestamp.
tonic – recording tonic to normalize the pitch to.
bins_per_octave – number of frequency bins per octave.
max_value – maximum value to clip the normalized pitch to.

Returns:

a 2-D list with time-stamps and, for each time-stamp, the pitch value normalized to the given tonic.

compiam.utils.pitch.pitch_seq_to_cents(pseq, tonic)

Convert a sequence of pitch values to a sequence of values in cents above <tonic>.

Parameters:

pseq (np.array) – Array of pitch values in Hz
tonic (float) – Tonic value in Hz

Returns:

Sequence of the original pitch values in cents above <tonic>

Return type:

np.array
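The conversion inverts the cent formula: cents = 1200 * log2(f / tonic). A vectorised NumPy sketch follows. How the library treats zero (unvoiced) frequencies is not stated here, so this version assumes they should become NaN rather than trigger a log-of-zero warning. hz_to_cents is a hypothetical name.

```python
import numpy as np


def hz_to_cents(pseq, tonic):
    """Convert frequencies in Hz to cents above tonic; non-positive values become NaN."""
    pseq = np.asarray(pseq, dtype=float)
    cents = np.full(pseq.shape, np.nan)
    voiced = pseq > 0
    # 1200 cents per octave, so each doubling of frequency adds 1200
    cents[voiced] = 1200 * np.log2(pseq[voiced] / tonic)
    return cents


print(hz_to_cents([220.0, 440.0, 110.0, 0.0], tonic=220.0))
# values: 0, 1200, -1200, nan
```

Passing a one-element array covers the single-value case as well.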

compiam.utils.pitch.pitch_to_cents(p, tonic)

Convert a pitch value, <p>, to cents above <tonic>.

Parameters:

p (float) – Pitch value in Hz
tonic (float) – Tonic value in Hz

Returns:

Pitch value <p> in cents above <tonic>

Return type:

float

compiam.utils.pitch.resample_melody_series(times, frequencies, voicing, times_new, kind='linear')

Note: this function is ported directly from mir_eval.melody.

Resamples frequency and voicing time series to a new timescale. Maintains any zero (“unvoiced”) values in frequencies.

If times and times_new are equivalent, no resampling will be performed.

Parameters:

times – Times of each frequency value
frequencies – Array of frequency values, >= 0
voicing – Array which indicates voiced or unvoiced. This array may be binary or have continuous values between 0 and 1.
times_new – Times to resample frequency and voicing sequences to
kind – kind parameter to pass to scipy.interpolate.interp1d (default: ‘linear’)

Returns:

frequencies_resampled – Frequency array resampled to the new timebase
voicing_resampled – Voicing array resampled to the new timebase
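Keeping unvoiced frames at zero while resampling can be sketched as follows. This is a simplified illustration of one way to do it, not a copy of mir_eval: it supports linear interpolation only (no kind option), holds the last frequency through unvoiced frames before interpolating so that values do not slide towards zero at voicing boundaries, then zeroes every new frame whose zero-order-held original value is unvoiced. It also assumes times is increasing and times_new lies within its range. resample_melody_sketch is a hypothetical name.

```python
import numpy as np


def resample_melody_sketch(times, frequencies, voicing, times_new):
    """Linearly resample a melody, keeping zero ('unvoiced') frequencies at zero."""
    # Round to avoid spurious floating-point mismatches between timebases
    times = np.round(np.asarray(times, dtype=float), 10)
    times_new = np.round(np.asarray(times_new, dtype=float), 10)
    frequencies = np.asarray(frequencies, dtype=float)
    voicing = np.asarray(voicing, dtype=float)
    if times.shape == times_new.shape and np.allclose(times, times_new):
        return frequencies, voicing  # same timebase: nothing to do
    # Hold the previous frequency through unvoiced (zero) frames
    held = frequencies.copy()
    for n in range(1, len(held)):
        if held[n] == 0:
            held[n] = held[n - 1]
    freq_new = np.interp(times_new, times, held)
    # Zero-order hold: index of the last original frame at or before each new time
    idx = np.searchsorted(times, times_new, side="right") - 1
    idx = np.clip(idx, 0, len(times) - 1)
    freq_new[frequencies[idx] == 0] = 0.0
    voicing_new = voicing[idx]  # zero-order hold for voicing as well
    return freq_new, voicing_new


f, v = resample_melody_sketch(
    [0.0, 0.1, 0.2, 0.3], [100.0, 200.0, 0.0, 300.0], [1, 1, 0, 1],
    [0.0, 0.05, 0.1, 0.15, 0.2, 0.25, 0.3],
)
# f values: 100, 150, 200, 200, 0, 0, 300
```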

compiam.utils.pitch.resampling(pitch, new_len)

Resample pitch to a given new length, in samples.

Parameters:

pitch – a 2-D list with time-stamps and pitch values per timestamp.
new_len – new length of the output pitch
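A minimal sketch, assuming pitch is an array of shape (n, 2) holding a time-stamp and a pitch value per row, and that the new samples are evenly spaced over the original time span and filled by linear interpolation. The library may use a different resampling method; note also that linear interpolation blends across unvoiced (zero) frames. resample_pitch_track is a hypothetical name.

```python
import numpy as np


def resample_pitch_track(pitch, new_len):
    """Resample an (n, 2) [time, pitch] track to new_len evenly spaced rows."""
    pitch = np.asarray(pitch, dtype=float)
    times, values = pitch[:, 0], pitch[:, 1]
    # Assumption: new time-stamps evenly cover the original time span
    new_times = np.linspace(times[0], times[-1], new_len)
    new_values = np.interp(new_times, times, values)
    return np.stack([new_times, new_values], axis=1)


print(resample_pitch_track([[0.0, 100.0], [1.0, 200.0]], 3))
# rows: (0, 100), (0.5, 150), (1, 200)
```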