Util functions
Data augmentation utils
- compiam.utils.augment.attack_remix(in_path, out_dir, gain_factors=0.3, winDur=46.4, hopDur=5, sr=16000, n_jobs=4)[source]
Modify the relative levels of the attack and decay regions of an audio signal.
- Parameters:
in_path (str) – Path to input audio
out_dir (str) – Directory to write the augmented audio files to
gain_factors (float or list) – gain factor (or list of) to scale attack portion with
winDur (float) – Window size in milliseconds
hopDur (float) – Hop size in milliseconds
sr (float) – Sampling rate of input audio
n_jobs (int) – Number of jobs for parallelization
- compiam.utils.augment.spectral_shape(in_path, out_dir, gain_factors=(0.6, 2, 0.2), winDur=46.4, hopDur=0.005, sr=16000, n_jobs=4)[source]
Augment data by perturbing ‘nuisance attributes’ that are unimportant in the specific discrimination task.
- Parameters:
in_path (str) – Path to input audio
out_dir (str) – Directory to write the augmented audio files to
gain_factors (float or list) – 3-tuple (or list of) with gain factors for remixing. Tuple entries correspond to each of bass, treble, & damped components.
winDur (float) – Window size in milliseconds
hopDur (float) – Hop size in milliseconds
sr (float) – Sampling rate of input audio
n_jobs (int) – Number of jobs for parallelization
- compiam.utils.augment.stroke_remix(in_path, out_dir, gain_factors=(0.6, 2, 0.2), templates='/home/runner/work/compIAM/compIAM/compiam/utils/augment/augmentation/templates.npy', winDur=46.4, hopDur=0.005, sr=16000, n_jobs=4)[source]
Simulate the expected variations of relative strengths of drums in a mix using non-negative matrix factorization (NMF).
- Parameters:
in_path (str) – Path to input audio
out_dir (str) – Directory to write the augmented audio files to
gain_factors (float or list) – 3-tuple (or list of) with gain factors for remixing. Tuple entries correspond to each of bass, treble, & damped components.
templates (str) – Path to saved NMF templates
winDur (float) – Window size in milliseconds
hopDur (float) – Hop size in milliseconds
sr (float) – Sampling rate of input audio
n_jobs (int) – Number of jobs for parallelization
Pitch-related utils
- compiam.utils.pitch.cents_to_pitch(c, tonic)[source]
Convert cents value, <c> to pitch in Hz
- Parameters:
c (float/int) – Pitch value in cents above <tonic>
tonic (float) – Tonic value in Hz
- Returns:
Pitch value, <c> in Hz
- Return type:
float
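The conversion can be illustrated with the standard definition of the cent (1200 cents per octave). This is a minimal sketch under that assumption; `cents_to_hz` is a hypothetical stand-in and not the library's source code.

```python
def cents_to_hz(c, tonic):
    """Convert cents above `tonic` to Hz, assuming 1200 cents per octave."""
    return tonic * 2.0 ** (c / 1200.0)


# One octave (1200 cents) above a 220 Hz tonic doubles the frequency.
octave_up = cents_to_hz(1200, 220.0)
# Zero cents is the tonic itself.
unison = cents_to_hz(0, 220.0)
```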
- compiam.utils.pitch.extract_stability_mask(pitch, min_stab_sec, hop_sec, var, timestep)[source]
Extract a boolean array aligned with <pitch> that indicates, for each point, whether it belongs to a region of “stable” pitch.
A window is passed along the pitch track, <pitch>, and the minimum and maximum values in each window are compared to that window's mean. Windows whose extremes stay within <var> of the mean are marked as stable. Consecutive stable regions totalling at least <min_stab_sec> seconds in length are annotated with 1 (stable). Regions that are not stable, or that are stable but shorter than <min_stab_sec>, are annotated with 0 (not stable).
- Parameters:
pitch (np.ndarray) – Pitch values in Hz or cents
min_stab_sec (float) – Stable regions of at least <min_stab_sec> seconds in length are annotated as stable. Shorter regions are not annotated.
hop_sec (float) – Hop length in seconds of window
var (float) – If the maximum/minimum pitch in a window deviates from its mean by more than this value, the window is considered unstable. Important to consider if the input is in cents or hertz!
timestep (float) – Time difference, in seconds, between each element in pitch
- Returns:
Boolean array equal in length to <pitch>: is stable region or not?
- Return type:
np.ndarray
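The two-stage procedure described above (a per-window stability test, then removal of short stable runs) might be sketched as follows. This is a simplified, pure-Python illustration and not the library's implementation: the helper names are hypothetical, each sample is judged only by the window that starts at it, the window length is assumed to be `hop_sec / timestep` samples, and silent (zero) frames get no special treatment.

```python
def window_is_stable(window, max_var):
    """Return 1 if every value lies within `max_var` of the window mean, else 0."""
    mean = sum(window) / len(window)
    return int(max(window) < mean + max_var and min(window) > mean - max_var)


def stability_mask(pitch, min_stab_sec, hop_sec, var, timestep):
    win = max(1, int(round(hop_sec / timestep)))   # window length in samples (assumed)
    min_len = int(round(min_stab_sec / timestep))  # shortest stable run to keep
    # Stage 1: per-sample decision from the window starting at that sample.
    raw = [window_is_stable(pitch[s:s + win], var) for s in range(len(pitch))]
    # Stage 2: keep only runs of 1s that span at least `min_len` samples.
    mask = [0] * len(raw)
    start = None
    for i, flag in enumerate(raw + [0]):  # sentinel 0 closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_len:
                mask[start:i] = [1] * (i - start)
            start = None
    return mask


# Pitch in cents: a steady note, a glide, a brief plateau, then a jump.
pitch = [200, 201, 199, 200, 202, 200, 260, 320, 380, 381, 380, 440]
mask = stability_mask(pitch, min_stab_sec=0.04, hop_sec=0.03, var=10, timestep=0.01)
```

Because `var` is compared directly against the pitch values, a threshold of 10 means 10 cents here but would mean 10 Hz for a track in hertz.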
- compiam.utils.pitch.interpolate_below_length(arr, val, gap)[source]
Interpolate gaps of value, <val> of length equal to or shorter than <gap> in <arr>
- Parameters:
arr (np.array) – Array to interpolate
val (number) – Value expected in gaps to interpolate
gap (number) – Maximum gap length to interpolate; gaps of <val> longer than <gap> will not be interpolated
- Returns:
interpolated array
- Return type:
np.array
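The gap-filling behaviour can be pictured with a short pure-Python sketch. It is an illustration only, not the library's code: `fill_short_gaps` is a hypothetical name, `gap` is assumed to be counted in samples, interpolation is assumed to be linear, and runs touching either end of the array are left untouched because they have no neighbour on one side.

```python
def fill_short_gaps(arr, val, gap):
    """Linearly interpolate runs of `val` that are at most `gap` samples long."""
    out = list(arr)
    n = len(out)
    i = 0
    while i < n:
        if arr[i] != val:
            i += 1
            continue
        # Find the end (exclusive) of this run of `val`.
        j = i
        while j < n and arr[j] == val:
            j += 1
        run_len = j - i
        # Only fill short runs that have a valid neighbour on both sides.
        if run_len <= gap and i > 0 and j < n:
            left, right = arr[i - 1], arr[j]
            for k in range(i, j):
                frac = (k - i + 1) / (run_len + 1)
                out[k] = left + frac * (right - left)
        i = j
    return out


track = [100.0, 0, 0, 130.0, 0, 0, 0, 0, 150.0]
# The 2-sample gap is filled; the 4-sample gap exceeds `gap` and is kept.
filled = fill_short_gaps(track, val=0, gap=2)
```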
- compiam.utils.pitch.is_stable(seq, max_var)[source]
Determine whether a sequence of values is stable given an input tolerance
- Parameters:
seq – sequence of values to study
max_var – Maximum tolerance to consider stable/not stable
- Returns:
1 (stable) or 0 (not stable)
- compiam.utils.pitch.normalisation(pitch, tonic, bins_per_octave=120, max_value=4)[source]
Normalize pitch given a tonic.
- Parameters:
pitch – a 2-D list with time-stamps and pitch values per timestamp.
tonic – recording tonic to normalize the pitch to.
bins_per_octave – number of frequency bins per octave.
max_value – maximum value to clip the normalized pitch to.
- Returns:
a 2-D list with time-stamps and, for each time-stamp, the pitch value normalized to the given tonic.
- compiam.utils.pitch.pitch_seq_to_cents(pseq, tonic)[source]
Convert sequence of pitch values to sequence of cents above <tonic> values
- Parameters:
pseq (np.array) – Array of pitch values in Hz
tonic (float) – Tonic value in Hz
- Returns:
Sequence of the original pitch values in cents above <tonic>
- Return type:
np.array
- compiam.utils.pitch.pitch_to_cents(p, tonic)[source]
Convert pitch value, <p> to cents above <tonic>.
- Parameters:
p (float) – Pitch value in Hz
tonic (float) – Tonic value in Hz
- Returns:
Pitch value, <p> in cents above <tonic>
- Return type:
float
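As an illustration, the standard cents formula (1200 cents per octave) gives the following. This is a sketch under that assumption; `hz_to_cents` is a hypothetical stand-in, not the library's source code.

```python
import math


def hz_to_cents(p, tonic):
    """Convert a pitch in Hz to cents above `tonic` (1200 cents per octave)."""
    return 1200.0 * math.log2(p / tonic)


# A 2:1 frequency ratio is one octave.
octave = hz_to_cents(440.0, 220.0)
# A 3:2 ratio (a just perfect fifth) is roughly 702 cents.
fifth = hz_to_cents(330.0, 220.0)
```

In this sketch the logarithm is undefined for `p <= 0`, so unvoiced frames stored as 0 Hz would need to be masked or skipped before conversion.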
- compiam.utils.pitch.resample_melody_series(times, frequencies, voicing, times_new, kind='linear')[source]
NOTE: This function is DIRECTLY PORTED FROM mir_eval.melody
Resamples frequency and voicing time series to a new timescale. Maintains any zero (“unvoiced”) values in frequencies.
If times and times_new are equivalent, no resampling will be performed.
- Parameters:
times – Times of each frequency value
frequencies – Array of frequency values, >= 0
voicing – Array which indicates voiced or unvoiced. This array may be binary or have continuous values between 0 and 1.
times_new – Times to resample frequency and voicing sequences to
kind – kind parameter to pass to scipy.interpolate.interp1d (default value: ‘linear’)
- Returns frequencies_resampled:
Frequency array resampled to new timebase
- Returns voicing_resampled:
Voicing array resampled to new timebase
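One way to resample while maintaining zero (“unvoiced”) values is to interpolate linearly and then force the result back to zero wherever the most recent original sample was unvoiced. The pure-Python sketch below illustrates that idea for the frequency series only. It is not the mir_eval code: `resample_keep_zeros` is a hypothetical name, the voicing array is ignored, and `times` is assumed to be sorted, to hold at least two samples, and to span every value in `times_new`.

```python
from bisect import bisect_right


def resample_keep_zeros(times, freqs, times_new):
    """Linearly resample `freqs`, keeping unvoiced (zero) stretches at zero."""
    out = []
    for t in times_new:
        # Index of the last original sample at or before t (clamped at the end).
        i = min(bisect_right(times, t) - 1, len(times) - 2)
        t0, t1 = times[i], times[i + 1]
        f0, f1 = freqs[i], freqs[i + 1]
        value = f0 + (f1 - f0) * (t - t0) / (t1 - t0)
        # Zero-order hold on the original series decides whether t is voiced.
        held = f1 if t >= t1 else f0
        out.append(value if held != 0 else 0.0)
    return out


times = [0.0, 1.0, 2.0, 3.0, 4.0]
freqs = [200.0, 210.0, 0.0, 0.0, 220.0]
times_new = [0.0, 0.5, 1.0, 2.5, 3.5, 4.0]
resampled = resample_keep_zeros(times, freqs, times_new)
```

At t = 3.5 plain linear interpolation would produce 110 Hz, halfway between an unvoiced frame and a 220 Hz frame; the hold step keeps that point at zero instead.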