Util functions

Data augmentation utils

compiam.utils.augment.attack_remix(in_path, out_dir, gain_factors=0.3, winDur=46.4, hopDur=5, sr=16000, n_jobs=4)[source]

Modify the relative levels of the attack and decay regions of an audio signal.

Parameters:
  • in_path (str) – Path to input audio

  • out_dir (str) – Directory in which to write the augmented audio files

  • gain_factors (float or list) – Gain factor (or list of gain factors) by which to scale the attack portion

  • winDur (float) – Window size in milliseconds

  • hopDur (float) – Hop size in milliseconds

  • sr (float) – Sampling rate of input audio

  • n_jobs (int) – Number of jobs for parallelization
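
The window and hop sizes are given as durations, so they have to be turned into sample counts at the given sampling rate before any framing can happen. A minimal sketch of that conversion, assuming rounding to the nearest sample; the helper name `ms_to_samples` is hypothetical and not part of compiam:

```python
def ms_to_samples(duration_ms: float, sr: float) -> int:
    """Convert a duration in milliseconds to a whole number of samples."""
    # sr is in samples per second, so the duration is first converted to seconds.
    return int(round(duration_ms / 1000.0 * sr))


# Defaults from the attack_remix signature: 46.4 ms window, 5 ms hop, 16 kHz audio.
win_samples = ms_to_samples(46.4, 16000)
hop_samples = ms_to_samples(5, 16000)
print(win_samples, hop_samples)
```

Whether the library rounds, floors, or picks the next power of two for the window length is not stated here, so treat the rounding rule as an assumption.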

compiam.utils.augment.spectral_shape(in_path, out_dir, gain_factors=(0.6, 2, 0.2), winDur=46.4, hopDur=0.005, sr=16000, n_jobs=4)[source]

Augment data by perturbing ‘nuisance attributes’ that are unimportant in the specific discrimination task.

Parameters:
  • in_path (str) – Path to input audio

  • out_dir (str) – Directory in which to write the augmented audio files

  • gain_factors (float or list) – 3-tuple (or list of) with gain factors for remixing. Tuple entries correspond to each of bass, treble, & damped components.

  • winDur (float) – Window size in milliseconds

  • hopDur (float) – Hop size in milliseconds

  • sr (float) – Sampling rate of input audio

  • n_jobs (int) – Number of jobs for parallelization

compiam.utils.augment.stroke_remix(in_path, out_dir, gain_factors=(0.6, 2, 0.2), templates='/home/runner/work/compIAM/compIAM/compiam/utils/augment/augmentation/templates.npy', winDur=46.4, hopDur=0.005, sr=16000, n_jobs=4)[source]
Simulate the expected variations of relative strengths of drums in a mix using non-negative matrix factorization (NMF).

Parameters:
  • in_path (str) – Path to input audio

  • out_dir (str) – Directory in which to write the augmented audio files

  • gain_factors (float or list) – 3-tuple (or list of) with gain factors for remixing. Tuple entries correspond to each of bass, treble, & damped components.

  • templates (str) – Path to saved NMF templates

  • winDur (float) – Window size in milliseconds

  • hopDur (float) – Hop size in milliseconds

  • sr (float) – Sampling rate of input audio

  • n_jobs (int) – Number of jobs for parallelization

Pitch-related utils

compiam.utils.pitch.cents_to_pitch(c, tonic)[source]

Convert cents value, <c> to pitch in Hz

Parameters:
  • c (float/int) – Pitch value in cents above <tonic>

  • tonic (float) – Tonic value in Hz

Returns:

Pitch value, <c> in Hz

Return type:

float
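
For reference, the conversion from cents to Hz can be sketched as follows, assuming the standard definition of 1200 cents per octave. The function name `cents_to_hz` is hypothetical; this is an illustration of the formula, not the library's code:

```python
def cents_to_hz(cents: float, tonic_hz: float) -> float:
    """Return the frequency in Hz that lies `cents` above `tonic_hz`."""
    # One octave (a doubling of frequency) spans 1200 cents.
    return tonic_hz * 2.0 ** (cents / 1200.0)


octave_above = cents_to_hz(1200, 220.0)  # one octave above a 220 Hz tonic
fifth_above = cents_to_hz(700, 220.0)    # an equal-tempered fifth above the tonic
print(octave_above, round(fifth_above, 2))
```

Negative cent values give pitches below the tonic, and 0 cents returns the tonic itself.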

compiam.utils.pitch.extract_stability_mask(pitch, min_stab_sec, hop_sec, var, timestep)[source]

Extract a boolean array, aligned with <pitch>, indicating whether each point belongs to a region of “stable” pitch.

A window is passed along the pitch track <pitch>, and the minimum and maximum values in each window are compared to the window's mean. Windows whose extremes do not deviate from the mean by more than <var> are marked as stable. Consecutive stable regions totalling at least <min_stab_sec> seconds in length are annotated with 1 (stable). Regions that are not stable, or that are stable but shorter than <min_stab_sec>, are annotated with 0 (not stable).

Parameters:
  • pitch (np.ndarray) – Pitch values in Hz or cents

  • min_stab_sec (float) – Stable regions of at least <min_stab_sec> seconds in length are annotated as stable. Shorter regions are not annotated.

  • hop_sec (float) – Hop length in seconds of window

  • var (float) – If the maximum/minimum pitch in a window deviates from its mean by more than this value, the window is considered unstable. Important to consider if the input is in cents or hertz!

  • timestep (float) – Time difference, in seconds, between each element in pitch

Returns:

Boolean array equal in length to <pitch>: is stable region or not?

Return type:

np.ndarray
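
The windowing procedure described above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the library's implementation: each sample is flagged by the window that starts at it, a single forward pass is used, and the names `window_is_stable` and `stability_mask` are hypothetical.

```python
def window_is_stable(window, max_var):
    """True if the window's min and max stay within max_var of its mean."""
    mean = sum(window) / len(window)
    return max(window) < mean + max_var and min(window) > mean - max_var


def stability_mask(pitch, min_stab_sec, hop_sec, var, timestep):
    """Return a list of 0/1 flags, one per element of `pitch`."""
    win = max(1, int(round(hop_sec / timestep)))     # window length in samples
    min_run = int(round(min_stab_sec / timestep))    # shortest run to keep

    # Step 1: flag each sample by the stability of the window starting there.
    flags = [window_is_stable(pitch[s:s + win], var) for s in range(len(pitch))]

    # Step 2: keep only runs of consecutive stable samples that are long enough.
    mask = [0] * len(pitch)
    start = None
    for i, flag in enumerate(flags + [False]):  # sentinel closes a trailing run
        if flag and start is None:
            start = i
        elif not flag and start is not None:
            if i - start >= min_run:
                mask[start:i] = [1] * (i - start)
            start = None
    return mask


# A flat plateau, a fast glide, then a plateau too short to count as stable.
pitch = [100.0] * 10 + [150.0, 200.0, 250.0] + [300.0] * 3
mask = stability_mask(pitch, min_stab_sec=0.05, hop_sec=0.03, var=5, timestep=0.01)
print(mask)
```

Because the forward-only window looks ahead, the last samples of a plateau that borders a glide are flagged as unstable in this sketch; a second pass over the reversed track is one way to compensate for that.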

compiam.utils.pitch.interpolate_below_length(arr, val, gap)[source]

Interpolate gaps of value <val> in <arr> whose length is equal to or shorter than <gap>.

Parameters:
  • arr (np.array) – Array to interpolate

  • val (number) – Value expected in gaps to interpolate

  • gap (number) – Maximum gap length to interpolate; gaps of <val> longer than <gap> will not be interpolated

Returns:

interpolated array

Return type:

np.array
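
The gap-filling behaviour can be illustrated with a short pure-Python sketch. It assumes linear interpolation between the two neighbouring samples and leaves gaps at the very start or end untouched; both choices, and the name `fill_short_gaps`, are assumptions made for illustration rather than a description of the library's code.

```python
def fill_short_gaps(arr, val, gap):
    """Linearly interpolate runs of `val` that are at most `gap` samples long."""
    out = list(arr)
    n = len(out)
    i = 0
    while i < n:
        if out[i] != val:
            i += 1
            continue
        # Find the end (exclusive) of this run of `val`.
        j = i
        while j < n and out[j] == val:
            j += 1
        length = j - i
        # Only interior gaps have a neighbour on both sides to interpolate between.
        if length <= gap and i > 0 and j < n:
            left, right = out[i - 1], out[j]
            for k in range(i, j):
                frac = (k - i + 1) / (length + 1)
                out[k] = left + frac * (right - left)
        i = j
    return out


track = [100.0, 0.0, 0.0, 0.0, 140.0, 0.0, 0.0, 0.0, 0.0, 0.0, 200.0]
filled = fill_short_gaps(track, val=0.0, gap=3)
print(filled)
```

In this example the three-sample gap is filled with a ramp from 100 to 140, while the five-sample gap exceeds `gap` and stays at 0.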

compiam.utils.pitch.is_stable(seq, max_var)[source]

Determine whether a sequence of values is stable given an input tolerance.

Parameters:
  • seq – sequence of values to study

  • max_var – Maximum tolerance to consider stable/not stable

Returns:

1 (stable) or 0 (not stable)

compiam.utils.pitch.normalisation(pitch, tonic, bins_per_octave=120, max_value=4)[source]

Normalize pitch given a tonic.

Parameters:
  • pitch – a 2-D list with time-stamps and pitch values per timestamp.

  • tonic – recording tonic to normalize the pitch to.

  • bins_per_octave – number of frequency bins per octave.

  • max_value – maximum value to clip the normalized pitch to.

Returns:

a 2-D list with time-stamps and, per timestamp, pitch values normalized to the given tonic.

compiam.utils.pitch.pitch_seq_to_cents(pseq, tonic)[source]

Convert a sequence of pitch values in Hz to a sequence of values in cents above <tonic>.

Parameters:
  • pseq (np.array) – Array of pitch values in Hz

  • tonic (float) – Tonic value in Hz

Returns:

Sequence of the original pitch values in cents above <tonic>

Return type:

np.array
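
A minimal sketch of the Hz-to-cents conversion applied to a whole sequence, assuming 1200 cents per octave. Mapping unvoiced (non-positive) frames to `None` is an assumption made here because the logarithm is undefined for them; the function names are hypothetical.

```python
import math


def hz_to_cents(pitch_hz: float, tonic_hz: float) -> float:
    """Return the distance of `pitch_hz` from `tonic_hz` in cents."""
    return 1200.0 * math.log2(pitch_hz / tonic_hz)


def hz_seq_to_cents(pseq, tonic_hz):
    """Convert each pitch in `pseq` to cents above the tonic."""
    # Assumption: frames with no pitch (<= 0 Hz) are passed through as None.
    return [hz_to_cents(p, tonic_hz) if p > 0 else None for p in pseq]


cents = hz_seq_to_cents([220.0, 440.0, 0.0, 110.0], tonic_hz=220.0)
print(cents)
```

Pitches below the tonic come out negative, so a pitch one octave below the tonic maps to -1200 cents.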

compiam.utils.pitch.pitch_to_cents(p, tonic)[source]

Convert pitch value, <p> to cents above <tonic>.

Parameters:
  • p (float) – Pitch value in Hz

  • tonic (float) – Tonic value in Hz

Returns:

Pitch value, <p> in cents above <tonic>

Return type:

float

compiam.utils.pitch.resample_melody_series(times, frequencies, voicing, times_new, kind='linear')[source]

NOTE: This function is DIRECTLY PORTED FROM mir_eval.melody

Resamples frequency and voicing time series to a new timescale. Maintains any zero (“unvoiced”) values in frequencies.

If times and times_new are equivalent, no resampling will be performed.

Parameters:
  • times – Times of each frequency value

  • frequencies – Array of frequency values, >= 0

  • voicing – Array which indicates voiced or unvoiced. This array may be binary or have continuous values between 0 and 1.

  • times_new – Times to resample frequency and voicing sequences to

  • kind – kind parameter to pass to scipy.interpolate.interp1d (default: ‘linear’).

Returns:
  • frequencies_resampled – Frequency array resampled to new timebase

  • voicing_resampled – Voicing array resampled to new timebase
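
To show what maintaining the zero (“unvoiced”) values means in practice, here is a simplified pure-Python sketch with linear interpolation only. It is not the mir_eval code: the hold-then-mask strategy, the clamping at the ends, and the names `lerp_series` and `resample_melody` are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right


def lerp_series(times, values, times_new):
    """Linearly interpolate `values` at `times_new`, clamping at both ends."""
    out = []
    for t in times_new:
        j = bisect_left(times, t)
        if j <= 0:
            out.append(values[0])
        elif j >= len(times):
            out.append(values[-1])
        else:
            w = (t - times[j - 1]) / (times[j] - times[j - 1])
            out.append(values[j - 1] + w * (values[j] - values[j - 1]))
    return out


def resample_melody(times, freqs, voicing, times_new):
    """Resample frequency and voicing series while keeping unvoiced zeros."""
    if list(times) == list(times_new):
        return list(freqs), list(voicing)  # nothing to do

    # Hold the last voiced frequency through unvoiced frames so that the
    # interpolation does not ramp towards 0 Hz at voicing boundaries.
    held, last = [], 0.0
    for f in freqs:
        if f > 0:
            last = f
        held.append(last)

    freqs_new = lerp_series(times, held, times_new)
    voicing_new = lerp_series(times, voicing, times_new)

    # Restore zeros: a new frame is unvoiced if the most recent original frame was.
    for i, t in enumerate(times_new):
        prev = max(0, bisect_right(times, t) - 1)
        if freqs[prev] == 0:
            freqs_new[i] = 0.0
    return freqs_new, voicing_new


times = [0.0, 1.0, 2.0, 3.0]
freqs = [100.0, 200.0, 0.0, 400.0]
voicing = [1, 1, 0, 1]
times_new = [0.0, 0.5, 1.0, 1.5, 2.0, 2.5, 3.0]
freqs_new, voicing_new = resample_melody(times, freqs, voicing, times_new)
print(freqs_new)
```

Without the hold step, the frame at 1.5 s would be interpolated halfway towards 0 Hz, producing a pitch value that never occurred in the original track.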

compiam.utils.pitch.resampling(pitch, new_len)[source]

Resample pitch to a given new length in samples

Parameters:
  • pitch – a 2-D list with time-stamps and pitch values per timestamp.

  • new_len – new length of the output pitch