new Essentia( Essentia [, isDebug ] )

Parameters
Name Type Attributes Default Description
Essentia EssentiaWASM

WASM backend (emscripten global module object), which is loaded from the 'essentia-wasm.*.js' file

isDebug boolean <optional>
false
Details

Methods


<async> getAudioBufferFromURL( audioURL, webAudioCtx ) → {AudioBuffer}

Description

Decodes and returns the audio buffer of a given audio URL or blob URI using the Web Audio API. (NOTE: This method doesn't work in the Safari browser.)

Parameters
Name Type Description
audioURL string

web URL or blob URI of an audio file

webAudioCtx AudioContext

an instance of Web Audio API AudioContext

Returns

decoded audio buffer object

Details

<async> getAudioChannelDataFromURL( audioURL, webAudioCtx [, channel ] ) → {Float32Array}

Description

Decodes and returns the audio channel data from a given audio URL or blob URI using the Web Audio API. (NOTE: This method doesn't work in the Safari browser.)

Parameters
Name Type Attributes Default Description
audioURL string

web URL or blob URI of an audio file

webAudioCtx AudioContext

an instance of Web Audio API AudioContext

channel number <optional>
0

audio channel number

Returns

the decoded audio data as a Float32Array for the given channel

Details

audioBufferToMonoSignal( buffer ) → {Float32Array}

Description

Converts an AudioBuffer object to a mono audio signal array. If the audio buffer has 2 channels, the signal is downmixed to mono using the essentia MonoMixer algorithm. Throws an exception if the input AudioBuffer object has more than 2 channels of audio.

Parameters
Name Type Description
buffer AudioBuffer

AudioBuffer object decoded from an audio file.

Returns

audio channel data (downmixed to mono if the input is a stereo signal).
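The two methods above can be chained to go from a URL to a mono signal. The following is a minimal sketch rather than an official recipe: `loadMonoSignal` is a hypothetical helper name, and `essentia` and `audioCtx` are assumed to be an already constructed Essentia instance and a Web Audio API AudioContext supplied by the caller.

```javascript
// Hypothetical helper: `essentia` is assumed to be a constructed Essentia
// instance and `audioCtx` a Web Audio API AudioContext.
async function loadMonoSignal(essentia, audioURL, audioCtx) {
  // Fetch and decode the file into an AudioBuffer.
  const buffer = await essentia.getAudioBufferFromURL(audioURL, audioCtx);
  // audioBufferToMonoSignal is documented to throw for more than 2 channels,
  // so check first in order to report a clearer message.
  if (buffer.numberOfChannels > 2) {
    throw new Error(`Expected 1 or 2 channels, got ${buffer.numberOfChannels}`);
  }
  // Float32Array holding the mono (or downmixed stereo) samples.
  return essentia.audioBufferToMonoSignal(buffer);
}
```

Checking `numberOfChannels` up front is a design choice of this sketch; relying on the exception thrown by `audioBufferToMonoSignal` would work as well.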

Details

shutdown()

Description

Method to shut down the essentia algorithm instance after its use

Details

reinstantiate()

Description

Method for re-instantiating the essentia algorithms instance after using the shutdown method

Details

"delete"()

Description

Deletes the essentia.js class instance

Details

arrayToVector( inputArray ) → {VectorFloat}

Description

Convert an input JS array into VectorFloat type

Parameters
Name Type Description
inputArray Float32Array

input JS typed array

Returns

returns vector float

Details

vectorToArray( inputVector ) → {Float32Array}

Description

Convert an input VectorFloat array into typed JS Float32Array

Parameters
Name Type Description
inputVector VectorFloat

input VectorFloat array

Returns

returns converted JS typed array

Details

FrameGenerator( inputAudioData [, frameSize [, hopSize ] ] ) → {VectorVectorFloat}

Description

Cuts audio signal data into overlapping frames, given a frame size and a hop size

Parameters
Name Type Attributes Default Description
inputAudioData Float32Array

single-channel audio data

frameSize number <optional>
2048

frame size for cutting the audio signal

hopSize number <optional>
1024

hop size between the starts of consecutive frames

Returns

Returns a 2D vector float of sliced audio frames
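To make the roles of frame size and hop size concrete, here is a plain-JavaScript sketch of overlapping framing. It is not the library's implementation: `sliceIntoFrames` is a hypothetical name, it returns an ordinary array of Float32Array frames instead of a VectorVectorFloat, and it assumes that frames start at sample 0 and that the last frame is zero-padded. The library's boundary handling may differ.

```javascript
// Plain-JS illustration of overlapping framing (not the library's code).
// Assumption: frames start at sample 0 and the last frame is zero-padded.
function sliceIntoFrames(signal, frameSize = 2048, hopSize = 1024) {
  if (frameSize <= 0 || hopSize <= 0) {
    throw new RangeError('frameSize and hopSize must be positive');
  }
  const frames = [];
  for (let start = 0; start < signal.length; start += hopSize) {
    const frame = new Float32Array(frameSize); // zero-filled by default
    const end = Math.min(start + frameSize, signal.length);
    frame.set(signal.subarray(start, end));
    frames.push(frame);
  }
  return frames;
}
```

With a hop size of half the frame size, as in the defaults, each frame shares half of its samples with the previous one.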

Details

MonoMixer( leftChannel, rightChannel ) → {object}

Description

This algorithm downmixes the signal into a single channel given a stereo signal. It is a wrapper around https://essentia.upf.edu/reference/std_MonoMixer.html.

Parameters
Name Type Description
leftChannel VectorFloat

the left channel of the stereo audio signal

rightChannel VectorFloat

the right channel of the stereo audio signal

Returns

{audio: 'the downmixed mono signal'}
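As an illustration of what a stereo downmix does, the sketch below averages the two channels sample by sample in plain JavaScript. Averaging is an assumption made for this example; the exact mixing rule used by the algorithm is described on the linked reference page. `downmixToMono` is a hypothetical name, and it works on Float32Array inputs instead of VectorFloat.

```javascript
// Plain-JS illustration of a stereo-to-mono downmix.
// Assumption: the mono signal is the per-sample average of both channels.
function downmixToMono(leftChannel, rightChannel) {
  if (leftChannel.length !== rightChannel.length) {
    throw new RangeError('Both channels must have the same length');
  }
  const mono = new Float32Array(leftChannel.length);
  for (let i = 0; i < mono.length; i++) {
    mono[i] = 0.5 * (leftChannel[i] + rightChannel[i]);
  }
  return mono;
}
```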

Details

LoudnessEBUR128( leftChannel, rightChannel [, hopSize [, sampleRate [, startAtZero ] ] ] ) → {object}

Description

This algorithm computes the EBUR128 loudness descriptors of an audio signal. It is a wrapper around https://essentia.upf.edu/reference/std_LoudnessEBUR128.html.

Parameters
Name Type Attributes Default Description
leftChannel VectorFloat

the left channel of the stereo audio signal

rightChannel VectorFloat

the right channel of the stereo audio signal

hopSize number <optional>
0.1

the hop size with which the loudness is computed [s]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

startAtZero boolean <optional>
false

start momentary/short-term loudness estimation at time 0 (zero-centered loudness estimation windows) if true; otherwise start both windows at time 0 (time positions for momentary and short-term values will not be synchronized)

Returns

{momentaryLoudness: 'momentary loudness (over 400ms) (LUFS)', shortTermLoudness: 'short-term loudness (over 3 seconds) (LUFS)', integratedLoudness: 'integrated loudness (overall) (LUFS)', loudnessRange: 'loudness range over an arbitrarily long time interval [3] (dB, LU)'}

Details

AfterMaxToBeforeMaxEnergyRatio( pitch ) → {object}

Description

This algorithm computes the ratio between the pitch energy after the pitch maximum and the pitch energy before the pitch maximum. Sounds having a monotonically ascending pitch or one unique pitch will show a value of (0,1], while sounds having a monotonically descending pitch will show a value of [1,inf). In case there is no energy before the max pitch, the algorithm will return the energy after the maximum pitch. Check https://essentia.upf.edu/reference/std_AfterMaxToBeforeMaxEnergyRatio.html for more details.

Parameters
Name Type Description
pitch VectorFloat

the array of pitch values [Hz]

Returns

{afterMaxToBeforeMaxEnergyRatio: 'the ratio between the pitch energy after the pitch maximum to the pitch energy before the pitch maximum'}

Details

AllPass( signal [, bandwidth [, cutoffFrequency [, order [, sampleRate ] ] ] ] ) → {object}

Description

This algorithm implements an IIR all-pass filter of order 1 or 2. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_AllPass.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

bandwidth number <optional>
500

the bandwidth of the filter [Hz] (used only for 2nd-order filters)

cutoffFrequency number <optional>
1500

the cutoff frequency for the filter [Hz]

order number <optional>
1

the order of the filter

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{signal: 'the filtered signal'}

Details

AudioOnsetsMarker( signal [, onsets [, sampleRate [, type ] ] ] ) → {object}

Description

This algorithm creates a wave file in which a given audio signal is mixed with a series of time onsets. The sonification of the onsets can be heard as beeps, or as short white noise pulses if configured to do so. Check https://essentia.upf.edu/reference/std_AudioOnsetsMarker.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

onsets Array.<any> <optional>
[]

the list of onset locations [s]

sampleRate number <optional>
44100

the sampling rate of the output signal [Hz]

type string <optional>
beep

the type of sound to be added on the event

Returns

{signal: 'the input signal mixed with bursts at onset locations'}

Details

AutoCorrelation( array [, frequencyDomainCompression [, generalized [, normalization ] ] ] ) → {object}

Description

This algorithm computes the autocorrelation vector of a signal. It uses the version most commonly used in signal processing, which doesn't remove the mean from the observations. Using the 'generalized' option this algorithm computes autocorrelation as described in [3]. Check https://essentia.upf.edu/reference/std_AutoCorrelation.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the array to be analyzed

frequencyDomainCompression number <optional>
0.5

factor at which FFT magnitude is compressed (only used if 'generalized' is set to true, see [3])

generalized boolean <optional>
false

bool value to indicate whether to compute the 'generalized' autocorrelation as described in [3]

normalization string <optional>
standard

type of normalization to compute: either 'standard' (default) or 'unbiased'

Returns

{autoCorrelation: 'the autocorrelation vector'}
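The description says the mean is not removed from the observations. A plain-JavaScript sketch of that basic definition, r[k] = sum over n of x[n] * x[n + k], is shown below. It is only an illustration: `rawAutoCorrelation` is a hypothetical name, no normalization is applied (so the output scale may differ from the 'standard' and 'unbiased' options), and the 'generalized' variant is not covered.

```javascript
// Plain-JS illustration of autocorrelation without mean removal.
// Assumption: no normalization is applied to the result.
function rawAutoCorrelation(x) {
  const n = x.length;
  const r = new Float64Array(n);
  for (let lag = 0; lag < n; lag++) {
    let sum = 0;
    // Multiply the signal with a copy of itself shifted by `lag` samples.
    for (let i = 0; i + lag < n; i++) {
      sum += x[i] * x[i + lag];
    }
    r[lag] = sum;
  }
  return r;
}
```

This direct form costs O(n^2) operations; it is meant to show the definition, not to be fast.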

Details

BFCC( spectrum [, dctType [, highFrequencyBound [, inputSize [, liftering [, logType [, lowFrequencyBound [, normalize [, numberBands [, numberCoefficients [, sampleRate [, type [, weighting ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the bark-frequency cepstrum coefficients of a spectrum. Bark bands and their subsequent usage in cepstral analysis have been shown to be useful in percussive content [1, 2]. This algorithm is implemented using the Bark scaling approach in the Rastamat version of the MFCC algorithm and in a similar manner to the MFCC-FB40 default specs. Check https://essentia.upf.edu/reference/std_BFCC.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the audio spectrum

dctType number <optional>
2

the DCT type

highFrequencyBound number <optional>
11000

the upper bound of the frequency range [Hz]

inputSize number <optional>
1025

the size of input spectrum

liftering number <optional>
0

the liftering coefficient. Use '0' to bypass it

logType string <optional>
dbamp

logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes

lowFrequencyBound number <optional>
0

the lower bound of the frequency range [Hz]

normalize string <optional>
unit_sum

'unit_max' makes the vertex of all the triangles equal to 1, 'unit_sum' makes the area of all the triangles equal to 1

numberBands number <optional>
40

the number of bark bands in the filter

numberCoefficients number <optional>
13

the number of output cepstrum coefficients

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

type string <optional>
power

use magnitude or power spectrum

weighting string <optional>
warping

type of weighting function for determining triangle area

Returns

{bands: 'the energies in bark bands', bfcc: 'the bark frequency cepstrum coefficients'}

Details

BPF( x [, xPoints [, yPoints ] ] ) → {object}

Description

This algorithm implements a break point function which linearly interpolates between discrete xy-coordinates to construct a continuous function. Check https://essentia.upf.edu/reference/std_BPF.html for more details.

Parameters
Name Type Attributes Default Description
x number

the input coordinate (x-axis)

xPoints Array.<any> <optional>
[0, 1]

the x-coordinates of the points forming the break-point function (the points must be arranged in ascending order and cannot contain duplicates)

yPoints Array.<any> <optional>
[0, 1]

the y-coordinates of the points forming the break-point function

Returns

{y: 'the output coordinate (y-axis)'}
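The linear interpolation between break points can be sketched in plain JavaScript as follows. This is an illustration and not the library's code: `breakPointFunction` is a hypothetical name, and throwing for an `x` outside the range covered by `xPoints` is an assumption of this sketch.

```javascript
// Plain-JS illustration of a break-point function.
// Assumption: x outside [xPoints[0], xPoints[last]] is rejected.
function breakPointFunction(x, xPoints = [0, 1], yPoints = [0, 1]) {
  if (xPoints.length !== yPoints.length || xPoints.length < 2) {
    throw new RangeError('xPoints and yPoints need the same length (>= 2)');
  }
  if (x < xPoints[0] || x > xPoints[xPoints.length - 1]) {
    throw new RangeError('x lies outside the range covered by xPoints');
  }
  // Find the segment [xPoints[i], xPoints[i + 1]] that contains x.
  let i = 0;
  while (i < xPoints.length - 2 && x > xPoints[i + 1]) i++;
  // Interpolate linearly inside that segment.
  const t = (x - xPoints[i]) / (xPoints[i + 1] - xPoints[i]);
  return yPoints[i] + t * (yPoints[i + 1] - yPoints[i]);
}
```

For example, with `xPoints = [0, 1, 3]` and `yPoints = [0, 10, 20]`, the input `x = 2` lies halfway along the second segment and maps to 15.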

Details

BandPass( signal [, bandwidth [, cutoffFrequency [, sampleRate ] ] ] ) → {object}

Description

This algorithm implements a 2nd order IIR band-pass filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_BandPass.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal

bandwidth number <optional>
500

the bandwidth of the filter [Hz]

cutoffFrequency number <optional>
1500

the cutoff frequency for the filter [Hz]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{signal: 'the filtered signal'}

Details

BandReject( signal [, bandwidth [, cutoffFrequency [, sampleRate ] ] ] ) → {object}

Description

This algorithm implements a 2nd order IIR band-reject filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_BandReject.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

bandwidth number <optional>
500

the bandwidth of the filter [Hz]

cutoffFrequency number <optional>
1500

the cutoff frequency for the filter [Hz]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{signal: 'the filtered signal'}

Details

BarkBands( spectrum [, numberBands [, sampleRate ] ] ) → {object}

Description

This algorithm computes energy in Bark bands of a spectrum. The band frequencies are: [0.0, 50.0, 100.0, 150.0, 200.0, 300.0, 400.0, 510.0, 630.0, 770.0, 920.0, 1080.0, 1270.0, 1480.0, 1720.0, 2000.0, 2320.0, 2700.0, 3150.0, 3700.0, 4400.0, 5300.0, 6400.0, 7700.0, 9500.0, 12000.0, 15500.0, 20500.0, 27000.0]. The first two Bark bands [0,100] and [100,200] have been split in half for better resolution (because of an observed better performance in beat detection). For each bark band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_BarkBands.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input spectrum

numberBands number <optional>
27

the number of desired barkbands

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{bands: 'the energy of the bark bands'}

Details

BeatTrackerDegara( signal [, maxTempo [, minTempo ] ] ) → {object}

Description

This algorithm estimates the beat positions given an input signal. It computes the 'complex spectral difference' onset detection function and utilizes the beat tracking algorithm (TempoTapDegara) to extract beats [1]. The algorithm works with the optimized settings of 2048/1024 frame/hop size for the computation of the detection function, with its posterior x2 resampling. While it has a lower accuracy than BeatTrackerMultiFeature (see the evaluation results in [2]), its computational speed is significantly higher, which makes it reasonable to apply this algorithm for batch processing of large amounts of audio signals. Check https://essentia.upf.edu/reference/std_BeatTrackerDegara.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the audio input signal

maxTempo number <optional>
208

the fastest tempo to detect [bpm]

minTempo number <optional>
40

the slowest tempo to detect [bpm]

Returns

{ticks: 'the estimated tick locations [s]'}

Details

BeatTrackerMultiFeature( signal [, maxTempo [, minTempo ] ] ) → {object}

Description

This algorithm estimates the beat positions given an input signal. It computes a number of onset detection functions and estimates beat location candidates from them using the TempoTapDegara algorithm. Thereafter the best candidates are selected using TempoTapMaxAgreement. The employed detection functions, and the optimal frame/hop sizes used for their computation, are:
- complex spectral difference (see 'complex' method in OnsetDetection algorithm, 2048/1024 with posterior x2 upsample of the detection function)
- energy flux (see 'rms' method in OnsetDetection algorithm, the same settings)
- spectral flux in Mel-frequency bands (see 'melflux' method in OnsetDetection algorithm, the same settings)
- beat emphasis function (see 'beat_emphasis' method in OnsetDetectionGlobal algorithm, 2048/512)
- spectral flux between histogrammed spectrum frames, measured by the modified information gain (see 'infogain' method in OnsetDetectionGlobal algorithm, 2048/512)

Check https://essentia.upf.edu/reference/std_BeatTrackerMultiFeature.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the audio input signal

maxTempo number <optional>
208

the fastest tempo to detect [bpm]

minTempo number <optional>
40

the slowest tempo to detect [bpm]

Returns

{ticks: 'the estimated tick locations [s]', confidence: 'confidence of the beat tracker [0, 5.32]'}
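A call to this method could be wrapped as in the sketch below, which combines it with `arrayToVector` and `vectorToArray` from this reference. `estimateBeats` is a hypothetical helper name, `essentia` is assumed to be an already constructed instance, and treating the returned `ticks` as a VectorFloat that needs converting back is an assumption, since the return type is not stated here.

```javascript
// Hypothetical helper around BeatTrackerMultiFeature.
// Assumption: `ticks` comes back as a VectorFloat, so it is converted
// to a Float32Array with vectorToArray.
function estimateBeats(essentia, monoSignal) {
  // Algorithms take VectorFloat inputs, so convert the typed array first.
  const signalVector = essentia.arrayToVector(monoSignal);
  // maxTempo and minTempo are left at their documented defaults (208, 40).
  const { ticks, confidence } = essentia.BeatTrackerMultiFeature(signalVector);
  return { ticks: essentia.vectorToArray(ticks), confidence };
}
```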

Details

Beatogram( loudness, loudnessBandRatio [, size ] ) → {object}

Description

This algorithm filters the loudness matrix given by BeatsLoudness algorithm in order to keep only the most salient beat band representation. This algorithm has been found to be useful for estimating time signatures. Check https://essentia.upf.edu/reference/std_Beatogram.html for more details.

Parameters
Name Type Attributes Default Description
loudness VectorFloat

the loudness at each beat

loudnessBandRatio VectorVectorFloat

matrix of loudness ratios at each band and beat

size number <optional>
16

number of beats for dynamic filtering

Returns

{beatogram: 'filtered matrix loudness'}

Details

BeatsLoudness( signal [, beatDuration [, beatWindowDuration [, beats [, frequencyBands [, sampleRate ] ] ] ] ] ) → {object}

Description

This algorithm computes the spectrum energy of beats in an audio signal given their positions. The energy is computed both on the whole frequency range and for each of the specified frequency bands. See the SingleBeatLoudness algorithm for a more detailed explanation. Check https://essentia.upf.edu/reference/std_BeatsLoudness.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal

beatDuration number <optional>
0.05

the duration of the window in which the beat will be restricted [s]

beatWindowDuration number <optional>
0.1

the duration of the window in which to look for the beginning of the beat (centered around the positions in 'beats') [s]

beats Array.<any> <optional>
[]

the list of beat positions (each position is in seconds)

frequencyBands Array.<any> <optional>
[20, 150, 400, 3200, 7000, 22000]

the list of bands to compute energy ratios [Hz]

sampleRate number <optional>
44100

the audio sampling rate [Hz]

Returns

{loudness: 'the beat's energy in the whole spectrum', loudnessBandRatio: 'the ratio of the beat's energy on each frequency band'}

Details

BinaryOperator( array1, array2 [, type ] ) → {object}

Description

This algorithm performs basic arithmetical operations element by element given two arrays. Note:
- using this algorithm in streaming mode can cause diamond shape graphs which have not been tested with the current scheduler. There is NO GUARANTEE of its correct work for diamond shape graphs.
- for y<0, x/y is invalid

Check https://essentia.upf.edu/reference/std_BinaryOperator.html for more details.

Parameters
Name Type Attributes Default Description
array1 VectorFloat

the first operand input array

array2 VectorFloat

the second operand input array

type string <optional>
add

the type of the binary operator to apply to the input arrays

Returns

{array: 'the array containing the result of binary operation'}
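Element-by-element arithmetic on two arrays can be sketched in plain JavaScript as below. This is an illustration only: `elementwise` is a hypothetical name, and apart from the documented default 'add', the operator names 'subtract', 'multiply' and 'divide' are assumptions of this sketch, so check the linked reference page for the accepted values.

```javascript
// Plain-JS illustration of an element-wise binary operator.
// Assumption: operator names other than 'add' are illustrative only.
const OPERATIONS = {
  add: (a, b) => a + b,
  subtract: (a, b) => a - b,
  multiply: (a, b) => a * b,
  divide: (a, b) => a / b, // no special handling of invalid divisors here
};

function elementwise(array1, array2, type = 'add') {
  if (array1.length !== array2.length) {
    throw new RangeError('Input arrays must have the same length');
  }
  const operation = OPERATIONS[type];
  if (!operation) {
    throw new TypeError(`Unknown operator type: ${type}`);
  }
  // Apply the operator to each pair of elements at the same index.
  return Float32Array.from(array1, (value, i) => operation(value, array2[i]));
}
```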

Details

BinaryOperatorStream( array1, array2 [, type ] ) → {object}

Description

This algorithm performs basic arithmetical operations element by element given two arrays. Note:
- using this algorithm in streaming mode can cause diamond shape graphs which have not been tested with the current scheduler. There is NO GUARANTEE of its correct work for diamond shape graphs.
- for y<0, x/y is invalid

Check https://essentia.upf.edu/reference/std_BinaryOperatorStream.html for more details.

Parameters
Name Type Attributes Default Description
array1 VectorFloat

the first operand input array

array2 VectorFloat

the second operand input array

type string <optional>
add

the type of the binary operator to apply to the input arrays

Returns

{array: 'the array containing the result of binary operation'}

Details

BpmHistogramDescriptors( bpmIntervals ) → {object}

Description

This algorithm computes the beats per minute histogram and its statistics for the highest and second highest peak. Note: the histogram vector contains the occurrence frequency for each bpm value; the 0-th element corresponds to the 0 bpm value. Check https://essentia.upf.edu/reference/std_BpmHistogramDescriptors.html for more details.

Parameters
Name Type Description
bpmIntervals VectorFloat

the list of bpm intervals [s]

Returns

{firstPeakBPM: 'value for the highest peak [bpm]', firstPeakWeight: 'weight of the highest peak', firstPeakSpread: 'spread of the highest peak', secondPeakBPM: 'value for the second highest peak [bpm]', secondPeakWeight: 'weight of the second highest peak', secondPeakSpread: 'spread of the second highest peak', histogram: 'bpm histogram [bpm]'}

Details

BpmRubato( beats [, longRegionsPruningTime [, shortRegionsMergingTime [, tolerance ] ] ] ) → {object}

Description

This algorithm extracts the locations of large tempo changes from a list of beat ticks. Check https://essentia.upf.edu/reference/std_BpmRubato.html for more details.

Parameters
Name Type Attributes Default Description
beats VectorFloat

list of detected beat ticks [s]

longRegionsPruningTime number <optional>
20

time for the longest constant tempo region inside a rubato region [s]

shortRegionsMergingTime number <optional>
4

time for the shortest constant tempo region from one tempo region to another [s]

tolerance number <optional>
0.08

minimum tempo deviation to look for

Returns

{rubatoStart: 'list of timestamps where the start of a rubato region was detected [s]', rubatoStop: 'list of timestamps where the end of a rubato region was detected [s]', rubatoNumber: 'number of detected rubato regions'}

Details

CentralMoments( array [, mode [, range ] ] ) → {object}

Description

This algorithm extracts the 0th, 1st, 2nd, 3rd and 4th central moments of an array. It returns a 5-tuple in which the index corresponds to the order of the moment. Check https://essentia.upf.edu/reference/std_CentralMoments.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array

mode string <optional>
pdf

compute central moments considering array values as a probability density function over array index or as sample points of a distribution

range number <optional>
1

the range of the input array, used for normalizing the results in the 'pdf' mode

Returns

{centralMoments: 'the central moments of the input array'}

Details

Centroid( array [, range ] ) → {object}

Description

This algorithm computes the centroid of an array. The centroid is normalized to a specified range. This algorithm can be used to compute spectral centroid or temporal centroid. Check https://essentia.upf.edu/reference/std_Centroid.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array

range number <optional>
1

the range of the input array, used for normalizing the results

Returns

{centroid: 'the centroid of the array'}
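A centroid of this kind is a weighted mean of positions, with the array values as weights. The plain-JavaScript sketch below illustrates the idea. It is not the library's code: `arrayCentroid` is a hypothetical name, and mapping index `i` of an N-element array to the position `i * range / (N - 1)` is an assumption about how the normalization to `range` works.

```javascript
// Plain-JS illustration of a centroid normalized to a range.
// Assumption: index i maps to the position i * range / (N - 1).
function arrayCentroid(array, range = 1) {
  if (array.length < 2) {
    throw new RangeError('Need at least two elements');
  }
  const scale = range / (array.length - 1);
  let weightedSum = 0;
  let total = 0;
  for (let i = 0; i < array.length; i++) {
    weightedSum += i * scale * array[i];
    total += array[i];
  }
  if (total === 0) {
    throw new RangeError('Centroid is undefined when the values sum to zero');
  }
  return weightedSum / total;
}
```

For a magnitude spectrum, setting `range` to half the sampling rate would express the result in Hz under the assumption above.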

Details

ChordsDescriptors( chords, key, scale ) → {object}

Description

Given a chord progression, this algorithm describes it by means of key, scale, histogram, and rate of change. Note:
- chordsHistogram indexes follow the circle of fifths order, while being shifted to the input key and scale
- key and scale are taken from the most frequent chord. In the case where multiple chords are equally frequent, the chord is hierarchically chosen from the circle of fifths.
- chords should follow this name convention <A-G>[<#/b><m>] (e.g. C, C# or C#m are valid chords). Chord names not fitting this convention will throw an exception.

Check https://essentia.upf.edu/reference/std_ChordsDescriptors.html for more details.

Parameters
Name Type Description
chords VectorString

the chord progression

key string

the key of the whole song, from A to G

scale string

the scale of the whole song (major or minor)

Returns

{chordsHistogram: 'the normalized histogram of chords', chordsNumberRate: 'the ratio of different chords from the total number of chords in the progression', chordsChangesRate: 'the rate at which chords change in the progression', chordsKey: 'the most frequent chord of the progression', chordsScale: 'the scale of the most frequent chord of the progression (either 'major' or 'minor')'}
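Since chord names that do not fit the convention throw an exception, it can help to validate them beforehand. The sketch below is one possible pre-check in plain JavaScript. `findInvalidChordNames` is a hypothetical helper, and the regular expression reads the convention as a root letter A-G, an optional '#' or 'b', and an optional 'm', which is this example's interpretation of `<A-G>[<#/b><m>]`.

```javascript
// Hypothetical pre-check for chord names.
// Assumption: accidental and 'm' suffix are each optional on their own.
const CHORD_NAME_PATTERN = /^[A-G][#b]?m?$/;

function findInvalidChordNames(chords) {
  // Return the names that would not match the assumed convention.
  return chords.filter((name) => !CHORD_NAME_PATTERN.test(name));
}
```

An empty result suggests the progression is safe to pass on; anything else lists the names to fix first.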

Details

ChordsDetection( pcp [, hopSize [, sampleRate [, windowSize ] ] ] ) → {object}

Description

This algorithm estimates chords given an input sequence of harmonic pitch class profiles (HPCPs). It finds the best matching major or minor triad and outputs the result as a string (e.g. A#, Bm, G#m, C). This algorithm uses the sharp version of each flat note (e.g. Bb -> A#). Check https://essentia.upf.edu/reference/std_ChordsDetection.html for more details.

Parameters
Name Type Attributes Default Description
pcp VectorVectorFloat

the pitch class profile from which to detect the chord

hopSize number <optional>
2048

the hop size with which the input PCPs were computed

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

windowSize number <optional>
2

the size of the window on which to estimate the chords [s]

Returns

{chords: 'the resulting chords, from A to G', strength: 'the strength of the chord'}

Details

ChordsDetectionBeats( pcp, ticks [, chromaPick [, hopSize [, sampleRate ] ] ] ) → {object}

Description

This algorithm estimates chords using pitch profile classes on segments between beats. It is similar to the ChordsDetection algorithm, but the chords are estimated on audio segments between each pair of consecutive beats. For each segment the estimation is done based on a chroma (HPCP) vector characterizing it, which can be computed by two methods:
- 'interbeat_median': each resulting chroma vector component is the median of all the component values in the segment
- 'starting_beat': the chroma vector is sampled from the start of the segment (that is, its starting beat position) using its first frame. This makes sense if the chroma has been smoothed beforehand.

Check https://essentia.upf.edu/reference/std_ChordsDetectionBeats.html for more details.

Parameters
Name Type Attributes Default Description
pcp VectorVectorFloat

the pitch class profile from which to detect the chord

ticks VectorFloat

the list of beat positions (in seconds)

chromaPick string <optional>
interbeat_median

method of calculating singleton chroma for interbeat interval

hopSize number <optional>
2048

the hop size with which the input PCPs were computed

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{chords: 'the resulting chords, from A to G', strength: 'the strength of the chords'}

Details

ChromaCrossSimilarity( queryFeature, referenceFeature [, binarizePercentile [, frameStackSize [, frameStackStride [, noti [, oti [, otiBinary [, streaming ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes a binary cross similarity matrix from two chromagram feature vectors of a query and reference song. Check https://essentia.upf.edu/reference/std_ChromaCrossSimilarity.html for more details.

Parameters
Name Type Attributes Default Description
queryFeature VectorVectorFloat

frame-wise chromagram of the query song (e.g., a HPCP)

referenceFeature VectorVectorFloat

frame-wise chromagram of the reference song (e.g., a HPCP)

binarizePercentile number <optional>
0.095

maximum percent of distance values to consider as similar in each row and each column

frameStackSize number <optional>
9

number of input frames to stack together and treat as a feature vector for similarity computation. Choose 'frameStackSize=1' to use the original input frames without stacking

frameStackStride number <optional>
1

stride size to form a stack of frames (e.g., 'frameStackStride'=1 to use consecutive frames; 'frameStackStride'=2 for using every second frame)

noti number <optional>
12

number of circular shifts to be checked for Optimal Transposition Index [1]

oti boolean <optional>
true

whether to transpose the key of the reference song to the query song by Optimal Transposition Index [1]

otiBinary boolean <optional>
false

whether to use the OTI-based chroma binary similarity method [3]

streaming boolean <optional>
false

whether to accumulate the input 'queryFeature' in the euclidean similarity matrix calculation on each compute() method call

Returns

{csm: '2D binary cross-similarity matrix of the query and reference features'}

Details

Chromagram( frame [, binsPerOctave [, minFrequency [, minimumKernelSize [, normalizeType [, numberBins [, sampleRate [, scale [, threshold [, windowType [, zeroPhase ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the Constant-Q chromagram using FFT. See ConstantQ algorithm for more details. Check https://essentia.upf.edu/reference/std_Chromagram.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input audio frame

binsPerOctave number <optional>
12

number of bins per octave

minFrequency number <optional>
32.7

minimum frequency [Hz]

minimumKernelSize number <optional>
4

minimum size allowed for frequency kernels

normalizeType string <optional>
unit_max

normalize type

numberBins number <optional>
84

number of frequency bins, starting at minFrequency

sampleRate number <optional>
44100

FFT sampling rate [Hz]

scale number <optional>
1

filters scale. Larger values use longer windows

threshold number <optional>
0.01

bins whose magnitude is below this quantile are discarded

windowType string <optional>
hann

the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'

zeroPhase boolean <optional>
true

a boolean value that enables zero-phase windowing. Input audio frames should be windowed with the same phase mode

Returns

{chromagram: 'the magnitude constant-Q chromagram'}

Details

ClickDetector( frame [, detectionThreshold [, frameSize [, hopSize [, order [, powerEstimationThreshold [, sampleRate [, silenceThreshold ] ] ] ] ] ] ] ) → {object}

Description

This algorithm detects the locations of impulsive noises (clicks and pops) on the input audio frame. It relies on LPC coefficients to inverse-filter the audio in order to attenuate the stationary part and enhance the prediction error (or excitation noise)[1]. After this, a matched filter is used to further enhance the impulsive peaks. The detection threshold is obtained from a robust estimate of the excitation noise power [2] plus a parametric gain value. Check https://essentia.upf.edu/reference/std_ClickDetector.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input frame (must be non-empty)

detectionThreshold number <optional>
30

the detection threshold is based on the instant power of the noisy excitation signal plus 'detectionThreshold' dB

frameSize number <optional>
512

the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)

hopSize number <optional>
256

hop size used for the analysis. This parameter must be set correctly as it cannot be obtained from the input data

order number <optional>
12

scalar giving the number of LPCs to use

powerEstimationThreshold number <optional>
10

the noisy excitation is clipped to 'powerEstimationThreshold' times its median.

sampleRate number <optional>
44100

sample rate used for the analysis

silenceThreshold number <optional>
-50

threshold to skip silent frames

Returns

{starts: 'starting indexes of the clicks', ends: 'ending indexes of the clicks'}

Details

Clipper( signal [, max [, min ] ] ) → {object}

Description

This algorithm clips the input signal to fit its values into a specified interval. Check https://essentia.upf.edu/reference/std_Clipper.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

max number <optional>
1

the maximum value above which the signal will be clipped

min number <optional>
-1

the minimum value below which the signal will be clipped

Returns

{signal: 'the clipped output signal'}
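Clipping to an interval means replacing every sample above `max` with `max` and every sample below `min` with `min`. A plain-JavaScript sketch of that operation follows. `clipSignal` is a hypothetical name, and the parameter order mirrors the signature above (max before min).

```javascript
// Plain-JS illustration of clipping a signal to [min, max].
function clipSignal(signal, max = 1, min = -1) {
  if (min > max) {
    throw new RangeError('min must not be greater than max');
  }
  // Clamp each sample into the closed interval [min, max].
  return Float32Array.from(signal, (value) => Math.min(max, Math.max(min, value)));
}
```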

Details

CoverSongSimilarity( inputArray [, alignmentType [, disExtension [, disOnset [, distanceType ] ] ] ] ) → {object}

Description

This algorithm computes a cover song similarity measure from a binary cross similarity matrix input between two chroma vectors of a query and reference song, using various alignment constraints of the Smith-Waterman local-alignment algorithm. Check https://essentia.upf.edu/reference/std_CoverSongSimilarity.html for more details.

Parameters
Name Type Attributes Default Description
inputArray VectorVectorFloat

a 2D binary cross-similarity matrix between two audio chroma vectors (query vs reference song) (refer to the 'ChromaCrossSimilarity' algorithm).

alignmentType string <optional>
serra09

choose one of the given local-alignment constraints for the Smith-Waterman algorithm, as described in [2] or [3] respectively.

disExtension number <optional>
0.5

penalty for disruption extension

disOnset number <optional>
0.5

penalty for disruption onset

distanceType string <optional>
asymmetric

choose the type of distance. By default the algorithm outputs an asymmetric distance, which is obtained by normalising the maximum score in the alignment score matrix by the length of the reference song

Returns

{scoreMatrix: 'a 2D smith-waterman alignment score matrix from the input binary cross-similarity matrix', distance: 'cover song similarity distance between the query and reference song from the input similarity matrix. Either 'asymmetric' (as described in [2]) or 'symmetric' (maximum score in the alignment score matrix).'}

Details

Crest( array ) → {object}

Description

This algorithm computes the crest of an array. The crest is defined as the ratio between the maximum value and the arithmetic mean of an array. Typically it is used on the magnitude spectrum. Check https://essentia.upf.edu/reference/std_Crest.html for more details.

Parameters
Name Type Description
array VectorFloat

the input array (cannot contain negative values, and must be non-empty)

Returns

{crest: 'the crest of the input array'}

Details
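
The definition above (maximum value divided by arithmetic mean) can be sketched in plain JavaScript as follows. The helper name crestOf is hypothetical; it is not the essentia.js implementation, and the error messages are only this sketch's way of reflecting the documented input constraints.

```javascript
// Hypothetical helper for illustration only.
function crestOf(array) {
  if (array.length === 0) throw new Error("input must be non-empty");
  let max = -Infinity;
  let sum = 0;
  for (const v of array) {
    if (v < 0) throw new Error("input cannot contain negative values");
    if (v > max) max = v;
    sum += v;
  }
  // Note: an all-zero input yields 0 / 0 = NaN in this sketch.
  return max / (sum / array.length);
}

console.log(crestOf([1, 2, 3, 6])); // 2 (max is 6, mean is 3)
```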

CrossCorrelation( arrayX, arrayY [, maxLag [, minLag ] ] ) → {object}

Description

This algorithm computes the cross-correlation vector of two signals. It accepts two parameters, minLag and maxLag, which define the range of lags over which the inner product is computed. Check https://essentia.upf.edu/reference/std_CrossCorrelation.html for more details.

Parameters
Name Type Attributes Default Description
arrayX VectorFloat

the first input array

arrayY VectorFloat

the second input array

maxLag number <optional>
1

the maximum lag to be computed between the two vectors

minLag number <optional>
0

the minimum lag to be computed between the two vectors

Returns

{crossCorrelation: 'the cross-correlation vector between the two input arrays (its size is equal to maxLag - minLag + 1)'}

Details
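
To make the role of minLag and maxLag concrete, here is a minimal plain-JavaScript sketch that evaluates one inner product per lag. It assumes the convention r[lag] = sum over i of x[i + lag] * y[i], with out-of-range samples treated as zero; the library may use the opposite sign convention for the lag. The helper name crossCorrelate is hypothetical.

```javascript
// Hypothetical helper for illustration only.
// Assumed convention: r[lag] = sum_i x[i + lag] * y[i].
function crossCorrelate(x, y, maxLag = 1, minLag = 0) {
  const out = [];
  for (let lag = minLag; lag <= maxLag; lag++) {
    let acc = 0;
    for (let i = 0; i < y.length; i++) {
      const j = i + lag;
      // Samples outside the bounds of x contribute nothing.
      if (j >= 0 && j < x.length) acc += x[j] * y[i];
    }
    out.push(acc);
  }
  return out; // length is maxLag - minLag + 1
}

console.log(crossCorrelate([1, 2, 3], [1, 2, 3], 1, 0)); // [14, 8]
```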

CrossSimilarityMatrix( queryFeature, referenceFeature [, binarize [, binarizePercentile [, frameStackSize [, frameStackStride ] ] ] ] ) → {object}

Description

This algorithm computes a Euclidean cross-similarity matrix of two sequences of frame features. Similarity values can optionally be binarized. Check https://essentia.upf.edu/reference/std_CrossSimilarityMatrix.html for more details.

Parameters
Name Type Attributes Default Description
queryFeature VectorVectorFloat

input frame features of the query song (e.g., a chromagram)

referenceFeature VectorVectorFloat

input frame features of the reference song (e.g., a chromagram)

binarize boolean <optional>
false

whether to binarize the euclidean cross-similarity matrix

binarizePercentile number <optional>
0.095

maximum percent of distance values to consider as similar in each row and each column

frameStackSize number <optional>
1

number of input frames to stack together and treat as a feature vector for similarity computation. Choose 'frameStackSize=1' to use the original input frames without stacking

frameStackStride number <optional>
1

stride size to form a stack of frames (e.g., 'frameStackStride'=1 to use consecutive frames; 'frameStackStride'=2 for using every second frame)

Returns

{csm: '2D cross-similarity matrix of two input frame sequences (query vs reference)'}

Details

CubicSpline( x [, leftBoundaryFlag [, leftBoundaryValue [, rightBoundaryFlag [, rightBoundaryValue [, xPoints [, yPoints ] ] ] ] ] ] ) → {object}

Description

Computes the second derivatives of a piecewise cubic spline. The input value, i.e. the point at which the spline is to be evaluated, should typically lie between xPoints[0] and xPoints[size-1]. If the value lies outside this range, extrapolation is used.
Regarding the [left/right] boundary condition flag parameters:
- 0: the cubic spline should be a quadratic over the first interval
- 1: the first derivative at the [left/right] endpoint should be [left/right]BoundaryValue
- 2: the second derivative at the [left/right] endpoint should be [left/right]BoundaryValue
References: [1] Spline interpolation - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Spline_interpolation
Check https://essentia.upf.edu/reference/std_CubicSpline.html for more details.

Parameters
Name Type Attributes Default Description
x number

the input coordinate (x-axis)

leftBoundaryFlag number <optional>
0

type of boundary condition for the left boundary

leftBoundaryValue number <optional>
0

the value to be used in the left boundary, when leftBoundaryFlag is 1 or 2

rightBoundaryFlag number <optional>
0

type of boundary condition for the right boundary

rightBoundaryValue number <optional>
0

the value to be used in the right boundary, when rightBoundaryFlag is 1 or 2

xPoints Array.<any> <optional>
[0, 1]

the x-coordinates where data is specified (the points must be arranged in ascending order and cannot contain duplicates)

yPoints Array.<any> <optional>
[0, 1]

the y-coordinates to be interpolated (i.e. the known data)

Returns

{y: 'the value of the spline at x', dy: 'the first derivative of the spline at x', ddy: 'the second derivative of the spline at x'}

Details

DCRemoval( signal [, cutoffFrequency [, sampleRate ] ] ) → {object}

Description

This algorithm removes the DC offset from a signal using a 1st order IIR highpass filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_DCRemoval.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal

cutoffFrequency number <optional>
40

the cutoff frequency for the filter [Hz]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{signal: 'the filtered signal, with the DC component removed'}

Details

DCT( array [, dctType [, inputSize [, liftering [, outputSize ] ] ] ] ) → {object}

Description

This algorithm computes the Discrete Cosine Transform of an array. It uses the DCT-II form, with the 1/sqrt(2) scaling factor for the first coefficient. Check https://essentia.upf.edu/reference/std_DCT.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array

dctType number <optional>
2

the DCT type

inputSize number <optional>
10

the size of the input array

liftering number <optional>
0

the liftering coefficient. Use '0' to bypass it

outputSize number <optional>
10

the number of output coefficients

Returns

{dct: 'the discrete cosine transform of the input array'}

Details
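
The DCT-II form mentioned above can be sketched in plain JavaScript as shown below. This is a sketch under assumptions: the overall sqrt(2/N) normalization (the orthonormal variant) is assumed here, since the description only states the 1/sqrt(2) factor on the first coefficient, and liftering is left out. The helper name dct2 is hypothetical.

```javascript
// Hypothetical helper for illustration only.
// Assumes the orthonormal DCT-II; liftering is not implemented.
function dct2(array, outputSize = array.length) {
  const N = array.length;
  const out = new Array(outputSize);
  for (let k = 0; k < outputSize; k++) {
    let acc = 0;
    for (let n = 0; n < N; n++) {
      acc += array[n] * Math.cos((Math.PI / N) * (n + 0.5) * k);
    }
    // The first coefficient gets the extra 1/sqrt(2) scaling factor.
    const scale = Math.sqrt(2 / N) * (k === 0 ? Math.SQRT1_2 : 1);
    out[k] = acc * scale;
  }
  return out;
}

// A constant input puts all of its energy into the first coefficient.
console.log(dct2([1, 1, 1, 1])); // approximately [2, 0, 0, 0]
```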

Danceability( signal [, maxTau [, minTau [, sampleRate [, tauMultiplier ] ] ] ] ) → {object}

Description

This algorithm estimates danceability of a given audio signal. The algorithm is derived from Detrended Fluctuation Analysis (DFA) described in [1]. The parameters minTau and maxTau are used to define the range of time over which DFA will be performed. The output of this algorithm is the danceability of the audio signal. These values usually range from 0 to 3 (higher values meaning more danceable). Check https://essentia.upf.edu/reference/std_Danceability.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

maxTau number <optional>
8800

maximum segment length to consider [ms]

minTau number <optional>
310

minimum segment length to consider [ms]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

tauMultiplier number <optional>
1.1

multiplier to increment from min to max tau

Returns

{danceability: 'the danceability value. Normal values range from 0 to ~3. The higher, the more danceable.', dfa: 'the DFA exponent vector for considered segment length (tau) values'}

Details

Decrease( array [, range ] ) → {object}

Description

This algorithm computes the decrease of an array defined as the linear regression coefficient. The range parameter is used to normalize the result. For a spectral centroid, the range should be equal to Nyquist and for an audio centroid the range should be equal to (audiosize - 1) / samplerate. The size of the input array must be at least two elements for "decrease" to be computed, otherwise an exception is thrown. References: [1] Least Squares Fitting -- from Wolfram MathWorld, http://mathworld.wolfram.com/LeastSquaresFitting.html Check https://essentia.upf.edu/reference/std_Decrease.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array

range number <optional>
1

the range of the input array, used for normalizing the results

Returns

{decrease: 'the decrease of the input array'}

Details

Derivative( signal ) → {object}

Description

This algorithm returns the first-order derivative of an input signal. That is, for each input value it returns the value minus the previous one. Check https://essentia.upf.edu/reference/std_Derivative.html for more details.

Parameters
Name Type Description
signal VectorFloat

the input signal

Returns

{signal: 'the derivative of the input signal'}

Details
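
The "value minus the previous one" rule can be sketched in plain JavaScript as follows. The sketch assumes that the sample preceding the first input value is 0; that boundary choice, like the helper name firstDifference, is an assumption and not taken from the library.

```javascript
// Hypothetical helper for illustration only.
function firstDifference(signal) {
  const out = new Array(signal.length);
  let prev = 0; // assumption: the sample before the first one is 0
  for (let i = 0; i < signal.length; i++) {
    out[i] = signal[i] - prev;
    prev = signal[i];
  }
  return out;
}

console.log(firstDifference([1, 4, 9, 16])); // [1, 3, 5, 7]
```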

DerivativeSFX( envelope ) → {object}

Description

This algorithm computes two descriptors that are based on the derivative of a signal envelope. Check https://essentia.upf.edu/reference/std_DerivativeSFX.html for more details.

Parameters
Name Type Description
envelope VectorFloat

the envelope of the signal

Returns

{derAvAfterMax: 'the weighted average of the derivative after the maximum amplitude', maxDerBeforeMax: 'the maximum derivative before the maximum amplitude'}

Details

DiscontinuityDetector( frame [, detectionThreshold [, energyThreshold [, frameSize [, hopSize [, kernelSize [, order [, silenceThreshold [, subFrameSize ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm uses LPC and some heuristics to detect discontinuities in an audio signal [1]. Check https://essentia.upf.edu/reference/std_DiscontinuityDetector.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input frame (must be non-empty)

detectionThreshold number <optional>
8

'detectionThreshold' times the standard deviation plus the median of the frame is used as detection threshold

energyThreshold number <optional>
-60

threshold in dB to detect silent subframes

frameSize number <optional>
512

the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)

hopSize number <optional>
256

hop size used for the analysis. This parameter must be set correctly as it cannot be obtained from the input data

kernelSize number <optional>
7

scalar giving the size of the median filter window. Must be odd

order number <optional>
3

scalar giving the number of LPCs to use

silenceThreshold number <optional>
-50

threshold to skip silent frames

subFrameSize number <optional>
32

size of the window used to compute silent subframes

Returns

{discontinuityLocations: 'the indexes of the detected discontinuities (if any)', discontinuityAmplitudes: 'the peak values of the prediction error for the discontinuities (if any)'}

Details

Dissonance( frequencies, magnitudes ) → {object}

Description

This algorithm computes the sensory dissonance of an audio signal given its spectral peaks. Sensory dissonance (to be distinguished from musical or theoretical dissonance) measures perceptual roughness of the sound and is based on the roughness of its spectral peaks. Given the spectral peaks, the algorithm estimates total dissonance by summing up the normalized dissonance values for each pair of peaks. These values are computed using dissonance curves, which define dissonance between two spectral peaks according to their frequency and amplitude relations. The dissonance curves are based on perceptual experiments conducted in [1]. Exceptions are thrown when the sizes of the input vectors are not equal or when the input frequencies are not in ascending order. References: [1] R. Plomp and W. J. M. Levelt, "Tonal Consonance and Critical Bandwidth," The Journal of the Acoustical Society of America, vol. 38, no. 4, pp. 548–560, 1965. Check https://essentia.upf.edu/reference/std_Dissonance.html for more details.

Parameters
Name Type Description
frequencies VectorFloat

the frequencies of the spectral peaks (must be sorted by frequency)

magnitudes VectorFloat

the magnitudes of the spectral peaks (must be sorted by frequency)

Returns

{dissonance: 'the dissonance of the audio signal (0 meaning completely consonant, and 1 meaning completely dissonant)'}

Details

DistributionShape( centralMoments ) → {object}

Description

This algorithm computes the spread (variance), skewness and kurtosis of an array given its central moments. The extracted features are good indicators of the shape of the distribution. For the required input see CentralMoments algorithm. The size of the input array must be at least 5. An exception will be thrown otherwise. Check https://essentia.upf.edu/reference/std_DistributionShape.html for more details.

Parameters
Name Type Description
centralMoments VectorFloat

the central moments of a distribution

Returns

{spread: 'the spread (variance) of the distribution', skewness: 'the skewness of the distribution', kurtosis: 'the kurtosis of the distribution'}

Details
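
A minimal plain-JavaScript sketch of how the three descriptors can be derived from central moments is given below. It assumes the input is laid out as [m0, m1, m2, m3, m4] and that kurtosis is reported as excess kurtosis (m4 / m2^2 - 3); both are assumptions, as is the choice to return 0 for skewness and kurtosis when the variance is 0. The helper name distributionShapeOf is hypothetical.

```javascript
// Hypothetical helper for illustration only.
// Assumed input layout: [m0, m1, m2, m3, m4] (central moments of order 0 to 4).
function distributionShapeOf(centralMoments) {
  if (centralMoments.length < 5) {
    throw new Error("at least 5 central moments are required");
  }
  const m2 = centralMoments[2];
  const m3 = centralMoments[3];
  const m4 = centralMoments[4];
  if (m2 === 0) {
    // Sketch-specific choice: avoid dividing by zero for a degenerate distribution.
    return { spread: 0, skewness: 0, kurtosis: 0 };
  }
  return {
    spread: m2,                        // variance
    skewness: m3 / Math.pow(m2, 1.5),  // standardized third moment
    kurtosis: m4 / (m2 * m2) - 3,      // assumed: excess kurtosis
  };
}

console.log(distributionShapeOf([1, 0, 4, 0, 48]));
// { spread: 4, skewness: 0, kurtosis: 0 }
```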

Duration( signal [, sampleRate ] ) → {object}

Description

This algorithm outputs the total duration of an audio signal. Check https://essentia.upf.edu/reference/std_Duration.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{duration: 'the duration of the signal [s]'}

Details

DynamicComplexity( signal [, frameSize [, sampleRate ] ] ) → {object}

Description

This algorithm computes the dynamic complexity, defined as the average absolute deviation from the global loudness level estimate on the dB scale. It is related to the dynamic range and to the amount of fluctuation in loudness present in a recording. Silence at the beginning and at the end of a track is ignored in the computation so as not to degrade the results. Check https://essentia.upf.edu/reference/std_DynamicComplexity.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal

frameSize number <optional>
0.2

the frame size [s]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{dynamicComplexity: 'the dynamic complexity coefficient', loudness: 'an estimate of the loudness [dB]'}

Details

ERBBands( spectrum [, highFrequencyBound [, inputSize [, lowFrequencyBound [, numberBands [, sampleRate [, type [, width ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes energies/magnitudes in ERB bands of a spectrum. The Equivalent Rectangular Bandwidth (ERB) scale is used. The algorithm applies a frequency domain filterbank using gammatone filters. Adapted from matlab code in: D. P. W. Ellis (2009). 'Gammatone-like spectrograms', web resource [1]. Check https://essentia.upf.edu/reference/std_ERBBands.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the audio spectrum

highFrequencyBound number <optional>
22050

an upper-bound limit for the frequencies to be included in the bands

inputSize number <optional>
1025

the size of the spectrum

lowFrequencyBound number <optional>
50

a lower-bound limit for the frequencies to be included in the bands

numberBands number <optional>
40

the number of output bands

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

type string <optional>
power

use magnitude or power spectrum

width number <optional>
1

filter width with respect to ERB

Returns

{bands: 'the energies/magnitudes of each band'}

Details

EffectiveDuration( signal [, sampleRate [, thresholdRatio ] ] ) → {object}

Description

This algorithm computes the effective duration of an envelope signal. The effective duration is a measure of the time the signal is perceptually meaningful. This is approximated by the time the envelope is above or equal to a given threshold and is above the -90 dB noise floor. This measure makes it possible to distinguish percussive sounds from sustained sounds, but it depends on the signal length. By default, this algorithm uses 40% of the envelope maximum as the threshold, which is suited to short sounds. Note that the 0% threshold corresponds to the duration of the signal above the -90 dB noise floor, while the 100% threshold corresponds to the number of times the envelope takes its maximum value. References: [1] G. Peeters, "A large set of audio features for sound description (similarity and classification) in the CUIDADO project," CUIDADO I.S.T. Project Report, 2004. Check https://essentia.upf.edu/reference/std_EffectiveDuration.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

thresholdRatio number <optional>
0.4

the ratio of the envelope maximum to be used as the threshold

Returns

{effectiveDuration: 'the effective duration of the signal [s]'}

Details
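
The thresholding rule described above can be sketched in plain JavaScript as follows. The sketch assumes the -90 dB noise floor is expressed as a linear amplitude (10^(-90/20)) and that samples are compared by absolute value; both are assumptions, and the helper name effectiveDurationOf is hypothetical.

```javascript
// Hypothetical helper for illustration only.
function effectiveDurationOf(envelope, sampleRate = 44100, thresholdRatio = 0.4) {
  // Assumption: -90 dB is interpreted on an amplitude scale.
  const noiseFloor = Math.pow(10, -90 / 20);
  let maxAbs = 0;
  for (const v of envelope) maxAbs = Math.max(maxAbs, Math.abs(v));
  const threshold = thresholdRatio * maxAbs;
  let count = 0;
  for (const v of envelope) {
    const a = Math.abs(v);
    // Count samples at or above the threshold that also exceed the noise floor.
    if (a >= threshold && a > noiseFloor) count++;
  }
  return count / sampleRate; // seconds
}

// Three of the five samples reach 40% of the maximum; at 10 Hz that is 0.3 s.
console.log(effectiveDurationOf([0, 0.5, 1, 0.5, 0.1], 10));
```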

Energy( array ) → {object}

Description

This algorithm computes the energy of an array. Check https://essentia.upf.edu/reference/std_Energy.html for more details.

Parameters
Name Type Description
array VectorFloat

the input array

Returns

{energy: 'the energy of the input array'}

Details
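
For reference, the usual definition of the energy of a discrete array is the sum of its squared values. The plain-JavaScript sketch below assumes that definition; the helper name energyOf is hypothetical and not part of essentia.js.

```javascript
// Hypothetical helper for illustration only.
// Assumes energy = sum of squared values.
function energyOf(array) {
  let acc = 0;
  for (const v of array) acc += v * v;
  return acc;
}

console.log(energyOf([1, -2, 3])); // 14
```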

EnergyBand( spectrum [, sampleRate [, startCutoffFrequency [, stopCutoffFrequency ] ] ] ) → {object}

Description

This algorithm computes energy in a given frequency band of a spectrum including both start and stop cutoff frequencies. Note that exceptions will be thrown when input spectrum is empty and if startCutoffFrequency is greater than stopCutoffFrequency. Check https://essentia.upf.edu/reference/std_EnergyBand.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input frequency spectrum

sampleRate number <optional>
44100

the audio sampling rate [Hz]

startCutoffFrequency number <optional>
0

the start frequency from which to sum the energy [Hz]

stopCutoffFrequency number <optional>
100

the stop frequency to which to sum the energy [Hz]

Returns

{energyBand: 'the energy in the frequency band'}

Details

EnergyBandRatio( spectrum [, sampleRate [, startFrequency [, stopFrequency ] ] ] ) → {object}

Description

This algorithm computes the ratio of the spectral energy in the range [startFrequency, stopFrequency] over the total energy. Check https://essentia.upf.edu/reference/std_EnergyBandRatio.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input audio spectrum

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

startFrequency number <optional>
0

the frequency from which to start summing the energy [Hz]

stopFrequency number <optional>
100

the frequency up to which to sum the energy [Hz]

Returns

{energyBandRatio: 'the energy ratio of the specified band over the total energy'}

Details
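
The band-over-total ratio can be sketched in plain JavaScript as shown below. The sketch assumes the spectrum bins are evenly spaced from 0 Hz to sampleRate / 2 and that energy is the squared magnitude of each bin; the library's exact bin rounding may differ. The helper name energyBandRatioOf is hypothetical.

```javascript
// Hypothetical helper for illustration only.
function energyBandRatioOf(spectrum, sampleRate = 44100, startFrequency = 0, stopFrequency = 100) {
  const n = spectrum.length;
  if (n < 2) throw new Error("this sketch needs at least two bins");
  // Assumption: bin k sits at k * (sampleRate / 2) / (n - 1) Hz.
  const binWidth = sampleRate / 2 / (n - 1);
  let band = 0;
  let total = 0;
  for (let k = 0; k < n; k++) {
    const e = spectrum[k] * spectrum[k];
    const f = k * binWidth;
    total += e;
    if (f >= startFrequency && f <= stopFrequency) band += e;
  }
  // Sketch-specific choice: report 0 for an all-zero spectrum.
  return total === 0 ? 0 : band / total;
}

// Five bins at 0, 1, 2, 3 and 4 Hz; two of them fall inside [0, 1] Hz.
console.log(energyBandRatioOf([1, 1, 1, 1, 1], 8, 0, 1)); // 0.4
```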

Entropy( array ) → {object}

Description

This algorithm computes the Shannon entropy of an array. Entropy can be used to quantify the peakiness of a distribution. This has been used for voiced/unvoiced decision in automatic speech recognition. Check https://essentia.upf.edu/reference/std_Entropy.html for more details.

Parameters
Name Type Description
array VectorFloat

the input array (cannot contain negative values, and must be non-empty)

Returns

{entropy: 'the entropy of the input array'}

Details
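
A minimal plain-JavaScript sketch of Shannon entropy is given below. It assumes the input is first normalized so that it sums to 1 and that the logarithm is taken in base 2; both are assumptions rather than documented behaviour, and the helper name shannonEntropy is hypothetical.

```javascript
// Hypothetical helper for illustration only.
// Assumes normalization to a probability distribution and log base 2.
function shannonEntropy(array) {
  if (array.length === 0) throw new Error("input must be non-empty");
  let sum = 0;
  for (const v of array) {
    if (v < 0) throw new Error("input cannot contain negative values");
    sum += v;
  }
  if (sum === 0) return 0; // sketch-specific choice for an all-zero input
  let h = 0;
  for (const v of array) {
    const p = v / sum;
    if (p > 0) h -= p * Math.log2(p); // zero-probability bins contribute nothing
  }
  return h;
}

console.log(shannonEntropy([1, 1, 1, 1])); // 2 (flat distribution over 4 bins)
console.log(shannonEntropy([1, 0, 0, 0])); // 0 (single peak)
```

A flat input gives the largest value and a single peak gives 0, which is why entropy is usable as a peakiness measure.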

Envelope( signal [, applyRectification [, attackTime [, releaseTime [, sampleRate ] ] ] ] ) → {object}

Description

This algorithm computes the envelope of a signal by applying a non-symmetric lowpass filter on a signal. By default it rectifies the signal, but that is optional. Check https://essentia.upf.edu/reference/std_Envelope.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

applyRectification boolean <optional>
true

whether to apply rectification (envelope based on the absolute value of signal)

attackTime number <optional>
10

the attack time of the first order lowpass in the attack phase [ms]

releaseTime number <optional>
1500

the release time of the first order lowpass in the release phase [ms]

sampleRate number <optional>
44100

the audio sampling rate [Hz]

Returns

{signal: 'the resulting envelope of the signal'}

Details

EqualLoudness( signal [, sampleRate ] ) → {object}

Description

This algorithm implements an equal-loudness filter. The human ear does not perceive sounds of all frequencies as having equal loudness, and to account for this, the signal is filtered by an inverted approximation of the equal-loudness curves. Technically, the filter is a cascade of a 10th order Yulewalk filter with a 2nd order Butterworth high pass filter. Check https://essentia.upf.edu/reference/std_EqualLoudness.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{signal: 'the filtered signal'}

Details

Flatness( array ) → {object}

Description

This algorithm computes the flatness of an array, which is defined as the ratio between the geometric mean and the arithmetic mean. Check https://essentia.upf.edu/reference/std_Flatness.html for more details.

Parameters
Name Type Description
array VectorFloat

the input array

Returns

{flatness: 'the flatness (ratio between the geometric and the arithmetic mean of the input array)'}

Details
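
The ratio defined above can be sketched in plain JavaScript as follows. The geometric mean is computed through logarithms to avoid overflow on long arrays; the sketch assumes non-negative input, and the helper name flatnessOf is hypothetical.

```javascript
// Hypothetical helper for illustration only.
// Assumes non-negative input values.
function flatnessOf(array) {
  if (array.length === 0) throw new Error("input must be non-empty");
  let logSum = 0;
  let sum = 0;
  let hasZero = false;
  for (const v of array) {
    if (v === 0) hasZero = true;
    else logSum += Math.log(v);
    sum += v;
  }
  const arithmeticMean = sum / array.length;
  if (arithmeticMean === 0) return 0; // sketch-specific choice for an all-zero input
  // A single zero value makes the geometric mean zero.
  const geometricMean = hasZero ? 0 : Math.exp(logSum / array.length);
  return geometricMean / arithmeticMean;
}

console.log(flatnessOf([1, 1, 1])); // 1 (perfectly flat)
console.log(flatnessOf([1, 4]));    // about 0.8 (geometric mean 2, arithmetic mean 2.5)
```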

FlatnessDB( array ) → {object}

Description

This algorithm computes the flatness of an array, which is defined as the ratio between the geometric mean and the arithmetic mean converted to dB scale. Check https://essentia.upf.edu/reference/std_FlatnessDB.html for more details.

Parameters
Name Type Description
array VectorFloat

the input array

Returns

{flatnessDB: 'the flatness dB'}

Details

FlatnessSFX( envelope ) → {object}

Description

This algorithm calculates the flatness coefficient of a signal envelope. Check https://essentia.upf.edu/reference/std_FlatnessSFX.html for more details.

Parameters
Name Type Description
envelope VectorFloat

the envelope of the signal

Returns

{flatness: 'the flatness coefficient'}

Details

Flux( spectrum [, halfRectify [, norm ] ] ) → {object}

Description

This algorithm computes the spectral flux of a spectrum. Flux is defined as the L2-norm [1] or L1-norm [2] of the difference between two consecutive frames of the magnitude spectrum. The frames have to be of the same size in order to yield a meaningful result. The default L2-norm is used more commonly. Check https://essentia.upf.edu/reference/std_Flux.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input spectrum

halfRectify boolean <optional>
false

half-rectify the differences in each spectrum bin

norm string <optional>
L2

the norm to use for difference computation

Returns

{flux: 'the spectral flux of the input spectrum'}

Details
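
The flux between two consecutive magnitude-spectrum frames can be sketched in plain JavaScript as shown below. Unlike the method documented above, which takes a single spectrum per call, this sketch takes both frames explicitly; that signature and the helper name spectralFlux are assumptions made for illustration.

```javascript
// Hypothetical helper for illustration only.
// Takes the previous and the current frame explicitly.
function spectralFlux(previous, current, halfRectify = false, norm = "L2") {
  if (previous.length !== current.length) {
    throw new Error("frames must have the same size");
  }
  let acc = 0;
  for (let i = 0; i < current.length; i++) {
    let d = current[i] - previous[i];
    if (halfRectify && d < 0) d = 0; // keep only increases in each bin
    acc += norm === "L1" ? Math.abs(d) : d * d;
  }
  return norm === "L1" ? acc : Math.sqrt(acc);
}

const prevFrame = [1, 2, 3];
const curFrame = [1, 4, 0];
console.log(spectralFlux(prevFrame, curFrame));             // sqrt(13), about 3.606
console.log(spectralFlux(prevFrame, curFrame, false, "L1")); // 5
console.log(spectralFlux(prevFrame, curFrame, true));        // 2
```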

FrameCutter( signal [, frameSize [, hopSize [, lastFrameToEndOfFile [, startFromZero [, validFrameThresholdRatio ] ] ] ] ] ) → {object}

Description

This algorithm slices the input buffer into frames. It returns a frame of a constant size and jumps a constant amount of samples forward in the buffer on every compute() call until no more frames can be extracted; empty frame vectors are returned afterwards. Incomplete frames (frames starting before the beginning of the input buffer or going past its end) are zero-padded or dropped according to the "validFrameThresholdRatio" parameter. Check https://essentia.upf.edu/reference/std_FrameCutter.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the buffer from which to read data

frameSize number <optional>
1024

the output frame size

hopSize number <optional>
512

the hop size between frames

lastFrameToEndOfFile boolean <optional>
false

whether the beginning of the last frame should reach the end of file. Only applicable if startFromZero is true

startFromZero boolean <optional>
false

whether to start the first frame at time 0 (centered at frameSize/2) if true, or -frameSize/2 otherwise (zero-centered)

validFrameThresholdRatio number <optional>
0

frames smaller than this ratio will be discarded, those larger will be zero-padded to a full frame (i.e. a value of 0 will never discard frames and a value of 1 will only keep frames that are of length 'frameSize')

Returns

{frame: 'the frame to write to'}

Details
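
The slicing idea can be sketched in plain JavaScript as follows. This is a deliberately simplified sketch: it only covers the case where the first frame starts at sample 0, always zero-pads incomplete frames, and returns all frames at once instead of one per call. It does not implement validFrameThresholdRatio, lastFrameToEndOfFile or the zero-centered mode, so the number of frames may differ from what the library produces. The helper name cutFrames is hypothetical.

```javascript
// Hypothetical helper for illustration only.
// Simplified: frames start at sample 0 and incomplete frames are zero-padded.
function cutFrames(signal, frameSize = 1024, hopSize = 512) {
  const frames = [];
  for (let start = 0; start < signal.length; start += hopSize) {
    const frame = new Float32Array(frameSize); // zero-filled by default
    for (let i = 0; i < frameSize && start + i < signal.length; i++) {
      frame[i] = signal[start + i];
    }
    frames.push(frame);
  }
  return frames;
}

const frames = cutFrames([1, 2, 3, 4, 5], 4, 2);
console.log(frames.map((f) => Array.from(f)));
// [[1, 2, 3, 4], [3, 4, 5, 0], [5, 0, 0, 0]]
```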

FrameToReal( signal [, frameSize [, hopSize ] ] ) → {object}

Description

This algorithm converts a sequence of input audio signal frames into a sequence of audio samples. Check https://essentia.upf.edu/reference/std_FrameToReal.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio frame

frameSize number <optional>
2048

the frame size for computing the overlap-add process

hopSize number <optional>
128

the hop size with which the overlap-add function is computed

Returns

{signal: 'the output audio samples'}

Details

FrequencyBands( spectrum [, frequencyBands [, sampleRate ] ] ) → {object}

Description

This algorithm computes energy in rectangular frequency bands of a spectrum. The bands are non-overlapping. For each band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_FrequencyBands.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input spectrum (must be greater than size one)

frequencyBands Array.<any> <optional>
[0, 50, 100, 150, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500, 20500, 27000]

list of frequency ranges into which the spectrum is divided (these must be in ascending order and cannot contain duplicates)

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{bands: 'the energy in each band'}

Details

GFCC( spectrum [, dctType [, highFrequencyBound [, inputSize [, logType [, lowFrequencyBound [, numberBands [, numberCoefficients [, sampleRate [, silenceThreshold [, type ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the Gammatone-frequency cepstral coefficients of a spectrum. This is an equivalent of MFCCs, but using a gammatone filterbank (ERBBands) scaled on an Equivalent Rectangular Bandwidth (ERB) scale. Check https://essentia.upf.edu/reference/std_GFCC.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the audio spectrum

dctType number <optional>
2

the DCT type

highFrequencyBound number <optional>
22050

the upper bound of the frequency range [Hz]

inputSize number <optional>
1025

the size of input spectrum

logType string <optional>
dbamp

logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes

lowFrequencyBound number <optional>
40

the lower bound of the frequency range [Hz]

numberBands number <optional>
40

the number of bands in the filter

numberCoefficients number <optional>
13

the number of output cepstrum coefficients

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

silenceThreshold number <optional>
1e-10

silence threshold for computing log-energy bands

type string <optional>
power

use magnitude or power spectrum

Returns

{bands: 'the energies in ERB bands', gfcc: 'the gammatone feature cepstrum coefficients'}

Details

GapsDetector( frame [, attackTime [, frameSize [, hopSize [, kernelSize [, maximumTime [, minimumTime [, postpowerTime [, prepowerThreshold [, prepowerTime [, releaseTime [, sampleRate [, silenceThreshold ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm uses energy and time thresholds to detect gaps in the waveform. A median filter is used to remove spurious silent samples. The power of a small audio region before the detected gaps (prepower) is thresholded to detect intentional pauses as described in [1]. This technique is extended to the region after the gap. The algorithm was designed for framewise use and returns the start and end timestamps relative to the first frame processed. Call configure() or reset() in order to restart the count. Check https://essentia.upf.edu/reference/std_GapsDetector.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input frame (must be non-empty)

attackTime number <optional>
0.05

the attack time of the first order lowpass in the attack phase [ms]

frameSize number <optional>
2048

frame size used for the analysis. Should match the input frame size. Otherwise, an exception will be thrown

hopSize number <optional>
1024

hop size used for the analysis

kernelSize number <optional>
11

scalar giving the size of the median filter window. Must be odd

maximumTime number <optional>
3500

time of the maximum gap duration [ms]

minimumTime number <optional>
10

time of the minimum gap duration [ms]

postpowerTime number <optional>
40

time for the postpower calculation [ms]

prepowerThreshold number <optional>
-30

prepower threshold [dB].

prepowerTime number <optional>
40

time for the prepower calculation [ms]

releaseTime number <optional>
0.05

the release time of the first order lowpass in the release phase [ms]

sampleRate number <optional>
44100

sample rate used for the analysis

silenceThreshold number <optional>
-50

silence threshold [dB]

Returns

{starts: 'the start indexes of the detected gaps (if any) in seconds', ends: 'the end indexes of the detected gaps (if any) in seconds'}

Details

GeometricMean( array ) → {object}

Description

This algorithm computes the geometric mean of an array of positive values. Check https://essentia.upf.edu/reference/std_GeometricMean.html for more details.

Parameters
Name Type Description
array VectorFloat

the input array

Returns

{geometricMean: 'the geometric mean of the input array'}

Details

HFC( spectrum [, sampleRate [, type ] ] ) → {object}

Description

This algorithm computes the High Frequency Content of a spectrum. It can be computed according to the following techniques:
- 'Masri' (default), which computes: sum |X(n)|^2*k
- 'Jensen', which computes: sum |X(n)|*k^2
- 'Brossier', which computes: sum |X(n)|*k
Check https://essentia.upf.edu/reference/std_HFC.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input audio spectrum

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

type string <optional>
Masri

the type of HFC coefficient to be computed

Returns

{hfc: 'the high-frequency coefficient'}

Details
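
The three weightings listed in the description can be sketched in plain JavaScript as follows. The sketch assumes k is the bin index of each spectrum value; the library may weight by bin frequency instead, which would scale the result. The helper name highFrequencyContent is hypothetical.

```javascript
// Hypothetical helper for illustration only.
// Assumption: k is the bin index (0, 1, 2, ...).
function highFrequencyContent(spectrum, type = "Masri") {
  let acc = 0;
  for (let k = 0; k < spectrum.length; k++) {
    const mag = Math.abs(spectrum[k]);
    if (type === "Masri") acc += mag * mag * k;        // sum |X(n)|^2 * k
    else if (type === "Jensen") acc += mag * k * k;    // sum |X(n)| * k^2
    else if (type === "Brossier") acc += mag * k;      // sum |X(n)| * k
    else throw new Error("unknown HFC type: " + type);
  }
  return acc;
}

const spectrumExample = [1, 2, 3];
console.log(highFrequencyContent(spectrumExample));             // 22
console.log(highFrequencyContent(spectrumExample, "Jensen"));   // 14
console.log(highFrequencyContent(spectrumExample, "Brossier")); // 8
```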

HPCP( frequencies, magnitudes [, bandPreset [, bandSplitFrequency [, harmonics [, maxFrequency [, maxShifted [, minFrequency [, nonLinear [, normalized [, referenceFrequency [, sampleRate [, size [, weightType [, windowSize ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

Computes a Harmonic Pitch Class Profile (HPCP) from the spectral peaks of a signal. HPCP is a k*12 dimensional vector which represents the intensities of the twelve (k==1) semitone pitch classes (corresponding to notes from A to G#), or subdivisions of these (k>1). Check https://essentia.upf.edu/reference/std_HPCP.html for more details.

Parameters
Name Type Attributes Default Description
frequencies VectorFloat

the frequencies of the spectral peaks [Hz]

magnitudes VectorFloat

the magnitudes of the spectral peaks

bandPreset boolean <optional>
true

enables whether to use a band preset

bandSplitFrequency number <optional>
500

the split frequency for low and high bands, not used if bandPreset is false [Hz]

harmonics number <optional>
0

number of harmonics for frequency contribution, 0 indicates exclusive fundamental frequency contribution

maxFrequency number <optional>
5000

the maximum frequency that contributes to the HPCP [Hz] (the difference between the max and split frequencies must not be less than 200.0 Hz)

maxShifted boolean <optional>
false

whether to shift the HPCP vector so that the maximum peak is at index 0

minFrequency number <optional>
40

the minimum frequency that contributes to the HPCP [Hz] (the difference between the min and split frequencies must not be less than 200.0 Hz)

nonLinear boolean <optional>
false

apply non-linear post-processing to the output (use with normalized='unitMax'). Boosts values close to 1, decreases values close to 0.

normalized string <optional>
unitMax

whether to normalize the HPCP vector

referenceFrequency number <optional>
440

the reference frequency for semitone index calculation, corresponding to A3 [Hz]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

size number <optional>
12

the size of the output HPCP (must be a positive nonzero multiple of 12)

weightType string <optional>
squaredCosine

type of weighting function for determining frequency contribution

windowSize number <optional>
1

the size, in semitones, of the window used for the weighting

Returns

{hpcp: 'the resulting harmonic pitch class profile'}

Details
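
Example (illustrative only): a rough sketch of how a single peak frequency could be mapped to a pitch-class bin relative to referenceFrequency. The helper name pitchClassBinSketch is hypothetical, and the sketch deliberately leaves out the weighting window, harmonics, band splitting and normalization that the algorithm applies.

```javascript
// Hypothetical helper: map a peak frequency to its pitch-class bin.
// Bin 0 corresponds to the reference frequency (pitch class A by default).
function pitchClassBinSketch(frequency, referenceFrequency = 440, size = 12) {
  // Distance from the reference in units of 1/size of an octave.
  const binsFromReference = size * Math.log2(frequency / referenceFrequency);
  const bin = Math.round(binsFromReference) % size;
  return (bin + size) % size; // fold negative remainders into [0, size)
}

console.log(pitchClassBinSketch(440)); // the reference itself
console.log(pitchClassBinSketch(880)); // one octave up, same pitch class
console.log(pitchClassBinSketch(261.63)); // middle C, three semitones above A
```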

HarmonicBpm( bpms [, bpm [, threshold [, tolerance ] ] ] ) → {object}

Description

This algorithm extracts bpms that are harmonically related to the tempo given by the 'bpm' parameter. The algorithm assumes a certain bpm is harmonically related to the 'bpm' parameter when the greatest common divisor of both bpms is greater than the threshold. The 'tolerance' parameter determines whether two bpms are considered related. For instance, 120, 122 and 236 may or may not be related, depending on how much tolerance is given. Check https://essentia.upf.edu/reference/std_HarmonicBpm.html for more details.

Parameters
Name Type Attributes Default Description
bpms VectorFloat

list of bpm candidates

bpm number <optional>
60

the bpm used to find its harmonics

threshold number <optional>
20

bpm threshold below which greatest common divisors are discarded

tolerance number <optional>
5

percentage tolerance to consider two bpms are equal or equal to a harmonic

Returns

{harmonicBpms: 'a list of bpms which are harmonically related to the bpm parameter '}

Details

HarmonicPeaks( frequencies, magnitudes, pitch [, maxHarmonics [, tolerance ] ] ) → {object}

Description

This algorithm finds the harmonic peaks of a signal given its spectral peaks and its fundamental frequency. Note: - "tolerance" parameter defines the allowed fixed deviation from ideal harmonics, being a percentage over the F0. For example: if the F0 is 100Hz you may decide to allow a deviation of 20%, that is a fixed deviation of 20Hz; for the harmonic series it is: [180-220], [280-320], [380-420], etc. - If "pitch" is zero, it means its value is unknown, or the sound is unpitched, and in that case the HarmonicPeaks algorithm returns an empty vector. - The output frequency and magnitude vectors are of size "maxHarmonics". If a particular harmonic was not found among spectral peaks, its ideal frequency value is output together with 0 magnitude. This algorithm is intended to receive its "frequencies" and "magnitudes" inputs from the SpectralPeaks algorithm. - When input vectors differ in size or are empty, an exception is thrown. Input vectors must be ordered by ascending frequency excluding DC components and not contain duplicates, otherwise an exception is thrown. Check https://essentia.upf.edu/reference/std_HarmonicPeaks.html for more details.

Parameters
Name Type Attributes Default Description
frequencies VectorFloat

the frequencies of the spectral peaks [Hz] (ascending order)

magnitudes VectorFloat

the magnitudes of the spectral peaks (ascending frequency order)

pitch number

an estimate of the fundamental frequency of the signal [Hz]

maxHarmonics number <optional>
20

the number of harmonics to return including F0

tolerance number <optional>
0.2

the allowed ratio deviation from ideal harmonics

Returns

{harmonicFrequencies: 'the frequencies of harmonic peaks [Hz]', harmonicMagnitudes: 'the magnitudes of harmonic peaks'}

Details
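
Example (illustrative only): the fixed-deviation rule from the note above can be sketched as a list of search windows around the ideal harmonics. The helper name harmonicWindowsSketch and the returned object shape are hypothetical; the sketch only computes the windows and does not pick peaks.

```javascript
// Hypothetical helper: search window around each ideal harmonic.
// The allowed deviation is fixed at tolerance * f0 for every harmonic.
function harmonicWindowsSketch(f0, tolerance = 0.2, maxHarmonics = 20) {
  const deviation = tolerance * f0;
  const windows = [];
  for (let h = 1; h <= maxHarmonics; h++) {
    const ideal = h * f0; // ideal frequency of the h-th harmonic (h = 1 is F0)
    windows.push({ ideal, low: ideal - deviation, high: ideal + deviation });
  }
  return windows;
}

// With F0 = 100 Hz and a tolerance of 0.2, the deviation is 20 Hz, so the
// windows after F0 are roughly [180, 220], [280, 320], [380, 420].
console.log(harmonicWindowsSketch(100, 0.2, 4));
```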

HighPass( signal [, cutoffFrequency [, sampleRate ] ] ) → {object}

Description

This algorithm implements a 1st order IIR high-pass filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_HighPass.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal

cutoffFrequency number <optional>
1500

the cutoff frequency for the filter [Hz]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{signal: 'the filtered signal'}

Details

HighResolutionFeatures( hpcp [, maxPeaks ] ) → {object}

Description

This algorithm computes high-resolution chroma features from an HPCP vector. The vector's size must be a multiple of 12 and it is recommended that it be larger than 120. In other words, the HPCP's resolution should be 10 cents or finer. The high-resolution features computed are the equal-tempered deviation, the non-tempered energy ratio and the non-tempered peaks energy ratio. Check https://essentia.upf.edu/reference/std_HighResolutionFeatures.html for more details.

Parameters
Name Type Attributes Default Description
hpcp VectorFloat

the HPCPs, preferably of size >= 120

maxPeaks number <optional>
24

maximum number of HPCP peaks to consider when calculating outputs

Returns

{equalTemperedDeviation: 'measure of the deviation of HPCP local maxima with respect to equal-tempered bins', nonTemperedEnergyRatio: 'ratio between the energy on non-tempered bins and the total energy', nonTemperedPeaksEnergyRatio: 'ratio between the energy on non-tempered peaks and the total energy'}

Details

Histogram( array [, maxValue [, minValue [, normalize [, numberBins ] ] ] ] ) → {object}

Description

This algorithm computes a histogram. Values outside the range are ignored. Check https://essentia.upf.edu/reference/std_Histogram.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array

maxValue number <optional>
1

the max value of the histogram

minValue number <optional>
0

the min value of the histogram

normalize string <optional>
none

the normalization setting.

numberBins number <optional>
10

the number of bins

Returns

{histogram: 'the values in the equally-spaced bins', binEdges: 'the edges of the equally-spaced bins. Size is _histogram.size() + 1'}

Details
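
Example (illustrative only): a plain-JavaScript sketch of an equally spaced histogram that ignores out-of-range values and returns numberBins + 1 edges. The helper name histogramSketch is hypothetical, the 'normalize' option is not modelled, and counting a value equal to maxValue in the last bin is an assumption of this sketch.

```javascript
// Hypothetical helper: equally spaced histogram over [minValue, maxValue].
// Assumption: a value equal to maxValue is counted in the last bin.
function histogramSketch(values, maxValue = 1, minValue = 0, numberBins = 10) {
  const binWidth = (maxValue - minValue) / numberBins;
  const histogram = new Array(numberBins).fill(0);
  const binEdges = [];
  for (let i = 0; i <= numberBins; i++) {
    binEdges.push(minValue + i * binWidth); // numberBins + 1 edges in total
  }
  for (const v of values) {
    if (v < minValue || v > maxValue) continue; // out-of-range values are ignored
    const raw = Math.floor((v - minValue) / binWidth);
    const index = Math.min(raw, numberBins - 1);
    histogram[index] += 1;
  }
  return { histogram, binEdges };
}

// 5 and -1 fall outside [0, 4] and are dropped.
console.log(histogramSketch([0.5, 1.5, 1.7, 4, 5, -1], 4, 0, 4));
```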

HprModelAnal( frame, pitch [, fftSize [, freqDevOffset [, freqDevSlope [, harmDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, nHarmonics [, orderBy [, sampleRate [, stocf ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the harmonic plus residual model analysis. Check https://essentia.upf.edu/reference/std_HprModelAnal.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input frame

pitch number

external pitch input [Hz].

fftSize number <optional>
2048

the size of the internal FFT (full spectrum size)

freqDevOffset number <optional>
20

minimum frequency deviation at 0Hz

freqDevSlope number <optional>
0.01

slope increase of minimum frequency deviation

harmDevSlope number <optional>
0.01

slope increase of minimum frequency deviation

hopSize number <optional>
512

the hop size between frames

magnitudeThreshold number <optional>
0

peaks below this given threshold are not outputted

maxFrequency number <optional>
5000

the maximum frequency of the range to evaluate [Hz]

maxPeaks number <optional>
100

the maximum number of returned peaks

maxnSines number <optional>
100

maximum number of sines per frame

minFrequency number <optional>
20

the minimum frequency of the range to evaluate [Hz]

nHarmonics number <optional>
100

maximum number of harmonics per frame

orderBy string <optional>
frequency

the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

stocf number <optional>
0.2

decimation factor used for the stochastic approximation

Returns

{frequencies: 'the frequencies of the sinusoidal peaks [Hz]', magnitudes: 'the magnitudes of the sinusoidal peaks', phases: 'the phases of the sinusoidal peaks', res: 'output residual frame'}

Details

HpsModelAnal( frame, pitch [, fftSize [, freqDevOffset [, freqDevSlope [, harmDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, nHarmonics [, orderBy [, sampleRate [, stocf ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the harmonic plus stochastic model analysis. Check https://essentia.upf.edu/reference/std_HpsModelAnal.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input frame

pitch number

external pitch input [Hz].

fftSize number <optional>
2048

the size of the internal FFT (full spectrum size)

freqDevOffset number <optional>
20

minimum frequency deviation at 0Hz

freqDevSlope number <optional>
0.01

slope increase of minimum frequency deviation

harmDevSlope number <optional>
0.01

slope increase of minimum frequency deviation

hopSize number <optional>
512

the hop size between frames

magnitudeThreshold number <optional>
0

peaks below this given threshold are not outputted

maxFrequency number <optional>
5000

the maximum frequency of the range to evaluate [Hz]

maxPeaks number <optional>
100

the maximum number of returned peaks

maxnSines number <optional>
100

maximum number of sines per frame

minFrequency number <optional>
20

the minimum frequency of the range to evaluate [Hz]

nHarmonics number <optional>
100

maximum number of harmonics per frame

orderBy string <optional>
frequency

the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

stocf number <optional>
0.2

decimation factor used for the stochastic approximation

Returns

{frequencies: 'the frequencies of the sinusoidal peaks [Hz]', magnitudes: 'the magnitudes of the sinusoidal peaks', phases: 'the phases of the sinusoidal peaks', stocenv: 'the stochastic envelope'}

Details

IDCT( dct [, dctType [, inputSize [, liftering [, outputSize ] ] ] ] ) → {object}

Description

This algorithm computes the Inverse Discrete Cosine Transform of an array. It can be configured to perform the inverse DCT-II form, with the 1/sqrt(2) scaling factor for the first coefficient or the inverse DCT-III form based on the HTK implementation. Check https://essentia.upf.edu/reference/std_IDCT.html for more details.

Parameters
Name Type Attributes Default Description
dct VectorFloat

the discrete cosine transform

dctType number <optional>
2

the DCT type

inputSize number <optional>
10

the size of the input array

liftering number <optional>
0

the liftering coefficient. Use '0' to bypass it

outputSize number <optional>
10

the number of output coefficients

Returns

{idct: 'the inverse cosine transform of the input array'}

Details

IIR( signal [, denominator [, numerator ] ] ) → {object}

Description

This algorithm implements a standard IIR filter. It filters the data in the input vector with the filter described by parameter vectors 'numerator' and 'denominator' to create the output filtered vector. In the literature, the numerator is often referred to as the 'B' coefficients and the denominator as the 'A' coefficients. Check https://essentia.upf.edu/reference/std_IIR.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

denominator Array.<any> <optional>
[1]

the list of coefficients of the denominator. Often referred to as the A coefficient vector.

numerator Array.<any> <optional>
[1]

the list of coefficients of the numerator. Often referred to as the B coefficient vector.

Returns

{signal: 'the filtered signal'}

Details
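
Example (illustrative only): the role of the 'A' and 'B' coefficient vectors can be seen in a direct-form difference equation. The helper name iirSketch is hypothetical and works on ordinary arrays; it mirrors the parameter order shown above and is a textbook formulation, not necessarily how the library evaluates the filter.

```javascript
// Hypothetical helper implementing the textbook difference equation
//   a[0]*y[n] = sum_k b[k]*x[n-k] - sum_{k>=1} a[k]*y[n-k]
// where b is the numerator and a is the denominator.
function iirSketch(signal, denominator = [1], numerator = [1]) {
  const output = new Array(signal.length).fill(0);
  for (let n = 0; n < signal.length; n++) {
    let acc = 0;
    // Feed-forward part: weighted sum of current and past inputs.
    for (let k = 0; k < numerator.length; k++) {
      if (n - k >= 0) acc += numerator[k] * signal[n - k];
    }
    // Feedback part: weighted sum of past outputs.
    for (let k = 1; k < denominator.length; k++) {
      if (n - k >= 0) acc -= denominator[k] * output[n - k];
    }
    output[n] = acc / denominator[0];
  }
  return output;
}

// Impulse response of y[n] = x[n] + 0.5*y[n-1]: each sample is half the previous one.
console.log(iirSketch([1, 0, 0, 0], [1, -0.5], [1]));
```

With the default coefficients ([1] and [1]) the sketch returns the input unchanged.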

Inharmonicity( frequencies, magnitudes ) → {object}

Description

This algorithm calculates the inharmonicity of a signal given its spectral peaks. The inharmonicity value is computed as an energy weighted divergence of the spectral components from their closest multiple of the fundamental frequency. The fundamental frequency is taken as the first spectral peak from the input. The inharmonicity value ranges from 0 (purely harmonic signal) to 1 (inharmonic signal). Check https://essentia.upf.edu/reference/std_Inharmonicity.html for more details.

Parameters
Name Type Description
frequencies VectorFloat

the frequencies of the harmonic peaks [Hz] (in ascending order)

magnitudes VectorFloat

the magnitudes of the harmonic peaks (in ascending frequency order)

Returns

{inharmonicity: 'the inharmonicity of the audio signal'}

Details

InstantPower( array ) → {object}

Description

This algorithm computes the instant power of an array. That is, the energy of the array over its size. Check https://essentia.upf.edu/reference/std_InstantPower.html for more details.

Parameters
Name Type Description
array VectorFloat

the input array

Returns

{power: 'the instant power of the input array'}

Details
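
Example (illustrative only): "the energy of the array over its size" can be written out as below. The helper name instantPowerSketch is hypothetical and takes an ordinary array rather than a VectorFloat.

```javascript
// Hypothetical helper: sum of squared values divided by the number of values.
function instantPowerSketch(values) {
  if (values.length === 0) {
    throw new Error('input array must not be empty');
  }
  let energy = 0;
  for (const v of values) {
    energy += v * v;
  }
  return energy / values.length;
}

console.log(instantPowerSketch([2, -2, 2, -2])); // energy 16 over 4 samples
```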

Intensity( signal [, sampleRate ] ) → {object}

Description

This algorithm classifies the input audio signal as either relaxed (-1), moderate (0), or aggressive (1). Check https://essentia.upf.edu/reference/std_Intensity.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal

sampleRate number <optional>
44100

the input audio sampling rate [Hz]

Returns

{intensity: 'the intensity value'}

Details

Key( pcp [, numHarmonics [, pcpSize [, profileType [, slope [, useMajMin [, usePolyphony [, useThreeChords ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes a key estimate given a pitch class profile (HPCP). The algorithm was heavily adapted and changed from the original implementation for readability and speed. Check https://essentia.upf.edu/reference/std_Key.html for more details.

Parameters
Name Type Attributes Default Description
pcp VectorFloat

the input pitch class profile

numHarmonics number <optional>
4

number of harmonics that should contribute to the polyphonic profile (1 only considers the fundamental harmonic)

pcpSize number <optional>
36

number of array elements used to represent a semitone times 12 (this parameter is only a hint, during computation, the size of the input PCP is used instead)

profileType string <optional>
bgate

the type of polyphonic profile to use for correlation calculation

slope number <optional>
0.6

value of the slope of the exponential harmonic contribution to the polyphonic profile

useMajMin boolean <optional>
false

use a third profile called 'majmin' for ambiguous tracks [4]. Only available for the edma, bgate and braw profiles

usePolyphony boolean <optional>
true

enables the use of polyphonic profiles to define key profiles (this includes the contributions from triads as well as pitch harmonics)

useThreeChords boolean <optional>
true

consider only the 3 main triad chords of the key (T, D, SD) to build the polyphonic profiles

Returns

{key: 'the estimated key, from A to G', scale: 'the scale of the key (major or minor)', strength: 'the strength of the estimated key', firstToSecondRelativeStrength: 'the relative strength difference between the best estimate and second best estimate of the key'}

Details

KeyExtractor( audio [, averageDetuningCorrection [, frameSize [, hopSize [, hpcpSize [, maxFrequency [, maximumSpectralPeaks [, minFrequency [, pcpThreshold [, profileType [, sampleRate [, spectralPeaksThreshold [, tuningFrequency [, weightType [, windowType ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm extracts key/scale for an audio signal. It computes HPCP frames for the input signal and applies key estimation using the Key algorithm. Check https://essentia.upf.edu/reference/std_KeyExtractor.html for more details.

Parameters
Name Type Attributes Default Description
audio VectorFloat

the audio input signal

averageDetuningCorrection boolean <optional>
true

shifts a pcp to the nearest tempered bin

frameSize number <optional>
4096

the framesize for computing tonal features

hopSize number <optional>
4096

the hopsize for computing tonal features

hpcpSize number <optional>
12

the size of the output HPCP (must be a positive nonzero multiple of 12)

maxFrequency number <optional>
3500

max frequency to apply whitening to [Hz]

maximumSpectralPeaks number <optional>
60

the maximum number of spectral peaks

minFrequency number <optional>
25

min frequency to apply whitening to [Hz]

pcpThreshold number <optional>
0.2

pcp bins below this value are set to 0

profileType string <optional>
bgate

the type of polyphonic profile to use for correlation calculation

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

spectralPeaksThreshold number <optional>
0.0001

the threshold for the spectral peaks

tuningFrequency number <optional>
440

the tuning frequency of the input signal

weightType string <optional>
cosine

type of weighting function for determining frequency contribution

windowType string <optional>
hann

the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'

Returns

{key: 'See Key algorithm documentation', scale: 'See Key algorithm documentation', strength: 'See Key algorithm documentation'}

Details

LPC( frame [, order [, sampleRate [, type ] ] ] ) → {object}

Description

This algorithm computes Linear Predictive Coefficients and associated reflection coefficients of a signal. Check https://essentia.upf.edu/reference/std_LPC.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input audio frame

order number <optional>
10

the order of the LPC analysis (typically [8,14])

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

type string <optional>
regular

the type of LPC (regular or warped)

Returns

{lpc: 'the LPC coefficients', reflection: 'the reflection coefficients'}

Details

Larm( signal [, attackTime [, power [, releaseTime [, sampleRate ] ] ] ] ) → {object}

Description

This algorithm estimates the long-term loudness of an audio signal. The LARM model is based on the asymmetrical low-pass filtering of the Peak Program Meter (PPM), combined with Revised Low-frequency B-weighting (RLB) and power mean calculations. LARM has shown to be a reliable and objective loudness estimate of music and speech. Check https://essentia.upf.edu/reference/std_Larm.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the audio input signal

attackTime number <optional>
10

the attack time of the first order lowpass in the attack phase [ms]

power number <optional>
1.5

the power used for averaging

releaseTime number <optional>
1500

the release time of the first order lowpass in the release phase [ms]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{larm: 'the LARM loudness estimate [dB]'}

Details

Leq( signal ) → {object}

Description

This algorithm computes the Equivalent sound level (Leq) of an audio signal. The Leq measure can be derived from the Revised Low-frequency B-weighting (RLB) or from the raw signal as described in [1]. If the signal contains no energy, Leq defaults to Essentia's definition of silence, which is -90dB. This algorithm will throw an exception on empty input. Check https://essentia.upf.edu/reference/std_Leq.html for more details.

Parameters
Name Type Description
signal VectorFloat

the input signal (must be non-empty)

Returns

{leq: 'the equivalent sound level estimate [dB]'}

Details
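
Example (illustrative only): a sketch of an equivalent-level computation on the raw signal with the -90 dB silence floor mentioned above. The helper name leqSketch is hypothetical, and taking Leq as 10*log10 of the mean squared sample value (without RLB weighting) is an assumption of this sketch rather than a statement about the library's internals.

```javascript
// Hypothetical helper. Assumption: Leq = 10 * log10(mean of squared samples),
// with no RLB weighting, floored at the -90 dB silence level.
function leqSketch(signal) {
  if (signal.length === 0) {
    throw new Error('input signal must not be empty');
  }
  const silenceDb = -90;
  let energy = 0;
  for (const s of signal) {
    energy += s * s;
  }
  const power = energy / signal.length;
  if (power <= 0) {
    return silenceDb; // no energy at all: report silence
  }
  return Math.max(10 * Math.log10(power), silenceDb);
}

console.log(leqSketch([1, -1, 1, -1])); // full-scale square wave
console.log(leqSketch([0, 0, 0, 0])); // silence floor
```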

LevelExtractor( signal [, frameSize [, hopSize ] ] ) → {object}

Description

This algorithm extracts the loudness of an audio signal in frames using the Loudness algorithm. Check https://essentia.upf.edu/reference/std_LevelExtractor.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the audio input signal

frameSize number <optional>
88200

frame size to compute loudness

hopSize number <optional>
44100

hop size to compute loudness

Returns

{loudness: 'the loudness values'}

Details

LogAttackTime( signal [, sampleRate [, startAttackThreshold [, stopAttackThreshold ] ] ] ) → {object}

Description

This algorithm computes the log (base 10) of the attack time of a signal envelope. The attack time is defined as the time duration from when the sound becomes perceptually audible to when it reaches its maximum intensity. By default, the start of the attack is estimated as the point where the signal envelope reaches 20% of its maximum value, in order to account for possible noise presence. Also by default, the end of the attack is estimated as the point where the signal envelope has reached 90% of its maximum value, in order to account for the possibility that the maximum value occurs after the logAttack, as in trumpet sounds. Check https://essentia.upf.edu/reference/std_LogAttackTime.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal envelope (must be non-empty)

sampleRate number <optional>
44100

the audio sampling rate [Hz]

startAttackThreshold number <optional>
0.2

the percentage of the input signal envelope at which the starting point of the attack is considered

stopAttackThreshold number <optional>
0.9

the percentage of the input signal envelope at which the ending point of the attack is considered

Returns

{logAttackTime: 'the log (base 10) of the attack time [log10(s)]', attackStart: 'the attack start time [s]', attackStop: 'the attack end time [s]'}

Details
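
Example (illustrative only): a simplified sketch of the threshold logic described above, using the first crossing of each threshold relative to the envelope maximum. The helper name logAttackTimeSketch is hypothetical, and the library's handling of edge cases (for example an instantaneous attack) may differ.

```javascript
// Hypothetical helper: first crossing of the start and stop thresholds,
// both expressed as a fraction of the envelope maximum.
function logAttackTimeSketch(
  envelope,
  sampleRate = 44100,
  startAttackThreshold = 0.2,
  stopAttackThreshold = 0.9
) {
  if (envelope.length === 0) {
    throw new Error('input envelope must not be empty');
  }
  let maxValue = -Infinity;
  for (const v of envelope) {
    if (v > maxValue) maxValue = v;
  }
  const startIndex = envelope.findIndex((v) => v >= startAttackThreshold * maxValue);
  const stopIndex = envelope.findIndex((v) => v >= stopAttackThreshold * maxValue);
  const attackStart = startIndex / sampleRate; // seconds
  const attackStop = stopIndex / sampleRate; // seconds
  // Note: an instantaneous attack (start === stop) yields -Infinity here.
  const logAttackTime = Math.log10(attackStop - attackStart);
  return { logAttackTime, attackStart, attackStop };
}

// Toy envelope sampled at 10 Hz: 20% is first reached at index 2, 90% at index 5.
const toyEnvelope = [0, 0.05, 0.25, 0.5, 0.75, 0.95, 1, 0.8];
console.log(logAttackTimeSketch(toyEnvelope, 10));
```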

LogSpectrum( spectrum [, binsPerSemitone [, frameSize [, rollOn [, sampleRate ] ] ] ] ) → {object}

Description

This algorithm computes a spectrum with logarithmically distributed frequency bins. This code is ported from NNLS Chroma [1, 2]. This algorithm also returns a local tuning that is retrieved for the input frame and a global tuning that is updated with a moving average. Check https://essentia.upf.edu/reference/std_LogSpectrum.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

spectrum frame

binsPerSemitone number <optional>
3

bins per semitone

frameSize number <optional>
1025

the input frame size of the spectrum vector

rollOn number <optional>
0

this removes low-frequency noise - useful in quiet recordings

sampleRate number <optional>
44100

the input sample rate

Returns

{logFreqSpectrum: 'log frequency spectrum frame', meanTuning: 'normalized mean tuning frequency', localTuning: 'normalized local tuning frequency'}

Details

LoopBpmConfidence( signal, bpmEstimate [, sampleRate ] ) → {object}

Description

This algorithm takes an audio signal and a BPM estimate for that signal and predicts the reliability of the BPM estimate in a value from 0 to 1. The audio signal is assumed to be a musical loop with constant tempo. The confidence returned is based on comparing the duration of the signal with multiples of the BPM estimate (see [1] for more details). Check https://essentia.upf.edu/reference/std_LoopBpmConfidence.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

loop audio signal

bpmEstimate number

estimated BPM for the audio signal

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{confidence: 'confidence value for the BPM estimation'}

Details

LoopBpmEstimator( signal [, confidenceThreshold ] ) → {object}

Description

This algorithm estimates the BPM of audio loops. It internally uses the PercivalBpmEstimator algorithm to produce a BPM estimate and LoopBpmConfidence to assess the reliability of the estimate. If the confidence of the estimate is below the given confidenceThreshold, the algorithm outputs a BPM of 0.0, otherwise it outputs the estimated BPM. For more details on the BPM estimation method and the confidence measure, please check the algorithms used. Check https://essentia.upf.edu/reference/std_LoopBpmEstimator.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

confidenceThreshold number <optional>
0.95

confidence threshold below which bpm estimate will be considered unreliable

Returns

{bpm: 'the estimated bpm (will be 0 if unsure)'}

Details
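
Example (illustrative only): the gating step described above, in isolation. The helper name gateBpmSketch is hypothetical; the BPM estimate and its confidence are assumed to have been produced elsewhere (for instance by the two algorithms named in the description).

```javascript
// Hypothetical helper showing only the confidence gate: an estimate whose
// confidence is below the threshold is reported as 0.
function gateBpmSketch(bpmEstimate, confidence, confidenceThreshold = 0.95) {
  return confidence < confidenceThreshold ? 0 : bpmEstimate;
}

console.log(gateBpmSketch(120, 0.99)); // 120, confident enough
console.log(gateBpmSketch(120, 0.5)); // 0, treated as unreliable
```

Callers can therefore treat a returned bpm of 0 as "no reliable estimate" rather than as a tempo.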

Loudness( signal ) → {object}

Description

This algorithm computes the loudness of an audio signal as defined by Stevens's power law. It computes loudness as the energy of the signal raised to the power of 0.67. Check https://essentia.upf.edu/reference/std_Loudness.html for more details.

Parameters
Name Type Description
signal VectorFloat

the input signal

Returns

{loudness: 'the loudness of the input signal'}

Details
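
Example (illustrative only): "the energy of the signal raised to the power of 0.67" written out in plain JavaScript. The helper name loudnessSketch is hypothetical and takes an ordinary array rather than a VectorFloat.

```javascript
// Hypothetical helper: energy (sum of squared samples) raised to the power 0.67.
function loudnessSketch(signal) {
  let energy = 0;
  for (const s of signal) {
    energy += s * s;
  }
  return Math.pow(energy, 0.67);
}

console.log(loudnessSketch([1, 1])); // energy 2, raised to 0.67
console.log(loudnessSketch([0, 0])); // silence has zero loudness
```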

LoudnessVickers( signal [, sampleRate ] ) → {object}

Description

This algorithm computes Vickers's loudness of an audio signal. Currently, this algorithm only works for signals with a 44100Hz sampling rate. This algorithm is meant to be given frames of audio as input (not entire audio signals). The algorithm described in the paper performs a weighted average of the loudness value computed for each of the given frames, this step is left as a post processing step and is not performed by this algorithm. Check https://essentia.upf.edu/reference/std_LoudnessVickers.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

sampleRate number <optional>
44100

the audio sampling rate of the input signal which is used to create the weight vector [Hz] (currently, this algorithm only works on signals with a sampling rate of 44100Hz)

Returns

{loudness: 'the Vickers loudness [dB]'}

Details

LowLevelSpectralEqloudExtractor( signal [, frameSize [, hopSize [, sampleRate ] ] ] ) → {object}

Description

This algorithm extracts a set of low-level spectral features for which it is recommended to apply a preliminary equal-loudness filter over the input audio signal (according to internal evaluations conducted at the Music Technology Group). To this end, you are expected to provide the output of the EqualLoudness algorithm as the input to this algorithm. Still, you are free to provide an unprocessed audio input if you want to compute these features without the equal-loudness filter. Check https://essentia.upf.edu/reference/std_LowLevelSpectralEqloudExtractor.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal

frameSize number <optional>
2048

the frame size for computing low level features

hopSize number <optional>
1024

the hop size for computing low level features

sampleRate number <optional>
44100

the audio sampling rate

Returns

{dissonance: 'See Dissonance algorithm documentation', sccoeffs: 'See SpectralContrast algorithm documentation', scvalleys: 'See SpectralContrast algorithm documentation', spectral_centroid: 'See Centroid algorithm documentation', spectral_kurtosis: 'See DistributionShape algorithm documentation', spectral_skewness: 'See DistributionShape algorithm documentation', spectral_spread: 'See DistributionShape algorithm documentation'}

Details

LowLevelSpectralExtractor( signal [, frameSize [, hopSize [, sampleRate ] ] ] ) → {object}

Description

This algorithm extracts all low-level spectral features, which do not require an equal-loudness filter for their computation, from an audio signal. Check https://essentia.upf.edu/reference/std_LowLevelSpectralExtractor.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the audio input signal

frameSize number <optional>
2048

the frame size for computing low level features

hopSize number <optional>
1024

the hop size for computing low level features

sampleRate number <optional>
44100

the audio sampling rate

Returns

{barkbands: 'spectral energy at each bark band. See BarkBands algorithm', barkbands_kurtosis: 'kurtosis from bark bands. See DistributionShape algorithm documentation', barkbands_skewness: 'skewness from bark bands. See DistributionShape algorithm documentation', barkbands_spread: 'spread from barkbands. See DistributionShape algorithm documentation', hfc: 'See HFC algorithm documentation', mfcc: 'See MFCC algorithm documentation', pitch: 'See PitchYinFFT algorithm documentation', pitch_instantaneous_confidence: 'See PitchYinFFT algorithm documentation', pitch_salience: 'See PitchSalience algorithm documentation', silence_rate_20dB: 'See SilenceRate algorithm documentation', silence_rate_30dB: 'See SilenceRate algorithm documentation', silence_rate_60dB: 'See SilenceRate algorithm documentation', spectral_complexity: 'See Spectral algorithm documentation', spectral_crest: 'See Crest algorithm documentation', spectral_decrease: 'See Decrease algorithm documentation', spectral_energy: 'See Energy algorithm documentation', spectral_energyband_low: 'Energy in band (20,150] Hz. See EnergyBand algorithm documentation', spectral_energyband_middle_low: 'Energy in band (150,800] Hz. See EnergyBand algorithm documentation', spectral_energyband_middle_high: 'Energy in band (800,4000] Hz. See EnergyBand algorithm documentation', spectral_energyband_high: 'Energy in band (4000,20000] Hz. See EnergyBand algorithm documentation', spectral_flatness_db: 'See flatnessDB algorithm documentation', spectral_flux: 'See Flux algorithm documentation', spectral_rms: 'See RMS algorithm documentation', spectral_rolloff: 'See RollOff algorithm documentation', spectral_strongpeak: 'See StrongPeak algorithm documentation', zerocrossingrate: 'See ZeroCrossingRate algorithm documentation', inharmonicity: 'See Inharmonicity algorithm documentation', tristimulus: 'See Tristimulus algorithm documentation', oddtoevenharmonicenergyratio: 'See OddToEvenHarmonicEnergyRatio algorithm documentation'}

Details

LowPass( signal [, cutoffFrequency [, sampleRate ] ] ) → {object}

Description

This algorithm implements a 1st order IIR low-pass filter. Because of its dependence on IIR, IIR's requirements are inherited. References: [1] U. Zölzer, DAFX - Digital Audio Effects, p. 40, John Wiley & Sons, 2002 Check https://essentia.upf.edu/reference/std_LowPass.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal

cutoffFrequency number <optional>
1500

the cutoff frequency for the filter [Hz]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{signal: 'the filtered signal'}

Details

MFCC( spectrum [, dctType [, highFrequencyBound [, inputSize [, liftering [, logType [, lowFrequencyBound [, normalize [, numberBands [, numberCoefficients [, sampleRate [, silenceThreshold [, type [, warpingFormula [, weighting ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the mel-frequency cepstrum coefficients of a spectrum. As there is no standard implementation, the MFCC-FB40 is used by default: a filterbank of 40 bands from 0 to 11000Hz; the log value of the spectrum energy in each mel band (band energy values below the silence threshold are clipped to the threshold before computing log-energies); and a DCT of the 40 bands down to 13 mel coefficients. There is a paper describing various MFCC implementations [1]. Check https://essentia.upf.edu/reference/std_MFCC.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the audio spectrum

dctType number <optional>
2

the DCT type

highFrequencyBound number <optional>
11000

the upper bound of the frequency range [Hz]

inputSize number <optional>
1025

the size of input spectrum

liftering number <optional>
0

the liftering coefficient. Use '0' to bypass it

logType string <optional>
dbamp

logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes

lowFrequencyBound number <optional>
0

the lower bound of the frequency range [Hz]

normalize string <optional>
unit_sum

spectrum bin weights to use for each mel band: 'unit_max' to make each mel band vertex equal to 1, 'unit_sum' to make each mel band area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle mel band area equal to 1 normalizing the weights of each triangle by its bandwidth

numberBands number <optional>
40

the number of mel-bands in the filter

numberCoefficients number <optional>
13

the number of output mel coefficients

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

silenceThreshold number <optional>
1e-10

silence threshold for computing log-energy bands

type string <optional>
power

use magnitude or power spectrum

warpingFormula string <optional>
htkMel

The scale implementation type: 'htkMel' scale from the HTK toolkit [2, 3] (default) or 'slaneyMel' scale from the Auditory toolbox [4]

weighting string <optional>
warping

type of weighting function for determining triangle area

Returns

{bands: 'the energies in mel bands', mfcc: 'the mel frequency cepstrum coefficients'}
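
The default filterbank (40 bands between 0 and 11000 Hz on the 'htkMel' scale) can be pictured with a small plain-JavaScript sketch. It uses the usual HTK-style mel formula, 2595 * log10(1 + f / 700). The helper names and the choice of `numberBands + 2` equally spaced edge points are assumptions for illustration; how the library actually places and normalizes its triangles is not described in this section.

```javascript
// HTK-style mel scale conversions (the usual 2595 * log10(1 + f / 700) form).
function hzToMel(hz) {
  return 2595 * Math.log10(1 + hz / 700);
}

function melToHz(mel) {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

// Hypothetical helper: numberBands + 2 edge frequencies equally spaced in mel.
// Band k would span edges[k] .. edges[k + 2] with its vertex at edges[k + 1].
function melBandEdges(numberBands = 40, lowFrequencyBound = 0, highFrequencyBound = 11000) {
  const lowMel = hzToMel(lowFrequencyBound);
  const highMel = hzToMel(highFrequencyBound);
  const edges = [];
  for (let i = 0; i < numberBands + 2; i++) {
    edges.push(melToHz(lowMel + ((highMel - lowMel) * i) / (numberBands + 1)));
  }
  return edges;
}
```

Calling `melBandEdges()` with the defaults yields 42 increasing frequencies, densely packed at the low end and widely spaced near 11000 Hz.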

Details

MaxFilter( signal [, causal [, width ] ] ) → {object}

Description

This algorithm implements a maximum filter for 1d signal using van Herk/Gil-Werman (HGW) algorithm. Check https://essentia.upf.edu/reference/std_MaxFilter.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

signal to be filtered

causal boolean <optional>
true

use a causal filter (the window is behind the current element; otherwise it is centered around it)

width number <optional>
3

the window size, has to be odd if the window is centered

Returns

{signal: 'filtered output'}

Details

MaxMagFreq( spectrum [, sampleRate ] ) → {object}

Description

This algorithm computes the frequency with the largest magnitude in a spectrum. Note that a spectrum must contain at least two elements, otherwise an exception is thrown. Check https://essentia.upf.edu/reference/std_MaxMagFreq.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input spectrum (must have more than 1 element)

sampleRate number <optional>
44100

the audio sampling rate [Hz]

Returns

{maxMagFreq: 'the frequency with the largest magnitude [Hz]'}

Details

MaxToTotal( envelope ) → {object}

Description

This algorithm computes the ratio between the index of the maximum value of the envelope of a signal and the total length of the envelope. This ratio shows how much the maximum amplitude is off-center. Its value is close to 0 if the maximum is close to the beginning (e.g. Decrescendo or Impulsive sounds), close to 0.5 if it is close to the middle (e.g. Delta sounds) and close to 1 if it is close to the end of the sound (e.g. Crescendo sounds). This algorithm is intended to be fed by the output of the Envelope algorithm. Check https://essentia.upf.edu/reference/std_MaxToTotal.html for more details.

Parameters
Name Type Description
envelope VectorFloat

the envelope of the signal

Returns

{maxToTotal: 'the maximum amplitude position to total length ratio'}
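
The ratio described above can be sketched in a few lines of plain JavaScript. `maxToTotalSketch` is a hypothetical name, and dividing by the envelope length (rather than length minus one) as well as picking the first maximum on ties are assumptions; the library may differ in these details.

```javascript
// Hypothetical plain-JS sketch: position of the envelope maximum as a
// fraction of the envelope length (the first maximum wins on ties).
function maxToTotalSketch(envelope) {
  if (envelope.length === 0) throw new Error("envelope must be non-empty");
  let maxIndex = 0;
  for (let i = 1; i < envelope.length; i++) {
    if (envelope[i] > envelope[maxIndex]) maxIndex = i;
  }
  return maxIndex / envelope.length;
}
```

An impulsive envelope such as `[1, 0.5, 0.2, 0.1]` gives 0, while a crescendo such as `[0, 0.1, 0.2, 1]` gives 0.75.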

Details

Mean( array ) → {object}

Description

This algorithm computes the mean of an array. Check https://essentia.upf.edu/reference/std_Mean.html for more details.

Parameters
Name Type Description
array VectorFloat

the input array

Returns

{mean: 'the mean of the input array'}
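
Most methods on this page follow the same calling pattern: pass a VectorFloat and read the named field from the returned object. The sketch below shows that pattern for `Mean`. It assumes `essentia` is an already-constructed Essentia instance and that `arrayToVector` is the essentia.js helper that converts a Float32Array into a VectorFloat; `meanOfSamples` itself is a hypothetical wrapper name.

```javascript
// Sketch of the calling pattern, not a definitive implementation.
// `essentia`: an existing Essentia instance (see the constructor on this page).
// `arrayToVector`: assumed essentia.js helper (Float32Array -> VectorFloat).
function meanOfSamples(essentia, samples) {
  const vector = essentia.arrayToVector(Float32Array.from(samples));
  const result = essentia.Mean(vector); // documented to return {mean: ...}
  return result.mean;
}
```

The same shape applies to `Median`, `MinMax` and the other single-input methods, with the output field names taken from each Returns entry.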

Details

Median( array ) → {object}

Description

This algorithm computes the median of an array. When there is an odd number of numbers, the median is simply the middle number. For example, the median of 2, 4, and 7 is 4. When there is an even number of numbers, the median is the mean of the two middle numbers. Thus, the median of the numbers 2, 4, 7, 12 is (4+7)/2 = 5.5. See [1] for more info. Check https://essentia.upf.edu/reference/std_Median.html for more details.

Parameters
Name Type Description
array VectorFloat

the input array (must be non-empty)

Returns

{median: 'the median of the input array'}
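
The odd/even rule from the description can be written directly in plain JavaScript. `medianSketch` is a hypothetical helper for illustration and does not call the library.

```javascript
// Plain-JS sketch of the median rule described above.
function medianSketch(values) {
  if (values.length === 0) throw new Error("input array must be non-empty");
  // Copy before sorting so the caller's array is left untouched.
  const sorted = Array.from(values).sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid] // odd count: the middle value
    : (sorted[mid - 1] + sorted[mid]) / 2; // even count: mean of the two middle values
}
```

This reproduces the two examples in the description: `medianSketch([2, 4, 7])` is 4 and `medianSketch([2, 4, 7, 12])` is 5.5.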

Details

MedianFilter( array [, kernelSize ] ) → {object}

Description

This algorithm computes the median-filtered version of the input signal, given the kernel size, as detailed in [1]. Check https://essentia.upf.edu/reference/std_MedianFilter.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array (must be non-empty)

kernelSize number <optional>
11

scalar giving the size of the median filter window. Must be odd

Returns

{filteredArray: 'the median-filtered input array'}

Details

MelBands( spectrum [, highFrequencyBound [, inputSize [, log [, lowFrequencyBound [, normalize [, numberBands [, sampleRate [, type [, warpingFormula [, weighting ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes energy in mel bands of a spectrum. It applies a frequency-domain filterbank (MFCC FB-40, [1]), which consists of equal area triangular filters spaced according to the mel scale. The filterbank is normalized in such a way that the sum of coefficients for every filter equals one. It is recommended that the input "spectrum" be calculated by the Spectrum algorithm. Check https://essentia.upf.edu/reference/std_MelBands.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the audio spectrum

highFrequencyBound number <optional>
22050

an upper-bound limit for the frequencies to be included in the bands

inputSize number <optional>
1025

the size of the spectrum

log boolean <optional>
false

compute log-energies (log10 (1 + energy))

lowFrequencyBound number <optional>
0

a lower-bound limit for the frequencies to be included in the bands

normalize string <optional>
unit_sum

spectrum bin weights to use for each mel band: 'unit_max' to make each mel band vertex equal to 1, 'unit_sum' to make each mel band area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle mel band area equal to 1 normalizing the weights of each triangle by its bandwidth

numberBands number <optional>
24

the number of output bands

sampleRate number <optional>
44100

the sample rate

type string <optional>
power

'power' to output squared units, 'magnitude' to keep it as the input

warpingFormula string <optional>
htkMel

The scale implementation type: 'htkMel' scale from the HTK toolkit [2, 3] (default) or 'slaneyMel' scale from the Auditory toolbox [4]

weighting string <optional>
warping

type of weighting function for determining triangle area

Returns

{bands: 'the energy in mel bands'}

Details

Meter( beatogram ) → {object}

Description

This algorithm estimates the time signature of a given beatogram by finding the highest correlation between beats. Check https://essentia.upf.edu/reference/std_Meter.html for more details.

Parameters
Name Type Description
beatogram VectorVectorFloat

filtered matrix loudness

Returns

{meter: 'the time signature'}

Details

MinMax( array [, type ] ) → {object}

Description

This algorithm calculates the minimum or maximum value of an array. If the array has more than one minimum or maximum value, the index of the first one is returned. Check https://essentia.upf.edu/reference/std_MinMax.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array

type string <optional>
min

the type of the operation

Returns

{real: 'the minimum or maximum of the input array, according to the type parameter', int: 'the index of the value'}
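
The "first index wins" behaviour and the shape of the returned object can be illustrated with a plain-JavaScript sketch. `minMaxSketch` is a hypothetical name, and throwing on an empty array is an assumption; only the `real` and `int` field names come from the Returns entry above.

```javascript
// Plain-JS sketch mirroring the documented {real, int} result shape.
function minMaxSketch(array, type = "min") {
  if (array.length === 0) throw new Error("input array must be non-empty");
  let index = 0;
  for (let i = 1; i < array.length; i++) {
    const better = type === "max" ? array[i] > array[index] : array[i] < array[index];
    if (better) index = i; // strict comparison keeps the first occurrence
  }
  return { real: array[index], int: index };
}
```

For `[3, 1, 2, 1]` with the default type this returns `{real: 1, int: 1}`: the second occurrence of the minimum at index 3 is ignored.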

Details

MinToTotal( envelope ) → {object}

Description

This algorithm computes the ratio between the index of the minimum value of the envelope of a signal and the total length of the envelope. Check https://essentia.upf.edu/reference/std_MinToTotal.html for more details.

Parameters
Name Type Description
envelope VectorFloat

the envelope of the signal

Returns

{minToTotal: 'the minimum amplitude position to total length ratio'}

Details

MovingAverage( signal [, size ] ) → {object}

Description

This algorithm implements a FIR Moving Average filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_MovingAverage.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal

size number <optional>
6

the size of the window [audio samples]

Returns

{signal: 'the filtered signal'}
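
A moving average over `size` samples can be sketched in plain JavaScript as a causal FIR filter. `movingAverageSketch` is a hypothetical name, and treating samples before the start of the signal as zero is an assumption about the initial filter state.

```javascript
// Causal FIR moving average:
//   y[n] = (x[n] + x[n-1] + ... + x[n-size+1]) / size
// Samples before the start of the signal are assumed to be zero.
function movingAverageSketch(signal, size = 6) {
  const out = new Float32Array(signal.length);
  let runningSum = 0;
  for (let n = 0; n < signal.length; n++) {
    runningSum += signal[n];
    if (n >= size) runningSum -= signal[n - size]; // drop the sample leaving the window
    out[n] = runningSum / size;
  }
  return out;
}
```

With a constant input of 3 and `size = 3`, the output ramps up as 1, 2, 3 and then stays at 3 once the window is full.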

Details

MultiPitchKlapuri( signal [, binResolution [, frameSize [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minFrequency [, numberHarmonics [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates multiple pitch values corresponding to the melodic lines present in a polyphonic music signal (for example, string quartet, piano). This implementation is based on the algorithm in [1]: In each frame, a set of possible fundamental frequency candidates is extracted based on the principle of harmonic summation. In an optimization stage, the number of harmonic sources (polyphony) is estimated and the final set of fundamental frequencies determined. In contrast to the pitch salience function proposed in [2], this implementation uses the pitch salience function described in [1]. The output is a vector for each frame containing the estimated melody pitch values. Check https://essentia.upf.edu/reference/std_MultiPitchKlapuri.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

binResolution number <optional>
10

salience function bin resolution [cents]

frameSize number <optional>
2048

the frame size for computing pitch salience

harmonicWeight number <optional>
0.8

harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)

hopSize number <optional>
128

the hop size with which the pitch salience function was computed

magnitudeCompression number <optional>
1

magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)

magnitudeThreshold number <optional>
40

spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)

maxFrequency number <optional>
1760

the maximum allowed frequency for salience function peaks (ignore peaks above) [Hz]

minFrequency number <optional>
80

the minimum allowed frequency for salience function peaks (ignore peaks below) [Hz]

numberHarmonics number <optional>
10

number of considered harmonics

referenceFrequency number <optional>
55

the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{pitch: 'the estimated pitch values [Hz]'}
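
The `referenceFrequency` and `binResolution` parameters together define a frequency-to-bin mapping on a cent scale. A minimal sketch of that mapping follows; `hzToCentBin` is a hypothetical helper, and rounding to the nearest bin is an assumption, since the exact quantization used by the salience function is not stated here.

```javascript
// Hypothetical helper: map a frequency in Hz to a cent-scale bin index,
// where referenceFrequency corresponds to bin 0.
function hzToCentBin(frequency, referenceFrequency = 55, binResolution = 10) {
  const cents = 1200 * Math.log2(frequency / referenceFrequency);
  return Math.round(cents / binResolution); // assumed rounding convention
}
```

With the defaults, 55 Hz maps to bin 0, one octave up (110 Hz) maps to bin 120, and the default `maxFrequency` of 1760 Hz maps to bin 600.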

Details

MultiPitchMelodia( signal [, binResolution [, filterIterations [, frameSize [, guessUnvoiced [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minDuration [, minFrequency [, numberHarmonics [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, referenceFrequency [, sampleRate [, timeContinuity ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates multiple fundamental frequency contours from an audio signal. It is a multi pitch version of the MELODIA algorithm described in [1]. While the algorithm is originally designed to extract melody in polyphonic music, this implementation is adapted for multiple sources. The approach is based on the creation and characterization of pitch contours, time continuous sequences of pitch candidates grouped using auditory streaming cues. To this end, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours, and PitchContoursMonoMelody algorithms are employed. It is strongly advised to use the default parameter values which are optimized according to [1] (where further details are provided) except for minFrequency, maxFrequency, and voicingTolerance, which will depend on your application. Check https://essentia.upf.edu/reference/std_MultiPitchMelodia.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

binResolution number <optional>
10

salience function bin resolution [cents]

filterIterations number <optional>
3

number of iterations for the octave errors / pitch outlier filtering process

frameSize number <optional>
2048

the frame size for computing pitch salience

guessUnvoiced boolean <optional>
false

estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame

harmonicWeight number <optional>
0.8

harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)

hopSize number <optional>
128

the hop size with which the pitch salience function was computed

magnitudeCompression number <optional>
1

magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)

magnitudeThreshold number <optional>
40

spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)

maxFrequency number <optional>
20000

the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]

minDuration number <optional>
100

the minimum allowed contour duration [ms]

minFrequency number <optional>
40

the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]

numberHarmonics number <optional>
20

number of considered harmonics

peakDistributionThreshold number <optional>
0.9

allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)

peakFrameThreshold number <optional>
0.9

per-frame salience threshold factor (fraction of the highest peak salience in a frame)

pitchContinuity number <optional>
27.5625

pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]

referenceFrequency number <optional>
55

the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

timeContinuity number <optional>
100

time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]

Returns

{pitch: 'the estimated pitch values [Hz]'}

Details

Multiplexer( [ numberRealInputs [, numberVectorRealInputs ] ] ) → {object}

Description

This algorithm returns a single vector from a given number of real values and/or frames. Frames from different inputs are multiplexed onto a single stream in an alternating fashion. Check https://essentia.upf.edu/reference/std_Multiplexer.html for more details.

Parameters
Name Type Attributes Default Description
numberRealInputs number <optional>
0

the number of inputs of type Real to multiplex

numberVectorRealInputs number <optional>
0

the number of inputs of type vector to multiplex

Returns

{data: 'the frame containing the input values and/or input frames'}

Details

NNLSChroma( logSpectrogram, meanTuning, localTuning [, chromaNormalization [, frameSize [, sampleRate [, spectralShape [, spectralWhitening [, tuningMode [, useNNLS ] ] ] ] ] ] ] ) → {object}

Description

This algorithm extracts treble and bass chromagrams from a sequence of log-frequency spectrum frames. On this representation, two processing steps are performed: -tuning, after which each centre bin (i.e. bin 2, 5, 8, ...) corresponds to a semitone, even if the tuning of the piece deviates from 440 Hz standard pitch. -running standardisation: subtraction of the running mean, division by the running standard deviation. This has a spectral whitening effect. This code is ported from NNLS Chroma [1, 2]. To achieve similar results follow this processing chain: frame slicing with sample rate = 44100, frame size = 16384, hop size = 2048 -> Windowing with Hann and no normalization -> Spectrum -> LogSpectrum. Check https://essentia.upf.edu/reference/std_NNLSChroma.html for more details.

Parameters
Name Type Attributes Default Description
logSpectrogram VectorVectorFloat

log spectrum frames

meanTuning VectorFloat

mean tuning frames

localTuning VectorFloat

local tuning frames

chromaNormalization string <optional>
none

determines whether or how the chromagrams are normalised

frameSize number <optional>
1025

the input frame size of the spectrum vector

sampleRate number <optional>
44100

the input sample rate

spectralShape number <optional>
0.7

the shape of the notes in the NNLS dictionary

spectralWhitening number <optional>
1

determines how much the log-frequency spectrum is whitened

tuningMode string <optional>
global

local uses a local average for tuning, global uses all audio frames. Local tuning is only advisable when the tuning is likely to change over the audio

useNNLS boolean <optional>
true

toggle between NNLS approximate transcription and linear spectral mapping

Returns

{tunedLogfreqSpectrum: 'Log frequency spectrum after tuning', semitoneSpectrum: 'a spectral representation with one bin per semitone', bassChromagram: 'a 12-dimensional chromagram, restricted to the bass range', chromagram: 'a 12-dimensional chromagram, restricted with mid-range emphasis'}

Details

NoiseAdder( signal [, fixSeed [, level ] ] ) → {object}

Description

This algorithm adds noise to an input signal. The average energy of the noise in dB is defined by the level parameter, and is generated using the Mersenne Twister random number generator. Check https://essentia.upf.edu/reference/std_NoiseAdder.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

fixSeed boolean <optional>
false

if true, 0 is used as the seed for generating random values

level number <optional>
-100

power level of the noise generator [dB]

Returns

{signal: 'the output signal with the added noise'}

Details

NoiseBurstDetector( frame [, alpha [, silenceThreshold [, threshold ] ] ] ) → {object}

Description

This algorithm detects noise bursts in the waveform by thresholding the peaks of the second derivative. The threshold is computed using an Exponential Moving Average filter over the RMS of the second derivative of the input frame. Check https://essentia.upf.edu/reference/std_NoiseBurstDetector.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input frame (must be non-empty)

alpha number <optional>
0.9

alpha coefficient for the Exponential Moving Average threshold estimation.

silenceThreshold number <optional>
-50

threshold to skip silent frames

threshold number <optional>
8

factor to control the dynamic threshold

Returns

{indexes: 'indexes of the noisy samples'}

Details

NoveltyCurve( frequencyBands [, frameRate [, normalize [, weightCurve [, weightCurveType ] ] ] ] ) → {object}

Description

This algorithm computes the "novelty curve" (Grosche & Müller, 2009) onset detection function. The algorithm expects as an input a frame-wise sequence of frequency-band energies or spectrum magnitudes as originally proposed in [1] (see FrequencyBands and Spectrum algorithms). Novelty in each band (or frequency bin) is computed as a derivative between log-compressed energy (magnitude) values in consecutive frames. The overall novelty value is then computed as a weighted sum that can be configured using the 'weightCurve' parameter. The resulting novelty curve can be used for beat tracking and onset detection (see BpmHistogram and Onsets). Check https://essentia.upf.edu/reference/std_NoveltyCurve.html for more details.

Parameters
Name Type Attributes Default Description
frequencyBands VectorVectorFloat

the frequency bands

frameRate number <optional>
344.531

the sampling rate of the input audio

normalize boolean <optional>
false

whether to normalize each band's energy

weightCurve Array.<any> <optional>
[]

vector containing the weights for each frequency band. Only if weightCurveType==supplied

weightCurveType string <optional>
hybrid

the type of weighting to be used for the bands novelty

Returns

{novelty: 'the novelty curve as a single vector'}
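
The default `frameRate` of 344.531 matches 44100 / 128, which suggests the value is the number of analysis frames per second rather than the audio sampling rate. The sketch below shows that arithmetic; `frameRateFromHop` is a hypothetical helper, and the hop size of 128 is inferred from the default, not stated in this section.

```javascript
// Hypothetical helper: frames per second for a given sample rate and hop size.
// The defaults (44100 Hz, hop of 128 samples) are inferred from the documented
// frameRate default of 344.531.
function frameRateFromHop(sampleRate = 44100, hopSize = 128) {
  return sampleRate / hopSize;
}
```

If the frequency bands were computed with a different hop size, passing `frameRateFromHop(sampleRate, hopSize)` as `frameRate` keeps the time axis consistent.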

Details

NoveltyCurveFixedBpmEstimator( novelty [, hopSize [, maxBpm [, minBpm [, sampleRate [, tolerance ] ] ] ] ] ) → {object}

Description

This algorithm outputs a histogram of the most probable bpms assuming the signal has constant tempo given the novelty curve. This algorithm is based on the autocorrelation of the novelty curve (see NoveltyCurve algorithm) and should only be used for signals that have a constant tempo or as a first tempo estimator to be used in conjunction with other algorithms such as BpmHistogram. It is a simplified version of the algorithm described in [1] as, in order to predict the best BPM candidate, it computes autocorrelation of the entire novelty curve instead of analyzing it on frames and histogramming the peaks over frames. Check https://essentia.upf.edu/reference/std_NoveltyCurveFixedBpmEstimator.html for more details.

Parameters
Name Type Attributes Default Description
novelty VectorFloat

the novelty curve of the audio signal

hopSize number <optional>
512

the hopSize used to compute the novelty curve from the original signal

maxBpm number <optional>
560

the maximum bpm to look for

minBpm number <optional>
30

the minimum bpm to look for

sampleRate number <optional>
44100

the sampling rate of the original audio signal [Hz]

tolerance number <optional>
3

tolerance (in percentage) for considering bpms to be equal

Returns

{bpms: 'the bpm candidates sorted by magnitude', amplitudes: 'the magnitude of each bpm candidate'}

Details

OddToEvenHarmonicEnergyRatio( frequencies, magnitudes ) → {object}

Description

This algorithm computes the ratio between a signal's odd and even harmonic energy given the signal's harmonic peaks. The odd to even harmonic energy ratio is a measure that helps distinguish odd-harmonic-energy predominant sounds (such as from a clarinet) from equally important even-harmonic-energy sounds (such as from a trumpet). The required harmonic frequencies and magnitudes can be computed by the HarmonicPeaks algorithm. In the case when the even energy is zero, which may happen when only even harmonics were found or when only one peak was found, the algorithm outputs the maximum real number possible. Therefore, this algorithm should be used in conjunction with the harmonic peaks algorithm. If no peaks are supplied, the algorithm outputs a value of one, assuming either the spectrum was flat or it was silent. Check https://essentia.upf.edu/reference/std_OddToEvenHarmonicEnergyRatio.html for more details.

Parameters
Name Type Description
frequencies VectorFloat

the frequencies of the harmonic peaks (at least two frequencies in frequency ascending order)

magnitudes VectorFloat

the magnitudes of the harmonic peaks (at least two magnitudes in frequency ascending order)

Returns

{oddToEvenHarmonicEnergyRatio: 'the ratio between the odd and even harmonic energies of the given harmonic peaks'}

Details

OnsetDetection( spectrum, phase [, method [, sampleRate ] ] ) → {object}

Description

This algorithm computes various onset detection functions. The output of this algorithm should be post-processed in order to determine whether the frame contains an onset or not. Namely, it could be fed to the Onsets algorithm. It is recommended that the input "spectrum" is generated by the Spectrum algorithm. The following methods are available:
- 'hfc', the High Frequency Content detection function which accurately detects percussive events (see HFC algorithm for details).
- 'complex', the Complex-Domain spectral difference function [1] taking into account changes in magnitude and phase. It emphasizes note onsets either as a result of significant change in energy in the magnitude spectrum, and/or a deviation from the expected phase values in the phase spectrum, caused by a change in pitch.
- 'complex_phase', the simplified Complex-Domain spectral difference function [2] taking into account phase changes, weighted by magnitude. It reacts better to tonal sounds such as bowed strings, but tends to over-detect percussive events.
- 'flux', the Spectral Flux detection function which characterizes changes in magnitude spectrum. See Flux algorithm for details.
- 'melflux', the spectral difference function, similar to spectral flux, but using half-rectified energy changes in Mel-frequency bands of the spectrum [3].
- 'rms', the difference function, measuring the half-rectified change of the RMS of the magnitude spectrum (i.e., measuring overall energy flux) [4].
Check https://essentia.upf.edu/reference/std_OnsetDetection.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input spectrum

phase VectorFloat

the phase vector corresponding to this spectrum (used only by the "complex" method)

method string <optional>
hfc

the method used for onset detection

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{onsetDetection: 'the value of the detection function in the current frame'}

Details

OnsetDetectionGlobal( signal [, frameSize [, hopSize [, method [, sampleRate ] ] ] ] ) → {object}

Description

This algorithm computes various onset detection functions. Detection values are computed frame-wise for a given input signal. The output of this algorithm should be post-processed in order to determine whether the frame contains an onset or not. Namely, it could be fed to the Onsets algorithm. The following methods are available:
- 'infogain', the spectral difference measured by the modified information gain [1]. For each frame, it accounts for energy change in between preceding and following frames, histogrammed together, in order to suppress short-term variations on a frame-by-frame basis.
- 'beat_emphasis', the beat emphasis function [1]. This function is a linear combination of onset detection functions (complex spectral differences) in a number of sub-bands, weighted by their beat strength computed over the entire input signal.
Note:
- 'infogain' onset detection has been optimized for the default sampleRate=44100Hz, frameSize=2048, hopSize=512.
- 'beat_emphasis' is optimized for a fixed resolution of 11.6ms, which corresponds to the default sampleRate=44100Hz, frameSize=1024, hopSize=512. Optimal performance of beat detection with TempoTapDegara is not guaranteed for other settings.
Check https://essentia.upf.edu/reference/std_OnsetDetectionGlobal.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

frameSize number <optional>
2048

the frame size for computing onset detection function

hopSize number <optional>
512

the hop size for computing onset detection function

method string <optional>
infogain

the method used for onset detection

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{onsetDetections: 'the frame-wise values of the detection function'}
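
The 11.6 ms resolution quoted for 'beat_emphasis' is the duration of one hop at the default settings. A quick way to check whether other settings keep that resolution is sketched below; `hopDurationMs` is a hypothetical helper, not part of the library.

```javascript
// Hypothetical helper: duration of one hop in milliseconds.
function hopDurationMs(hopSize = 512, sampleRate = 44100) {
  return (1000 * hopSize) / sampleRate;
}
```

For example, audio sampled at 22050 Hz would need `hopSize = 256` to keep the same time resolution as the defaults.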

Details

OnsetRate( signal ) → {object}

Description

This algorithm computes the number of onsets per second and their position in time for an audio signal. Onset detection functions are computed using both high frequency content and complex-domain methods available in OnsetDetection algorithm. See OnsetDetection for more information. Please note that due to a dependence on the Onsets algorithm, this algorithm is only valid for audio signals with a sampling rate of 44100Hz. This algorithm throws an exception if the input signal is empty. Check https://essentia.upf.edu/reference/std_OnsetRate.html for more details.

Parameters
Name Type Description
signal VectorFloat

the input signal

Returns

{onsets: 'the positions of detected onsets [s]', onsetRate: 'the number of onsets per second'}

Details

OverlapAdd( signal [, frameSize [, gain [, hopSize ] ] ] ) → {object}

Description

This algorithm returns the output of an overlap-add process for a sequence of frames of an audio signal. It considers that the input audio frames are windowed audio signals. Given the size of the frame and the hop size, overlapping and adding consecutive frames will produce a continuous signal. A normalization gain can be passed as a parameter. Check https://essentia.upf.edu/reference/std_OverlapAdd.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the windowed input audio frame

frameSize number <optional>
2048

the frame size for computing the overlap-add process

gain number <optional>
1

the normalization gain that scales the output signal. Useful for IFFT output

hopSize number <optional>
128

the hop size with which the overlap-add function is computed

Returns

{signal: 'the output overlap-add audio signal frame'}
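
The overlap-add process itself is easy to picture with an offline plain-JavaScript sketch that takes all frames at once. This differs from the method above, which is documented as taking one windowed frame per call; `overlapAddSketch` is a hypothetical name, and applying `gain` as a simple multiplier is an assumption.

```javascript
// Offline overlap-add sketch: frame k is added into the output starting
// at sample k * hopSize. All frames are assumed to have the same length.
function overlapAddSketch(frames, hopSize, gain = 1) {
  if (frames.length === 0) return new Float32Array(0);
  const frameSize = frames[0].length;
  const out = new Float32Array((frames.length - 1) * hopSize + frameSize);
  frames.forEach((frame, k) => {
    for (let i = 0; i < frameSize; i++) {
      out[k * hopSize + i] += gain * frame[i]; // assumed: gain is a plain multiplier
    }
  });
  return out;
}
```

Two frames of four ones with a hop of 2 overlap in the middle two samples, giving `[1, 1, 2, 2, 1, 1]`.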

Details

PeakDetection( array [, interpolate [, maxPeaks [, maxPosition [, minPeakDistance [, minPosition [, orderBy [, range [, threshold ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm detects local maxima (peaks) in an array. The algorithm finds positive slopes and detects a peak when the slope changes sign and the peak is above the threshold. It optionally interpolates using parabolic curve fitting. When two consecutive peaks are closer than the minPeakDistance parameter, the smallest one is discarded. A value of 0 bypasses this feature. Check https://essentia.upf.edu/reference/std_PeakDetection.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array

interpolate boolean <optional>
true

boolean flag to enable interpolation

maxPeaks number <optional>
100

the maximum number of returned peaks

maxPosition number <optional>
1

the maximum value of the range to evaluate

minPeakDistance number <optional>
0

minimum distance between consecutive peaks (0 to bypass this feature)

minPosition number <optional>
0

the minimum value of the range to evaluate

orderBy string <optional>
position

the ordering type of the output peaks (ascending by position or descending by value)

range number <optional>
1

the input range

threshold number <optional>
-1e+06

peaks below this given threshold are not output

Returns

{positions: 'the positions of the peaks', amplitudes: 'the amplitudes of the peaks'}
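
The core of the detection rule (a rising slope followed by a falling slope, with the peak above the threshold) can be sketched in plain JavaScript. This is a simplified illustration under stated assumptions: `detectPeaksSketch` is a hypothetical name, positions are raw sample indices rather than values scaled by `range`, and interpolation, `minPeakDistance`, `maxPeaks` and plateau handling are left out.

```javascript
// Simplified local-maximum detector: no interpolation, no plateau handling,
// positions reported as sample indices.
function detectPeaksSketch(array, threshold = -1e6) {
  const positions = [];
  const amplitudes = [];
  for (let i = 1; i < array.length - 1; i++) {
    const rising = array[i] > array[i - 1]; // positive slope before i
    const falling = array[i] > array[i + 1]; // negative slope after i
    if (rising && falling && array[i] >= threshold) {
      positions.push(i);
      amplitudes.push(array[i]);
    }
  }
  return { positions, amplitudes };
}
```

For `[0, 1, 0, 2, 0]` the sketch reports peaks at indices 1 and 3; raising the threshold to 1.5 keeps only the peak at index 3.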

Details

PercivalBpmEstimator( signal [, frameSize [, frameSizeOSS [, hopSize [, hopSizeOSS [, maxBPM [, minBPM [, sampleRate ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the tempo in beats per minute (BPM) from an input signal as described in [1]. Check https://essentia.upf.edu/reference/std_PercivalBpmEstimator.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

input signal

frameSize number <optional>
1024

frame size for the analysis of the input signal

frameSizeOSS number <optional>
2048

frame size for the analysis of the Onset Strength Signal

hopSize number <optional>
128

hop size for the analysis of the input signal

hopSizeOSS number <optional>
128

hop size for the analysis of the Onset Strength Signal

maxBPM number <optional>
210

maximum BPM to detect

minBPM number <optional>
50

minimum BPM to detect

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{bpm: 'the tempo estimation [bpm]'}

Details

PercivalEnhanceHarmonics( array ) → {object}

Description

This algorithm implements the 'Enhance Harmonics' step as described in [1]. Given an input autocorrelation signal, two time-stretched versions of it (by factors of 2 and 4) are added to the original. In this way, peaks with a harmonic relation are boosted. For more details, see the referenced paper. Check https://essentia.upf.edu/reference/std_PercivalEnhanceHarmonics.html for more details.

Parameters
Name Type Description
array VectorFloat

the input signal

Returns

{array: 'the input signal with enhanced harmonics'}

Details
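
One simple reading of the 'Enhance Harmonics' step is sketched below: for each lag, the autocorrelation values found at twice and four times that lag are added to it, so a lag whose multiples also carry peaks is boosted. The function name `enhanceHarmonicsSketch` and the handling of the array tail (left unchanged once `4 * i` runs past the end) are assumptions made for illustration, not a statement about the Essentia source.

```javascript
// Hypothetical helper, not part of essentia.js.
// input: autocorrelation values indexed by lag.
function enhanceHarmonicsSketch(input) {
  const output = Array.from(input);
  // Only lags whose 4x multiple is still inside the array are enhanced.
  for (let i = 0; 4 * i < input.length; i++) {
    output[i] = input[i] + input[2 * i] + input[4 * i];
  }
  return output;
}

console.log(enhanceHarmonicsSketch([1, 2, 3, 4, 5, 6, 7, 8]));
```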

PercivalEvaluatePulseTrains( oss, positions ) → {object}

Description

This algorithm implements the 'Evaluate Pulse Trains' step as described in [1]. Given an input onset strength signal (OSS) and a number of candidate tempo lag positions, the OSS is correlated with ideal expected pulse trains (for each candidate tempo lag) shifted in time by different amounts. The candidate tempo lag that generates the pulse train that best correlates with the OSS is returned as the preferred tempo candidate. For more details, see the referenced paper. Check https://essentia.upf.edu/reference/std_PercivalEvaluatePulseTrains.html for more details.

Parameters
Name Type Description
oss VectorFloat

onset strength signal (or other novelty curve)

positions VectorFloat

peak positions of BPM candidates

Returns

{lag: 'best tempo lag estimate'}

Details

PitchContourSegmentation( pitch, signal [, hopSize [, minDuration [, pitchDistanceThreshold [, rmsThreshold [, sampleRate [, tuningFrequency ] ] ] ] ] ] ) → {object}

Description

This algorithm converts a pitch sequence estimated from an audio signal into a set of discrete note events. Each note is defined by its onset time, duration and MIDI pitch value, quantized to the equal tempered scale. Check https://essentia.upf.edu/reference/std_PitchContourSegmentation.html for more details.

Parameters
Name Type Attributes Default Description
pitch VectorFloat

estimated pitch contour [Hz]

signal VectorFloat

input audio signal

hopSize number <optional>
128

hop size of the extracted pitch

minDuration number <optional>
0.1

minimum note duration [s]

pitchDistanceThreshold number <optional>
60

pitch threshold for note segmentation [cents]

rmsThreshold number <optional>
-2

zscore threshold for note segmentation

sampleRate number <optional>
44100

sample rate of the audio signal

tuningFrequency number <optional>
440

tuning reference frequency [Hz]

Returns

{onset: 'note onset times [s]', duration: 'note durations [s]', MIDIpitch: 'quantized MIDI pitch value'}

Details
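
The quantization of a pitch in Hz to a MIDI note on the equal-tempered scale, relative to `tuningFrequency`, follows the standard formula 69 + 12·log2(f / tuning). The sketch below shows only that conversion; the helper name `hzToMidiNote` is hypothetical and the note segmentation itself is not reproduced.

```javascript
// Hypothetical helper, not part of essentia.js.
// MIDI note 69 is A4, which sounds at the tuning frequency (440 Hz by default).
function hzToMidiNote(frequencyHz, tuningFrequency = 440) {
  return Math.round(69 + 12 * Math.log2(frequencyHz / tuningFrequency));
}

console.log(hzToMidiNote(440));    // A4
console.log(hzToMidiNote(261.63)); // middle C
```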

PitchContours( peakBins, peakSaliences [, binResolution [, hopSize [, minDuration [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, sampleRate [, timeContinuity ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm tracks a set of predominant pitch contours of an audio signal. This algorithm is intended to receive its "peakBins" and "peakSaliences" inputs from the PitchSalienceFunctionPeaks algorithm outputs aggregated over all frames in the sequence. The output is the set of tracked contours (frame-wise cent bins, saliences, and start times) together with the duration of the input signal. Check https://essentia.upf.edu/reference/std_PitchContours.html for more details.

Parameters
Name Type Attributes Default Description
peakBins VectorVectorFloat

frame-wise array of cent bins corresponding to pitch salience function peaks

peakSaliences VectorVectorFloat

frame-wise array of values of salience function peaks

binResolution number <optional>
10

salience function bin resolution [cents]

hopSize number <optional>
128

the hop size with which the pitch salience function was computed

minDuration number <optional>
100

the minimum allowed contour duration [ms]

peakDistributionThreshold number <optional>
0.9

allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)

peakFrameThreshold number <optional>
0.9

per-frame salience threshold factor (fraction of the highest peak salience in a frame)

pitchContinuity number <optional>
27.5625

pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

timeContinuity number <optional>
100

time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]

Returns

{contoursBins: 'array of frame-wise vectors of cent bin values representing each contour', contoursSaliences: 'array of frame-wise vectors of pitch saliences representing each contour', contoursStartTimes: 'array of start times of each contour [s]', duration: 'time duration of the input signal [s]'}

Details
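
The PitchContours parameters are given in milliseconds and cents per millisecond, while contours are tracked frame by frame. The sketch below converts them to per-frame quantities using `hopSize` and `sampleRate`. It is plain arithmetic on the documented defaults; the helper name `contourLimitsPerFrame` and the shape of its result are hypothetical.

```javascript
// Hypothetical helper, not part of essentia.js.
function contourLimitsPerFrame({
  hopSize = 128,
  sampleRate = 44100,
  minDuration = 100,       // [ms]
  timeContinuity = 100,    // [ms]
  pitchContinuity = 27.5625, // [cents per ms]
  binResolution = 10,      // [cents per bin]
} = {}) {
  // Duration of one analysis hop in milliseconds.
  const frameMs = (hopSize / sampleRate) * 1000;
  const maxPitchChangeCents = pitchContinuity * frameMs;
  return {
    frameMs,
    minDurationFrames: minDuration / frameMs,
    timeContinuityFrames: timeContinuity / frameMs,
    maxPitchChangeCents,
    maxPitchChangeBins: maxPitchChangeCents / binResolution,
  };
}

console.log(contourLimitsPerFrame());
```

With the defaults, one hop lasts about 2.9 ms, so the pitch continuity cue works out to roughly 80 cents (8 bins) between consecutive frames.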

PitchContoursMelody( contoursBins, contoursSaliences, contoursStartTimes, duration [, binResolution [, filterIterations [, guessUnvoiced [, hopSize [, maxFrequency [, minFrequency [, referenceFrequency [, sampleRate [, voiceVibrato [, voicingTolerance ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm converts a set of pitch contours into a sequence of predominant f0 values in Hz by taking the value of the most predominant contour in each frame. This algorithm is intended to receive its "contoursBins", "contoursSaliences", and "contoursStartTimes" inputs from the PitchContours algorithm. The "duration" input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values and a vector of confidence values. Check https://essentia.upf.edu/reference/std_PitchContoursMelody.html for more details.

Parameters
Name Type Attributes Default Description
contoursBins VectorVectorFloat

array of frame-wise vectors of cent bin values representing each contour

contoursSaliences VectorVectorFloat

array of frame-wise vectors of pitch saliences representing each contour

contoursStartTimes VectorFloat

array of the start times of each contour [s]

duration number

time duration of the input signal [s]

binResolution number <optional>
10

salience function bin resolution [cents]

filterIterations number <optional>
3

number of iterations for the octave errors / pitch outlier filtering process

guessUnvoiced boolean <optional>
false

Estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame

hopSize number <optional>
128

the hop size with which the pitch salience function was computed

maxFrequency number <optional>
20000

the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]

minFrequency number <optional>
80

the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]

referenceFrequency number <optional>
55

the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

voiceVibrato boolean <optional>
false

detect voice vibrato

voicingTolerance number <optional>
0.2

allowed deviation below the average contour mean salience of all contours (fraction of the standard deviation)

Returns

{pitch: 'vector of estimated pitch values (i.e., melody) [Hz]', pitchConfidence: 'confidence with which the pitch was detected'}

Details

PitchContoursMonoMelody( contoursBins, contoursSaliences, contoursStartTimes, duration [, binResolution [, filterIterations [, guessUnvoiced [, hopSize [, maxFrequency [, minFrequency [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm converts a set of pitch contours into a sequence of f0 values in Hz by taking the value of the most salient contour in each frame. In contrast to PitchContoursMelody, it assumes a single source. This algorithm is intended to receive its "contoursBins", "contoursSaliences", and "contoursStartTimes" inputs from the PitchContours algorithm. The "duration" input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values and a vector of confidence values. Check https://essentia.upf.edu/reference/std_PitchContoursMonoMelody.html for more details.

Parameters
Name Type Attributes Default Description
contoursBins VectorVectorFloat

array of frame-wise vectors of cent bin values representing each contour

contoursSaliences VectorVectorFloat

array of frame-wise vectors of pitch saliences representing each contour

contoursStartTimes VectorFloat

array of the start times of each contour [s]

duration number

time duration of the input signal [s]

binResolution number <optional>
10

salience function bin resolution [cents]

filterIterations number <optional>
3

number of iterations for the octave errors / pitch outlier filtering process

guessUnvoiced boolean <optional>
false

Estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame

hopSize number <optional>
128

the hop size with which the pitch salience function was computed

maxFrequency number <optional>
20000

the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]

minFrequency number <optional>
80

the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]

referenceFrequency number <optional>
55

the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{pitch: 'vector of estimated pitch values (i.e., melody) [Hz]', pitchConfidence: 'confidence with which the pitch was detected'}

Details
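
The idea of "taking the value of the most salient contour in each frame" can be sketched as follows. This is a deliberately reduced illustration under stated assumptions: plain nested arrays stand in for VectorVectorFloat, each contour is assumed to occupy consecutive frames starting at its start time, and the octave-error filtering, `guessUnvoiced`, frequency limits and confidence output are left out. The function name `pickMostSalientContour` is hypothetical.

```javascript
// Hypothetical helper, not part of essentia.js.
function pickMostSalientContour(
  contoursBins, contoursSaliences, contoursStartTimes, duration,
  hopSize = 128, sampleRate = 44100, binResolution = 10, referenceFrequency = 55
) {
  const frameDuration = hopSize / sampleRate; // seconds per frame
  const numFrames = Math.round(duration / frameDuration);
  const pitch = new Array(numFrames).fill(0);            // 0 Hz marks "no contour"
  const bestSalience = new Array(numFrames).fill(-Infinity);

  contoursBins.forEach((bins, c) => {
    const startFrame = Math.round(contoursStartTimes[c] / frameDuration);
    bins.forEach((bin, k) => {
      const frame = startFrame + k;
      const salience = contoursSaliences[c][k];
      if (frame < numFrames && salience > bestSalience[frame]) {
        bestSalience[frame] = salience;
        // Cent bin back to Hz, relative to the reference frequency.
        pitch[frame] = referenceFrequency * Math.pow(2, (bin * binResolution) / 1200);
      }
    });
  });
  return pitch;
}
```

In frames covered by several contours the one with the larger salience wins; frames covered by no contour keep the value 0.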

PitchContoursMultiMelody( contoursBins, contoursSaliences, contoursStartTimes, duration [, binResolution [, filterIterations [, guessUnvoiced [, hopSize [, maxFrequency [, minFrequency [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm post-processes a set of pitch contours into a sequence of multiple f0 values in Hz. This algorithm is intended to receive its "contoursBins", "contoursSaliences", and "contoursStartTimes" inputs from the PitchContours algorithm. The "duration" input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values. Check https://essentia.upf.edu/reference/std_PitchContoursMultiMelody.html for more details.

Parameters
Name Type Attributes Default Description
contoursBins VectorVectorFloat

array of frame-wise vectors of cent bin values representing each contour

contoursSaliences VectorVectorFloat

array of frame-wise vectors of pitch saliences representing each contour

contoursStartTimes VectorFloat

array of the start times of each contour [s]

duration number

time duration of the input signal [s]

binResolution number <optional>
10

salience function bin resolution [cents]

filterIterations number <optional>
3

number of iterations for the octave errors / pitch outlier filtering process

guessUnvoiced boolean <optional>
false

Estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame

hopSize number <optional>
128

the hop size with which the pitch salience function was computed

maxFrequency number <optional>
20000

the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]

minFrequency number <optional>
80

the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]

referenceFrequency number <optional>
55

the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{pitch: 'vector of estimated pitch values (i.e., melody) [Hz]'}

Details

PitchFilter( pitch, pitchConfidence [, confidenceThreshold [, minChunkSize [, useAbsolutePitchConfidence ] ] ] ) → {object}

Description

This algorithm corrects the fundamental frequency estimations for a sequence of frames given pitch values together with their confidence values. In particular, it removes non-confident parts and spurious jumps in pitch and applies octave corrections. Check https://essentia.upf.edu/reference/std_PitchFilter.html for more details.

Parameters
Name Type Attributes Default Description
pitch VectorFloat

vector of pitch values for the input frames [Hz]

pitchConfidence VectorFloat

vector of pitch confidence values for the input frames

confidenceThreshold number <optional>
36

ratio between the average confidence of the most confident chunk and the minimum allowed average confidence of a chunk

minChunkSize number <optional>
30

minimum number of frames in non-zero pitch chunks

useAbsolutePitchConfidence boolean <optional>
false

treat negative pitch confidence values as positive (use with melodia guessUnvoiced=True)

Returns

{pitchFiltered: 'vector of corrected pitch values [Hz]'}

Details

PitchMelodia( signal [, binResolution [, filterIterations [, frameSize [, guessUnvoiced [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minDuration [, minFrequency [, numberHarmonics [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, referenceFrequency [, sampleRate [, timeContinuity ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the fundamental frequency corresponding to the melody of a monophonic music signal based on the MELODIA algorithm. While the algorithm is originally designed to extract the predominant melody from polyphonic music [1], this implementation is adapted for monophonic signals. The approach is based on the creation and characterization of pitch contours, time continuous sequences of pitch candidates grouped using auditory streaming cues. To this end, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours, and PitchContoursMonoMelody algorithms are employed. It is strongly advised to use the default parameter values which are optimized according to [1] (where further details are provided) except for minFrequency and maxFrequency, which will depend on your application. Check https://essentia.upf.edu/reference/std_PitchMelodia.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

binResolution number <optional>
10

salience function bin resolution [cents]

filterIterations number <optional>
3

number of iterations for the octave errors / pitch outlier filtering process

frameSize number <optional>
2048

the frame size for computing pitch salience

guessUnvoiced boolean <optional>
false

estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame

harmonicWeight number <optional>
0.8

harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)

hopSize number <optional>
128

the hop size with which the pitch salience function was computed

magnitudeCompression number <optional>
1

magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)

magnitudeThreshold number <optional>
40

spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)

maxFrequency number <optional>
20000

the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]

minDuration number <optional>
100

the minimum allowed contour duration [ms]

minFrequency number <optional>
40

the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]

numberHarmonics number <optional>
20

number of considered harmonics

peakDistributionThreshold number <optional>
0.9

allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)

peakFrameThreshold number <optional>
0.9

per-frame salience threshold factor (fraction of the highest peak salience in a frame)

pitchContinuity number <optional>
27.5625

pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]

referenceFrequency number <optional>
55

the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

timeContinuity number <optional>
100

time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]

Returns

{pitch: 'the estimated pitch values [Hz]', pitchConfidence: 'confidence with which the pitch was detected'}

Details
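
Because the optional parameters of PitchMelodia are positional, changing only minFrequency and maxFrequency means passing every parameter that precedes them in the signature. The sketch below spells out the documented defaults for those positions. The wrapper name `pitchMelodiaWithRange` is hypothetical, `essentia` is assumed to be an already constructed Essentia instance, and `signalVector` is assumed to be a VectorFloat.

```javascript
// Hypothetical wrapper around the documented PitchMelodia signature.
function pitchMelodiaWithRange(essentia, signalVector, minFrequency, maxFrequency) {
  return essentia.PitchMelodia(
    signalVector,
    10,           // binResolution [cents]
    3,            // filterIterations
    2048,         // frameSize
    false,        // guessUnvoiced
    0.8,          // harmonicWeight
    128,          // hopSize
    1,            // magnitudeCompression
    40,           // magnitudeThreshold [dB]
    maxFrequency, // [Hz]
    100,          // minDuration [ms]
    minFrequency  // [Hz]
    // remaining parameters are omitted and keep their defaults
  );
}
```

The returned object carries the `pitch` and `pitchConfidence` fields listed under Returns.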

PitchSalience( spectrum [, highBoundary [, lowBoundary [, sampleRate ] ] ] ) → {object}

Description

This algorithm computes the pitch salience of a spectrum. The pitch salience is given by the ratio of the highest auto correlation value of the spectrum to the non-shifted auto correlation value. Pitch salience was designed as a quick measure of tone sensation. Unpitched sounds (non-musical sound effects) and pure tones have an average pitch salience value close to 0 whereas sounds containing several harmonics in the spectrum tend to have a higher value. Check https://essentia.upf.edu/reference/std_PitchSalience.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input audio spectrum

highBoundary number <optional>
5000

until which frequency we are looking for the minimum (must be smaller than half sampleRate) [Hz]

lowBoundary number <optional>
100

from which frequency we are looking for the maximum (must not be larger than highBoundary) [Hz]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{pitchSalience: 'the pitch salience (normalized from 0 to 1)'}

Details
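
The ratio described for PitchSalience, highest shifted autocorrelation over the non-shifted one, can be illustrated with a direct autocorrelation of the spectrum bins. This is a sketch under assumptions: the lag search range is given directly in bins (`lowBin`, `highBin`) rather than derived from lowBoundary, highBoundary and sampleRate, and the name `pitchSalienceSketch` is hypothetical.

```javascript
// Hypothetical helper, not part of essentia.js.
function pitchSalienceSketch(spectrum, lowBin, highBin) {
  const n = spectrum.length;
  // Autocorrelation of the spectrum at a given bin lag.
  const autocorr = (lag) => {
    let sum = 0;
    for (let i = 0; i + lag < n; i++) sum += spectrum[i] * spectrum[i + lag];
    return sum;
  };
  const zeroLag = autocorr(0);
  if (zeroLag === 0) return 0; // silent spectrum
  let best = 0;
  for (let lag = lowBin; lag <= highBin && lag < n; lag++) {
    best = Math.max(best, autocorr(lag));
  }
  return best / zeroLag;
}

// A spectrum with four equally spaced harmonics versus a single pure tone.
const harmonic = new Array(64).fill(0);
[10, 20, 30, 40].forEach((bin) => { harmonic[bin] = 1; });
const pureTone = new Array(64).fill(0);
pureTone[10] = 1;
console.log(pitchSalienceSketch(harmonic, 5, 50), pitchSalienceSketch(pureTone, 5, 50));
```

The harmonic spectrum lines up with itself when shifted by the harmonic spacing, giving a high ratio, while the single tone never overlaps a shifted copy of itself, which matches the behaviour described above.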

PitchSalienceFunction( frequencies, magnitudes [, binResolution [, harmonicWeight [, magnitudeCompression [, magnitudeThreshold [, numberHarmonics [, referenceFrequency ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the pitch salience function of a signal frame given its spectral peaks. The salience function covers a pitch range of nearly five octaves (i.e., 6000 cents), starting from the "referenceFrequency", and is quantized into cent bins according to the specified "binResolution". The salience of a given frequency is computed as the sum of the weighted energies found at integer multiples (harmonics) of that frequency. Check https://essentia.upf.edu/reference/std_PitchSalienceFunction.html for more details.

Parameters
Name Type Attributes Default Description
frequencies VectorFloat

the frequencies of the spectral peaks [Hz]

magnitudes VectorFloat

the magnitudes of the spectral peaks

binResolution number <optional>
10

salience function bin resolution [cents]

harmonicWeight number <optional>
0.8

harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)

magnitudeCompression number <optional>
1

magnitude compression parameter (=0 for maximum compression, =1 for no compression)

magnitudeThreshold number <optional>
40

peak magnitude threshold (maximum allowed difference from the highest peak in dBs)

numberHarmonics number <optional>
20

number of considered harmonics

referenceFrequency number <optional>
55

the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin

Returns

{salienceFunction: 'array of the quantized pitch salience values'}

Details
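
The quantization into cent bins relative to `referenceFrequency` can be sketched with the usual Hz-to-cents relation, cents = 1200·log2(f / reference). The helper names `hzToCentBin` and `centBinToHz` are hypothetical, and rounding to the nearest bin is an assumption made for this illustration.

```javascript
// Hypothetical helpers, not part of essentia.js.
function hzToCentBin(frequencyHz, referenceFrequency = 55, binResolution = 10) {
  const cents = 1200 * Math.log2(frequencyHz / referenceFrequency);
  return Math.round(cents / binResolution);
}

function centBinToHz(bin, referenceFrequency = 55, binResolution = 10) {
  return referenceFrequency * Math.pow(2, (bin * binResolution) / 1200);
}

console.log(hzToCentBin(110));  // one octave above the reference
console.log(centBinToHz(600));  // five octaves above the reference
```

With the defaults, 6000 cents at 10 cents per bin gives 600 bins, and bin 600 maps back to 1760 Hz, which is the default maxFrequency of PitchSalienceFunctionPeaks.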

PitchSalienceFunctionPeaks( salienceFunction [, binResolution [, maxFrequency [, minFrequency [, referenceFrequency ] ] ] ] ) → {object}

Description

This algorithm computes the peaks of a given pitch salience function. Check https://essentia.upf.edu/reference/std_PitchSalienceFunctionPeaks.html for more details.

Parameters
Name Type Attributes Default Description
salienceFunction VectorFloat

the array of salience function values corresponding to cent frequency bins

binResolution number <optional>
10

salience function bin resolution [cents]

maxFrequency number <optional>
1760

the maximum frequency to evaluate (ignore peaks above) [Hz]

minFrequency number <optional>
55

the minimum frequency to evaluate (ignore peaks below) [Hz]

referenceFrequency number <optional>
55

the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin

Returns

{salienceBins: 'the cent bins corresponding to salience function peaks', salienceValues: 'the values of salience function peaks'}

Details

PitchYin( signal [, frameSize [, interpolate [, maxFrequency [, minFrequency [, sampleRate [, tolerance ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the fundamental frequency given the frame of a monophonic music signal. It is an implementation of the Yin algorithm [1] for computations in the time domain. Check https://essentia.upf.edu/reference/std_PitchYin.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal frame

frameSize number <optional>
2048

number of samples in the input frame (this is an optional parameter to optimize memory allocation)

interpolate boolean <optional>
true

enable interpolation

maxFrequency number <optional>
22050

the maximum allowed frequency [Hz]

minFrequency number <optional>
20

the minimum allowed frequency [Hz]

sampleRate number <optional>
44100

sampling rate of the input audio [Hz]

tolerance number <optional>
0.15

tolerance for peak detection

Returns

{pitch: 'detected pitch [Hz]', pitchConfidence: 'confidence with which the pitch was detected [0,1]'}

Details
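
For orientation, the textbook steps of the time-domain Yin method (difference function, cumulative mean normalized difference, first dip below the tolerance) can be sketched as below. This is not the Essentia code: it omits interpolation, the name `yinPitchSketch` is hypothetical, and reporting confidence as one minus the normalized difference at the chosen lag is an assumption.

```javascript
// Hypothetical helper, not part of essentia.js.
function yinPitchSketch(frame, sampleRate = 44100, tolerance = 0.15,
                        minFrequency = 20, maxFrequency = 22050) {
  const half = Math.floor(frame.length / 2);
  const tauMin = Math.max(2, Math.floor(sampleRate / maxFrequency));
  const tauMax = Math.min(half - 1, Math.ceil(sampleRate / minFrequency));

  // Step 1: difference function d(tau) over a window of `half` samples.
  const diff = new Float64Array(tauMax + 1);
  for (let tau = 1; tau <= tauMax; tau++) {
    let sum = 0;
    for (let j = 0; j < half; j++) {
      const delta = frame[j] - frame[j + tau];
      sum += delta * delta;
    }
    diff[tau] = sum;
  }

  // Step 2: cumulative mean normalized difference d'(tau).
  const cmnd = new Float64Array(tauMax + 1);
  cmnd[0] = 1;
  let running = 0;
  for (let tau = 1; tau <= tauMax; tau++) {
    running += diff[tau];
    cmnd[tau] = running > 0 ? (diff[tau] * tau) / running : 1;
  }

  // Step 3: first lag that dips below the tolerance, followed to its local minimum.
  for (let tau = tauMin; tau <= tauMax; tau++) {
    if (cmnd[tau] < tolerance) {
      while (tau + 1 <= tauMax && cmnd[tau + 1] < cmnd[tau]) tau++;
      return { pitch: sampleRate / tau, pitchConfidence: 1 - cmnd[tau] };
    }
  }
  return { pitch: 0, pitchConfidence: 0 }; // no periodicity found
}

// Usage: a 2048-sample frame of a 220 Hz sine at 44100 Hz.
const sine = Float32Array.from({ length: 2048 }, (_, n) =>
  Math.sin((2 * Math.PI * 220 * n) / 44100));
console.log(yinPitchSketch(sine));
```

Without interpolation the estimate is limited to integer lags, which is one reason the `interpolate` flag exists.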

PitchYinFFT( spectrum [, frameSize [, interpolate [, maxFrequency [, minFrequency [, sampleRate [, tolerance ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the fundamental frequency given the spectrum of a monophonic music signal. It is an implementation of the YinFFT algorithm [1], which is an optimized version of the Yin algorithm for computation in the frequency domain. It is recommended to window the input spectrum with a Hann window. The raw spectrum can be computed with the Spectrum algorithm. Check https://essentia.upf.edu/reference/std_PitchYinFFT.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input spectrum (preferably created with a Hann window)

frameSize number <optional>
2048

number of samples in the input spectrum

interpolate boolean <optional>
true

boolean flag to enable interpolation

maxFrequency number <optional>
22050

the maximum allowed frequency [Hz]

minFrequency number <optional>
20

the minimum allowed frequency [Hz]

sampleRate number <optional>
44100

sampling rate of the input spectrum [Hz]

tolerance number <optional>
1

tolerance for peak detection

Returns

{pitch: 'detected pitch [Hz]', pitchConfidence: 'confidence with which the pitch was detected [0,1]'}

Details
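
The Hann windowing recommended for PitchYinFFT is applied to the time-domain frame before its spectrum is computed. The sketch below generates a symmetric Hann window and multiplies it into a frame. The helper names `hannWindow` and `applyWindow` are hypothetical, and the symmetric form (dividing by size - 1) is an assumption about the variant used.

```javascript
// Hypothetical helpers, not part of essentia.js.
function hannWindow(size) {
  const w = new Float32Array(size);
  if (size === 1) {
    w[0] = 1; // degenerate case, avoids division by zero
    return w;
  }
  for (let n = 0; n < size; n++) {
    w[n] = 0.5 - 0.5 * Math.cos((2 * Math.PI * n) / (size - 1));
  }
  return w;
}

function applyWindow(frame, window) {
  // Element-wise product of the frame and the window.
  return Float32Array.from(frame, (x, i) => x * window[i]);
}

const windowed = applyWindow([1, 1, 1, 1, 1], hannWindow(5));
console.log(windowed);
```

The window tapers the frame to zero at both ends, which reduces spectral leakage in the spectrum that is then passed to the pitch estimator.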

PitchYinProbabilistic( signal [, frameSize [, hopSize [, lowRMSThreshold [, outputUnvoiced [, preciseTime [, sampleRate ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the pitch track of a mono audio signal using the probabilistic Yin algorithm. Check https://essentia.upf.edu/reference/std_PitchYinProbabilistic.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input mono audio signal

frameSize number <optional>
2048

the frame size of FFT

hopSize number <optional>
256

the hop size with which the pitch is computed

lowRMSThreshold number <optional>
0.1

the low RMS amplitude threshold

outputUnvoiced string <optional>
negative

how unvoiced frames are output. zero: output non-voiced pitch as 0; abs: output non-voiced pitch as absolute values; negative: output non-voiced pitch as negative values

preciseTime boolean <optional>
false

use non-standard precise YIN timing (slow).

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{pitch: 'the output pitch estimations', voicedProbabilities: 'the voiced probabilities'}

Details
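
The three `outputUnvoiced` modes can be illustrated on a pitch track in which unvoiced frames are marked by negative values. That input convention and the helper name `formatUnvoiced` are assumptions for this sketch; it only shows how each mode would present such frames.

```javascript
// Hypothetical helper, not part of essentia.js.
// Assumes unvoiced frames are marked by negative pitch values in the input.
function formatUnvoiced(pitchTrack, outputUnvoiced = 'negative') {
  const modes = ['zero', 'abs', 'negative'];
  if (!modes.includes(outputUnvoiced)) {
    throw new Error(`unknown outputUnvoiced mode: ${outputUnvoiced}`);
  }
  return pitchTrack.map((p) => {
    if (p >= 0) return p;                        // voiced frame, unchanged
    if (outputUnvoiced === 'zero') return 0;     // hide the unvoiced estimate
    if (outputUnvoiced === 'abs') return Math.abs(p); // keep it as a plain pitch
    return p;                                    // 'negative': keep the marker
  });
}

console.log(formatUnvoiced([220, -110, 330], 'zero'));
console.log(formatUnvoiced([220, -110, 330], 'abs'));
```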

PitchYinProbabilities( signal [, frameSize [, lowAmp [, preciseTime [, sampleRate ] ] ] ] ) → {object}

Description

This algorithm estimates the fundamental frequencies and their probabilities given the frame of a monophonic music signal. It is a part of the implementation of the probabilistic Yin algorithm [1]. Check https://essentia.upf.edu/reference/std_PitchYinProbabilities.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal frame

frameSize number <optional>
2048

number of samples in the input frame

lowAmp number <optional>
0.1

the low RMS amplitude threshold

preciseTime boolean <optional>
false

use non-standard precise YIN timing (slow).

sampleRate number <optional>
44100

sampling rate of the input audio [Hz]

Returns

{pitch: 'the output pitch candidate frequencies in cents', probabilities: 'the output pitch candidate probabilities', RMS: 'the output RMS value'}

Details

PitchYinProbabilitiesHMM( pitchCandidates, probabilities [, minFrequency [, numberBinsPerSemitone [, selfTransition [, yinTrust ] ] ] ] ) → {object}

Description

This algorithm estimates the smoothed fundamental frequency given the pitch candidates and probabilities using hidden Markov models. It is a part of the implementation of the probabilistic Yin algorithm [1]. Check https://essentia.upf.edu/reference/std_PitchYinProbabilitiesHMM.html for more details.

Parameters
Name Type Attributes Default Description
pitchCandidates VectorVectorFloat

the pitch candidates

probabilities VectorVectorFloat

the pitch probabilities

minFrequency number <optional>
61.735

minimum detected frequency

numberBinsPerSemitone number <optional>
5

number of bins per semitone

selfTransition number <optional>
0.99

the self transition probabilities

yinTrust number <optional>
0.5

the yin trust parameter

Returns

{pitch: 'pitch frequencies in Hz'}

Details

PowerMean( array [, power ] ) → {object}

Description

This algorithm computes the power mean of an array. It accepts one parameter, p, which is the power (or order or degree) of the Power Mean. Note that if p=-1, the Power Mean is equal to the Harmonic Mean, if p=0, the Power Mean is equal to the Geometric Mean, if p=1, the Power Mean is equal to the Arithmetic Mean, if p=2, the Power Mean is equal to the Root Mean Square. Check https://essentia.upf.edu/reference/std_PowerMean.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array (must contain only positive real numbers)

power number <optional>
1

the power to which to elevate each element before taking the mean

Returns

{powerMean: 'the power mean of the input array'}

Details
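
The special cases listed for PowerMean follow from the general definition, the p-th root of the mean of the p-th powers, with the geometric mean as the limit for p = 0. A minimal sketch, assuming positive inputs as the parameter description requires; the helper name `powerMean` is hypothetical.

```javascript
// Hypothetical helper, not part of essentia.js.
function powerMean(array, power = 1) {
  if (array.length === 0) throw new Error('powerMean: empty array');
  if (power === 0) {
    // Limit case: geometric mean, computed through the mean of logarithms.
    const meanLog = array.reduce((s, x) => s + Math.log(x), 0) / array.length;
    return Math.exp(meanLog);
  }
  const meanPow = array.reduce((s, x) => s + Math.pow(x, power), 0) / array.length;
  return Math.pow(meanPow, 1 / power);
}

const values = [1, 2, 4];
console.log(powerMean(values, -1)); // harmonic mean
console.log(powerMean(values, 0));  // geometric mean
console.log(powerMean(values, 1));  // arithmetic mean
console.log(powerMean(values, 2));  // root mean square
```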

PowerSpectrum( signal [, size ] ) → {object}

Description

This algorithm computes the power spectrum of an array of Reals. The resulting power spectrum has a size which is half the size of the input array plus one. Bins contain squared magnitude values. Check https://essentia.upf.edu/reference/std_PowerSpectrum.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

size number <optional>
2048

the expected size of the input frame (this is purely optional and only targeted at optimizing the creation time of the FFT object)

Returns

{powerSpectrum: 'power spectrum of the input signal'}

Details
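
The output size (half the input size plus one) and the squared-magnitude bins of PowerSpectrum can be illustrated with a naive DFT. This sketch is for checking shapes and values on small inputs only: it runs in quadratic time, applies no normalization (an assumption), and the name `powerSpectrumSketch` is hypothetical.

```javascript
// Hypothetical helper, not part of essentia.js.
function powerSpectrumSketch(signal) {
  const n = signal.length;
  const bins = Math.floor(n / 2) + 1; // non-negative frequencies only
  const out = new Float64Array(bins);
  for (let k = 0; k < bins; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const phase = (-2 * Math.PI * k * t) / n;
      re += signal[t] * Math.cos(phase);
      im += signal[t] * Math.sin(phase);
    }
    out[k] = re * re + im * im; // squared magnitude of bin k
  }
  return out;
}

// Usage: an 8-sample constant signal puts all its power in bin 0.
console.log(powerSpectrumSketch([1, 1, 1, 1, 1, 1, 1, 1]));
```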

PredominantPitchMelodia( signal [, binResolution [, filterIterations [, frameSize [, guessUnvoiced [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minDuration [, minFrequency [, numberHarmonics [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, referenceFrequency [, sampleRate [, timeContinuity [, voiceVibrato [, voicingTolerance ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the fundamental frequency of the predominant melody from polyphonic music signals using the MELODIA algorithm. It is specifically suited for music with a predominant melodic element, for example the singing voice melody in an accompanied singing recording. The approach [1] is based on the creation and characterization of pitch contours, time continuous sequences of pitch candidates grouped using auditory streaming cues. It furthermore determines for each frame, if the predominant melody is present or not. To this end, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours, and PitchContoursMelody algorithms are employed. It is strongly advised to use the default parameter values which are optimized according to [1] (where further details are provided) except for minFrequency, maxFrequency, and voicingTolerance, which will depend on your application. Check https://essentia.upf.edu/reference/std_PredominantPitchMelodia.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

binResolution number <optional>
10

salience function bin resolution [cents]

filterIterations number <optional>
3

number of iterations for the octave errors / pitch outlier filtering process

frameSize number <optional>
2048

the frame size for computing pitch salience

guessUnvoiced boolean <optional>
false

estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame

harmonicWeight number <optional>
0.8

harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)

hopSize number <optional>
128

the hop size with which the pitch salience function was computed

magnitudeCompression number <optional>
1

magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)

magnitudeThreshold number <optional>
40

spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)

maxFrequency number <optional>
20000

the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]

minDuration number <optional>
100

the minimum allowed contour duration [ms]

minFrequency number <optional>
80

the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]

numberHarmonics number <optional>
20

number of considered harmonics

peakDistributionThreshold number <optional>
0.9

allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)

peakFrameThreshold number <optional>
0.9

per-frame salience threshold factor (fraction of the highest peak salience in a frame)

pitchContinuity number <optional>
27.5625

pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]

referenceFrequency number <optional>
55

the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

timeContinuity number <optional>
100

time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]

voiceVibrato boolean <optional>
false

detect voice vibrato

voicingTolerance number <optional>
0.2

allowed deviation below the average contour mean salience of all contours (fraction of the standard deviation)

Returns

{pitch: 'the estimated pitch values [Hz]', pitchConfidence: 'confidence with which the pitch was detected'}

Details

RMS( array ) → {object}

Description

This algorithm computes the root mean square (quadratic mean) of an array. RMS is not defined for empty arrays. In such a case, an exception will be thrown. References: [1] Root mean square - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Root_mean_square Check https://essentia.upf.edu/reference/std_RMS.html for more details.

Parameters
Name Type Description
array VectorFloat

the input array

Returns

{rms: 'the root mean square of the input array'}

Details
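
The definition of RMS, including the empty-array case, fits in a few lines. A minimal sketch in plain JavaScript; the helper name `rms` and the error message are hypothetical.

```javascript
// Hypothetical helper, not part of essentia.js.
function rms(array) {
  if (array.length === 0) {
    throw new Error('RMS is not defined for empty arrays');
  }
  let sumSquares = 0;
  for (const x of array) sumSquares += x * x;
  // Square root of the mean of the squared values.
  return Math.sqrt(sumSquares / array.length);
}

console.log(rms([1, -1, 1, -1]));
```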

RawMoments( array [, range ] ) → {object}

Description

This algorithm computes the first 5 raw moments of an array. The output array is of size 6 because the zeroth moment is used for padding so that the first moment corresponds to index 1. Check https://essentia.upf.edu/reference/std_RawMoments.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array

range number <optional>
22050

the range of the input array, used for normalizing the results

Returns

{rawMoments: 'the (raw) moments of the input array'}

Details

ReplayGain( signal [, sampleRate ] ) → {object}

Description

This algorithm computes the Replay Gain loudness value of an audio signal. The algorithm is described in detail in [1]. The value returned is the 'standard' ReplayGain value, not the value with 6dB preamplification as computed by lame, mp3gain, vorbisgain, and all widely used ReplayGain programs. Check https://essentia.upf.edu/reference/std_ReplayGain.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal (must be longer than 0.05ms)

sampleRate number <optional>
44100

the sampling rate of the input audio signal [Hz]

Returns

{replayGain: 'the distance to the suitable average replay level (~-31 dB) defined by SMPTE [dB]'}

Details

Resample( signal [, inputSampleRate [, outputSampleRate [, quality ] ] ] ) → {object}

Description

This algorithm resamples the input signal to the desired sampling rate. Check https://essentia.upf.edu/reference/std_Resample.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

inputSampleRate number <optional>
44100

the sampling rate of the input signal [Hz]

outputSampleRate number <optional>
44100

the sampling rate of the output signal [Hz]

quality number <optional>
1

the quality of the conversion, 0 for best quality

Returns

{signal: 'the resampled signal'}

Details

ResampleFFT( input [, inSize [, outSize ] ] ) → {object}

Description

This algorithm resamples a sequence using FFT / IFFT. The input and output sizes must both be even numbers. (It is meant to be equivalent to the resample function in Numpy). Check https://essentia.upf.edu/reference/std_ResampleFFT.html for more details.

Parameters
Name Type Attributes Default Description
input VectorFloat

input array

inSize number <optional>
128

the size of the input sequence. It needs to be even.

outSize number <optional>
128

the size of the output sequence. It needs to be even.

Returns

{output: 'output resample array'}

Details

RhythmDescriptors( signal ) → {object}

Description

This algorithm computes rhythm features (bpm, beat positions, beat histogram peaks) for an audio signal. It combines RhythmExtractor2013 for beat tracking and BPM estimation with BpmHistogramDescriptors algorithms. Check https://essentia.upf.edu/reference/std_RhythmDescriptors.html for more details.

Parameters
Name Type Description
signal VectorFloat

the audio input signal

Returns

{beats_position: 'See RhythmExtractor2013 algorithm documentation', confidence: 'See RhythmExtractor2013 algorithm documentation', bpm: 'See RhythmExtractor2013 algorithm documentation', bpm_estimates: 'See RhythmExtractor2013 algorithm documentation', bpm_intervals: 'See RhythmExtractor2013 algorithm documentation', first_peak_bpm: 'See BpmHistogramDescriptors algorithm documentation', first_peak_spread: 'See BpmHistogramDescriptors algorithm documentation', first_peak_weight: 'See BpmHistogramDescriptors algorithm documentation', second_peak_bpm: 'See BpmHistogramDescriptors algorithm documentation', second_peak_spread: 'See BpmHistogramDescriptors algorithm documentation', second_peak_weight: 'See BpmHistogramDescriptors algorithm documentation', histogram: 'bpm histogram [bpm]'}

Details

RhythmExtractor( signal [, frameHop [, frameSize [, hopSize [, lastBeatInterval [, maxTempo [, minTempo [, numberFrames [, sampleRate [, tempoHints [, tolerance [, useBands [, useOnset ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the tempo in bpm and beat positions given an audio signal. The algorithm combines several periodicity functions and estimates beats using TempoTap and TempoTapTicks. It combines: - onset detection functions based on high-frequency content (see OnsetDetection) - complex-domain spectral difference function (see OnsetDetection) - periodicity function based on energy bands (see FrequencyBands, TempoScaleBands) Check https://essentia.upf.edu/reference/std_RhythmExtractor.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the audio input signal

frameHop number <optional>
1024

the number of feature frames separating two evaluations

frameSize number <optional>
1024

the number of audio samples used to compute a feature

hopSize number <optional>
256

the number of audio samples per feature

lastBeatInterval number <optional>
0.1

the minimum interval between last beat and end of file [s]

maxTempo number <optional>
208

the fastest tempo to detect [bpm]

minTempo number <optional>
40

the slowest tempo to detect [bpm]

numberFrames number <optional>
1024

the number of feature frames to buffer on

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

tempoHints Array.<any> <optional>
[]

the optional list of initial beat locations, to favor the detection of pre-determined tempo period and beats alignment [s]

tolerance number <optional>
0.24

the minimum interval between two consecutive beats [s]

useBands boolean <optional>
true

whether or not to use band energy as periodicity function

useOnset boolean <optional>
true

whether or not to use onsets as periodicity function

Returns

{bpm: 'the tempo estimation [bpm]', ticks: 'the estimated tick locations [s]', estimates: 'the bpm estimation per frame [bpm]', bpmIntervals: 'list of beats interval [s]'}

Details

RhythmExtractor2013( signal [, maxTempo [, method [, minTempo ] ] ] ) → {object}

Description

This algorithm extracts the beat positions and estimates their confidence as well as tempo in bpm for an audio signal. The beat locations can be computed using: - 'multifeature', the BeatTrackerMultiFeature algorithm - 'degara', the BeatTrackerDegara algorithm (note that there is no confidence estimation for this method, the output confidence value is always 0) Check https://essentia.upf.edu/reference/std_RhythmExtractor2013.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the audio input signal

maxTempo number <optional>
208

the fastest tempo to detect [bpm]

method string <optional>
multifeature

the method used for beat tracking

minTempo number <optional>
40

the slowest tempo to detect [bpm]

Returns

{bpm: 'the tempo estimation [bpm]', ticks: 'the estimated tick locations [s]', confidence: 'confidence with which the ticks are detected (ignore this value if using 'degara' method)', estimates: 'the list of bpm estimates characterizing the bpm distribution for the signal [bpm]', bpmIntervals: 'list of beats interval [s]'}

Details
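
Example (illustrative sketch)

The ticks and bpmIntervals outputs are related by simple arithmetic: an interval is the gap between consecutive ticks, and a tempo in bpm is 60 divided by an interval in seconds. The sketch below uses hypothetical helper names and plain arrays. It only illustrates that relationship and is not how the algorithm itself estimates bpm.

```javascript
// Hypothetical helpers, not essentia.js functions.
// Gaps in seconds between consecutive tick times.
function beatIntervals(ticks) {
  const intervals = [];
  for (let i = 1; i < ticks.length; i++) {
    intervals.push(ticks[i] - ticks[i - 1]);
  }
  return intervals;
}

// Tempo implied by the median inter-beat interval (0 if fewer than two ticks).
function tempoFromTicks(ticks) {
  const intervals = beatIntervals(ticks).sort((a, b) => a - b);
  if (intervals.length === 0) return 0;
  const mid = Math.floor(intervals.length / 2);
  const median = intervals.length % 2 === 1
    ? intervals[mid]
    : (intervals[mid - 1] + intervals[mid]) / 2;
  return 60 / median;
}

console.log(tempoFromTicks([0.5, 1.0, 1.5, 2.0])); // 120
```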

RhythmTransform( melBands [, frameSize [, hopSize ] ] ) → {object}

Description

This algorithm implements the rhythm transform. It computes a tempogram, a representation of rhythmic periodicities in the input signal in the rhythm domain, by using FFT similarly to computation of spectrum in the frequency domain [1]. Additional features, including rhythmic centroid and a rhythmic counterpart of MFCCs, can be derived from this rhythmic representation. Check https://essentia.upf.edu/reference/std_RhythmTransform.html for more details.

Parameters
Name Type Attributes Default Description
melBands VectorVectorFloat

the energies in the mel bands

frameSize number <optional>
256

the frame size to compute the rhythm transform

hopSize number <optional>
32

the hop size to compute the rhythm transform

Returns

{rhythm: 'consecutive frames in the rhythm domain'}

Details

RollOff( spectrum [, cutoff [, sampleRate ] ] ) → {object}

Description

This algorithm computes the roll-off frequency of a spectrum. The roll-off frequency is defined as the frequency under which some percentage (cutoff) of the total energy of the spectrum is contained. The roll-off frequency can be used to distinguish between harmonic (below roll-off) and noisy sounds (above roll-off). Check https://essentia.upf.edu/reference/std_RollOff.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input audio spectrum (must have more than one element)

cutoff number <optional>
0.85

the ratio of total energy to attain before yielding the roll-off frequency

sampleRate number <optional>
44100

the sampling rate of the audio signal (used to normalize rollOff) [Hz]

Returns

{rollOff: 'the roll-off frequency [Hz]'}

Details
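
Example (illustrative sketch)

The roll-off definition above can be read as a cumulative-energy search. The following plain JavaScript sketch accumulates squared magnitudes until the cutoff fraction of the total is reached and converts that bin index to Hz. The function name rollOffFrequency and the linear bin-to-frequency mapping (0 Hz to Nyquist) are assumptions made for this illustration.

```javascript
// Hypothetical reference sketch; not the essentia.js implementation.
function rollOffFrequency(spectrum, cutoff = 0.85, sampleRate = 44100) {
  if (spectrum.length < 2) {
    throw new RangeError("spectrum must have more than one element");
  }
  let totalEnergy = 0;
  for (const magnitude of spectrum) {
    totalEnergy += magnitude * magnitude;
  }
  const targetEnergy = cutoff * totalEnergy;

  // Find the first bin at which the cumulative energy reaches the target.
  let cumulativeEnergy = 0;
  let rollOffBin = 0;
  for (let i = 0; i < spectrum.length; i++) {
    cumulativeEnergy += spectrum[i] * spectrum[i];
    if (cumulativeEnergy >= targetEnergy) {
      rollOffBin = i;
      break;
    }
  }
  // Assumption: bins are spread linearly from 0 Hz to the Nyquist frequency.
  return (rollOffBin * (sampleRate / 2)) / (spectrum.length - 1);
}

console.log(rollOffFrequency([1, 1, 1, 1, 0], 0.85, 8)); // 3
```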

SNR( frame [, MAAlpha [, MMSEAlpha [, NoiseAlpha [, frameSize [, noiseThreshold [, sampleRate [, useBroadbadNoiseCorrection ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the SNR of the input audio in a frame-wise manner. The algorithm assumes that: 1. The noise is gaussian. 2. There is a region of noise (without signal) at the beginning of the stream in order to estimate the PSD of the noise.[1] Once the noise PSD is estimated, the algorithm relies on the Ephraim-Malah [2] recursion to estimate the SNR for each frequency bin. The algorithm also returns an overall (a single value for the whole spectrum) SNR estimation and an averaged overall SNR estimation using Exponential Moving Average filtering. This algorithm throws a Warning if less than 15 frames are used to estimate the noise PSD. Check https://essentia.upf.edu/reference/std_SNR.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input audio frame

MAAlpha number <optional>
0.95

Alpha coefficient for the EMA SNR estimation [2]

MMSEAlpha number <optional>
0.98

Alpha coefficient for the MMSE estimation [1].

NoiseAlpha number <optional>
0.9

Alpha coefficient for the EMA noise estimation [2]

frameSize number <optional>
512

the size of the input frame

noiseThreshold number <optional>
-40

Threshold to detect frames without signal

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

useBroadbadNoiseCorrection boolean <optional>
true

flag to apply the -10 * log10(BW) broadband noise correction factor

Returns

{instantSNR: 'SNR value for the current frame', averagedSNR: 'averaged SNR through an Exponential Moving Average filter', spectralSNR: 'instant SNR for each frequency bin'}

Details

SaturationDetector( frame [, differentialThreshold [, energyThreshold [, frameSize [, hopSize [, minimumDuration [, sampleRate ] ] ] ] ] ] ) → {object}

Description

This algorithm outputs the starting/ending locations of the saturated regions in seconds. Saturated regions are found by means of a triple criterion: 1. samples in a saturated region should have more energy than a given threshold. 2. the difference between the samples in a saturated region should be smaller than a given threshold. 3. the duration of the saturated region should be longer than a given threshold. Check https://essentia.upf.edu/reference/std_SaturationDetector.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input audio frame

differentialThreshold number <optional>
0.001

minimum difference between contiguous samples of the saturated regions

energyThreshold number <optional>
-1

minimum energy of the samples in the saturated regions [dB]

frameSize number <optional>
512

expected input frame size

hopSize number <optional>
256

hop size used for the analysis

minimumDuration number <optional>
0.005

minimum duration of the saturated regions [ms]

sampleRate number <optional>
44100

sample rate used for the analysis

Returns

{starts: 'starting times of the detected saturated regions [s]', ends: 'ending times of the detected saturated regions [s]'}

Details
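
Example (illustrative sketch)

The three criteria in the description can be sketched as a single pass over the samples. The code below is a simplified reading under stated assumptions: the dB threshold is converted to a linear amplitude with 0 dB equal to amplitude 1, both samples of a "flat" pair belong to the region, and the whole signal is processed at once rather than frame by frame. The function name findSaturatedRegions is hypothetical, and the real algorithm may differ in its details.

```javascript
// Hypothetical sketch of the triple criterion; not the essentia.js implementation.
function findSaturatedRegions(samples, options = {}) {
  const {
    sampleRate = 44100,
    energyThreshold = -1,          // dB
    differentialThreshold = 0.001, // linear amplitude difference
    minimumDuration = 0.005,       // ms
  } = options;

  // Assumption: 0 dB corresponds to a linear amplitude of 1.
  const minAmplitude = Math.pow(10, energyThreshold / 20);
  const minSamples = (minimumDuration / 1000) * sampleRate;

  const starts = [];
  const ends = [];
  let runStart = -1;

  // Criterion 3: keep a finished run only if it lasts long enough.
  const closeRun = (lastIndex) => {
    if (runStart >= 0 && lastIndex - runStart >= minSamples) {
      starts.push(runStart / sampleRate);
      ends.push(lastIndex / sampleRate);
    }
    runStart = -1;
  };

  for (let i = 1; i < samples.length; i++) {
    // Criterion 1: both samples of the pair are above the energy threshold.
    const loud = Math.abs(samples[i]) >= minAmplitude &&
                 Math.abs(samples[i - 1]) >= minAmplitude;
    // Criterion 2: consecutive samples barely differ.
    const flat = Math.abs(samples[i] - samples[i - 1]) < differentialThreshold;
    if (loud && flat) {
      if (runStart < 0) runStart = i - 1;
    } else {
      closeRun(i - 1);
    }
  }
  closeRun(samples.length - 1);
  return { starts, ends };
}
```

For instance, calling findSaturatedRegions([0, 0.5, 1, 1, 1, 1, 0.5, 0], { sampleRate: 1000, minimumDuration: 1 }) reports one region covering the four clipped samples.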

Scale( signal [, clipping [, factor [, maxAbsValue ] ] ] ) → {object}

Description

This algorithm scales the audio by the specified factor using clipping if required. Check https://essentia.upf.edu/reference/std_Scale.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal

clipping boolean <optional>
true

boolean flag whether to apply clipping or not

factor number <optional>
10

the multiplication factor by which the audio will be scaled

maxAbsValue number <optional>
1

the maximum value above which to apply clipping

Returns

{signal: 'the output audio signal'}

Details
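
Example (illustrative sketch)

Scaling with optional clipping amounts to a multiply followed by a clamp. The sketch below mirrors the documented parameter order and defaults in plain JavaScript. The function name scaleSignal is hypothetical, and the sketch returns an ordinary array rather than a VectorFloat.

```javascript
// Hypothetical reference sketch; not the essentia.js implementation.
function scaleSignal(signal, clipping = true, factor = 10, maxAbsValue = 1) {
  return Array.from(signal, (sample) => {
    const scaled = sample * factor;
    if (!clipping) return scaled;
    // Clamp to the range [-maxAbsValue, maxAbsValue].
    return Math.max(-maxAbsValue, Math.min(maxAbsValue, scaled));
  });
}

// With the defaults, 0.25 and -0.5 exceed the limit after scaling and are clipped.
console.log(scaleSignal([0.05, 0.25, -0.5]));
```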

SineSubtraction( frame, magnitudes, frequencies, phases [, fftSize [, hopSize [, sampleRate ] ] ] ) → {object}

Description

This algorithm subtracts the sinusoids computed with the sine model analysis from an input audio signal. It outputs an audio signal. Check https://essentia.upf.edu/reference/std_SineSubtraction.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input audio frame to subtract from

magnitudes VectorFloat

the magnitudes of the sinusoidal peaks

frequencies VectorFloat

the frequencies of the sinusoidal peaks [Hz]

phases VectorFloat

the phases of the sinusoidal peaks

fftSize number <optional>
512

the size of the FFT internal process (full spectrum size) and output frame. Minimum twice the hopsize.

hopSize number <optional>
128

the hop size between frames

sampleRate number <optional>
44100

the audio sampling rate [Hz]

Returns

{frame: 'the output audio frame'}

Details

SingleBeatLoudness( beat [, beatDuration [, beatWindowDuration [, frequencyBands [, onsetStart [, sampleRate ] ] ] ] ] ) → {object}

Description

This algorithm computes the spectrum energy of a single beat across the whole frequency range and on each specified frequency band given an audio segment. It detects the onset of the beat within the input segment, computes spectrum on a window starting on this onset, and estimates energy (see Energy and EnergyBandRatio algorithms). The frequency bands used by default are: 0-200 Hz, 200-400 Hz, 400-800 Hz, 800-1600 Hz, 1600-3200 Hz, 3200-22000Hz, following E. Scheirer [1]. Check https://essentia.upf.edu/reference/std_SingleBeatLoudness.html for more details.

Parameters
Name Type Attributes Default Description
beat VectorFloat

audio segment containing a beat

beatDuration number <optional>
0.05

window size for the beat's energy computation (the window starts at the onset) [s]

beatWindowDuration number <optional>
0.1

window size for the beat's onset detection [s]

frequencyBands Array.<any> <optional>
[0, 200, 400, 800, 1600, 3200, 22000]

frequency bands

onsetStart string <optional>
sumEnergy

criteria for finding the start of the beat

sampleRate number <optional>
44100

the audio sampling rate [Hz]

Returns

{loudness: 'the beat's energy across the whole spectrum', loudnessBandRatio: 'the beat's energy ratio for each band'}

Details

Slicer( audio [, endTimes [, sampleRate [, startTimes [, timeUnits ] ] ] ] ) → {object}

Description

This algorithm splits an audio signal into segments given their start and end times. Check https://essentia.upf.edu/reference/std_Slicer.html for more details.

Parameters
Name Type Attributes Default Description
audio VectorFloat

the input audio signal

endTimes Array.<any> <optional>
[]

the list of end times for the slices you want to extract

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

startTimes Array.<any> <optional>
[]

the list of start times for the slices you want to extract

timeUnits string <optional>
seconds

the units of time of the start and end times

Returns

{frame: 'the frames of the sliced input signal'}

Details
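
Example (illustrative sketch)

Slicing by start and end times reduces to converting each time to a sample index. The sketch below assumes times in seconds (the documented default for timeUnits), rounds to the nearest sample, and treats the end sample as exclusive. Those last two choices, and the function name sliceSignal, are assumptions for illustration.

```javascript
// Hypothetical reference sketch; not the essentia.js implementation.
function sliceSignal(audio, startTimes, endTimes, sampleRate = 44100) {
  if (startTimes.length !== endTimes.length) {
    throw new RangeError("startTimes and endTimes must have the same length");
  }
  return startTimes.map((startTime, i) => {
    const from = Math.round(startTime * sampleRate);
    const to = Math.round(endTimes[i] * sampleRate);
    if (from > to) {
      throw new RangeError("a slice cannot end before it starts");
    }
    // Assumption: the end sample itself is not included in the slice.
    return Array.from(audio.slice(from, to));
  });
}

const tenSamples = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9];
console.log(sliceSignal(tenSamples, [0, 0.5], [0.2, 0.8], 10)); // [[0, 1], [5, 6, 7]]
```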

SpectralCentroidTime( array [, sampleRate ] ) → {object}

Description

This algorithm computes the spectral centroid of a signal in time domain. A first difference filter is applied to the input signal. Then the centroid is computed by dividing the norm of the resulting signal by the norm of the input signal. The centroid is given in hertz. References: [1] Udo Zölzer (2002). DAFX Digital Audio Effects pag.364-365 Check https://essentia.upf.edu/reference/std_SpectralCentroidTime.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array

sampleRate number <optional>
44100

sampling rate of the input spectrum [Hz]

Returns

{centroid: 'the spectral centroid of the signal'}

Details
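
Example (illustrative sketch)

The description above (first difference, then a ratio of norms) can be sketched directly. The factor sampleRate / (2π) used here to express the ratio in hertz is an assumption based on the small-angle behaviour of a first difference applied to a sinusoid. The function name spectralCentroidTime is hypothetical.

```javascript
// Hypothetical reference sketch; not the essentia.js implementation.
function spectralCentroidTime(signal, sampleRate = 44100) {
  let inputEnergy = 0;
  let differenceEnergy = 0;
  for (let i = 0; i < signal.length; i++) {
    inputEnergy += signal[i] * signal[i];
    if (i > 0) {
      // First-difference filter: y[n] = x[n] - x[n-1].
      const difference = signal[i] - signal[i - 1];
      differenceEnergy += difference * difference;
    }
  }
  if (inputEnergy === 0) return 0; // silent input: no meaningful centroid
  // Assumption: the norm ratio is in radians per sample; convert to Hz.
  return (Math.sqrt(differenceEnergy / inputEnergy) * sampleRate) / (2 * Math.PI);
}

// A pure 100 Hz sine should yield a centroid close to 100 Hz.
const sine100 = Float64Array.from({ length: 4410 }, (_, n) =>
  Math.sin((2 * Math.PI * 100 * n) / 44100)
);
console.log(spectralCentroidTime(sine100, 44100));
```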

SpectralComplexity( spectrum [, magnitudeThreshold [, sampleRate ] ] ) → {object}

Description

This algorithm computes the spectral complexity of a spectrum. The spectral complexity is based on the number of peaks in the input spectrum. Check https://essentia.upf.edu/reference/std_SpectralComplexity.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input spectrum

magnitudeThreshold number <optional>
0.005

the minimum spectral-peak magnitude that contributes to spectral complexity

sampleRate number <optional>
44100

the audio sampling rate [Hz]

Returns

{spectralComplexity: 'the spectral complexity of the input spectrum'}

Details

SpectralContrast( spectrum [, frameSize [, highFrequencyBound [, lowFrequencyBound [, neighbourRatio [, numberBands [, sampleRate [, staticDistribution ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the Spectral Contrast feature of a spectrum. It is based on the Octave Based Spectral Contrast feature as described in [1]. The version implemented here is a modified version to improve discriminative power and robustness. The modifications are described in [2]. Check https://essentia.upf.edu/reference/std_SpectralContrast.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the audio spectrum

frameSize number <optional>
2048

the size of the fft frames

highFrequencyBound number <optional>
11000

the upper bound of the highest band

lowFrequencyBound number <optional>
20

the lower bound of the lowest band

neighbourRatio number <optional>
0.4

the ratio of the bins in the sub band used to calculate the peak and valley

numberBands number <optional>
6

the number of bands in the filter

sampleRate number <optional>
22050

the sampling rate of the audio signal

staticDistribution number <optional>
0.15

the ratio of the bins to distribute equally

Returns

{spectralContrast: 'the spectral contrast coefficients', spectralValley: 'the magnitudes of the valleys'}

Details

SpectralPeaks( spectrum [, magnitudeThreshold [, maxFrequency [, maxPeaks [, minFrequency [, orderBy [, sampleRate ] ] ] ] ] ] ) → {object}

Description

This algorithm extracts peaks from a spectrum. It is important to note that the peak algorithm is independent of an input that is linear or in dB, so one has to adapt the threshold to fit with the type of data fed to it. The algorithm relies on PeakDetection algorithm which is run with parabolic interpolation [1]. The exactness of the peak-searching depends heavily on the windowing type. It gives best results with dB input, a blackman-harris 92dB window and interpolation set to true. According to [1], spectral peak frequencies tend to be about twice as accurate when dB magnitude is used rather than just linear magnitude. For further information about the peak detection, see the description of the PeakDetection algorithm. Check https://essentia.upf.edu/reference/std_SpectralPeaks.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input spectrum

magnitudeThreshold number <optional>
0

peaks below this given threshold are not outputted

maxFrequency number <optional>
5000

the maximum frequency of the range to evaluate [Hz]

maxPeaks number <optional>
100

the maximum number of returned peaks

minFrequency number <optional>
0

the minimum frequency of the range to evaluate [Hz]

orderBy string <optional>
frequency

the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{frequencies: 'the frequencies of the spectral peaks [Hz]', magnitudes: 'the magnitudes of the spectral peaks'}

Details
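
Example (illustrative sketch)

The parabolic interpolation mentioned in the description refines a peak's position by fitting a parabola through a local maximum and its two neighbours. The sketch below shows that standard three-point formula in plain JavaScript. It omits the frequency-range, maxPeaks and orderBy handling, and the function name interpolatedPeaks and the linear bin-to-frequency mapping are assumptions.

```javascript
// Hypothetical reference sketch; not the essentia.js implementation.
function interpolatedPeaks(spectrum, sampleRate = 44100, magnitudeThreshold = 0) {
  // Assumption: bins are spread linearly from 0 Hz to the Nyquist frequency.
  const binWidth = sampleRate / 2 / (spectrum.length - 1);
  const frequencies = [];
  const magnitudes = [];
  for (let k = 1; k < spectrum.length - 1; k++) {
    const left = spectrum[k - 1];
    const centre = spectrum[k];
    const right = spectrum[k + 1];
    if (!(centre > left && centre >= right)) continue; // not a local maximum

    // Vertex of the parabola through the three points, relative to bin k.
    const offset = (0.5 * (left - right)) / (left - 2 * centre + right);
    const magnitude = centre - 0.25 * (left - right) * offset;
    if (magnitude < magnitudeThreshold) continue;

    frequencies.push((k + offset) * binWidth);
    magnitudes.push(magnitude);
  }
  return { frequencies, magnitudes };
}

// Samples of the parabola y = 10 - (x - 2.3)^2 at x = 0..5, with 1 Hz per bin.
console.log(interpolatedPeaks([4.71, 8.31, 9.91, 9.51, 7.11, 2.71], 10));
```

Because the test input is itself a parabola, the interpolated peak lands at about 2.3 Hz with a magnitude of about 10, which is the true vertex.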

SpectralWhitening( spectrum, frequencies, magnitudes [, maxFrequency [, sampleRate ] ] ) → {object}

Description

Performs spectral whitening of spectral peaks of a spectrum. The algorithm works in dB scale, but the conversion is done by the algorithm so input should be in linear scale. The concept of 'whitening' refers to 'white noise' or a non-zero flat spectrum. It first computes a spectral envelope similar to the 'true envelope' in [1], and then modifies the amplitude of each peak relative to the envelope. For example, the predominant peaks will have a value close to 0dB because they are very close to the envelope. On the other hand, minor peaks between significant peaks will have lower amplitudes such as -30dB. Check https://essentia.upf.edu/reference/std_SpectralWhitening.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the audio linear spectrum

frequencies VectorFloat

the spectral peaks' linear frequencies

magnitudes VectorFloat

the spectral peaks' linear magnitudes

maxFrequency number <optional>
5000

max frequency to apply whitening to [Hz]

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{magnitudes: 'the whitened spectral peaks' linear magnitudes'}

Details

Spectrum( frame [, size ] ) → {object}

Description

This algorithm computes the magnitude spectrum of an array of Reals. The resulting magnitude spectrum has a size which is half the size of the input array plus one. Bins contain raw (linear) magnitude values. Check https://essentia.upf.edu/reference/std_Spectrum.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input audio frame

size number <optional>
2048

the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)

Returns

{spectrum: 'magnitude spectrum of the input audio signal'}

Details
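
Example (illustrative sketch)

The relation between input size and output size (half the input plus one bin) is easy to see with a direct DFT. The sketch below computes unnormalised magnitudes with an O(N²) loop purely for illustration. A real implementation would use an FFT, and the normalisation used here is an assumption. The function name magnitudeSpectrum is hypothetical.

```javascript
// Hypothetical reference sketch; not the essentia.js implementation.
function magnitudeSpectrum(frame) {
  const n = frame.length;
  const binCount = Math.floor(n / 2) + 1; // half the input size plus one
  const spectrum = new Array(binCount);
  for (let k = 0; k < binCount; k++) {
    let real = 0;
    let imag = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      real += frame[t] * Math.cos(angle);
      imag += frame[t] * Math.sin(angle);
    }
    spectrum[k] = Math.hypot(real, imag); // linear magnitude
  }
  return spectrum;
}

// Eight samples of a cosine with two cycles per frame: energy lands in bin 2.
console.log(magnitudeSpectrum([1, 0, -1, 0, 1, 0, -1, 0]).length); // 5
```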

SpectrumCQ( frame [, binsPerOctave [, minFrequency [, minimumKernelSize [, numberBins [, sampleRate [, scale [, threshold [, windowType [, zeroPhase ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the magnitude of the Constant-Q spectrum. See ConstantQ algorithm for more details. Check https://essentia.upf.edu/reference/std_SpectrumCQ.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input audio frame

binsPerOctave number <optional>
12

number of bins per octave

minFrequency number <optional>
32.7

minimum frequency [Hz]

minimumKernelSize number <optional>
4

minimum size allowed for frequency kernels

numberBins number <optional>
84

number of frequency bins, starting at minFrequency

sampleRate number <optional>
44100

FFT sampling rate [Hz]

scale number <optional>
1

filters scale. Larger values use longer windows

threshold number <optional>
0.01

bins whose magnitude is below this quantile are discarded

windowType string <optional>
hann

the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'

zeroPhase boolean <optional>
true

a boolean value that enables zero-phase windowing. Input audio frames should be windowed with the same phase mode

Returns

{spectrumCQ: 'the magnitude constant-Q spectrum'}

Details

SpectrumToCent( spectrum [, bands [, centBinResolution [, inputSize [, log [, minimumFrequency [, normalize [, sampleRate [, type ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes energy in triangular frequency bands of a spectrum equally spaced on the cent scale. Each band is computed to have a constant wideness in the cent scale. For each band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_SpectrumToCent.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input spectrum (must be greater than size one)

bands number <optional>
720

number of bins to compute. Default is 720 (6 octaves with the default 'centBinResolution')

centBinResolution number <optional>
10

Width of each band in cents. Default is 10 cents

inputSize number <optional>
32768

the size of the spectrum

log boolean <optional>
true

compute log-energies (log10 (1 + energy))

minimumFrequency number <optional>
164

central frequency of the first band of the bank [Hz]

normalize string <optional>
unit_sum

use unit area or vertex equal to 1 triangles.

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

type string <optional>
power

use magnitude or power spectrum

Returns

{bands: 'the energy in each band', frequencies: 'the central frequency of each band'}

Details
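
Example (illustrative sketch)

The note that 720 bands of 10 cents span 6 octaves follows from 1200 cents per octave (720 × 10 / 1200 = 6). As a sketch of how band centre frequencies could be laid out on the cent scale, the code below places band i at i × centBinResolution cents above minimumFrequency. The function name centBandFrequencies is hypothetical, and the frequencies actually returned by the algorithm may be computed differently.

```javascript
// Hypothetical reference sketch; not the essentia.js implementation.
function centBandFrequencies(bands = 720, centBinResolution = 10, minimumFrequency = 164) {
  const frequencies = new Array(bands);
  for (let i = 0; i < bands; i++) {
    // 1200 cents correspond to one octave, i.e. a factor of 2 in frequency.
    frequencies[i] = minimumFrequency * Math.pow(2, (i * centBinResolution) / 1200);
  }
  return frequencies;
}

const defaultCentBands = centBandFrequencies();
console.log(defaultCentBands[0], defaultCentBands[120]); // 164 328
```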

Spline( x [, beta1 [, beta2 [, type [, xPoints [, yPoints ] ] ] ] ] ) → {object}

Description

Evaluates a piecewise spline of type b, beta or quadratic. The input value, i.e. the point at which the spline is to be evaluated, typically should be between xPoints[0] and xPoints[size-1]. If the value lies outside this range, extrapolation is used. Regarding spline types: - B: evaluates a cubic B spline approximant. - Beta: evaluates a cubic beta spline approximant. For beta splines parameters 'beta1' and 'beta2' can be supplied. For no bias set beta1 to 1 and for no tension set beta2 to 0. Note that if beta1=1 and beta2=0, the cubic beta becomes a cubic B spline. On the other hand if beta1=1 and beta2 is large the beta spline turns into a linear spline. - Quadratic: evaluates a piecewise quadratic spline at a point. Note that size of input must be odd. Check https://essentia.upf.edu/reference/std_Spline.html for more details.

Parameters
Name Type Attributes Default Description
x number

the input coordinate (x-axis)

beta1 number <optional>
1

the skew or bias parameter (only available for type beta)

beta2 number <optional>
0

the tension parameter

type string <optional>
b

the type of spline to be computed

xPoints Array.<any> <optional>
[0, 1]

the x-coordinates where data is specified (the points must be arranged in ascending order and cannot contain duplicates)

yPoints Array.<any> <optional>
[0, 1]

the y-coordinates to be interpolated (i.e. the known data)

Returns

{y: 'the value of the spline at x'}

Details

SprModelAnal( frame [, fftSize [, freqDevOffset [, freqDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, orderBy [, sampleRate ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the sinusoidal plus residual model analysis. Check https://essentia.upf.edu/reference/std_SprModelAnal.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input frame

fftSize number <optional>
2048

the size of the internal FFT (full spectrum size)

freqDevOffset number <optional>
20

minimum frequency deviation at 0Hz

freqDevSlope number <optional>
0.01

slope increase of minimum frequency deviation

hopSize number <optional>
512

the hop size between frames

magnitudeThreshold number <optional>
0

peaks below this given threshold are not outputted

maxFrequency number <optional>
5000

the maximum frequency of the range to evaluate [Hz]

maxPeaks number <optional>
100

the maximum number of returned peaks

maxnSines number <optional>
100

maximum number of sines per frame

minFrequency number <optional>
0

the minimum frequency of the range to evaluate [Hz]

orderBy string <optional>
frequency

the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{frequencies: 'the frequencies of the sinusoidal peaks [Hz]', magnitudes: 'the magnitudes of the sinusoidal peaks', phases: 'the phases of the sinusoidal peaks', res: 'output residual frame'}

Details

SprModelSynth( magnitudes, frequencies, phases, res [, fftSize [, hopSize [, sampleRate ] ] ] ) → {object}

Description

This algorithm computes the sinusoidal plus residual model synthesis from SPS model analysis. Check https://essentia.upf.edu/reference/std_SprModelSynth.html for more details.

Parameters
Name Type Attributes Default Description
magnitudes VectorFloat

the magnitudes of the sinusoidal peaks

frequencies VectorFloat

the frequencies of the sinusoidal peaks [Hz]

phases VectorFloat

the phases of the sinusoidal peaks

res VectorFloat

the residual frame

fftSize number <optional>
2048

the size of the output FFT frame (full spectrum size)

hopSize number <optional>
512

the hop size between frames

sampleRate number <optional>
44100

the audio sampling rate [Hz]

Returns

{frame: 'the output audio frame of the Sinusoidal Plus Stochastic model', sineframe: 'the output audio frame for sinusoidal component', resframe: 'the output audio frame for stochastic component'}

Details

SpsModelAnal( frame [, fftSize [, freqDevOffset [, freqDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, orderBy [, sampleRate [, stocf ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the stochastic model analysis. Check https://essentia.upf.edu/reference/std_SpsModelAnal.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input frame

fftSize number <optional>
2048

the size of the internal FFT (full spectrum size)

freqDevOffset number <optional>
20

minimum frequency deviation at 0Hz

freqDevSlope number <optional>
0.01

slope increase of minimum frequency deviation

hopSize number <optional>
512

the hop size between frames

magnitudeThreshold number <optional>
0

peaks below this given threshold are not outputted

maxFrequency number <optional>
5000

the maximum frequency of the range to evaluate [Hz]

maxPeaks number <optional>
100

the maximum number of returned peaks

maxnSines number <optional>
100

maximum number of sines per frame

minFrequency number <optional>
0

the minimum frequency of the range to evaluate [Hz]

orderBy string <optional>
frequency

the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

stocf number <optional>
0.2

decimation factor used for the stochastic approximation

Returns

{frequencies: 'the frequencies of the sinusoidal peaks [Hz]', magnitudes: 'the magnitudes of the sinusoidal peaks', phases: 'the phases of the sinusoidal peaks', stocenv: 'the stochastic envelope'}

Details

SpsModelSynth( magnitudes, frequencies, phases, stocenv [, fftSize [, hopSize [, sampleRate [, stocf ] ] ] ] ) → {object}

Description

This algorithm computes the sinusoidal plus stochastic model synthesis from SPS model analysis. Check https://essentia.upf.edu/reference/std_SpsModelSynth.html for more details.

Parameters
Name Type Attributes Default Description
magnitudes VectorFloat

the magnitudes of the sinusoidal peaks

frequencies VectorFloat

the frequencies of the sinusoidal peaks [Hz]

phases VectorFloat

the phases of the sinusoidal peaks

stocenv VectorFloat

the stochastic envelope

fftSize number <optional>
2048

the size of the output FFT frame (full spectrum size)

hopSize number <optional>
512

the hop size between frames

sampleRate number <optional>
44100

the audio sampling rate [Hz]

stocf number <optional>
0.2

decimation factor used for the stochastic approximation

Returns

{frame: 'the output audio frame of the Sinusoidal Plus Stochastic model', sineframe: 'the output audio frame for sinusoidal component', stocframe: 'the output audio frame for stochastic component'}

Details

StartStopCut( audio [, frameSize [, hopSize [, maximumStartTime [, maximumStopTime [, sampleRate [, threshold ] ] ] ] ] ] ) → {object}

Description

This algorithm outputs if there is a cut at the beginning or at the end of the audio by locating the first and last non-silent frames and comparing their positions to the actual beginning and end of the audio. The input audio is considered to be cut at the beginning (or the end) and the corresponding flag is activated if the first (last) non-silent frame occurs before (after) the configurable time threshold. Check https://essentia.upf.edu/reference/std_StartStopCut.html for more details.

Parameters
Name Type Attributes Default Description
audio VectorFloat

the input audio

frameSize number <optional>
256

the frame size for the internal power analysis

hopSize number <optional>
256

the hop size for the internal power analysis

maximumStartTime number <optional>
10

if the first non-silent frame occurs before maximumStartTime startCut is activated [ms]

maximumStopTime number <optional>
10

if the last non-silent frame occurs after maximumStopTime to the end stopCut is activated [ms]

sampleRate number <optional>
44100

the sample rate

threshold number <optional>
-60

the threshold below which average energy is defined as silence [dB]

Returns

{startCut: '1 if there is a cut at the beginning of the audio', stopCut: '1 if there is a cut at the end of the audio'}

Details
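
The decision described above can be sketched in plain JavaScript. This is an illustration under stated assumptions, not the Essentia implementation: the function name is hypothetical, frames are assumed to be non-overlapping full frames, and an entirely silent input is assumed to raise neither flag.

```javascript
// Illustrative sketch only; startStopCutSketch is not part of essentia.js.
function startStopCutSketch(audio, {
  frameSize = 256,
  hopSize = 256,
  maximumStartTime = 10, // [ms]
  maximumStopTime = 10,  // [ms]
  sampleRate = 44100,
  threshold = -60,       // [dB]
} = {}) {
  let firstSample = -1; // start of the first non-silent frame
  let lastSample = -1;  // end of the last non-silent frame
  for (let start = 0; start + frameSize <= audio.length; start += hopSize) {
    let energy = 0;
    for (let i = start; i < start + frameSize; i++) energy += audio[i] * audio[i];
    // Average power in dB; digital silence yields -Infinity.
    const powerDb = 10 * Math.log10(energy / frameSize);
    if (powerDb > threshold) {
      if (firstSample < 0) firstSample = start;
      lastSample = start + frameSize;
    }
  }
  // Assumption: no non-silent frame means no cut is flagged.
  if (firstSample < 0) return { startCut: 0, stopCut: 0 };
  const startMs = (1000 * firstSample) / sampleRate;
  const tailMs = (1000 * (audio.length - lastSample)) / sampleRate;
  return {
    startCut: startMs < maximumStartTime ? 1 : 0,
    stopCut: tailMs < maximumStopTime ? 1 : 0,
  };
}
```

A signal that starts with 100 ms of silence but stays loud until its last sample would give startCut 0 and stopCut 1 in this sketch.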

StartStopSilence( frame [, threshold ] ) → {object}

Description

This algorithm outputs the frame at which sound begins and the frame at which sound ends. Check https://essentia.upf.edu/reference/std_StartStopSilence.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input audio frames

threshold number <optional>
-60

the threshold below which average energy is defined as silence [dB]

Returns

{startFrame: 'number of the first non-silent frame', stopFrame: 'number of the last non-silent frame'}

Details

StochasticModelAnal( frame [, fftSize [, hopSize [, sampleRate [, stocf ] ] ] ] ) → {object}

Description

This algorithm computes the stochastic model analysis. It gets the resampled spectral envelope of the stochastic component. Check https://essentia.upf.edu/reference/std_StochasticModelAnal.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input frame

fftSize number <optional>
2048

the internal FFT size (full spectrum size)

hopSize number <optional>
512

the hop size between frames

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

stocf number <optional>
0.2

decimation factor used for the stochastic approximation

Returns

{stocenv: 'the stochastic envelope'}

Details

StochasticModelSynth( stocenv [, fftSize [, hopSize [, sampleRate [, stocf ] ] ] ] ) → {object}

Description

This algorithm computes the stochastic model synthesis. It generates the noisy spectrum from a resampled spectral envelope of the stochastic component. Check https://essentia.upf.edu/reference/std_StochasticModelSynth.html for more details.

Parameters
Name Type Attributes Default Description
stocenv VectorFloat

the stochastic envelope input

fftSize number <optional>
2048

the internal FFT size (full spectrum size)

hopSize number <optional>
512

the hop size between frames

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

stocf number <optional>
0.2

decimation factor used for the stochastic approximation

Returns

{frame: 'the output frame'}

Details

StrongDecay( signal [, sampleRate ] ) → {object}

Description

This algorithm computes the Strong Decay of an audio signal. The Strong Decay is built from the non-linear combination of the signal energy and the signal temporal centroid, the latter being the balance of the absolute value of the signal. A signal containing a temporal centroid near its start boundary and a strong energy is said to have a strong decay. Check https://essentia.upf.edu/reference/std_StrongDecay.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{strongDecay: 'the strong decay'}

Details

StrongPeak( spectrum ) → {object}

Description

This algorithm computes the Strong Peak of a spectrum. The Strong Peak is defined as the ratio between the spectrum's maximum peak's magnitude and the "bandwidth" of the peak above a threshold (half its amplitude). This ratio reveals whether the spectrum presents a very "pronounced" maximum peak (i.e. the thinner and the higher the maximum of the spectrum is, the higher the ratio value). Check https://essentia.upf.edu/reference/std_StrongPeak.html for more details.

Parameters
Name Type Description
spectrum VectorFloat

the input spectrum (must be greater than one element and cannot contain negative values)

Returns

{strongPeak: 'the Strong Peak ratio'}

Details

SuperFluxExtractor( signal [, combine [, frameSize [, hopSize [, ratioThreshold [, sampleRate [, threshold ] ] ] ] ] ] ) → {object}

Description

This algorithm detects onsets in an audio signal using the SuperFlux algorithm. This implementation is based on the available reference implementation in Python [2]. The algorithm computes the spectrum of the input signal, summarizes it into triangular band energies, and computes an onset detection function based on spectral flux, tracking spectral trajectories with a maximum filter (SuperFluxNovelty). The peaks of the function are then detected (SuperFluxPeaks). Check https://essentia.upf.edu/reference/std_SuperFluxExtractor.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the audio input signal

combine number <optional>
20

time threshold for double onsets detections (ms)

frameSize number <optional>
2048

the frame size for computing low-level features

hopSize number <optional>
256

the hop size for computing low-level features

ratioThreshold number <optional>
16

ratio threshold for peak picking with respect to novelty_signal/novelty_average rate, use 0 to disable it (for low-energy onsets)

sampleRate number <optional>
44100

the audio sampling rate [Hz]

threshold number <optional>
0.05

threshold for peak picking with respect to the difference between novelty_signal and average_signal (for onsets in ambient noise)

Returns

{onsets: 'the onsets times'}

Details

SuperFluxNovelty( bands [, binWidth [, frameWidth ] ] ) → {object}

Description

Onset detection function for the SuperFlux algorithm. See SuperFluxExtractor for more details. Check https://essentia.upf.edu/reference/std_SuperFluxNovelty.html for more details.

Parameters
Name Type Attributes Default Description
bands VectorVectorFloat

the input bands spectrogram

binWidth number <optional>
3

filter width (number of frequency bins)

frameWidth number <optional>
2

differentiation offset (compute the difference with the N-th previous frame)

Returns

{differences: 'SuperFlux novelty curve'}

Details

SuperFluxPeaks( novelty [, combine [, frameRate [, pre_avg [, pre_max [, ratioThreshold [, threshold ] ] ] ] ] ] ) → {object}

Description

This algorithm detects peaks of an onset detection function computed by the SuperFluxNovelty algorithm. See SuperFluxExtractor for more details. Check https://essentia.upf.edu/reference/std_SuperFluxPeaks.html for more details.

Parameters
Name Type Attributes Default Description
novelty VectorFloat

the input onset detection function

combine number <optional>
30

time threshold for double onsets detections (ms)

frameRate number <optional>
172

the frame rate of the input onset detection function [frames per second]

pre_avg number <optional>
100

look back duration for moving average filter [ms]

pre_max number <optional>
30

look back duration for moving maximum filter [ms]

ratioThreshold number <optional>
16

ratio threshold for peak picking with respect to novelty_signal/novelty_average rate, use 0 to disable it (for low-energy onsets)

threshold number <optional>
0.05

threshold for peak picking with respect to the difference between novelty_signal and average_signal (for onsets in ambient noise)

Returns

{peaks: 'the instants of the detected peaks [s]'}

Details
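
Several parameters above are given in milliseconds while the novelty curve is indexed in frames. A minimal sketch of the unit conversion follows; the helper names are hypothetical, and both the rounding and the reading of the 172 default as roughly 44100 / 256 are assumptions rather than documented behaviour.

```javascript
// Hypothetical helpers for illustration; not part of essentia.js.

// Frame rate of a feature sequence computed with a given hop size.
function frameRateFromHop(sampleRate = 44100, hopSize = 256) {
  return sampleRate / hopSize;
}

// Convert a duration in milliseconds to a whole number of frames.
function msToFrames(durationMs, frameRate = 172) {
  return Math.round((durationMs / 1000) * frameRate);
}
```

With the defaults, pre_avg = 100 ms spans about 17 frames, and pre_max = 30 ms spans about 5 frames.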

TCToTotal( envelope ) → {object}

Description

This algorithm calculates the ratio of the temporal centroid to the total length of a signal envelope. This ratio shows how the sound is 'balanced'. Its value is close to 0 if most of the energy lies at the beginning of the sound (e.g. decrescendo or impulsive sounds), close to 0.5 if the sound is symmetric (e.g. 'delta unvarying' sounds), and close to 1 if most of the energy lies at the end of the sound (e.g. crescendo sounds). Check https://essentia.upf.edu/reference/std_TCToTotal.html for more details.

Parameters
Name Type Description
envelope VectorFloat

the envelope of the signal (its length must be greater than 1)

Returns

{TCToTotal: 'the temporal centroid to total length ratio'}

Details
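
The ratio can be illustrated with a short plain-JavaScript sketch. It assumes the temporal centroid is the envelope-weighted mean sample index and that the total length is measured as N - 1 samples; the function name is hypothetical and the library's exact normalization may differ.

```javascript
// Illustrative sketch only; tcToTotalSketch is not part of essentia.js.
function tcToTotalSketch(envelope) {
  if (envelope.length < 2) {
    throw new RangeError("envelope length must be greater than 1");
  }
  let weighted = 0; // sum of index * envelope value
  let total = 0;    // sum of envelope values
  for (let i = 0; i < envelope.length; i++) {
    weighted += i * envelope[i];
    total += envelope[i];
  }
  if (total === 0) throw new RangeError("envelope must not sum to zero");
  const centroid = weighted / total;       // in samples
  return centroid / (envelope.length - 1); // 0 = start, 1 = end
}
```

In this sketch an impulse at the start gives 0, a flat envelope gives 0.5, and an impulse at the end gives 1, matching the three cases in the description.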

TempoScaleBands( bands [, bandsGain [, frameTime ] ] ) → {object}

Description

This algorithm computes features for tempo tracking to be used with the TempoTap algorithm. See standard_rhythmextractor_tempotap in examples folder. Check https://essentia.upf.edu/reference/std_TempoScaleBands.html for more details.

Parameters
Name Type Attributes Default Description
bands VectorFloat

the audio power spectrum divided into bands

bandsGain Array.<any> <optional>
[2, 3, 2, 1, 1.20000004768, 2, 3, 2.5]

gain for each band

frameTime number <optional>
512

the frame rate in samples

Returns

{scaledBands: 'the output bands after scaling', cumulativeBands: 'cumulative sum of the output bands before scaling'}

Details

TempoTap( featuresFrame [, frameHop [, frameSize [, maxTempo [, minTempo [, numberFrames [, sampleRate [, tempoHints ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the periods and phases of a periodic signal, represented by a sequence of values of any number of detection functions, such as energy bands, onset locations, etc. It must be run sequentially on a vector of such values ("featuresFrame") for each particular audio frame in order to get estimations related to that frame. The estimations are done for each detection function separately, utilizing the latest "frameHop" frames, including the present one, to compute autocorrelation. Empty estimations will be returned until enough frames are accumulated in the algorithm's buffer. The algorithm uses elements of the following beat-tracking methods: BeatIt, elaborated by Fabien Gouyon and Simon Dixon (input features) [1]; and the multi-comb filter with Rayleigh weighting by Mathew Davies [2]. Check https://essentia.upf.edu/reference/std_TempoTap.html for more details.

Parameters
Name Type Attributes Default Description
featuresFrame VectorFloat

input temporal features of a frame

frameHop number <optional>
1024

number of feature frames separating two evaluations

frameSize number <optional>
256

number of audio samples in a frame

maxTempo number <optional>
208

fastest tempo allowed to be detected [bpm]

minTempo number <optional>
40

slowest tempo allowed to be detected [bpm]

numberFrames number <optional>
1024

number of feature frames to buffer on

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

tempoHints Array.<any> <optional>
[]

optional list of initial beat locations, to favor the detection of pre-determined tempo period and beats alignment [s]

Returns

{periods: 'list of tempo estimates found for each input feature, in frames', phases: 'list of initial phase candidates found for each input feature, in frames'}

Details

TempoTapDegara( onsetDetections [, maxTempo [, minTempo [, resample [, sampleRateODF ] ] ] ] ) → {object}

Description

This algorithm estimates beat positions given an onset detection function. The detection function is partitioned into 6-second frames with a 1.5-second increment, and the autocorrelation is computed for each frame and weighted by a tempo preference curve [2]. Periodicity estimations are done frame-wise, searching for the best match with the Viterbi algorithm [3]. The estimated periods are then passed to the probabilistic beat tracking algorithm [1], which computes beat positions. Check https://essentia.upf.edu/reference/std_TempoTapDegara.html for more details.

Parameters
Name Type Attributes Default Description
onsetDetections VectorFloat

the input frame-wise vector of onset detection values

maxTempo number <optional>
208

fastest tempo allowed to be detected [bpm]

minTempo number <optional>
40

slowest tempo allowed to be detected [bpm]

resample string <optional>
none

use upsampling of the onset detection function (may increase accuracy)

sampleRateODF number <optional>
86.1328

the sampling rate of the onset detection function [Hz]

Returns

{ticks: 'the list of resulting ticks [s]'}

Details
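
The 6-second frames and 1.5-second increment translate into lengths in onset-detection-function samples. A rough sketch of that partitioning is below; the function name, the rounding, and the dropping of a trailing partial frame are all assumptions. The default rate used here, 44100 / 512, equals 86.1328125 Hz.

```javascript
// Illustrative sketch only; odfFrameLayout is not part of essentia.js.
function odfFrameLayout(
  numOdfValues,
  sampleRateODF = 44100 / 512, // 86.1328125 Hz
  frameSeconds = 6,
  hopSeconds = 1.5
) {
  const frameLength = Math.round(frameSeconds * sampleRateODF);
  const hopLength = Math.round(hopSeconds * sampleRateODF);
  const starts = [];
  // Keep only frames that fit entirely inside the detection function.
  for (let s = 0; s + frameLength <= numOdfValues; s += hopLength) {
    starts.push(s);
  }
  return { frameLength, hopLength, starts };
}
```

At the default rate each analysis frame covers 517 detection values and consecutive frames start 129 values apart.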

TempoTapMaxAgreement( tickCandidates ) → {object}

Description

This algorithm outputs beat positions and confidence of their estimation based on the maximum mutual agreement between beat candidates estimated by different beat trackers (or using different features). Check https://essentia.upf.edu/reference/std_TempoTapMaxAgreement.html for more details.

Parameters
Name Type Description
tickCandidates VectorVectorFloat

the tick candidates estimated using different beat trackers (or features) [s]

Returns

{ticks: 'the list of resulting ticks [s]', confidence: 'confidence with which the ticks were detected [0, 5.32]'}

Details

TempoTapTicks( periods, phases [, frameHop [, hopSize [, sampleRate ] ] ] ) → {object}

Description

This algorithm builds the list of ticks from the period and phase candidates given by the TempoTap algorithm. Check https://essentia.upf.edu/reference/std_TempoTapTicks.html for more details.

Parameters
Name Type Attributes Default Description
periods VectorFloat

tempo period candidates for the current frame, in frames

phases VectorFloat

tempo ticks phase candidates for the current frame, in frames

frameHop number <optional>
512

number of feature frames separating two evaluations

hopSize number <optional>
256

number of audio samples per feature

sampleRate number <optional>
44100

sampling rate of the audio signal [Hz]

Returns

{ticks: 'the list of resulting ticks [s]', matchingPeriods: 'list of matching periods [s]'}

Details

TensorflowInputMusiCNN( frame ) → {object}

Description

This algorithm computes mel-bands with a particular parametrization specific to MusiCNN based models. Check https://essentia.upf.edu/reference/std_TensorflowInputMusiCNN.html for more details.

Parameters
Name Type Description
frame VectorFloat

the audio frame

Returns

{bands: 'the log compressed mel bands'}

Details

TensorflowInputVGGish( frame ) → {object}

Description

This algorithm computes mel-bands with a particular parametrization specific to VGGish based models. Check https://essentia.upf.edu/reference/std_TensorflowInputVGGish.html for more details.

Parameters
Name Type Description
frame VectorFloat

the audio frame

Returns

{bands: 'the log compressed mel bands'}

Details

TonalExtractor( signal [, frameSize [, hopSize [, tuningFrequency ] ] ] ) → {object}

Description

This algorithm computes tonal features for an audio signal. Check https://essentia.upf.edu/reference/std_TonalExtractor.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the audio input signal

frameSize number <optional>
4096

the framesize for computing tonal features

hopSize number <optional>
2048

the hopsize for computing tonal features

tuningFrequency number <optional>
440

the tuning frequency of the input signal

Returns

{chords_changes_rate: 'See ChordsDescriptors algorithm documentation', chords_histogram: 'See ChordsDescriptors algorithm documentation', chords_key: 'See ChordsDescriptors algorithm documentation', chords_number_rate: 'See ChordsDescriptors algorithm documentation', chords_progression: 'See ChordsDetection algorithm documentation', chords_scale: 'See ChordsDetection algorithm documentation', chords_strength: 'See ChordsDetection algorithm documentation', hpcp: 'See HPCP algorithm documentation', hpcp_highres: 'See HPCP algorithm documentation', key_key: 'See Key algorithm documentation', key_scale: 'See Key algorithm documentation', key_strength: 'See Key algorithm documentation'}

Details

TonicIndianArtMusic( signal [, binResolution [, frameSize [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxTonicFrequency [, minTonicFrequency [, numberHarmonics [, numberSaliencePeaks [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the tonic frequency of the lead artist in Indian art music. It uses a multipitch representation of the audio signal (pitch salience) to compute a histogram, and the tonic is identified as one of the histogram's peaks. The decision is based on the distance between the prominent peaks, and the classification is done using a decision tree. Check https://essentia.upf.edu/reference/std_TonicIndianArtMusic.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

binResolution number <optional>
10

salience function bin resolution [cents]

frameSize number <optional>
2048

the frame size for computing pitch salience

harmonicWeight number <optional>
0.85

harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)

hopSize number <optional>
512

the hop size with which the pitch salience function was computed

magnitudeCompression number <optional>
1

magnitude compression parameter (=0 for maximum compression, =1 for no compression)

magnitudeThreshold number <optional>
40

peak magnitude threshold (maximum allowed difference from the highest peak in dBs)

maxTonicFrequency number <optional>
375

the maximum allowed tonic frequency [Hz]

minTonicFrequency number <optional>
100

the minimum allowed tonic frequency [Hz]

numberHarmonics number <optional>
20

number of considered harmonics

numberSaliencePeaks number <optional>
5

number of top peaks of the salience function which should be considered for constructing histogram

referenceFrequency number <optional>
55

the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

Returns

{tonic: 'the estimated tonic frequency [Hz]'}

Details

TriangularBands( spectrum [, frequencyBands [, inputSize [, log [, normalize [, sampleRate [, type [, weighting ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes energy in triangular frequency bands of a spectrum. The arbitrary number of overlapping bands can be specified. For each band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_TriangularBands.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the input spectrum (must be greater than size one)

frequencyBands Array.<any> <optional>
[21.533203125, 43.06640625, 64.599609375, 86.1328125, 107.666015625, 129.19921875, 150.732421875, 172.265625, 193.798828125, 215.33203125, 236.865234375, 258.3984375, 279.931640625, 301.46484375, 322.998046875, 344.53125, 366.064453125, 387.59765625, 409.130859375, 430.6640625, 452.197265625, 473.73046875, 495.263671875, 516.796875, 538.330078125, 559.86328125, 581.396484375, 602.9296875, 624.462890625, 645.99609375, 667.529296875, 689.0625, 710.595703125, 732.12890625, 753.662109375, 775.1953125, 796.728515625, 839.794921875, 861.328125, 882.861328125, 904.39453125, 925.927734375, 968.994140625, 990.52734375, 1012.06054688, 1055.12695312, 1076.66015625, 1098.19335938, 1141.25976562, 1184.32617188, 1205.859375, 1248.92578125, 1270.45898438, 1313.52539062, 1356.59179688, 1399.65820312, 1442.72460938, 1485.79101562, 1528.85742188, 1571.92382812, 1614.99023438, 1658.05664062, 1701.12304688, 1765.72265625, 1808.7890625, 1873.38867188, 1916.45507812, 1981.0546875, 2024.12109375, 2088.72070312, 2153.3203125, 2217.91992188, 2282.51953125, 2347.11914062, 2411.71875, 2497.8515625, 2562.45117188, 2627.05078125, 2713.18359375, 2799.31640625, 2885.44921875, 2950.04882812, 3036.18164062, 3143.84765625, 3229.98046875, 3316.11328125, 3423.77929688, 3509.91210938, 3617.578125, 3725.24414062, 3832.91015625, 3940.57617188, 4069.77539062, 4177.44140625, 4306.640625, 4435.83984375, 4565.0390625, 4694.23828125, 4844.97070312, 4974.16992188, 5124.90234375, 5275.63476562, 5426.3671875, 5577.09960938, 5749.36523438, 5921.63085938, 6093.89648438, 6266.16210938, 6459.9609375, 6653.75976562, 6847.55859375, 7041.35742188, 7256.68945312, 7450.48828125, 7687.35351562, 7902.68554688, 8139.55078125, 8376.41601562, 8613.28125, 8871.6796875, 9130.078125, 9388.4765625, 9668.40820312, 9948.33984375, 10249.8046875, 10551.2695312, 10852.734375, 11175.7324219, 11498.7304688, 11843.2617188, 12187.7929688, 12553.8574219, 12919.921875, 13285.9863281, 13673.5839844, 14082.7148438, 14491.8457031, 
14922.5097656, 15353.1738281, 15805.3710938, 16257.5683594]

list of frequency ranges into which the spectrum is divided (these must be in ascending order and cannot contain duplicates); each triangle is built as x(i-1)=0, x(i)=1, x(i+1)=0 over i, so the resulting number of bands is the size of the input array minus 2

inputSize number <optional>
1025

the size of the spectrum

log boolean <optional>
true

compute log-energies (log10 (1 + energy))

normalize string <optional>
unit_sum

spectrum bin weights to use for each triangular band: 'unit_max' to make each triangle vertex equal to 1, 'unit_sum' to make each triangle area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle area equal to 1 normalizing the weights of each triangle by its bandwidth

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

type string <optional>
power

use magnitude or power spectrum

weighting string <optional>
linear

type of weighting function for determining triangle area

Returns

{bands: 'the energy in each band'}

Details
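
The triangle construction x(i-1)=0, x(i)=1, x(i+1)=0 can be sketched as follows. This is a simplified illustration, not the library's code: the function name is hypothetical, the weights are left unnormalized (closest to 'unit_max'), power is summed without the log step, and bin k is assumed to sit at k * (sampleRate / 2) / (inputSize - 1) Hz.

```javascript
// Illustrative sketch only; triangularBandsSketch is not part of essentia.js.
function triangularBandsSketch(spectrum, frequencyBands, sampleRate = 44100) {
  const binHz = sampleRate / 2 / (spectrum.length - 1);
  const bands = [];
  // Every interior edge is the apex of one triangle, hence length - 2 bands.
  for (let b = 1; b < frequencyBands.length - 1; b++) {
    const lo = frequencyBands[b - 1];
    const mid = frequencyBands[b];
    const hi = frequencyBands[b + 1];
    let energy = 0;
    for (let k = 0; k < spectrum.length; k++) {
      const f = k * binHz;
      let weight = 0;
      if (f > lo && f <= mid) weight = (f - lo) / (mid - lo);    // rising edge
      else if (f > mid && f < hi) weight = (hi - f) / (hi - mid); // falling edge
      energy += weight * spectrum[k] * spectrum[k]; // power (mag-squared)
    }
    bands.push(energy);
  }
  return bands;
}
```

Passing four band edges therefore yields two band energies, which matches the "size of input array - 2" rule in the frequencyBands description.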

TriangularBarkBands( spectrum [, highFrequencyBound [, inputSize [, log [, lowFrequencyBound [, normalize [, numberBands [, sampleRate [, type [, weighting ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes energy in the bark bands of a spectrum. It differs from the regular BarkBands algorithm in that it is more configurable, so that it can be used in the BFCC algorithm to produce output similar to Rastamat (http://www.ee.columbia.edu/ln/rosa/matlab/rastamat/). See the BFCC algorithm documentation for more information as to why you might want to choose this over Mel frequency analysis. It is recommended that the input "spectrum" be calculated by the Spectrum algorithm. Check https://essentia.upf.edu/reference/std_TriangularBarkBands.html for more details.

Parameters
Name Type Attributes Default Description
spectrum VectorFloat

the audio spectrum

highFrequencyBound number <optional>
22050

an upper-bound limit for the frequencies to be included in the bands

inputSize number <optional>
1025

the size of the spectrum

log boolean <optional>
false

compute log-energies (log10 (1 + energy))

lowFrequencyBound number <optional>
0

a lower-bound limit for the frequencies to be included in the bands

normalize string <optional>
unit_sum

'unit_max' makes the vertex of all the triangles equal to 1, 'unit_sum' makes the area of all the triangles equal to 1

numberBands number <optional>
24

the number of output bands

sampleRate number <optional>
44100

the sample rate

type string <optional>
power

'power' to output squared units, 'magnitude' to keep it as the input

weighting string <optional>
warping

type of weighting function for determining triangle area

Returns

{bands: 'the energy in bark bands'}

Details

Trimmer( signal [, checkRange [, endTime [, sampleRate [, startTime ] ] ] ] ) → {object}

Description

This algorithm extracts a segment of an audio signal given its start and end times. Giving "startTime" greater than "endTime" will raise an exception. Check https://essentia.upf.edu/reference/std_Trimmer.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

checkRange boolean <optional>
false

check whether the specified time range for a slice fits the size of input signal (throw exception if not)

endTime number <optional>
1e+06

the end time of the slice you want to extract [s]

sampleRate number <optional>
44100

the sampling rate of the input audio signal [Hz]

startTime number <optional>
0

the start time of the slice you want to extract [s]

Returns

{signal: 'the trimmed signal'}

Details
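
The time-to-sample conversion behind trimming can be sketched in plain JavaScript. The function name is hypothetical, and the index rounding (floor) and the clamping of out-of-range times when checkRange is false are assumptions, not documented behaviour.

```javascript
// Illustrative sketch only; trimSketch is not part of essentia.js.
function trimSketch(signal, {
  checkRange = false,
  endTime = 1e6,   // [s]
  sampleRate = 44100,
  startTime = 0,   // [s]
} = {}) {
  if (startTime > endTime) {
    throw new RangeError("startTime must not be greater than endTime");
  }
  const startIndex = Math.floor(startTime * sampleRate);
  const endIndex = Math.floor(endTime * sampleRate);
  if (checkRange && (startIndex > signal.length || endIndex > signal.length)) {
    throw new RangeError("requested slice does not fit the input signal");
  }
  // Without checkRange, out-of-range times are clamped to the signal length.
  return signal.slice(
    Math.min(startIndex, signal.length),
    Math.min(endIndex, signal.length)
  );
}
```

With the default endTime of 1e6 s, only startTime effectively matters for typical inputs, so the slice runs to the end of the signal.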

Tristimulus( frequencies, magnitudes ) → {object}

Description

This algorithm calculates the tristimulus of a signal given its harmonic peaks. The tristimulus was introduced as a timbre equivalent of the color attributes in vision. Tristimulus measures the mixture of harmonics in a given sound, grouped into three sections. The first tristimulus measures the relative weight of the first harmonic; the second tristimulus measures the relative weight of the second, third, and fourth harmonics taken together; and the third tristimulus measures the relative weight of all the remaining harmonics. Check https://essentia.upf.edu/reference/std_Tristimulus.html for more details.

Parameters
Name Type Description
frequencies VectorFloat

the frequencies of the harmonic peaks ordered by frequency

magnitudes VectorFloat

the magnitudes of the harmonic peaks ordered by frequency

Returns

{tristimulus: 'a three-element vector that measures the mixture of harmonics of the given spectrum'}

Details
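
The three groupings described above can be written out directly. The sketch below assumes the relative weight is each group's share of the summed magnitudes and that an all-zero input yields zeros; the function name is hypothetical.

```javascript
// Illustrative sketch only; tristimulusSketch is not part of essentia.js.
function tristimulusSketch(frequencies, magnitudes) {
  if (frequencies.length !== magnitudes.length) {
    throw new RangeError("frequencies and magnitudes must have equal length");
  }
  const sum = (values) => values.reduce((acc, v) => acc + v, 0);
  const mags = Array.from(magnitudes);
  const total = sum(mags);
  if (total === 0) return [0, 0, 0]; // assumed behaviour for empty or silent input
  return [
    mags[0] / total,               // first harmonic
    sum(mags.slice(1, 4)) / total, // harmonics 2 to 4
    sum(mags.slice(4)) / total,    // all remaining harmonics
  ];
}
```

For ten harmonics of equal magnitude this gives approximately [0.1, 0.3, 0.6], and the three values always sum to 1 for a non-silent input.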

TruePeakDetector( signal [, blockDC [, emphasise [, oversamplingFactor [, quality [, sampleRate [, threshold [, version ] ] ] ] ] ] ] ) → {object}

Description

This algorithm implements a “true-peak” level meter for clipping detection. According to the ITU-R recommendations, “true-peak” values overcoming the full-scale range are potential sources of “clipping in subsequent processes, such as within particular D/A converters or during sample-rate conversion”. The ITU-R BS.1770-4[1] (by default) and the ITU-R BS.1770-2[2] signal-flows can be used. Go to the references for information about the differences. Only the peaks (if any) exceeding the configurable amplitude threshold are returned. Note: the parameters 'blockDC' and 'emphasise' work only when 'version' is set to 2. References: [1] Series, B. S. (2011). Recommendation ITU-R BS.1770-4. Algorithms to measure audio programme loudness and true-peak audio level, https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-4-201510-I!!PDF-E.pdf [2] Series, B. S. (2011). Recommendation ITU-R BS.1770-2. Algorithms to measure audio programme loudness and true-peak audio level, https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-2-201103-S!!PDF-E.pdf Check https://essentia.upf.edu/reference/std_TruePeakDetector.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input audio signal

blockDC boolean <optional>
false

flag to activate the optional DC blocker

emphasise boolean <optional>
false

flag to activate the optional emphasis filter

oversamplingFactor number <optional>
4

number of times the signal is oversampled

quality number <optional>
1

type of interpolation applied (see libresmple)

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

threshold number <optional>
-0.0002

threshold to detect peaks [dB]

version number <optional>
4

algorithm version

Returns

{peakLocations: 'the peak locations in the output signal', output: 'the processed signal'}

Details
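
The threshold is given in dB while audio samples are linear amplitudes. A minimal sketch of the conversion, assuming the usual amplitude convention of 20 * log10 (the helper names are hypothetical and not part of essentia.js):

```javascript
// Hypothetical helpers for illustration; not part of essentia.js.

// Convert a level in dB to a linear amplitude (amplitude convention).
function dbToAmplitude(levelDb) {
  return Math.pow(10, levelDb / 20);
}

// Convert a positive linear amplitude to a level in dB.
function amplitudeToDb(amplitude) {
  return 20 * Math.log10(amplitude);
}
```

Under this convention the default threshold of -0.0002 dB sits just below full scale, at a linear amplitude of roughly 0.99998.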

TuningFrequency( frequencies, magnitudes [, resolution ] ) → {object}

Description

This algorithm estimates the tuning frequency given a sequence/set of spectral peaks. The result is the tuning frequency in Hz, and its distance from 440 Hz in cents. This version is slightly adapted from the original algorithm [1], but gives the same results. Check https://essentia.upf.edu/reference/std_TuningFrequency.html for more details.

Parameters
Name Type Attributes Default Description
frequencies VectorFloat

the frequencies of the spectral peaks [Hz]

magnitudes VectorFloat

the magnitudes of the spectral peaks

resolution number <optional>
1

resolution in cents (logarithmic scale, 100 cents = 1 semitone) for tuning frequency determination

Returns

{tuningFrequency: 'the tuning frequency [Hz]', tuningCents: 'the deviation from 440 Hz (between -35 to 65 cents)'}

Details
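
The relation between a frequency and its deviation from 440 Hz in cents (100 cents = 1 semitone, 1200 cents = 1 octave) can be sketched as follows. The helper names are hypothetical; this shows only the unit conversion, not the estimation itself.

```javascript
// Hypothetical helpers for illustration; not part of essentia.js.

// Deviation of a frequency from 440 Hz, in cents.
function centsFrom440(frequencyHz) {
  return 1200 * Math.log2(frequencyHz / 440);
}

// Frequency that lies the given number of cents away from 440 Hz.
function frequencyFromCents(cents) {
  return 440 * Math.pow(2, cents / 1200);
}
```

For example, a tuning frequency of 445 Hz lies about 19.6 cents above 440 Hz.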

TuningFrequencyExtractor( signal [, frameSize [, hopSize ] ] ) → {object}

Description

This algorithm extracts the tuning frequency of an audio signal. Check https://essentia.upf.edu/reference/std_TuningFrequencyExtractor.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the audio input signal

frameSize number <optional>
4096

the frameSize for computing tuning frequency

hopSize number <optional>
2048

the hopsize for computing tuning frequency

Returns

{tuningFrequency: 'the computed tuning frequency'}

Details

UnaryOperator( array [, scale [, shift [, type ] ] ] ) → {object}

Description

This algorithm performs basic arithmetical operations element by element given an array. Note: - log and ln are equivalent to the natural logarithm - for log, ln, log10 and lin2db, x is clipped to 1e-30 for x<1e-30 - for x<0, sqrt(x) is invalid - scale and shift parameters define linear transformation to be applied to the resulting elements Check https://essentia.upf.edu/reference/std_UnaryOperator.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array

scale number <optional>
1

multiply result by factor

shift number <optional>
0

shift result by value (add value)

type string <optional>
identity

the type of the unary operator to apply to input array

Returns

{array: 'the input array transformed by unary operation'}

Details
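
The notes above can be illustrated with a small sketch. It covers only the operator types named in the description plus the default identity, and it assumes the linear transformation is applied as scale * op(x) + shift; the function name is hypothetical.

```javascript
// Illustrative sketch only; unaryOperatorSketch is not part of essentia.js.
function unaryOperatorSketch(array, { scale = 1, shift = 0, type = "identity" } = {}) {
  const FLOOR = 1e-30; // inputs to the logarithms are clipped to this value
  const operators = {
    identity: (x) => x,
    log: (x) => Math.log(Math.max(x, FLOOR)), // natural logarithm
    ln: (x) => Math.log(Math.max(x, FLOOR)),  // same as log
    log10: (x) => Math.log10(Math.max(x, FLOOR)),
    sqrt: (x) => {
      if (x < 0) throw new RangeError("sqrt(x) is invalid for x < 0");
      return Math.sqrt(x);
    },
  };
  const op = operators[type];
  if (!op) throw new RangeError("unsupported type in this sketch: " + type);
  // Assumption: scale is applied first, then shift.
  return Array.from(array, (x) => scale * op(x) + shift);
}
```

With type 'identity', scale 2 and shift 1, the input [1, 2, 3] maps to [3, 5, 7] in this sketch.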

UnaryOperatorStream( array [, scale [, shift [, type ] ] ] ) → {object}

Description

This algorithm performs basic arithmetical operations element by element given an array. Note: - log and ln are equivalent to the natural logarithm - for log, ln, log10 and lin2db, x is clipped to 1e-30 for x<1e-30 - for x<0, sqrt(x) is invalid - scale and shift parameters define linear transformation to be applied to the resulting elements Check https://essentia.upf.edu/reference/std_UnaryOperatorStream.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the input array

scale number <optional>
1

multiply result by factor

shift number <optional>
0

shift result by value (add value)

type string <optional>
identity

the type of the unary operator to apply to input array

Returns

{array: 'the input array transformed by unary operation'}

Details

Variance( array ) → {object}

Description

This algorithm computes the variance of an array. Check https://essentia.upf.edu/reference/std_Variance.html for more details.

Parameters
Name Type Description
array VectorFloat

the input array

Returns

{variance: 'the variance of the input array'}

Details
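
For reference, a minimal variance computation in plain JavaScript. It assumes the population form (dividing by N rather than N - 1), which may or may not match the library's normalization; the function name is hypothetical.

```javascript
// Illustrative sketch only; varianceSketch is not part of essentia.js.
function varianceSketch(array) {
  if (array.length === 0) throw new RangeError("input array must not be empty");
  let mean = 0;
  for (const x of array) mean += x;
  mean /= array.length;
  let sumSquares = 0;
  for (const x of array) sumSquares += (x - mean) * (x - mean);
  return sumSquares / array.length; // population variance (assumption)
}
```

For the input [1, 2, 3, 4] the mean is 2.5 and this sketch returns 1.25.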

Vibrato( pitch [, maxExtend [, maxFrequency [, minExtend [, minFrequency [, sampleRate ] ] ] ] ] ) → {object}

Description

This algorithm detects the presence of vibrato and estimates its parameters given a pitch contour [Hz]. The result is the vibrato frequency in Hz and the extent (peak to peak) in cents. If no vibrato is detected in a frame, the output of both values is zero. Check https://essentia.upf.edu/reference/std_Vibrato.html for more details.

Parameters
Name Type Attributes Default Description
pitch VectorFloat

the pitch trajectory [Hz].

maxExtend number <optional>
250

maximum considered vibrato extent [cents]

maxFrequency number <optional>
8

maximum considered vibrato frequency [Hz]

minExtend number <optional>
50

minimum considered vibrato extent [cents]

minFrequency number <optional>
4

minimum considered vibrato frequency [Hz]

sampleRate number <optional>
344.531

sample rate of the input pitch contour

Returns

{vibratoFrequency: 'estimated vibrato frequency (or speed) [Hz]; zero if no vibrato was detected.', vibratoExtend: 'estimated vibrato extent (or depth) [cents]; zero if no vibrato was detected.'}

Details

WarpedAutoCorrelation( array [, maxLag [, sampleRate ] ] ) → {object}

Description

This algorithm computes the warped auto-correlation of an audio signal. The implementation is an adapted version of K. Schmidt's implementation of the matlab algorithm from the 'warped toolbox' by Aki Harma and Matti Karjalainen found in [2]. For a detailed explanation of the algorithm, see [1]. This algorithm is only defined for positive lambda = 1.0674*sqrt(2.0*atan(0.00006583*sampleRate)/PI) - 0.1916, thus it will throw an exception when the supplied sampling rate does not satisfy this requirement. If maxLag is larger than the size of the input array, an exception is thrown. Check https://essentia.upf.edu/reference/std_WarpedAutoCorrelation.html for more details.

Parameters
Name Type Attributes Default Description
array VectorFloat

the array to be analyzed

maxLag number <optional>
1

the maximum lag for which the auto-correlation is computed (inclusive) (must be smaller than signal size)

sampleRate number <optional>
44100

the audio sampling rate [Hz]

Returns

{warpedAutoCorrelation: 'the warped auto-correlation vector'}

Details
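
Example

The lambda expression from the description can be evaluated ahead of time to check whether a sampling rate is usable. This is a minimal sketch of that formula only; the function names are hypothetical and the warped auto-correlation itself is not reproduced here.

```javascript
// Warping coefficient as given in the description of the algorithm.
function warpingLambda(sampleRate) {
  return (
    1.0674 * Math.sqrt((2.0 * Math.atan(0.00006583 * sampleRate)) / Math.PI) -
    0.1916
  );
}

// The algorithm is only defined for positive lambda.
function isSampleRateSupported(sampleRate) {
  return warpingLambda(sampleRate) > 0;
}

console.log(warpingLambda(44100)); // roughly 0.756
console.log(isSampleRateSupported(44100));
console.log(isSampleRateSupported(500));
```

Under this formula, very low sampling rates (a few hundred Hz) yield a non-positive lambda and would therefore be rejected.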

Welch( frame [, averagingFrames [, fftSize [, frameSize [, sampleRate [, scaling [, windowType ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the Power Spectral Density of the input signal using Welch's method [1]. The input should be fed with overlapped audio frames. The algorithm internally stores the past frames required to compute each output. Call reset() to clear the buffers. This implementation is based on SciPy [2]. Check https://essentia.upf.edu/reference/std_Welch.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input audio frame

averagingFrames number <optional>
10

amount of frames to average

fftSize number <optional>
1024

size of the FFT. Zero padding is added if this is larger than the input frame size.

frameSize number <optional>
512

the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)

sampleRate number <optional>
44100

the sampling rate of the audio signal [Hz]

scaling string <optional>
density

'density' normalizes the result to the bandwidth while 'power' outputs the unnormalized power spectrum

windowType string <optional>
hann

the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'

Returns

{psd: 'Power Spectral Density [dB] or [dB/Hz]'}

Details
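
Example

Because the algorithm expects overlapped frames, the signal has to be cut into frames before each call. The sketch below shows one way to do that in plain JavaScript, assuming a 50% overlap (hop size of half the frame size); the helper name overlappedFrames and the choice of overlap are assumptions, not part of the API described above.

```javascript
// Cut a signal into overlapping frames of frameSize samples, hopSize apart.
// Trailing samples that do not fill a whole frame are dropped in this sketch.
function overlappedFrames(signal, frameSize = 512, hopSize = 256) {
  if (frameSize <= 0 || hopSize <= 0) {
    throw new RangeError("frameSize and hopSize must be positive");
  }
  const frames = [];
  for (let start = 0; start + frameSize <= signal.length; start += hopSize) {
    // slice() copies, so each frame is independent of the source buffer.
    frames.push(signal.slice(start, start + frameSize));
  }
  return frames;
}

// Usage: a ramp signal split into 512-sample frames with 50% overlap.
const ramp = Float32Array.from({ length: 2048 }, (_, i) => i);
const frames = overlappedFrames(ramp, 512, 256);
console.log(frames.length, frames[0].length);
```

Each frame would then be converted to a VectorFloat and passed to Welch in order, so that the internal buffer can average over the last averagingFrames frames.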

Windowing( frame [, normalized [, size [, type [, zeroPadding [, zeroPhase ] ] ] ] ] ) → {object}

Description

This algorithm applies windowing to an audio signal. It optionally applies zero-phase windowing and optionally adds zero-padding. The resulting windowed frame size is equal to the incoming frame size plus the number of padded zeros. By default, the available windows are normalized (to have an area of 1) and then scaled by a factor of 2. Check https://essentia.upf.edu/reference/std_Windowing.html for more details.

Parameters
Name Type Attributes Default Description
frame VectorFloat

the input audio frame

normalized boolean <optional>
true

a boolean value to specify whether to normalize windows (to have an area of 1) and then scale by a factor of 2

size number <optional>
1024

the window size

type string <optional>
hann

the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'

zeroPadding number <optional>
0

the size of the zero-padding

zeroPhase boolean <optional>
true

a boolean value that enables zero-phase windowing

Returns

{frame: 'the windowed audio frame'}

Details
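
Example

The normalization described above (area of 1, then a factor of 2) can be illustrated with a plain JavaScript sketch for the 'hann' type. Several things here are assumptions: the symmetric Hann formula, the placement of padded zeros at the end of the frame, and the omission of zero-phase windowing, which this sketch does not implement. The function name hannWindowingSketch is hypothetical.

```javascript
// Reference sketch only; it corresponds to zeroPhase = false.
function hannWindowingSketch(frame, zeroPadding = 0, normalized = true) {
  const n = frame.length;
  if (n < 2) throw new RangeError("frame must contain at least 2 samples");

  // Symmetric Hann window (assumed form) and its area (sum of coefficients).
  const window = new Float64Array(n);
  let area = 0;
  for (let i = 0; i < n; i++) {
    window[i] = 0.5 - 0.5 * Math.cos((2 * Math.PI * i) / (n - 1));
    area += window[i];
  }

  // Normalize to an area of 1, then scale by a factor of 2.
  const gain = normalized ? 2 / area : 1;

  // Output length is the frame size plus the number of padded zeros.
  const out = new Float64Array(n + zeroPadding);
  for (let i = 0; i < n; i++) out[i] = frame[i] * window[i] * gain;
  return out;
}

const windowed = hannWindowingSketch(new Float32Array(8).fill(1), 4);
console.log(windowed.length);
```

Applied to a frame of ones, the normalized output sums to 2, which is a quick way to sanity-check the scaling.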

ZeroCrossingRate( signal [, threshold ] ) → {object}

Description

This algorithm computes the zero-crossing rate of an audio signal. It is the number of sign changes between consecutive signal values divided by the total number of values. Noisy signals tend to have a higher zero-crossing rate. To avoid counting small variations around zero caused by noise, a threshold around zero is used: a zero crossing is considered valid only when this boundary is crossed. Check https://essentia.upf.edu/reference/std_ZeroCrossingRate.html for more details.

Parameters
Name Type Attributes Default Description
signal VectorFloat

the input signal

threshold number <optional>
0

the threshold around zero, applied on both the positive and the negative side, that is taken as the zero axis

Returns

{zeroCrossingRate: 'the zero-crossing rate'}

Details
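
Example

The definition above can be sketched in plain JavaScript. This is one possible interpretation of the threshold, in which samples whose magnitude does not exceed it are ignored and a crossing is counted only when the signal reappears on the opposite side; the library may treat such samples differently. The function name zeroCrossingRateSketch is hypothetical.

```javascript
// Reference sketch only; threshold handling is an assumed interpretation.
function zeroCrossingRateSketch(signal, threshold = 0) {
  if (signal.length === 0) {
    throw new Error("cannot compute the zero-crossing rate of an empty signal");
  }
  let crossings = 0;
  let lastSign = 0; // 0 until the first sample outside the threshold band
  for (const x of signal) {
    if (Math.abs(x) <= threshold) continue; // inside the band: not a valid sign
    const sign = x > 0 ? 1 : -1;
    if (lastSign !== 0 && sign !== lastSign) crossings++;
    lastSign = sign;
  }
  // Sign changes divided by the total number of values.
  return crossings / signal.length;
}

console.log(zeroCrossingRateSketch([1, -1, 1, -1]));
console.log(zeroCrossingRateSketch([1, 0.2, -0.2, 1], 0.5));
```

In the second call the signal dips below zero but never beyond the threshold, so no crossing is counted under this interpretation.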