EssentiaExtractor

Name	Type	Attributes	Default	Description
`EssentiaWASM`	*
`isDebug`	boolean	<optional>	false

Methods

<async> getAudioBufferFromURL( audioURL, webAudioCtx ) → {AudioBuffer}

Description

Decodes and returns the audio buffer of a given audio URL or blob URI using the Web Audio API. (NOTE: This method doesn't work in the Safari browser.)

Parameters

Name	Type	Description
`audioURL`	string	web URL or blob URI of an audio file
`webAudioCtx`	AudioContext	an instance of Web Audio API `AudioContext`

Returns

Details

<async> getAudioChannelDataFromURL( audioURL, webAudioCtx [, channel ] ) → {Float32Array}

Description

Decodes and returns the audio channel data from a given audio URL or blob URI using the Web Audio API. (NOTE: This method doesn't work in the Safari browser.)

Parameters

Name	Type	Attributes	Default	Description
`audioURL`	string			web URL or blob URI of an audio file
`webAudioCtx`	AudioContext			an instance of Web Audio API `AudioContext`
`channel`	number	<optional>	0	audio channel number

Returns

Details

melSpectrumExtractor( audioFrame, sampleRate [, asVector [, config ] ] ) → {Array}

Description

Computes the log-scaled mel spectrogram of a given audio signal frame, optionally using a custom extractor profile configuration.

Parameters

Name	Type	Attributes	Default	Description
`audioFrame`	Float32Array			a frame of decoded audio signal as Float32 typed array.
`sampleRate`	number			Sample rate of the input audio signal.
`asVector`	boolean	<optional>	false	whether to output the spectrogram as a vector float type for chaining with other essentia algorithms.
`config`	*	<optional>	this.profile

Returns

Details

audioBufferToMonoSignal( buffer ) → {Float32Array}

Description

Converts an AudioBuffer object to a mono audio signal array. If the audio buffer has 2 channels of audio, the signal is downmixed to mono using the essentia MonoMixer algorithm. Throws an exception if the input AudioBuffer object has more than 2 channels of audio.

Parameters

Name	Type	Description
`buffer`	AudioBuffer	`AudioBuffer` object decoded from an audio file.

Returns

Details
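
The channel handling described for `audioBufferToMonoSignal` can be illustrated with a plain JavaScript sketch. It does not call essentia.js: the downmix is assumed to be the per-sample mean of both channels, and `toMonoSignal` and `fakeBuffer` are hypothetical names used only for this illustration.

```javascript
// Plain-JS sketch of the described behaviour; it does not call essentia.js.
// Assumption: the stereo downmix is the per-sample mean of both channels.
function toMonoSignal(buffer) {
  if (buffer.numberOfChannels === 1) {
    return buffer.getChannelData(0);
  }
  if (buffer.numberOfChannels === 2) {
    const left = buffer.getChannelData(0);
    const right = buffer.getChannelData(1);
    const mono = new Float32Array(left.length);
    for (let i = 0; i < left.length; i++) {
      mono[i] = 0.5 * (left[i] + right[i]);
    }
    return mono;
  }
  throw new Error("Expected 1 or 2 channels, got " + buffer.numberOfChannels);
}

// Hypothetical stand-in exposing the two AudioBuffer members used above,
// so the sketch also runs outside a browser.
function fakeBuffer(channels) {
  return {
    numberOfChannels: channels.length,
    getChannelData: (c) => channels[c],
  };
}

const stereo = fakeBuffer([
  new Float32Array([1, 0, -1]),
  new Float32Array([0, 0, 1]),
]);
const monoSignal = toMonoSignal(stereo); // Float32Array [0.5, 0, 0]
```

In a browser, a real `AudioBuffer` (for example one returned by `getAudioBufferFromURL`) exposes the same `numberOfChannels` and `getChannelData` members.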

shutdown()

Description

Shuts down the essentia algorithm instance after its use.

Details

hpcpExtractor( audioFrame, sampleRate [, asVector [, config ] ] ) → {Array}

Description

Computes the HPCP chroma feature of a given audio signal frame, optionally using a custom extractor profile configuration.

Parameters

Name	Type	Attributes	Default	Description
`audioFrame`	Float32Array			a decoded audio signal frame as Float32 typed array.
`sampleRate`	number			Sample rate of the input audio signal.
`asVector`	boolean	<optional>	false	whether to output the hpcpgram as a vector float type for chaining with other essentia algorithms.
`config`	*	<optional>	this.profile

Returns

Details

reinstantiate()

Description

Re-instantiates the essentia algorithms instance after the shutdown method has been used.

Details

"delete"()

Description

Deletes the essentia.js class instance.

Details

arrayToVector( inputArray ) → {VectorFloat}

Description

Convert an input JS array into VectorFloat type

Parameters

Name	Type	Description
`inputArray`	Float32Array	input JS typed array

Returns

Details

vectorToArray( inputVector ) → {Float32Array}

Description

Convert an input VectorFloat array into typed JS Float32Array

Parameters

Name	Type	Description
`inputVector`	VectorFloat	input VectorFloat array

Returns

Details

FrameGenerator( inputAudioData [, frameSize [, hopSize ] ] ) → {VectorVectorFloat}

Description

Cuts an audio signal data into overlapping frames given frame size and hop size

Parameters

Name	Type	Attributes	Default	Description
`inputAudioData`	Float32Array			single-channel audio data
`frameSize`	number	<optional>	2048	frame size for cutting the audio signal
`hopSize`	number	<optional>	1024	hop size, i.e. the number of samples between the starts of consecutive frames

Returns

Details
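
To make the relation between `frameSize` and `hopSize` concrete, here is a minimal plain JavaScript sketch of cutting a signal into overlapping frames. It is not the library implementation: `cutFrames` is a hypothetical helper, it returns plain typed-array views rather than a `VectorVectorFloat`, and it simply drops a trailing partial frame, whereas the library may pad or position frames differently.

```javascript
// Plain-JS sketch; does not call essentia.js.
// Assumptions: frames start at sample 0 and a trailing partial frame is dropped.
function cutFrames(signal, frameSize = 2048, hopSize = 1024) {
  const frames = [];
  for (let start = 0; start + frameSize <= signal.length; start += hopSize) {
    // subarray creates a view on the same memory, so no samples are copied
    frames.push(signal.subarray(start, start + frameSize));
  }
  return frames;
}

// Ten samples 0..9, frame size 4, hop size 2 -> frames start at 0, 2, 4 and 6.
const signal = Float32Array.from({ length: 10 }, (_, i) => i);
const frames = cutFrames(signal, 4, 2);
```

With the documented defaults (2048 and 1024), consecutive frames overlap by half a frame.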

MonoMixer( leftChannel, rightChannel ) → {object}

Description

This algorithm downmixes the signal into a single channel given a stereo signal. It is a wrapper around https://essentia.upf.edu/reference/std_MonoMixer.html.

Parameters

Name	Type	Description
`leftChannel`	VectorFloat	the left channel of the stereo audio signal
`rightChannel`	VectorFloat	the right channel of the stereo audio signal

Returns

Details

LoudnessEBUR128( leftChannel, rightChannel [, hopSize [, sampleRate [, startAtZero ] ] ] ) → {object}

Description

This algorithm computes the EBUR128 loudness descriptors of an audio signal. It is a wrapper around https://essentia.upf.edu/reference/std_LoudnessEBUR128.html.

Parameters

Name	Type	Attributes	Default	Description
`leftChannel`	VectorFloat			the left channel of the stereo audio signal
`rightChannel`	VectorFloat			the right channel of the stereo audio signal
`hopSize`	number	<optional>	0.1	the hop size with which the loudness is computed [s]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`startAtZero`	boolean	<optional>	false	start momentary/short-term loudness estimation at time 0 (zero-centered loudness estimation windows) if true; otherwise start both windows at time 0 (time positions for momentary and short-term values will not be synchronized)

Returns

Details

AfterMaxToBeforeMaxEnergyRatio( pitch ) → {object}

Description

This algorithm computes the ratio between the pitch energy after the pitch maximum and the pitch energy before the pitch maximum. Sounds having a monotonically ascending pitch or one unique pitch will show a value of (0,1], while sounds having a monotonically descending pitch will show a value of [1,inf). In case there is no energy before the max pitch, the algorithm will return the energy after the maximum pitch. Check https://essentia.upf.edu/reference/std_AfterMaxToBeforeMaxEnergyRatio.html for more details.

Parameters

Name	Type	Description
`pitch`	VectorFloat	the array of pitch values [Hz]

Returns

Details

AllPass( signal [, bandwidth [, cutoffFrequency [, order [, sampleRate ] ] ] ] ) → {object}

Description

This algorithm implements an IIR all-pass filter of order 1 or 2. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_AllPass.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`bandwidth`	number	<optional>	500	the bandwidth of the filter [Hz] (used only for 2nd-order filters)
`cutoffFrequency`	number	<optional>	1500	the cutoff frequency for the filter [Hz]
`order`	number	<optional>	1	the order of the filter
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

AudioOnsetsMarker( signal [, onsets [, sampleRate [, type ] ] ] ) → {object}

Description

This algorithm creates a wave file in which a given audio signal is mixed with a series of time onsets. The sonification of the onsets can be heard as beeps, or as short white noise pulses if configured to do so. Check https://essentia.upf.edu/reference/std_AudioOnsetsMarker.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`onsets`	Array.<any>	<optional>	[]	the list of onset locations [s]
`sampleRate`	number	<optional>	44100	the sampling rate of the output signal [Hz]
`type`	string	<optional>	beep	the type of sound to be added on the event

Returns

Details

AutoCorrelation( array [, frequencyDomainCompression [, generalized [, normalization ] ] ] ) → {object}

Description

This algorithm computes the autocorrelation vector of a signal. It uses the version most commonly used in signal processing, which doesn't remove the mean from the observations. Using the 'generalized' option this algorithm computes autocorrelation as described in [3]. Check https://essentia.upf.edu/reference/std_AutoCorrelation.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the array to be analyzed
`frequencyDomainCompression`	number	<optional>	0.5	factor at which FFT magnitude is compressed (only used if 'generalized' is set to true, see [3])
`generalized`	boolean	<optional>	false	bool value to indicate whether to compute the 'generalized' autocorrelation as described in [3]
`normalization`	string	<optional>	standard	type of normalization to compute: either 'standard' (default) or 'unbiased'

Returns

Details
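
The non-generalized autocorrelation described above (no mean removal) can be sketched directly in the time domain. This is a plain JavaScript illustration, not the library code: `autocorrelate` is a hypothetical name, 'standard' is assumed to mean the raw sum of lagged products, and 'unbiased' is assumed to divide each lag by the number of overlapping samples.

```javascript
// Plain-JS sketch; does not call essentia.js and does not cover 'generalized'.
// Assumptions: 'standard' = raw lagged products, 'unbiased' = divided by (N - lag).
function autocorrelate(x, normalization = "standard") {
  const n = x.length;
  const r = new Array(n).fill(0);
  for (let lag = 0; lag < n; lag++) {
    for (let i = 0; i + lag < n; i++) {
      r[lag] += x[i] * x[i + lag]; // the mean is not removed
    }
    if (normalization === "unbiased") {
      r[lag] /= n - lag;
    }
  }
  return r;
}

const standard = autocorrelate([1, 2, 3]); // [14, 8, 3]
const unbiased = autocorrelate([1, 2, 3], "unbiased"); // [14 / 3, 4, 3]
```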

BFCC( spectrum [, dctType [, highFrequencyBound [, inputSize [, liftering [, logType [, lowFrequencyBound [, normalize [, numberBands [, numberCoefficients [, sampleRate [, type [, weighting ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the bark-frequency cepstrum coefficients of a spectrum. Bark bands and their subsequent usage in cepstral analysis have been shown to be useful for percussive content [1, 2]. This algorithm is implemented using the Bark scaling approach in the Rastamat version of the MFCC algorithm and in a similar manner to the MFCC-FB40 default specs. Check https://essentia.upf.edu/reference/std_BFCC.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the audio spectrum
`dctType`	number	<optional>	2	the DCT type
`highFrequencyBound`	number	<optional>	11000	the upper bound of the frequency range [Hz]
`inputSize`	number	<optional>	1025	the size of input spectrum
`liftering`	number	<optional>	0	the liftering coefficient. Use '0' to bypass it
`logType`	string	<optional>	dbamp	logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes
`lowFrequencyBound`	number	<optional>	0	the lower bound of the frequency range [Hz]
`normalize`	string	<optional>	unit_sum	'unit_max' makes the vertex of all the triangles equal to 1, 'unit_sum' makes the area of all the triangles equal to 1
`numberBands`	number	<optional>	40	the number of bark bands in the filter
`numberCoefficients`	number	<optional>	13	the number of output cepstrum coefficients
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`type`	string	<optional>	power	use magnitude or power spectrum
`weighting`	string	<optional>	warping	type of weighting function for determining triangle area

Returns

Details

BPF( x [, xPoints [, yPoints ] ] ) → {object}

Description

This algorithm implements a break point function which linearly interpolates between discrete xy-coordinates to construct a continuous function. Check https://essentia.upf.edu/reference/std_BPF.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`x`	number			the input coordinate (x-axis)
`xPoints`	Array.<any>	<optional>	[0, 1]	the x-coordinates of the points forming the break-point function (the points must be arranged in ascending order and cannot contain duplicates)
`yPoints`	Array.<any>	<optional>	[0, 1]	the y-coordinates of the points forming the break-point function

Returns

Details
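
The linear interpolation between break points can be sketched in a few lines of plain JavaScript. `breakPointFunction` is a hypothetical helper, not the essentia.js wrapper, and rejecting inputs outside the range of `xPoints` is an assumption made for this sketch.

```javascript
// Plain-JS sketch; does not call essentia.js.
// Assumption: an x outside [xPoints[0], xPoints[last]] is rejected.
function breakPointFunction(x, xPoints = [0, 1], yPoints = [0, 1]) {
  if (xPoints.length < 2 || xPoints.length !== yPoints.length) {
    throw new Error("xPoints and yPoints need the same length (at least 2)");
  }
  const last = xPoints.length - 1;
  if (x < xPoints[0] || x > xPoints[last]) {
    throw new RangeError("x lies outside the break-point range");
  }
  // Find the first break point at or beyond x (xPoints must be ascending).
  let i = 1;
  while (xPoints[i] < x) i++;
  // Interpolate linearly between break points i - 1 and i.
  const t = (x - xPoints[i - 1]) / (xPoints[i] - xPoints[i - 1]);
  return yPoints[i - 1] + t * (yPoints[i] - yPoints[i - 1]);
}

const onDefaultRamp = breakPointFunction(0.25); // 0.25
const onTriangle = breakPointFunction(15, [0, 10, 20], [0, 100, 0]); // 50
```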

BandPass( signal [, bandwidth [, cutoffFrequency [, sampleRate ] ] ] ) → {object}

Description

This algorithm implements a 2nd order IIR band-pass filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_BandPass.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal
`bandwidth`	number	<optional>	500	the bandwidth of the filter [Hz]
`cutoffFrequency`	number	<optional>	1500	the cutoff frequency for the filter [Hz]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

BandReject( signal [, bandwidth [, cutoffFrequency [, sampleRate ] ] ] ) → {object}

Description

This algorithm implements a 2nd order IIR band-reject filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_BandReject.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`bandwidth`	number	<optional>	500	the bandwidth of the filter [Hz]
`cutoffFrequency`	number	<optional>	1500	the cutoff frequency for the filter [Hz]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

BarkBands( spectrum [, numberBands [, sampleRate ] ] ) → {object}

Description

This algorithm computes energy in Bark bands of a spectrum. The band frequencies are: [0.0, 50.0, 100.0, 150.0, 200.0, 300.0, 400.0, 510.0, 630.0, 770.0, 920.0, 1080.0, 1270.0, 1480.0, 1720.0, 2000.0, 2320.0, 2700.0, 3150.0, 3700.0, 4400.0, 5300.0, 6400.0, 7700.0, 9500.0, 12000.0, 15500.0, 20500.0, 27000.0]. The first two Bark bands [0,100] and [100,200] have been split in half for better resolution (because of an observed better performance in beat detection). For each bark band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_BarkBands.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input spectrum
`numberBands`	number	<optional>	27	the number of desired barkbands
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details
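
As a rough illustration of summing the power spectrum per Bark band, the sketch below uses the band edges listed in the description. It is plain JavaScript, not the library code: `barkBandEnergies` is a hypothetical name, the spectrum is assumed to span 0 Hz to `sampleRate / 2` linearly, and each bin is assigned to the band whose half-open interval contains its frequency, which may differ from the library's rounding.

```javascript
// Plain-JS sketch; does not call essentia.js.
// Band edges in Hz, copied from the description above.
const BARK_EDGES = [0, 50, 100, 150, 200, 300, 400, 510, 630, 770, 920, 1080,
  1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
  9500, 12000, 15500, 20500, 27000];

// Assumption: bin k of an N-bin spectrum sits at k * (sampleRate / 2) / (N - 1) Hz.
function barkBandEnergies(spectrum, numberBands = 27, sampleRate = 44100) {
  const hzPerBin = sampleRate / 2 / (spectrum.length - 1);
  const bands = new Array(numberBands).fill(0);
  for (let bin = 0; bin < spectrum.length; bin++) {
    const freq = bin * hzPerBin;
    for (let b = 0; b < numberBands; b++) {
      if (freq >= BARK_EDGES[b] && freq < BARK_EDGES[b + 1]) {
        bands[b] += spectrum[bin] ** 2; // magnitude squared
        break;
      }
    }
  }
  return bands;
}

// Toy spectrum: 5 bins at 0, 100, 200, 300 and 400 Hz (sampleRate = 800).
const bands = barkBandEnergies([1, 2, 3, 4, 5], 27, 800);
```

In this toy input the bins fall into bands 0, 2, 4, 5 and 6, so those entries hold 1, 4, 9, 16 and 25 and all other bands stay at 0.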

BeatTrackerDegara( signal [, maxTempo [, minTempo ] ] ) → {object}

Description

This algorithm estimates the beat positions given an input signal. It computes 'complex spectral difference' onset detection function and utilizes the beat tracking algorithm (TempoTapDegara) to extract beats [1]. The algorithm works with the optimized settings of 2048/1024 frame/hop size for the computation of the detection function, with its posterior x2 resampling. While it has a lower accuracy than BeatTrackerMultiFeature (see the evaluation results in [2]), its computational speed is significantly higher, which makes it reasonable to apply this algorithm for batch processing of large amounts of audio signals. Check https://essentia.upf.edu/reference/std_BeatTrackerDegara.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the audio input signal
`maxTempo`	number	<optional>	208	the fastest tempo to detect [bpm]
`minTempo`	number	<optional>	40	the slowest tempo to detect [bpm]

Returns

Details

BeatTrackerMultiFeature( signal [, maxTempo [, minTempo ] ] ) → {object}

Description

This algorithm estimates the beat positions given an input signal. It computes a number of onset detection functions and estimates beat location candidates from them using the TempoTapDegara algorithm. Thereafter the best candidates are selected using TempoTapMaxAgreement. The employed detection functions, and the optimal frame/hop sizes used for their computation, are:
- complex spectral difference (see 'complex' method in OnsetDetection algorithm, 2048/1024 with posterior x2 upsample of the detection function)
- energy flux (see 'rms' method in OnsetDetection algorithm, the same settings)
- spectral flux in Mel-frequency bands (see 'melflux' method in OnsetDetection algorithm, the same settings)
- beat emphasis function (see 'beat_emphasis' method in OnsetDetectionGlobal algorithm, 2048/512)
- spectral flux between histogrammed spectrum frames, measured by the modified information gain (see 'infogain' method in OnsetDetectionGlobal algorithm, 2048/512)
Check https://essentia.upf.edu/reference/std_BeatTrackerMultiFeature.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the audio input signal
`maxTempo`	number	<optional>	208	the fastest tempo to detect [bpm]
`minTempo`	number	<optional>	40	the slowest tempo to detect [bpm]

Returns

Details

Beatogram( loudness, loudnessBandRatio [, size ] ) → {object}

Description

This algorithm filters the loudness matrix given by BeatsLoudness algorithm in order to keep only the most salient beat band representation. This algorithm has been found to be useful for estimating time signatures. Check https://essentia.upf.edu/reference/std_Beatogram.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`loudness`	VectorFloat			the loudness at each beat
`loudnessBandRatio`	VectorVectorFloat			matrix of loudness ratios at each band and beat
`size`	number	<optional>	16	number of beats for dynamic filtering

Returns

Details

BeatsLoudness( signal [, beatDuration [, beatWindowDuration [, beats [, frequencyBands [, sampleRate ] ] ] ] ] ) → {object}

Description

This algorithm computes the spectrum energy of beats in an audio signal given their positions. The energy is computed both on the whole frequency range and for each of the specified frequency bands. See the SingleBeatLoudness algorithm for a more detailed explanation. Check https://essentia.upf.edu/reference/std_BeatsLoudness.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal
`beatDuration`	number	<optional>	0.05	the duration of the window in which the beat will be restricted [s]
`beatWindowDuration`	number	<optional>	0.1	the duration of the window in which to look for the beginning of the beat (centered around the positions in 'beats') [s]
`beats`	Array.<any>	<optional>	[]	the list of beat positions (each position is in seconds)
`frequencyBands`	Array.<any>	<optional>	[20, 150, 400, 3200, 7000, 22000]	the list of bands to compute energy ratios [Hz]
`sampleRate`	number	<optional>	44100	the audio sampling rate [Hz]

Returns

Details

BinaryOperator( array1, array2 [, type ] ) → {object}

Description

This algorithm performs basic arithmetical operations element by element given two arrays. Note:
- using this algorithm in streaming mode can cause diamond shape graphs which have not been tested with the current scheduler. There is NO GUARANTEE of its correct work for diamond shape graphs.
- for y<0, x/y is invalid
Check https://essentia.upf.edu/reference/std_BinaryOperator.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`array1`	VectorFloat			the first operand input array
`array2`	VectorFloat			the second operand input array
`type`	string	<optional>	add	the type of the binary operator to apply to the input arrays

Returns

Details
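
Element-by-element arithmetic on two arrays can be sketched as follows in plain JavaScript. Only the operator name 'add' appears in the table above; the other three names, the length check, and the helper name `binaryOperator` are assumptions for this illustration.

```javascript
// Plain-JS sketch; does not call essentia.js.
// Only 'add' is documented above; the other operator names are assumptions.
const OPERATIONS = new Map([
  ["add", (a, b) => a + b],
  ["subtract", (a, b) => a - b],
  ["multiply", (a, b) => a * b],
  ["divide", (a, b) => a / b],
]);

function binaryOperator(array1, array2, type = "add") {
  if (array1.length !== array2.length) {
    throw new Error("input arrays must have the same length");
  }
  const op = OPERATIONS.get(type);
  if (!op) {
    throw new Error("unknown operator type: " + type);
  }
  return Array.from(array1, (value, i) => op(value, array2[i]));
}

const sum = binaryOperator([1, 2, 3], [10, 20, 30]); // [11, 22, 33]
const product = binaryOperator([1, 2, 3], [10, 20, 30], "multiply"); // [10, 40, 90]
```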

BinaryOperatorStream( array1, array2 [, type ] ) → {object}

Description

Parameters

Name	Type	Attributes	Default	Description
`array1`	VectorFloat			the first operand input array
`array2`	VectorFloat			the second operand input array
`type`	string	<optional>	add	the type of the binary operator to apply to the input arrays

Returns

Details

BpmHistogramDescriptors( bpmIntervals ) → {object}

Description

This algorithm computes beats per minute histogram and its statistics for the highest and second highest peak. Note: histogram vector contains occurrence frequency for each bpm value, 0-th element corresponds to 0 bpm value. Check https://essentia.upf.edu/reference/std_BpmHistogramDescriptors.html for more details.

Parameters

Name	Type	Description
`bpmIntervals`	VectorFloat	the list of bpm intervals [s]

Returns

Details

BpmRubato( beats [, longRegionsPruningTime [, shortRegionsMergingTime [, tolerance ] ] ] ) → {object}

Description

This algorithm extracts the locations of large tempo changes from a list of beat ticks. Check https://essentia.upf.edu/reference/std_BpmRubato.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`beats`	VectorFloat			list of detected beat ticks [s]
`longRegionsPruningTime`	number	<optional>	20	time for the longest constant tempo region inside a rubato region [s]
`shortRegionsMergingTime`	number	<optional>	4	time for the shortest constant tempo region from one tempo region to another [s]
`tolerance`	number	<optional>	0.08	minimum tempo deviation to look for

Returns

Details

CentralMoments( array [, mode [, range ] ] ) → {object}

Description

This algorithm extracts the 0th, 1st, 2nd, 3rd and 4th central moments of an array. It returns a 5-tuple in which the index corresponds to the order of the moment. Check https://essentia.upf.edu/reference/std_CentralMoments.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array
`mode`	string	<optional>	pdf	compute central moments considering array values as a probability density function over array index or as sample points of a distribution
`range`	number	<optional>	1	the range of the input array, used for normalizing the results in the 'pdf' mode

Returns

Details

Centroid( array [, range ] ) → {object}

Description

This algorithm computes the centroid of an array. The centroid is normalized to a specified range. This algorithm can be used to compute spectral centroid or temporal centroid. Check https://essentia.upf.edu/reference/std_Centroid.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array
`range`	number	<optional>	1	the range of the input array, used for normalizing the results

Returns

Details
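
The centroid with range normalization can be written as a weighted mean of index positions. The sketch below is plain JavaScript, not the library code; `centroid` is a hypothetical helper, and mapping index `i` to the position `i * range / (N - 1)` is an assumption about how the `range` parameter scales the result.

```javascript
// Plain-JS sketch; does not call essentia.js.
// Assumption: index i is mapped to the position i * range / (N - 1).
function centroid(array, range = 1) {
  if (array.length < 2) {
    throw new Error("need at least two values");
  }
  const scale = range / (array.length - 1);
  let weighted = 0;
  let total = 0;
  for (let i = 0; i < array.length; i++) {
    weighted += i * scale * array[i];
    total += array[i];
  }
  if (total === 0) {
    throw new Error("centroid is undefined when all values sum to zero");
  }
  return weighted / total;
}

const flat = centroid([1, 1, 1]); // 0.5, the middle of the range
const spectral = centroid([0, 1, 0], 22050); // 11025, in Hz for a half-band range
```

Passing `sampleRate / 2` as `range` for a magnitude spectrum gives a spectral centroid in Hz under the same assumption.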

ChordsDescriptors( chords, key, scale ) → {object}

Description

Given a chord progression this algorithm describes it by means of key, scale, histogram, and rate of change. Note:
- chordsHistogram indexes follow the circle of fifths order, while being shifted to the input key and scale
- key and scale are taken from the most frequent chord. In the case where multiple chords are equally frequent, the chord is hierarchically chosen from the circle of fifths.
- chords should follow this name convention <A-G>[<#/b><m>] (i.e. C, C# or C#m are valid chords). Chord names not fitting this convention will throw an exception.
Check https://essentia.upf.edu/reference/std_ChordsDescriptors.html for more details.

Parameters

Name	Type	Description
`chords`	VectorString	the chord progression
`key`	string	the key of the whole song, from A to G
`scale`	string	the scale of the whole song (major or minor)

Returns

Details
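
Because chord names that do not fit the convention throw an exception, it can help to check a progression before passing it on. The regular expression below is one reading of `<A-G>[<#/b><m>]` (a root letter, an optional `#` or `b`, an optional `m`); both the pattern and the helper name `isValidChordName` are assumptions for this plain JavaScript sketch.

```javascript
// Plain-JS sketch; does not call essentia.js.
// Assumed reading of the convention: root A-G, optional '#' or 'b', optional 'm'.
const CHORD_NAME = /^[A-G][#b]?m?$/;

function isValidChordName(name) {
  return CHORD_NAME.test(name);
}

const progression = ["C", "C#", "C#m", "Bb", "H", "Cmaj7"];
const invalid = progression.filter((name) => !isValidChordName(name));
// invalid is ["H", "Cmaj7"]
```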

ChordsDetection( pcp [, hopSize [, sampleRate [, windowSize ] ] ] ) → {object}

Description

This algorithm estimates chords given an input sequence of harmonic pitch class profiles (HPCPs). It finds the best matching major or minor triad and outputs the result as a string (e.g. A#, Bm, G#m, C). This algorithm uses the Sharp versions of each Flatted note (i.e. Bb -> A#). Check https://essentia.upf.edu/reference/std_ChordsDetection.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`pcp`	VectorVectorFloat			the pitch class profile from which to detect the chord
`hopSize`	number	<optional>	2048	the hop size with which the input PCPs were computed
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`windowSize`	number	<optional>	2	the size of the window on which to estimate the chords [s]

Returns

Details

ChordsDetectionBeats( pcp, ticks [, chromaPick [, hopSize [, sampleRate ] ] ] ) → {object}

Description

This algorithm estimates chords using pitch profile classes on segments between beats. It is similar to the ChordsDetection algorithm, but the chords are estimated on audio segments between each pair of consecutive beats. For each segment the estimation is done based on a chroma (HPCP) vector characterizing it, which can be computed by two methods:
- 'interbeat_median', each resulting chroma vector component is a median of all the component values in the segment
- 'starting_beat', chroma vector is sampled from the start of the segment (that is, its starting beat position) using its first frame. It makes sense if chroma is preliminarily smoothed.
Check https://essentia.upf.edu/reference/std_ChordsDetectionBeats.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`pcp`	VectorVectorFloat			the pitch class profile from which to detect the chord
`ticks`	VectorFloat			the list of beat positions (in seconds)
`chromaPick`	string	<optional>	interbeat_median	method of calculating singleton chroma for interbeat interval
`hopSize`	number	<optional>	2048	the hop size with which the input PCPs were computed
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

ChromaCrossSimilarity( queryFeature, referenceFeature [, binarizePercentile [, frameStackSize [, frameStackStride [, noti [, oti [, otiBinary [, streaming ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes a binary cross similarity matrix from two chromagram feature vectors of a query and reference song. Check https://essentia.upf.edu/reference/std_ChromaCrossSimilarity.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`queryFeature`	VectorVectorFloat			frame-wise chromagram of the query song (e.g., a HPCP)
`referenceFeature`	VectorVectorFloat			frame-wise chromagram of the reference song (e.g., a HPCP)
`binarizePercentile`	number	<optional>	0.095	maximum percent of distance values to consider as similar in each row and each column
`frameStackSize`	number	<optional>	9	number of input frames to stack together and treat as a feature vector for similarity computation. Choose 'frameStackSize=1' to use the original input frames without stacking
`frameStackStride`	number	<optional>	1	stride size to form a stack of frames (e.g., 'frameStackStride'=1 to use consecutive frames; 'frameStackStride'=2 for using every second frame)
`noti`	number	<optional>	12	number of circular shifts to be checked for Optimal Transposition Index [1]
`oti`	boolean	<optional>	true	whether to transpose the key of the reference song to the query song by Optimal Transposition Index [1]
`otiBinary`	boolean	<optional>	false	whether to use the OTI-based chroma binary similarity method [3]
`streaming`	boolean	<optional>	false	whether to accumulate the input 'queryFeature' in the euclidean similarity matrix calculation on each compute() method call

Returns

Details

Chromagram( frame [, binsPerOctave [, minFrequency [, minimumKernelSize [, normalizeType [, numberBins [, sampleRate [, scale [, threshold [, windowType [, zeroPhase ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the Constant-Q chromagram using FFT. See ConstantQ algorithm for more details. Check https://essentia.upf.edu/reference/std_Chromagram.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input audio frame
`binsPerOctave`	number	<optional>	12	number of bins per octave
`minFrequency`	number	<optional>	32.7	minimum frequency [Hz]
`minimumKernelSize`	number	<optional>	4	minimum size allowed for frequency kernels
`normalizeType`	string	<optional>	unit_max	normalize type
`numberBins`	number	<optional>	84	number of frequency bins, starting at minFrequency
`sampleRate`	number	<optional>	44100	FFT sampling rate [Hz]
`scale`	number	<optional>	1	filters scale. Larger values use longer windows
`threshold`	number	<optional>	0.01	bins whose magnitude is below this quantile are discarded
`windowType`	string	<optional>	hann	the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
`zeroPhase`	boolean	<optional>	true	a boolean value that enables zero-phase windowing. Input audio frames should be windowed with the same phase mode

Returns

Details

ClickDetector( frame [, detectionThreshold [, frameSize [, hopSize [, order [, powerEstimationThreshold [, sampleRate [, silenceThreshold ] ] ] ] ] ] ] ) → {object}

Description

This algorithm detects the locations of impulsive noises (clicks and pops) on the input audio frame. It relies on LPC coefficients to inverse-filter the audio in order to attenuate the stationary part and enhance the prediction error (or excitation noise)[1]. After this, a matched filter is used to further enhance the impulsive peaks. The detection threshold is obtained from a robust estimate of the excitation noise power [2] plus a parametric gain value. Check https://essentia.upf.edu/reference/std_ClickDetector.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input frame (must be non-empty)
`detectionThreshold`	number	<optional>	30	the detection threshold is based on the instant power of the noisy excitation signal plus 'detectionThreshold' dB
`frameSize`	number	<optional>	512	the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)
`hopSize`	number	<optional>	256	hop size used for the analysis. This parameter must be set correctly as it cannot be obtained from the input data
`order`	number	<optional>	12	scalar giving the number of LPCs to use
`powerEstimationThreshold`	number	<optional>	10	the noisy excitation is clipped to 'powerEstimationThreshold' times its median.
`sampleRate`	number	<optional>	44100	sample rate used for the analysis
`silenceThreshold`	number	<optional>	-50	threshold to skip silent frames

Returns

Details

Clipper( signal [, max [, min ] ] ) → {object}

Description

This algorithm clips the input signal to fit its values into a specified interval. Check https://essentia.upf.edu/reference/std_Clipper.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`max`	number	<optional>	1	the maximum value above which the signal will be clipped
`min`	number	<optional>	-1	the minimum value below which the signal will be clipped

Returns

Details
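
Clipping to an interval amounts to clamping every sample between `min` and `max`. A minimal plain JavaScript sketch, with `clip` as a hypothetical helper that does not call essentia.js:

```javascript
// Plain-JS sketch; does not call essentia.js.
function clip(signal, max = 1, min = -1) {
  // Values above max become max, values below min become min.
  return Float32Array.from(signal, (v) => Math.min(max, Math.max(min, v)));
}

const clipped = clip([-2, 0.5, 3]); // Float32Array [-1, 0.5, 1]
```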

CoverSongSimilarity( inputArray [, alignmentType [, disExtension [, disOnset [, distanceType ] ] ] ] ) → {object}

Description

This algorithm computes a cover song similarity measure from a binary cross similarity matrix input between two chroma vectors of a query and reference song using various alignment constraints of the Smith-Waterman local-alignment algorithm. Check https://essentia.upf.edu/reference/std_CoverSongSimilarity.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`inputArray`	VectorVectorFloat			a 2D binary cross-similarity matrix between two audio chroma vectors (query vs reference song) (refer to the 'ChromaCrossSimilarity' algorithm).
`alignmentType`	string	<optional>	serra09	choose either one of the given local-alignment constraints for the Smith-Waterman algorithm as described in [2] or [3] respectively.
`disExtension`	number	<optional>	0.5	penalty for disruption extension
`disOnset`	number	<optional>	0.5	penalty for disruption onset
`distanceType`	string	<optional>	asymmetric	choose the type of distance. By default the algorithm outputs an asymmetric distance which is obtained by normalising the maximum score in the alignment score matrix with the length of the reference song

Returns

Details

Crest( array ) → {object}

Description

This algorithm computes the crest of an array. The crest is defined as the ratio between the maximum value and the arithmetic mean of an array. Typically it is used on the magnitude spectrum. Check https://essentia.upf.edu/reference/std_Crest.html for more details.

Parameters

Name	Type	Description
`array`	VectorFloat	the input array (cannot contain negative values, and must be non-empty)

Returns

Details
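
The crest definition (maximum divided by arithmetic mean) translates directly into code. The following plain JavaScript sketch uses a hypothetical helper `crest` and does not call essentia.js; rejecting an all-zero input is an assumption, since the ratio is undefined there.

```javascript
// Plain-JS sketch; does not call essentia.js.
function crest(array) {
  if (array.length === 0) {
    throw new Error("input must be non-empty");
  }
  let max = 0;
  let sum = 0;
  for (const value of array) {
    if (value < 0) {
      throw new Error("input cannot contain negative values");
    }
    max = Math.max(max, value);
    sum += value;
  }
  if (sum === 0) {
    // Assumption: treat an all-zero input as an error (0 / 0 is undefined).
    throw new Error("crest is undefined for an all-zero input");
  }
  return max / (sum / array.length);
}

const peaky = crest([1, 2, 3]); // 3 / 2 = 1.5
const flatCrest = crest([2, 2, 2]); // 1, a flat array has the minimum crest
```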

CrossCorrelation( arrayX, arrayY [, maxLag [, minLag ] ] ) → {object}

Description

This algorithm computes the cross-correlation vector of two signals. It accepts 2 parameters, minLag and maxLag, which define the range of the computation of the inner product. Check https://essentia.upf.edu/reference/std_CrossCorrelation.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`arrayX`	VectorFloat			the first input array
`arrayY`	VectorFloat			the second input array
`maxLag`	number	<optional>	1	the maximum lag to be computed between the two vectors
`minLag`	number	<optional>	0	the minimum lag to be computed between the two vectors

Returns

Details
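
The role of `minLag` and `maxLag` can be seen in a direct time-domain sketch: one inner product is computed per lag in the closed range. This is plain JavaScript with a hypothetical helper `crossCorrelation`; the sign convention (shifting `arrayY` by the lag) and the absence of any normalization are assumptions and may not match the library.

```javascript
// Plain-JS sketch; does not call essentia.js.
// Assumed convention: c[lag] = sum over n of arrayX[n] * arrayY[n + lag].
function crossCorrelation(arrayX, arrayY, maxLag = 1, minLag = 0) {
  const out = [];
  for (let lag = minLag; lag <= maxLag; lag++) {
    let sum = 0;
    for (let n = 0; n < arrayX.length; n++) {
      const m = n + lag;
      if (m >= 0 && m < arrayY.length) {
        sum += arrayX[n] * arrayY[m];
      }
    }
    out.push(sum);
  }
  return out;
}

const lagsZeroToOne = crossCorrelation([1, 2, 3], [0, 1, 0]); // [2, 1]
```

The output has `maxLag - minLag + 1` entries, one for each lag in the requested range.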

CrossSimilarityMatrix( queryFeature, referenceFeature [, binarize [, binarizePercentile [, frameStackSize [, frameStackStride ] ] ] ] ) → {object}

Description

This algorithm computes a euclidean cross-similarity matrix of two sequences of frame features. Similarity values can be optionally binarized. Check https://essentia.upf.edu/reference/std_CrossSimilarityMatrix.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`queryFeature`	VectorVectorFloat			input frame features of the query song (e.g., a chromagram)
`referenceFeature`	VectorVectorFloat			input frame features of the reference song (e.g., a chromagram)
`binarize`	boolean	<optional>	false	whether to binarize the euclidean cross-similarity matrix
`binarizePercentile`	number	<optional>	0.095	maximum percent of distance values to consider as similar in each row and each column
`frameStackSize`	number	<optional>	1	number of input frames to stack together and treat as a feature vector for similarity computation. Choose 'frameStackSize=1' to use the original input frames without stacking
`frameStackStride`	number	<optional>	1	stride size to form a stack of frames (e.g., 'frameStackStride'=1 to use consecutive frames; 'frameStackStride'=2 for using every second frame)

Returns

Details

CubicSpline( x [, leftBoundaryFlag [, leftBoundaryValue [, rightBoundaryFlag [, rightBoundaryValue [, xPoints [, yPoints ] ] ] ] ] ] ) → {object}

Description

Computes the second derivatives of a piecewise cubic spline. The input value, i.e. the point at which the spline is to be evaluated, should typically be between xPoints[0] and xPoints[size-1]. If the value lies outside this range, extrapolation is used. The [left/right] boundary condition flag parameters take the following values:
- 0: the cubic spline should be a quadratic over the [first/last] interval
- 1: the first derivative at the [left/right] endpoint should be [left/right]BoundaryValue
- 2: the second derivative at the [left/right] endpoint should be [left/right]BoundaryValue
References: [1] Spline interpolation - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Spline_interpolation. Check https://essentia.upf.edu/reference/std_CubicSpline.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`x`	number			the input coordinate (x-axis)
`leftBoundaryFlag`	number	<optional>	0	type of boundary condition for the left boundary
`leftBoundaryValue`	number	<optional>	0	the value to be used in the left boundary, when leftBoundaryFlag is 1 or 2
`rightBoundaryFlag`	number	<optional>	0	type of boundary condition for the right boundary
`rightBoundaryValue`	number	<optional>	0	the value to be used in the right boundary, when rightBoundaryFlag is 1 or 2
`xPoints`	Array.<any>	<optional>	[0, 1]	the x-coordinates where data is specified (the points must be arranged in ascending order and cannot contain duplicates)
`yPoints`	Array.<any>	<optional>	[0, 1]	the y-coordinates to be interpolated (i.e. the known data)

Returns

Details

DCRemoval( signal [, cutoffFrequency [, sampleRate ] ] ) → {object}

Description

This algorithm removes the DC offset from a signal using a 1st order IIR highpass filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_DCRemoval.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal
`cutoffFrequency`	number	<optional>	40	the cutoff frequency for the filter [Hz]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

DCT( array [, dctType [, inputSize [, liftering [, outputSize ] ] ] ] ) → {object}

Description

This algorithm computes the Discrete Cosine Transform of an array. It uses the DCT-II form, with the 1/sqrt(2) scaling factor for the first coefficient. Check https://essentia.upf.edu/reference/std_DCT.html for more details.
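For reference, a DCT-II with the 1/sqrt(2) factor on the first coefficient can be sketched in plain JavaScript as follows. The name `dct2Sketch` is hypothetical, the overall sqrt(2/N) normalisation is an assumption, and liftering is not modelled.

```javascript
// Hypothetical sketch of a DCT-II:
//   X[k] = s(k) * sqrt(2/N) * sum_n x[n] * cos(pi/N * (n + 0.5) * k)
// with s(0) = 1/sqrt(2) and s(k) = 1 otherwise. Normalisation is assumed.
function dct2Sketch(input, outputSize = input.length) {
  const n = input.length;
  if (n === 0) throw new Error("input must be non-empty");
  if (outputSize > n) throw new Error("outputSize cannot exceed the input size");
  const scale = Math.sqrt(2 / n);
  const out = new Array(outputSize);
  for (let k = 0; k < outputSize; k++) {
    let acc = 0;
    for (let i = 0; i < n; i++) {
      acc += input[i] * Math.cos((Math.PI / n) * (i + 0.5) * k);
    }
    out[k] = acc * scale * (k === 0 ? Math.SQRT1_2 : 1);
  }
  return out;
}

console.log(dct2Sketch([1, 1, 1, 1]));
```

With this scaling a constant input puts all of its energy in the first coefficient.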

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array
`dctType`	number	<optional>	2	the DCT type
`inputSize`	number	<optional>	10	the size of the input array
`liftering`	number	<optional>	0	the liftering coefficient. Use '0' to bypass it
`outputSize`	number	<optional>	10	the number of output coefficients

Returns

Details

Danceability( signal [, maxTau [, minTau [, sampleRate [, tauMultiplier ] ] ] ] ) → {object}

Description

This algorithm estimates danceability of a given audio signal. The algorithm is derived from Detrended Fluctuation Analysis (DFA) described in [1]. The parameters minTau and maxTau are used to define the range of time over which DFA will be performed. The output of this algorithm is the danceability of the audio signal. These values usually range from 0 to 3 (higher values meaning more danceable). Check https://essentia.upf.edu/reference/std_Danceability.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`maxTau`	number	<optional>	8800	maximum segment length to consider [ms]
`minTau`	number	<optional>	310	minimum segment length to consider [ms]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`tauMultiplier`	number	<optional>	1.1	multiplier to increment from min to max tau

Returns

Details

Decrease( array [, range ] ) → {object}

Description

This algorithm computes the decrease of an array defined as the linear regression coefficient. The range parameter is used to normalize the result. For a spectral centroid, the range should be equal to Nyquist and for an audio centroid the range should be equal to (audiosize - 1) / samplerate. The size of the input array must be at least two elements for "decrease" to be computed, otherwise an exception is thrown. References: [1] Least Squares Fitting -- from Wolfram MathWorld, http://mathworld.wolfram.com/LeastSquaresFitting.html Check https://essentia.upf.edu/reference/std_Decrease.html for more details.
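The regression coefficient can be sketched in plain JavaScript as a least-squares slope. The name `decreaseSketch` is hypothetical, and the way `range` rescales the x-axis (indices mapped onto [0, range]) is an assumption about the normalisation.

```javascript
// Hypothetical sketch: least-squares slope of the array values against an
// x-axis that is assumed to span [0, range] in equal steps.
function decreaseSketch(values, range = 1) {
  const n = values.length;
  if (n < 2) throw new Error("at least two elements are required");
  const xScale = range / (n - 1);
  const xMean = range / 2;
  let yMean = 0;
  for (const v of values) yMean += v;
  yMean /= n;
  let num = 0;
  let den = 0;
  for (let i = 0; i < n; i++) {
    const dx = i * xScale - xMean;
    num += dx * (values[i] - yMean);
    den += dx * dx;
  }
  return num / den;
}

console.log(decreaseSketch([0, 1, 2, 3], 3));
```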

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array
`range`	number	<optional>	1	the range of the input array, used for normalizing the results

Returns

Details

Derivative( signal ) → {object}

Description

This algorithm returns the first-order derivative of an input signal. That is, for each input value it returns the value minus the previous one. Check https://essentia.upf.edu/reference/std_Derivative.html for more details.
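A minimal plain-JavaScript sketch of this difference, assuming the value before the first sample is zero. The name `derivativeSketch` is hypothetical.

```javascript
// Hypothetical sketch: out[i] = signal[i] - signal[i - 1].
// Assumption: the sample before the first one is taken to be 0.
function derivativeSketch(signal) {
  const out = new Array(signal.length);
  let previous = 0;
  for (let i = 0; i < signal.length; i++) {
    out[i] = signal[i] - previous;
    previous = signal[i];
  }
  return out;
}

console.log(derivativeSketch([1, 3, 6]));
```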

Parameters

Name	Type	Description
`signal`	VectorFloat	the input signal

Returns

Details

DerivativeSFX( envelope ) → {object}

Description

This algorithm computes two descriptors that are based on the derivative of a signal envelope. Check https://essentia.upf.edu/reference/std_DerivativeSFX.html for more details.

Parameters

Name	Type	Description
`envelope`	VectorFloat	the envelope of the signal

Returns

Details

DiscontinuityDetector( frame [, detectionThreshold [, energyThreshold [, frameSize [, hopSize [, kernelSize [, order [, silenceThreshold [, subFrameSize ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm uses LPC and some heuristics to detect discontinuities in an audio signal [1]. Check https://essentia.upf.edu/reference/std_DiscontinuityDetector.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input frame (must be non-empty)
`detectionThreshold`	number	<optional>	8	'detectionThreshold' times the standard deviation plus the median of the frame is used as detection threshold
`energyThreshold`	number	<optional>	-60	threshold in dB to detect silent subframes
`frameSize`	number	<optional>	512	the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)
`hopSize`	number	<optional>	256	hop size used for the analysis. This parameter must be set correctly as it cannot be obtained from the input data
`kernelSize`	number	<optional>	7	scalar giving the size of the median filter window. Must be odd
`order`	number	<optional>	3	scalar giving the number of LPCs to use
`silenceThreshold`	number	<optional>	-50	threshold to skip silent frames
`subFrameSize`	number	<optional>	32	size of the window used to compute silent subframes

Returns

Details

Dissonance( frequencies, magnitudes ) → {object}

Description

This algorithm computes the sensory dissonance of an audio signal given its spectral peaks. Sensory dissonance (to be distinguished from musical or theoretical dissonance) measures perceptual roughness of the sound and is based on the roughness of its spectral peaks. Given the spectral peaks, the algorithm estimates total dissonance by summing up the normalized dissonance values for each pair of peaks. These values are computed using dissonance curves, which define dissonance between two spectral peaks according to their frequency and amplitude relations. The dissonance curves are based on perceptual experiments conducted in [1]. Exceptions are thrown when the sizes of the input vectors are not equal or if the input frequencies are not in ascending order. References: [1] R. Plomp and W. J. M. Levelt, "Tonal Consonance and Critical Bandwidth," The Journal of the Acoustical Society of America, vol. 38, no. 4, pp. 548–560, 1965. Check https://essentia.upf.edu/reference/std_Dissonance.html for more details.

Parameters

Name	Type	Description
`frequencies`	VectorFloat	the frequencies of the spectral peaks (must be sorted by frequency)
`magnitudes`	VectorFloat	the magnitudes of the spectral peaks (must be sorted by frequency)

Returns

Details

DistributionShape( centralMoments ) → {object}

Description

This algorithm computes the spread (variance), skewness and kurtosis of an array given its central moments. The extracted features are good indicators of the shape of the distribution. For the required input see CentralMoments algorithm. The size of the input array must be at least 5. An exception will be thrown otherwise. Check https://essentia.upf.edu/reference/std_DistributionShape.html for more details.
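The relation between central moments and the three descriptors can be sketched in plain JavaScript. The name `distributionShapeSketch` is hypothetical; the use of excess kurtosis (subtracting 3) and the zero-variance fallback are assumptions.

```javascript
// Hypothetical sketch. centralMoments = [m0, m1, m2, m3, m4].
// spread = m2, skewness = m3 / m2^(3/2), kurtosis = m4 / m2^2 - 3 (excess, assumed).
function distributionShapeSketch(centralMoments) {
  if (centralMoments.length < 5) {
    throw new Error("at least 5 central moments are required");
  }
  const m2 = centralMoments[2];
  const m3 = centralMoments[3];
  const m4 = centralMoments[4];
  if (m2 === 0) {
    // Assumption: a zero-variance distribution gets neutral shape values.
    return { spread: 0, skewness: 0, kurtosis: -3 };
  }
  return {
    spread: m2,
    skewness: m3 / Math.pow(m2, 1.5),
    kurtosis: m4 / (m2 * m2) - 3,
  };
}

console.log(distributionShapeSketch([1, 0, 2, 0, 12]));
```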

Parameters

Name	Type	Description
`centralMoments`	VectorFloat	the central moments of a distribution

Returns

Details

Duration( signal [, sampleRate ] ) → {object}

Description

This algorithm outputs the total duration of an audio signal. Check https://essentia.upf.edu/reference/std_Duration.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

DynamicComplexity( signal [, frameSize [, sampleRate ] ] ) → {object}

Description

This algorithm computes the dynamic complexity defined as the average absolute deviation from the global loudness level estimate on the dB scale. It is related to the dynamic range and to the amount of fluctuation in loudness present in a recording. Silence at the beginning and at the end of a track is ignored in the computation so as not to degrade the results. Check https://essentia.upf.edu/reference/std_DynamicComplexity.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal
`frameSize`	number	<optional>	0.2	the frame size [s]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

ERBBands( spectrum [, highFrequencyBound [, inputSize [, lowFrequencyBound [, numberBands [, sampleRate [, type [, width ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes energies/magnitudes in ERB bands of a spectrum. The Equivalent Rectangular Bandwidth (ERB) scale is used. The algorithm applies a frequency domain filterbank using gammatone filters. Adapted from matlab code in: D. P. W. Ellis (2009). 'Gammatone-like spectrograms', web resource [1]. Check https://essentia.upf.edu/reference/std_ERBBands.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the audio spectrum
`highFrequencyBound`	number	<optional>	22050	an upper-bound limit for the frequencies to be included in the bands
`inputSize`	number	<optional>	1025	the size of the spectrum
`lowFrequencyBound`	number	<optional>	50	a lower-bound limit for the frequencies to be included in the bands
`numberBands`	number	<optional>	40	the number of output bands
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`type`	string	<optional>	power	use magnitude or power spectrum
`width`	number	<optional>	1	filter width with respect to ERB

Returns

Details

EffectiveDuration( signal [, sampleRate [, thresholdRatio ] ] ) → {object}

Description

This algorithm computes the effective duration of an envelope signal. The effective duration is a measure of the time the signal is perceptually meaningful. This is approximated by the time the envelope is above or equal to a given threshold and is above the -90 dB noise floor. This measure makes it possible to distinguish percussive sounds from sustained sounds, but it depends on the signal length. By default, this algorithm uses 40% of the envelope maximum as the threshold, which is suited for short sounds. Note that the 0% threshold corresponds to the duration of the signal above the -90 dB noise floor, while the 100% threshold corresponds to the number of times the envelope takes its maximum value. References: [1] G. Peeters, "A large set of audio features for sound description (similarity and classification) in the CUIDADO project," CUIDADO I.S.T. Project Report, 2004. Check https://essentia.upf.edu/reference/std_EffectiveDuration.html for more details.
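The thresholding described above can be sketched in plain JavaScript. The name `effectiveDurationSketch` is hypothetical, and treating the -90 dB floor as a linear amplitude of 10^(-90/20) is an assumption.

```javascript
// Hypothetical sketch: time (in seconds) during which the envelope is at or
// above both thresholdRatio * max and an assumed -90 dB amplitude floor.
function effectiveDurationSketch(envelope, sampleRate = 44100, thresholdRatio = 0.4) {
  const noiseFloor = Math.pow(10, -90 / 20);
  let max = 0;
  for (const v of envelope) max = Math.max(max, Math.abs(v));
  const threshold = Math.max(thresholdRatio * max, noiseFloor);
  let count = 0;
  for (const v of envelope) {
    if (Math.abs(v) >= threshold) count++;
  }
  return count / sampleRate;
}

console.log(effectiveDurationSketch([0, 0.5, 1, 0.5, 0], 1, 0.4));
```

With `thresholdRatio` set to 1 only the samples equal to the maximum are counted, which matches the 100% case described above.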

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`thresholdRatio`	number	<optional>	0.4	the ratio of the envelope maximum to be used as the threshold

Returns

Details

Energy( array ) → {object}

Description

This algorithm computes the energy of an array. Check https://essentia.upf.edu/reference/std_Energy.html for more details.

Parameters

Name	Type	Description
`array`	VectorFloat	the input array

Returns

Details

EnergyBand( spectrum [, sampleRate [, startCutoffFrequency [, stopCutoffFrequency ] ] ] ) → {object}

Description

This algorithm computes energy in a given frequency band of a spectrum, including both start and stop cutoff frequencies. Note that an exception is thrown when the input spectrum is empty or when startCutoffFrequency is greater than stopCutoffFrequency. Check https://essentia.upf.edu/reference/std_EnergyBand.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input frequency spectrum
`sampleRate`	number	<optional>	44100	the audio sampling rate [Hz]
`startCutoffFrequency`	number	<optional>	0	the start frequency from which to sum the energy [Hz]
`stopCutoffFrequency`	number	<optional>	100	the stop frequency to which to sum the energy [Hz]

Returns

Details

EnergyBandRatio( spectrum [, sampleRate [, startFrequency [, stopFrequency ] ] ] ) → {object}

Description

This algorithm computes the ratio of the spectral energy in the range [startFrequency, stopFrequency] over the total energy. Check https://essentia.upf.edu/reference/std_EnergyBandRatio.html for more details.
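A plain-JavaScript sketch of this ratio follows. The name `energyBandRatioSketch` is hypothetical; the mapping from bin index to frequency (bins evenly spaced from 0 Hz to sampleRate/2) and the use of squared magnitudes as energy are assumptions.

```javascript
// Hypothetical sketch: energy inside [startFrequency, stopFrequency] divided
// by total energy. Bin i is assumed to sit at i * (sampleRate / 2) / (N - 1) Hz.
function energyBandRatioSketch(spectrum, sampleRate = 44100, startFrequency = 0, stopFrequency = 100) {
  if (spectrum.length < 2) throw new Error("spectrum must have at least two bins");
  if (startFrequency > stopFrequency) {
    throw new Error("startFrequency must not exceed stopFrequency");
  }
  const binWidth = sampleRate / 2 / (spectrum.length - 1);
  let band = 0;
  let total = 0;
  for (let i = 0; i < spectrum.length; i++) {
    const power = spectrum[i] * spectrum[i];
    const frequency = i * binWidth;
    total += power;
    if (frequency >= startFrequency && frequency <= stopFrequency) band += power;
  }
  // Assumption: a silent spectrum yields a ratio of 0.
  return total === 0 ? 0 : band / total;
}

console.log(energyBandRatioSketch([1, 1, 1, 1, 1], 8, 0, 1));
```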

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input audio spectrum
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`startFrequency`	number	<optional>	0	the frequency from which to start summing the energy [Hz]
`stopFrequency`	number	<optional>	100	the frequency up to which to sum the energy [Hz]

Returns

Details

Entropy( array ) → {object}

Description

This algorithm computes the Shannon entropy of an array. Entropy can be used to quantify the peakiness of a distribution. This has been used for voiced/unvoiced decision in automatic speech recognition. Check https://essentia.upf.edu/reference/std_Entropy.html for more details.
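For illustration, the Shannon entropy of an array treated as an unnormalised distribution can be sketched in plain JavaScript. The name `entropySketch` is hypothetical; normalising the array to sum to 1, using a base-2 logarithm, and returning 0 for an all-zero array are assumptions.

```javascript
// Hypothetical sketch: H = -sum p_i * log2(p_i), with p_i = values[i] / sum(values).
function entropySketch(values) {
  if (values.length === 0) throw new Error("input must be non-empty");
  let sum = 0;
  for (const v of values) {
    if (v < 0) throw new Error("input cannot contain negative values");
    sum += v;
  }
  if (sum === 0) return 0; // assumption for an all-zero array
  let entropy = 0;
  for (const v of values) {
    const p = v / sum;
    // Terms with p = 0 contribute nothing (limit of p * log p as p -> 0).
    if (p > 0) entropy -= p * Math.log2(p);
  }
  return entropy;
}

console.log(entropySketch([1, 1, 1, 1]));
```

A flat distribution gives the largest entropy for its length, while a single peak gives an entropy of zero.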

Parameters

Name	Type	Description
`array`	VectorFloat	the input array (cannot contain negative values, and must be non-empty)

Returns

Details

Envelope( signal [, applyRectification [, attackTime [, releaseTime [, sampleRate ] ] ] ] ) → {object}

Description

This algorithm computes the envelope of a signal by applying a non-symmetric lowpass filter on a signal. By default it rectifies the signal, but that is optional. Check https://essentia.upf.edu/reference/std_Envelope.html for more details.
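The non-symmetric lowpass can be pictured as a one-pole follower with separate attack and release coefficients. The plain-JavaScript sketch below is an assumption about the filter form, not the library's code; the name `envelopeSketch` is hypothetical.

```javascript
// Hypothetical sketch of an attack/release envelope follower.
// Coefficients are assumed to be exp(-1 / (sampleRate * timeInSeconds)).
function envelopeSketch(signal, applyRectification = true, attackTime = 10, releaseTime = 1500, sampleRate = 44100) {
  // attackTime and releaseTime are given in milliseconds.
  const attackCoef = attackTime > 0 ? Math.exp(-1000 / (sampleRate * attackTime)) : 0;
  const releaseCoef = releaseTime > 0 ? Math.exp(-1000 / (sampleRate * releaseTime)) : 0;
  const out = new Array(signal.length);
  let state = 0;
  for (let i = 0; i < signal.length; i++) {
    const x = applyRectification ? Math.abs(signal[i]) : signal[i];
    // Rising input uses the (fast) attack coefficient, falling input the release one.
    const coef = state < x ? attackCoef : releaseCoef;
    state = (1 - coef) * x + coef * state;
    out[i] = state;
  }
  return out;
}

const rising = envelopeSketch(new Array(5).fill(1), true, 10, 1500, 1000);
console.log(rising);
```

A short attack time makes the envelope follow onsets quickly, while a long release time lets it decay slowly after the signal drops.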

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`applyRectification`	boolean	<optional>	true	whether to apply rectification (envelope based on the absolute value of signal)
`attackTime`	number	<optional>	10	the attack time of the first order lowpass in the attack phase [ms]
`releaseTime`	number	<optional>	1500	the release time of the first order lowpass in the release phase [ms]
`sampleRate`	number	<optional>	44100	the audio sampling rate [Hz]

Returns

Details

EqualLoudness( signal [, sampleRate ] ) → {object}

Description

This algorithm implements an equal-loudness filter. The human ear does not perceive sounds of all frequencies as having equal loudness, and to account for this, the signal is filtered by an inverted approximation of the equal-loudness curves. Technically, the filter is a cascade of a 10th order Yulewalk filter with a 2nd order Butterworth high pass filter. Check https://essentia.upf.edu/reference/std_EqualLoudness.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

Flatness( array ) → {object}

Description

This algorithm computes the flatness of an array, which is defined as the ratio between the geometric mean and the arithmetic mean. Check https://essentia.upf.edu/reference/std_Flatness.html for more details.

Parameters

Name	Type	Description
`array`	VectorFloat	the input array

Returns

Details

FlatnessDB( array ) → {object}

Description

This algorithm computes the flatness of an array, which is defined as the ratio between the geometric mean and the arithmetic mean converted to dB scale. Check https://essentia.upf.edu/reference/std_FlatnessDB.html for more details.
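Both flatness variants can be sketched in plain JavaScript. The names `flatnessSketch` and `flatnessDbSketch` are hypothetical; the dB conversion as 10 * log10 and the value returned for an all-zero array are assumptions.

```javascript
// Hypothetical sketch: flatness = geometric mean / arithmetic mean.
function flatnessSketch(values) {
  if (values.length === 0) throw new Error("input must be non-empty");
  let logSum = 0;
  let sum = 0;
  for (const v of values) {
    if (v < 0) throw new Error("input cannot contain negative values");
    logSum += Math.log(v); // log(0) is -Infinity, giving a geometric mean of 0
    sum += v;
  }
  const arithmeticMean = sum / values.length;
  if (arithmeticMean === 0) return 1; // assumption: all-zero input counts as flat
  const geometricMean = Math.exp(logSum / values.length);
  return geometricMean / arithmeticMean;
}

// Assumed conversion to decibels: 10 * log10(flatness).
function flatnessDbSketch(values) {
  return 10 * Math.log10(flatnessSketch(values));
}

console.log(flatnessSketch([1, 4]), flatnessDbSketch([1, 4]));
```

In this sketch a perfectly flat array gives 1 (0 dB), and any array containing a zero among positive values gives 0 (negative infinity in dB).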

Parameters

Name	Type	Description
`array`	VectorFloat	the input array

Returns

Details

FlatnessSFX( envelope ) → {object}

Description

This algorithm calculates the flatness coefficient of a signal envelope. Check https://essentia.upf.edu/reference/std_FlatnessSFX.html for more details.

Parameters

Name	Type	Description
`envelope`	VectorFloat	the envelope of the signal

Returns

Details

Flux( spectrum [, halfRectify [, norm ] ] ) → {object}

Description

This algorithm computes the spectral flux of a spectrum. Flux is defined as the L2-norm [1] or L1-norm [2] of the difference between two consecutive frames of the magnitude spectrum. The frames have to be of the same size in order to yield a meaningful result. The default L2-norm is used more commonly. Check https://essentia.upf.edu/reference/std_Flux.html for more details.
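The two norms can be compared with a small plain-JavaScript sketch that takes the previous and current frames explicitly. The name `fluxSketch` and its signature are hypothetical; the library keeps the previous frame internally.

```javascript
// Hypothetical sketch: L1 or L2 norm of the bin-wise difference between two
// consecutive magnitude spectra, with optional half-wave rectification.
function fluxSketch(previous, current, norm = "L2", halfRectify = false) {
  if (previous.length !== current.length) {
    throw new Error("frames must have the same size");
  }
  let acc = 0;
  for (let i = 0; i < current.length; i++) {
    let diff = current[i] - previous[i];
    if (halfRectify && diff < 0) diff = 0; // keep only increases in magnitude
    if (norm === "L2") acc += diff * diff;
    else if (norm === "L1") acc += Math.abs(diff);
    else throw new Error("norm must be 'L1' or 'L2'");
  }
  return norm === "L2" ? Math.sqrt(acc) : acc;
}

console.log(fluxSketch([0, 0], [3, 4], "L2"), fluxSketch([0, 0], [3, 4], "L1"));
```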

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input spectrum
`halfRectify`	boolean	<optional>	false	half-rectify the differences in each spectrum bin
`norm`	string	<optional>	L2	the norm to use for difference computation

Returns

Details

FrameCutter( signal [, frameSize [, hopSize [, lastFrameToEndOfFile [, startFromZero [, validFrameThresholdRatio ] ] ] ] ] ) → {object}

Description

This algorithm slices the input buffer into frames. It returns a frame of a constant size and jumps a constant amount of samples forward in the buffer on every compute() call until no more frames can be extracted; empty frame vectors are returned afterwards. Incomplete frames (frames starting before the beginning of the input buffer or going past its end) are zero-padded or dropped according to the "validFrameThresholdRatio" parameter. Check https://essentia.upf.edu/reference/std_FrameCutter.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the buffer from which to read data
`frameSize`	number	<optional>	1024	the output frame size
`hopSize`	number	<optional>	512	the hop size between frames
`lastFrameToEndOfFile`	boolean	<optional>	false	whether the beginning of the last frame should reach the end of file. Only applicable if startFromZero is true
`startFromZero`	boolean	<optional>	false	whether to start the first frame at time 0 (centered at frameSize/2) if true, or -frameSize/2 otherwise (zero-centered)
`validFrameThresholdRatio`	number	<optional>	0	frames smaller than this ratio will be discarded, those larger will be zero-padded to a full frame (i.e. a value of 0 will never discard frames and a value of 1 will only keep frames that are of length 'frameSize')

Returns

Details

FrameToReal( signal [, frameSize [, hopSize ] ] ) → {object}

Description

This algorithm converts a sequence of input audio signal frames into a sequence of audio samples. Check https://essentia.upf.edu/reference/std_FrameToReal.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio frame
`frameSize`	number	<optional>	2048	the frame size for computing the overlap-add process
`hopSize`	number	<optional>	128	the hop size with which the overlap-add function is computed

Returns

Details

FrequencyBands( spectrum [, frequencyBands [, sampleRate ] ] ) → {object}

Description

This algorithm computes energy in rectangular frequency bands of a spectrum. The bands are non-overlapping. For each band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_FrequencyBands.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input spectrum (must be greater than size one)
`frequencyBands`	Array.<any>	<optional>	[0, 50, 100, 150, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500, 20500, 27000]	list of frequency ranges into which the spectrum is divided (these must be in ascending order and cannot contain duplicates)
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

GFCC( spectrum [, dctType [, highFrequencyBound [, inputSize [, logType [, lowFrequencyBound [, numberBands [, numberCoefficients [, sampleRate [, silenceThreshold [, type ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the Gammatone-frequency cepstral coefficients of a spectrum. This is an equivalent of MFCCs, but using a gammatone filterbank (ERBBands) scaled on an Equivalent Rectangular Bandwidth (ERB) scale. Check https://essentia.upf.edu/reference/std_GFCC.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the audio spectrum
`dctType`	number	<optional>	2	the DCT type
`highFrequencyBound`	number	<optional>	22050	the upper bound of the frequency range [Hz]
`inputSize`	number	<optional>	1025	the size of input spectrum
`logType`	string	<optional>	dbamp	logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes
`lowFrequencyBound`	number	<optional>	40	the lower bound of the frequency range [Hz]
`numberBands`	number	<optional>	40	the number of bands in the filter
`numberCoefficients`	number	<optional>	13	the number of output cepstrum coefficients
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`silenceThreshold`	number	<optional>	1e-10	silence threshold for computing log-energy bands
`type`	string	<optional>	power	use magnitude or power spectrum

Returns

Details

GapsDetector( frame [, attackTime [, frameSize [, hopSize [, kernelSize [, maximumTime [, minimumTime [, postpowerTime [, prepowerThreshold [, prepowerTime [, releaseTime [, sampleRate [, silenceThreshold ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm uses energy and time thresholds to detect gaps in the waveform. A median filter is used to remove spurious silent samples. The power of a small audio region before the detected gaps (prepower) is thresholded to detect intentional pauses as described in [1]. This technique is extended to the region after the gap. The algorithm was designed for framewise use and returns the start and end timestamps relative to the first frame processed. Call configure() or reset() in order to restart the count. Check https://essentia.upf.edu/reference/std_GapsDetector.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input frame (must be non-empty)
`attackTime`	number	<optional>	0.05	the attack time of the first order lowpass in the attack phase [ms]
`frameSize`	number	<optional>	2048	frame size used for the analysis. Should match the input frame size. Otherwise, an exception will be thrown
`hopSize`	number	<optional>	1024	hop size used for the analysis
`kernelSize`	number	<optional>	11	scalar giving the size of the median filter window. Must be odd
`maximumTime`	number	<optional>	3500	time of the maximum gap duration [ms]
`minimumTime`	number	<optional>	10	time of the minimum gap duration [ms]
`postpowerTime`	number	<optional>	40	time for the postpower calculation [ms]
`prepowerThreshold`	number	<optional>	-30	prepower threshold [dB].
`prepowerTime`	number	<optional>	40	time for the prepower calculation [ms]
`releaseTime`	number	<optional>	0.05	the release time of the first order lowpass in the release phase [ms]
`sampleRate`	number	<optional>	44100	sample rate used for the analysis
`silenceThreshold`	number	<optional>	-50	silence threshold [dB]

Returns

Details

GeometricMean( array ) → {object}

Description

This algorithm computes the geometric mean of an array of positive values. Check https://essentia.upf.edu/reference/std_GeometricMean.html for more details.

Parameters

Name	Type	Description
`array`	VectorFloat	the input array

Returns

Details

HFC( spectrum [, sampleRate [, type ] ] ) → {object}

Description

This algorithm computes the High Frequency Content of a spectrum. It can be computed according to the following techniques:
- 'Masri' (default) which does: sum |X(n)|^2*k
- 'Jensen' which does: sum |X(n)|*k^2
- 'Brossier' which does: sum |X(n)|*k
Check https://essentia.upf.edu/reference/std_HFC.html for more details.
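The three weightings can be written side by side in plain JavaScript. The name `hfcSketch` is hypothetical, and the bin index k is used directly as the weight, as in the formulas above; any scaling by sampleRate is not modelled here.

```javascript
// Hypothetical sketch of the three HFC variants, weighting bin k by its index.
function hfcSketch(spectrum, type = "Masri") {
  let hfc = 0;
  for (let k = 0; k < spectrum.length; k++) {
    const magnitude = Math.abs(spectrum[k]);
    if (type === "Masri") hfc += magnitude * magnitude * k;
    else if (type === "Jensen") hfc += magnitude * k * k;
    else if (type === "Brossier") hfc += magnitude * k;
    else throw new Error("type must be 'Masri', 'Jensen' or 'Brossier'");
  }
  return hfc;
}

for (const type of ["Masri", "Jensen", "Brossier"]) {
  console.log(type, hfcSketch([1, 2, 3], type));
}
```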

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input audio spectrum
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`type`	string	<optional>	Masri	the type of HFC coefficient to be computed

Returns

Details

HPCP( frequencies, magnitudes [, bandPreset [, bandSplitFrequency [, harmonics [, maxFrequency [, maxShifted [, minFrequency [, nonLinear [, normalized [, referenceFrequency [, sampleRate [, size [, weightType [, windowSize ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

Computes a Harmonic Pitch Class Profile (HPCP) from the spectral peaks of a signal. HPCP is a k*12 dimensional vector which represents the intensities of the twelve (k==1) semitone pitch classes (corresponding to notes from A to G#), or subdivisions of these (k>1). Check https://essentia.upf.edu/reference/std_HPCP.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frequencies`	VectorFloat			the frequencies of the spectral peaks [Hz]
`magnitudes`	VectorFloat			the magnitudes of the spectral peaks
`bandPreset`	boolean	<optional>	true	enables whether to use a band preset
`bandSplitFrequency`	number	<optional>	500	the split frequency for low and high bands, not used if bandPreset is false [Hz]
`harmonics`	number	<optional>	0	number of harmonics for frequency contribution, 0 indicates exclusive fundamental frequency contribution
`maxFrequency`	number	<optional>	5000	the maximum frequency that contributes to the HPCP [Hz] (the difference between the max and split frequencies must not be less than 200.0 Hz)
`maxShifted`	boolean	<optional>	false	whether to shift the HPCP vector so that the maximum peak is at index 0
`minFrequency`	number	<optional>	40	the minimum frequency that contributes to the HPCP [Hz] (the difference between the min and split frequencies must not be less than 200.0 Hz)
`nonLinear`	boolean	<optional>	false	apply non-linear post-processing to the output (use with normalized='unitMax'). Boosts values close to 1, decreases values close to 0.
`normalized`	string	<optional>	unitMax	whether to normalize the HPCP vector
`referenceFrequency`	number	<optional>	440	the reference frequency for semitone index calculation, corresponding to A3 [Hz]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`size`	number	<optional>	12	the size of the output HPCP (must be a positive nonzero multiple of 12)
`weightType`	string	<optional>	squaredCosine	type of weighting function for determining frequency contribution
`windowSize`	number	<optional>	1	the size, in semitones, of the window used for the weighting

Returns

Details

HarmonicBpm( bpms [, bpm [, threshold [, tolerance ] ] ] ) → {object}

Description

This algorithm extracts bpms that are harmonically related to the tempo given by the 'bpm' parameter. The algorithm assumes a certain bpm is harmonically related to the 'bpm' parameter when the greatest common divisor of both bpms is greater than 'threshold'. The 'tolerance' parameter determines whether two bpms are considered related. For instance, 120, 122 and 236 may or may not be related, depending on how much tolerance is given. Check https://essentia.upf.edu/reference/std_HarmonicBpm.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`bpms`	VectorFloat			list of bpm candidates
`bpm`	number	<optional>	60	the bpm used to find its harmonics
`threshold`	number	<optional>	20	bpm threshold below which greatest common divisors are discarded
`tolerance`	number	<optional>	5	percentage tolerance to consider two bpms are equal or equal to a harmonic

Returns

Details

HarmonicPeaks( frequencies, magnitudes, pitch [, maxHarmonics [, tolerance ] ] ) → {object}

Description

This algorithm finds the harmonic peaks of a signal given its spectral peaks and its fundamental frequency. Note:
- The "tolerance" parameter defines the allowed fixed deviation from ideal harmonics, being a percentage over the F0. For example: if the F0 is 100Hz you may decide to allow a deviation of 20%, that is a fixed deviation of 20Hz; for the harmonic series it is: [180-220], [280-320], [380-420], etc.
- If "pitch" is zero, it means its value is unknown, or the sound is unpitched, and in that case the HarmonicPeaks algorithm returns an empty vector.
- The output frequency and magnitude vectors are of size "maxHarmonics". If a particular harmonic was not found among spectral peaks, its ideal frequency value is output together with 0 magnitude. This algorithm is intended to receive its "frequencies" and "magnitudes" inputs from the SpectralPeaks algorithm.
- When input vectors differ in size or are empty, an exception is thrown. Input vectors must be ordered by ascending frequency excluding DC components and not contain duplicates, otherwise an exception is thrown.
Check https://essentia.upf.edu/reference/std_HarmonicPeaks.html for more details.
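The tolerance example (F0 of 100Hz with a 20% tolerance) can be reproduced with a few lines of plain JavaScript. The name `harmonicWindowsSketch` is hypothetical; it only computes the search windows and does not pick peaks.

```javascript
// Hypothetical sketch: search window around each ideal harmonic, using a
// fixed deviation of tolerance * pitch (in Hz) for every harmonic.
function harmonicWindowsSketch(pitch, maxHarmonics = 20, tolerance = 0.2) {
  if (pitch === 0) return []; // unknown pitch or unpitched sound
  const deviation = tolerance * pitch;
  const windows = [];
  for (let h = 1; h <= maxHarmonics; h++) {
    const ideal = h * pitch;
    windows.push({ harmonic: h, ideal, low: ideal - deviation, high: ideal + deviation });
  }
  return windows;
}

console.log(harmonicWindowsSketch(100, 4, 0.2));
```

The window width stays constant in Hz, so it becomes relatively narrower for higher harmonics.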

Parameters

Name	Type	Attributes	Default	Description
`frequencies`	VectorFloat			the frequencies of the spectral peaks [Hz] (ascending order)
`magnitudes`	VectorFloat			the magnitudes of the spectral peaks (ascending frequency order)
`pitch`	number			an estimate of the fundamental frequency of the signal [Hz]
`maxHarmonics`	number	<optional>	20	the number of harmonics to return including F0
`tolerance`	number	<optional>	0.2	the allowed ratio deviation from ideal harmonics

Returns

Details

HighPass( signal [, cutoffFrequency [, sampleRate ] ] ) → {object}

Description

This algorithm implements a 1st order IIR high-pass filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_HighPass.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal
`cutoffFrequency`	number	<optional>	1500	the cutoff frequency for the filter [Hz]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

HighResolutionFeatures( hpcp [, maxPeaks ] ) → {object}

Description

This algorithm computes high-resolution chroma features from an HPCP vector. The vector's size must be a multiple of 12 and it is recommended that it be larger than 120. In other words, the HPCP's resolution should be 10 cents or finer. The high-resolution features being computed are listed in the reference documentation. Check https://essentia.upf.edu/reference/std_HighResolutionFeatures.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`hpcp`	VectorFloat			the HPCPs, preferably of size >= 120
`maxPeaks`	number	<optional>	24	maximum number of HPCP peaks to consider when calculating outputs

Returns

Details

Histogram( array [, maxValue [, minValue [, normalize [, numberBins ] ] ] ] ) → {object}

Description

This algorithm computes a histogram. Values outside the range are ignored. Check https://essentia.upf.edu/reference/std_Histogram.html for more details.
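As an illustration of the binning described above, the following sketch counts values into equal-width bins and skips anything outside the range. It is not Essentia's implementation: `histogramCounts` is a hypothetical name, normalization is left out, and placing a value equal to `maxValue` in the last bin is an assumption of this sketch.

```javascript
// Plain-JavaScript sketch of range-limited binning (hypothetical helper).
function histogramCounts(values, minValue = 0, maxValue = 1, numberBins = 10) {
  const counts = new Array(numberBins).fill(0);
  const binWidth = (maxValue - minValue) / numberBins;
  for (const v of values) {
    if (v < minValue || v > maxValue) continue; // outside the range: ignored
    // Assumption: a value exactly equal to maxValue is counted in the last bin.
    const bin = Math.min(Math.floor((v - minValue) / binWidth), numberBins - 1);
    counts[bin] += 1;
  }
  return counts;
}

// 1.5 and -0.2 fall outside [0, 1] and are not counted.
console.log(histogramCounts([0, 0.05, 0.55, 1, 1.5, -0.2]));
```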

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array
`maxValue`	number	<optional>	1	the max value of the histogram
`minValue`	number	<optional>	0	the min value of the histogram
`normalize`	string	<optional>	none	the normalization setting.
`numberBins`	number	<optional>	10	the number of bins

Returns

Details

HprModelAnal( frame, pitch [, fftSize [, freqDevOffset [, freqDevSlope [, harmDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, nHarmonics [, orderBy [, sampleRate [, stocf ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the harmonic plus residual model analysis. Check https://essentia.upf.edu/reference/std_HprModelAnal.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input frame
`pitch`	number			external pitch input [Hz].
`fftSize`	number	<optional>	2048	the size of the internal FFT (full spectrum size)
`freqDevOffset`	number	<optional>	20	minimum frequency deviation at 0Hz
`freqDevSlope`	number	<optional>	0.01	slope increase of minimum frequency deviation
`harmDevSlope`	number	<optional>	0.01	slope increase of minimum frequency deviation
`hopSize`	number	<optional>	512	the hop size between frames
`magnitudeThreshold`	number	<optional>	0	peaks below this given threshold are not outputted
`maxFrequency`	number	<optional>	5000	the maximum frequency of the range to evaluate [Hz]
`maxPeaks`	number	<optional>	100	the maximum number of returned peaks
`maxnSines`	number	<optional>	100	maximum number of sines per frame
`minFrequency`	number	<optional>	20	the minimum frequency of the range to evaluate [Hz]
`nHarmonics`	number	<optional>	100	maximum number of harmonics per frame
`orderBy`	string	<optional>	frequency	the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`stocf`	number	<optional>	0.2	decimation factor used for the stochastic approximation

Returns

Details

HpsModelAnal( frame, pitch [, fftSize [, freqDevOffset [, freqDevSlope [, harmDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, nHarmonics [, orderBy [, sampleRate [, stocf ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the harmonic plus stochastic model analysis. Check https://essentia.upf.edu/reference/std_HpsModelAnal.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input frame
`pitch`	number			external pitch input [Hz].
`fftSize`	number	<optional>	2048	the size of the internal FFT (full spectrum size)
`freqDevOffset`	number	<optional>	20	minimum frequency deviation at 0Hz
`freqDevSlope`	number	<optional>	0.01	slope increase of minimum frequency deviation
`harmDevSlope`	number	<optional>	0.01	slope increase of minimum frequency deviation
`hopSize`	number	<optional>	512	the hop size between frames
`magnitudeThreshold`	number	<optional>	0	peaks below this given threshold are not outputted
`maxFrequency`	number	<optional>	5000	the maximum frequency of the range to evaluate [Hz]
`maxPeaks`	number	<optional>	100	the maximum number of returned peaks
`maxnSines`	number	<optional>	100	maximum number of sines per frame
`minFrequency`	number	<optional>	20	the minimum frequency of the range to evaluate [Hz]
`nHarmonics`	number	<optional>	100	maximum number of harmonics per frame
`orderBy`	string	<optional>	frequency	the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`stocf`	number	<optional>	0.2	decimation factor used for the stochastic approximation

Returns

Details

IDCT( dct [, dctType [, inputSize [, liftering [, outputSize ] ] ] ] ) → {object}

Description

This algorithm computes the Inverse Discrete Cosine Transform of an array. It can be configured to perform the inverse DCT-II form, with the 1/sqrt(2) scaling factor for the first coefficient or the inverse DCT-III form based on the HTK implementation. Check https://essentia.upf.edu/reference/std_IDCT.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`dct`	VectorFloat			the discrete cosine transform
`dctType`	number	<optional>	2	the DCT type
`inputSize`	number	<optional>	10	the size of the input array
`liftering`	number	<optional>	0	the liftering coefficient. Use '0' to bypass it
`outputSize`	number	<optional>	10	the number of output coefficients

Returns

Details

IIR( signal [, denominator [, numerator ] ] ) → {object}

Description

This algorithm implements a standard IIR filter. It filters the data in the input vector with the filter described by parameter vectors 'numerator' and 'denominator' to create the output filtered vector. In the literature, the numerator is often referred to as the 'B' coefficients and the denominator as the 'A' coefficients. Check https://essentia.upf.edu/reference/std_IIR.html for more details.
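To make the roles of the B and A coefficients concrete, here is a minimal sketch of the standard difference equation y[n] = (sum of b[k]·x[n-k] minus sum of a[k]·y[n-k] for k >= 1) / a[0]. It runs on plain arrays with zero initial state; `iirFilter` is a hypothetical name, and this is not the code Essentia runs internally.

```javascript
// Direct-form sketch of an IIR filter on plain arrays (hypothetical helper).
function iirFilter(signal, numerator = [1], denominator = [1]) {
  if (denominator.length === 0 || denominator[0] === 0) {
    throw new Error("denominator[0] must be non-zero");
  }
  const a0 = denominator[0];
  const out = new Array(signal.length).fill(0);
  for (let n = 0; n < signal.length; n++) {
    let acc = 0;
    // Feed-forward part: B coefficients applied to current and past inputs.
    for (let k = 0; k < numerator.length && k <= n; k++) {
      acc += numerator[k] * signal[n - k];
    }
    // Feedback part: A coefficients (from index 1) applied to past outputs.
    for (let k = 1; k < denominator.length && k <= n; k++) {
      acc -= denominator[k] * out[n - k];
    }
    out[n] = acc / a0;
  }
  return out;
}

// Impulse response of a one-pole filter with a = [1, -0.5], b = [1].
console.log(iirFilter([1, 0, 0, 0], [1], [1, -0.5])); // [1, 0.5, 0.25, 0.125]
```

With the default coefficients (`[1]` and `[1]`) the filter passes the signal through unchanged.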

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`denominator`	Array.<any>	<optional>	[1]	the list of coefficients of the denominator. Often referred to as the A coefficient vector.
`numerator`	Array.<any>	<optional>	[1]	the list of coefficients of the numerator. Often referred to as the B coefficient vector.

Returns

Details

Inharmonicity( frequencies, magnitudes ) → {object}

Description

This algorithm calculates the inharmonicity of a signal given its spectral peaks. The inharmonicity value is computed as an energy weighted divergence of the spectral components from their closest multiple of the fundamental frequency. The fundamental frequency is taken as the first spectral peak from the input. The inharmonicity value ranges from 0 (purely harmonic signal) to 1 (inharmonic signal). Check https://essentia.upf.edu/reference/std_Inharmonicity.html for more details.

Parameters

Name	Type	Description
`frequencies`	VectorFloat	the frequencies of the harmonic peaks [Hz] (in ascending order)
`magnitudes`	VectorFloat	the magnitudes of the harmonic peaks (in ascending frequency order)

Returns

Details

InstantPower( array ) → {object}

Description

This algorithm computes the instant power of an array. That is, the energy of the array over its size. Check https://essentia.upf.edu/reference/std_InstantPower.html for more details.
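The definition above (energy divided by size) amounts to the mean of the squared samples. A minimal sketch on a plain array follows; `instantPower` here is a hypothetical standalone function, and rejecting empty input is an assumption made to avoid a division by zero.

```javascript
// Mean of squared samples (hypothetical standalone helper, not the essentia.js method).
function instantPower(array) {
  if (array.length === 0) throw new Error("input array must be non-empty"); // assumption
  let energy = 0;
  for (const x of array) energy += x * x; // energy = sum of squares
  return energy / array.length;
}

console.log(instantPower([3, 4])); // (9 + 16) / 2 = 12.5
```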

Parameters

Name	Type	Description
`array`	VectorFloat	the input array

Returns

Details

Intensity( signal [, sampleRate ] ) → {object}

Description

This algorithm classifies the input audio signal as either relaxed (-1), moderate (0), or aggressive (1). Check https://essentia.upf.edu/reference/std_Intensity.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal
`sampleRate`	number	<optional>	44100	the input audio sampling rate [Hz]

Returns

Details

Key( pcp [, numHarmonics [, pcpSize [, profileType [, slope [, useMajMin [, usePolyphony [, useThreeChords ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes a key estimate given a pitch class profile (HPCP). The algorithm was heavily adapted and changed from the original implementation for readability and speed. Check https://essentia.upf.edu/reference/std_Key.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`pcp`	VectorFloat			the input pitch class profile
`numHarmonics`	number	<optional>	4	number of harmonics that should contribute to the polyphonic profile (1 only considers the fundamental harmonic)
`pcpSize`	number	<optional>	36	number of array elements used to represent a semitone times 12 (this parameter is only a hint, during computation, the size of the input PCP is used instead)
`profileType`	string	<optional>	bgate	the type of polyphonic profile to use for correlation calculation
`slope`	number	<optional>	0.6	value of the slope of the exponential harmonic contribution to the polyphonic profile
`useMajMin`	boolean	<optional>	false	use a third profile called 'majmin' for ambiguous tracks [4]. Only available for the edma, bgate and braw profiles
`usePolyphony`	boolean	<optional>	true	enables the use of polyphonic profiles to define key profiles (this includes the contributions from triads as well as pitch harmonics)
`useThreeChords`	boolean	<optional>	true	consider only the 3 main triad chords of the key (T, D, SD) to build the polyphonic profiles

Returns

Details

KeyExtractor( audio [, averageDetuningCorrection [, frameSize [, hopSize [, hpcpSize [, maxFrequency [, maximumSpectralPeaks [, minFrequency [, pcpThreshold [, profileType [, sampleRate [, spectralPeaksThreshold [, tuningFrequency [, weightType [, windowType ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm extracts key/scale for an audio signal. It computes HPCP frames for the input signal and applies key estimation using the Key algorithm. Check https://essentia.upf.edu/reference/std_KeyExtractor.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`audio`	VectorFloat			the audio input signal
`averageDetuningCorrection`	boolean	<optional>	true	shifts a pcp to the nearest tempered bin
`frameSize`	number	<optional>	4096	the framesize for computing tonal features
`hopSize`	number	<optional>	4096	the hopsize for computing tonal features
`hpcpSize`	number	<optional>	12	the size of the output HPCP (must be a positive nonzero multiple of 12)
`maxFrequency`	number	<optional>	3500	max frequency to apply whitening to [Hz]
`maximumSpectralPeaks`	number	<optional>	60	the maximum number of spectral peaks
`minFrequency`	number	<optional>	25	min frequency to apply whitening to [Hz]
`pcpThreshold`	number	<optional>	0.2	pcp bins below this value are set to 0
`profileType`	string	<optional>	bgate	the type of polyphonic profile to use for correlation calculation
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`spectralPeaksThreshold`	number	<optional>	0.0001	the threshold for the spectral peaks
`tuningFrequency`	number	<optional>	440	the tuning frequency of the input signal
`weightType`	string	<optional>	cosine	type of weighting function for determining frequency contribution
`windowType`	string	<optional>	hann	the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'

Returns

Details

LPC( frame [, order [, sampleRate [, type ] ] ] ) → {object}

Description

This algorithm computes Linear Predictive Coefficients and associated reflection coefficients of a signal. Check https://essentia.upf.edu/reference/std_LPC.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input audio frame
`order`	number	<optional>	10	the order of the LPC analysis (typically [8,14])
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`type`	string	<optional>	regular	the type of LPC (regular or warped)

Returns

Details

Larm( signal [, attackTime [, power [, releaseTime [, sampleRate ] ] ] ] ) → {object}

Description

This algorithm estimates the long-term loudness of an audio signal. The LARM model is based on the asymmetrical low-pass filtering of the Peak Program Meter (PPM), combined with Revised Low-frequency B-weighting (RLB) and power mean calculations. LARM has shown to be a reliable and objective loudness estimate of music and speech. Check https://essentia.upf.edu/reference/std_Larm.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the audio input signal
`attackTime`	number	<optional>	10	the attack time of the first order lowpass in the attack phase [ms]
`power`	number	<optional>	1.5	the power used for averaging
`releaseTime`	number	<optional>	1500	the release time of the first order lowpass in the release phase [ms]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

Leq( signal ) → {object}

Description

This algorithm computes the Equivalent sound level (Leq) of an audio signal. The Leq measure can be derived from the Revised Low-frequency B-weighting (RLB) or from the raw signal as described in [1]. If the signal contains no energy, Leq defaults to Essentia's definition of silence, which is -90dB. This algorithm will throw an exception on empty input. Check https://essentia.upf.edu/reference/std_Leq.html for more details.
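For the raw-signal case, an equivalent level can be sketched as the mean power expressed in decibels, with silence mapped to -90 dB. The code below is an illustration under assumptions: `leqDb` is a hypothetical name, no RLB weighting is applied, and clamping every result at -90 dB (rather than only exact zero energy) is a choice of this sketch, not a documented detail.

```javascript
// Raw-signal equivalent level in dB (hypothetical helper, no RLB weighting).
function leqDb(signal) {
  if (signal.length === 0) throw new Error("signal must be non-empty");
  const SILENCE_DB = -90; // silence level quoted in the description above
  let energy = 0;
  for (const x of signal) energy += x * x;
  const power = energy / signal.length;
  if (power === 0) return SILENCE_DB; // no energy: report silence
  // Assumption: very quiet signals are also clamped to the silence level.
  return Math.max(SILENCE_DB, 10 * Math.log10(power));
}

console.log(leqDb([0, 0, 0])); // -90
```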

Parameters

Name	Type	Description
`signal`	VectorFloat	the input signal (must be non-empty)

Returns

Details

LevelExtractor( signal [, frameSize [, hopSize ] ] ) → {object}

Description

This algorithm extracts the loudness of an audio signal in frames using the Loudness algorithm. Check https://essentia.upf.edu/reference/std_LevelExtractor.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the audio input signal
`frameSize`	number	<optional>	88200	frame size to compute loudness
`hopSize`	number	<optional>	44100	hop size to compute loudness

Returns

Details

LogAttackTime( signal [, sampleRate [, startAttackThreshold [, stopAttackThreshold ] ] ] ) → {object}

Description

This algorithm computes the log (base 10) of the attack time of a signal envelope. The attack time is defined as the time duration from when the sound becomes perceptually audible to when it reaches its maximum intensity. By default, the start of the attack is estimated as the point where the signal envelope reaches 20% of its maximum value in order to account for possible noise presence. Also by default, the end of the attack is estimated as the point where the signal envelope has reached 90% of its maximum value, in order to account for the possibility that the max value occurs after the logAttack, as in trumpet sounds. Check https://essentia.upf.edu/reference/std_LogAttackTime.html for more details.
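The threshold logic can be sketched as follows: find the first envelope sample at or above 20% of the peak, then the first sample at or above 90% of the peak, and take the base-10 log of the elapsed time. This is a simplified illustration, not Essentia's implementation; `logAttackTime` is a hypothetical standalone function, and flooring the attack at one sample period (to avoid log of zero) is an assumption of this sketch.

```javascript
// Threshold-based log attack time on a plain array (hypothetical helper).
function logAttackTime(envelope, sampleRate = 44100,
                       startAttackThreshold = 0.2, stopAttackThreshold = 0.9) {
  if (envelope.length === 0) throw new Error("envelope must be non-empty");
  let peak = envelope[0];
  for (const v of envelope) if (v > peak) peak = v;

  // First index at or after `from` whose value reaches `level`.
  const firstAtOrAbove = (level, from) => {
    for (let i = from; i < envelope.length; i++) {
      if (envelope[i] >= level) return i;
    }
    return envelope.length - 1;
  };

  const start = firstAtOrAbove(startAttackThreshold * peak, 0);
  const stop = firstAtOrAbove(stopAttackThreshold * peak, start);
  // Assumption: floor the attack at one sample period so the log stays finite.
  const attackSeconds = Math.max((stop - start) / sampleRate, 1 / sampleRate);
  return Math.log10(attackSeconds);
}

// Linear ramp sampled at 100 Hz: the 20% and 90% points are 0.7 s apart.
const ramp = Array.from({ length: 101 }, (_, i) => i / 100);
console.log(logAttackTime(ramp, 100));
```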

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal envelope (must be non-empty)
`sampleRate`	number	<optional>	44100	the audio sampling rate [Hz]
`startAttackThreshold`	number	<optional>	0.2	the percentage of the input signal envelope at which the starting point of the attack is considered
`stopAttackThreshold`	number	<optional>	0.9	the percentage of the input signal envelope at which the ending point of the attack is considered

Returns

Details

LogSpectrum( spectrum [, binsPerSemitone [, frameSize [, rollOn [, sampleRate ] ] ] ] ) → {object}

Description

This algorithm computes a spectrum with logarithmically distributed frequency bins. This code is ported from NNLS Chroma [1, 2]. This algorithm also returns a local tuning that is retrieved for the input frame and a global tuning that is updated with a moving average. Check https://essentia.upf.edu/reference/std_LogSpectrum.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			spectrum frame
`binsPerSemitone`	number	<optional>	3	bins per semitone
`frameSize`	number	<optional>	1025	the input frame size of the spectrum vector
`rollOn`	number	<optional>	0	this removes low-frequency noise - useful in quiet recordings
`sampleRate`	number	<optional>	44100	the input sample rate

Returns

Details

LoopBpmConfidence( signal, bpmEstimate [, sampleRate ] ) → {object}

Description

This algorithm takes an audio signal and a BPM estimate for that signal and predicts the reliability of the BPM estimate in a value from 0 to 1. The audio signal is assumed to be a musical loop with constant tempo. The confidence returned is based on comparing the duration of the signal with multiples of the BPM estimate (see [1] for more details). Check https://essentia.upf.edu/reference/std_LoopBpmConfidence.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			loop audio signal
`bpmEstimate`	number			estimated BPM for the audio signal
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

LoopBpmEstimator( signal [, confidenceThreshold ] ) → {object}

Description

This algorithm estimates the BPM of audio loops. It internally uses the PercivalBpmEstimator algorithm to produce a BPM estimate and LoopBpmConfidence to assess the reliability of the estimate. If the confidence of the estimate is below the given confidenceThreshold, the algorithm outputs a BPM of 0.0; otherwise it outputs the estimated BPM. For more details on the BPM estimation method and the confidence measure please check the used algorithms. Check https://essentia.upf.edu/reference/std_LoopBpmEstimator.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`confidenceThreshold`	number	<optional>	0.95	confidence threshold below which bpm estimate will be considered unreliable

Returns

Details

Loudness( signal ) → {object}

Description

This algorithm computes the loudness of an audio signal defined by Stevens' power law. It computes loudness as the energy of the signal raised to the power of 0.67. Check https://essentia.upf.edu/reference/std_Loudness.html for more details.
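The formula in the description is short enough to write out directly. A minimal sketch, assuming "energy" means the sum of squared samples; `stevensLoudness` is a hypothetical name, not the essentia.js method.

```javascript
// Energy raised to the power 0.67 (hypothetical standalone helper).
function stevensLoudness(signal) {
  let energy = 0;
  for (const x of signal) energy += x * x; // assumption: energy = sum of squares
  return Math.pow(energy, 0.67);
}

console.log(stevensLoudness([0.5, -0.5, 0.5, -0.5])); // energy is 1, so loudness is 1
```

Since the exponent is below 1, doubling the energy raises the loudness value by less than a factor of two.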

Parameters

Name	Type	Description
`signal`	VectorFloat	the input signal

Returns

Details

LoudnessVickers( signal [, sampleRate ] ) → {object}

Description

This algorithm computes Vickers's loudness of an audio signal. Currently, this algorithm only works for signals with a 44100Hz sampling rate. This algorithm is meant to be given frames of audio as input (not entire audio signals). The algorithm described in the paper performs a weighted average of the loudness value computed for each of the given frames, this step is left as a post processing step and is not performed by this algorithm. Check https://essentia.upf.edu/reference/std_LoudnessVickers.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`sampleRate`	number	<optional>	44100	the audio sampling rate of the input signal which is used to create the weight vector [Hz] (currently, this algorithm only works on signals with a sampling rate of 44100Hz)

Returns

Details

LowLevelSpectralEqloudExtractor( signal [, frameSize [, hopSize [, sampleRate ] ] ] ) → {object}

Description

This algorithm extracts a set of low-level spectral features for which it is recommended to apply a preliminary equal-loudness filter over the input audio signal (according to internal evaluations conducted at the Music Technology Group). To this end, you are expected to provide the output of the EqualLoudness algorithm as the input for this algorithm. Still, you are free to provide an unprocessed audio input if you want to compute these features without the equal-loudness filter. Check https://essentia.upf.edu/reference/std_LowLevelSpectralEqloudExtractor.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal
`frameSize`	number	<optional>	2048	the frame size for computing low level features
`hopSize`	number	<optional>	1024	the hop size for computing low level features
`sampleRate`	number	<optional>	44100	the audio sampling rate

Returns

Details

LowLevelSpectralExtractor( signal [, frameSize [, hopSize [, sampleRate ] ] ] ) → {object}

Description

This algorithm extracts all low-level spectral features, which do not require an equal-loudness filter for their computation, from an audio signal. Check https://essentia.upf.edu/reference/std_LowLevelSpectralExtractor.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the audio input signal
`frameSize`	number	<optional>	2048	the frame size for computing low level features
`hopSize`	number	<optional>	1024	the hop size for computing low level features
`sampleRate`	number	<optional>	44100	the audio sampling rate

Returns

object

{
  barkbands: 'spectral energy at each bark band. See BarkBands algorithm',
  barkbands_kurtosis: 'kurtosis from bark bands. See DistributionShape algorithm documentation',
  barkbands_skewness: 'skewness from bark bands. See DistributionShape algorithm documentation',
  barkbands_spread: 'spread from barkbands. See DistributionShape algorithm documentation',
  hfc: 'See HFC algorithm documentation',
  mfcc: 'See MFCC algorithm documentation',
  pitch: 'See PitchYinFFT algorithm documentation',
  pitch_instantaneous_confidence: 'See PitchYinFFT algorithm documentation',
  pitch_salience: 'See PitchSalience algorithm documentation',
  silence_rate_20dB: 'See SilenceRate algorithm documentation',
  silence_rate_30dB: 'See SilenceRate algorithm documentation',
  silence_rate_60dB: 'See SilenceRate algorithm documentation',
  spectral_complexity: 'See SpectralComplexity algorithm documentation',
  spectral_crest: 'See Crest algorithm documentation',
  spectral_decrease: 'See Decrease algorithm documentation',
  spectral_energy: 'See Energy algorithm documentation',
  spectral_energyband_low: 'Energy in band (20,150] Hz. See EnergyBand algorithm documentation',
  spectral_energyband_middle_low: 'Energy in band (150,800] Hz. See EnergyBand algorithm documentation',
  spectral_energyband_middle_high: 'Energy in band (800,4000] Hz. See EnergyBand algorithm documentation',
  spectral_energyband_high: 'Energy in band (4000,20000] Hz. See EnergyBand algorithm documentation',
  spectral_flatness_db: 'See FlatnessDB algorithm documentation',
  spectral_flux: 'See Flux algorithm documentation',
  spectral_rms: 'See RMS algorithm documentation',
  spectral_rolloff: 'See RollOff algorithm documentation',
  spectral_strongpeak: 'See StrongPeak algorithm documentation',
  zerocrossingrate: 'See ZeroCrossingRate algorithm documentation',
  inharmonicity: 'See Inharmonicity algorithm documentation',
  tristimulus: 'See Tristimulus algorithm documentation',
  oddtoevenharmonicenergyratio: 'See OddToEvenHarmonicEnergyRatio algorithm documentation'
}

Details

LowPass( signal [, cutoffFrequency [, sampleRate ] ] ) → {object}

Description

This algorithm implements a 1st order IIR low-pass filter. Because of its dependence on IIR, IIR's requirements are inherited. References: [1] U. Zölzer, DAFX - Digital Audio Effects, p. 40, John Wiley & Sons, 2002 Check https://essentia.upf.edu/reference/std_LowPass.html for more details.
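As a rough illustration of what a first-order IIR low-pass looks like, the sketch below uses the allpass-based textbook design found in DSP references such as the cited DAFX book: c = (tan(pi·fc/fs) - 1) / (tan(pi·fc/fs) + 1), numerator [(1+c)/2, (1+c)/2], denominator [1, c]. Whether Essentia computes exactly these coefficients is an assumption; `lowPassCoefficients` and `applyFirstOrder` are hypothetical helper names.

```javascript
// Textbook first-order low-pass coefficients (hypothetical helper).
// Assumption: Essentia's LowPass may derive its coefficients differently.
function lowPassCoefficients(cutoffFrequency = 1500, sampleRate = 44100) {
  const t = Math.tan((Math.PI * cutoffFrequency) / sampleRate);
  const c = (t - 1) / (t + 1);
  return { numerator: [(1 + c) / 2, (1 + c) / 2], denominator: [1, c] };
}

// Applies y[n] = b0*x[n] + b1*x[n-1] - a1*y[n-1] with zero initial state.
function applyFirstOrder(signal, coefficients) {
  const [b0, b1] = coefficients.numerator;
  const a1 = coefficients.denominator[1];
  const out = new Array(signal.length);
  let prevIn = 0;
  let prevOut = 0;
  for (let n = 0; n < signal.length; n++) {
    const y = b0 * signal[n] + b1 * prevIn - a1 * prevOut;
    out[n] = y;
    prevIn = signal[n];
    prevOut = y;
  }
  return out;
}

// A constant (DC) input passes with unit gain once the filter has settled.
const settled = applyFirstOrder(new Array(2000).fill(1), lowPassCoefficients());
console.log(settled[settled.length - 1]);
```

The resulting `numerator` and `denominator` arrays have the same shape as the coefficient vectors that the IIR entry above describes, which is why IIR's requirements carry over.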

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal
`cutoffFrequency`	number	<optional>	1500	the cutoff frequency for the filter [Hz]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

MFCC( spectrum [, dctType [, highFrequencyBound [, inputSize [, liftering [, logType [, lowFrequencyBound [, normalize [, numberBands [, numberCoefficients [, sampleRate [, silenceThreshold [, type [, warpingFormula [, weighting ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the mel-frequency cepstrum coefficients of a spectrum. As there is no standard implementation, the MFCC-FB40 is used by default:
- filterbank of 40 bands from 0 to 11000 Hz
- take the log value of the spectrum energy in each mel band. Band energy values below the silence threshold are clipped to that threshold before computing log-energies
- DCT of the 40 bands down to 13 mel coefficients
There is a paper describing various MFCC implementations [1]. Check https://essentia.upf.edu/reference/std_MFCC.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the audio spectrum
`dctType`	number	<optional>	2	the DCT type
`highFrequencyBound`	number	<optional>	11000	the upper bound of the frequency range [Hz]
`inputSize`	number	<optional>	1025	the size of input spectrum
`liftering`	number	<optional>	0	the liftering coefficient. Use '0' to bypass it
`logType`	string	<optional>	dbamp	logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes
`lowFrequencyBound`	number	<optional>	0	the lower bound of the frequency range [Hz]
`normalize`	string	<optional>	unit_sum	spectrum bin weights to use for each mel band: 'unit_max' to make each mel band vertex equal to 1, 'unit_sum' to make each mel band area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle mel band area equal to 1 normalizing the weights of each triangle by its bandwidth
`numberBands`	number	<optional>	40	the number of mel-bands in the filter
`numberCoefficients`	number	<optional>	13	the number of output mel coefficients
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`silenceThreshold`	number	<optional>	1e-10	silence threshold for computing log-energy bands
`type`	string	<optional>	power	use magnitude or power spectrum
`warpingFormula`	string	<optional>	htkMel	The scale implementation type: 'htkMel' scale from the HTK toolkit [2, 3] (default) or 'slaneyMel' scale from the Auditory toolbox [4]
`weighting`	string	<optional>	warping	type of weighting function for determining triangle area

Returns

Details

MaxFilter( signal [, causal [, width ] ] ) → {object}

Description

This algorithm implements a maximum filter for 1D signals using the van Herk/Gil-Werman (HGW) algorithm. Check https://essentia.upf.edu/reference/std_MaxFilter.html for more details.
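The effect of a maximum filter, and the difference between the causal and centered modes, can be seen with a naive sliding-window version. Note that this sketch deliberately does not use the HGW algorithm named above (it rescans each window), and clipping the window at the array boundaries is an assumption; `maxFilterNaive` is a hypothetical name.

```javascript
// Naive sliding-window maximum (hypothetical helper, O(n * width), not HGW).
function maxFilterNaive(signal, width = 3, causal = true) {
  const half = Math.floor(width / 2);
  const out = new Array(signal.length);
  for (let n = 0; n < signal.length; n++) {
    // Causal: window ends at the current sample. Centered: window surrounds it.
    const first = causal ? n - width + 1 : n - half;
    const last = causal ? n : n + half;
    let best = -Infinity;
    // Assumption: the window is clipped to the valid index range at the edges.
    for (let i = Math.max(first, 0); i <= Math.min(last, signal.length - 1); i++) {
      if (signal[i] > best) best = signal[i];
    }
    out[n] = best;
  }
  return out;
}

console.log(maxFilterNaive([1, 3, 2, 0, 0, 5], 3, true));  // [1, 3, 3, 3, 2, 5]
console.log(maxFilterNaive([1, 3, 2, 0, 0, 5], 3, false)); // [3, 3, 3, 2, 5, 5]
```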

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			signal to be filtered
`causal`	boolean	<optional>	true	use causal filter (window is behind the current element; otherwise it is centered around it)
`width`	number	<optional>	3	the window size, has to be odd if the window is centered

Returns

Details

MaxMagFreq( spectrum [, sampleRate ] ) → {object}

Description

This algorithm computes the frequency with the largest magnitude in a spectrum. Note that a spectrum must contain at least two elements; otherwise an exception is thrown. Check https://essentia.upf.edu/reference/std_MaxMagFreq.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input spectrum (must have more than 1 element)
`sampleRate`	number	<optional>	44100	the audio sampling rate [Hz]

Returns

Details

MaxToTotal( envelope ) → {object}

Description

This algorithm computes the ratio between the index of the maximum value of the envelope of a signal and the total length of the envelope. This ratio shows how much the maximum amplitude is off-center. Its value is close to 0 if the maximum is close to the beginning (e.g. Decrescendo or Impulsive sounds), close to 0.5 if it is close to the middle (e.g. Delta sounds) and close to 1 if it is close to the end of the sound (e.g. Crescendo sounds). This algorithm is intended to be fed by the output of the Envelope algorithm. Check https://essentia.upf.edu/reference/std_MaxToTotal.html for more details.
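The ratio described above can be sketched in a few lines. This follows the wording of the description literally (index of the maximum divided by the envelope length); the exact normalization used by Essentia is an assumption, and `maxToTotal` is a hypothetical standalone function.

```javascript
// Position of the envelope maximum as a fraction of its length (hypothetical helper).
function maxToTotal(envelope) {
  if (envelope.length === 0) throw new Error("envelope must be non-empty");
  let maxIndex = 0;
  for (let i = 1; i < envelope.length; i++) {
    if (envelope[i] > envelope[maxIndex]) maxIndex = i; // keeps the first maximum
  }
  return maxIndex / envelope.length;
}

console.log(maxToTotal([9, 4, 2, 1])); // 0    (impulsive: peak at the start)
console.log(maxToTotal([0, 1, 2, 9])); // 0.75 (crescendo: peak near the end)
```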

Parameters

Name	Type	Description
`envelope`	VectorFloat	the envelope of the signal

Returns

Details

Mean( array ) → {object}

Description

This algorithm computes the mean of an array. Check https://essentia.upf.edu/reference/std_Mean.html for more details.

Parameters

Name	Type	Description
`array`	VectorFloat	the input array

Returns

Details

Median( array ) → {object}

Description

This algorithm computes the median of an array. When there is an odd number of numbers, the median is simply the middle number. For example, the median of 2, 4, and 7 is 4. When there is an even number of numbers, the median is the mean of the two middle numbers. Thus, the median of the numbers 2, 4, 7, 12 is (4+7)/2 = 5.5. See [1] for more info. Check https://essentia.upf.edu/reference/std_Median.html for more details.
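The two cases in the description (odd and even length) look like this in plain JavaScript. `median` is a hypothetical standalone function operating on ordinary arrays, not the essentia.js method, which takes a VectorFloat.

```javascript
// Median of a non-empty numeric array (hypothetical standalone helper).
function median(array) {
  if (array.length === 0) throw new Error("input array must be non-empty");
  const sorted = Array.from(array).sort((a, b) => a - b); // numeric sort on a copy
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]                          // odd length: the middle element
    : (sorted[mid - 1] + sorted[mid]) / 2; // even length: mean of the two middle elements
}

console.log(median([2, 4, 7]));     // 4
console.log(median([2, 4, 7, 12])); // 5.5
```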

Parameters

Name	Type	Description
`array`	VectorFloat	the input array (must be non-empty)

Returns

Details

MedianFilter( array [, kernelSize ] ) → {object}

Description

This algorithm computes the median filtered version of the input signal giving the kernel size as detailed in [1]. Check https://essentia.upf.edu/reference/std_MedianFilter.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array (must be non-empty)
`kernelSize`	number	<optional>	11	scalar giving the size of the median filter window. Must be odd

Returns

Details

MelBands( spectrum [, highFrequencyBound [, inputSize [, log [, lowFrequencyBound [, normalize [, numberBands [, sampleRate [, type [, warpingFormula [, weighting ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes energy in mel bands of a spectrum. It applies a frequency-domain filterbank (MFCC FB-40, [1]), which consists of equal area triangular filters spaced according to the mel scale. The filterbank is normalized in such a way that the sum of coefficients for every filter equals one. It is recommended that the input "spectrum" be calculated by the Spectrum algorithm. Check https://essentia.upf.edu/reference/std_MelBands.html for more details.
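To show what "spaced according to the mel scale" means, the sketch below uses the widely published HTK-style conversion mel = 2595 · log10(1 + f / 700) and places band edges at equal mel intervals. It is an illustration only: the helper names are hypothetical, and using `numberBands + 2` edge points (so that each triangle spans three consecutive edges) is the common construction, assumed here rather than taken from Essentia's source.

```javascript
// HTK-style mel conversions (widely published formula).
function hzToMel(hz) {
  return 2595 * Math.log10(1 + hz / 700);
}
function melToHz(mel) {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

// Band edges equally spaced in mel (hypothetical helper; layout is an assumption).
function melBandEdges(numberBands = 24, lowFrequencyBound = 0, highFrequencyBound = 22050) {
  const lowMel = hzToMel(lowFrequencyBound);
  const highMel = hzToMel(highFrequencyBound);
  const edges = [];
  for (let i = 0; i <= numberBands + 1; i++) {
    const mel = lowMel + (i * (highMel - lowMel)) / (numberBands + 1);
    edges.push(melToHz(mel));
  }
  return edges; // triangle k would span edges[k] .. edges[k + 2]
}

// Edges are dense at low frequencies and spread out towards the upper bound.
console.log(melBandEdges(24).map((f) => Math.round(f)));
```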

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the audio spectrum
`highFrequencyBound`	number	<optional>	22050	an upper-bound limit for the frequencies to be included in the bands
`inputSize`	number	<optional>	1025	the size of the spectrum
`log`	boolean	<optional>	false	compute log-energies (log10 (1 + energy))
`lowFrequencyBound`	number	<optional>	0	a lower-bound limit for the frequencies to be included in the bands
`normalize`	string	<optional>	unit_sum	spectrum bin weights to use for each mel band: 'unit_max' to make each mel band vertex equal to 1, 'unit_sum' to make each mel band area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle mel band area equal to 1 normalizing the weights of each triangle by its bandwidth
`numberBands`	number	<optional>	24	the number of output bands
`sampleRate`	number	<optional>	44100	the sample rate
`type`	string	<optional>	power	'power' to output squared units, 'magnitude' to keep it as the input
`warpingFormula`	string	<optional>	htkMel	The scale implementation type: 'htkMel' scale from the HTK toolkit [2, 3] (default) or 'slaneyMel' scale from the Auditory toolbox [4]
`weighting`	string	<optional>	warping	type of weighting function for determining triangle area

Returns

Details

Meter( beatogram ) → {object}

Description

This algorithm estimates the time signature of a given beatogram by finding the highest correlation between beats. Check https://essentia.upf.edu/reference/std_Meter.html for more details.

Parameters

Name	Type	Description
`beatogram`	VectorVectorFloat	filtered matrix loudness

Returns

Details

MinMax( array [, type ] ) → {object}

Description

This algorithm calculates the minimum or maximum value of an array. If the array has more than one minimum or maximum value, the index of the first one is returned. Check https://essentia.upf.edu/reference/std_MinMax.html for more details.
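The "first occurrence wins" rule can be illustrated with a small standalone function. `minMax` below is a hypothetical helper on plain arrays, and the `value` and `index` field names of its result are illustrative, not the documented output keys of the essentia.js method.

```javascript
// First minimum or maximum of an array (hypothetical helper; field names are illustrative).
function minMax(array, type = "min") {
  if (array.length === 0) throw new Error("input array must be non-empty");
  let index = 0;
  for (let i = 1; i < array.length; i++) {
    // Strict comparison keeps the first occurrence when values tie.
    const better = type === "max" ? array[i] > array[index] : array[i] < array[index];
    if (better) index = i;
  }
  return { value: array[index], index };
}

console.log(minMax([3, 1, 4, 1, 5, 5], "min")); // { value: 1, index: 1 }
console.log(minMax([3, 1, 4, 1, 5, 5], "max")); // { value: 5, index: 4 }
```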

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array
`type`	string	<optional>	min	the type of the operation

Returns

Details

MinToTotal( envelope ) → {object}

Description

This algorithm computes the ratio between the index of the minimum value of the envelope of a signal and the total length of the envelope. Check https://essentia.upf.edu/reference/std_MinToTotal.html for more details.

Parameters

Name	Type	Description
`envelope`	VectorFloat	the envelope of the signal

Returns

Details
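
Following the description above, the ratio can be sketched as the index of the minimum divided by the envelope length. The function name `minToTotalSketch` is hypothetical, and tie-breaking (first minimum wins) is an assumption.

```javascript
// Hypothetical sketch (not essentia.js API): index of the minimum value
// of the envelope divided by the envelope length.
function minToTotalSketch(envelope) {
  if (envelope.length === 0) {
    throw new Error("minToTotalSketch: envelope is empty");
  }
  let minIndex = 0;
  for (let i = 1; i < envelope.length; i++) {
    if (envelope[i] < envelope[minIndex]) minIndex = i;
  }
  return minIndex / envelope.length;
}
```

A value near 0 means the minimum occurs early in the envelope; a value near 1 means it occurs late.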

MovingAverage( signal [, size ] ) → {object}

Description

This algorithm implements a FIR Moving Average filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_MovingAverage.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal
`size`	number	<optional>	6	the size of the window [audio samples]

Returns

Details
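
A minimal sketch of a causal FIR moving average, y[n] = (x[n] + x[n-1] + ... + x[n-size+1]) / size, is shown below. It assumes samples before the start of the signal are zero; the function name `movingAverageSketch` is hypothetical, and the library's handling of the initial samples may differ.

```javascript
// Hypothetical sketch (not essentia.js API): causal FIR moving average
// with zero initial state, computed with a running sum.
function movingAverageSketch(signal, size = 6) {
  const out = new Float32Array(signal.length);
  let runningSum = 0;
  for (let n = 0; n < signal.length; n++) {
    runningSum += signal[n];
    // Drop the sample that has just left the window.
    if (n >= size) runningSum -= signal[n - size];
    out[n] = runningSum / size;
  }
  return out;
}
```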

MultiPitchKlapuri( signal [, binResolution [, frameSize [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minFrequency [, numberHarmonics [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates multiple pitch values corresponding to the melodic lines present in a polyphonic music signal (for example, string quartet, piano). This implementation is based on the algorithm in [1]: In each frame, a set of possible fundamental frequency candidates is extracted based on the principle of harmonic summation. In an optimization stage, the number of harmonic sources (polyphony) is estimated and the final set of fundamental frequencies determined. In contrast to the pitch salience function proposed in [2], this implementation uses the pitch salience function described in [1]. The output is a vector for each frame containing the estimated melody pitch values. Check https://essentia.upf.edu/reference/std_MultiPitchKlapuri.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`binResolution`	number	<optional>	10	salience function bin resolution [cents]
`frameSize`	number	<optional>	2048	the frame size for computing pitch salience
`harmonicWeight`	number	<optional>	0.8	harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
`hopSize`	number	<optional>	128	the hop size with which the pitch salience function was computed
`magnitudeCompression`	number	<optional>	1	magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)
`magnitudeThreshold`	number	<optional>	40	spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
`maxFrequency`	number	<optional>	1760	the maximum allowed frequency for salience function peaks (ignore peaks above) [Hz]
`minFrequency`	number	<optional>	80	the minimum allowed frequency for salience function peaks (ignore peaks below) [Hz]
`numberHarmonics`	number	<optional>	10	number of considered harmonics
`referenceFrequency`	number	<optional>	55	the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details
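
To make the roles of `referenceFrequency` and `binResolution` concrete, the sketch below maps a frequency in Hz to a cent bin, using the standard relation cents = 1200 * log2(f / referenceFrequency). The function name `hertzToCentBin` is hypothetical and rounding to the nearest bin is an assumption; the library may round differently.

```javascript
// Hypothetical sketch (not essentia.js API): Hz -> cent bin index.
function hertzToCentBin(frequency, referenceFrequency = 55, binResolution = 10) {
  const cents = 1200 * Math.log2(frequency / referenceFrequency);
  // Rounding to the nearest bin is an assumption.
  return Math.round(cents / binResolution);
}
```

With the defaults, 55 Hz falls in bin 0 and each octave spans 120 bins.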

MultiPitchMelodia( signal [, binResolution [, filterIterations [, frameSize [, guessUnvoiced [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minDuration [, minFrequency [, numberHarmonics [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, referenceFrequency [, sampleRate [, timeContinuity ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates multiple fundamental frequency contours from an audio signal. It is a multi pitch version of the MELODIA algorithm described in [1]. While the algorithm is originally designed to extract melody in polyphonic music, this implementation is adapted for multiple sources. The approach is based on the creation and characterization of pitch contours, time continuous sequences of pitch candidates grouped using auditory streaming cues. To this end, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours, and PitchContoursMonoMelody algorithms are employed. It is strongly advised to use the default parameter values which are optimized according to [1] (where further details are provided) except for minFrequency, maxFrequency, and voicingTolerance, which will depend on your application. Check https://essentia.upf.edu/reference/std_MultiPitchMelodia.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`binResolution`	number	<optional>	10	salience function bin resolution [cents]
`filterIterations`	number	<optional>	3	number of iterations for the octave errors / pitch outlier filtering process
`frameSize`	number	<optional>	2048	the frame size for computing pitch salience
`guessUnvoiced`	boolean	<optional>	false	estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
`harmonicWeight`	number	<optional>	0.8	harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
`hopSize`	number	<optional>	128	the hop size with which the pitch salience function was computed
`magnitudeCompression`	number	<optional>	1	magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)
`magnitudeThreshold`	number	<optional>	40	spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
`maxFrequency`	number	<optional>	20000	the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
`minDuration`	number	<optional>	100	the minimum allowed contour duration [ms]
`minFrequency`	number	<optional>	40	the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
`numberHarmonics`	number	<optional>	20	number of considered harmonics
`peakDistributionThreshold`	number	<optional>	0.9	allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)
`peakFrameThreshold`	number	<optional>	0.9	per-frame salience threshold factor (fraction of the highest peak salience in a frame)
`pitchContinuity`	number	<optional>	27.5625	pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]
`referenceFrequency`	number	<optional>	55	the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`timeContinuity`	number	<optional>	100	time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]

Returns

Details

Multiplexer( [ numberRealInputs [, numberVectorRealInputs ] ] ) → {object}

Description

This algorithm returns a single vector from a given number of real values and/or frames. Frames from different inputs are multiplexed onto a single stream in an alternating fashion. Check https://essentia.upf.edu/reference/std_Multiplexer.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`numberRealInputs`	number	<optional>	0	the number of inputs of type Real to multiplex
`numberVectorRealInputs`	number	<optional>	0	the number of inputs of type vector to multiplex

Returns

Details

NNLSChroma( logSpectrogram, meanTuning, localTuning [, chromaNormalization [, frameSize [, sampleRate [, spectralShape [, spectralWhitening [, tuningMode [, useNNLS ] ] ] ] ] ] ] ) → {object}

Description

This algorithm extracts treble and bass chromagrams from a sequence of log-frequency spectrum frames. On this representation, two processing steps are performed: (1) tuning, after which each centre bin (i.e. bin 2, 5, 8, ...) corresponds to a semitone, even if the tuning of the piece deviates from the 440 Hz standard pitch; (2) running standardisation, i.e. subtraction of the running mean and division by the running standard deviation, which has a spectral whitening effect. This code is ported from NNLS Chroma [1, 2]. To achieve similar results follow this processing chain: frame slicing with sample rate = 44100, frame size = 16384, hop size = 2048 -> Windowing with Hann and no normalization -> Spectrum -> LogSpectrum. Check https://essentia.upf.edu/reference/std_NNLSChroma.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`logSpectrogram`	VectorVectorFloat			log spectrum frames
`meanTuning`	VectorFloat			mean tuning frames
`localTuning`	VectorFloat			local tuning frames
`chromaNormalization`	string	<optional>	none	determines whether or how the chromagrams are normalised
`frameSize`	number	<optional>	1025	the input frame size of the spectrum vector
`sampleRate`	number	<optional>	44100	the input sample rate
`spectralShape`	number	<optional>	0.7	the shape of the notes in the NNLS dictionary
`spectralWhitening`	number	<optional>	1	determines how much the log-frequency spectrum is whitened
`tuningMode`	string	<optional>	global	local uses a local average for tuning, global uses all audio frames. Local tuning is only advisable when the tuning is likely to change over the audio
`useNNLS`	boolean	<optional>	true	toggle between NNLS approximate transcription and linear spectral mapping

Returns

Details

NoiseAdder( signal [, fixSeed [, level ] ] ) → {object}

Description

This algorithm adds noise to an input signal. The average energy of the noise in dB is defined by the level parameter, and is generated using the Mersenne Twister random number generator. Check https://essentia.upf.edu/reference/std_NoiseAdder.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`fixSeed`	boolean	<optional>	false	if true, 0 is used as the seed for generating random values
`level`	number	<optional>	-100	power level of the noise generator [dB]

Returns

Details

NoiseBurstDetector( frame [, alpha [, silenceThreshold [, threshold ] ] ] ) → {object}

Description

This algorithm detects noise bursts in the waveform by thresholding the peaks of the second derivative. The threshold is computed using an Exponential Moving Average filter over the RMS of the second derivative of the input frame. Check https://essentia.upf.edu/reference/std_NoiseBurstDetector.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input frame (must be non-empty)
`alpha`	number	<optional>	0.9	alpha coefficient for the Exponential Moving Average threshold estimation.
`silenceThreshold`	number	<optional>	-50	threshold to skip silent frames
`threshold`	number	<optional>	8	factor to control the dynamic threshold

Returns

Details

NoveltyCurve( frequencyBands [, frameRate [, normalize [, weightCurve [, weightCurveType ] ] ] ] ) → {object}

Description

This algorithm computes the "novelty curve" (Grosche & Müller, 2009) onset detection function. The algorithm expects as an input a frame-wise sequence of frequency-bands energies or spectrum magnitudes as originally proposed in [1] (see FrequencyBands and Spectrum algorithms). Novelty in each band (or frequency bin) is computed as a derivative between log-compressed energy (magnitude) values in consecutive frames. The overall novelty value is then computed as a weighted sum that can be configured using the 'weightCurve' parameter. The resulting novelty curve can be used for beat tracking and onset detection (see BpmHistogram and Onsets). Check https://essentia.upf.edu/reference/std_NoveltyCurve.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frequencyBands`	VectorVectorFloat			the frequency bands
`frameRate`	number	<optional>	344.531	the sampling rate of the input audio
`normalize`	boolean	<optional>	false	whether to normalize each band's energy
`weightCurve`	Array.<any>	<optional>	[]	vector containing the weights for each frequency band. Only if weightCurveType==supplied
`weightCurveType`	string	<optional>	hybrid	the type of weighting to be used for the bands novelty

Returns

Details
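
The per-band derivative and weighted sum described above can be sketched as follows. Several details here are assumptions rather than the library's behavior: the log(1 + x) compression, the half-wave rectification that keeps only energy increases, and flat weights when none are supplied. The function name `noveltyCurveSketch` is hypothetical.

```javascript
// Hypothetical sketch (not essentia.js API): novelty as a weighted sum of
// positive differences of log-compressed band energies between
// consecutive frames. frequencyBands is an array of frames, each an
// array of band energies.
function noveltyCurveSketch(frequencyBands, weights) {
  const novelty = [];
  for (let t = 1; t < frequencyBands.length; t++) {
    let sum = 0;
    for (let b = 0; b < frequencyBands[t].length; b++) {
      const previous = Math.log1p(frequencyBands[t - 1][b]);
      const current = Math.log1p(frequencyBands[t][b]);
      const weight = weights ? weights[b] : 1; // flat weighting by default
      sum += weight * Math.max(0, current - previous);
    }
    novelty.push(sum);
  }
  return novelty;
}
```

The result has one value fewer than the number of input frames, since each value compares a frame with its predecessor.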

NoveltyCurveFixedBpmEstimator( novelty [, hopSize [, maxBpm [, minBpm [, sampleRate [, tolerance ] ] ] ] ] ) → {object}

Description

This algorithm outputs a histogram of the most probable bpms assuming the signal has constant tempo given the novelty curve. This algorithm is based on the autocorrelation of the novelty curve (see NoveltyCurve algorithm) and should only be used for signals that have a constant tempo or as a first tempo estimator to be used in conjunction with other algorithms such as BpmHistogram. It is a simplified version of the algorithm described in [1] as, in order to predict the best BPM candidate, it computes autocorrelation of the entire novelty curve instead of analyzing it on frames and histogramming the peaks over frames. Check https://essentia.upf.edu/reference/std_NoveltyCurveFixedBpmEstimator.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`novelty`	VectorFloat			the novelty curve of the audio signal
`hopSize`	number	<optional>	512	the hopSize used to compute the novelty curve from the original signal
`maxBpm`	number	<optional>	560	the maximum bpm to look for
`minBpm`	number	<optional>	30	the minimum bpm to look for
`sampleRate`	number	<optional>	44100	the sampling rate of the original audio signal [Hz]
`tolerance`	number	<optional>	3	tolerance (in percentage) for considering bpms to be equal

Returns

Details
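
Since the estimator works on the autocorrelation of the novelty curve, `hopSize` and `sampleRate` are what relate an autocorrelation lag (in novelty frames) to a tempo. The sketch below shows that arithmetic only; the function names `lagToBpm` and `bpmToLag` are hypothetical and are not part of the essentia.js API.

```javascript
// Hypothetical helpers (not essentia.js API): convert between an
// autocorrelation lag, measured in novelty-curve frames, and BPM.
function lagToBpm(lag, sampleRate = 44100, hopSize = 512) {
  const frameRate = sampleRate / hopSize; // novelty frames per second
  return (60 * frameRate) / lag;
}

function bpmToLag(bpm, sampleRate = 44100, hopSize = 512) {
  const frameRate = sampleRate / hopSize;
  return (60 * frameRate) / bpm;
}
```

With the defaults, the `minBpm`..`maxBpm` range of 30 to 560 corresponds to lags between roughly 172 and 9 frames.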

OddToEvenHarmonicEnergyRatio( frequencies, magnitudes ) → {object}

Description

This algorithm computes the ratio between a signal's odd and even harmonic energy given the signal's harmonic peaks. The odd to even harmonic energy ratio is a measure allowing to distinguish odd-harmonic-energy predominant sounds (such as from a clarinet) from equally important even-harmonic-energy sounds (such as from a trumpet). The required harmonic frequencies and magnitudes can be computed by the HarmonicPeaks algorithm. In the case when the even energy is zero, which may happen when only even harmonics were found or when only one peak was found, the algorithm outputs the maximum real number possible. Therefore, this algorithm should be used in conjunction with the harmonic peaks algorithm. If no peaks are supplied, the algorithm outputs a value of one, assuming either the spectrum was flat or it was silent. Check https://essentia.upf.edu/reference/std_OddToEvenHarmonicEnergyRatio.html for more details.

Parameters

Name	Type	Description
`frequencies`	VectorFloat	the frequencies of the harmonic peaks (at least two frequencies in frequency ascending order)
`magnitudes`	VectorFloat	the magnitudes of the harmonic peaks (at least two magnitudes in frequency ascending order)

Returns

Details

OnsetDetection( spectrum, phase [, method [, sampleRate ] ] ) → {object}

Description

This algorithm computes various onset detection functions. The output of this algorithm should be post-processed in order to determine whether the frame contains an onset or not. Namely, it could be fed to the Onsets algorithm. It is recommended that the input "spectrum" is generated by the Spectrum algorithm. The following methods are available: - 'hfc', the High Frequency Content detection function which accurately detects percussive events (see HFC algorithm for details). - 'complex', the Complex-Domain spectral difference function [1] taking into account changes in magnitude and phase. It emphasizes note onsets either as a result of significant change in energy in the magnitude spectrum, and/or a deviation from the expected phase values in the phase spectrum, caused by a change in pitch. - 'complex_phase', the simplified Complex-Domain spectral difference function [2] taking into account phase changes, weighted by magnitude. It reacts better on tonal sounds such as bowed string, but tends to over-detect percussive events. - 'flux', the Spectral Flux detection function which characterizes changes in magnitude spectrum. See Flux algorithm for details. - 'melflux', the spectral difference function, similar to spectral flux, but using half-rectified energy changes in Mel-frequency bands of the spectrum [3]. - 'rms', the difference function, measuring the half-rectified change of the RMS of the magnitude spectrum (i.e., measuring overall energy flux) [4]. Check https://essentia.upf.edu/reference/std_OnsetDetection.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input spectrum
`phase`	VectorFloat			the phase vector corresponding to this spectrum (used only by the "complex" method)
`method`	string	<optional>	hfc	the method used for onset detection
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details
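
To give a feel for why the default 'hfc' method responds to percussive events, the sketch below computes one commonly cited high-frequency-content formulation, the sum over bins of k * |X[k]|^2, which weights energy by bin index. The function name `hfcSketch` is hypothetical, and the library's HFC may use a different weighting or scaling.

```javascript
// Hypothetical sketch (not essentia.js API): high frequency content of a
// magnitude spectrum, weighting each bin's energy by its bin index.
function hfcSketch(spectrum) {
  let hfc = 0;
  for (let k = 0; k < spectrum.length; k++) {
    hfc += k * spectrum[k] * spectrum[k];
  }
  return hfc;
}
```

Broadband transients put energy in high bins, so the value rises sharply at percussive onsets; the frame-wise sequence of values still needs post-processing to pick onsets.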

OnsetDetectionGlobal( signal [, frameSize [, hopSize [, method [, sampleRate ] ] ] ] ) → {object}

Description

This algorithm computes various onset detection functions. Detection values are computed frame-wise given an input signal. The output of this algorithm should be post-processed in order to determine whether the frame contains an onset or not. Namely, it could be fed to the Onsets algorithm. The following methods are available: - 'infogain', the spectral difference measured by the modified information gain [1]. For each frame, it accounts for energy change between preceding and subsequent frames, histogrammed together, in order to suppress short-term variations on a frame-by-frame basis. - 'beat_emphasis', the beat emphasis function [1]. This function is a linear combination of onset detection functions (complex spectral differences) in a number of sub-bands, weighted by their beat strength computed over the entire input signal. Note: - 'infogain' onset detection has been optimized for the default sampleRate=44100Hz, frameSize=2048, hopSize=512. - 'beat_emphasis' is optimized for a fixed resolution of 11.6ms, which corresponds to the default sampleRate=44100Hz, frameSize=1024, hopSize=512. Optimal performance of beat detection with TempoTapDegara is not guaranteed for other settings. Check https://essentia.upf.edu/reference/std_OnsetDetectionGlobal.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`frameSize`	number	<optional>	2048	the frame size for computing onset detection function
`hopSize`	number	<optional>	512	the hop size for computing onset detection function
`method`	string	<optional>	infogain	the method used for onset detection
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details
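
The 11.6 ms resolution mentioned in the note follows directly from the hop size and sample rate. The helper below, with the hypothetical name `hopResolutionMs`, only reproduces that arithmetic so other settings can be checked against it.

```javascript
// Hypothetical helper (not essentia.js API): time between consecutive
// detection values, in milliseconds.
function hopResolutionMs(hopSize = 512, sampleRate = 44100) {
  return (1000 * hopSize) / sampleRate;
}
```

For example, at a 48000 Hz sample rate a hop size of 512 gives about 10.7 ms, which no longer matches the resolution 'beat_emphasis' is optimized for.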

OnsetRate( signal ) → {object}

Description

This algorithm computes the number of onsets per second and their position in time for an audio signal. Onset detection functions are computed using both high frequency content and complex-domain methods available in OnsetDetection algorithm. See OnsetDetection for more information. Please note that due to a dependence on the Onsets algorithm, this algorithm is only valid for audio signals with a sampling rate of 44100Hz. This algorithm throws an exception if the input signal is empty. Check https://essentia.upf.edu/reference/std_OnsetRate.html for more details.

Parameters

Name	Type	Description
`signal`	VectorFloat	the input signal

Returns

Details

OverlapAdd( signal [, frameSize [, gain [, hopSize ] ] ] ) → {object}

Description

This algorithm returns the output of an overlap-add process for a sequence of frames of an audio signal. It considers that the input audio frames are windowed audio signals. Given the size of the frame and the hop size, overlapping and adding consecutive frames will produce a continuous signal. A normalization gain can be passed as a parameter. Check https://essentia.upf.edu/reference/std_OverlapAdd.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the windowed input audio frame
`frameSize`	number	<optional>	2048	the frame size for computing the overlap-add process
`gain`	number	<optional>	1	the normalization gain that scales the output signal. Useful for IFFT output
`hopSize`	number	<optional>	128	the hop size with which the overlap-add function is computed

Returns

Details
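
The overlap-add idea can be sketched offline, on a whole list of frames at once. This differs from the method above, which takes a single windowed frame per call; the function name `overlapAddSketch` is hypothetical, and applying `gain` as a plain multiplier is an assumption about how the normalization is used.

```javascript
// Hypothetical sketch (not essentia.js API): offline overlap-add of
// equally sized windowed frames spaced hopSize samples apart.
function overlapAddSketch(frames, hopSize, gain = 1) {
  if (frames.length === 0) return new Float32Array(0);
  const frameSize = frames[0].length;
  const out = new Float32Array((frames.length - 1) * hopSize + frameSize);
  frames.forEach((frame, f) => {
    const offset = f * hopSize;
    for (let i = 0; i < frameSize; i++) {
      out[offset + i] += gain * frame[i];
    }
  });
  return out;
}
```

Samples covered by more than one frame receive the sum of the overlapping contributions, which is why the analysis window and hop size need to be chosen so that the windows add up to a constant.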

PeakDetection( array [, interpolate [, maxPeaks [, maxPosition [, minPeakDistance [, minPosition [, orderBy [, range [, threshold ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm detects local maxima (peaks) in an array. The algorithm finds positive slopes and detects a peak when the slope changes sign and the peak is above the threshold. It optionally interpolates using parabolic curve fitting. When two consecutive peaks are closer than the minPeakDistance parameter, the smaller one is discarded. A value of 0 bypasses this feature. Check https://essentia.upf.edu/reference/std_PeakDetection.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array
`interpolate`	boolean	<optional>	true	boolean flag to enable interpolation
`maxPeaks`	number	<optional>	100	the maximum number of returned peaks
`maxPosition`	number	<optional>	1	the maximum value of the range to evaluate
`minPeakDistance`	number	<optional>	0	minimum distance between consecutive peaks (0 to bypass this feature)
`minPosition`	number	<optional>	0	the minimum value of the range to evaluate
`orderBy`	string	<optional>	position	the ordering type of the output peaks (ascending by position or descending by value)
`range`	number	<optional>	1	the input range
`threshold`	number	<optional>	-1e+06	peaks below this given threshold are not output

Returns

Details
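
The slope-change test and the parabolic refinement described above can be sketched as follows. This is a simplified illustration: it returns positions in sample indices rather than scaling them to `range`, and it ignores `maxPeaks`, `minPeakDistance` and `orderBy`. The function name `detectPeaksSketch` and the `position`/`amplitude` keys are hypothetical.

```javascript
// Hypothetical sketch (not essentia.js API): local maxima refined by
// fitting a parabola through each peak and its two neighbours.
function detectPeaksSketch(array, threshold = -1e6) {
  const peaks = [];
  for (let i = 1; i < array.length - 1; i++) {
    const a = array[i - 1];
    const b = array[i];
    const c = array[i + 1];
    // Rising slope before, non-rising slope after, and above threshold.
    if (b > a && b >= c && b >= threshold) {
      const denominator = a - 2 * b + c; // strictly negative here
      const offset = (0.5 * (a - c)) / denominator;
      peaks.push({
        position: i + offset,
        amplitude: b - 0.25 * (a - c) * offset,
      });
    }
  }
  return peaks;
}
```

The interpolated offset always lies within half a sample of the detected index, so the refinement never moves a peak past its neighbours.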

PercivalBpmEstimator( signal [, frameSize [, frameSizeOSS [, hopSize [, hopSizeOSS [, maxBPM [, minBPM [, sampleRate ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the tempo in beats per minute (BPM) from an input signal as described in [1]. Check https://essentia.upf.edu/reference/std_PercivalBpmEstimator.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			input signal
`frameSize`	number	<optional>	1024	frame size for the analysis of the input signal
`frameSizeOSS`	number	<optional>	2048	frame size for the analysis of the Onset Strength Signal
`hopSize`	number	<optional>	128	hop size for the analysis of the input signal
`hopSizeOSS`	number	<optional>	128	hop size for the analysis of the Onset Strength Signal
`maxBPM`	number	<optional>	210	maximum BPM to detect
`minBPM`	number	<optional>	50	minimum BPM to detect
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

PercivalEnhanceHarmonics( array ) → {object}

Description

This algorithm implements the 'Enhance Harmonics' step as described in [1]. Given an input autocorrelation signal, two time-stretched versions of it (by factors of 2 and 4) are added to the original. In this way, peaks with a harmonic relation are boosted. For more details check the referenced paper. Check https://essentia.upf.edu/reference/std_PercivalEnhanceHarmonics.html for more details.

Parameters

Name	Type	Description
`array`	VectorFloat	the input signal

Returns

Details
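
One way to read the step above is that each lag i receives the values found at lags 2i and 4i. The sketch below implements that reading; the function name `enhanceHarmonicsSketch` is hypothetical, and the range of lags over which the sum is applied (here, all i with 4i inside the array) is an assumption that may differ from the library.

```javascript
// Hypothetical sketch (not essentia.js API): add the autocorrelation
// values at lags 2i and 4i to the value at lag i.
function enhanceHarmonicsSketch(array) {
  const out = Float32Array.from(array);
  for (let i = 0; 4 * i < array.length; i++) {
    out[i] = array[i] + array[2 * i] + array[4 * i];
  }
  return out;
}
```

A lag whose double and quadruple also carry autocorrelation peaks is boosted relative to lags without such support.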

PercivalEvaluatePulseTrains( oss, positions ) → {object}

Description

This algorithm implements the 'Evaluate Pulse Trains' step as described in [1]. Given an input onset strength signal (OSS) and a number of candidate tempo lag positions, the OSS is correlated with ideal expected pulse trains (for each candidate tempo lag) shifted in time by different amounts. The candidate tempo lag which generates the pulse train that best correlates with the OSS is returned as the preferred tempo candidate. For more details check the referenced paper. Check https://essentia.upf.edu/reference/std_PercivalEvaluatePulseTrains.html for more details.

Parameters

Name	Type	Description
`oss`	VectorFloat	onset strength signal (or other novelty curve)
`positions`	VectorFloat	peak positions of BPM candidates

Returns

Details

PitchContourSegmentation( pitch, signal [, hopSize [, minDuration [, pitchDistanceThreshold [, rmsThreshold [, sampleRate [, tuningFrequency ] ] ] ] ] ] ) → {object}

Description

This algorithm converts a pitch sequence estimated from an audio signal into a set of discrete note events. Each note is defined by its onset time, duration and MIDI pitch value, quantized to the equal tempered scale. Check https://essentia.upf.edu/reference/std_PitchContourSegmentation.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`pitch`	VectorFloat			estimated pitch contour [Hz]
`signal`	VectorFloat			input audio signal
`hopSize`	number	<optional>	128	hop size of the extracted pitch
`minDuration`	number	<optional>	0.1	minimum note duration [s]
`pitchDistanceThreshold`	number	<optional>	60	pitch threshold for note segmentation [cents]
`rmsThreshold`	number	<optional>	-2	zscore threshold for note segmentation
`sampleRate`	number	<optional>	44100	sample rate of the audio signal
`tuningFrequency`	number	<optional>	440	tuning reference frequency [Hz]

Returns

Details
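
The quantization to the equal tempered scale relies on the standard Hz to MIDI relation, midi = 69 + 12 * log2(f / tuningFrequency). The sketch below shows only that conversion, not the segmentation itself; the function name `hertzToMidiNote` is hypothetical.

```javascript
// Hypothetical helper (not essentia.js API): nearest MIDI note number
// for a frequency, relative to the tuning reference (A4 = MIDI 69).
function hertzToMidiNote(frequency, tuningFrequency = 440) {
  return Math.round(69 + 12 * Math.log2(frequency / tuningFrequency));
}
```

With `pitchDistanceThreshold` at its default of 60 cents, a pitch has to move by more than half a semitone before it is treated as a different note.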

PitchContours( peakBins, peakSaliences [, binResolution [, hopSize [, minDuration [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, sampleRate [, timeContinuity ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm tracks a set of predominant pitch contours of an audio signal. This algorithm is intended to receive its "peakBins" and "peakSaliences" inputs from the PitchSalienceFunctionPeaks algorithm outputs aggregated over all frames in the sequence. The output is a vector of estimated melody pitch values. Check https://essentia.upf.edu/reference/std_PitchContours.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`peakBins`	VectorVectorFloat			frame-wise array of cent bins corresponding to pitch salience function peaks
`peakSaliences`	VectorVectorFloat			frame-wise array of values of salience function peaks
`binResolution`	number	<optional>	10	salience function bin resolution [cents]
`hopSize`	number	<optional>	128	the hop size with which the pitch salience function was computed
`minDuration`	number	<optional>	100	the minimum allowed contour duration [ms]
`peakDistributionThreshold`	number	<optional>	0.9	allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)
`peakFrameThreshold`	number	<optional>	0.9	per-frame salience threshold factor (fraction of the highest peak salience in a frame)
`pitchContinuity`	number	<optional>	27.5625	pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`timeContinuity`	number	<optional>	100	time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]

Returns

Details
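
Because `pitchContinuity` is expressed per millisecond, the allowed pitch change between two consecutive frames depends on `hopSize` and `sampleRate`. The worked arithmetic below shows that the defaults correspond to about 80 cents per frame; the function name `maxPitchChangePerFrame` is hypothetical, and how the library applies this limit internally is not shown here.

```javascript
// Hypothetical helper (not essentia.js API): maximum allowed pitch change
// between consecutive frames, in cents.
function maxPitchChangePerFrame(pitchContinuity = 27.5625, hopSize = 128, sampleRate = 44100) {
  const hopDurationMs = (1000 * hopSize) / sampleRate; // about 2.9 ms by default
  return pitchContinuity * hopDurationMs;
}
```

If the hop size or sample rate is changed, `pitchContinuity` keeps its per-millisecond meaning, so the per-frame limit scales with the hop duration.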

PitchContoursMelody( contoursBins, contoursSaliences, contoursStartTimes, duration [, binResolution [, filterIterations [, guessUnvoiced [, hopSize [, maxFrequency [, minFrequency [, referenceFrequency [, sampleRate [, voiceVibrato [, voicingTolerance ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm converts a set of pitch contours into a sequence of predominant f0 values in Hz by taking the value of the most predominant contour in each frame. This algorithm is intended to receive its "contoursBins", "contoursSaliences", and "contoursStartTimes" inputs from the PitchContours algorithm. The "duration" input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values and a vector of confidence values. Check https://essentia.upf.edu/reference/std_PitchContoursMelody.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`contoursBins`	VectorVectorFloat			array of frame-wise vectors of cent bin values representing each contour
`contoursSaliences`	VectorVectorFloat			array of frame-wise vectors of pitch saliences representing each contour
`contoursStartTimes`	VectorFloat			array of the start times of each contour [s]
`duration`	number			time duration of the input signal [s]
`binResolution`	number	<optional>	10	salience function bin resolution [cents]
`filterIterations`	number	<optional>	3	number of iterations for the octave errors / pitch outlier filtering process
`guessUnvoiced`	boolean	<optional>	false	Estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
`hopSize`	number	<optional>	128	the hop size with which the pitch salience function was computed
`maxFrequency`	number	<optional>	20000	the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
`minFrequency`	number	<optional>	80	the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
`referenceFrequency`	number	<optional>	55	the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal (Hz)
`voiceVibrato`	boolean	<optional>	false	detect voice vibrato
`voicingTolerance`	number	<optional>	0.2	allowed deviation below the average contour mean salience of all contours (fraction of the standard deviation)

Returns

Details
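
Since the contour inputs are expressed in cent bins while the output is in Hz, the sketch below shows the standard conversion f = referenceFrequency * 2^(bin * binResolution / 1200). The function name `centBinToHertz` is hypothetical and the library may apply additional offsets or rounding.

```javascript
// Hypothetical helper (not essentia.js API): cent bin index -> Hz.
function centBinToHertz(bin, referenceFrequency = 55, binResolution = 10) {
  return referenceFrequency * Math.pow(2, (bin * binResolution) / 1200);
}
```

With the defaults, the `minFrequency` of 80 Hz lies near bin 65 and bin 360 corresponds to 440 Hz.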

PitchContoursMonoMelody( contoursBins, contoursSaliences, contoursStartTimes, duration [, binResolution [, filterIterations [, guessUnvoiced [, hopSize [, maxFrequency [, minFrequency [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm converts a set of pitch contours into a sequence of f0 values in Hz by taking the value of the most salient contour in each frame. In contrast to PitchContoursMelody, it assumes a single source. This algorithm is intended to receive its "contoursBins", "contoursSaliences", and "contoursStartTimes" inputs from the PitchContours algorithm. The "duration" input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values and a vector of confidence values. Check https://essentia.upf.edu/reference/std_PitchContoursMonoMelody.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`contoursBins`	VectorVectorFloat			array of frame-wise vectors of cent bin values representing each contour
`contoursSaliences`	VectorVectorFloat			array of frame-wise vectors of pitch saliences representing each contour
`contoursStartTimes`	VectorFloat			array of the start times of each contour [s]
`duration`	number			time duration of the input signal [s]
`binResolution`	number	<optional>	10	salience function bin resolution [cents]
`filterIterations`	number	<optional>	3	number of iterations for the octave errors / pitch outlier filtering process
`guessUnvoiced`	boolean	<optional>	false	Estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
`hopSize`	number	<optional>	128	the hop size with which the pitch salience function was computed
`maxFrequency`	number	<optional>	20000	the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
`minFrequency`	number	<optional>	80	the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
`referenceFrequency`	number	<optional>	55	the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal (Hz)

Returns

Details

PitchContoursMultiMelody( contoursBins, contoursSaliences, contoursStartTimes, duration [, binResolution [, filterIterations [, guessUnvoiced [, hopSize [, maxFrequency [, minFrequency [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm post-processes a set of pitch contours into a sequence of multiple f0 values in Hz. This algorithm is intended to receive its "contoursBins", "contoursSaliences", and "contoursStartTimes" inputs from the PitchContours algorithm. The "duration" input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values. Check https://essentia.upf.edu/reference/std_PitchContoursMultiMelody.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`contoursBins`	VectorVectorFloat			array of frame-wise vectors of cent bin values representing each contour
`contoursSaliences`	VectorVectorFloat			array of frame-wise vectors of pitch saliences representing each contour
`contoursStartTimes`	VectorFloat			array of the start times of each contour [s]
`duration`	number			time duration of the input signal [s]
`binResolution`	number	<optional>	10	salience function bin resolution [cents]
`filterIterations`	number	<optional>	3	number of iterations for the octave errors / pitch outlier filtering process
`guessUnvoiced`	boolean	<optional>	false	Estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
`hopSize`	number	<optional>	128	the hop size with which the pitch salience function was computed
`maxFrequency`	number	<optional>	20000	the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
`minFrequency`	number	<optional>	80	the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
`referenceFrequency`	number	<optional>	55	the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

PitchFilter( pitch, pitchConfidence [, confidenceThreshold [, minChunkSize [, useAbsolutePitchConfidence ] ] ] ) → {object}

Description

This algorithm corrects the fundamental frequency estimations for a sequence of frames given pitch values together with their confidence values. In particular, it removes non-confident parts and spurious jumps in pitch and applies octave corrections. Check https://essentia.upf.edu/reference/std_PitchFilter.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`pitch`	VectorFloat			vector of pitch values for the input frames [Hz]
`pitchConfidence`	VectorFloat			vector of pitch confidence values for the input frames
`confidenceThreshold`	number	<optional>	36	ratio between the average confidence of the most confident chunk and the minimum allowed average confidence of a chunk
`minChunkSize`	number	<optional>	30	minimum number of frames in non-zero pitch chunks
`useAbsolutePitchConfidence`	boolean	<optional>	false	treat negative pitch confidence values as positive (use with melodia guessUnvoiced=True)

Returns

Details

PitchMelodia( signal [, binResolution [, filterIterations [, frameSize [, guessUnvoiced [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minDuration [, minFrequency [, numberHarmonics [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, referenceFrequency [, sampleRate [, timeContinuity ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the fundamental frequency corresponding to the melody of a monophonic music signal based on the MELODIA algorithm. While the algorithm is originally designed to extract the predominant melody from polyphonic music [1], this implementation is adapted for monophonic signals. The approach is based on the creation and characterization of pitch contours, time continuous sequences of pitch candidates grouped using auditory streaming cues. To this end, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours, and PitchContoursMonoMelody algorithms are employed. It is strongly advised to use the default parameter values which are optimized according to [1] (where further details are provided) except for minFrequency and maxFrequency, which will depend on your application. Check https://essentia.upf.edu/reference/std_PitchMelodia.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`binResolution`	number	<optional>	10	salience function bin resolution [cents]
`filterIterations`	number	<optional>	3	number of iterations for the octave errors / pitch outlier filtering process
`frameSize`	number	<optional>	2048	the frame size for computing pitch salience
`guessUnvoiced`	boolean	<optional>	false	estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
`harmonicWeight`	number	<optional>	0.8	harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
`hopSize`	number	<optional>	128	the hop size with which the pitch salience function was computed
`magnitudeCompression`	number	<optional>	1	magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)
`magnitudeThreshold`	number	<optional>	40	spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
`maxFrequency`	number	<optional>	20000	the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
`minDuration`	number	<optional>	100	the minimum allowed contour duration [ms]
`minFrequency`	number	<optional>	40	the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
`numberHarmonics`	number	<optional>	20	number of considered harmonics
`peakDistributionThreshold`	number	<optional>	0.9	allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)
`peakFrameThreshold`	number	<optional>	0.9	per-frame salience threshold factor (fraction of the highest peak salience in a frame)
`pitchContinuity`	number	<optional>	27.5625	pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]
`referenceFrequency`	number	<optional>	55	the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`timeContinuity`	number	<optional>	100	time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]

Returns

Details

PitchSalience( spectrum [, highBoundary [, lowBoundary [, sampleRate ] ] ] ) → {object}

Description

This algorithm computes the pitch salience of a spectrum. The pitch salience is given by the ratio of the highest auto correlation value of the spectrum to the non-shifted auto correlation value. Pitch salience was designed as a quick measure of tone sensation. Unpitched sounds (non-musical sound effects) and pure tones have an average pitch salience value close to 0 whereas sounds containing several harmonics in the spectrum tend to have a higher value. Check https://essentia.upf.edu/reference/std_PitchSalience.html for more details.
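The ratio described above can be sketched in plain JavaScript as follows. This is only an illustration of the idea, not the Essentia implementation: the helper name `pitchSalienceSketch` is hypothetical, and the linear mapping from Hz to spectrum bins is an assumption.

```javascript
// Hypothetical helper, not part of the essentia.js API.
function pitchSalienceSketch(spectrum, { sampleRate = 44100, lowBoundary = 100, highBoundary = 5000 } = {}) {
  const n = spectrum.length;
  // Assumption: the spectrum bins span 0 Hz to sampleRate / 2 linearly.
  const hzToLag = (hz) => Math.round((hz * (n - 1)) / (sampleRate / 2));
  // Unnormalized autocorrelation of the spectrum at a given bin shift.
  const autocorrelation = (lag) => {
    let sum = 0;
    for (let i = 0; i + lag < n; i++) sum += spectrum[i] * spectrum[i + lag];
    return sum;
  };
  const zeroLag = autocorrelation(0);
  if (zeroLag === 0) return 0; // silent frame
  let best = 0;
  const maxLag = Math.min(hzToLag(highBoundary), n - 1);
  for (let lag = Math.max(hzToLag(lowBoundary), 1); lag <= maxLag; lag++) {
    best = Math.max(best, autocorrelation(lag));
  }
  // Highest shifted autocorrelation relative to the non-shifted one.
  return best / zeroLag;
}
```

In this sketch a spectrum with a single peak never overlaps a shifted copy of itself, so its salience is 0, while a comb of equally spaced peaks yields a clearly higher value.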

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input audio spectrum
`highBoundary`	number	<optional>	5000	until which frequency we are looking for the minimum (must be smaller than half sampleRate) [Hz]
`lowBoundary`	number	<optional>	100	from which frequency we are looking for the maximum (must not be larger than highBoundary) [Hz]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

PitchSalienceFunction( frequencies, magnitudes [, binResolution [, harmonicWeight [, magnitudeCompression [, magnitudeThreshold [, numberHarmonics [, referenceFrequency ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the pitch salience function of a signal frame given its spectral peaks. The salience function covers a pitch range of nearly five octaves (i.e., 6000 cents), starting from the "referenceFrequency", and is quantized into cent bins according to the specified "binResolution". The salience of a given frequency is computed as the sum of the weighted energies found at integer multiples (harmonics) of that frequency. Check https://essentia.upf.edu/reference/std_PitchSalienceFunction.html for more details.
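To make the cent-bin quantization above concrete, the following sketch maps a frequency in Hz to cents relative to "referenceFrequency" and then to a bin index. The helper names `hzToCents` and `centBinIndex` are hypothetical, and flooring is an assumed rounding convention; the library's internal rounding may differ.

```javascript
// Hypothetical helpers, not part of the essentia.js API.
function hzToCents(frequency, referenceFrequency = 55) {
  // 1200 cents per octave, measured upward from the reference frequency.
  return 1200 * Math.log2(frequency / referenceFrequency);
}

function centBinIndex(frequency, binResolution = 10, referenceFrequency = 55) {
  // Assumption: bins are indexed by flooring; the library may round differently.
  return Math.floor(hzToCents(frequency, referenceFrequency) / binResolution);
}

// With the defaults, 6000 cents at 10 cents per bin gives 600 bins,
// and the top of the range is 55 Hz * 2^(6000 / 1200) = 1760 Hz.
const numberOfBins = 6000 / 10;
const upperFrequency = 55 * Math.pow(2, 6000 / 1200);
```

Under these assumptions the 6000-cent range starting at 55 Hz ends at 1760 Hz, which matches the default `maxFrequency` of PitchSalienceFunctionPeaks.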

Parameters

Name	Type	Attributes	Default	Description
`frequencies`	VectorFloat			the frequencies of the spectral peaks [Hz]
`magnitudes`	VectorFloat			the magnitudes of the spectral peaks
`binResolution`	number	<optional>	10	salience function bin resolution [cents]
`harmonicWeight`	number	<optional>	0.8	harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
`magnitudeCompression`	number	<optional>	1	magnitude compression parameter (=0 for maximum compression, =1 for no compression)
`magnitudeThreshold`	number	<optional>	40	peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
`numberHarmonics`	number	<optional>	20	number of considered harmonics
`referenceFrequency`	number	<optional>	55	the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin

Returns

Details

PitchSalienceFunctionPeaks( salienceFunction [, binResolution [, maxFrequency [, minFrequency [, referenceFrequency ] ] ] ] ) → {object}

Description

This algorithm computes the peaks of a given pitch salience function. Check https://essentia.upf.edu/reference/std_PitchSalienceFunctionPeaks.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`salienceFunction`	VectorFloat			the array of salience function values corresponding to cent frequency bins
`binResolution`	number	<optional>	10	salience function bin resolution [cents]
`maxFrequency`	number	<optional>	1760	the maximum frequency to evaluate (ignore peaks above) [Hz]
`minFrequency`	number	<optional>	55	the minimum frequency to evaluate (ignore peaks below) [Hz]
`referenceFrequency`	number	<optional>	55	the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin

Returns

Details

PitchYin( signal [, frameSize [, interpolate [, maxFrequency [, minFrequency [, sampleRate [, tolerance ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the fundamental frequency given the frame of a monophonic music signal. It is an implementation of the Yin algorithm [1] for computations in the time domain. Check https://essentia.upf.edu/reference/std_PitchYin.html for more details.
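For readers unfamiliar with Yin, a minimal time-domain sketch is given below. It is not the Essentia implementation: the helper name `yinPitchSketch` is hypothetical, parabolic interpolation (the `interpolate` option) is omitted, and no confidence value is computed. It builds the cumulative mean normalized difference function, takes the first lag below `tolerance`, and descends to the nearest local minimum.

```javascript
// Hypothetical helper, not part of the essentia.js API.
function yinPitchSketch(frame, { sampleRate = 44100, minFrequency = 20, maxFrequency = 22050, tolerance = 0.15 } = {}) {
  const half = Math.floor(frame.length / 2);
  const minTau = Math.max(1, Math.floor(sampleRate / maxFrequency));
  const maxTau = Math.min(half, Math.floor(sampleRate / minFrequency));
  const compareLength = frame.length - maxTau; // samples compared at every lag

  // Cumulative mean normalized difference function d'(tau), with d'(0) = 1.
  const cmnd = new Float64Array(maxTau + 1);
  cmnd[0] = 1;
  let runningSum = 0;
  for (let tau = 1; tau <= maxTau; tau++) {
    let difference = 0;
    for (let t = 0; t < compareLength; t++) {
      const delta = frame[t] - frame[t + tau];
      difference += delta * delta;
    }
    runningSum += difference;
    cmnd[tau] = runningSum > 0 ? (difference * tau) / runningSum : 1;
  }

  // First lag under the tolerance, then descend to the local minimum.
  for (let tau = minTau; tau <= maxTau; tau++) {
    if (cmnd[tau] < tolerance) {
      while (tau + 1 <= maxTau && cmnd[tau + 1] < cmnd[tau]) tau++;
      return sampleRate / tau;
    }
  }
  return 0; // no periodicity found (treated as unpitched in this sketch)
}
```

Because the lag is an integer number of samples here, the estimate is quantized to `sampleRate / tau`; interpolation around the minimum is what refines this in a full implementation.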

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal frame
`frameSize`	number	<optional>	2048	number of samples in the input frame (this is an optional parameter to optimize memory allocation)
`interpolate`	boolean	<optional>	true	enable interpolation
`maxFrequency`	number	<optional>	22050	the maximum allowed frequency [Hz]
`minFrequency`	number	<optional>	20	the minimum allowed frequency [Hz]
`sampleRate`	number	<optional>	44100	sampling rate of the input audio [Hz]
`tolerance`	number	<optional>	0.15	tolerance for peak detection

Returns

Details

PitchYinFFT( spectrum [, frameSize [, interpolate [, maxFrequency [, minFrequency [, sampleRate [, tolerance ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the fundamental frequency given the spectrum of a monophonic music signal. It is an implementation of the YinFFT algorithm [1], which is an optimized version of the Yin algorithm for computation in the frequency domain. It is recommended to window the input spectrum with a Hann window. The raw spectrum can be computed with the Spectrum algorithm. Check https://essentia.upf.edu/reference/std_PitchYinFFT.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input spectrum (preferably created with a Hann window)
`frameSize`	number	<optional>	2048	number of samples in the input spectrum
`interpolate`	boolean	<optional>	true	boolean flag to enable interpolation
`maxFrequency`	number	<optional>	22050	the maximum allowed frequency [Hz]
`minFrequency`	number	<optional>	20	the minimum allowed frequency [Hz]
`sampleRate`	number	<optional>	44100	sampling rate of the input spectrum [Hz]
`tolerance`	number	<optional>	1	tolerance for peak detection

Returns

Details

PitchYinProbabilistic( signal [, frameSize [, hopSize [, lowRMSThreshold [, outputUnvoiced [, preciseTime [, sampleRate ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the pitch track of a mono audio signal using the probabilistic Yin algorithm. Check https://essentia.upf.edu/reference/std_PitchYinProbabilistic.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input mono audio signal
`frameSize`	number	<optional>	2048	the frame size of FFT
`hopSize`	number	<optional>	256	the hop size with which the pitch is computed
`lowRMSThreshold`	number	<optional>	0.1	the low RMS amplitude threshold
`outputUnvoiced`	string	<optional>	negative	how to output unvoiced frames: 'zero' outputs non-voiced pitch as 0; 'abs' outputs non-voiced pitch as absolute values; 'negative' outputs non-voiced pitch as negative values
`preciseTime`	boolean	<optional>	false	use non-standard precise YIN timing (slow).
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

PitchYinProbabilities( signal [, frameSize [, lowAmp [, preciseTime [, sampleRate ] ] ] ] ) → {object}

Description

This algorithm estimates the fundamental frequencies and their probabilities given the frame of a monophonic music signal. It is a part of the implementation of the probabilistic Yin algorithm [1]. Check https://essentia.upf.edu/reference/std_PitchYinProbabilities.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal frame
`frameSize`	number	<optional>	2048	number of samples in the input frame
`lowAmp`	number	<optional>	0.1	the low RMS amplitude threshold
`preciseTime`	boolean	<optional>	false	use non-standard precise YIN timing (slow).
`sampleRate`	number	<optional>	44100	sampling rate of the input audio [Hz]

Returns

Details

PitchYinProbabilitiesHMM( pitchCandidates, probabilities [, minFrequency [, numberBinsPerSemitone [, selfTransition [, yinTrust ] ] ] ] ) → {object}

Description

This algorithm estimates the smoothed fundamental frequency given the pitch candidates and probabilities using hidden Markov models. It is a part of the implementation of the probabilistic Yin algorithm [1]. Check https://essentia.upf.edu/reference/std_PitchYinProbabilitiesHMM.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`pitchCandidates`	VectorVectorFloat			the pitch candidates
`probabilities`	VectorVectorFloat			the pitch probabilities
`minFrequency`	number	<optional>	61.735	minimum detected frequency
`numberBinsPerSemitone`	number	<optional>	5	number of bins per semitone
`selfTransition`	number	<optional>	0.99	the self transition probabilities
`yinTrust`	number	<optional>	0.5	the yin trust parameter

Returns

Details

PowerMean( array [, power ] ) → {object}

Description

This algorithm computes the power mean of an array. It accepts one parameter, p, which is the power (or order or degree) of the Power Mean. Note that if p=-1, the Power Mean is equal to the Harmonic Mean, if p=0, the Power Mean is equal to the Geometric Mean, if p=1, the Power Mean is equal to the Arithmetic Mean, if p=2, the Power Mean is equal to the Root Mean Square. Check https://essentia.upf.edu/reference/std_PowerMean.html for more details.
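The special cases listed above follow directly from the definition of the power mean. A minimal sketch, assuming the p=0 case is handled as the limit (the geometric mean) and that non-positive or empty input is rejected; the helper name `powerMean` is hypothetical and the error behavior is an assumption, not the library's:

```javascript
// Hypothetical helper, not part of the essentia.js API.
function powerMean(values, power = 1) {
  if (values.length === 0) throw new Error('powerMean: empty input');
  if (values.some((v) => !(v > 0))) throw new Error('powerMean: input must be strictly positive');
  if (power === 0) {
    // Limit case p -> 0: the geometric mean, computed in the log domain.
    const meanLog = values.reduce((sum, v) => sum + Math.log(v), 0) / values.length;
    return Math.exp(meanLog);
  }
  // General case: (mean of x^p)^(1/p).
  const meanOfPowers = values.reduce((sum, v) => sum + Math.pow(v, power), 0) / values.length;
  return Math.pow(meanOfPowers, 1 / power);
}
```

For the array [1, 2, 4], this gives the harmonic mean 12/7 for p=-1, the geometric mean 2 for p=0, the arithmetic mean 7/3 for p=1, and the root mean square sqrt(7) for p=2.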

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array (must contain only positive real numbers)
`power`	number	<optional>	1	the power to which to elevate each element before taking the mean

Returns

Details

PowerSpectrum( signal [, size ] ) → {object}

Description

This algorithm computes the power spectrum of an array of Reals. The resulting power spectrum has a size which is half the size of the input array plus one. Bins contain squared magnitude values. Check https://essentia.upf.edu/reference/std_PowerSpectrum.html for more details.
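The output size and the meaning of each bin can be illustrated with a naive DFT. This sketch is for explanation only: the helper name `powerSpectrumSketch` is hypothetical, the transform is left unnormalized (an assumption about scaling), and a real implementation would use an FFT.

```javascript
// Hypothetical helper, not part of the essentia.js API.
function powerSpectrumSketch(signal) {
  const n = signal.length;
  const bins = Math.floor(n / 2) + 1; // half the input size plus one
  const power = new Float64Array(bins);
  for (let k = 0; k < bins; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += signal[t] * Math.cos(angle);
      im += signal[t] * Math.sin(angle);
    }
    power[k] = re * re + im * im; // squared magnitude of bin k
  }
  return power;
}
```

For an 8-sample cosine completing exactly two cycles, this returns 5 bins with all of the energy concentrated in bin 2.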

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`size`	number	<optional>	2048	the expected size of the input frame (this is purely optional and only targeted at optimizing the creation time of the FFT object)

Returns

Details

PredominantPitchMelodia( signal [, binResolution [, filterIterations [, frameSize [, guessUnvoiced [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minDuration [, minFrequency [, numberHarmonics [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, referenceFrequency [, sampleRate [, timeContinuity [, voiceVibrato [, voicingTolerance ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the fundamental frequency of the predominant melody from polyphonic music signals using the MELODIA algorithm. It is specifically suited for music with a predominant melodic element, for example the singing voice melody in an accompanied singing recording. The approach [1] is based on the creation and characterization of pitch contours, time continuous sequences of pitch candidates grouped using auditory streaming cues. It furthermore determines for each frame whether the predominant melody is present or not. To this end, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours, and PitchContoursMelody algorithms are employed. It is strongly advised to use the default parameter values which are optimized according to [1] (where further details are provided) except for minFrequency, maxFrequency, and voicingTolerance, which will depend on your application. Check https://essentia.upf.edu/reference/std_PredominantPitchMelodia.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`binResolution`	number	<optional>	10	salience function bin resolution [cents]
`filterIterations`	number	<optional>	3	number of iterations for the octave errors / pitch outlier filtering process
`frameSize`	number	<optional>	2048	the frame size for computing pitch salience
`guessUnvoiced`	boolean	<optional>	false	estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
`harmonicWeight`	number	<optional>	0.8	harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
`hopSize`	number	<optional>	128	the hop size with which the pitch salience function was computed
`magnitudeCompression`	number	<optional>	1	magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)
`magnitudeThreshold`	number	<optional>	40	spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
`maxFrequency`	number	<optional>	20000	the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
`minDuration`	number	<optional>	100	the minimum allowed contour duration [ms]
`minFrequency`	number	<optional>	80	the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
`numberHarmonics`	number	<optional>	20	number of considered harmonics
`peakDistributionThreshold`	number	<optional>	0.9	allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)
`peakFrameThreshold`	number	<optional>	0.9	per-frame salience threshold factor (fraction of the highest peak salience in a frame)
`pitchContinuity`	number	<optional>	27.5625	pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]
`referenceFrequency`	number	<optional>	55	the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`timeContinuity`	number	<optional>	100	time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]
`voiceVibrato`	boolean	<optional>	false	detect voice vibrato
`voicingTolerance`	number	<optional>	0.2	allowed deviation below the average contour mean salience of all contours (fraction of the standard deviation)

Returns

Details

RMS( array ) → {object}

Description

This algorithm computes the root mean square (quadratic mean) of an array. RMS is not defined for empty arrays. In such a case, an exception will be thrown. References: [1] Root mean square - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Root_mean_square. Check https://essentia.upf.edu/reference/std_RMS.html for more details.

Parameters

Name	Type	Description
`array`	VectorFloat	the input array

Returns

Details

RawMoments( array [, range ] ) → {object}

Description

This algorithm computes the first 5 raw moments of an array. The output array is of size 6 because the zeroth moment is used for padding so that the first moment corresponds to index 1. Check https://essentia.upf.edu/reference/std_RawMoments.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array
`range`	number	<optional>	22050	the range of the input array, used for normalizing the results

Returns

Details

ReplayGain( signal [, sampleRate ] ) → {object}

Description

This algorithm computes the Replay Gain loudness value of an audio signal. The algorithm is described in detail in [1]. The value returned is the 'standard' ReplayGain value, not the value with 6dB preamplification as computed by lame, mp3gain, vorbisgain, and all widely used ReplayGain programs. Check https://essentia.upf.edu/reference/std_ReplayGain.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal (must be longer than 0.05 s)
`sampleRate`	number	<optional>	44100	the sampling rate of the input audio signal [Hz]

Returns

Details

Resample( signal [, inputSampleRate [, outputSampleRate [, quality ] ] ] ) → {object}

Description

This algorithm resamples the input signal to the desired sampling rate. Check https://essentia.upf.edu/reference/std_Resample.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`inputSampleRate`	number	<optional>	44100	the sampling rate of the input signal [Hz]
`outputSampleRate`	number	<optional>	44100	the sampling rate of the output signal [Hz]
`quality`	number	<optional>	1	the quality of the conversion, 0 for best quality

Returns

Details

ResampleFFT( input [, inSize [, outSize ] ] ) → {object}

Description

This algorithm resamples a sequence using FFT / IFFT. The input and output sizes must be even numbers. (It is meant to be equivalent to the resample function in Numpy). Check https://essentia.upf.edu/reference/std_ResampleFFT.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`input`	VectorFloat			input array
`inSize`	number	<optional>	128	the size of the input sequence. It needs to be even.
`outSize`	number	<optional>	128	the size of the output sequence. It needs to be even.

Returns

Details

RhythmDescriptors( signal ) → {object}

Description

This algorithm computes rhythm features (bpm, beat positions, beat histogram peaks) for an audio signal. It combines RhythmExtractor2013 for beat tracking and BPM estimation with BpmHistogramDescriptors algorithms. Check https://essentia.upf.edu/reference/std_RhythmDescriptors.html for more details.

Parameters

Name	Type	Description
`signal`	VectorFloat	the audio input signal

Returns

object

{beats_position: 'See RhythmExtractor2013 algorithm documentation', confidence: 'See RhythmExtractor2013 algorithm documentation', bpm: 'See RhythmExtractor2013 algorithm documentation', bpm_estimates: 'See RhythmExtractor2013 algorithm documentation', bpm_intervals: 'See RhythmExtractor2013 algorithm documentation', first_peak_bpm: 'See BpmHistogramDescriptors algorithm documentation', first_peak_spread: 'See BpmHistogramDescriptors algorithm documentation', first_peak_weight: 'See BpmHistogramDescriptors algorithm documentation', second_peak_bpm: 'See BpmHistogramDescriptors algorithm documentation', second_peak_spread: 'See BpmHistogramDescriptors algorithm documentation', second_peak_weight: 'See BpmHistogramDescriptors algorithm documentation', histogram: 'bpm histogram [bpm]'}

Details

RhythmExtractor( signal [, frameHop [, frameSize [, hopSize [, lastBeatInterval [, maxTempo [, minTempo [, numberFrames [, sampleRate [, tempoHints [, tolerance [, useBands [, useOnset ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the tempo in bpm and beat positions given an audio signal. The algorithm combines several periodicity functions and estimates beats using TempoTap and TempoTapTicks. It combines: (1) onset detection functions based on high-frequency content (see OnsetDetection); (2) a complex-domain spectral difference function (see OnsetDetection); (3) a periodicity function based on energy bands (see FrequencyBands, TempoScaleBands). Check https://essentia.upf.edu/reference/std_RhythmExtractor.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the audio input signal
`frameHop`	number	<optional>	1024	the number of feature frames separating two evaluations
`frameSize`	number	<optional>	1024	the number of audio samples used to compute a feature
`hopSize`	number	<optional>	256	the number of audio samples per feature
`lastBeatInterval`	number	<optional>	0.1	the minimum interval between last beat and end of file [s]
`maxTempo`	number	<optional>	208	the fastest tempo to detect [bpm]
`minTempo`	number	<optional>	40	the slowest tempo to detect [bpm]
`numberFrames`	number	<optional>	1024	the number of feature frames to buffer on
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`tempoHints`	Array.<any>	<optional>	[]	the optional list of initial beat locations, to favor the detection of pre-determined tempo period and beats alignment [s]
`tolerance`	number	<optional>	0.24	the minimum interval between two consecutive beats [s]
`useBands`	boolean	<optional>	true	whether or not to use band energy as periodicity function
`useOnset`	boolean	<optional>	true	whether or not to use onsets as periodicity function

Returns

Details

RhythmExtractor2013( signal [, maxTempo [, method [, minTempo ] ] ] ) → {object}

Description

This algorithm extracts the beat positions and estimates their confidence as well as tempo in bpm for an audio signal. The beat locations can be computed using either 'multifeature', the BeatTrackerMultiFeature algorithm, or 'degara', the BeatTrackerDegara algorithm (note that there is no confidence estimation for this method; the output confidence value is always 0). Check https://essentia.upf.edu/reference/std_RhythmExtractor2013.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the audio input signal
`maxTempo`	number	<optional>	208	the fastest tempo to detect [bpm]
`method`	string	<optional>	multifeature	the method used for beat tracking
`minTempo`	number	<optional>	40	the slowest tempo to detect [bpm]

Returns

Details

RhythmTransform( melBands [, frameSize [, hopSize ] ] ) → {object}

Description

This algorithm implements the rhythm transform. It computes a tempogram, a representation of rhythmic periodicities in the input signal in the rhythm domain, by using FFT similarly to computation of spectrum in the frequency domain [1]. Additional features, including rhythmic centroid and a rhythmic counterpart of MFCCs, can be derived from this rhythmic representation. Check https://essentia.upf.edu/reference/std_RhythmTransform.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`melBands`	VectorVectorFloat			the energies in the mel bands
`frameSize`	number	<optional>	256	the frame size to compute the rhythm transform
`hopSize`	number	<optional>	32	the hop size to compute the rhythm transform

Returns

Details

RollOff( spectrum [, cutoff [, sampleRate ] ] ) → {object}

Description

This algorithm computes the roll-off frequency of a spectrum. The roll-off frequency is defined as the frequency under which some percentage (cutoff) of the total energy of the spectrum is contained. The roll-off frequency can be used to distinguish between harmonic (below roll-off) and noisy sounds (above roll-off). Check https://essentia.upf.edu/reference/std_RollOff.html for more details.
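The definition above amounts to a cumulative-energy search. A minimal sketch follows, assuming energy is the squared magnitude of each bin and that bins map linearly from 0 Hz to half the sample rate; the helper name `rollOffSketch` is hypothetical and the exact bin-to-frequency mapping used by the library may differ.

```javascript
// Hypothetical helper, not part of the essentia.js API.
function rollOffSketch(spectrum, { cutoff = 0.85, sampleRate = 44100 } = {}) {
  if (spectrum.length < 2) throw new Error('rollOff: spectrum needs more than one element');
  let total = 0;
  for (const magnitude of spectrum) total += magnitude * magnitude;
  const target = cutoff * total;

  // Walk up the spectrum until the accumulated energy reaches the target.
  let cumulative = 0;
  let bin = 0;
  for (; bin < spectrum.length; bin++) {
    cumulative += spectrum[bin] * spectrum[bin];
    if (cumulative >= target) break;
  }
  bin = Math.min(bin, spectrum.length - 1);

  // Assumption: bin 0 is 0 Hz and the last bin is sampleRate / 2.
  return (bin * (sampleRate / 2)) / (spectrum.length - 1);
}
```

A flat spectrum pushes the roll-off toward the top of the range, whereas a spectrum whose energy sits in the lowest bins yields a roll-off near 0 Hz.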

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input audio spectrum (must have more than one element)
`cutoff`	number	<optional>	0.85	the ratio of total energy to attain before yielding the roll-off frequency
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal (used to normalize rollOff) [Hz]

Returns

Details

SNR( frame [, MAAlpha [, MMSEAlpha [, NoiseAlpha [, frameSize [, noiseThreshold [, sampleRate [, useBroadbadNoiseCorrection ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the SNR of the input audio in a frame-wise manner. The algorithm assumes that: 1. The noise is Gaussian. 2. There is a region of noise (without signal) at the beginning of the stream in order to estimate the PSD of the noise [1]. Once the noise PSD is estimated, the algorithm relies on the Ephraim-Malah [2] recursion to estimate the SNR for each frequency bin. The algorithm also returns an overall (a single value for the whole spectrum) SNR estimation and an averaged overall SNR estimation using Exponential Moving Average filtering. This algorithm throws a warning if fewer than 15 frames are used to estimate the noise PSD. Check https://essentia.upf.edu/reference/std_SNR.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input audio frame
`MAAlpha`	number	<optional>	0.95	Alpha coefficient for the EMA SNR estimation [2]
`MMSEAlpha`	number	<optional>	0.98	Alpha coefficient for the MMSE estimation [1].
`NoiseAlpha`	number	<optional>	0.9	Alpha coefficient for the EMA noise estimation [2]
`frameSize`	number	<optional>	512	the size of the input frame
`noiseThreshold`	number	<optional>	-40	Threshold to detect frames without signal
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`useBroadbadNoiseCorrection`	boolean	<optional>	true	flag to apply the -10 * log10(BW) broadband noise correction factor

Returns

Details

SaturationDetector( frame [, differentialThreshold [, energyThreshold [, frameSize [, hopSize [, minimumDuration [, sampleRate ] ] ] ] ] ] ) → {object}

Description

This algorithm outputs the starting/ending locations of the saturated regions in seconds. Saturated regions are found by means of a triple criterion: 1. samples in a saturated region should have more energy than a given threshold. 2. the difference between the samples in a saturated region should be smaller than a given threshold. 3. the duration of the saturated region should be longer than a given threshold. Check https://essentia.upf.edu/reference/std_SaturationDetector.html for more details.
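The three criteria can be sketched as a single pass over the samples. This is an illustrative simplification rather than the Essentia implementation: the helper name `findSaturatedRegions` and its option names are hypothetical, it works on a whole signal instead of overlapping frames, and it assumes the energy threshold in dB converts to an amplitude as 10^(dB/20).

```javascript
// Hypothetical helper, not part of the essentia.js API.
function findSaturatedRegions(signal, {
  sampleRate = 44100,
  energyThresholdDb = -1,
  differentialThreshold = 0.001,
  minimumDurationMs = 0.005,
} = {}) {
  // Assumption: dB threshold interpreted as 20 * log10(|sample|).
  const amplitudeThreshold = Math.pow(10, energyThresholdDb / 20);
  const regions = [];
  let start = -1;
  for (let n = 1; n <= signal.length; n++) {
    const saturated = n < signal.length &&
      Math.abs(signal[n]) >= amplitudeThreshold &&                    // criterion 1
      Math.abs(signal[n] - signal[n - 1]) < differentialThreshold;    // criterion 2
    if (saturated && start < 0) start = n - 1; // first sample of a flat, loud pair
    if (!saturated && start >= 0) {
      const end = n - 1;
      const durationMs = ((end - start) / sampleRate) * 1000;
      if (durationMs >= minimumDurationMs) {                          // criterion 3
        regions.push({ start: start / sampleRate, end: end / sampleRate });
      }
      start = -1;
    }
  }
  return regions; // start/end locations in seconds
}
```

Raising `minimumDurationMs` in this sketch discards short flat runs, which is the role the `minimumDuration` parameter plays in the criteria above.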

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input audio frame
`differentialThreshold`	number	<optional>	0.001	minimum difference between contiguous samples of the saturated regions
`energyThreshold`	number	<optional>	-1	minimum energy of the samples in the saturated regions [dB]
`frameSize`	number	<optional>	512	expected input frame size
`hopSize`	number	<optional>	256	hop size used for the analysis
`minimumDuration`	number	<optional>	0.005	minimum duration of the saturated regions [ms]
`sampleRate`	number	<optional>	44100	sample rate used for the analysis

Returns

Details

Scale( signal [, clipping [, factor [, maxAbsValue ] ] ] ) → {object}

Description

This algorithm scales the audio by the specified factor using clipping if required. Check https://essentia.upf.edu/reference/std_Scale.html for more details.
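As a rough illustration of how the three parameters interact, the sketch below multiplies every sample by `factor` and, when `clipping` is enabled, limits the result to the range [-maxAbsValue, maxAbsValue]. The helper name `scaleSketch` is hypothetical and symmetric hard clipping is an assumption about the behavior.

```javascript
// Hypothetical helper, not part of the essentia.js API.
function scaleSketch(signal, { clipping = true, factor = 10, maxAbsValue = 1 } = {}) {
  return Array.from(signal, (sample) => {
    const scaled = sample * factor;
    if (!clipping) return scaled;
    // Assumption: symmetric hard clipping at +/- maxAbsValue.
    return Math.max(-maxAbsValue, Math.min(maxAbsValue, scaled));
  });
}
```

With the defaults, any input sample whose magnitude exceeds 0.1 ends up clipped, so the default `factor` of 10 is best paired with quiet input or with `clipping` in mind.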

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal
`clipping`	boolean	<optional>	true	boolean flag whether to apply clipping or not
`factor`	number	<optional>	10	the multiplication factor by which the audio will be scaled
`maxAbsValue`	number	<optional>	1	the maximum value above which to apply clipping

Returns

Details

SineSubtraction( frame, magnitudes, frequencies, phases [, fftSize [, hopSize [, sampleRate ] ] ] ) → {object}

Description

This algorithm subtracts the sinusoids computed with the sine model analysis from an input audio signal. It outputs an audio signal. Check https://essentia.upf.edu/reference/std_SineSubtraction.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input audio frame to subtract from
`magnitudes`	VectorFloat			the magnitudes of the sinusoidal peaks
`frequencies`	VectorFloat			the frequencies of the sinusoidal peaks [Hz]
`phases`	VectorFloat			the phases of the sinusoidal peaks
`fftSize`	number	<optional>	512	the size of the internal FFT (full spectrum size) and of the output frame. Must be at least twice the hop size.
`hopSize`	number	<optional>	128	the hop size between frames
`sampleRate`	number	<optional>	44100	the audio sampling rate [Hz]

Returns

Details

SingleBeatLoudness( beat [, beatDuration [, beatWindowDuration [, frequencyBands [, onsetStart [, sampleRate ] ] ] ] ] ) → {object}

Description

This algorithm computes the spectrum energy of a single beat across the whole frequency range and on each specified frequency band given an audio segment. It detects the onset of the beat within the input segment, computes spectrum on a window starting on this onset, and estimates energy (see Energy and EnergyBandRatio algorithms). The frequency bands used by default are: 0-200 Hz, 200-400 Hz, 400-800 Hz, 800-1600 Hz, 1600-3200 Hz, 3200-22000Hz, following E. Scheirer [1]. Check https://essentia.upf.edu/reference/std_SingleBeatLoudness.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`beat`	VectorFloat			audio segment containing a beat
`beatDuration`	number	<optional>	0.05	window size for the beat's energy computation (the window starts at the onset) [s]
`beatWindowDuration`	number	<optional>	0.1	window size for the beat's onset detection [s]
`frequencyBands`	Array.<any>	<optional>	[0, 200, 400, 800, 1600, 3200, 22000]	frequency bands
`onsetStart`	string	<optional>	sumEnergy	criteria for finding the start of the beat
`sampleRate`	number	<optional>	44100	the audio sampling rate [Hz]

Returns

Details

Slicer( audio [, endTimes [, sampleRate [, startTimes [, timeUnits ] ] ] ] ) → {object}

Description

This algorithm splits an audio signal into segments given their start and end times. Check https://essentia.upf.edu/reference/std_Slicer.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`audio`	VectorFloat			the input audio signal
`endTimes`	Array.<any>	<optional>	[]	the list of end times for the slices you want to extract
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`startTimes`	Array.<any>	<optional>	[]	the list of start times for the slices you want to extract
`timeUnits`	string	<optional>	seconds	the units of time of the start and end times

Returns

Details
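The idea behind Slicer can be sketched in a few lines: each start/end time pair is converted to sample indices and the corresponding segment is copied out. `sliceByTimes` is a hypothetical helper that assumes `timeUnits` is `seconds` and rounds times to the nearest sample; Essentia's own rounding and validation may differ.

```javascript
// Hypothetical helper illustrating time-based slicing (times in seconds).
function sliceByTimes(audio, startTimes, endTimes, sampleRate = 44100) {
  if (startTimes.length !== endTimes.length) {
    throw new Error('startTimes and endTimes must have the same length');
  }
  return startTimes.map((start, i) => {
    const from = Math.round(start * sampleRate);
    const to = Math.round(endTimes[i] * sampleRate);
    return audio.slice(from, to); // copy of samples [from, to)
  });
}

// Ten samples at a 10 Hz sample rate, so each sample lasts 0.1 s.
const ramp = Float32Array.from({ length: 10 }, (_, i) => i);
const slices = sliceByTimes(ramp, [0, 0.5], [0.3, 1.0], 10);
```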

SpectralCentroidTime( array [, sampleRate ] ) → {object}

Description

This algorithm computes the spectral centroid of a signal in time domain. A first difference filter is applied to the input signal. Then the centroid is computed by dividing the norm of the resulting signal by the norm of the input signal. The centroid is given in hertz. References: [1] Udo Zölzer (2002). DAFX Digital Audio Effects, pp. 364-365. Check https://essentia.upf.edu/reference/std_SpectralCentroidTime.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array
`sampleRate`	number	<optional>	44100	sampling rate of the input spectrum [Hz]

Returns

Details
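The time-domain centroid described for SpectralCentroidTime can be sketched as follows: apply a first difference, take the ratio of the two norms, and scale by sampleRate / (2π) to obtain hertz. The scaling factor is an assumption that follows from the small-angle behaviour of a first difference on a sinusoid; `spectralCentroidTime` here is a hypothetical plain-JavaScript function, not the Essentia binding.

```javascript
// Hypothetical helper: centroid from the first-difference / signal norm ratio.
function spectralCentroidTime(signal, sampleRate = 44100) {
  let diffPower = 0;
  let signalPower = 0;
  for (let i = 1; i < signal.length; i++) {
    const d = signal[i] - signal[i - 1]; // first difference filter
    diffPower += d * d;
    signalPower += signal[i] * signal[i];
  }
  if (signalPower === 0) return 0; // silent input: no meaningful centroid
  // Assumed scaling from radians per sample to hertz.
  return (Math.sqrt(diffPower / signalPower) * sampleRate) / (2 * Math.PI);
}

// A pure 1 kHz tone should yield a centroid close to 1000 Hz.
const sampleRate = 44100;
const tone = Float64Array.from(
  { length: 4410 },
  (_, n) => Math.sin((2 * Math.PI * 1000 * n) / sampleRate)
);
const centroidHz = spectralCentroidTime(tone, sampleRate);
```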

SpectralComplexity( spectrum [, magnitudeThreshold [, sampleRate ] ] ) → {object}

Description

This algorithm computes the spectral complexity of a spectrum. The spectral complexity is based on the number of peaks in the input spectrum. Check https://essentia.upf.edu/reference/std_SpectralComplexity.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input spectrum
`magnitudeThreshold`	number	<optional>	0.005	the minimum spectral-peak magnitude that contributes to spectral complexity
`sampleRate`	number	<optional>	44100	the audio sampling rate [Hz]

Returns

Details

SpectralContrast( spectrum [, frameSize [, highFrequencyBound [, lowFrequencyBound [, neighbourRatio [, numberBands [, sampleRate [, staticDistribution ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the Spectral Contrast feature of a spectrum. It is based on the Octave Based Spectral Contrast feature as described in [1]. The version implemented here is a modified version to improve discriminative power and robustness. The modifications are described in [2]. Check https://essentia.upf.edu/reference/std_SpectralContrast.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the audio spectrum
`frameSize`	number	<optional>	2048	the size of the fft frames
`highFrequencyBound`	number	<optional>	11000	the upper bound of the highest band
`lowFrequencyBound`	number	<optional>	20	the lower bound of the lowest band
`neighbourRatio`	number	<optional>	0.4	the ratio of the bins in the sub band used to calculate the peak and valley
`numberBands`	number	<optional>	6	the number of bands in the filter
`sampleRate`	number	<optional>	22050	the sampling rate of the audio signal
`staticDistribution`	number	<optional>	0.15	the ratio of the bins to distribute equally

Returns

Details

SpectralPeaks( spectrum [, magnitudeThreshold [, maxFrequency [, maxPeaks [, minFrequency [, orderBy [, sampleRate ] ] ] ] ] ] ) → {object}

Description

This algorithm extracts peaks from a spectrum. It is important to note that the peak algorithm is independent of an input that is linear or in dB, so one has to adapt the threshold to fit with the type of data fed to it. The algorithm relies on PeakDetection algorithm which is run with parabolic interpolation [1]. The exactness of the peak-searching depends heavily on the windowing type. It gives best results with dB input, a blackman-harris 92dB window and interpolation set to true. According to [1], spectral peak frequencies tend to be about twice as accurate when dB magnitude is used rather than just linear magnitude. For further information about the peak detection, see the description of the PeakDetection algorithm. Check https://essentia.upf.edu/reference/std_SpectralPeaks.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input spectrum
`magnitudeThreshold`	number	<optional>	0	peaks below this given threshold are not outputted
`maxFrequency`	number	<optional>	5000	the maximum frequency of the range to evaluate [Hz]
`maxPeaks`	number	<optional>	100	the maximum number of returned peaks
`minFrequency`	number	<optional>	0	the minimum frequency of the range to evaluate [Hz]
`orderBy`	string	<optional>	frequency	the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details
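The peak picking with parabolic interpolation mentioned for SpectralPeaks can be illustrated with a small sketch. `pickSpectralPeaks` is a hypothetical helper: it assumes the spectrum spans 0 Hz to sampleRate / 2, treats any bin that exceeds its left neighbour and is not below its right neighbour as a peak, and refines the position with a three-point parabola. Essentia's PeakDetection has more options and may select or truncate peaks differently.

```javascript
// Hypothetical helper: local maxima refined by three-point parabolic interpolation.
function pickSpectralPeaks(spectrum, {
  magnitudeThreshold = 0,
  maxFrequency = 5000,
  maxPeaks = 100,
  minFrequency = 0,
  orderBy = 'frequency',
  sampleRate = 44100,
} = {}) {
  // Assumption: bin k sits at k * (sampleRate / 2) / (spectrum.length - 1) Hz.
  const binHz = sampleRate / 2 / (spectrum.length - 1);
  const peaks = [];
  for (let k = 1; k < spectrum.length - 1; k++) {
    const a = spectrum[k - 1];
    const b = spectrum[k];
    const c = spectrum[k + 1];
    if (b <= a || b < c || b <= magnitudeThreshold) continue;
    // Vertex of the parabola through (k-1, a), (k, b), (k+1, c).
    const offset = (0.5 * (a - c)) / (a - 2 * b + c);
    const frequency = (k + offset) * binHz;
    if (frequency < minFrequency || frequency > maxFrequency) continue;
    peaks.push({ frequency, magnitude: b - 0.25 * (a - c) * offset });
  }
  if (orderBy === 'magnitude') peaks.sort((p, q) => q.magnitude - p.magnitude);
  return peaks.slice(0, maxPeaks);
}

// A 1025-bin spectrum with one symmetric peak centred on bin 100.
const spectrum = new Float32Array(1025);
spectrum[99] = 0.5;
spectrum[100] = 1;
spectrum[101] = 0.5;
const peaks = pickSpectralPeaks(spectrum);
```

Because the toy peak is symmetric, the interpolated offset is zero and the reported frequency is exactly that of bin 100.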

SpectralWhitening( spectrum, frequencies, magnitudes [, maxFrequency [, sampleRate ] ] ) → {object}

Description

Performs spectral whitening of spectral peaks of a spectrum. The algorithm works in dB scale, but the conversion is done by the algorithm so input should be in linear scale. The concept of 'whitening' refers to 'white noise' or a non-zero flat spectrum. It first computes a spectral envelope similar to the 'true envelope' in [1], and then modifies the amplitude of each peak relative to the envelope. For example, the predominant peaks will have a value close to 0dB because they are very close to the envelope. On the other hand, minor peaks between significant peaks will have lower amplitudes such as -30dB. Check https://essentia.upf.edu/reference/std_SpectralWhitening.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the audio linear spectrum
`frequencies`	VectorFloat			the spectral peaks' linear frequencies
`magnitudes`	VectorFloat			the spectral peaks' linear magnitudes
`maxFrequency`	number	<optional>	5000	max frequency to apply whitening to [Hz]
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

Spectrum( frame [, size ] ) → {object}

Description

This algorithm computes the magnitude spectrum of an array of Reals. The resulting magnitude spectrum has a size which is half the size of the input array plus one. Bins contain raw (linear) magnitude values. Check https://essentia.upf.edu/reference/std_Spectrum.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input audio frame
`size`	number	<optional>	2048	the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)

Returns

Details
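To make the output size of Spectrum concrete, the sketch below computes a magnitude spectrum with a naive DFT and returns N / 2 + 1 linear magnitude bins for an N-sample frame. `magnitudeSpectrum` is a hypothetical helper for illustration only; it is far slower than an FFT and is not how Essentia computes the result.

```javascript
// Hypothetical helper: naive DFT magnitude spectrum (O(N^2), illustration only).
function magnitudeSpectrum(frame) {
  const n = frame.length;
  const bins = Math.floor(n / 2) + 1; // half the input size plus one
  const out = new Float32Array(bins);
  for (let k = 0; k < bins; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const phase = (-2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(phase);
      im += frame[t] * Math.sin(phase);
    }
    out[k] = Math.hypot(re, im); // raw (linear) magnitude
  }
  return out;
}

// A constant frame has all of its energy in the DC bin.
const dcFrame = new Float32Array(8).fill(1);
const mags = magnitudeSpectrum(dcFrame);
```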

SpectrumCQ( frame [, binsPerOctave [, minFrequency [, minimumKernelSize [, numberBins [, sampleRate [, scale [, threshold [, windowType [, zeroPhase ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the magnitude of the Constant-Q spectrum. See ConstantQ algorithm for more details. Check https://essentia.upf.edu/reference/std_SpectrumCQ.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input audio frame
`binsPerOctave`	number	<optional>	12	number of bins per octave
`minFrequency`	number	<optional>	32.7	minimum frequency [Hz]
`minimumKernelSize`	number	<optional>	4	minimum size allowed for frequency kernels
`numberBins`	number	<optional>	84	number of frequency bins, starting at minFrequency
`sampleRate`	number	<optional>	44100	FFT sampling rate [Hz]
`scale`	number	<optional>	1	filters scale. Larger values use longer windows
`threshold`	number	<optional>	0.01	bins whose magnitude is below this quantile are discarded
`windowType`	string	<optional>	hann	the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
`zeroPhase`	boolean	<optional>	true	a boolean value that enables zero-phase windowing. Input audio frames should be windowed with the same phase mode

Returns

Details

SpectrumToCent( spectrum [, bands [, centBinResolution [, inputSize [, log [, minimumFrequency [, normalize [, sampleRate [, type ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes energy in triangular frequency bands of a spectrum equally spaced on the cent scale. Each band is computed to have a constant wideness in the cent scale. For each band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_SpectrumToCent.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input spectrum (must be greater than size one)
`bands`	number	<optional>	720	number of bins to compute. Default is 720 (6 octaves with the default 'centBinResolution')
`centBinResolution`	number	<optional>	10	Width of each band in cents. Default is 10 cents
`inputSize`	number	<optional>	32768	the size of the spectrum
`log`	boolean	<optional>	true	compute log-energies (log10 (1 + energy))
`minimumFrequency`	number	<optional>	164	central frequency of the first band of the bank [Hz]
`normalize`	string	<optional>	unit_sum	use unit area or vertex equal to 1 triangles.
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`type`	string	<optional>	power	use magnitude or power spectrum

Returns

Details
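The relationship between `bands`, `centBinResolution`, and `minimumFrequency` in SpectrumToCent can be checked with a little arithmetic: 1200 cents make one octave, so 720 bands of 10 cents span 7200 cents, or 6 octaves. The sketch below lists assumed band centre frequencies, taking band k to sit k × centBinResolution cents above `minimumFrequency`; `centBandFrequencies` is a hypothetical helper, and the exact band edges used by Essentia may differ.

```javascript
// Hypothetical helper: centre frequencies of bands equally spaced in cents.
function centBandFrequencies({
  bands = 720,
  centBinResolution = 10,
  minimumFrequency = 164,
} = {}) {
  const freqs = new Float64Array(bands);
  for (let k = 0; k < bands; k++) {
    // 1200 cents correspond to a factor of 2 in frequency.
    freqs[k] = minimumFrequency * Math.pow(2, (k * centBinResolution) / 1200);
  }
  return freqs;
}

const centres = centBandFrequencies();
const octaves = (centres.length * 10) / 1200; // total span in octaves
```

With the defaults, band 120 lies exactly one octave above the first band, at 328 Hz.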

Spline( x [, beta1 [, beta2 [, type [, xPoints [, yPoints ] ] ] ] ] ) → {object}

Description

Evaluates a piecewise spline of type b, beta or quadratic. The input value, i.e. the point at which the spline is to be evaluated, typically should be between xPoints[0] and xPoints[size-1]. If the value lies outside this range, extrapolation is used. Regarding spline types: - B: evaluates a cubic B spline approximant. - Beta: evaluates a cubic beta spline approximant. For beta splines parameters 'beta1' and 'beta2' can be supplied. For no bias set beta1 to 1 and for no tension set beta2 to 0. Note that if beta1=1 and beta2=0, the cubic beta becomes a cubic B spline. On the other hand if beta1=1 and beta2 is large the beta spline turns into a linear spline. - Quadratic: evaluates a piecewise quadratic spline at a point. Note that size of input must be odd. Check https://essentia.upf.edu/reference/std_Spline.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`x`	number			the input coordinate (x-axis)
`beta1`	number	<optional>	1	the skew or bias parameter (only available for type beta)
`beta2`	number	<optional>	0	the tension parameter
`type`	string	<optional>	b	the type of spline to be computed
`xPoints`	Array.<any>	<optional>	[0, 1]	the x-coordinates where data is specified (the points must be arranged in ascending order and cannot contain duplicates)
`yPoints`	Array.<any>	<optional>	[0, 1]	the y-coordinates to be interpolated (i.e. the known data)

Returns

Details

SprModelAnal( frame [, fftSize [, freqDevOffset [, freqDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, orderBy [, sampleRate ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the sinusoidal plus residual model analysis. Check https://essentia.upf.edu/reference/std_SprModelAnal.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input frame
`fftSize`	number	<optional>	2048	the size of the internal FFT (full spectrum size)
`freqDevOffset`	number	<optional>	20	minimum frequency deviation at 0Hz
`freqDevSlope`	number	<optional>	0.01	slope increase of minimum frequency deviation
`hopSize`	number	<optional>	512	the hop size between frames
`magnitudeThreshold`	number	<optional>	0	peaks below this given threshold are not outputted
`maxFrequency`	number	<optional>	5000	the maximum frequency of the range to evaluate [Hz]
`maxPeaks`	number	<optional>	100	the maximum number of returned peaks
`maxnSines`	number	<optional>	100	maximum number of sines per frame
`minFrequency`	number	<optional>	0	the minimum frequency of the range to evaluate [Hz]
`orderBy`	string	<optional>	frequency	the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

SprModelSynth( magnitudes, frequencies, phases, res [, fftSize [, hopSize [, sampleRate ] ] ] ) → {object}

Description

This algorithm computes the sinusoidal plus residual model synthesis from SPS model analysis. Check https://essentia.upf.edu/reference/std_SprModelSynth.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`magnitudes`	VectorFloat			the magnitudes of the sinusoidal peaks
`frequencies`	VectorFloat			the frequencies of the sinusoidal peaks [Hz]
`phases`	VectorFloat			the phases of the sinusoidal peaks
`res`	VectorFloat			the residual frame
`fftSize`	number	<optional>	2048	the size of the output FFT frame (full spectrum size)
`hopSize`	number	<optional>	512	the hop size between frames
`sampleRate`	number	<optional>	44100	the audio sampling rate [Hz]

Returns

Details

SpsModelAnal( frame [, fftSize [, freqDevOffset [, freqDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, orderBy [, sampleRate [, stocf ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes the stochastic model analysis. Check https://essentia.upf.edu/reference/std_SpsModelAnal.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input frame
`fftSize`	number	<optional>	2048	the size of the internal FFT (full spectrum size)
`freqDevOffset`	number	<optional>	20	minimum frequency deviation at 0Hz
`freqDevSlope`	number	<optional>	0.01	slope increase of minimum frequency deviation
`hopSize`	number	<optional>	512	the hop size between frames
`magnitudeThreshold`	number	<optional>	0	peaks below this given threshold are not outputted
`maxFrequency`	number	<optional>	5000	the maximum frequency of the range to evaluate [Hz]
`maxPeaks`	number	<optional>	100	the maximum number of returned peaks
`maxnSines`	number	<optional>	100	maximum number of sines per frame
`minFrequency`	number	<optional>	0	the minimum frequency of the range to evaluate [Hz]
`orderBy`	string	<optional>	frequency	the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`stocf`	number	<optional>	0.2	decimation factor used for the stochastic approximation

Returns

Details

SpsModelSynth( magnitudes, frequencies, phases, stocenv [, fftSize [, hopSize [, sampleRate [, stocf ] ] ] ] ) → {object}

Description

This algorithm computes the sinusoidal plus stochastic model synthesis from SPS model analysis. Check https://essentia.upf.edu/reference/std_SpsModelSynth.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`magnitudes`	VectorFloat			the magnitudes of the sinusoidal peaks
`frequencies`	VectorFloat			the frequencies of the sinusoidal peaks [Hz]
`phases`	VectorFloat			the phases of the sinusoidal peaks
`stocenv`	VectorFloat			the stochastic envelope
`fftSize`	number	<optional>	2048	the size of the output FFT frame (full spectrum size)
`hopSize`	number	<optional>	512	the hop size between frames
`sampleRate`	number	<optional>	44100	the audio sampling rate [Hz]
`stocf`	number	<optional>	0.2	decimation factor used for the stochastic approximation

Returns

Details

StartStopCut( audio [, frameSize [, hopSize [, maximumStartTime [, maximumStopTime [, sampleRate [, threshold ] ] ] ] ] ] ) → {object}

Description

This algorithm outputs whether there is a cut at the beginning or at the end of the audio by locating the first and last non-silent frames and comparing their positions to the actual beginning and end of the audio. The input audio is considered to be cut at the beginning (or the end) and the corresponding flag is activated if the first (last) non-silent frame occurs before (after) the configurable time threshold. Check https://essentia.upf.edu/reference/std_StartStopCut.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`audio`	VectorFloat			the input audio
`frameSize`	number	<optional>	256	the frame size for the internal power analysis
`hopSize`	number	<optional>	256	the hop size for the internal power analysis
`maximumStartTime`	number	<optional>	10	if the first non-silent frame occurs before maximumStartTime startCut is activated [ms]
`maximumStopTime`	number	<optional>	10	if the last non-silent frame occurs after maximumStopTime to the end stopCut is activated [ms]
`sampleRate`	number	<optional>	44100	the sample rate
`threshold`	number	<optional>	-60	the threshold below which average energy is defined as silence [dB]

Returns

Details
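The decision logic described for StartStopCut can be sketched as: find the first and last frames whose average power exceeds the silence threshold, then compare their distance from the start and end of the audio with the two time limits. `detectStartStopCut` is a hypothetical helper; treating the threshold as a power ratio in dB (10^(dB/10)) and returning `false` for both flags on fully silent input are assumptions.

```javascript
// Hypothetical helper following the documented parameters and defaults.
function detectStartStopCut(audio, {
  frameSize = 256,
  hopSize = 256,
  maximumStartTime = 10, // ms
  maximumStopTime = 10,  // ms
  sampleRate = 44100,
  threshold = -60,       // dB
} = {}) {
  const powerThreshold = Math.pow(10, threshold / 10); // assumed dB-to-power mapping
  let firstSample = -1;
  let lastSample = -1;
  for (let from = 0; from + frameSize <= audio.length; from += hopSize) {
    let sum = 0;
    for (let i = from; i < from + frameSize; i++) sum += audio[i] * audio[i];
    if (sum / frameSize >= powerThreshold) {
      if (firstSample < 0) firstSample = from;
      lastSample = from + frameSize;
    }
  }
  if (firstSample < 0) return { startCut: false, stopCut: false }; // assumption
  const startMs = (firstSample / sampleRate) * 1000;
  const tailMs = ((audio.length - lastSample) / sampleRate) * 1000;
  return { startCut: startMs < maximumStartTime, stopCut: tailMs < maximumStopTime };
}

// One second of audio: loud from the very first sample, silent second half.
const audio = new Float32Array(44100);
audio.fill(1, 0, 22050);
const cut = detectStartStopCut(audio);
```

Here the sound starts immediately (so `startCut` is raised) but is followed by roughly half a second of silence (so `stopCut` is not).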

StartStopSilence( frame [, threshold ] ) → {object}

Description

This algorithm outputs the frame at which sound begins and the frame at which sound ends. Check https://essentia.upf.edu/reference/std_StartStopSilence.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input audio frames
`threshold`	number	<optional>	-60	the threshold below which average energy is defined as silence [dB]

Returns

Details
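A batch version of the StartStopSilence idea is sketched below: given a list of frames, it reports the index of the first and last frame whose average power is above the threshold. `findStartStopFrames` is a hypothetical helper. Note that the Essentia algorithm receives one frame per call and keeps state between calls, which this sketch replaces with a simple loop; the dB-to-power conversion is also an assumption.

```javascript
// Hypothetical helper: first and last non-silent frame indices.
function findStartStopFrames(frames, threshold = -60) {
  const powerThreshold = Math.pow(10, threshold / 10); // assumed dB-to-power mapping
  let startFrame = 0;
  let stopFrame = 0;
  let found = false;
  frames.forEach((frame, index) => {
    let sum = 0;
    for (const x of frame) sum += x * x;
    const power = frame.length > 0 ? sum / frame.length : 0;
    if (power > powerThreshold) {
      if (!found) {
        startFrame = index;
        found = true;
      }
      stopFrame = index;
    }
  });
  return { startFrame, stopFrame };
}

const quiet = new Float32Array(4);
const loud = new Float32Array(4).fill(0.5);
const bounds = findStartStopFrames([quiet, quiet, loud, loud, quiet]);
```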

StochasticModelAnal( frame [, fftSize [, hopSize [, sampleRate [, stocf ] ] ] ] ) → {object}

Description

This algorithm computes the stochastic model analysis. It gets the resampled spectral envelope of the stochastic component. Check https://essentia.upf.edu/reference/std_StochasticModelAnal.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input frame
`fftSize`	number	<optional>	2048	the size of the internal FFT (full spectrum size)
`hopSize`	number	<optional>	512	the hop size between frames
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`stocf`	number	<optional>	0.2	decimation factor used for the stochastic approximation

Returns

Details

StochasticModelSynth( stocenv [, fftSize [, hopSize [, sampleRate [, stocf ] ] ] ] ) → {object}

Description

This algorithm computes the stochastic model synthesis. It generates the noisy spectrum from a resampled spectral envelope of the stochastic component. Check https://essentia.upf.edu/reference/std_StochasticModelSynth.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`stocenv`	VectorFloat			the stochastic envelope input
`fftSize`	number	<optional>	2048	the size of the internal FFT (full spectrum size)
`hopSize`	number	<optional>	512	the hop size between frames
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`stocf`	number	<optional>	0.2	decimation factor used for the stochastic approximation

Returns

Details

StrongDecay( signal [, sampleRate ] ) → {object}

Description

This algorithm computes the Strong Decay of an audio signal. The Strong Decay is built from the non-linear combination of the signal energy and the signal temporal centroid, the latter being the balance of the absolute value of the signal. A signal containing a temporal centroid near its start boundary and a strong energy is said to have a strong decay. Check https://essentia.upf.edu/reference/std_StrongDecay.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details

StrongPeak( spectrum ) → {object}

Description

This algorithm computes the Strong Peak of a spectrum. The Strong Peak is defined as the ratio between the spectrum's maximum peak's magnitude and the "bandwidth" of the peak above a threshold (half its amplitude). This ratio reveals whether the spectrum presents a very "pronounced" maximum peak (i.e. the thinner and the higher the maximum of the spectrum is, the higher the ratio value). Check https://essentia.upf.edu/reference/std_StrongPeak.html for more details.

Parameters

Name	Type	Description
`spectrum`	VectorFloat	the input spectrum (must be greater than one element and cannot contain negative values)

Returns

Details

SuperFluxExtractor( signal [, combine [, frameSize [, hopSize [, ratioThreshold [, sampleRate [, threshold ] ] ] ] ] ] ) → {object}

Description

This algorithm detects onsets given an audio signal using the SuperFlux algorithm. This implementation is based on the available reference implementation in python [2]. The algorithm computes the spectrum of the input signal, summarizes it into triangular band energies, and computes an onset detection function based on spectral flux tracking spectral trajectories with a maximum filter (SuperFluxNovelty). The peaks of the function are then detected (SuperFluxPeaks). Check https://essentia.upf.edu/reference/std_SuperFluxExtractor.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the audio input signal
`combine`	number	<optional>	20	time threshold for double onsets detections (ms)
`frameSize`	number	<optional>	2048	the frame size for computing low-level features
`hopSize`	number	<optional>	256	the hop size for computing low-level features
`ratioThreshold`	number	<optional>	16	ratio threshold for peak picking with respect to novelty_signal/novelty_average rate, use 0 to disable it (for low-energy onsets)
`sampleRate`	number	<optional>	44100	the audio sampling rate [Hz]
`threshold`	number	<optional>	0.05	threshold for peak picking with respect to the difference between novelty_signal and average_signal (for onsets in ambient noise)

Returns

Details

SuperFluxNovelty( bands [, binWidth [, frameWidth ] ] ) → {object}

Description

Onset detection function for Superflux algorithm. See SuperFluxExtractor for more details. Check https://essentia.upf.edu/reference/std_SuperFluxNovelty.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`bands`	VectorVectorFloat			the input bands spectrogram
`binWidth`	number	<optional>	3	filter width (number of frequency bins)
`frameWidth`	number	<optional>	2	differentiation offset (compute the difference with the N-th previous frame)

Returns

Details
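The roles of `binWidth` and `frameWidth` in SuperFluxNovelty can be illustrated with a simplified spectral-flux sketch: each band of the current frame is compared with the maximum over `binWidth` neighbouring bands of the frame `frameWidth` steps earlier, and only positive differences are summed. `superFluxNoveltyCurve` is a hypothetical helper that returns one value per frame; the output shape and edge handling of the Essentia algorithm may differ.

```javascript
// Hypothetical helper: spectral flux against a max-filtered earlier frame.
function superFluxNoveltyCurve(bands, binWidth = 3, frameWidth = 2) {
  const half = Math.floor(binWidth / 2);
  const novelty = [];
  for (let t = frameWidth; t < bands.length; t++) {
    const previous = bands[t - frameWidth];
    let sum = 0;
    for (let b = 0; b < bands[t].length; b++) {
      // Maximum filter along the frequency axis of the earlier frame.
      let localMax = -Infinity;
      const lo = Math.max(0, b - half);
      const hi = Math.min(previous.length - 1, b + half);
      for (let j = lo; j <= hi; j++) localMax = Math.max(localMax, previous[j]);
      const diff = bands[t][b] - localMax;
      if (diff > 0) sum += diff; // keep energy increases only
    }
    novelty.push(sum);
  }
  return novelty;
}

// Energy appears in band 0 at frame 2 and then stays constant.
const bandFrames = [
  [0, 0, 0],
  [0, 0, 0],
  [1, 0, 0],
  [1, 0, 0],
  [1, 0, 0],
];
const novelty = superFluxNoveltyCurve(bandFrames);
```

The novelty is non-zero only while the reference frame (two steps back) still predates the onset.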

SuperFluxPeaks( novelty [, combine [, frameRate [, pre_avg [, pre_max [, ratioThreshold [, threshold ] ] ] ] ] ] ) → {object}

Description

This algorithm detects peaks of an onset detection function computed by the SuperFluxNovelty algorithm. See SuperFluxExtractor for more details. Check https://essentia.upf.edu/reference/std_SuperFluxPeaks.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`novelty`	VectorFloat			the input onset detection function
`combine`	number	<optional>	30	time threshold for double onsets detections (ms)
`frameRate`	number	<optional>	172	frameRate
`pre_avg`	number	<optional>	100	look back duration for moving average filter [ms]
`pre_max`	number	<optional>	30	look back duration for moving maximum filter [ms]
`ratioThreshold`	number	<optional>	16	ratio threshold for peak picking with respect to novelty_signal/novelty_average rate, use 0 to disable it (for low-energy onsets)
`threshold`	number	<optional>	0.05	threshold for peak picking with respect to the difference between novelty_signal and average_signal (for onsets in ambient noise)

Returns

Details

TCToTotal( envelope ) → {object}

Description

This algorithm calculates the ratio of the temporal centroid to the total length of a signal envelope. This ratio shows how the sound is 'balanced'. Its value is close to 0 if most of the energy lies at the beginning of the sound (e.g. decrescendo or impulsive sounds), close to 0.5 if the sound is symmetric (e.g. 'delta unvarying' sounds), and close to 1 if most of the energy lies at the end of the sound (e.g. crescendo sounds). Check https://essentia.upf.edu/reference/std_TCToTotal.html for more details.

Parameters

Name	Type	Description
`envelope`	VectorFloat	the envelope of the signal (its length must be greater than 1)

Returns

Details
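The ratio computed by TCToTotal can be sketched directly from its description: the temporal centroid is the envelope-weighted mean sample index, which is then divided by the envelope length. `temporalCentroidRatio` is a hypothetical helper; normalising by `length - 1` (so that a symmetric envelope gives exactly 0.5) is an assumption about the convention.

```javascript
// Hypothetical helper: temporal centroid divided by total envelope length.
function temporalCentroidRatio(envelope) {
  if (envelope.length < 2) {
    throw new Error('envelope length must be greater than 1');
  }
  let weighted = 0;
  let total = 0;
  for (let i = 0; i < envelope.length; i++) {
    weighted += i * envelope[i];
    total += envelope[i];
  }
  if (total === 0) throw new Error('envelope must not be all zeros');
  // Assumption: normalise by the last index so the result lies in [0, 1].
  return weighted / total / (envelope.length - 1);
}

const flatRatio = temporalCentroidRatio([1, 1, 1, 1, 1]);        // symmetric
const decayRatio = temporalCentroidRatio([1, 0.5, 0.25, 0, 0]);  // decrescendo
const swellRatio = temporalCentroidRatio([0, 0, 0.25, 0.5, 1]);  // crescendo
```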

TempoScaleBands( bands [, bandsGain [, frameTime ] ] ) → {object}

Description

This algorithm computes features for tempo tracking to be used with the TempoTap algorithm. See standard_rhythmextractor_tempotap in examples folder. Check https://essentia.upf.edu/reference/std_TempoScaleBands.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`bands`	VectorFloat			the audio power spectrum divided into bands
`bandsGain`	Array.<any>	<optional>	[2, 3, 2, 1, 1.20000004768, 2, 3, 2.5]	gain for each band
`frameTime`	number	<optional>	512	the frame rate in samples

Returns

Details

TempoTap( featuresFrame [, frameHop [, frameSize [, maxTempo [, minTempo [, numberFrames [, sampleRate [, tempoHints ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the periods and phases of a periodic signal, represented by a sequence of values of any number of detection functions, such as energy bands, onset locations, etc. It must be run sequentially on a vector of such values ("featuresFrame") for each particular audio frame in order to get estimations related to that frame. The estimations are done for each detection function separately, utilizing the latest "frameHop" frames, including the present one, to compute autocorrelation. Empty estimations will be returned until enough frames are accumulated in the algorithm's buffer. The algorithm uses elements of the following beat-tracking methods: - BeatIt, elaborated by Fabien Gouyon and Simon Dixon (input features) [1] - Multi-comb filter with Rayleigh weighting, Mathew Davies [2]. Check https://essentia.upf.edu/reference/std_TempoTap.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`featuresFrame`	VectorFloat			input temporal features of a frame
`frameHop`	number	<optional>	1024	number of feature frames separating two evaluations
`frameSize`	number	<optional>	256	number of audio samples in a frame
`maxTempo`	number	<optional>	208	fastest tempo allowed to be detected [bpm]
`minTempo`	number	<optional>	40	slowest tempo allowed to be detected [bpm]
`numberFrames`	number	<optional>	1024	number of feature frames to buffer on
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`tempoHints`	Array.<any>	<optional>	[]	optional list of initial beat locations, to favor the detection of pre-determined tempo period and beats alignment [s]

Returns

Details

TempoTapDegara( onsetDetections [, maxTempo [, minTempo [, resample [, sampleRateODF ] ] ] ] ) → {object}

Description

This algorithm estimates beat positions given an onset detection function. The detection function is partitioned into 6-second frames with a 1.5-second increment, and the autocorrelation is computed for each frame, and is weighted by a tempo preference curve [2]. Periodicity estimations are done frame-wise, searching for the best match with the Viterbi algorithm [3]. The estimated periods are then passed to the probabilistic beat tracking algorithm [1], which computes beat positions. Check https://essentia.upf.edu/reference/std_TempoTapDegara.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`onsetDetections`	VectorFloat			the input frame-wise vector of onset detection values
`maxTempo`	number	<optional>	208	fastest tempo allowed to be detected [bpm]
`minTempo`	number	<optional>	40	slowest tempo allowed to be detected [bpm]
`resample`	string	<optional>	none	use upsampling of the onset detection function (may increase accuracy)
`sampleRateODF`	number	<optional>	86.1328	the sampling rate of the onset detection function [Hz]

Returns

Details
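The framing described for TempoTapDegara can be worked out numerically. The default `sampleRateODF` of 86.1328 Hz matches 44100 / 512, i.e. one detection value per 512-sample hop at 44.1 kHz, so a 6-second frame holds about 517 values and a 1.5-second increment about 129. `odfFrameLayout` is a hypothetical helper; rounding to the nearest integer and dropping the final partial frame are assumptions.

```javascript
// Hypothetical helper: where the 6 s analysis frames start within the ODF.
function odfFrameLayout(odfLength, sampleRateODF = 44100 / 512,
                        frameSeconds = 6, hopSeconds = 1.5) {
  const frameSize = Math.round(frameSeconds * sampleRateODF);
  const hopSize = Math.round(hopSeconds * sampleRateODF);
  const starts = [];
  // Only complete frames are listed (assumption).
  for (let s = 0; s + frameSize <= odfLength; s += hopSize) starts.push(s);
  return { frameSize, hopSize, starts };
}

// An onset detection function with 1000 values (about 11.6 s of audio).
const layout = odfFrameLayout(1000);
```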

TempoTapMaxAgreement( tickCandidates ) → {object}

Description

This algorithm outputs beat positions and confidence of their estimation based on the maximum mutual agreement between beat candidates estimated by different beat trackers (or using different features). Check https://essentia.upf.edu/reference/std_TempoTapMaxAgreement.html for more details.

Parameters

Name	Type	Description
`tickCandidates`	VectorVectorFloat	the tick candidates estimated using different beat trackers (or features) [s]

Returns

Details

TempoTapTicks( periods, phases [, frameHop [, hopSize [, sampleRate ] ] ] ) → {object}

Description

This algorithm builds the list of ticks from the period and phase candidates given by the TempoTap algorithm. Check https://essentia.upf.edu/reference/std_TempoTapTicks.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`periods`	VectorFloat			tempo period candidates for the current frame, in frames
`phases`	VectorFloat			tempo ticks phase candidates for the current frame, in frames
`frameHop`	number	<optional>	512	number of feature frames separating two evaluations
`hopSize`	number	<optional>	256	number of audio samples per features
`sampleRate`	number	<optional>	44100	sampling rate of the audio signal [Hz]

Returns

Details

TensorflowInputMusiCNN( frame ) → {object}

Description

This algorithm computes mel-bands with a particular parametrization specific to MusiCNN based models. Check https://essentia.upf.edu/reference/std_TensorflowInputMusiCNN.html for more details.

Parameters

Name	Type	Description
`frame`	VectorFloat	the audio frame

Returns

Details

TensorflowInputVGGish( frame ) → {object}

Description

This algorithm computes mel-bands with a particular parametrization specific to VGGish based models. Check https://essentia.upf.edu/reference/std_TensorflowInputVGGish.html for more details.

Parameters

Name	Type	Description
`frame`	VectorFloat	the audio frame

Returns

Details

TonalExtractor( signal [, frameSize [, hopSize [, tuningFrequency ] ] ] ) → {object}

Description

This algorithm computes tonal features for an audio signal. Check https://essentia.upf.edu/reference/std_TonalExtractor.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the audio input signal
`frameSize`	number	<optional>	4096	the framesize for computing tonal features
`hopSize`	number	<optional>	2048	the hopsize for computing tonal features
`tuningFrequency`	number	<optional>	440	the tuning frequency of the input signal

Returns

Details

TonicIndianArtMusic( signal [, binResolution [, frameSize [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxTonicFrequency [, minTonicFrequency [, numberHarmonics [, numberSaliencePeaks [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the tonic frequency of the lead artist in Indian art music. It uses a multipitch representation of the audio signal (pitch salience) to compute a histogram, in which the tonic is identified as one of the peaks. The decision is based on the distance between the prominent peaks, and the classification is done using a decision tree. Check https://essentia.upf.edu/reference/std_TonicIndianArtMusic.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`binResolution`	number	<optional>	10	salience function bin resolution [cents]
`frameSize`	number	<optional>	2048	the frame size for computing pitch salience
`harmonicWeight`	number	<optional>	0.85	harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
`hopSize`	number	<optional>	512	the hop size with which the pitch salience function was computed
`magnitudeCompression`	number	<optional>	1	magnitude compression parameter (=0 for maximum compression, =1 for no compression)
`magnitudeThreshold`	number	<optional>	40	peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
`maxTonicFrequency`	number	<optional>	375	the maximum allowed tonic frequency [Hz]
`minTonicFrequency`	number	<optional>	100	the minimum allowed tonic frequency [Hz]
`numberHarmonics`	number	<optional>	20	number of considered harmonics
`numberSaliencePeaks`	number	<optional>	5	number of top peaks of the salience function which should be considered for constructing histogram
`referenceFrequency`	number	<optional>	55	the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]

Returns

Details
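
The `harmonicWeight` parameter is described as a decay ratio between consecutive harmonics. A minimal sketch of what that could mean, assuming a geometric decay starting from a weight of 1 for the first harmonic; `harmonicWeights` is a hypothetical helper and may not match how Essentia applies the weights internally.

```javascript
// Hypothetical helper (not an essentia.js method): weight of each harmonic,
// assuming the weight is multiplied by `harmonicWeight` from one harmonic
// to the next, with the first harmonic weighted 1.
function harmonicWeights(numberHarmonics = 20, harmonicWeight = 0.85) {
  return Array.from({ length: numberHarmonics }, (_, h) => Math.pow(harmonicWeight, h));
}
```

Setting `harmonicWeight` to 1 yields equal weights for all harmonics, which matches the "=1 for no decay" note in the parameter table.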

TriangularBands( spectrum [, frequencyBands [, inputSize [, log [, normalize [, sampleRate [, type [, weighting ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes energy in triangular frequency bands of a spectrum. The arbitrary number of overlapping bands can be specified. For each band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_TriangularBands.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the input spectrum (must be greater than size one)
`frequencyBands`	Array.<any>	<optional>	[21.533203125, 43.06640625, 64.599609375, 86.1328125, 107.666015625, 129.19921875, 150.732421875, 172.265625, 193.798828125, 215.33203125, 236.865234375, 258.3984375, 279.931640625, 301.46484375, 322.998046875, 344.53125, 366.064453125, 387.59765625, 409.130859375, 430.6640625, 452.197265625, 473.73046875, 495.263671875, 516.796875, 538.330078125, 559.86328125, 581.396484375, 602.9296875, 624.462890625, 645.99609375, 667.529296875, 689.0625, 710.595703125, 732.12890625, 753.662109375, 775.1953125, 796.728515625, 839.794921875, 861.328125, 882.861328125, 904.39453125, 925.927734375, 968.994140625, 990.52734375, 1012.06054688, 1055.12695312, 1076.66015625, 1098.19335938, 1141.25976562, 1184.32617188, 1205.859375, 1248.92578125, 1270.45898438, 1313.52539062, 1356.59179688, 1399.65820312, 1442.72460938, 1485.79101562, 1528.85742188, 1571.92382812, 1614.99023438, 1658.05664062, 1701.12304688, 1765.72265625, 1808.7890625, 1873.38867188, 1916.45507812, 1981.0546875, 2024.12109375, 2088.72070312, 2153.3203125, 2217.91992188, 2282.51953125, 2347.11914062, 2411.71875, 2497.8515625, 2562.45117188, 2627.05078125, 2713.18359375, 2799.31640625, 2885.44921875, 2950.04882812, 3036.18164062, 3143.84765625, 3229.98046875, 3316.11328125, 3423.77929688, 3509.91210938, 3617.578125, 3725.24414062, 3832.91015625, 3940.57617188, 4069.77539062, 4177.44140625, 4306.640625, 4435.83984375, 4565.0390625, 4694.23828125, 4844.97070312, 4974.16992188, 5124.90234375, 5275.63476562, 5426.3671875, 5577.09960938, 5749.36523438, 5921.63085938, 6093.89648438, 6266.16210938, 6459.9609375, 6653.75976562, 6847.55859375, 7041.35742188, 7256.68945312, 7450.48828125, 7687.35351562, 7902.68554688, 8139.55078125, 8376.41601562, 8613.28125, 8871.6796875, 9130.078125, 9388.4765625, 9668.40820312, 9948.33984375, 10249.8046875, 10551.2695312, 10852.734375, 11175.7324219, 11498.7304688, 11843.2617188, 12187.7929688, 12553.8574219, 12919.921875, 13285.9863281, 13673.5839844, 
14082.7148438, 14491.8457031, 14922.5097656, 15353.1738281, 15805.3710938, 16257.5683594]	list of frequency ranges into which the spectrum is divided (these must be in ascending order and cannot contain duplicates); each triangle is built as x(i-1)=0, x(i)=1, x(i+1)=0 over i, and the resulting number of bands is the size of the input array minus 2
`inputSize`	number	<optional>	1025	the size of the spectrum
`log`	boolean	<optional>	true	compute log-energies (log10 (1 + energy))
`normalize`	string	<optional>	unit_sum	spectrum bin weights to use for each triangular band: 'unit_max' to make each triangle vertex equal to 1, 'unit_sum' to make each triangle area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle area equal to 1 normalizing the weights of each triangle by its bandwidth
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`type`	string	<optional>	power	use magnitude or power spectrum
`weighting`	string	<optional>	linear	type of weighting function for determining triangle area

Returns

Details
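
The triangle construction described for `frequencyBands` can be sketched in plain JavaScript. This is an illustration rather than the Essentia implementation: it assumes linear weighting, the 'power' type, weights with the triangle vertex equal to 1 (as in 'unit_max'), no log compression, and spectrum bins evenly spaced from 0 Hz to half the sample rate. `triangularBandEnergies` is a hypothetical name.

```javascript
// Hypothetical sketch (not the Essentia implementation): energy in overlapping
// triangular bands, where band i rises from frequencyBands[i-1] to a peak of 1
// at frequencyBands[i] and falls back to 0 at frequencyBands[i+1].
function triangularBandEnergies(spectrum, frequencyBands, sampleRate = 44100) {
  // Assumption: bin k sits at k * (sampleRate / 2) / (spectrum.length - 1) Hz.
  const binWidth = sampleRate / 2 / (spectrum.length - 1);
  const energies = [];
  for (let i = 1; i < frequencyBands.length - 1; i++) {
    const lo = frequencyBands[i - 1];
    const center = frequencyBands[i];
    const hi = frequencyBands[i + 1];
    let energy = 0;
    for (let k = 0; k < spectrum.length; k++) {
      const f = k * binWidth;
      let weight = 0;
      if (f >= lo && f <= center) weight = (f - lo) / (center - lo);
      else if (f > center && f <= hi) weight = (hi - f) / (hi - center);
      // Sum the power spectrum (magnitude squared) under the triangle.
      energy += weight * spectrum[k] * spectrum[k];
    }
    energies.push(energy);
  }
  return energies; // frequencyBands.length - 2 values
}
```

Three band edges produce a single band, consistent with the "size of input array - 2" rule in the parameter description.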

TriangularBarkBands( spectrum [, highFrequencyBound [, inputSize [, log [, lowFrequencyBound [, normalize [, numberBands [, sampleRate [, type [, weighting ] ] ] ] ] ] ] ] ] ) → {object}

Description

This algorithm computes energy in the bark bands of a spectrum. It differs from the regular BarkBands algorithm in that it is more configurable, so that it can be used in the BFCC algorithm to produce output similar to Rastamat (http://www.ee.columbia.edu/ln/rosa/matlab/rastamat/). See the BFCC algorithm documentation for more information as to why you might want to choose this over Mel frequency analysis. It is recommended that the input "spectrum" be calculated by the Spectrum algorithm. Check https://essentia.upf.edu/reference/std_TriangularBarkBands.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`spectrum`	VectorFloat			the audio spectrum
`highFrequencyBound`	number	<optional>	22050	an upper-bound limit for the frequencies to be included in the bands
`inputSize`	number	<optional>	1025	the size of the spectrum
`log`	boolean	<optional>	false	compute log-energies (log10 (1 + energy))
`lowFrequencyBound`	number	<optional>	0	a lower-bound limit for the frequencies to be included in the bands
`normalize`	string	<optional>	unit_sum	'unit_max' makes the vertex of all the triangles equal to 1, 'unit_sum' makes the area of all the triangles equal to 1
`numberBands`	number	<optional>	24	the number of output bands
`sampleRate`	number	<optional>	44100	the sample rate
`type`	string	<optional>	power	'power' to output squared units, 'magnitude' to keep it as the input
`weighting`	string	<optional>	warping	type of weighting function for determining triangle area

Returns

Details

Trimmer( signal [, checkRange [, endTime [, sampleRate [, startTime ] ] ] ] ) → {object}

Description

This algorithm extracts a segment of an audio signal given its start and end times. Giving "startTime" greater than "endTime" will raise an exception. Check https://essentia.upf.edu/reference/std_Trimmer.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`checkRange`	boolean	<optional>	false	check whether the specified time range for a slice fits the size of input signal (throw exception if not)
`endTime`	number	<optional>	1e+06	the end time of the slice you want to extract [s]
`sampleRate`	number	<optional>	44100	the sampling rate of the input audio signal [Hz]
`startTime`	number	<optional>	0	the start time of the slice you want to extract [s]

Returns

Details
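
The relationship between `startTime`, `endTime`, and `sampleRate` can be illustrated with a plain JavaScript sketch. It assumes times are converted to sample indices by rounding and that an end time beyond the signal is clamped to the signal length; `trimSignal` is a hypothetical helper, and the real algorithm may convert indices differently.

```javascript
// Hypothetical sketch (not the Essentia implementation): extract the samples
// between startTime and endTime, both given in seconds.
function trimSignal(signal, startTime = 0, endTime = 1e6, sampleRate = 44100) {
  if (startTime > endTime) {
    throw new RangeError("startTime must not be greater than endTime");
  }
  // Assumption: times map to sample indices by rounding.
  const startIndex = Math.round(startTime * sampleRate);
  const endIndex = Math.min(signal.length, Math.round(endTime * sampleRate));
  return signal.slice(startIndex, endIndex);
}
```

The default `endTime` of 1e+06 seconds is far beyond the length of typical inputs, so by default the slice runs to the end of the signal in this sketch.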

Tristimulus( frequencies, magnitudes ) → {object}

Description

This algorithm calculates the tristimulus of a signal given its harmonic peaks. The tristimulus was introduced as a timbre equivalent to the color attributes in vision. Tristimulus measures the mixture of harmonics in a given sound, grouped into three sections. The first tristimulus measures the relative weight of the first harmonic; the second tristimulus measures the relative weight of the second, third, and fourth harmonics taken together; and the third tristimulus measures the relative weight of all the remaining harmonics. Check https://essentia.upf.edu/reference/std_Tristimulus.html for more details.

Parameters

Name	Type	Description
`frequencies`	VectorFloat	the frequencies of the harmonic peaks ordered by frequency
`magnitudes`	VectorFloat	the magnitudes of the harmonic peaks ordered by frequency

Returns

Details
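
The three groupings described above can be sketched as follows. This assumes "relative weight" means the share of the summed harmonic magnitudes and that `magnitudes` is already ordered by harmonic number; `tristimulus` here is a hypothetical standalone function, not the essentia.js method.

```javascript
// Hypothetical sketch: tristimulus values from harmonic peak magnitudes,
// assuming each value is a share of the total magnitude.
function tristimulus(magnitudes) {
  const sum = (from, to) =>
    Array.from(magnitudes).slice(from, to).reduce((acc, m) => acc + m, 0);
  const total = sum(0);
  if (total === 0) return [0, 0, 0]; // avoid dividing by zero for silent input
  return [
    sum(0, 1) / total, // first harmonic
    sum(1, 4) / total, // second, third, and fourth harmonics
    sum(4) / total,    // all remaining harmonics
  ];
}
```

Under this definition the three values sum to 1 for any non-silent input.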

TruePeakDetector( signal [, blockDC [, emphasise [, oversamplingFactor [, quality [, sampleRate [, threshold [, version ] ] ] ] ] ] ] ) → {object}

Description

This algorithm implements a “true-peak” level meter for clipping detection. According to the ITU-R recommendations, “true-peak” values exceeding the full-scale range are potential sources of “clipping in subsequent processes, such as within particular D/A converters or during sample-rate conversion”. The ITU-R BS.1770-4[1] (by default) and the ITU-R BS.1770-2[2] signal-flows can be used. Go to the references for information about the differences. Only the peaks (if any) exceeding the configurable amplitude threshold are returned. Note: the parameters 'blockDC' and 'emphasise' work only when 'version' is set to 2. References: [1] Series, B. S. (2011). Recommendation ITU-R BS.1770-4. Algorithms to measure audio programme loudness and true-peak audio level, https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-4-201510-I!!PDF-E.pdf [2] Series, B. S. (2011). Recommendation ITU-R BS.1770-2. Algorithms to measure audio programme loudness and true-peak audio level, https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-2-201103-S!!PDF-E.pdf Check https://essentia.upf.edu/reference/std_TruePeakDetector.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input audio signal
`blockDC`	boolean	<optional>	false	flag to activate the optional DC blocker
`emphasise`	boolean	<optional>	false	flag to activate the optional emphasis filter
`oversamplingFactor`	number	<optional>	4	factor by which the signal is oversampled
`quality`	number	<optional>	1	type of interpolation applied (see libresample)
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`threshold`	number	<optional>	-0.0002	threshold to detect peaks [dB]
`version`	number	<optional>	4	algorithm version

Returns

Details

TuningFrequency( frequencies, magnitudes [, resolution ] ) → {object}

Description

This algorithm estimates the tuning frequency given a sequence/set of spectral peaks. The result is the tuning frequency in Hz, and its distance from 440Hz in cents. This version is slightly adapted from the original algorithm [1], but gives the same results. Check https://essentia.upf.edu/reference/std_TuningFrequency.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frequencies`	VectorFloat			the frequencies of the spectral peaks [Hz]
`magnitudes`	VectorFloat			the magnitudes of the spectral peaks
`resolution`	number	<optional>	1	resolution in cents (logarithmic scale, 100 cents = 1 semitone) for tuning frequency determination

Returns

Details
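
The "distance from 440Hz in cents" mentioned in the description follows from the standard definition of a cent (1200 cents per octave). A small sketch of that conversion, using a hypothetical helper name; it does not estimate the tuning frequency itself.

```javascript
// Hypothetical helper: signed distance of a tuning frequency from 440 Hz in
// cents, using the standard definition of 1200 cents per octave.
function centsFrom440(tuningFrequency) {
  return 1200 * Math.log2(tuningFrequency / 440);
}
```

A tuning frequency slightly above 440 Hz gives a small positive value, and one below 440 Hz gives a negative value.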

TuningFrequencyExtractor( signal [, frameSize [, hopSize ] ] ) → {object}

Description

This algorithm extracts the tuning frequency of an audio signal. Check https://essentia.upf.edu/reference/std_TuningFrequencyExtractor.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the audio input signal
`frameSize`	number	<optional>	4096	the frameSize for computing tuning frequency
`hopSize`	number	<optional>	2048	the hopsize for computing tuning frequency

Returns

Details

UnaryOperator( array [, scale [, shift [, type ] ] ] ) → {object}

Description

This algorithm performs basic arithmetical operations element by element given an array. Notes: log and ln are equivalent to the natural logarithm; for log, ln, log10 and lin2db, x is clipped to 1e-30 for x<1e-30; for x<0, sqrt(x) is invalid; the scale and shift parameters define a linear transformation to be applied to the resulting elements. Check https://essentia.upf.edu/reference/std_UnaryOperator.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array
`scale`	number	<optional>	1	multiply result by factor
`shift`	number	<optional>	0	shift result by value (add value)
`type`	string	<optional>	identity	the type of the unary operator to apply to input array

Returns

Details
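
The notes in the description can be illustrated with a plain JavaScript sketch. It covers only a subset of operator types (those whose behavior the description spells out) and assumes the result is computed as `scale * op(x) + shift`; `applyUnaryOperator` is a hypothetical name, not the essentia.js method.

```javascript
// Hypothetical sketch covering a subset of operator types.
const clip = (x) => Math.max(x, 1e-30); // x is clipped to 1e-30 for x < 1e-30
const UNARY_OPS = {
  identity: (x) => x,
  log: (x) => Math.log(clip(x)), // log and ln are both the natural logarithm
  ln: (x) => Math.log(clip(x)),
  log10: (x) => Math.log10(clip(x)),
  sqrt: (x) => {
    if (x < 0) throw new RangeError("sqrt(x) is invalid for x < 0");
    return Math.sqrt(x);
  },
};

function applyUnaryOperator(array, scale = 1, shift = 0, type = "identity") {
  const op = UNARY_OPS[type];
  if (!op) throw new Error(`unsupported type in this sketch: ${type}`);
  // Assumption: the linear transformation is applied after the operator.
  return Array.from(array, (x) => scale * op(x) + shift);
}
```

For instance, `applyUnaryOperator([1, 2, 3], 2, 1)` doubles each element and adds one.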

UnaryOperatorStream( array [, scale [, shift [, type ] ] ] ) → {object}

Description

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the input array
`scale`	number	<optional>	1	multiply result by factor
`shift`	number	<optional>	0	shift result by value (add value)
`type`	string	<optional>	identity	the type of the unary operator to apply to input array

Returns

Details

Variance( array ) → {object}

Description

This algorithm computes the variance of an array. Check https://essentia.upf.edu/reference/std_Variance.html for more details.

Parameters

Name	Type	Description
`array`	VectorFloat	the input array

Returns

Details
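
For reference, a plain JavaScript sketch of the variance of an array. It assumes the population form (dividing by the number of elements); whether Essentia divides by N or N-1 is not stated in this reference, so treat that choice as an assumption. `populationVariance` is a hypothetical name.

```javascript
// Hypothetical sketch: population variance (mean squared deviation from the mean).
function populationVariance(array) {
  if (array.length === 0) throw new RangeError("cannot compute variance of an empty array");
  let mean = 0;
  for (const x of array) mean += x;
  mean /= array.length;
  let sumSquares = 0;
  for (const x of array) sumSquares += (x - mean) * (x - mean);
  // Assumption: divide by N, not N - 1.
  return sumSquares / array.length;
}
```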

Vibrato( pitch [, maxExtend [, maxFrequency [, minExtend [, minFrequency [, sampleRate ] ] ] ] ] ) → {object}

Description

This algorithm detects the presence of vibrato and estimates its parameters given a pitch contour [Hz]. The result is the vibrato frequency in Hz and the extent (peak to peak) in cents. If no vibrato is detected in a frame, the output of both values is zero. Check https://essentia.upf.edu/reference/std_Vibrato.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`pitch`	VectorFloat			the pitch trajectory [Hz].
`maxExtend`	number	<optional>	250	maximum considered vibrato extent [cents]
`maxFrequency`	number	<optional>	8	maximum considered vibrato frequency [Hz]
`minExtend`	number	<optional>	50	minimum considered vibrato extent [cents]
`minFrequency`	number	<optional>	4	minimum considered vibrato frequency [Hz]
`sampleRate`	number	<optional>	344.531	sample rate of the input pitch contour

Returns

Details

WarpedAutoCorrelation( array [, maxLag [, sampleRate ] ] ) → {object}

Description

This algorithm computes the warped auto-correlation of an audio signal. The implementation is an adapted version of K. Schmidt's implementation of the matlab algorithm from the 'warped toolbox' by Aki Harma and Matti Karjalainen found [2]. For a detailed explanation of the algorithm, see [1]. This algorithm is only defined for positive lambda = 1.0674*sqrt(2.0*atan(0.00006583*sampleRate)/PI) - 0.1916, thus it will throw an exception when the supplied sampling rate does not pass the requirements. If maxLag is larger than the size of the input array, an exception is thrown. Check https://essentia.upf.edu/reference/std_WarpedAutoCorrelation.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`array`	VectorFloat			the array to be analyzed
`maxLag`	number	<optional>	1	the maximum lag for which the auto-correlation is computed (inclusive) (must be smaller than signal size)
`sampleRate`	number	<optional>	44100	the audio sampling rate [Hz]

Returns

Details
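
The lambda expression in the description can be evaluated directly to check whether a sampling rate satisfies the positivity requirement. The sketch below reads the expression with explicit multiplications, 1.0674 * sqrt(2.0 * atan(0.00006583 * sampleRate) / PI) - 0.1916; the helper names are hypothetical.

```javascript
// Hypothetical helpers: evaluate the warping coefficient from the description
// and check the positivity requirement on it.
function warpingLambda(sampleRate = 44100) {
  return 1.0674 * Math.sqrt((2.0 * Math.atan(0.00006583 * sampleRate)) / Math.PI) - 0.1916;
}

function hasPositiveLambda(sampleRate) {
  return warpingLambda(sampleRate) > 0;
}
```

At the default 44100 Hz the coefficient is positive, while very low sampling rates (a few hundred Hz) make it negative, which is the case the description says raises an exception.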

Welch( frame [, averagingFrames [, fftSize [, frameSize [, sampleRate [, scaling [, windowType ] ] ] ] ] ] ) → {object}

Description

This algorithm estimates the Power Spectral Density of the input signal using Welch's method [1]. The input should be fed with the overlapped audio frames. The algorithm internally stores the required past frames to compute each output. Call reset() to clear the buffers. This implementation is based on Scipy [2]. Check https://essentia.upf.edu/reference/std_Welch.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input stereo audio signal
`averagingFrames`	number	<optional>	10	amount of frames to average
`fftSize`	number	<optional>	1024	size of the FFT. Zero padding is added if this is larger than the input frame size.
`frameSize`	number	<optional>	512	the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)
`sampleRate`	number	<optional>	44100	the sampling rate of the audio signal [Hz]
`scaling`	string	<optional>	density	'density' normalizes the result to the bandwidth while 'power' outputs the unnormalized power spectrum
`windowType`	string	<optional>	hann	the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'

Returns

Details

Windowing( frame [, normalized [, size [, type [, zeroPadding [, zeroPhase ] ] ] ] ] ) → {object}

Description

This algorithm applies windowing to an audio signal. It optionally applies zero-phase windowing and optionally adds zero-padding. The resulting windowed frame size is equal to the incoming frame size plus the number of padded zeros. By default, the available windows are normalized (to have an area of 1) and then scaled by a factor of 2. Check https://essentia.upf.edu/reference/std_Windowing.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`frame`	VectorFloat			the input audio frame
`normalized`	boolean	<optional>	true	a boolean value to specify whether to normalize windows (to have an area of 1) and then scale by a factor of 2
`size`	number	<optional>	1024	the window size
`type`	string	<optional>	hann	the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
`zeroPadding`	number	<optional>	0	the size of the zero-padding
`zeroPhase`	boolean	<optional>	true	a boolean value that enables zero-phase windowing

Returns

Details
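
The normalization and zero-padding behavior described above can be sketched for a Hann window. This is an illustration under assumptions, not the Essentia implementation: it uses the symmetric Hann formula 0.5 - 0.5*cos(2*pi*i/(N-1)), appends the padding at the end of the frame, and leaves out zero-phase windowing entirely. `hannWindowedFrame` is a hypothetical name, and the frame needs at least two samples.

```javascript
// Hypothetical sketch: apply a Hann window, optionally normalized to an area
// of 1 and then scaled by 2, followed by zero-padding at the end.
function hannWindowedFrame(frame, zeroPadding = 0, normalized = true) {
  const n = frame.length;
  // Assumption: symmetric Hann window.
  const window = Array.from({ length: n }, (_, i) =>
    0.5 - 0.5 * Math.cos((2 * Math.PI * i) / (n - 1))
  );
  if (normalized) {
    const area = window.reduce((acc, w) => acc + w, 0);
    for (let i = 0; i < n; i++) window[i] *= 2 / area;
  }
  // Output size is the frame size plus the number of padded zeros.
  const out = new Float32Array(n + zeroPadding);
  for (let i = 0; i < n; i++) out[i] = frame[i] * window[i];
  return out;
}
```

For a frame of ones, the windowed samples sum to 2 when `normalized` is true, and the output length equals the frame length plus `zeroPadding`.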

ZeroCrossingRate( signal [, threshold ] ) → {object}

Description

This algorithm computes the zero-crossing rate of an audio signal. It is the number of sign changes between consecutive signal values divided by the total number of values. Noisy signals tend to have a higher zero-crossing rate. To avoid counting small variations around zero caused by noise, a threshold around zero is used, and a zero crossing is considered valid only when this boundary is crossed. Check https://essentia.upf.edu/reference/std_ZeroCrossingRate.html for more details.

Parameters

Name	Type	Attributes	Default	Description
`signal`	VectorFloat			the input signal
`threshold`	number	<optional>	0	the threshold which will be taken as the zero axis in both positive and negative sign
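
The definition in the description can be sketched in plain JavaScript. This version assumes that samples inside the band [-threshold, threshold] are ignored and that a crossing is counted when the signal moves from one side of the band to the other; the Essentia implementation may treat in-band samples differently. `zeroCrossingRateSketch` is a hypothetical name.

```javascript
// Hypothetical sketch: sign changes across a threshold band around zero,
// divided by the total number of samples.
function zeroCrossingRateSketch(signal, threshold = 0) {
  if (signal.length === 0) return 0;
  let crossings = 0;
  let lastSign = 0; // stays 0 until a sample falls outside the band
  for (const x of signal) {
    const sign = x > threshold ? 1 : x < -threshold ? -1 : 0;
    if (sign === 0) continue; // assumption: in-band samples are ignored
    if (lastSign !== 0 && sign !== lastSign) crossings++;
    lastSign = sign;
  }
  return crossings / signal.length;
}
```

With a threshold of 0.1, low-level noise oscillating between 0.01 and -0.01 produces no crossings in this sketch, whereas with the default threshold every sign change counts.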
