Methods
-
<async> getAudioBufferFromURL( audioURL, webAudioCtx ) → {AudioBuffer}
-
Description
Decodes and returns the audio buffer of a given audio URL or blob URI using the Web Audio API. (NOTE: This method doesn't work on the Safari browser.)
Parameters
Name Type Description audioURL
string web URL or blob URI of an audio file
webAudioCtx
AudioContext an instance of the Web Audio API AudioContext
Returns
Details
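The fetch-then-decode pattern described above can be sketched with standard browser APIs. This is a minimal illustration under stated assumptions, not the library's source: the name `decodeAudioFromURL` and the injectable `fetchFn` parameter are hypothetical and exist only so the sketch can be exercised outside a browser.

```javascript
// Hypothetical helper (not part of essentia.js): fetch an audio file and
// decode it with the Web Audio API.
async function decodeAudioFromURL(audioURL, webAudioCtx, fetchFn = globalThis.fetch) {
  const response = await fetchFn(audioURL);
  if (!response.ok) {
    throw new Error(`Request for ${audioURL} failed with status ${response.status}`);
  }
  const encoded = await response.arrayBuffer();
  // Promise-based form of decodeAudioData; resolves to an AudioBuffer.
  return webAudioCtx.decodeAudioData(encoded);
}
```

In a browser this could be called as `const buffer = await decodeAudioFromURL(url, new AudioContext());`.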
-
<async> getAudioChannelDataFromURL( audioURL, webAudioCtx [, channel ] ) → {Float32Array}
-
Description
Decodes and returns the audio channel data from a given audio URL or blob URI using the Web Audio API. (NOTE: This method doesn't work on the Safari browser.)
Parameters
Name Type Attributes Default Description audioURL
string web URL or blob URI of an audio file
webAudioCtx
AudioContext an instance of the Web Audio API AudioContext
channel
number <optional> 0 audio channel number
Returns
Details
-
melSpectrumExtractor( audioFrame, sampleRate [, asVector [, config ] ] ) → {Array}
-
Description
Compute log-scaled mel spectrogram for a given audio signal frame along with an optional extractor profile configuration
Parameters
Name Type Attributes Default Description audioFrame
Float32Array a frame of decoded audio signal as Float32 typed array.
sampleRate
number Sample rate of the input audio signal.
asVector
boolean <optional> false whether to output the spectrogram as a vector float type for chaining with other essentia algorithms.
config
* <optional> this.profile
Returns
Details
-
audioBufferToMonoSignal( buffer ) → {Float32Array}
-
Description
Convert an AudioBuffer object to a mono audio signal array. The audio signal is downmixed to mono using the essentia MonoMixer algorithm if the audio buffer has 2 channels of audio. Throws an exception if the input AudioBuffer object has more than 2 channels of audio.
Parameters
Name Type Description buffer
AudioBuffer AudioBuffer object decoded from an audio file.
Returns
Details
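The channel-count rule described above can be sketched in plain JavaScript. This is an illustration only: the name `toMonoSketch` is hypothetical, and equal-weight averaging of the two channels is an assumption of the sketch rather than a statement about how MonoMixer is configured inside the library.

```javascript
// Hypothetical sketch: 1 channel passes through, 2 channels are averaged,
// anything else is rejected. Works on any object exposing the AudioBuffer
// members `numberOfChannels` and `getChannelData(index)`.
function toMonoSketch(buffer) {
  if (buffer.numberOfChannels === 1) {
    return buffer.getChannelData(0);
  }
  if (buffer.numberOfChannels === 2) {
    const left = buffer.getChannelData(0);
    const right = buffer.getChannelData(1);
    const mono = new Float32Array(left.length);
    for (let i = 0; i < left.length; i++) {
      mono[i] = 0.5 * (left[i] + right[i]); // assumed equal-weight downmix
    }
    return mono;
  }
  throw new Error("Only mono or stereo buffers are supported");
}
```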
-
shutdown()
-
Description
Method to shut down the essentia algorithm instance after its use
Details
-
hpcpExtractor( audioFrame, sampleRate [, asVector [, config ] ] ) → {Array}
-
Description
Compute HPCP chroma feature for a given audio signal frame along with an optional extractor profile configuration
Parameters
Name Type Attributes Default Description audioFrame
Float32Array a decoded audio signal frame as Float32 typed array.
sampleRate
number Sample rate of the input audio signal.
asVector
boolean <optional> false whether to output the hpcpgram as a vector float type for chaining with other essentia algorithms.
config
* <optional> this.profile
Returns
Details
-
reinstantiate()
-
Description
Method for re-instantiating the essentia algorithms instance after using the shutdown method
Details
-
"delete"()
-
Description
Delete essentiajs class instance
Details
-
arrayToVector( inputArray ) → {VectorFloat}
-
Description
Convert an input JS array into VectorFloat type
Parameters
Name Type Description inputArray
Float32Array input JS typed array
Returns
Details
-
vectorToArray( inputVector ) → {Float32Array}
-
Description
Convert an input VectorFloat array into typed JS Float32Array
Parameters
Name Type Description inputVector
VectorFloat input VectorFloat array
Returns
Details
-
FrameGenerator( inputAudioData [, frameSize [, hopSize ] ] ) → {VectorVectorFloat}
-
Description
Cuts audio signal data into overlapping frames given a frame size and a hop size
Parameters
Name Type Attributes Default Description inputAudioData
Float32Array single-channel audio data
frameSize
number <optional> 2048 frame size for cutting the audio signal
hopSize
number <optional> 1024 hop size, i.e. the number of samples between the starts of consecutive frames
Returns
Details
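To make the relationship between frame size and hop size concrete, here is a minimal plain-JavaScript framing sketch. The name `cutFrames` is hypothetical, and the edge handling (framing starts at sample 0 and the final frame is zero-padded) is an assumption of the sketch; the library's behavior at the signal boundaries may differ.

```javascript
// Hypothetical sketch: cut a Float32Array into frames of `frameSize` samples
// whose start positions are `hopSize` samples apart.
function cutFrames(signal, frameSize = 2048, hopSize = 1024) {
  const frames = [];
  for (let start = 0; start < signal.length; start += hopSize) {
    const frame = new Float32Array(frameSize); // zero-filled by default
    // subarray clamps its end index, so the last frame keeps trailing zeros.
    frame.set(signal.subarray(start, start + frameSize));
    frames.push(frame);
  }
  return frames;
}
```

With the defaults (2048/1024), consecutive frames share half of their samples.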
-
MonoMixer( leftChannel, rightChannel ) → {object}
-
Description
This algorithm downmixes the signal into a single channel given a stereo signal. It is a wrapper around https://essentia.upf.edu/reference/std_MonoMixer.html.
Parameters
Name Type Description leftChannel
VectorFloat the left channel of the stereo audio signal
rightChannel
VectorFloat the right channel of the stereo audio signal
Returns
Details
-
LoudnessEBUR128( leftChannel, rightChannel [, hopSize [, sampleRate [, startAtZero ] ] ] ) → {object}
-
Description
This algorithm computes the EBUR128 loudness descriptors of an audio signal. It is a wrapper around https://essentia.upf.edu/reference/std_LoudnessEBUR128.html.
Parameters
Name Type Attributes Default Description leftChannel
VectorFloat the left channel of the stereo audio signal
rightChannel
VectorFloat the right channel of the stereo audio signal
hopSize
number <optional> 0.1 the hop size with which the loudness is computed [s]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
startAtZero
boolean <optional> false start momentary/short-term loudness estimation at time 0 (zero-centered loudness estimation windows) if true; otherwise start both windows at time 0 (time positions for momentary and short-term values will not be synchronized)
Returns
Details
-
AfterMaxToBeforeMaxEnergyRatio( pitch ) → {object}
-
Description
This algorithm computes the ratio between the pitch energy after the pitch maximum and the pitch energy before the pitch maximum. Sounds having a monotonically ascending pitch or one unique pitch will show a value of (0,1], while sounds having a monotonically descending pitch will show a value of [1,inf). In case there is no energy before the max pitch, the algorithm will return the energy after the maximum pitch. Check https://essentia.upf.edu/reference/std_AfterMaxToBeforeMaxEnergyRatio.html for more details.
Parameters
Name Type Description pitch
VectorFloat the array of pitch values [Hz]
Returns
Details
-
AllPass( signal [, bandwidth [, cutoffFrequency [, order [, sampleRate ] ] ] ] ) → {object}
-
Description
This algorithm implements an IIR all-pass filter of order 1 or 2. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_AllPass.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
bandwidth
number <optional> 500 the bandwidth of the filter [Hz] (used only for 2nd-order filters)
cutoffFrequency
number <optional> 1500 the cutoff frequency for the filter [Hz]
order
number <optional> 1 the order of the filter
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
AudioOnsetsMarker( signal [, onsets [, sampleRate [, type ] ] ] ) → {object}
-
Description
This algorithm creates a wave file in which a given audio signal is mixed with a series of time onsets. The sonification of the onsets can be heard as beeps, or as short white noise pulses if configured to do so. Check https://essentia.upf.edu/reference/std_AudioOnsetsMarker.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
onsets
Array.<any> <optional> [] the list of onset locations [s]
sampleRate
number <optional> 44100 the sampling rate of the output signal [Hz]
type
string <optional> beep the type of sound to be added on the event
Returns
Details
-
AutoCorrelation( array [, frequencyDomainCompression [, generalized [, normalization ] ] ] ) → {object}
-
Description
This algorithm computes the autocorrelation vector of a signal. It uses the version most commonly used in signal processing, which doesn't remove the mean from the observations. Using the 'generalized' option this algorithm computes autocorrelation as described in [3]. Check https://essentia.upf.edu/reference/std_AutoCorrelation.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the array to be analyzed
frequencyDomainCompression
number <optional> 0.5 factor at which FFT magnitude is compressed (only used if 'generalized' is set to true, see [3])
generalized
boolean <optional> false bool value to indicate whether to compute the 'generalized' autocorrelation as described in [3]
normalization
string <optional> standard type of normalization to compute: either 'standard' (default) or 'unbiased'
Returns
Details
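For intuition, the time-domain definition of autocorrelation without mean removal can be written directly. This sketch is not the library's implementation: the name `autoCorrelationSketch` is hypothetical, the 'generalized' option is not covered, and treating 'standard' as "no scaling" is an assumption.

```javascript
// Hypothetical sketch: r[k] = sum over i of x[i] * x[i + k], mean not removed.
// 'unbiased' divides r[k] by the number of overlapping samples (n - k).
function autoCorrelationSketch(x, normalization = "standard") {
  const n = x.length;
  const r = new Array(n).fill(0);
  for (let lag = 0; lag < n; lag++) {
    for (let i = 0; i + lag < n; i++) {
      r[lag] += x[i] * x[i + lag];
    }
    if (normalization === "unbiased") {
      r[lag] /= n - lag;
    }
  }
  return r;
}
```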
-
BFCC( spectrum [, dctType [, highFrequencyBound [, inputSize [, liftering [, logType [, lowFrequencyBound [, normalize [, numberBands [, numberCoefficients [, sampleRate [, type [, weighting ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the bark-frequency cepstrum coefficients of a spectrum. Bark bands and their subsequent usage in cepstral analysis have been shown to be useful for percussive content [1, 2]. This algorithm is implemented using the Bark scaling approach in the Rastamat version of the MFCC algorithm and in a similar manner to the MFCC-FB40 default specs. Check https://essentia.upf.edu/reference/std_BFCC.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the audio spectrum
dctType
number <optional> 2 the DCT type
highFrequencyBound
number <optional> 11000 the upper bound of the frequency range [Hz]
inputSize
number <optional> 1025 the size of input spectrum
liftering
number <optional> 0 the liftering coefficient. Use '0' to bypass it
logType
string <optional> dbamp logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes
lowFrequencyBound
number <optional> 0 the lower bound of the frequency range [Hz]
normalize
string <optional> unit_sum 'unit_max' makes the vertex of all the triangles equal to 1, 'unit_sum' makes the area of all the triangles equal to 1
numberBands
number <optional> 40 the number of bark bands in the filter
numberCoefficients
number <optional> 13 the number of output cepstrum coefficients
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
type
string <optional> power use magnitude or power spectrum
weighting
string <optional> warping type of weighting function for determining triangle area
Returns
Details
-
BPF( x [, xPoints [, yPoints ] ] ) → {object}
-
Description
This algorithm implements a break point function which linearly interpolates between discrete xy-coordinates to construct a continuous function. Check https://essentia.upf.edu/reference/std_BPF.html for more details.
Parameters
Name Type Attributes Default Description x
number the input coordinate (x-axis)
xPoints
Array.<any> <optional> [0, 1] the x-coordinates of the points forming the break-point function (the points must be arranged in ascending order and cannot contain duplicates)
yPoints
Array.<any> <optional> [0, 1] the y-coordinates of the points forming the break-point function
Returns
Details
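The linear interpolation between break points can be illustrated in a few lines of plain JavaScript. The name `breakPointFunction` is hypothetical, and rejecting inputs outside the range of `xPoints` is an assumption of this sketch.

```javascript
// Hypothetical sketch: piecewise-linear interpolation through (xPoints[i], yPoints[i]).
// xPoints must be in ascending order without duplicates, as the parameter docs require.
function breakPointFunction(x, xPoints = [0, 1], yPoints = [0, 1]) {
  if (xPoints.length < 2 || xPoints.length !== yPoints.length) {
    throw new Error("xPoints and yPoints must have the same length (at least 2)");
  }
  const last = xPoints.length - 1;
  if (x < xPoints[0] || x > xPoints[last]) {
    // Assumption: inputs outside the break-point range are rejected.
    throw new RangeError("x lies outside the range covered by xPoints");
  }
  for (let i = 1; i <= last; i++) {
    if (x <= xPoints[i]) {
      const t = (x - xPoints[i - 1]) / (xPoints[i] - xPoints[i - 1]);
      return yPoints[i - 1] + t * (yPoints[i] - yPoints[i - 1]);
    }
  }
}
```

For example, with `xPoints = [0, 2, 4]` and `yPoints = [0, 10, 0]` the function describes a triangle that peaks at x = 2.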
-
BandPass( signal [, bandwidth [, cutoffFrequency [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm implements a 2nd order IIR band-pass filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_BandPass.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal
bandwidth
number <optional> 500 the bandwidth of the filter [Hz]
cutoffFrequency
number <optional> 1500 the cutoff frequency for the filter [Hz]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
BandReject( signal [, bandwidth [, cutoffFrequency [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm implements a 2nd order IIR band-reject filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_BandReject.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
bandwidth
number <optional> 500 the bandwidth of the filter [Hz]
cutoffFrequency
number <optional> 1500 the cutoff frequency for the filter [Hz]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
BarkBands( spectrum [, numberBands [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes energy in Bark bands of a spectrum. The band frequencies are: [0.0, 50.0, 100.0, 150.0, 200.0, 300.0, 400.0, 510.0, 630.0, 770.0, 920.0, 1080.0, 1270.0, 1480.0, 1720.0, 2000.0, 2320.0, 2700.0, 3150.0, 3700.0, 4400.0, 5300.0, 6400.0, 7700.0, 9500.0, 12000.0, 15500.0, 20500.0, 27000.0]. The first two Bark bands [0,100] and [100,200] have been split in half for better resolution (because of an observed better performance in beat detection). For each bark band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_BarkBands.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input spectrum
numberBands
number <optional> 27 the number of desired barkbands
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
BeatTrackerDegara( signal [, maxTempo [, minTempo ] ] ) → {object}
-
Description
This algorithm estimates the beat positions given an input signal. It computes the 'complex spectral difference' onset detection function and uses the beat tracking algorithm (TempoTapDegara) to extract beats [1]. The algorithm works with the optimized settings of 2048/1024 frame/hop size for the computation of the detection function, with its posterior x2 resampling. While it has a lower accuracy than BeatTrackerMultiFeature (see the evaluation results in [2]), its computational speed is significantly higher, which makes it reasonable to apply this algorithm to batch processing of large amounts of audio signals. Check https://essentia.upf.edu/reference/std_BeatTrackerDegara.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the audio input signal
maxTempo
number <optional> 208 the fastest tempo to detect [bpm]
minTempo
number <optional> 40 the slowest tempo to detect [bpm]
Returns
Details
-
BeatTrackerMultiFeature( signal [, maxTempo [, minTempo ] ] ) → {object}
-
Description
This algorithm estimates the beat positions given an input signal. It computes a number of onset detection functions and estimates beat location candidates from them using the TempoTapDegara algorithm. Thereafter the best candidates are selected using TempoTapMaxAgreement. The employed detection functions, and the optimal frame/hop sizes used for their computation, are:
- complex spectral difference (see 'complex' method in OnsetDetection algorithm, 2048/1024 with posterior x2 upsample of the detection function)
- energy flux (see 'rms' method in OnsetDetection algorithm, the same settings)
- spectral flux in Mel-frequency bands (see 'melflux' method in OnsetDetection algorithm, the same settings)
- beat emphasis function (see 'beat_emphasis' method in OnsetDetectionGlobal algorithm, 2048/512)
- spectral flux between histogrammed spectrum frames, measured by the modified information gain (see 'infogain' method in OnsetDetectionGlobal algorithm, 2048/512)
Check https://essentia.upf.edu/reference/std_BeatTrackerMultiFeature.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the audio input signal
maxTempo
number <optional> 208 the fastest tempo to detect [bpm]
minTempo
number <optional> 40 the slowest tempo to detect [bpm]
Returns
Details
-
Beatogram( loudness, loudnessBandRatio [, size ] ) → {object}
-
Description
This algorithm filters the loudness matrix given by BeatsLoudness algorithm in order to keep only the most salient beat band representation. This algorithm has been found to be useful for estimating time signatures. Check https://essentia.upf.edu/reference/std_Beatogram.html for more details.
Parameters
Name Type Attributes Default Description loudness
VectorFloat the loudness at each beat
loudnessBandRatio
VectorVectorFloat matrix of loudness ratios at each band and beat
size
number <optional> 16 number of beats for dynamic filtering
Returns
Details
-
BeatsLoudness( signal [, beatDuration [, beatWindowDuration [, beats [, frequencyBands [, sampleRate ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the spectrum energy of beats in an audio signal given their positions. The energy is computed both on the whole frequency range and for each of the specified frequency bands. See the SingleBeatLoudness algorithm for a more detailed explanation. Check https://essentia.upf.edu/reference/std_BeatsLoudness.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal
beatDuration
number <optional> 0.05 the duration of the window in which the beat will be restricted [s]
beatWindowDuration
number <optional> 0.1 the duration of the window in which to look for the beginning of the beat (centered around the positions in 'beats') [s]
beats
Array.<any> <optional> [] the list of beat positions (each position is in seconds)
frequencyBands
Array.<any> <optional> [20, 150, 400, 3200, 7000, 22000] the list of bands to compute energy ratios [Hz]
sampleRate
number <optional> 44100 the audio sampling rate [Hz]
Returns
Details
-
BinaryOperator( array1, array2 [, type ] ) → {object}
-
Description
This algorithm performs basic arithmetical operations element by element given two arrays. Note:
- using this algorithm in streaming mode can cause diamond shape graphs which have not been tested with the current scheduler. There is NO GUARANTEE of its correct work for diamond shape graphs.
- for y<0, x/y is invalid
Check https://essentia.upf.edu/reference/std_BinaryOperator.html for more details.
Parameters
Name Type Attributes Default Description array1
VectorFloat the first operand input array
array2
VectorFloat the second operand input array
type
string <optional> add the type of the binary operator to apply to the input arrays
Returns
Details
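Element-by-element arithmetic on two equally sized arrays can be sketched as follows. This is an illustration, not the library's code: the name `binaryOperatorSketch` is hypothetical, and the operator names other than the documented default 'add' are assumptions.

```javascript
// Hypothetical sketch of an element-wise binary operator.
// Operator names besides 'add' are assumed for illustration.
const OPERATORS = new Map([
  ["add", (a, b) => a + b],
  ["subtract", (a, b) => a - b],
  ["multiply", (a, b) => a * b],
  ["divide", (a, b) => a / b],
]);

function binaryOperatorSketch(array1, array2, type = "add") {
  if (array1.length !== array2.length) {
    throw new Error("Input arrays must have the same size");
  }
  const op = OPERATORS.get(type);
  if (!op) {
    throw new Error(`Unknown operator type: ${type}`);
  }
  // Array.from with a map function applies `op` index by index.
  return Array.from(array1, (a, i) => op(a, array2[i]));
}
```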
-
BinaryOperatorStream( array1, array2 [, type ] ) → {object}
-
Description
This algorithm performs basic arithmetical operations element by element given two arrays. Note:
- using this algorithm in streaming mode can cause diamond shape graphs which have not been tested with the current scheduler. There is NO GUARANTEE of its correct work for diamond shape graphs.
- for y<0, x/y is invalid
Check https://essentia.upf.edu/reference/std_BinaryOperatorStream.html for more details.
Parameters
Name Type Attributes Default Description array1
VectorFloat the first operand input array
array2
VectorFloat the second operand input array
type
string <optional> add the type of the binary operator to apply to the input arrays
Returns
Details
-
BpmHistogramDescriptors( bpmIntervals ) → {object}
-
Description
This algorithm computes the beats per minute histogram and its statistics for the highest and second highest peak. Note: the histogram vector contains the occurrence frequency for each bpm value; the 0-th element corresponds to the 0 bpm value. Check https://essentia.upf.edu/reference/std_BpmHistogramDescriptors.html for more details.
Parameters
Name Type Description bpmIntervals
VectorFloat the list of bpm intervals [s]
Returns
Details
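The input here is a list of intervals in seconds rather than beat positions. Assuming `bpmIntervals` means the time between consecutive beats, and that beat positions in seconds are available, the conversion is simple arithmetic. The helper names below are hypothetical.

```javascript
// Hypothetical helpers: derive inter-beat intervals [s] from beat positions [s],
// and convert a single interval to a tempo in beats per minute.
function beatsToIntervals(beatTimes) {
  const intervals = [];
  for (let i = 1; i < beatTimes.length; i++) {
    intervals.push(beatTimes[i] - beatTimes[i - 1]);
  }
  return intervals;
}

function intervalToBpm(intervalSeconds) {
  return 60 / intervalSeconds;
}
```

Beats spaced 0.5 s apart therefore correspond to a tempo of 120 bpm.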
-
BpmRubato( beats [, longRegionsPruningTime [, shortRegionsMergingTime [, tolerance ] ] ] ) → {object}
-
Description
This algorithm extracts the locations of large tempo changes from a list of beat ticks. Check https://essentia.upf.edu/reference/std_BpmRubato.html for more details.
Parameters
Name Type Attributes Default Description beats
VectorFloat list of detected beat ticks [s]
longRegionsPruningTime
number <optional> 20 time for the longest constant tempo region inside a rubato region [s]
shortRegionsMergingTime
number <optional> 4 time for the shortest constant tempo region from one tempo region to another [s]
tolerance
number <optional> 0.08 minimum tempo deviation to look for
Returns
Details
-
CentralMoments( array [, mode [, range ] ] ) → {object}
-
Description
This algorithm extracts the 0th, 1st, 2nd, 3rd and 4th central moments of an array. It returns a 5-tuple in which the index corresponds to the order of the moment. Check https://essentia.upf.edu/reference/std_CentralMoments.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array
mode
string <optional> pdf compute central moments considering array values as a probability density function over array index or as sample points of a distribution
range
number <optional> 1 the range of the input array, used for normalizing the results in the 'pdf' mode
Returns
Details
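As a rough illustration of the "sample points of a distribution" interpretation, the five central moments can be computed directly from their textbook definition. The name `centralMomentsSample` is hypothetical, normalizing by the number of samples is an assumption, and the 'pdf' mode (with its `range` normalization) is not covered by this sketch.

```javascript
// Hypothetical sketch: central moments of order 0..4, treating the array
// values as sample points. moment[k] = mean of (value - mean)^k.
function centralMomentsSample(values) {
  if (values.length === 0) {
    throw new Error("Input array must be non-empty");
  }
  const n = values.length;
  const mean = values.reduce((acc, v) => acc + v, 0) / n;
  return [0, 1, 2, 3, 4].map(
    (order) => values.reduce((acc, v) => acc + (v - mean) ** order, 0) / n
  );
}
```

The returned array is indexed by moment order, matching the 5-tuple layout described above.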
-
Centroid( array [, range ] ) → {object}
-
Description
This algorithm computes the centroid of an array. The centroid is normalized to a specified range. This algorithm can be used to compute spectral centroid or temporal centroid. Check https://essentia.upf.edu/reference/std_Centroid.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array
range
number <optional> 1 the range of the input array, used for normalizing the results
Returns
Details
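A centroid normalized to a range can be sketched as the weighted mean of the bin indices, rescaled so that the last index maps to `range`. The name `centroidSketch`, the handling of all-zero input, and the exact scaling are assumptions of this sketch.

```javascript
// Hypothetical sketch: weighted mean index, scaled so index (n - 1) maps to `range`.
function centroidSketch(array, range = 1) {
  if (array.length < 2) {
    throw new Error("Input array needs at least 2 elements");
  }
  let weighted = 0;
  let total = 0;
  for (let i = 0; i < array.length; i++) {
    weighted += i * array[i];
    total += array[i];
  }
  if (total === 0) {
    return 0; // assumption: an all-zero array has its centroid at 0
  }
  return (weighted / total) * (range / (array.length - 1));
}
```

For a spectral centroid in Hz, `range` would typically be set to half the sampling rate when the input is a magnitude spectrum.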
-
ChordsDescriptors( chords, key, scale ) → {object}
-
Description
Given a chord progression this algorithm describes it by means of key, scale, histogram, and rate of change. Note:
- chordsHistogram indexes follow the circle of fifths order, while being shifted to the input key and scale
- key and scale are taken from the most frequent chord. In the case where multiple chords are equally frequent, the chord is hierarchically chosen from the circle of fifths.
- chords should follow this name convention: <A-G>[<#/b><m>] (e.g. C, C# or C#m are valid chords). Chord names not fitting this convention will throw an exception.
Check https://essentia.upf.edu/reference/std_ChordsDescriptors.html for more details.
Parameters
Name Type Description chords
VectorString the chord progression
key
string the key of the whole song, from A to G
scale
string the scale of the whole song (major or minor)
Returns
Details
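Because chord names that break the naming convention raise an exception, it can help to validate them first. The sketch below encodes one reading of the convention (a letter A to G, an optional `#` or `b`, an optional trailing `m`); the regular expression and the name `isValidChordName` are assumptions, not part of the library.

```javascript
// Hypothetical validator for chord names such as "C", "C#", "C#m" or "Bbm".
const CHORD_NAME = /^[A-G][#b]?m?$/;

function isValidChordName(name) {
  return CHORD_NAME.test(name);
}
```

A progression could then be checked with `chords.every(isValidChordName)` before it is converted to a VectorString.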
-
ChordsDetection( pcp [, hopSize [, sampleRate [, windowSize ] ] ] ) → {object}
-
Description
This algorithm estimates chords given an input sequence of harmonic pitch class profiles (HPCPs). It finds the best matching major or minor triad and outputs the result as a string (e.g. A#, Bm, G#m, C). This algorithm uses the Sharp versions of each Flatted note (i.e. Bb -> A#). Check https://essentia.upf.edu/reference/std_ChordsDetection.html for more details.
Parameters
Name Type Attributes Default Description pcp
VectorVectorFloat the pitch class profile from which to detect the chord
hopSize
number <optional> 2048 the hop size with which the input PCPs were computed
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
windowSize
number <optional> 2 the size of the window on which to estimate the chords [s]
Returns
Details
-
ChordsDetectionBeats( pcp, ticks [, chromaPick [, hopSize [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm estimates chords using pitch class profiles on segments between beats. It is similar to the ChordsDetection algorithm, but the chords are estimated on audio segments between each pair of consecutive beats. For each segment the estimation is done based on a chroma (HPCP) vector characterizing it, which can be computed by two methods:
- 'interbeat_median', each resulting chroma vector component is a median of all the component values in the segment
- 'starting_beat', chroma vector is sampled from the start of the segment (that is, its starting beat position) using its first frame. This makes sense if the chroma has been smoothed beforehand.
Check https://essentia.upf.edu/reference/std_ChordsDetectionBeats.html for more details.
Parameters
Name Type Attributes Default Description pcp
VectorVectorFloat the pitch class profile from which to detect the chord
ticks
VectorFloat the list of beat positions (in seconds)
chromaPick
string <optional> interbeat_median method of calculating singleton chroma for interbeat interval
hopSize
number <optional> 2048 the hop size with which the input PCPs were computed
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
ChromaCrossSimilarity( queryFeature, referenceFeature [, binarizePercentile [, frameStackSize [, frameStackStride [, noti [, oti [, otiBinary [, streaming ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes a binary cross similarity matrix from two chromagram feature vectors of a query and reference song. Check https://essentia.upf.edu/reference/std_ChromaCrossSimilarity.html for more details.
Parameters
Name Type Attributes Default Description queryFeature
VectorVectorFloat frame-wise chromagram of the query song (e.g., a HPCP)
referenceFeature
VectorVectorFloat frame-wise chromagram of the reference song (e.g., a HPCP)
binarizePercentile
number <optional> 0.095 maximum percent of distance values to consider as similar in each row and each column
frameStackSize
number <optional> 9 number of input frames to stack together and treat as a feature vector for similarity computation. Choose 'frameStackSize=1' to use the original input frames without stacking
frameStackStride
number <optional> 1 stride size to form a stack of frames (e.g., 'frameStackStride'=1 to use consecutive frames; 'frameStackStride'=2 for using every second frame)
noti
number <optional> 12 number of circular shifts to be checked for Optimal Transposition Index [1]
oti
boolean <optional> true whether to transpose the key of the reference song to the query song by Optimal Transposition Index [1]
otiBinary
boolean <optional> false whether to use the OTI-based chroma binary similarity method [3]
streaming
boolean <optional> false whether to accumulate the input 'queryFeature' in the euclidean similarity matrix calculation on each compute() method call
Returns
Details
-
Chromagram( frame [, binsPerOctave [, minFrequency [, minimumKernelSize [, normalizeType [, numberBins [, sampleRate [, scale [, threshold [, windowType [, zeroPhase ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the Constant-Q chromagram using FFT. See ConstantQ algorithm for more details. Check https://essentia.upf.edu/reference/std_Chromagram.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input audio frame
binsPerOctave
number <optional> 12 number of bins per octave
minFrequency
number <optional> 32.7 minimum frequency [Hz]
minimumKernelSize
number <optional> 4 minimum size allowed for frequency kernels
normalizeType
string <optional> unit_max normalize type
numberBins
number <optional> 84 number of frequency bins, starting at minFrequency
sampleRate
number <optional> 44100 FFT sampling rate [Hz]
scale
number <optional> 1 filters scale. Larger values use longer windows
threshold
number <optional> 0.01 bins whose magnitude is below this quantile are discarded
windowType
string <optional> hann the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
zeroPhase
boolean <optional> true a boolean value that enables zero-phase windowing. Input audio frames should be windowed with the same phase mode
Returns
Details
-
ClickDetector( frame [, detectionThreshold [, frameSize [, hopSize [, order [, powerEstimationThreshold [, sampleRate [, silenceThreshold ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm detects the locations of impulsive noises (clicks and pops) on the input audio frame. It relies on LPC coefficients to inverse-filter the audio in order to attenuate the stationary part and enhance the prediction error (or excitation noise)[1]. After this, a matched filter is used to further enhance the impulsive peaks. The detection threshold is obtained from a robust estimate of the excitation noise power [2] plus a parametric gain value. Check https://essentia.upf.edu/reference/std_ClickDetector.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input frame (must be non-empty)
detectionThreshold
number <optional> 30 the detection threshold is the instant power of the noisy excitation signal plus 'detectionThreshold' dB
frameSize
number <optional> 512 the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)
hopSize
number <optional> 256 hop size used for the analysis. This parameter must be set correctly as it cannot be obtained from the input data
order
number <optional> 12 scalar giving the number of LPCs to use
powerEstimationThreshold
number <optional> 10 the noisy excitation is clipped to 'powerEstimationThreshold' times its median.
sampleRate
number <optional> 44100 sample rate used for the analysis
silenceThreshold
number <optional> -50 threshold to skip silent frames
Returns
Details
-
Clipper( signal [, max [, min ] ] ) → {object}
-
Description
This algorithm clips the input signal to fit its values into a specified interval. Check https://essentia.upf.edu/reference/std_Clipper.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
max
number <optional> 1 the maximum value above which the signal will be clipped
min
number <optional> -1 the minimum value below which the signal will be clipped
Returns
Details
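Clipping to an interval amounts to clamping each sample between `min` and `max`. A minimal plain-JavaScript sketch, with the hypothetical name `clipSignal`:

```javascript
// Hypothetical sketch: clamp every sample into [min, max].
function clipSignal(signal, max = 1, min = -1) {
  return Float32Array.from(signal, (sample) => Math.min(max, Math.max(min, sample)));
}
```

Samples already inside the interval are returned unchanged.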
-
CoverSongSimilarity( inputArray [, alignmentType [, disExtension [, disOnset [, distanceType ] ] ] ] ) → {object}
-
Description
This algorithm computes a cover song similarity measure from a binary cross similarity matrix input between two chroma vectors of a query and reference song using various alignment constraints of the Smith-Waterman local-alignment algorithm. Check https://essentia.upf.edu/reference/std_CoverSongSimilarity.html for more details.
Parameters
Name Type Attributes Default Description inputArray
VectorVectorFloat a 2D binary cross-similarity matrix between two audio chroma vectors (query vs reference song) (refer to the 'ChromaCrossSimilarity' algorithm).
alignmentType
string <optional> serra09 choose one of the given local-alignment constraints for the Smith-Waterman algorithm as described in [2] or [3] respectively.
disExtension
number <optional> 0.5 penalty for disruption extension
disOnset
number <optional> 0.5 penalty for disruption onset
distanceType
string <optional> asymmetric choose the type of distance. By default the algorithm outputs an asymmetric distance which is obtained by normalising the maximum score in the alignment score matrix with the length of the reference song
Returns
Details
-
Crest( array ) → {object}
-
Description
This algorithm computes the crest of an array. The crest is defined as the ratio between the maximum value and the arithmetic mean of an array. Typically it is used on the magnitude spectrum. Check https://essentia.upf.edu/reference/std_Crest.html for more details.
Parameters
Name Type Description array
VectorFloat the input array (cannot contain negative values, and must be non-empty)
Returns
Details
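The definition above (maximum value divided by arithmetic mean) translates directly into code. The name `crestSketch` is hypothetical, and rejecting an all-zero array is an assumption of this sketch.

```javascript
// Hypothetical sketch: crest = max(array) / mean(array) for non-negative input.
function crestSketch(array) {
  if (array.length === 0) {
    throw new Error("Input array must be non-empty");
  }
  let max = 0;
  let sum = 0;
  for (const value of array) {
    if (value < 0) {
      throw new Error("Input array cannot contain negative values");
    }
    if (value > max) max = value;
    sum += value;
  }
  if (sum === 0) {
    // Assumption: the ratio is undefined for an all-zero array, so reject it.
    throw new Error("Crest is undefined for an all-zero array");
  }
  return max / (sum / array.length);
}
```

A flat array gives a crest of 1, while a single peak in an otherwise empty array gives a crest equal to the array length.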
-
CrossCorrelation( arrayX, arrayY [, maxLag [, minLag ] ] ) → {object}
-
Description
This algorithm computes the cross-correlation vector of two signals. It accepts 2 parameters, minLag and maxLag, which define the range of the computation of the inner product. Check https://essentia.upf.edu/reference/std_CrossCorrelation.html for more details.
Parameters
Name Type Attributes Default Description arrayX
VectorFloat the first input array
arrayY
VectorFloat the second input array
maxLag
number <optional> 1 the maximum lag to be computed between the two vectors
minLag
number <optional> 0 the minimum lag to be computed between the two vectors
Returns
Details
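The role of `minLag` and `maxLag` can be illustrated with a direct inner-product loop. The name `crossCorrelationSketch` is hypothetical, and the sign convention used here (a positive lag shifts the second array forward) is an assumption that may not match the library.

```javascript
// Hypothetical sketch: one inner product per lag in [minLag, maxLag].
// result[lag - minLag] = sum over i of arrayX[i] * arrayY[i + lag].
function crossCorrelationSketch(arrayX, arrayY, maxLag = 1, minLag = 0) {
  if (minLag > maxLag) {
    throw new Error("minLag must not exceed maxLag");
  }
  const result = [];
  for (let lag = minLag; lag <= maxLag; lag++) {
    let sum = 0;
    for (let i = 0; i < arrayX.length; i++) {
      const j = i + lag;
      // Index pairs that fall outside arrayY contribute nothing.
      if (j >= 0 && j < arrayY.length) {
        sum += arrayX[i] * arrayY[j];
      }
    }
    result.push(sum);
  }
  return result;
}
```

The output has `maxLag - minLag + 1` entries, one per lag.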
-
CrossSimilarityMatrix( queryFeature, referenceFeature [, binarize [, binarizePercentile [, frameStackSize [, frameStackStride ] ] ] ] ) → {object}
-
Description
This algorithm computes a Euclidean cross-similarity matrix of two sequences of frame features. Similarity values can be optionally binarized. Check https://essentia.upf.edu/reference/std_CrossSimilarityMatrix.html for more details.
Parameters
Name Type Attributes Default Description queryFeature
VectorVectorFloat input frame features of the query song (e.g., a chromagram)
referenceFeature
VectorVectorFloat input frame features of the reference song (e.g., a chromagram)
binarize
boolean <optional> false whether to binarize the euclidean cross-similarity matrix
binarizePercentile
number <optional> 0.095 maximum percent of distance values to consider as similar in each row and each column
frameStackSize
number <optional> 1 number of input frames to stack together and treat as a feature vector for similarity computation. Choose 'frameStackSize=1' to use the original input frames without stacking
frameStackStride
number <optional> 1 stride size to form a stack of frames (e.g., 'frameStackStride'=1 to use consecutive frames; 'frameStackStride'=2 for using every second frame)
Returns
Details
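Example
The core of the computation, a pairwise Euclidean distance between every query frame and every reference frame, can be sketched as below. Frame stacking and binarization are left out, and the row/column orientation (rows indexed by query frames) is an assumption; `euclideanCrossSimilarity` is a hypothetical helper name, not the library function.

```javascript
// Pairwise Euclidean distances between two sequences of feature frames.
// Assumed layout: result[i][j] = distance(queryFrames[i], referenceFrames[j]).
function euclideanCrossSimilarity(queryFrames, referenceFrames) {
  return queryFrames.map((q) =>
    referenceFrames.map((r) => {
      if (q.length !== r.length) throw new Error("frames must have the same size");
      let sumSq = 0;
      for (let i = 0; i < q.length; i++) sumSq += (q[i] - r[i]) ** 2;
      return Math.sqrt(sumSq);
    })
  );
}

const frames = [[0, 0], [3, 4]];
console.log(euclideanCrossSimilarity(frames, frames));
```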
-
CubicSpline( x [, leftBoundaryFlag [, leftBoundaryValue [, rightBoundaryFlag [, rightBoundaryValue [, xPoints [, yPoints ] ] ] ] ] ] ) → {object}
-
Description
Computes the second derivatives of a piecewise cubic spline. The input value, i.e. the point at which the spline is to be evaluated, should typically lie between xPoints[0] and xPoints[size-1]. If the value lies outside this range, extrapolation is used. The [left/right] boundary condition flag parameters take the following values:
- 0: the cubic spline should be a quadratic over the first interval
- 1: the first derivative at the [left/right] endpoint should be [left/right]BoundaryValue
- 2: the second derivative at the [left/right] endpoint should be [left/right]BoundaryValue
References: [1] Spline interpolation - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Spline_interpolation
Check https://essentia.upf.edu/reference/std_CubicSpline.html for more details.
Parameters
Name Type Attributes Default Description x
number the input coordinate (x-axis)
leftBoundaryFlag
number <optional> 0 type of boundary condition for the left boundary
leftBoundaryValue
number <optional> 0 the value to be used in the left boundary, when leftBoundaryFlag is 1 or 2
rightBoundaryFlag
number <optional> 0 type of boundary condition for the right boundary
rightBoundaryValue
number <optional> 0 the value to be used in the right boundary, when rightBoundaryFlag is 1 or 2
xPoints
Array.<any> <optional> [0, 1] the x-coordinates where data is specified (the points must be arranged in ascending order and cannot contain duplicates)
yPoints
Array.<any> <optional> [0, 1] the y-coordinates to be interpolated (i.e. the known data)
Returns
Details
-
DCRemoval( signal [, cutoffFrequency [, sampleRate ] ] ) → {object}
-
Description
This algorithm removes the DC offset from a signal using a 1st order IIR highpass filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_DCRemoval.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal
cutoffFrequency
number <optional> 40 the cutoff frequency for the filter [Hz]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
DCT( array [, dctType [, inputSize [, liftering [, outputSize ] ] ] ] ) → {object}
-
Description
This algorithm computes the Discrete Cosine Transform of an array. It uses the DCT-II form, with the 1/sqrt(2) scaling factor for the first coefficient. Check https://essentia.upf.edu/reference/std_DCT.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array
dctType
number <optional> 2 the DCT type
inputSize
number <optional> 10 the size of the input array
liftering
number <optional> 0 the liftering coefficient. Use '0' to bypass it
outputSize
number <optional> 10 the number of output coefficients
Returns
Details
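Example
A direct (O(N²)) plain-JavaScript sketch of the DCT-II with the 1/sqrt(2) factor on the first coefficient, as mentioned in the description. The overall sqrt(2/N) normalisation is an assumption (the usual orthonormal form), liftering is omitted, and `dct2` is a hypothetical helper name.

```javascript
// Orthonormal DCT-II:
//   X[k] = sqrt(2/N) * c(k) * sum_n x[n] * cos(pi/N * (n + 0.5) * k)
//   with c(0) = 1/sqrt(2) and c(k) = 1 otherwise.
function dct2(input, outputSize = input.length) {
  const n = input.length;
  const out = new Array(outputSize);
  for (let k = 0; k < outputSize; k++) {
    let acc = 0;
    for (let i = 0; i < n; i++) {
      acc += input[i] * Math.cos((Math.PI / n) * (i + 0.5) * k);
    }
    const firstCoefficientScale = k === 0 ? 1 / Math.SQRT2 : 1;
    out[k] = Math.sqrt(2 / n) * firstCoefficientScale * acc;
  }
  return out;
}

// A constant input puts all of its energy into the first coefficient.
console.log(dct2([1, 1, 1, 1]));
```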
-
Danceability( signal [, maxTau [, minTau [, sampleRate [, tauMultiplier ] ] ] ] ) → {object}
-
Description
This algorithm estimates danceability of a given audio signal. The algorithm is derived from Detrended Fluctuation Analysis (DFA) described in [1]. The parameters minTau and maxTau are used to define the range of time over which DFA will be performed. The output of this algorithm is the danceability of the audio signal. These values usually range from 0 to 3 (higher values meaning more danceable). Check https://essentia.upf.edu/reference/std_Danceability.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
maxTau
number <optional> 8800 maximum segment length to consider [ms]
minTau
number <optional> 310 minimum segment length to consider [ms]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
tauMultiplier
number <optional> 1.1 multiplier to increment from min to max tau
Returns
Details
-
Decrease( array [, range ] ) → {object}
-
Description
This algorithm computes the decrease of an array defined as the linear regression coefficient. The range parameter is used to normalize the result. For a spectral centroid, the range should be equal to Nyquist and for an audio centroid the range should be equal to (audiosize - 1) / samplerate. The size of the input array must be at least two elements for "decrease" to be computed, otherwise an exception is thrown. References: [1] Least Squares Fitting -- from Wolfram MathWorld, http://mathworld.wolfram.com/LeastSquaresFitting.html Check https://essentia.upf.edu/reference/std_Decrease.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array
range
number <optional> 1 the range of the input array, used for normalizing the results
Returns
Details
-
Derivative( signal ) → {object}
-
Description
This algorithm returns the first-order derivative of an input signal. That is, for each input value it returns the value minus the previous one. Check https://essentia.upf.edu/reference/std_Derivative.html for more details.
Parameters
Name Type Description signal
VectorFloat the input signal
Returns
Details
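Example
The first-order difference described above can be sketched as follows. Treating the sample before the first one as 0 is an assumption of this sketch; `derivative` is a hypothetical helper name, not the library call.

```javascript
// y[n] = x[n] - x[n-1], with x[-1] assumed to be 0.
function derivative(signal) {
  const out = new Array(signal.length);
  let previous = 0; // assumed initial condition
  for (let i = 0; i < signal.length; i++) {
    out[i] = signal[i] - previous;
    previous = signal[i];
  }
  return out;
}

console.log(derivative([1, 3, 6, 10])); // [1, 2, 3, 4]
```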
-
DerivativeSFX( envelope ) → {object}
-
Description
This algorithm computes two descriptors that are based on the derivative of a signal envelope. Check https://essentia.upf.edu/reference/std_DerivativeSFX.html for more details.
Parameters
Name Type Description envelope
VectorFloat the envelope of the signal
Returns
Details
-
DiscontinuityDetector( frame [, detectionThreshold [, energyThreshold [, frameSize [, hopSize [, kernelSize [, order [, silenceThreshold [, subFrameSize ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm uses LPC and some heuristics to detect discontinuities in an audio signal [1]. Check https://essentia.upf.edu/reference/std_DiscontinuityDetector.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input frame (must be non-empty)
detectionThreshold
number <optional> 8 'detectionThreshold' times the standard deviation plus the median of the frame is used as detection threshold
energyThreshold
number <optional> -60 threshold in dB to detect silent subframes
frameSize
number <optional> 512 the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)
hopSize
number <optional> 256 hop size used for the analysis. This parameter must be set correctly as it cannot be obtained from the input data
kernelSize
number <optional> 7 scalar giving the size of the median filter window. Must be odd
order
number <optional> 3 scalar giving the number of LPCs to use
silenceThreshold
number <optional> -50 threshold to skip silent frames
subFrameSize
number <optional> 32 size of the window used to compute silent subframes
Returns
Details
-
Dissonance( frequencies, magnitudes ) → {object}
-
Description
This algorithm computes the sensory dissonance of an audio signal given its spectral peaks. Sensory dissonance (to be distinguished from musical or theoretical dissonance) measures the perceptual roughness of the sound and is based on the roughness of its spectral peaks. Given the spectral peaks, the algorithm estimates total dissonance by summing up the normalized dissonance values for each pair of peaks. These values are computed using dissonance curves, which define the dissonance between two spectral peaks according to their frequency and amplitude relations. The dissonance curves are based on perceptual experiments conducted in [1]. Exceptions are thrown when the sizes of the input vectors are not equal or if the input frequencies are not in ascending order. References: [1] R. Plomp and W. J. M. Levelt, "Tonal Consonance and Critical Bandwidth," The Journal of the Acoustical Society of America, vol. 38, no. 4, pp. 548–560, 1965. Check https://essentia.upf.edu/reference/std_Dissonance.html for more details.
Parameters
Name Type Description frequencies
VectorFloat the frequencies of the spectral peaks (must be sorted by frequency)
magnitudes
VectorFloat the magnitudes of the spectral peaks (must be sorted by frequency)
Returns
Details
-
DistributionShape( centralMoments ) → {object}
-
Description
This algorithm computes the spread (variance), skewness and kurtosis of an array given its central moments. The extracted features are good indicators of the shape of the distribution. For the required input see CentralMoments algorithm. The size of the input array must be at least 5. An exception will be thrown otherwise. Check https://essentia.upf.edu/reference/std_DistributionShape.html for more details.
Parameters
Name Type Description centralMoments
VectorFloat the central moments of a distribution
Returns
Details
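Example
A sketch of how spread, skewness and kurtosis follow from central moments using the standard textbook formulas. It assumes that centralMoments[k] holds the k-th central moment, that kurtosis is reported as excess kurtosis (minus 3), and that a zero variance yields 0 for both shape values; none of these is confirmed by this page. `distributionShape` is a hypothetical helper name.

```javascript
// Standard moment-based shape descriptors.
// Assumed input layout: centralMoments[k] is the k-th central moment (k = 0..4).
function distributionShape(centralMoments) {
  if (centralMoments.length < 5) {
    throw new Error("at least 5 central moments are required");
  }
  const m2 = centralMoments[2];
  const m3 = centralMoments[3];
  const m4 = centralMoments[4];
  const spread = m2; // variance
  // Assumption: degenerate (zero-variance) input reports 0 skewness and kurtosis.
  const skewness = m2 === 0 ? 0 : m3 / Math.pow(m2, 1.5);
  const kurtosis = m2 === 0 ? 0 : m4 / (m2 * m2) - 3; // excess kurtosis (assumed)
  return { spread, skewness, kurtosis };
}

console.log(distributionShape([1, 0, 4, 0, 48]));
```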
-
Duration( signal [, sampleRate ] ) → {object}
-
Description
This algorithm outputs the total duration of an audio signal. Check https://essentia.upf.edu/reference/std_Duration.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
DynamicComplexity( signal [, frameSize [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes the dynamic complexity defined as the average absolute deviation from the global loudness level estimate on the dB scale. It is related to the dynamic range and to the amount of fluctuation in loudness present in a recording. Silence at the beginning and at the end of a track are ignored in the computation in order not to deteriorate the results. Check https://essentia.upf.edu/reference/std_DynamicComplexity.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal
frameSize
number <optional> 0.2 the frame size [s]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
ERBBands( spectrum [, highFrequencyBound [, inputSize [, lowFrequencyBound [, numberBands [, sampleRate [, type [, width ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes energies/magnitudes in ERB bands of a spectrum. The Equivalent Rectangular Bandwidth (ERB) scale is used. The algorithm applies a frequency domain filterbank using gammatone filters. Adapted from matlab code in: D. P. W. Ellis (2009). 'Gammatone-like spectrograms', web resource [1]. Check https://essentia.upf.edu/reference/std_ERBBands.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the audio spectrum
highFrequencyBound
number <optional> 22050 an upper-bound limit for the frequencies to be included in the bands
inputSize
number <optional> 1025 the size of the spectrum
lowFrequencyBound
number <optional> 50 a lower-bound limit for the frequencies to be included in the bands
numberBands
number <optional> 40 the number of output bands
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
type
string <optional> power use magnitude or power spectrum
width
number <optional> 1 filter width with respect to ERB
Returns
Details
-
EffectiveDuration( signal [, sampleRate [, thresholdRatio ] ] ) → {object}
-
Description
This algorithm computes the effective duration of an envelope signal. The effective duration is a measure of the time the signal is perceptually meaningful. This is approximated by the time the envelope is above or equal to a given threshold and is above the -90 dB noise floor. This measure makes it possible to distinguish percussive sounds from sustained sounds, but it depends on the signal length. By default, this algorithm uses 40% of the envelope maximum as the threshold, which is suited for short sounds. Note that the 0% threshold corresponds to the duration of the signal above the -90 dB noise floor, while the 100% threshold corresponds to the number of times the envelope takes its maximum value. References: [1] G. Peeters, "A large set of audio features for sound description (similarity and classification) in the CUIDADO project," CUIDADO I.S.T. Project Report, 2004. Check https://essentia.upf.edu/reference/std_EffectiveDuration.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
thresholdRatio
number <optional> 0.4 the ratio of the envelope maximum to be used as the threshold
Returns
Details
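Example
The thresholding idea in the description can be sketched as counting the samples that reach thresholdRatio times the envelope maximum and also lie above the noise floor. Converting -90 dB to a linear amplitude with 10^(-90/20) is an assumption of this sketch, and `effectiveDuration` is a hypothetical helper name.

```javascript
// Time (in seconds) the envelope spends at or above thresholdRatio * max,
// while also staying above a -90 dB noise floor.
function effectiveDuration(envelope, sampleRate = 44100, thresholdRatio = 0.4) {
  const noiseFloor = Math.pow(10, -90 / 20); // assumed linear-amplitude floor
  let max = 0;
  for (const v of envelope) max = Math.max(max, Math.abs(v));
  const threshold = thresholdRatio * max;
  let count = 0;
  for (const v of envelope) {
    const a = Math.abs(v);
    if (a >= threshold && a > noiseFloor) count++;
  }
  return count / sampleRate;
}

// Three of the five samples reach 40% of the maximum.
console.log(effectiveDuration([0, 0.5, 1, 0.5, 0.1], 10, 0.4));
```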
-
Energy( array ) → {object}
-
Description
This algorithm computes the energy of an array. Check https://essentia.upf.edu/reference/std_Energy.html for more details.
Parameters
Name Type Description array
VectorFloat the input array
Returns
Details
-
EnergyBand( spectrum [, sampleRate [, startCutoffFrequency [, stopCutoffFrequency ] ] ] ) → {object}
-
Description
This algorithm computes energy in a given frequency band of a spectrum including both start and stop cutoff frequencies. Note that exceptions will be thrown when input spectrum is empty and if startCutoffFrequency is greater than stopCutoffFrequency. Check https://essentia.upf.edu/reference/std_EnergyBand.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input frequency spectrum
sampleRate
number <optional> 44100 the audio sampling rate [Hz]
startCutoffFrequency
number <optional> 0 the start frequency from which to sum the energy [Hz]
stopCutoffFrequency
number <optional> 100 the stop frequency to which to sum the energy [Hz]
Returns
Details
-
EnergyBandRatio( spectrum [, sampleRate [, startFrequency [, stopFrequency ] ] ] ) → {object}
-
Description
This algorithm computes the ratio of the spectral energy in the range [startFrequency, stopFrequency] over the total energy. Check https://essentia.upf.edu/reference/std_EnergyBandRatio.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input audio spectrum
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
startFrequency
number <optional> 0 the frequency from which to start summing the energy [Hz]
stopFrequency
number <optional> 100 the frequency up to which to sum the energy [Hz]
Returns
Details
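Example
A plain-JavaScript sketch of the band-to-total energy ratio. It assumes that the spectrum bins are spaced linearly from 0 Hz to the Nyquist frequency (sampleRate / 2) and that energy is the squared magnitude of each bin; `energyBandRatio` is a hypothetical helper name, not the library call.

```javascript
// Ratio of spectral energy inside [startFrequency, stopFrequency] to total energy.
// Assumed bin mapping: bin i sits at i * (sampleRate / 2) / (N - 1) Hz.
function energyBandRatio(spectrum, sampleRate = 44100, startFrequency = 0, stopFrequency = 100) {
  if (spectrum.length < 2) throw new Error("spectrum needs at least two bins");
  const binWidth = sampleRate / 2 / (spectrum.length - 1);
  let band = 0;
  let total = 0;
  for (let i = 0; i < spectrum.length; i++) {
    const energy = spectrum[i] * spectrum[i];
    const frequency = i * binWidth;
    total += energy;
    if (frequency >= startFrequency && frequency <= stopFrequency) band += energy;
  }
  // Assumption: a silent spectrum yields 0 instead of dividing by zero.
  return total === 0 ? 0 : band / total;
}

// Five flat bins at 0..4 Hz; the band 1..2 Hz covers two of them.
console.log(energyBandRatio([1, 1, 1, 1, 1], 8, 1, 2));
```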
-
Entropy( array ) → {object}
-
Description
This algorithm computes the Shannon entropy of an array. Entropy can be used to quantify the peakiness of a distribution. This has been used for voiced/unvoiced decision in automatic speech recognition. Check https://essentia.upf.edu/reference/std_Entropy.html for more details.
Parameters
Name Type Description array
VectorFloat the input array (cannot contain negative values, and must be non-empty)
Returns
Details
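Example
A sketch of Shannon entropy for a non-negative array. The array is first normalised to sum to 1 so it can be read as a probability distribution, and the logarithm is taken in base 2; both choices are assumptions of this sketch. `shannonEntropy` is a hypothetical helper name.

```javascript
// H = -sum_i p_i * log2(p_i), where p_i = values[i] / sum(values).
function shannonEntropy(values) {
  if (values.length === 0) throw new Error("input must be non-empty");
  let sum = 0;
  for (const v of values) {
    if (v < 0) throw new Error("input cannot contain negative values");
    sum += v;
  }
  if (sum === 0) return 0; // assumption: all-zero input has zero entropy
  let entropy = 0;
  for (const v of values) {
    const p = v / sum;
    if (p > 0) entropy -= p * Math.log2(p); // 0 * log(0) is taken as 0
  }
  return entropy;
}

console.log(shannonEntropy([1, 1, 1, 1])); // flat distribution over 4 bins
console.log(shannonEntropy([1, 0, 0, 0])); // single peak
```

A flat array gives the largest value and a single peak gives the smallest, which is why entropy can serve as a peakiness measure.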
-
Envelope( signal [, applyRectification [, attackTime [, releaseTime [, sampleRate ] ] ] ] ) → {object}
-
Description
This algorithm computes the envelope of a signal by applying a non-symmetric lowpass filter on a signal. By default it rectifies the signal, but that is optional. Check https://essentia.upf.edu/reference/std_Envelope.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
applyRectification
boolean <optional> true whether to apply rectification (envelope based on the absolute value of signal)
attackTime
number <optional> 10 the attack time of the first order lowpass in the attack phase [ms]
releaseTime
number <optional> 1500 the release time of the first order lowpass in the release phase [ms]
sampleRate
number <optional> 44100 the audio sampling rate [Hz]
Returns
Details
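Example
A common form of a non-symmetric (attack/release) one-pole envelope follower, shown here to illustrate the role of attackTime and releaseTime. The coefficient formula exp(-1 / (sampleRate * time)) is a widely used convention and is an assumption here, not a statement about Essentia's internals; `envelope` is a hypothetical helper name.

```javascript
// Attack/release envelope follower; times are given in milliseconds.
function envelope(signal, sampleRate = 44100, attackTime = 10, releaseTime = 1500, applyRectification = true) {
  // Smoothing coefficients for rising (attack) and falling (release) input.
  const attackCoeff = Math.exp(-1 / ((sampleRate * attackTime) / 1000));
  const releaseCoeff = Math.exp(-1 / ((sampleRate * releaseTime) / 1000));
  const out = new Array(signal.length);
  let state = 0;
  for (let i = 0; i < signal.length; i++) {
    const x = applyRectification ? Math.abs(signal[i]) : signal[i];
    // Use the fast coefficient while the input is above the current envelope.
    const g = x > state ? attackCoeff : releaseCoeff;
    state = (1 - g) * x + g * state;
    out[i] = state;
  }
  return out;
}

// The envelope rises quickly while the input is high, then decays slowly.
console.log(envelope([1, 1, 1, 0, 0], 1000));
```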
-
EqualLoudness( signal [, sampleRate ] ) → {object}
-
Description
This algorithm implements an equal-loudness filter. The human ear does not perceive sounds of all frequencies as having equal loudness, and to account for this, the signal is filtered by an inverted approximation of the equal-loudness curves. Technically, the filter is a cascade of a 10th order Yulewalk filter with a 2nd order Butterworth high pass filter. Check https://essentia.upf.edu/reference/std_EqualLoudness.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
Flatness( array ) → {object}
-
Description
This algorithm computes the flatness of an array, which is defined as the ratio between the geometric mean and the arithmetic mean. Check https://essentia.upf.edu/reference/std_Flatness.html for more details.
Parameters
Name Type Description array
VectorFloat the input array
Returns
Details
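Example
The ratio of geometric mean to arithmetic mean can be sketched as below. The geometric mean is computed in the log domain to avoid overflow on long arrays; returning 0 when any element is 0 and rejecting negative values are assumptions of this sketch. `flatness` is a hypothetical helper name.

```javascript
// flatness = geometricMean(values) / arithmeticMean(values)
function flatness(values) {
  if (values.length === 0) throw new Error("input must be non-empty");
  let logSum = 0;
  let sum = 0;
  for (const v of values) {
    if (v < 0) throw new Error("input cannot contain negative values");
    if (v === 0) return 0; // a single zero makes the geometric mean zero
    logSum += Math.log(v);
    sum += v;
  }
  const geometricMean = Math.exp(logSum / values.length);
  const arithmeticMean = sum / values.length;
  return geometricMean / arithmeticMean;
}

console.log(flatness([2, 2, 2])); // flat input: ratio close to 1
console.log(flatness([1, 4]));    // geometric mean 2, arithmetic mean 2.5
```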
-
FlatnessDB( array ) → {object}
-
Description
This algorithm computes the flatness of an array, which is defined as the ratio between the geometric mean and the arithmetic mean converted to dB scale. Check https://essentia.upf.edu/reference/std_FlatnessDB.html for more details.
Parameters
Name Type Description array
VectorFloat the input array
Returns
Details
-
FlatnessSFX( envelope ) → {object}
-
Description
This algorithm calculates the flatness coefficient of a signal envelope. Check https://essentia.upf.edu/reference/std_FlatnessSFX.html for more details.
Parameters
Name Type Description envelope
VectorFloat the envelope of the signal
Returns
Details
-
Flux( spectrum [, halfRectify [, norm ] ] ) → {object}
-
Description
This algorithm computes the spectral flux of a spectrum. Flux is defined as the L2-norm [1] or L1-norm [2] of the difference between two consecutive frames of the magnitude spectrum. The frames have to be of the same size in order to yield a meaningful result. The default L2-norm is used more commonly. Check https://essentia.upf.edu/reference/std_Flux.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input spectrum
halfRectify
boolean <optional> false half-rectify the differences in each spectrum bin
norm
string <optional> L2 the norm to use for difference computation
Returns
Details
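Example
A sketch of the flux between two consecutive magnitude-spectrum frames, covering the L2 and L1 norms and optional half-rectification. Here both frames are passed explicitly for clarity, which is an assumption of this sketch rather than the library's call pattern; `spectralFlux` is a hypothetical helper name.

```javascript
// Norm of the bin-wise difference between two consecutive spectrum frames.
function spectralFlux(previous, current, norm = "L2", halfRectify = false) {
  if (previous.length !== current.length) {
    throw new Error("frames must have the same size");
  }
  let acc = 0;
  for (let i = 0; i < current.length; i++) {
    let diff = current[i] - previous[i];
    if (halfRectify && diff < 0) diff = 0; // keep only increases in magnitude
    acc += norm === "L1" ? Math.abs(diff) : diff * diff;
  }
  return norm === "L1" ? acc : Math.sqrt(acc);
}

console.log(spectralFlux([0, 0], [3, 4]));       // L2 norm of [3, 4]
console.log(spectralFlux([0, 0], [3, 4], "L1")); // L1 norm of [3, 4]
```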
-
FrameCutter( signal [, frameSize [, hopSize [, lastFrameToEndOfFile [, startFromZero [, validFrameThresholdRatio ] ] ] ] ] ) → {object}
-
Description
This algorithm slices the input buffer into frames. It returns a frame of a constant size and jumps a constant amount of samples forward in the buffer on every compute() call until no more frames can be extracted; empty frame vectors are returned afterwards. Incomplete frames (frames starting before the beginning of the input buffer or going past its end) are zero-padded or dropped according to the "validFrameThresholdRatio" parameter. Check https://essentia.upf.edu/reference/std_FrameCutter.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the buffer from which to read data
frameSize
number <optional> 1024 the output frame size
hopSize
number <optional> 512 the hop size between frames
lastFrameToEndOfFile
boolean <optional> false whether the beginning of the last frame should reach the end of file. Only applicable if startFromZero is true
startFromZero
boolean <optional> false whether to start the first frame at time 0 (centered at frameSize/2) if true, or -frameSize/2 otherwise (zero-centered)
validFrameThresholdRatio
number <optional> 0 frames smaller than this ratio will be discarded, those larger will be zero-padded to a full frame (i.e. a value of 0 will never discard frames and a value of 1 will only keep frames that are of length 'frameSize')
Returns
Details
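Example
A simplified illustration of slicing a buffer into fixed-size frames with a constant hop. In this sketch frames start at sample 0, a new frame starts every hopSize samples, and any frame running past the end of the buffer is zero-padded; the zero-centred start (startFromZero = false) and the validFrameThresholdRatio dropping logic are not reproduced. `cutFrames` is a hypothetical helper name.

```javascript
// Slice `signal` into zero-padded frames of `frameSize` samples, `hopSize` apart.
function cutFrames(signal, frameSize = 1024, hopSize = 512) {
  if (frameSize <= 0 || hopSize <= 0) {
    throw new Error("frameSize and hopSize must be positive");
  }
  const frames = [];
  for (let start = 0; start < signal.length; start += hopSize) {
    const frame = new Float32Array(frameSize); // zero-initialised, so padding is implicit
    const end = Math.min(start + frameSize, signal.length);
    for (let i = start; i < end; i++) frame[i - start] = signal[i];
    frames.push(frame);
  }
  return frames;
}

const frames = cutFrames([1, 2, 3, 4, 5], 4, 2);
console.log(frames.map((f) => Array.from(f)));
```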
-
FrameToReal( signal [, frameSize [, hopSize ] ] ) → {object}
-
Description
This algorithm converts a sequence of input audio signal frames into a sequence of audio samples. Check https://essentia.upf.edu/reference/std_FrameToReal.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio frame
frameSize
number <optional> 2048 the frame size for computing the overlap-add process
hopSize
number <optional> 128 the hop size with which the overlap-add function is computed
Returns
Details
-
FrequencyBands( spectrum [, frequencyBands [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes energy in rectangular frequency bands of a spectrum. The bands are non-overlapping. For each band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_FrequencyBands.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input spectrum (must be greater than size one)
frequencyBands
Array.<any> <optional> [0, 50, 100, 150, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500, 20500, 27000] list of frequency ranges into which the spectrum is divided (these must be in ascending order and cannot contain duplicates)
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
GFCC( spectrum [, dctType [, highFrequencyBound [, inputSize [, logType [, lowFrequencyBound [, numberBands [, numberCoefficients [, sampleRate [, silenceThreshold [, type ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the Gammatone-frequency cepstral coefficients of a spectrum. This is an equivalent of MFCCs, but using a gammatone filterbank (ERBBands) scaled on an Equivalent Rectangular Bandwidth (ERB) scale. Check https://essentia.upf.edu/reference/std_GFCC.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the audio spectrum
dctType
number <optional> 2 the DCT type
highFrequencyBound
number <optional> 22050 the upper bound of the frequency range [Hz]
inputSize
number <optional> 1025 the size of input spectrum
logType
string <optional> dbamp logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes
lowFrequencyBound
number <optional> 40 the lower bound of the frequency range [Hz]
numberBands
number <optional> 40 the number of bands in the filter
numberCoefficients
number <optional> 13 the number of output cepstrum coefficients
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
silenceThreshold
number <optional> 1e-10 silence threshold for computing log-energy bands
type
string <optional> power use magnitude or power spectrum
Returns
Details
-
GapsDetector( frame [, attackTime [, frameSize [, hopSize [, kernelSize [, maximumTime [, minimumTime [, postpowerTime [, prepowerThreshold [, prepowerTime [, releaseTime [, sampleRate [, silenceThreshold ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm uses energy and time thresholds to detect gaps in the waveform. A median filter is used to remove spurious silent samples. The power of a small audio region before the detected gaps (prepower) is thresholded to detect intentional pauses as described in [1]. This technique is extended to the region after the gap. The algorithm was designed for framewise use and returns the start and end timestamps relative to the first frame processed. Call configure() or reset() in order to restart the count. Check https://essentia.upf.edu/reference/std_GapsDetector.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input frame (must be non-empty)
attackTime
number <optional> 0.05 the attack time of the first order lowpass in the attack phase [ms]
frameSize
number <optional> 2048 frame size used for the analysis. Should match the input frame size. Otherwise, an exception will be thrown
hopSize
number <optional> 1024 hop size used for the analysis
kernelSize
number <optional> 11 scalar giving the size of the median filter window. Must be odd
maximumTime
number <optional> 3500 time of the maximum gap duration [ms]
minimumTime
number <optional> 10 time of the minimum gap duration [ms]
postpowerTime
number <optional> 40 time for the postpower calculation [ms]
prepowerThreshold
number <optional> -30 prepower threshold [dB].
prepowerTime
number <optional> 40 time for the prepower calculation [ms]
releaseTime
number <optional> 0.05 the release time of the first order lowpass in the release phase [ms]
sampleRate
number <optional> 44100 sample rate used for the analysis
silenceThreshold
number <optional> -50 silence threshold [dB]
Returns
Details
-
GeometricMean( array ) → {object}
-
Description
This algorithm computes the geometric mean of an array of positive values. Check https://essentia.upf.edu/reference/std_GeometricMean.html for more details.
Parameters
Name Type Description array
VectorFloat the input array
Returns
Details
-
HFC( spectrum [, sampleRate [, type ] ] ) → {object}
-
Description
This algorithm computes the High Frequency Content of a spectrum. It can be computed according to the following techniques:
- 'Masri' (default), which does: sum |X(n)|^2*k
- 'Jensen', which does: sum |X(n)|*k^2
- 'Brossier', which does: sum |X(n)|*k
Check https://essentia.upf.edu/reference/std_HFC.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input audio spectrum
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
type
string <optional> Masri the type of HFC coefficient to be computed
Returns
Details
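Example
The three weightings listed in the description can be sketched in plain JavaScript as below. The weight k is taken to be the bin index here, which is an assumption (a frequency-based weight would additionally depend on sampleRate); `hfc` is a hypothetical helper name.

```javascript
// High Frequency Content with the three weightings named in the description.
// Assumption: the weight k is the bin index of each spectrum value.
function hfc(spectrum, type = "Masri") {
  let acc = 0;
  for (let k = 0; k < spectrum.length; k++) {
    const magnitude = Math.abs(spectrum[k]);
    if (type === "Masri") acc += magnitude * magnitude * k;
    else if (type === "Jensen") acc += magnitude * k * k;
    else if (type === "Brossier") acc += magnitude * k;
    else throw new Error("unknown HFC type: " + type);
  }
  return acc;
}

const spectrum = [1, 2, 3];
console.log(hfc(spectrum, "Masri"));    // 0 + 4*1 + 9*2
console.log(hfc(spectrum, "Jensen"));   // 0 + 2*1 + 3*4
console.log(hfc(spectrum, "Brossier")); // 0 + 2*1 + 3*2
```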
-
HPCP( frequencies, magnitudes [, bandPreset [, bandSplitFrequency [, harmonics [, maxFrequency [, maxShifted [, minFrequency [, nonLinear [, normalized [, referenceFrequency [, sampleRate [, size [, weightType [, windowSize ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
Computes a Harmonic Pitch Class Profile (HPCP) from the spectral peaks of a signal. HPCP is a k*12 dimensional vector which represents the intensities of the twelve (k==1) semitone pitch classes (corresponding to notes from A to G#), or subdivisions of these (k>1). Check https://essentia.upf.edu/reference/std_HPCP.html for more details.
Parameters
Name Type Attributes Default Description frequencies
VectorFloat the frequencies of the spectral peaks [Hz]
magnitudes
VectorFloat the magnitudes of the spectral peaks
bandPreset
boolean <optional> true whether to use a band preset
bandSplitFrequency
number <optional> 500 the split frequency for low and high bands, not used if bandPreset is false [Hz]
harmonics
number <optional> 0 number of harmonics for frequency contribution, 0 indicates exclusive fundamental frequency contribution
maxFrequency
number <optional> 5000 the maximum frequency that contributes to the HPCP [Hz] (the difference between the max and split frequencies must not be less than 200.0 Hz)
maxShifted
boolean <optional> false whether to shift the HPCP vector so that the maximum peak is at index 0
minFrequency
number <optional> 40 the minimum frequency that contributes to the HPCP [Hz] (the difference between the min and split frequencies must not be less than 200.0 Hz)
nonLinear
boolean <optional> false apply non-linear post-processing to the output (use with normalized='unitMax'). Boosts values close to 1, decreases values close to 0.
normalized
string <optional> unitMax whether to normalize the HPCP vector
referenceFrequency
number <optional> 440 the reference frequency for semitone index calculation, corresponding to A3 [Hz]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
size
number <optional> 12 the size of the output HPCP (must be a positive nonzero multiple of 12)
weightType
string <optional> squaredCosine type of weighting function for determining frequency contribution
windowSize
number <optional> 1 the size, in semitones, of the window used for the weighting
Returns
Details
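Example
To illustrate how a peak frequency relates to a pitch-class bin relative to referenceFrequency, the sketch below maps a frequency to its nearest bin in a profile of the given size, with bin 0 at the reference pitch class (A when the reference is 440 Hz). This is only the nearest-bin mapping: the weighting window, harmonic contributions and normalisation of HPCP are not reproduced. `pitchClassIndex` is a hypothetical helper name.

```javascript
// Nearest pitch-class bin for a frequency, relative to a reference frequency.
// `size` must be a positive multiple of 12 (12 bins = one per semitone).
function pitchClassIndex(frequency, referenceFrequency = 440, size = 12) {
  if (frequency <= 0) throw new Error("frequency must be positive");
  const semitones = 12 * Math.log2(frequency / referenceFrequency);
  const bin = Math.round(semitones * (size / 12));
  // Wrap into [0, size), also for frequencies below the reference.
  return ((bin % size) + size) % size;
}

console.log(pitchClassIndex(440));    // A: same class as the reference
console.log(pitchClassIndex(880));    // an octave above maps to the same class
console.log(pitchClassIndex(523.25)); // C, three semitones above A
```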
-
HarmonicBpm( bpms [, bpm [, threshold [, tolerance ] ] ] ) → {object}
-
Description
This algorithm extracts bpms that are harmonically related to the tempo given by the 'bpm' parameter. The algorithm assumes a certain bpm is harmonically related to the parameter bpm when the greatest common divisor between both bpms is greater than threshold. The 'tolerance' parameter is needed in order to decide whether two bpms are related. For instance, 120, 122 and 236 may be related or not depending on how much tolerance is given. Check https://essentia.upf.edu/reference/std_HarmonicBpm.html for more details.
Parameters
Name Type Attributes Default Description bpms
VectorFloat list of bpm candidates
bpm
number <optional> 60 the bpm used to find its harmonics
threshold
number <optional> 20 bpm threshold below which greatest common divisors are discarded
tolerance
number <optional> 5 percentage tolerance within which two bpms are considered equal or equal to a harmonic
Returns
Details
-
HarmonicPeaks( frequencies, magnitudes, pitch [, maxHarmonics [, tolerance ] ] ) → {object}
-
Description
This algorithm finds the harmonic peaks of a signal given its spectral peaks and its fundamental frequency. Note:
- "tolerance" parameter defines the allowed fixed deviation from ideal harmonics, being a percentage over the F0. For example: if the F0 is 100Hz you may decide to allow a deviation of 20%, that is a fixed deviation of 20Hz; for the harmonic series it is: [180-220], [280-320], [380-420], etc.
- If "pitch" is zero, it means its value is unknown, or the sound is unpitched, and in that case the HarmonicPeaks algorithm returns an empty vector.
- The output frequency and magnitude vectors are of size "maxHarmonics". If a particular harmonic was not found among spectral peaks, its ideal frequency value is output together with 0 magnitude. This algorithm is intended to receive its "frequencies" and "magnitudes" inputs from the SpectralPeaks algorithm.
- When input vectors differ in size or are empty, an exception is thrown. Input vectors must be ordered by ascending frequency excluding DC components and not contain duplicates, otherwise an exception is thrown.
Check https://essentia.upf.edu/reference/std_HarmonicPeaks.html for more details.
Parameters
Name Type Attributes Default Description frequencies
VectorFloat the frequencies of the spectral peaks [Hz] (ascending order)
magnitudes
VectorFloat the magnitudes of the spectral peaks (ascending frequency order)
pitch
number an estimate of the fundamental frequency of the signal [Hz]
maxHarmonics
number <optional> 20 the number of harmonics to return including F0
tolerance
number <optional> 0.2 the allowed ratio deviation from ideal harmonics
Returns
Details
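Example
The tolerance example from the description (F0 of 100 Hz, 20% tolerance, giving search ranges [180-220], [280-320], ...) can be reproduced with a short sketch that lists the allowed frequency window for each harmonic. `harmonicWindows` is a hypothetical helper name; the peak matching itself is not shown.

```javascript
// Allowed frequency window around each ideal harmonic h * pitch.
// The deviation is fixed: tolerance * pitch, independent of the harmonic number.
function harmonicWindows(pitch, maxHarmonics = 20, tolerance = 0.2) {
  if (pitch <= 0) return []; // unknown or unpitched: nothing to search for
  const deviation = tolerance * pitch;
  const windows = [];
  for (let h = 1; h <= maxHarmonics; h++) {
    const ideal = h * pitch;
    windows.push({ harmonic: h, ideal, low: ideal - deviation, high: ideal + deviation });
  }
  return windows;
}

// Windows for the first four harmonics of a 100 Hz fundamental.
for (const w of harmonicWindows(100, 4, 0.2)) {
  console.log(w.harmonic, w.low.toFixed(1), w.high.toFixed(1));
}
```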
-
HighPass( signal [, cutoffFrequency [, sampleRate ] ] ) → {object}
-
Description
This algorithm implements a 1st order IIR high-pass filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_HighPass.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal
cutoffFrequency
number <optional> 1500 the cutoff frequency for the filter [Hz]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
HighResolutionFeatures( hpcp [, maxPeaks ] ) → {object}
-
Description
This algorithm computes high-resolution chroma features from an HPCP vector. The vector's size must be a multiple of 12 and it is recommended that it be larger than 120. In other words, the HPCP's resolution should be 10 cents or more. Check https://essentia.upf.edu/reference/std_HighResolutionFeatures.html for the list of high-resolution features being computed and for more details.
Parameters
Name Type Attributes Default Description hpcp
VectorFloat the HPCPs, preferably of size >= 120
maxPeaks
number <optional> 24 maximum number of HPCP peaks to consider when calculating outputs
Returns
Details
-
Histogram( array [, maxValue [, minValue [, normalize [, numberBins ] ] ] ] ) → {object}
-
Description
This algorithm computes a histogram. Values outside the range are ignored. Check https://essentia.upf.edu/reference/std_Histogram.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array
maxValue
number <optional> 1 the max value of the histogram
minValue
number <optional> 0 the min value of the histogram
normalize
string <optional> none the normalization setting.
numberBins
number <optional> 10 the number of bins
Returns
Details
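As a rough illustration of the binning described above, the following sketch counts values into equal-width bins and skips anything outside `[minValue, maxValue]`. `histogramSketch` is a hypothetical name; placing a value equal to `maxValue` in the last bin is an assumption, and no normalization is applied.

```javascript
// Hypothetical sketch (not the Essentia implementation): equal-width bins,
// values outside [minValue, maxValue] are ignored.
function histogramSketch(values, minValue = 0, maxValue = 1, numberBins = 10) {
  const counts = new Array(numberBins).fill(0);
  const binWidth = (maxValue - minValue) / numberBins;
  for (const v of values) {
    if (v < minValue || v > maxValue) continue; // out of range: ignored
    // Assumption: a value equal to maxValue falls into the last bin.
    const idx = Math.min(Math.floor((v - minValue) / binWidth), numberBins - 1);
    counts[idx] += 1;
  }
  return counts;
}

// 1.5 and -0.2 fall outside the default range and are not counted.
console.log(histogramSketch([0.05, 0.15, 0.95, 1.5, -0.2]));
```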
-
HprModelAnal( frame, pitch [, fftSize [, freqDevOffset [, freqDevSlope [, harmDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, nHarmonics [, orderBy [, sampleRate [, stocf ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the harmonic plus residual model analysis. Check https://essentia.upf.edu/reference/std_HprModelAnal.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input frame
pitch
number external pitch input [Hz].
fftSize
number <optional> 2048 the size of the internal FFT (full spectrum size)
freqDevOffset
number <optional> 20 minimum frequency deviation at 0Hz
freqDevSlope
number <optional> 0.01 slope increase of minimum frequency deviation
harmDevSlope
number <optional> 0.01 slope increase of minimum frequency deviation
hopSize
number <optional> 512 the hop size between frames
magnitudeThreshold
number <optional> 0 peaks below this given threshold are not outputted
maxFrequency
number <optional> 5000 the maximum frequency of the range to evaluate [Hz]
maxPeaks
number <optional> 100 the maximum number of returned peaks
maxnSines
number <optional> 100 maximum number of sines per frame
minFrequency
number <optional> 20 the minimum frequency of the range to evaluate [Hz]
nHarmonics
number <optional> 100 maximum number of harmonics per frame
orderBy
string <optional> frequency the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
stocf
number <optional> 0.2 decimation factor used for the stochastic approximation
Returns
Details
-
HpsModelAnal( frame, pitch [, fftSize [, freqDevOffset [, freqDevSlope [, harmDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, nHarmonics [, orderBy [, sampleRate [, stocf ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the harmonic plus stochastic model analysis. Check https://essentia.upf.edu/reference/std_HpsModelAnal.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input frame
pitch
number external pitch input [Hz].
fftSize
number <optional> 2048 the size of the internal FFT (full spectrum size)
freqDevOffset
number <optional> 20 minimum frequency deviation at 0Hz
freqDevSlope
number <optional> 0.01 slope increase of minimum frequency deviation
harmDevSlope
number <optional> 0.01 slope increase of minimum frequency deviation
hopSize
number <optional> 512 the hop size between frames
magnitudeThreshold
number <optional> 0 peaks below this given threshold are not outputted
maxFrequency
number <optional> 5000 the maximum frequency of the range to evaluate [Hz]
maxPeaks
number <optional> 100 the maximum number of returned peaks
maxnSines
number <optional> 100 maximum number of sines per frame
minFrequency
number <optional> 20 the minimum frequency of the range to evaluate [Hz]
nHarmonics
number <optional> 100 maximum number of harmonics per frame
orderBy
string <optional> frequency the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
stocf
number <optional> 0.2 decimation factor used for the stochastic approximation
Returns
Details
-
IDCT( dct [, dctType [, inputSize [, liftering [, outputSize ] ] ] ] ) → {object}
-
Description
This algorithm computes the Inverse Discrete Cosine Transform of an array. It can be configured to perform the inverse DCT-II form, with the 1/sqrt(2) scaling factor for the first coefficient or the inverse DCT-III form based on the HTK implementation. Check https://essentia.upf.edu/reference/std_IDCT.html for more details.
Parameters
Name Type Attributes Default Description dct
VectorFloat the discrete cosine transform
dctType
number <optional> 2 the DCT type
inputSize
number <optional> 10 the size of the input array
liftering
number <optional> 0 the liftering coefficient. Use '0' to bypass it
outputSize
number <optional> 10 the number of output coefficients
Returns
Details
-
IIR( signal [, denominator [, numerator ] ] ) → {object}
-
Description
This algorithm implements a standard IIR filter. It filters the data in the input vector with the filter described by parameter vectors 'numerator' and 'denominator' to create the output filtered vector. In the literature, the numerator is often referred to as the 'B' coefficients and the denominator as the 'A' coefficients. Check https://essentia.upf.edu/reference/std_IIR.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
denominator
Array.<any> <optional> [1] the list of coefficients of the denominator. Often referred to as the A coefficient vector.
numerator
Array.<any> <optional> [1] the list of coefficients of the numerator. Often referred to as the B coefficient vector.
Returns
Details
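The role of the B (numerator) and A (denominator) coefficients can be sketched with the textbook difference equation y[n] = (Σ b[k]·x[n-k] − Σ a[k]·y[n-k]) / a[0]. `iirSketch` is a hypothetical pure-JavaScript helper that mirrors the argument order shown above; it is not the essentia.js call and works on plain arrays rather than VectorFloat.

```javascript
// Hypothetical sketch of a direct-form IIR filter on plain arrays.
// Samples before the start of the signal are treated as zero.
function iirSketch(signal, denominator = [1], numerator = [1]) {
  const a0 = denominator[0];
  const out = new Array(signal.length).fill(0);
  for (let n = 0; n < signal.length; n++) {
    let acc = 0;
    // Feed-forward part: B coefficients applied to past inputs.
    for (let k = 0; k < numerator.length; k++) {
      if (n - k >= 0) acc += numerator[k] * signal[n - k];
    }
    // Feedback part: A coefficients (excluding a[0]) applied to past outputs.
    for (let k = 1; k < denominator.length; k++) {
      if (n - k >= 0) acc -= denominator[k] * out[n - k];
    }
    out[n] = acc / a0;
  }
  return out;
}

// One-pole example: y[n] = x[n] + 0.5 * y[n-1], fed with a unit impulse.
console.log(iirSketch([1, 0, 0], [1, -0.5], [1]));
```

With the default coefficients (`[1]` and `[1]`) the sketch returns the input unchanged.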
-
Inharmonicity( frequencies, magnitudes ) → {object}
-
Description
This algorithm calculates the inharmonicity of a signal given its spectral peaks. The inharmonicity value is computed as an energy weighted divergence of the spectral components from their closest multiple of the fundamental frequency. The fundamental frequency is taken as the first spectral peak from the input. The inharmonicity value ranges from 0 (purely harmonic signal) to 1 (inharmonic signal). Check https://essentia.upf.edu/reference/std_Inharmonicity.html for more details.
Parameters
Name Type Description frequencies
VectorFloat the frequencies of the harmonic peaks [Hz] (in ascending order)
magnitudes
VectorFloat the magnitudes of the harmonic peaks (in ascending frequency order)
Returns
Details
-
InstantPower( array ) → {object}
-
Description
This algorithm computes the instant power of an array. That is, the energy of the array over its size. Check https://essentia.upf.edu/reference/std_InstantPower.html for more details.
Parameters
Name Type Description array
VectorFloat the input array
Returns
Details
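The definition above (energy of the array over its size) can be written out directly. `instantPowerSketch` is a hypothetical helper for illustration and operates on a plain array or typed array.

```javascript
// Hypothetical sketch: instant power = sum of squares / number of elements.
function instantPowerSketch(array) {
  let energy = 0;
  for (const x of array) energy += x * x;
  return energy / array.length;
}

// Energy is 1 + 4 + 9 = 14, divided by 3 elements.
console.log(instantPowerSketch([1, 2, 3]));
```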
-
Intensity( signal [, sampleRate ] ) → {object}
-
Description
This algorithm classifies the input audio signal as either relaxed (-1), moderate (0), or aggressive (1). Check https://essentia.upf.edu/reference/std_Intensity.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal
sampleRate
number <optional> 44100 the input audio sampling rate [Hz]
Returns
Details
-
Key( pcp [, numHarmonics [, pcpSize [, profileType [, slope [, useMajMin [, usePolyphony [, useThreeChords ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes a key estimate given a pitch class profile (HPCP). The algorithm was heavily adapted and changed from the original implementation for readability and speed. Check https://essentia.upf.edu/reference/std_Key.html for more details.
Parameters
Name Type Attributes Default Description pcp
VectorFloat the input pitch class profile
numHarmonics
number <optional> 4 number of harmonics that should contribute to the polyphonic profile (1 only considers the fundamental harmonic)
pcpSize
number <optional> 36 number of array elements used to represent a semitone times 12 (this parameter is only a hint, during computation, the size of the input PCP is used instead)
profileType
string <optional> bgate the type of polyphonic profile to use for correlation calculation
slope
number <optional> 0.6 value of the slope of the exponential harmonic contribution to the polyphonic profile
useMajMin
boolean <optional> false use a third profile called 'majmin' for ambiguous tracks [4]. Only available for the edma, bgate and braw profiles
usePolyphony
boolean <optional> true enables the use of polyphonic profiles to define key profiles (this includes the contributions from triads as well as pitch harmonics)
useThreeChords
boolean <optional> true consider only the 3 main triad chords of the key (T, D, SD) to build the polyphonic profiles
Returns
Details
-
KeyExtractor( audio [, averageDetuningCorrection [, frameSize [, hopSize [, hpcpSize [, maxFrequency [, maximumSpectralPeaks [, minFrequency [, pcpThreshold [, profileType [, sampleRate [, spectralPeaksThreshold [, tuningFrequency [, weightType [, windowType ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm extracts key/scale for an audio signal. It computes HPCP frames for the input signal and applies key estimation using the Key algorithm. Check https://essentia.upf.edu/reference/std_KeyExtractor.html for more details.
Parameters
Name Type Attributes Default Description audio
VectorFloat the audio input signal
averageDetuningCorrection
boolean <optional> true shifts a pcp to the nearest tempered bin
frameSize
number <optional> 4096 the framesize for computing tonal features
hopSize
number <optional> 4096 the hopsize for computing tonal features
hpcpSize
number <optional> 12 the size of the output HPCP (must be a positive nonzero multiple of 12)
maxFrequency
number <optional> 3500 max frequency to apply whitening to [Hz]
maximumSpectralPeaks
number <optional> 60 the maximum number of spectral peaks
minFrequency
number <optional> 25 min frequency to apply whitening to [Hz]
pcpThreshold
number <optional> 0.2 pcp bins below this value are set to 0
profileType
string <optional> bgate the type of polyphonic profile to use for correlation calculation
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
spectralPeaksThreshold
number <optional> 0.0001 the threshold for the spectral peaks
tuningFrequency
number <optional> 440 the tuning frequency of the input signal
weightType
string <optional> cosine type of weighting function for determining frequency contribution
windowType
string <optional> hann the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
Returns
Details
-
LPC( frame [, order [, sampleRate [, type ] ] ] ) → {object}
-
Description
This algorithm computes Linear Predictive Coefficients and associated reflection coefficients of a signal. Check https://essentia.upf.edu/reference/std_LPC.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input audio frame
order
number <optional> 10 the order of the LPC analysis (typically [8,14])
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
type
string <optional> regular the type of LPC (regular or warped)
Returns
Details
-
Larm( signal [, attackTime [, power [, releaseTime [, sampleRate ] ] ] ] ) → {object}
-
Description
This algorithm estimates the long-term loudness of an audio signal. The LARM model is based on the asymmetrical low-pass filtering of the Peak Program Meter (PPM), combined with Revised Low-frequency B-weighting (RLB) and power mean calculations. LARM has been shown to be a reliable and objective loudness estimate of music and speech. Check https://essentia.upf.edu/reference/std_Larm.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the audio input signal
attackTime
number <optional> 10 the attack time of the first order lowpass in the attack phase [ms]
power
number <optional> 1.5 the power used for averaging
releaseTime
number <optional> 1500 the release time of the first order lowpass in the release phase [ms]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
Leq( signal ) → {object}
-
Description
This algorithm computes the Equivalent sound level (Leq) of an audio signal. The Leq measure can be derived from the Revised Low-frequency B-weighting (RLB) or from the raw signal as described in [1]. If the signal contains no energy, Leq defaults to Essentia's definition of silence, which is -90dB. This algorithm will throw an exception on empty input. Check https://essentia.upf.edu/reference/std_Leq.html for more details.
Parameters
Name Type Description signal
VectorFloat the input signal (must be non-empty)
Returns
Details
-
LevelExtractor( signal [, frameSize [, hopSize ] ] ) → {object}
-
Description
This algorithm extracts the loudness of an audio signal in frames using the Loudness algorithm. Check https://essentia.upf.edu/reference/std_LevelExtractor.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the audio input signal
frameSize
number <optional> 88200 frame size to compute loudness
hopSize
number <optional> 44100 hop size to compute loudness
Returns
Details
-
LogAttackTime( signal [, sampleRate [, startAttackThreshold [, stopAttackThreshold ] ] ] ) → {object}
-
Description
This algorithm computes the log (base 10) of the attack time of a signal envelope. The attack time is defined as the time duration from when the sound becomes perceptually audible to when it reaches its maximum intensity. By default, the start of the attack is estimated as the point where the signal envelope reaches 20% of its maximum value in order to account for possible noise presence. Also by default, the end of the attack is estimated as the point where the signal envelope has reached 90% of its maximum value, in order to account for the possibility that the max value occurs after the logAttack, as in trumpet sounds. Check https://essentia.upf.edu/reference/std_LogAttackTime.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal envelope (must be non-empty)
sampleRate
number <optional> 44100 the audio sampling rate [Hz]
startAttackThreshold
number <optional> 0.2 the percentage of the input signal envelope at which the starting point of the attack is considered
stopAttackThreshold
number <optional> 0.9 the percentage of the input signal envelope at which the ending point of the attack is considered
Returns
Details
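The threshold logic described above can be sketched as follows. `logAttackTimeSketch` is a hypothetical, simplified helper: it takes the first sample that reaches each threshold, assumes a non-negative envelope with a positive peak, and clamps the attack to at least one sample so the logarithm stays finite. The actual Essentia algorithm may locate the two points differently.

```javascript
// Hypothetical, simplified sketch of the 20% / 90% attack thresholds.
function logAttackTimeSketch(envelope, sampleRate = 44100,
                             startAttackThreshold = 0.2, stopAttackThreshold = 0.9) {
  if (envelope.length === 0) throw new Error('envelope must be non-empty');
  let peak = envelope[0];
  for (const v of envelope) if (v > peak) peak = v;
  const values = Array.from(envelope);
  // First sample reaching each fraction of the peak.
  const startIdx = values.findIndex((v) => v >= startAttackThreshold * peak);
  const stopIdx = values.findIndex((v) => v >= stopAttackThreshold * peak);
  // Assumption: clamp to one sample to avoid log10(0).
  const attackSamples = Math.max(stopIdx - startIdx, 1);
  return Math.log10(attackSamples / sampleRate);
}

// Envelope sampled at 10 Hz: the attack runs from index 2 to index 5 (0.3 s).
console.log(logAttackTimeSketch([0, 0.1, 0.25, 0.5, 0.75, 0.95, 1.0, 0.8], 10));
```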
-
LogSpectrum( spectrum [, binsPerSemitone [, frameSize [, rollOn [, sampleRate ] ] ] ] ) → {object}
-
Description
This algorithm computes spectrum with logarithmically distributed frequency bins. This code is ported from NNLS Chroma [1, 2]. This algorithm also returns a local tuning that is retrieved for input frame and a global tuning that is updated with a moving average. Check https://essentia.upf.edu/reference/std_LogSpectrum.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat spectrum frame
binsPerSemitone
number <optional> 3 bins per semitone
frameSize
number <optional> 1025 the input frame size of the spectrum vector
rollOn
number <optional> 0 this removes low-frequency noise - useful in quiet recordings
sampleRate
number <optional> 44100 the input sample rate
Returns
Details
-
LoopBpmConfidence( signal, bpmEstimate [, sampleRate ] ) → {object}
-
Description
This algorithm takes an audio signal and a BPM estimate for that signal and predicts the reliability of the BPM estimate in a value from 0 to 1. The audio signal is assumed to be a musical loop with constant tempo. The confidence returned is based on comparing the duration of the signal with multiples of the BPM estimate (see [1] for more details). Check https://essentia.upf.edu/reference/std_LoopBpmConfidence.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat loop audio signal
bpmEstimate
number estimated BPM for the audio signal
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
LoopBpmEstimator( signal [, confidenceThreshold ] ) → {object}
-
Description
This algorithm estimates the BPM of audio loops. It internally uses the PercivalBpmEstimator algorithm to produce a BPM estimate and LoopBpmConfidence to assess the reliability of the estimate. If the confidence of the estimate is below the given confidenceThreshold, the algorithm outputs a BPM of 0.0, otherwise it outputs the estimated BPM. For more details on the BPM estimation method and the confidence measure please check the used algorithms. Check https://essentia.upf.edu/reference/std_LoopBpmEstimator.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
confidenceThreshold
number <optional> 0.95 confidence threshold below which bpm estimate will be considered unreliable
Returns
Details
-
Loudness( signal ) → {object}
-
Description
This algorithm computes the loudness of an audio signal defined by Stevens' power law. It computes loudness as the energy of the signal raised to the power of 0.67. Check https://essentia.upf.edu/reference/std_Loudness.html for more details.
Parameters
Name Type Description signal
VectorFloat the input signal
Returns
Details
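The formula stated above (energy raised to the power of 0.67) is short enough to write out. `loudnessSketch` is a hypothetical helper that takes the energy as the sum of squared samples.

```javascript
// Hypothetical sketch: loudness = (sum of squared samples) ^ 0.67.
function loudnessSketch(signal) {
  let energy = 0;
  for (const x of signal) energy += x * x;
  return Math.pow(energy, 0.67);
}

// Two unit samples give an energy of 2, so the result is 2 ^ 0.67.
console.log(loudnessSketch([1, 1]));
```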
-
LoudnessVickers( signal [, sampleRate ] ) → {object}
-
Description
This algorithm computes Vickers's loudness of an audio signal. Currently, this algorithm only works for signals with a 44100Hz sampling rate. This algorithm is meant to be given frames of audio as input (not entire audio signals). The algorithm described in the paper performs a weighted average of the loudness value computed for each of the given frames, this step is left as a post processing step and is not performed by this algorithm. Check https://essentia.upf.edu/reference/std_LoudnessVickers.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
sampleRate
number <optional> 44100 the audio sampling rate of the input signal which is used to create the weight vector [Hz] (currently, this algorithm only works on signals with a sampling rate of 44100Hz)
Returns
Details
-
LowLevelSpectralEqloudExtractor( signal [, frameSize [, hopSize [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm extracts a set of low-level spectral features for which it is recommended to apply a preliminary equal-loudness filter over an input audio signal (according to the internal evaluations conducted at Music Technology Group). To this end, you are expected to provide the output of the EqualLoudness algorithm as an input for this algorithm. Still, you are free to provide an unprocessed audio input in case you want to compute these features without an equal-loudness filter. Check https://essentia.upf.edu/reference/std_LowLevelSpectralEqloudExtractor.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal
frameSize
number <optional> 2048 the frame size for computing low level features
hopSize
number <optional> 1024 the hop size for computing low level features
sampleRate
number <optional> 44100 the audio sampling rate
Returns
Details
-
LowLevelSpectralExtractor( signal [, frameSize [, hopSize [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm extracts all low-level spectral features, which do not require an equal-loudness filter for their computation, from an audio signal. Check https://essentia.upf.edu/reference/std_LowLevelSpectralExtractor.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the audio input signal
frameSize
number <optional> 2048 the frame size for computing low level features
hopSize
number <optional> 1024 the hop size for computing low level features
sampleRate
number <optional> 44100 the audio sampling rate
Returns
Details
-
LowPass( signal [, cutoffFrequency [, sampleRate ] ] ) → {object}
-
Description
This algorithm implements a 1st order IIR low-pass filter. Because of its dependence on IIR, IIR's requirements are inherited. References: [1] U. Zölzer, DAFX - Digital Audio Effects, p. 40, John Wiley & Sons, 2002 Check https://essentia.upf.edu/reference/std_LowPass.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal
cutoffFrequency
number <optional> 1500 the cutoff frequency for the filter [Hz]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
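To give an idea of what a 1st order IIR low-pass looks like in practice, here is a sketch of one common allpass-based design, with c = (tan(π·fc/fs) − 1) / (tan(π·fc/fs) + 1), B = [(1+c)/2, (1+c)/2] and A = [1, c]. `firstOrderLowPassSketch` is a hypothetical helper, and whether Essentia uses exactly these coefficients is an assumption.

```javascript
// Hypothetical sketch of a first-order low-pass (allpass-based design).
// Assumption: these coefficients are illustrative, not taken from Essentia.
function firstOrderLowPassSketch(signal, cutoffFrequency = 1500, sampleRate = 44100) {
  const t = Math.tan((Math.PI * cutoffFrequency) / sampleRate);
  const c = (t - 1) / (t + 1);
  const b0 = (1 + c) / 2;
  const b1 = (1 + c) / 2;
  const out = new Array(signal.length);
  let prevX = 0; // x[n-1]
  let prevY = 0; // y[n-1]
  for (let n = 0; n < signal.length; n++) {
    // y[n] = b0*x[n] + b1*x[n-1] - c*y[n-1]
    const y = b0 * signal[n] + b1 * prevX - c * prevY;
    out[n] = y;
    prevX = signal[n];
    prevY = y;
  }
  return out;
}

// A constant (DC) input passes through with unit gain once the filter settles.
const settled = firstOrderLowPassSketch(new Array(2000).fill(1));
console.log(settled[settled.length - 1]);
```

In this design the gain is 1 at DC and 0 at the Nyquist frequency, which is the behaviour expected of a low-pass filter.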
-
MFCC( spectrum [, dctType [, highFrequencyBound [, inputSize [, liftering [, logType [, lowFrequencyBound [, normalize [, numberBands [, numberCoefficients [, sampleRate [, silenceThreshold [, type [, warpingFormula [, weighting ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the mel-frequency cepstrum coefficients of a spectrum. As there is no standard implementation, the MFCC-FB40 is used by default: - filterbank of 40 bands from 0 to 11000Hz - take the log value of the spectrum energy in each mel band. Band energy values below the silence threshold will be clipped to its value before computing log-energies - DCT of the 40 bands down to 13 mel coefficients. There is a paper describing various MFCC implementations [1]. Check https://essentia.upf.edu/reference/std_MFCC.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the audio spectrum
dctType
number <optional> 2 the DCT type
highFrequencyBound
number <optional> 11000 the upper bound of the frequency range [Hz]
inputSize
number <optional> 1025 the size of input spectrum
liftering
number <optional> 0 the liftering coefficient. Use '0' to bypass it
logType
string <optional> dbamp logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes
lowFrequencyBound
number <optional> 0 the lower bound of the frequency range [Hz]
normalize
string <optional> unit_sum spectrum bin weights to use for each mel band: 'unit_max' to make each mel band vertex equal to 1, 'unit_sum' to make each mel band area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle mel band area equal to 1 normalizing the weights of each triangle by its bandwidth
numberBands
number <optional> 40 the number of mel-bands in the filter
numberCoefficients
number <optional> 13 the number of output mel coefficients
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
silenceThreshold
number <optional> 1e-10 silence threshold for computing log-energy bands
type
string <optional> power use magnitude or power spectrum
warpingFormula
string <optional> htkMel The scale implementation type: 'htkMel' scale from the HTK toolkit [2, 3] (default) or 'slaneyMel' scale from the Auditory toolbox [4]
weighting
string <optional> warping type of weighting function for determining triangle area
Returns
Details
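The default filterbank layout (40 bands from 0 to 11000 Hz on the 'htkMel' scale) can be illustrated with the widely used HTK mel formula, mel = 2595·log10(1 + f/700). The helpers below are hypothetical and only compute band edge frequencies; spacing the edges uniformly on the mel scale, with `numberBands + 2` edge points, is an assumption about how the triangular filters are laid out.

```javascript
// Hypothetical helpers using the HTK mel formula.
function hzToMelHtk(hz) {
  return 2595 * Math.log10(1 + hz / 700);
}

function melToHzHtk(mel) {
  return 700 * (Math.pow(10, mel / 2595) - 1);
}

// Edge frequencies for numberBands triangular filters: each filter spans
// three consecutive edges, so numberBands + 2 points are needed.
function melBandEdgesSketch(numberBands = 40, lowFrequencyBound = 0, highFrequencyBound = 11000) {
  const lowMel = hzToMelHtk(lowFrequencyBound);
  const highMel = hzToMelHtk(highFrequencyBound);
  const edges = [];
  for (let i = 0; i < numberBands + 2; i++) {
    const mel = lowMel + (i * (highMel - lowMel)) / (numberBands + 1);
    edges.push(melToHzHtk(mel));
  }
  return edges;
}

// Edges are dense at low frequencies and spread out towards 11000 Hz.
console.log(melBandEdgesSketch().map((f) => Math.round(f)));
```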
-
MaxFilter( signal [, causal [, width ] ] ) → {object}
-
Description
This algorithm implements a maximum filter for 1D signals using the van Herk/Gil-Werman (HGW) algorithm. Check https://essentia.upf.edu/reference/std_MaxFilter.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat signal to be filtered
causal
boolean <optional> true use causal filter (window is behind current element otherwise it is centered around)
width
number <optional> 3 the window size, has to be odd if the window is centered
Returns
Details
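The difference between the causal and centered windows can be seen with a naive maximum filter. `maxFilterNaive` is a hypothetical helper that scans each window directly instead of using the HGW algorithm, and truncating the window at the signal edges is an assumption.

```javascript
// Hypothetical naive maximum filter (O(n * width)), not the HGW algorithm.
// Assumption: windows are truncated at the signal boundaries.
function maxFilterNaive(signal, causal = true, width = 3) {
  const half = Math.floor(width / 2);
  return Array.from(signal, (_, n) => {
    // Causal: window ends at n. Centered: window is symmetric around n.
    const start = Math.max(causal ? n - width + 1 : n - half, 0);
    const end = Math.min(causal ? n : n + half, signal.length - 1);
    let best = -Infinity;
    for (let i = start; i <= end; i++) {
      if (signal[i] > best) best = signal[i];
    }
    return best;
  });
}

const input = [1, 3, 2, 0, 0, 5];
console.log(maxFilterNaive(input, true, 3));  // window behind the current element
console.log(maxFilterNaive(input, false, 3)); // window centered on the current element
```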
-
MaxMagFreq( spectrum [, sampleRate ] ) → {object}
-
Description
This algorithm computes the frequency with the largest magnitude in a spectrum. Note that a spectrum must contain at least two elements, otherwise an exception is thrown. Check https://essentia.upf.edu/reference/std_MaxMagFreq.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input spectrum (must have more than 1 element)
sampleRate
number <optional> 44100 the audio sampling rate [Hz]
Returns
Details
-
MaxToTotal( envelope ) → {object}
-
Description
This algorithm computes the ratio between the index of the maximum value of the envelope of a signal and the total length of the envelope. This ratio shows how much the maximum amplitude is off-center. Its value is close to 0 if the maximum is close to the beginning (e.g. Decrescendo or Impulsive sounds), close to 0.5 if it is close to the middle (e.g. Delta sounds) and close to 1 if it is close to the end of the sound (e.g. Crescendo sounds). This algorithm is intended to be fed by the output of the Envelope algorithm. Check https://essentia.upf.edu/reference/std_MaxToTotal.html for more details.
Parameters
Name Type Description envelope
VectorFloat the envelope of the signal
Returns
Details
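The ratio described above can be sketched in a few lines. `maxToTotalSketch` is a hypothetical helper that divides the index of the first maximum by the envelope length; the exact normalization used by Essentia may differ slightly.

```javascript
// Hypothetical sketch: index of the (first) maximum over the envelope length.
function maxToTotalSketch(envelope) {
  let maxIdx = 0;
  for (let i = 1; i < envelope.length; i++) {
    if (envelope[i] > envelope[maxIdx]) maxIdx = i;
  }
  return maxIdx / envelope.length;
}

console.log(maxToTotalSketch([1, 0.5, 0.2, 0.1]));      // decrescendo-like: near 0
console.log(maxToTotalSketch([0.1, 0.5, 1, 0.5, 0.1])); // peak in the middle
console.log(maxToTotalSketch([0, 0.2, 0.6, 1]));        // crescendo-like: near 1
```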
-
Mean( array ) → {object}
-
Description
This algorithm computes the mean of an array. Check https://essentia.upf.edu/reference/std_Mean.html for more details.
Parameters
Name Type Description array
VectorFloat the input array
Returns
Details
-
Median( array ) → {object}
-
Description
This algorithm computes the median of an array. When there is an odd number of numbers, the median is simply the middle number. For example, the median of 2, 4, and 7 is 4. When there is an even number of numbers, the median is the mean of the two middle numbers. Thus, the median of the numbers 2, 4, 7, 12 is (4+7)/2 = 5.5. See [1] for more info. Check https://essentia.upf.edu/reference/std_Median.html for more details.
Parameters
Name Type Description array
VectorFloat the input array (must be non-empty)
Returns
Details
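The odd and even cases from the description can be checked with a short sketch. `medianSketch` is a hypothetical helper on plain arrays; it sorts a copy so the input is left untouched.

```javascript
// Hypothetical sketch of the median as described above.
function medianSketch(array) {
  if (array.length === 0) throw new Error('input array must be non-empty');
  const sorted = Array.from(array).sort((a, b) => a - b); // numeric sort on a copy
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2 === 1
    ? sorted[mid]                          // odd count: middle element
    : (sorted[mid - 1] + sorted[mid]) / 2; // even count: mean of the two middle elements
}

console.log(medianSketch([2, 4, 7]));     // odd number of values
console.log(medianSketch([2, 4, 7, 12])); // even number of values
```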
-
MedianFilter( array [, kernelSize ] ) → {object}
-
Description
This algorithm computes the median filtered version of the input signal given the kernel size, as detailed in [1]. Check https://essentia.upf.edu/reference/std_MedianFilter.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array (must be non-empty)
kernelSize
number <optional> 11 scalar giving the size of the median filter window. Must be odd
Returns
Details
-
MelBands( spectrum [, highFrequencyBound [, inputSize [, log [, lowFrequencyBound [, normalize [, numberBands [, sampleRate [, type [, warpingFormula [, weighting ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes energy in mel bands of a spectrum. It applies a frequency-domain filterbank (MFCC FB-40, [1]), which consists of equal area triangular filters spaced according to the mel scale. The filterbank is normalized in such a way that the sum of coefficients for every filter equals one. It is recommended that the input "spectrum" be calculated by the Spectrum algorithm. Check https://essentia.upf.edu/reference/std_MelBands.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the audio spectrum
highFrequencyBound
number <optional> 22050 an upper-bound limit for the frequencies to be included in the bands
inputSize
number <optional> 1025 the size of the spectrum
log
boolean <optional> false compute log-energies (log10 (1 + energy))
lowFrequencyBound
number <optional> 0 a lower-bound limit for the frequencies to be included in the bands
normalize
string <optional> unit_sum spectrum bin weights to use for each mel band: 'unit_max' to make each mel band vertex equal to 1, 'unit_sum' to make each mel band area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle mel band area equal to 1 normalizing the weights of each triangle by its bandwidth
numberBands
number <optional> 24 the number of output bands
sampleRate
number <optional> 44100 the sample rate
type
string <optional> power 'power' to output squared units, 'magnitude' to keep it as the input
warpingFormula
string <optional> htkMel The scale implementation type: 'htkMel' scale from the HTK toolkit [2, 3] (default) or 'slaneyMel' scale from the Auditory toolbox [4]
weighting
string <optional> warping type of weighting function for determining triangle area
Returns
Details
-
Meter( beatogram ) → {object}
-
Description
This algorithm estimates the time signature of a given beatogram by finding the highest correlation between beats. Check https://essentia.upf.edu/reference/std_Meter.html for more details.
Parameters
Name Type Description beatogram
VectorVectorFloat filtered matrix loudness
Returns
Details
-
MinMax( array [, type ] ) → {object}
-
Description
This algorithm calculates the minimum or maximum value of an array. If the array has more than one minimum or maximum value, the index of the first one is returned. Check https://essentia.upf.edu/reference/std_MinMax.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array
type
string <optional> min the type of the operation
Returns
Details
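The first-occurrence rule can be illustrated as follows. `minMaxSketch` is a hypothetical helper, and the `value` and `index` field names of its result are assumptions chosen for this example, not the output names of the essentia.js method.

```javascript
// Hypothetical sketch: returns the extreme value and the index of its
// first occurrence. Field names are illustrative assumptions.
function minMaxSketch(array, type = 'min') {
  let index = 0;
  for (let i = 1; i < array.length; i++) {
    // Strict comparison keeps the first occurrence on ties.
    const better = type === 'max' ? array[i] > array[index] : array[i] < array[index];
    if (better) index = i;
  }
  return { value: array[index], index };
}

console.log(minMaxSketch([3, 1, 4, 1, 5]));        // minimum, first of the two 1s
console.log(minMaxSketch([3, 1, 4, 1, 5], 'max')); // maximum
```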
-
MinToTotal( envelope ) → {object}
-
Description
This algorithm computes the ratio between the index of the minimum value of the envelope of a signal and the total length of the envelope. Check https://essentia.upf.edu/reference/std_MinToTotal.html for more details.
Parameters
Name Type Description envelope
VectorFloat the envelope of the signal
Returns
Details
-
MovingAverage( signal [, size ] ) → {object}
-
Description
This algorithm implements a FIR Moving Average filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_MovingAverage.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal
size
number <optional> 6 the size of the window [audio samples]
Returns
Details
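A moving average is a FIR filter whose `size` coefficients all equal 1/size. `movingAverageSketch` is a hypothetical helper that keeps a running sum and treats samples before the start of the signal as zero, which is an assumption about the start-up behaviour.

```javascript
// Hypothetical sketch: y[n] = (x[n] + x[n-1] + ... + x[n-size+1]) / size,
// with samples before the start of the signal taken as zero.
function movingAverageSketch(signal, size = 6) {
  const out = new Array(signal.length);
  let running = 0;
  for (let n = 0; n < signal.length; n++) {
    running += signal[n];
    if (n >= size) running -= signal[n - size]; // drop the sample leaving the window
    out[n] = running / size;
  }
  return out;
}

console.log(movingAverageSketch([2, 4, 6], 2));
```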
-
MultiPitchKlapuri( signal [, binResolution [, frameSize [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minFrequency [, numberHarmonics [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates multiple pitch values corresponding to the melodic lines present in a polyphonic music signal (for example, string quartet, piano). This implementation is based on the algorithm in [1]: In each frame, a set of possible fundamental frequency candidates is extracted based on the principle of harmonic summation. In an optimization stage, the number of harmonic sources (polyphony) is estimated and the final set of fundamental frequencies determined. In contrast to the pitch salience function proposed in [2], this implementation uses the pitch salience function described in [1]. The output is a vector for each frame containing the estimated melody pitch values. Check https://essentia.upf.edu/reference/std_MultiPitchKlapuri.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
binResolution
number <optional> 10 salience function bin resolution [cents]
frameSize
number <optional> 2048 the frame size for computing pitch salience
harmonicWeight
number <optional> 0.8 harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
hopSize
number <optional> 128 the hop size with which the pitch salience function was computed
magnitudeCompression
number <optional> 1 magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)
magnitudeThreshold
number <optional> 40 spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
maxFrequency
number <optional> 1760 the maximum allowed frequency for salience function peaks (ignore peaks above) [Hz]
minFrequency
number <optional> 80 the minimum allowed frequency for salience function peaks (ignore peaks below) [Hz]
numberHarmonics
number <optional> 10 number of considered harmonics
referenceFrequency
number <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
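The relationship between `referenceFrequency` and `binResolution` can be sketched with the standard Hertz to cent conversion, cents = 1200·log2(f / referenceFrequency). `hzToCentBinSketch` is a hypothetical helper, and rounding to the nearest bin is an assumption about how the salience function quantizes frequencies.

```javascript
// Hypothetical sketch: map a frequency to a salience-function bin index.
// Assumption: the bin index is the cent value divided by the resolution,
// rounded to the nearest integer.
function hzToCentBinSketch(hz, referenceFrequency = 55, binResolution = 10) {
  const cents = 1200 * Math.log2(hz / referenceFrequency);
  return Math.round(cents / binResolution);
}

console.log(hzToCentBinSketch(55));   // the reference frequency maps to the 0th bin
console.log(hzToCentBinSketch(110));  // one octave up: 1200 cents
console.log(hzToCentBinSketch(1760)); // default maxFrequency, five octaves above 55 Hz
```

With the defaults, each octave spans 120 bins of 10 cents.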
-
MultiPitchMelodia( signal [, binResolution [, filterIterations [, frameSize [, guessUnvoiced [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minDuration [, minFrequency [, numberHarmonics [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, referenceFrequency [, sampleRate [, timeContinuity ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates multiple fundamental frequency contours from an audio signal. It is a multi pitch version of the MELODIA algorithm described in [1]. While the algorithm is originally designed to extract melody in polyphonic music, this implementation is adapted for multiple sources. The approach is based on the creation and characterization of pitch contours, time continuous sequences of pitch candidates grouped using auditory streaming cues. To this end, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours, and PitchContoursMonoMelody algorithms are employed. It is strongly advised to use the default parameter values which are optimized according to [1] (where further details are provided) except for minFrequency, maxFrequency, and voicingTolerance, which will depend on your application. Check https://essentia.upf.edu/reference/std_MultiPitchMelodia.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
binResolution
number <optional> 10 salience function bin resolution [cents]
filterIterations
number <optional> 3 number of iterations for the octave errors / pitch outlier filtering process
frameSize
number <optional> 2048 the frame size for computing pitch salience
guessUnvoiced
boolean <optional> false estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
harmonicWeight
number <optional> 0.8 harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
hopSize
number <optional> 128 the hop size with which the pitch salience function was computed
magnitudeCompression
number <optional> 1 magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)
magnitudeThreshold
number <optional> 40 spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
maxFrequency
number <optional> 20000 the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minDuration
number <optional> 100 the minimum allowed contour duration [ms]
minFrequency
number <optional> 40 the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
numberHarmonics
number <optional> 20 number of considered harmonics
peakDistributionThreshold
number <optional> 0.9 allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)
peakFrameThreshold
number <optional> 0.9 per-frame salience threshold factor (fraction of the highest peak salience in a frame)
pitchContinuity
number <optional> 27.5625 pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]
referenceFrequency
number <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
timeContinuity
number <optional> 100 time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]
Returns
Details
-
Multiplexer( [ numberRealInputs [, numberVectorRealInputs ] ] ) → {object}
-
Description
This algorithm returns a single vector from a given number of real values and/or frames. Frames from different inputs are multiplexed onto a single stream in an alternating fashion. Check https://essentia.upf.edu/reference/std_Multiplexer.html for more details.
Parameters
Name Type Attributes Default Description numberRealInputs
number <optional> 0 the number of inputs of type Real to multiplex
numberVectorRealInputs
number <optional> 0 the number of inputs of type vector<Real> to multiplex
Returns
Details
-
NNLSChroma( logSpectrogram, meanTuning, localTuning [, chromaNormalization [, frameSize [, sampleRate [, spectralShape [, spectralWhitening [, tuningMode [, useNNLS ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm extracts treble and bass chromagrams from a sequence of log-frequency spectrum frames. On this representation, two processing steps are performed: -tuning, after which each centre bin (i.e. bin 2, 5, 8, ...) corresponds to a semitone, even if the tuning of the piece deviates from 440 Hz standard pitch. -running standardisation: subtraction of the running mean, division by the running standard deviation. This has a spectral whitening effect. This code is ported from NNLS Chroma [1, 2]. To achieve similar results follow this processing chain: frame slicing with sample rate = 44100, frame size = 16384, hop size = 2048 -> Windowing with Hann and no normalization -> Spectrum -> LogSpectrum. Check https://essentia.upf.edu/reference/std_NNLSChroma.html for more details.
Parameters
Name Type Attributes Default Description logSpectrogram
VectorVectorFloat log spectrum frames
meanTuning
VectorFloat mean tuning frames
localTuning
VectorFloat local tuning frames
chromaNormalization
string <optional> none determines whether or how the chromagrams are normalised
frameSize
number <optional> 1025 the input frame size of the spectrum vector
sampleRate
number <optional> 44100 the input sample rate
spectralShape
number <optional> 0.7 the shape of the notes in the NNLS dictionary
spectralWhitening
number <optional> 1 determines how much the log-frequency spectrum is whitened
tuningMode
string <optional> global 'local' uses a local average for tuning, 'global' uses all audio frames. Local tuning is only advisable when the tuning is likely to change over the audio
useNNLS
boolean <optional> true toggle between NNLS approximate transcription and linear spectral mapping
Returns
Details
-
NoiseAdder( signal [, fixSeed [, level ] ] ) → {object}
-
Description
This algorithm adds noise to an input signal. The average energy of the noise in dB is defined by the level parameter, and is generated using the Mersenne Twister random number generator. Check https://essentia.upf.edu/reference/std_NoiseAdder.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
fixSeed
boolean <optional> false if true, 0 is used as the seed for generating random values
level
number <optional> -100 power level of the noise generator [dB]
Returns
Details
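The level and fixSeed parameters can be illustrated with a small standalone sketch. It assumes that level is the noise power in dB (so the target RMS amplitude is 10^(level/20)) and uses uniform noise from a small seeded generator (mulberry32) in place of the Mersenne Twister mentioned above. The function addNoiseSketch is hypothetical and does not reproduce Essentia's output values.

```javascript
// Small seeded PRNG (mulberry32). It stands in for the Mersenne Twister
// named in the description; it is NOT the generator Essentia uses.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Hypothetical sketch: add uniform white noise whose RMS matches `level` dB.
function addNoiseSketch(signal, fixSeed = false, level = -100) {
  const seed = fixSeed ? 0 : Date.now() >>> 0; // fixSeed => always seed 0
  const random = mulberry32(seed);
  const rms = Math.pow(10, level / 20);
  // Uniform noise on [-peak, peak) has RMS = peak / sqrt(3).
  const peak = Math.sqrt(3) * rms;
  const out = new Float32Array(signal.length);
  for (let i = 0; i < signal.length; i++) {
    out[i] = signal[i] + peak * (2 * random() - 1);
  }
  return out;
}
```

With fixSeed set to true the sketch returns the same noise on every call, which is the practical reason to fix the seed when results must be reproducible.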
-
NoiseBurstDetector( frame [, alpha [, silenceThreshold [, threshold ] ] ] ) → {object}
-
Description
This algorithm detects noise bursts in the waveform by thresholding the peaks of the second derivative. The threshold is computed using an Exponential Moving Average filter over the RMS of the second derivative of the input frame. Check https://essentia.upf.edu/reference/std_NoiseBurstDetector.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input frame (must be non-empty)
alpha
number <optional> 0.9 alpha coefficient for the Exponential Moving Average threshold estimation.
silenceThreshold
number <optional> -50 threshold to skip silent frames
threshold
number <optional> 8 factor to control the dynamic threshold
Returns
Details
-
NoveltyCurve( frequencyBands [, frameRate [, normalize [, weightCurve [, weightCurveType ] ] ] ] ) → {object}
-
Description
This algorithm computes the "novelty curve" (Grosche & Müller, 2009) onset detection function. The algorithm expects as an input a frame-wise sequence of frequency-bands energies or spectrum magnitudes as originally proposed in [1] (see FrequencyBands and Spectrum algorithms). Novelty in each band (or frequency bin) is computed as a derivative between log-compressed energy (magnitude) values in consecutive frames. The overall novelty value is then computed as a weighted sum that can be configured using the 'weightCurve' parameter. The resulting novelty curve can be used for beat tracking and onset detection (see BpmHistogram and Onsets). Check https://essentia.upf.edu/reference/std_NoveltyCurve.html for more details.
Parameters
Name Type Attributes Default Description frequencyBands
VectorVectorFloat the frequency bands
frameRate
number <optional> 344.531 the sampling rate of the input audio
normalize
boolean <optional> false whether to normalize each band's energy
weightCurve
Array.<any> <optional> [] vector containing the weights for each frequency band. Only if weightCurveType==supplied
weightCurveType
string <optional> hybrid the type of weighting to be used for the bands novelty
Returns
Details
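The description above (a per-band derivative of log-compressed energies, combined as a weighted sum) can be sketched in plain JavaScript. The compression constant, the half-wave rectification, and the flat default weighting are assumptions made for illustration. noveltyCurveSketch is a hypothetical name and does not reproduce Essentia's exact output.

```javascript
// Hypothetical sketch of a novelty curve from frame-wise band energies.
// frequencyBands: array of frames, each frame an array of band energies.
// Assumptions: log(1 + C * x) compression with C = 1000, half-wave
// rectification of the difference, and flat weights when none are supplied.
function noveltyCurveSketch(frequencyBands, weights = null, compression = 1000) {
  const nFrames = frequencyBands.length;
  if (nFrames < 2) return [];
  const nBands = frequencyBands[0].length;
  const w = weights || new Array(nBands).fill(1);
  const novelty = [];
  for (let t = 1; t < nFrames; t++) {
    let sum = 0;
    for (let b = 0; b < nBands; b++) {
      const prev = Math.log(1 + compression * frequencyBands[t - 1][b]);
      const curr = Math.log(1 + compression * frequencyBands[t][b]);
      sum += w[b] * Math.max(0, curr - prev); // keep only energy increases
    }
    novelty.push(sum);
  }
  return novelty;
}
```

A sequence of identical frames yields a novelty of zero, while a frame in which any band gains energy yields a positive value.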
-
NoveltyCurveFixedBpmEstimator( novelty [, hopSize [, maxBpm [, minBpm [, sampleRate [, tolerance ] ] ] ] ] ) → {object}
-
Description
This algorithm outputs a histogram of the most probable bpms assuming the signal has constant tempo given the novelty curve. This algorithm is based on the autocorrelation of the novelty curve (see NoveltyCurve algorithm) and should only be used for signals that have a constant tempo or as a first tempo estimator to be used in conjunction with other algorithms such as BpmHistogram. It is a simplified version of the algorithm described in [1] as, in order to predict the best BPM candidate, it computes autocorrelation of the entire novelty curve instead of analyzing it on frames and histogramming the peaks over frames. Check https://essentia.upf.edu/reference/std_NoveltyCurveFixedBpmEstimator.html for more details.
Parameters
Name Type Attributes Default Description novelty
VectorFloat the novelty curve of the audio signal
hopSize
number <optional> 512 the hopSize used to compute the novelty curve from the original signal
maxBpm
number <optional> 560 the maximum bpm to look for
minBpm
number <optional> 30 the minimum bpm to look for
sampleRate
number <optional> 44100 the sampling rate of the original audio signal [Hz]
tolerance
number <optional> 3 tolerance (in percentage) for considering bpms to be equal
Returns
Details
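Because the estimator works on the autocorrelation of the novelty curve, each candidate tempo corresponds to a lag measured in novelty-curve frames. The arithmetic linking a lag to a BPM value, and one possible reading of the tolerance parameter, can be sketched as follows. Both function names are hypothetical, and measuring the tolerance relative to the larger of the two values is an assumption.

```javascript
// Hypothetical helper: convert an autocorrelation lag (in novelty-curve
// frames) to beats per minute, given how the novelty curve was computed.
function lagToBpm(lagFrames, hopSize = 512, sampleRate = 44100) {
  const secondsPerBeat = (lagFrames * hopSize) / sampleRate;
  return 60 / secondsPerBeat;
}

// Hypothetical helper: treat two BPM values as equal when they differ by
// at most `tolerance` percent (measured against the larger value, which is
// an assumption about how the percentage is applied).
function bpmsAreEqual(a, b, tolerance = 3) {
  return Math.abs(a - b) <= (tolerance / 100) * Math.max(a, b);
}
```

With the defaults, one second of audio spans 44100 / 512 = 86.1328125 frames, so a lag of that many frames corresponds to 60 BPM.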
-
OddToEvenHarmonicEnergyRatio( frequencies, magnitudes ) → {object}
-
Description
This algorithm computes the ratio between a signal's odd and even harmonic energy given the signal's harmonic peaks. The odd to even harmonic energy ratio is a measure allowing to distinguish odd-harmonic-energy predominant sounds (such as from a clarinet) from equally important even-harmonic-energy sounds (such as from a trumpet). The required harmonic frequencies and magnitudes can be computed by the HarmonicPeaks algorithm. In the case when the even energy is zero, which may happen when only even harmonics were found or when only one peak was found, the algorithm outputs the maximum real number possible. Therefore, this algorithm should be used in conjunction with the harmonic peaks algorithm. If no peaks are supplied, the algorithm outputs a value of one, assuming either the spectrum was flat or it was silent. Check https://essentia.upf.edu/reference/std_OddToEvenHarmonicEnergyRatio.html for more details.
Parameters
Name Type Description frequencies
VectorFloat the frequencies of the harmonic peaks (at least two frequencies in frequency ascending order)
magnitudes
VectorFloat the magnitudes of the harmonic peaks (at least two magnitudes in frequency ascending order)
Returns
Details
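The special cases described above (no peaks, zero even energy) are easier to see in code. The sketch below assumes that the first supplied peak is the first (odd) harmonic, that energy is the squared magnitude, and that Number.MAX_VALUE stands in for "the maximum real number possible". oddToEvenRatioSketch is a hypothetical name, not the essentia.js implementation.

```javascript
// Hypothetical sketch of an odd-to-even harmonic energy ratio.
// Assumptions: magnitudes[0] is the 1st harmonic (odd), magnitudes[1] the
// 2nd (even), and so on; energy is magnitude squared.
function oddToEvenRatioSketch(magnitudes) {
  if (magnitudes.length === 0) return 1; // no peaks: flat or silent spectrum
  let odd = 0;
  let even = 0;
  for (let i = 0; i < magnitudes.length; i++) {
    const energy = magnitudes[i] * magnitudes[i];
    if (i % 2 === 0) odd += energy; // indices 0, 2, ... = harmonics 1, 3, ...
    else even += energy;
  }
  // Stand-in for "the maximum real number possible" in the description.
  if (even === 0) return Number.MAX_VALUE;
  return odd / even;
}
```

For example, magnitudes of [2, 1, 2, 1] give an odd energy of 8 and an even energy of 2, so the ratio is 4.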
-
OnsetDetection( spectrum, phase [, method [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes various onset detection functions. The output of this algorithm should be post-processed in order to determine whether the frame contains an onset or not. Namely, it could be fed to the Onsets algorithm. It is recommended that the input "spectrum" is generated by the Spectrum algorithm. The following methods are available: - 'hfc', the High Frequency Content detection function which accurately detects percussive events (see HFC algorithm for details). - 'complex', the Complex-Domain spectral difference function [1] taking into account changes in magnitude and phase. It emphasizes note onsets either as a result of significant change in energy in the magnitude spectrum, and/or a deviation from the expected phase values in the phase spectrum, caused by a change in pitch. - 'complex_phase', the simplified Complex-Domain spectral difference function [2] taking into account phase changes, weighted by magnitude. It reacts better on tonal sounds such as bowed string, but tends to over-detect percussive events. - 'flux', the Spectral Flux detection function which characterizes changes in magnitude spectrum. See Flux algorithm for details. - 'melflux', the spectral difference function, similar to spectral flux, but using half-rectified energy changes in Mel-frequency bands of the spectrum [3]. - 'rms', the difference function, measuring the half-rectified change of the RMS of the magnitude spectrum (i.e., measuring overall energy flux) [4]. Check https://essentia.upf.edu/reference/std_OnsetDetection.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input spectrum
phase
VectorFloat the phase vector corresponding to this spectrum (used only by the "complex" method)
method
string <optional> hfc the method used for onset detection
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
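As an illustration of what a detection function measures, the sketch below computes a simple spectral flux between two consecutive magnitude spectra. It uses the L2 norm of the bin-wise difference, which is one common definition; the normalisation used by Essentia's 'flux' method may differ. spectralFluxSketch is a hypothetical name.

```javascript
// Hypothetical sketch: spectral flux as the L2 norm of the difference
// between two consecutive magnitude spectra of equal length.
// Assumption: no normalisation or half-wave rectification is applied.
function spectralFluxSketch(previousSpectrum, currentSpectrum) {
  let sum = 0;
  for (let k = 0; k < currentSpectrum.length; k++) {
    const diff = currentSpectrum[k] - previousSpectrum[k];
    sum += diff * diff;
  }
  return Math.sqrt(sum);
}
```

The value is zero for identical spectra and grows with the size of the spectral change, which is why it needs post-processing (peak picking or thresholding) before it indicates an onset.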
-
OnsetDetectionGlobal( signal [, frameSize [, hopSize [, method [, sampleRate ] ] ] ] ) → {object}
-
Description
This algorithm computes various onset detection functions. Detection values are computed frame-wise given an input signal. The output of this algorithm should be post-processed in order to determine whether the frame contains an onset or not. Namely, it could be fed to the Onsets algorithm. The following methods are available: - 'infogain', the spectral difference measured by the modified information gain [1]. For each frame, it accounts for energy change in between preceding and consecutive frames, histogrammed together, in order to suppress short-term variations on a frame-by-frame basis. - 'beat_emphasis', the beat emphasis function [1]. This function is a linear combination of onset detection functions (complex spectral differences) in a number of sub-bands, weighted by their beat strength computed over the entire input signal. Note: - 'infogain' onset detection has been optimized for the default sampleRate=44100Hz, frameSize=2048, hopSize=512. - 'beat_emphasis' is optimized for a fixed resolution of 11.6ms, which corresponds to the default sampleRate=44100Hz, frameSize=1024, hopSize=512. Optimal performance of beat detection with TempoTapDegara is not guaranteed for other settings. Check https://essentia.upf.edu/reference/std_OnsetDetectionGlobal.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
frameSize
number <optional> 2048 the frame size for computing onset detection function
hopSize
number <optional> 512 the hop size for computing onset detection function
method
string <optional> infogain the method used for onset detection
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
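The 11.6 ms resolution quoted above follows directly from the hop size and the sampling rate. A small sketch of that arithmetic, using hypothetical helper names:

```javascript
// Hypothetical helper: time between consecutive analysis frames, in ms.
function hopDurationMs(hopSize = 512, sampleRate = 44100) {
  return (1000 * hopSize) / sampleRate;
}

// Hypothetical helper: start time of a given frame, in seconds.
function frameStartSeconds(frameIndex, hopSize = 512, sampleRate = 44100) {
  return (frameIndex * hopSize) / sampleRate;
}
```

With hopSize=512 and sampleRate=44100 the hop lasts 512 / 44100 s, roughly 11.61 ms. Changing either value changes the time resolution of the detection function accordingly.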
-
OnsetRate( signal ) → {object}
-
Description
This algorithm computes the number of onsets per second and their position in time for an audio signal. Onset detection functions are computed using both high frequency content and complex-domain methods available in OnsetDetection algorithm. See OnsetDetection for more information. Please note that due to a dependence on the Onsets algorithm, this algorithm is only valid for audio signals with a sampling rate of 44100Hz. This algorithm throws an exception if the input signal is empty. Check https://essentia.upf.edu/reference/std_OnsetRate.html for more details.
Parameters
Name Type Description signal
VectorFloat the input signal
Returns
Details
-
OverlapAdd( signal [, frameSize [, gain [, hopSize ] ] ] ) → {object}
-
Description
This algorithm returns the output of an overlap-add process for a sequence of frames of an audio signal. It considers that the input audio frames are windowed audio signals. Giving the size of the frame and the hop size, overlapping and adding consecutive frames will produce a continuous signal. A normalization gain can be passed as a parameter. Check https://essentia.upf.edu/reference/std_OverlapAdd.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the windowed input audio frame
frameSize
number <optional> 2048 the frame size for computing the overlap-add process
gain
number <optional> 1 the normalization gain that scales the output signal. Useful for IFFT output
hopSize
number <optional> 128 the hop size with which the overlap-add function is computed
Returns
Details
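The overlap-add process described above can be sketched over a whole list of frames at once. This is a simplified batch version under stated assumptions: the frames are already windowed, all have the same length, and gain is applied as a plain multiplier. overlapAddSketch is a hypothetical name and does not mirror the frame-by-frame interface of the essentia.js method.

```javascript
// Hypothetical sketch of overlap-add: each frame is added into the output
// at an offset of index * hopSize samples.
// Assumptions: frames are pre-windowed and equally sized; `gain` simply
// scales every sample.
function overlapAddSketch(frames, hopSize = 128, gain = 1) {
  if (frames.length === 0) return new Float32Array(0);
  const frameSize = frames[0].length;
  const out = new Float32Array((frames.length - 1) * hopSize + frameSize);
  frames.forEach((frame, index) => {
    const start = index * hopSize;
    for (let n = 0; n < frameSize; n++) {
      out[start + n] += gain * frame[n];
    }
  });
  return out;
}
```

Two frames of four ones with a hop of two samples overlap in the middle two positions, so the result is 1, 1, 2, 2, 1, 1. In practice the analysis window and hop size are chosen so that overlapping windows sum to a constant.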
-
PeakDetection( array [, interpolate [, maxPeaks [, maxPosition [, minPeakDistance [, minPosition [, orderBy [, range [, threshold ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm detects local maxima (peaks) in an array. The algorithm finds positive slopes and detects a peak when the slope changes sign and the peak is above the threshold. It optionally interpolates using parabolic curve fitting. When two consecutive peaks are closer than the minPeakDistance parameter, the smaller one is discarded. A value of 0 bypasses this feature. Check https://essentia.upf.edu/reference/std_PeakDetection.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array
interpolate
boolean <optional> true boolean flag to enable interpolation
maxPeaks
number <optional> 100 the maximum number of returned peaks
maxPosition
number <optional> 1 the maximum value of the range to evaluate
minPeakDistance
number <optional> 0 minimum distance between consecutive peaks (0 to bypass this feature)
minPosition
number <optional> 0 the minimum value of the range to evaluate
orderBy
string <optional> position the ordering type of the output peaks (ascending by position or descending by value)
range
number <optional> 1 the input range
threshold
number <optional> -1e+06 peaks below this given threshold are not output
Returns
Details
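The core of the description (a slope sign change above a threshold, optionally refined by a parabolic fit) can be sketched as follows. The sketch reports positions in array-index units and omits the range scaling, maxPeaks, ordering, and minPeakDistance handling. detectPeaksSketch is a hypothetical name.

```javascript
// Hypothetical sketch of local-maximum detection with optional parabolic
// interpolation. Positions are in index units (no minPosition/maxPosition
// scaling), and strict inequalities are used so plateaus are ignored.
function detectPeaksSketch(array, threshold = -1e6, interpolate = true) {
  const peaks = [];
  for (let i = 1; i < array.length - 1; i++) {
    const a = array[i - 1];
    const b = array[i];
    const c = array[i + 1];
    if (b > a && b > c && b >= threshold) {
      let position = i;
      let value = b;
      if (interpolate) {
        // Vertex of the parabola through (i-1, a), (i, b), (i+1, c).
        const offset = (0.5 * (a - c)) / (a - 2 * b + c);
        position = i + offset;
        value = b - 0.25 * (a - c) * offset;
      }
      peaks.push({ position, value });
    }
  }
  return peaks;
}
```

Interpolation moves the estimated peak towards the larger of the two neighbours, which gives sub-sample precision when the true maximum falls between two array entries.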
-
PercivalBpmEstimator( signal [, frameSize [, frameSizeOSS [, hopSize [, hopSizeOSS [, maxBPM [, minBPM [, sampleRate ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the tempo in beats per minute (BPM) from an input signal as described in [1]. Check https://essentia.upf.edu/reference/std_PercivalBpmEstimator.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat input signal
frameSize
number <optional> 1024 frame size for the analysis of the input signal
frameSizeOSS
number <optional> 2048 frame size for the analysis of the Onset Strength Signal
hopSize
number <optional> 128 hop size for the analysis of the input signal
hopSizeOSS
number <optional> 128 hop size for the analysis of the Onset Strength Signal
maxBPM
number <optional> 210 maximum BPM to detect
minBPM
number <optional> 50 minimum BPM to detect
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
PercivalEnhanceHarmonics( array ) → {object}
-
Description
This algorithm implements the 'Enhance Harmonics' step as described in [1]. Given an input autocorrelation signal, two time-stretched versions of it (by factors of 2 and 4) are added to the original. In this way, peaks with a harmonic relation are boosted. For more details check the referenced paper. Check https://essentia.upf.edu/reference/std_PercivalEnhanceHarmonics.html for more details.
Parameters
Name Type Description array
VectorFloat the input signal
Returns
Details
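One way to read "adding versions stretched by factors of 2 and 4" is that the value at lag i is reinforced by the values at lags 2i and 4i. The sketch below follows that reading and only modifies the first quarter of the array, where both indices stay in range. This is an interpretation for illustration, not a verified copy of the Essentia implementation, and enhanceHarmonicsSketch is a hypothetical name.

```javascript
// Hypothetical sketch of the "enhance harmonics" step on an
// autocorrelation array: out[i] = x[i] + x[2i] + x[4i] for the first
// quarter of the array, and out[i] = x[i] elsewhere.
function enhanceHarmonicsSketch(array) {
  const out = Array.from(array);
  const limit = Math.floor(array.length / 4);
  for (let i = 0; i < limit; i++) {
    out[i] += array[2 * i] + array[4 * i];
  }
  return out;
}
```

A lag whose double and quadruple lags also carry strong autocorrelation is boosted, which favours the lag matching the underlying pulse.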
-
PercivalEvaluatePulseTrains( oss, positions ) → {object}
-
Description
This algorithm implements the 'Evaluate Pulse Trains' step as described in [1]. Given an input onset strength signal (OSS) and a number of candidate tempo lag positions, the OSS is correlated with ideal expected pulse trains (for each candidate tempo lag) shifted in time by different amounts. The candidate tempo lag which generates the pulse train that better correlates with the OSS is returned as the preferred tempo candidate. For more details check the referenced paper. Check https://essentia.upf.edu/reference/std_PercivalEvaluatePulseTrains.html for more details.
Parameters
Name Type Description oss
VectorFloat onset strength signal (or other novelty curve)
positions
VectorFloat peak positions of BPM candidates
Returns
Details
-
PitchContourSegmentation( pitch, signal [, hopSize [, minDuration [, pitchDistanceThreshold [, rmsThreshold [, sampleRate [, tuningFrequency ] ] ] ] ] ] ) → {object}
-
Description
This algorithm converts a pitch sequence estimated from an audio signal into a set of discrete note events. Each note is defined by its onset time, duration and MIDI pitch value, quantized to the equal tempered scale. Check https://essentia.upf.edu/reference/std_PitchContourSegmentation.html for more details.
Parameters
Name Type Attributes Default Description pitch
VectorFloat estimated pitch contour [Hz]
signal
VectorFloat input audio signal
hopSize
number <optional> 128 hop size of the extracted pitch
minDuration
number <optional> 0.1 minimum note duration [s]
pitchDistanceThreshold
number <optional> 60 pitch threshold for note segmentation [cents]
rmsThreshold
number <optional> -2 zscore threshold for note segmentation
sampleRate
number <optional> 44100 sample rate of the audio signal
tuningFrequency
number <optional> 440 tuning reference frequency [Hz]
Returns
Details
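Quantizing a pitch in Hz to a MIDI note on the equal tempered scale, and measuring pitch distance in cents (the unit of pitchDistanceThreshold), both rely on standard formulas. A minimal sketch with hypothetical helper names:

```javascript
// Standard conversion from frequency to a (fractional) MIDI note number,
// with MIDI note 69 assigned to the tuning frequency.
function hzToMidi(frequencyHz, tuningFrequency = 440) {
  return 69 + 12 * Math.log2(frequencyHz / tuningFrequency);
}

// Quantize to the nearest equal-tempered MIDI note.
function quantizeToMidiNote(frequencyHz, tuningFrequency = 440) {
  return Math.round(hzToMidi(frequencyHz, tuningFrequency));
}

// Distance between two frequencies in cents (100 cents = 1 semitone).
function centsBetween(fromHz, toHz) {
  return 1200 * Math.log2(toHz / fromHz);
}
```

For example, 261.63 Hz quantizes to MIDI note 60 (middle C) with the default tuning, and the default pitchDistanceThreshold of 60 cents is a little over half a semitone.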
-
PitchContours( peakBins, peakSaliences [, binResolution [, hopSize [, minDuration [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, sampleRate [, timeContinuity ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm tracks a set of predominant pitch contours of an audio signal. This algorithm is intended to receive its "peakBins" and "peakSaliences" inputs from the PitchSalienceFunctionPeaks algorithm outputs aggregated over all frames in the sequence. The output is a vector of estimated melody pitch values. Check https://essentia.upf.edu/reference/std_PitchContours.html for more details.
Parameters
Name Type Attributes Default Description peakBins
VectorVectorFloat frame-wise array of cent bins corresponding to pitch salience function peaks
peakSaliences
VectorVectorFloat frame-wise array of values of salience function peaks
binResolution
number <optional> 10 salience function bin resolution [cents]
hopSize
number <optional> 128 the hop size with which the pitch salience function was computed
minDuration
number <optional> 100 the minimum allowed contour duration [ms]
peakDistributionThreshold
number <optional> 0.9 allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)
peakFrameThreshold
number <optional> 0.9 per-frame salience threshold factor (fraction of the highest peak salience in a frame)
pitchContinuity
number <optional> 27.5625 pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
timeContinuity
number <optional> 100 time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]
Returns
Details
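Several parameters above are expressed per millisecond or in milliseconds, while contours are tracked frame by frame. The sketch below converts them to frame units under the default hopSize and sampleRate. The helper names are hypothetical, and rounding durations to the nearest frame is an assumption.

```javascript
// Hypothetical helper: largest pitch change allowed between two
// consecutive frames, given the per-millisecond pitchContinuity cue.
function maxPitchJumpPerFrame(pitchContinuity = 27.5625, hopSize = 128, sampleRate = 44100) {
  const hopMs = (1000 * hopSize) / sampleRate;
  return pitchContinuity * hopMs; // cents per frame
}

// Hypothetical helper: convert a duration in ms to a number of frames.
// Rounding to the nearest frame is an assumption.
function msToFrames(durationMs, hopSize = 128, sampleRate = 44100) {
  return Math.round(((durationMs / 1000) * sampleRate) / hopSize);
}
```

With the defaults, the pitch continuity cue works out to about 80 cents per frame, and the 100 ms minDuration and timeContinuity values correspond to roughly 34 frames.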
-
PitchContoursMelody( contoursBins, contoursSaliences, contoursStartTimes, duration [, binResolution [, filterIterations [, guessUnvoiced [, hopSize [, maxFrequency [, minFrequency [, referenceFrequency [, sampleRate [, voiceVibrato [, voicingTolerance ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm converts a set of pitch contours into a sequence of predominant f0 values in Hz by taking the value of the most predominant contour in each frame. This algorithm is intended to receive its "contoursBins", "contoursSaliences", and "contoursStartTimes" inputs from the PitchContours algorithm. The "duration" input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values and a vector of confidence values. Check https://essentia.upf.edu/reference/std_PitchContoursMelody.html for more details.
Parameters
Name Type Attributes Default Description contoursBins
VectorVectorFloat array of frame-wise vectors of cent bin values representing each contour
contoursSaliences
VectorVectorFloat array of frame-wise vectors of pitch saliences representing each contour
contoursStartTimes
VectorFloat array of the start times of each contour [s]
duration
number time duration of the input signal [s]
binResolution
number <optional> 10 salience function bin resolution [cents]
filterIterations
number <optional> 3 number of iterations for the octave errors / pitch outlier filtering process
guessUnvoiced
boolean <optional> false Estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
hopSize
number <optional> 128 the hop size with which the pitch salience function was computed
maxFrequency
number <optional> 20000 the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minFrequency
number <optional> 80 the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
referenceFrequency
number <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
voiceVibrato
boolean <optional> false detect voice vibrato
voicingTolerance
number <optional> 0.2 allowed deviation below the average contour mean salience of all contours (fraction of the standard deviation)
Returns
Details
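The voicingTolerance parameter is described as an allowed deviation below the average contour mean salience, expressed as a fraction of the standard deviation. One way to turn that into a selection rule is sketched below. The use of the population standard deviation and the inclusive comparison are assumptions, every contour is assumed to be non-empty, and voicedContourIndices is a hypothetical name.

```javascript
// Hypothetical sketch: keep contours whose mean salience is not more than
// voicingTolerance standard deviations below the average mean salience.
// Assumptions: population standard deviation, ">=" comparison, and
// non-empty contours.
function voicedContourIndices(contoursSaliences, voicingTolerance = 0.2) {
  const means = contoursSaliences.map(
    (contour) => contour.reduce((acc, v) => acc + v, 0) / contour.length
  );
  const average = means.reduce((acc, m) => acc + m, 0) / means.length;
  const variance =
    means.reduce((acc, m) => acc + (m - average) * (m - average), 0) / means.length;
  const threshold = average - voicingTolerance * Math.sqrt(variance);
  const kept = [];
  means.forEach((m, index) => {
    if (m >= threshold) kept.push(index);
  });
  return kept;
}
```

A larger voicingTolerance lowers the threshold and keeps more contours, at the cost of admitting weaker (possibly non-melodic) ones.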
-
PitchContoursMonoMelody( contoursBins, contoursSaliences, contoursStartTimes, duration [, binResolution [, filterIterations [, guessUnvoiced [, hopSize [, maxFrequency [, minFrequency [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm converts a set of pitch contours into a sequence of f0 values in Hz by taking the value of the most salient contour in each frame. In contrast to PitchContoursMelody, it assumes a single source. This algorithm is intended to receive its "contoursBins", "contoursSaliences", and "contoursStartTimes" inputs from the PitchContours algorithm. The "duration" input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values and a vector of confidence values. Check https://essentia.upf.edu/reference/std_PitchContoursMonoMelody.html for more details.
Parameters
Name Type Attributes Default Description contoursBins
VectorVectorFloat array of frame-wise vectors of cent bin values representing each contour
contoursSaliences
VectorVectorFloat array of frame-wise vectors of pitch saliences representing each contour
contoursStartTimes
VectorFloat array of the start times of each contour [s]
duration
number time duration of the input signal [s]
binResolution
number <optional> 10 salience function bin resolution [cents]
filterIterations
number <optional> 3 number of iterations for the octave errors / pitch outlier filtering process
guessUnvoiced
boolean <optional> false Estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
hopSize
number <optional> 128 the hop size with which the pitch salience function was computed
maxFrequency
number <optional> 20000 the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minFrequency
number <optional> 80 the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
referenceFrequency
number <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
PitchContoursMultiMelody( contoursBins, contoursSaliences, contoursStartTimes, duration [, binResolution [, filterIterations [, guessUnvoiced [, hopSize [, maxFrequency [, minFrequency [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm post-processes a set of pitch contours into a sequence of multiple f0 values in Hz. This algorithm is intended to receive its "contoursBins", "contoursSaliences", and "contoursStartTimes" inputs from the PitchContours algorithm. The "duration" input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values. Check https://essentia.upf.edu/reference/std_PitchContoursMultiMelody.html for more details.
Parameters
Name Type Attributes Default Description contoursBins
VectorVectorFloat array of frame-wise vectors of cent bin values representing each contour
contoursSaliences
VectorVectorFloat array of frame-wise vectors of pitch saliences representing each contour
contoursStartTimes
VectorFloat array of the start times of each contour [s]
duration
number time duration of the input signal [s]
binResolution
number <optional> 10 salience function bin resolution [cents]
filterIterations
number <optional> 3 number of iterations for the octave errors / pitch outlier filtering process
guessUnvoiced
boolean <optional> false Estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
hopSize
number <optional> 128 the hop size with which the pitch salience function was computed
maxFrequency
number <optional> 20000 the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minFrequency
number <optional> 80 the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
referenceFrequency
number <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
PitchFilter( pitch, pitchConfidence [, confidenceThreshold [, minChunkSize [, useAbsolutePitchConfidence ] ] ] ) → {object}
-
Description
This algorithm corrects the fundamental frequency estimations for a sequence of frames given pitch values together with their confidence values. In particular, it removes non-confident parts and spurious jumps in pitch and applies octave corrections. Check https://essentia.upf.edu/reference/std_PitchFilter.html for more details.
Parameters
Name Type Attributes Default Description pitch
VectorFloat vector of pitch values for the input frames [Hz]
pitchConfidence
VectorFloat vector of pitch confidence values for the input frames
confidenceThreshold
number <optional> 36 ratio between the average confidence of the most confident chunk and the minimum allowed average confidence of a chunk
minChunkSize
number <optional> 30 minimum number of frames in non-zero pitch chunks
useAbsolutePitchConfidence
boolean <optional> false treat negative pitch confidence values as positive (use with melodia guessUnvoiced=True)
Returns
Details
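The confidenceThreshold parameter is defined as a ratio between the average confidence of the most confident chunk and the minimum allowed average confidence. Read literally, a chunk survives when its average confidence is at least the best average divided by that ratio. The sketch below applies only this rule to precomputed chunk averages; chunk segmentation, jump removal, and octave correction are omitted, and confidentChunks is a hypothetical name.

```javascript
// Hypothetical sketch: decide which pitch chunks are confident enough.
// chunkAverageConfidences: one average confidence value per chunk.
// Assumption: a chunk is kept when its average is at least
// (highest average) / confidenceThreshold.
function confidentChunks(chunkAverageConfidences, confidenceThreshold = 36) {
  const best = Math.max(...chunkAverageConfidences);
  const minimumAllowed = best / confidenceThreshold;
  return chunkAverageConfidences.map((c) => c >= minimumAllowed);
}
```

With the default ratio of 36 and a best chunk averaging 0.9, the minimum allowed average is 0.025, so a chunk averaging 0.01 would be discarded.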
-
PitchMelodia( signal [, binResolution [, filterIterations [, frameSize [, guessUnvoiced [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minDuration [, minFrequency [, numberHarmonics [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, referenceFrequency [, sampleRate [, timeContinuity ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the fundamental frequency corresponding to the melody of a monophonic music signal based on the MELODIA algorithm. While the algorithm is originally designed to extract the predominant melody from polyphonic music [1], this implementation is adapted for monophonic signals. The approach is based on the creation and characterization of pitch contours, time continuous sequences of pitch candidates grouped using auditory streaming cues. To this end, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours, and PitchContoursMonoMelody algorithms are employed. It is strongly advised to use the default parameter values which are optimized according to [1] (where further details are provided) except for minFrequency and maxFrequency, which will depend on your application. Check https://essentia.upf.edu/reference/std_PitchMelodia.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
binResolution
number <optional> 10 salience function bin resolution [cents]
filterIterations
number <optional> 3 number of iterations for the octave errors / pitch outlier filtering process
frameSize
number <optional> 2048 the frame size for computing pitch salience
guessUnvoiced
boolean <optional> false estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
harmonicWeight
number <optional> 0.8 harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
hopSize
number <optional> 128 the hop size with which the pitch salience function was computed
magnitudeCompression
number <optional> 1 magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)
magnitudeThreshold
number <optional> 40 spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
maxFrequency
number <optional> 20000 the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minDuration
number <optional> 100 the minimum allowed contour duration [ms]
minFrequency
number <optional> 40 the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
numberHarmonics
number <optional> 20 number of considered harmonics
peakDistributionThreshold
number <optional> 0.9 allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)
peakFrameThreshold
number <optional> 0.9 per-frame salience threshold factor (fraction of the highest peak salience in a frame)
pitchContinuity
number <optional> 27.5625 pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]
referenceFrequency
number <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
timeContinuity
number <optional> 100 time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]
Returns
Details
-
PitchSalience( spectrum [, highBoundary [, lowBoundary [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm computes the pitch salience of a spectrum. The pitch salience is given by the ratio of the highest autocorrelation value of the spectrum to the non-shifted autocorrelation value. Pitch salience was designed as a quick measure of tone sensation. Unpitched sounds (non-musical sound effects) and pure tones have an average pitch salience value close to 0, whereas sounds containing several harmonics in the spectrum tend to have a higher value. Check https://essentia.upf.edu/reference/std_PitchSalience.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input audio spectrum
highBoundary
number <optional> 5000 until which frequency we are looking for the minimum (must be smaller than half sampleRate) [Hz]
lowBoundary
number <optional> 100 from which frequency we are looking for the maximum (must not be larger than highBoundary) [Hz]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
PitchSalienceFunction( frequencies, magnitudes [, binResolution [, harmonicWeight [, magnitudeCompression [, magnitudeThreshold [, numberHarmonics [, referenceFrequency ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the pitch salience function of a signal frame given its spectral peaks. The salience function covers a pitch range of nearly five octaves (i.e., 6000 cents), starting from the "referenceFrequency", and is quantized into cent bins according to the specified "binResolution". The salience of a given frequency is computed as the sum of the weighted energies found at integer multiples (harmonics) of that frequency. Check https://essentia.upf.edu/reference/std_PitchSalienceFunction.html for more details.
Parameters
Name Type Attributes Default Description frequencies
VectorFloat the frequencies of the spectral peaks [Hz]
magnitudes
VectorFloat the magnitudes of the spectral peaks
binResolution
number <optional> 10 salience function bin resolution [cents]
harmonicWeight
number <optional> 0.8 harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
magnitudeCompression
number <optional> 1 magnitude compression parameter (=0 for maximum compression, =1 for no compression)
magnitudeThreshold
number <optional> 40 peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
numberHarmonics
number <optional> 20 number of considered harmonics
referenceFrequency
number <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
Returns
Details
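As a rough illustration of the quantization described for this method, the sketch below maps a frequency to a cent bin relative to the reference frequency. The helper `hzToCentBin` is hypothetical and is not part of essentia.js; rounding to the nearest bin is an assumption, and the library may quantize differently.

```javascript
// Hypothetical helper, not part of essentia.js: maps a frequency in Hz to a
// salience-function bin index, assuming bin 0 sits at referenceFrequency.
function hzToCentBin(frequency, referenceFrequency = 55, binResolution = 10) {
  if (!(frequency > 0) || !(referenceFrequency > 0)) {
    throw new RangeError("frequencies must be positive");
  }
  // 1200 cents per octave, measured relative to the reference frequency.
  const cents = 1200 * Math.log2(frequency / referenceFrequency);
  // Assumption: round to the nearest bin.
  return Math.round(cents / binResolution);
}

console.log(hzToCentBin(55));   // reference frequency -> bin 0
console.log(hzToCentBin(110));  // one octave up -> bin 120
console.log(hzToCentBin(1760)); // five octaves up -> bin 600
```

With the default values, five octaves above 55 Hz (1760 Hz) lands on bin 600, which corresponds to the 6000-cent range mentioned in the description.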
-
PitchSalienceFunctionPeaks( salienceFunction [, binResolution [, maxFrequency [, minFrequency [, referenceFrequency ] ] ] ] ) → {object}
-
Description
This algorithm computes the peaks of a given pitch salience function. Check https://essentia.upf.edu/reference/std_PitchSalienceFunctionPeaks.html for more details.
Parameters
Name Type Attributes Default Description salienceFunction
VectorFloat the array of salience function values corresponding to cent frequency bins
binResolution
number <optional> 10 salience function bin resolution [cents]
maxFrequency
number <optional> 1760 the maximum frequency to evaluate (ignore peaks above) [Hz]
minFrequency
number <optional> 55 the minimum frequency to evaluate (ignore peaks below) [Hz]
referenceFrequency
number <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
Returns
Details
-
PitchYin( signal [, frameSize [, interpolate [, maxFrequency [, minFrequency [, sampleRate [, tolerance ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the fundamental frequency given the frame of a monophonic music signal. It is an implementation of the Yin algorithm [1] for computations in the time domain. Check https://essentia.upf.edu/reference/std_PitchYin.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal frame
frameSize
number <optional> 2048 number of samples in the input frame (this is an optional parameter to optimize memory allocation)
interpolate
boolean <optional> true enable interpolation
maxFrequency
number <optional> 22050 the maximum allowed frequency [Hz]
minFrequency
number <optional> 20 the minimum allowed frequency [Hz]
sampleRate
number <optional> 44100 sampling rate of the input audio [Hz]
tolerance
number <optional> 0.15 tolerance for peak detection
Returns
Details
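To make the time-domain approach more concrete, here is a minimal sketch of the classic YIN steps (difference function, cumulative mean normalized difference, thresholding, parabolic interpolation) in plain JavaScript. It is not the essentia.js implementation: the function name `yinPitch`, the shape of the returned object, and the way `tolerance` is applied are all assumptions made for illustration.

```javascript
// Illustrative time-domain YIN sketch; not the essentia.js implementation.
function yinPitch(frame, sampleRate = 44100, tolerance = 0.15,
                  minFrequency = 20, maxFrequency = 22050) {
  // Lags are limited by the frequency range and by half the frame length.
  const maxLag = Math.min(Math.floor(sampleRate / minFrequency),
                          Math.floor(frame.length / 2));
  const minLag = Math.max(Math.ceil(sampleRate / maxFrequency), 2);
  const windowSize = frame.length - maxLag;

  // Step 1: squared difference function d(tau).
  const diff = new Float64Array(maxLag + 1);
  for (let tau = 1; tau <= maxLag; tau++) {
    let sum = 0;
    for (let j = 0; j < windowSize; j++) {
      const delta = frame[j] - frame[j + tau];
      sum += delta * delta;
    }
    diff[tau] = sum;
  }

  // Step 2: cumulative mean normalized difference d'(tau).
  const cmnd = new Float64Array(maxLag + 1);
  cmnd[0] = 1;
  let running = 0;
  for (let tau = 1; tau <= maxLag; tau++) {
    running += diff[tau];
    cmnd[tau] = running > 0 ? (diff[tau] * tau) / running : 1;
  }

  // Step 3: first lag below the tolerance, then walk down to the local minimum.
  let bestLag = -1;
  for (let t = minLag; t < maxLag; t++) {
    if (cmnd[t] < tolerance) {
      while (t + 1 < maxLag && cmnd[t + 1] < cmnd[t]) t++;
      bestLag = t;
      break;
    }
  }
  if (bestLag < 0) return { pitch: 0, pitchConfidence: 0 }; // nothing periodic

  // Step 4: parabolic interpolation around the minimum.
  const a = cmnd[bestLag - 1], b = cmnd[bestLag], c = cmnd[bestLag + 1];
  const denom = a - 2 * b + c;
  const refined = denom !== 0 ? bestLag + (a - c) / (2 * denom) : bestLag;
  return { pitch: sampleRate / refined, pitchConfidence: 1 - b };
}
```

For a 2048-sample frame at 44100 Hz, the usable lag range stops at 1024 samples, so this sketch cannot resolve pitches below roughly 43 Hz regardless of `minFrequency`.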
-
PitchYinFFT( spectrum [, frameSize [, interpolate [, maxFrequency [, minFrequency [, sampleRate [, tolerance ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the fundamental frequency given the spectrum of a monophonic music signal. It is an implementation of the YinFFT algorithm [1], which is an optimized version of the Yin algorithm for computation in the frequency domain. It is recommended to window the input spectrum with a Hann window. The raw spectrum can be computed with the Spectrum algorithm. Check https://essentia.upf.edu/reference/std_PitchYinFFT.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input spectrum (preferably created with a hann window)
frameSize
number <optional> 2048 number of samples in the input spectrum
interpolate
boolean <optional> true boolean flag to enable interpolation
maxFrequency
number <optional> 22050 the maximum allowed frequency [Hz]
minFrequency
number <optional> 20 the minimum allowed frequency [Hz]
sampleRate
number <optional> 44100 sampling rate of the input spectrum [Hz]
tolerance
number <optional> 1 tolerance for peak detection
Returns
Details
-
PitchYinProbabilistic( signal [, frameSize [, hopSize [, lowRMSThreshold [, outputUnvoiced [, preciseTime [, sampleRate ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the pitch track of a mono audio signal using the probabilistic Yin algorithm. Check https://essentia.upf.edu/reference/std_PitchYinProbabilistic.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input mono audio signal
frameSize
number <optional> 2048 the frame size of FFT
hopSize
number <optional> 256 the hop size with which the pitch is computed
lowRMSThreshold
number <optional> 0.1 the low RMS amplitude threshold
outputUnvoiced
string <optional> negative how to output unvoiced frames. zero: output non-voiced pitch as 0; abs: output non-voiced pitch as absolute values; negative: output non-voiced pitch as negative values
preciseTime
boolean <optional> false use non-standard precise YIN timing (slow).
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
PitchYinProbabilities( signal [, frameSize [, lowAmp [, preciseTime [, sampleRate ] ] ] ] ) → {object}
-
Description
This algorithm estimates the fundamental frequencies and their probabilities given the frame of a monophonic music signal. It is a part of the implementation of the probabilistic Yin algorithm [1]. Check https://essentia.upf.edu/reference/std_PitchYinProbabilities.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal frame
frameSize
number <optional> 2048 number of samples in the input frame
lowAmp
number <optional> 0.1 the low RMS amplitude threshold
preciseTime
boolean <optional> false use non-standard precise YIN timing (slow).
sampleRate
number <optional> 44100 sampling rate of the input audio [Hz]
Returns
Details
-
PitchYinProbabilitiesHMM( pitchCandidates, probabilities [, minFrequency [, numberBinsPerSemitone [, selfTransition [, yinTrust ] ] ] ] ) → {object}
-
Description
This algorithm estimates the smoothed fundamental frequency given the pitch candidates and probabilities using hidden Markov models. It is a part of the implementation of the probabilistic Yin algorithm [1]. Check https://essentia.upf.edu/reference/std_PitchYinProbabilitiesHMM.html for more details.
Parameters
Name Type Attributes Default Description pitchCandidates
VectorVectorFloat the pitch candidates
probabilities
VectorVectorFloat the pitch probabilities
minFrequency
number <optional> 61.735 minimum detected frequency
numberBinsPerSemitone
number <optional> 5 number of bins per semitone
selfTransition
number <optional> 0.99 the self transition probabilities
yinTrust
number <optional> 0.5 the yin trust parameter
Returns
Details
-
PowerMean( array [, power ] ) → {object}
-
Description
This algorithm computes the power mean of an array. It accepts one parameter, p, which is the power (or order or degree) of the Power Mean. Note that if p=-1, the Power Mean is equal to the Harmonic Mean, if p=0, the Power Mean is equal to the Geometric Mean, if p=1, the Power Mean is equal to the Arithmetic Mean, if p=2, the Power Mean is equal to the Root Mean Square. Check https://essentia.upf.edu/reference/std_PowerMean.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array (must contain only positive real numbers)
power
number <optional> 1 the power to which to elevate each element before taking the mean
Returns
Details
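The special cases listed in the description can be checked with a small plain-JavaScript sketch of the power mean. The function `powerMean` below is a hypothetical helper written for illustration, not the essentia.js implementation; treating p=0 as the geometric mean computed in the log domain is the standard limit case.

```javascript
// Illustrative power mean; not the essentia.js implementation.
function powerMean(values, power = 1) {
  if (values.length === 0) throw new RangeError("input array is empty");
  if (values.some((v) => !(v > 0))) {
    throw new RangeError("input must contain only positive real numbers");
  }
  const n = values.length;
  if (power === 0) {
    // Limit case p -> 0: geometric mean, computed in the log domain.
    const logSum = values.reduce((acc, v) => acc + Math.log(v), 0);
    return Math.exp(logSum / n);
  }
  // General case: (mean of x^p)^(1/p).
  const sum = values.reduce((acc, v) => acc + Math.pow(v, power), 0);
  return Math.pow(sum / n, 1 / power);
}

const sample = [1, 2, 3, 4];
console.log(powerMean(sample, -1)); // harmonic mean
console.log(powerMean(sample, 0));  // geometric mean
console.log(powerMean(sample, 1));  // arithmetic mean
console.log(powerMean(sample, 2));  // root mean square
```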
-
PowerSpectrum( signal [, size ] ) → {object}
-
Description
This algorithm computes the power spectrum of an array of Reals. The resulting power spectrum has a size which is half the size of the input array plus one. Bins contain squared magnitude values. Check https://essentia.upf.edu/reference/std_PowerSpectrum.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
size
number <optional> 2048 the expected size of the input frame (this is purely optional and only targeted at optimizing the creation time of the FFT object)
Returns
Details
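The output size and the meaning of each bin can be illustrated with a naive DFT in plain JavaScript. This is a sketch only: `powerSpectrum` here is a hypothetical helper, it runs in O(N²) rather than using an FFT, and the absence of any normalization is an assumption that may not match the library's scaling.

```javascript
// Illustrative O(N^2) power spectrum; not the essentia.js implementation.
function powerSpectrum(signal) {
  const n = signal.length;
  const bins = Math.floor(n / 2) + 1; // half the input size plus one
  const out = new Float64Array(bins);
  for (let k = 0; k < bins; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const phase = (-2 * Math.PI * k * t) / n;
      re += signal[t] * Math.cos(phase);
      im += signal[t] * Math.sin(phase);
    }
    out[k] = re * re + im * im; // squared magnitude of bin k
  }
  return out;
}

// A constant signal puts all of its energy in bin 0.
console.log(powerSpectrum([1, 1, 1, 1]).length); // 3 bins for 4 samples
```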
-
PredominantPitchMelodia( signal [, binResolution [, filterIterations [, frameSize [, guessUnvoiced [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minDuration [, minFrequency [, numberHarmonics [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, referenceFrequency [, sampleRate [, timeContinuity [, voiceVibrato [, voicingTolerance ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the fundamental frequency of the predominant melody from polyphonic music signals using the MELODIA algorithm. It is specifically suited for music with a predominant melodic element, for example the singing voice melody in an accompanied singing recording. The approach [1] is based on the creation and characterization of pitch contours, time continuous sequences of pitch candidates grouped using auditory streaming cues. It furthermore determines for each frame whether the predominant melody is present or not. To this end, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours, and PitchContoursMelody algorithms are employed. It is strongly advised to use the default parameter values which are optimized according to [1] (where further details are provided) except for minFrequency, maxFrequency, and voicingTolerance, which will depend on your application. Check https://essentia.upf.edu/reference/std_PredominantPitchMelodia.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
binResolution
number <optional> 10 salience function bin resolution [cents]
filterIterations
number <optional> 3 number of iterations for the octave errors / pitch outlier filtering process
frameSize
number <optional> 2048 the frame size for computing pitch salience
guessUnvoiced
boolean <optional> false estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
harmonicWeight
number <optional> 0.8 harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
hopSize
number <optional> 128 the hop size with which the pitch salience function was computed
magnitudeCompression
number <optional> 1 magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)
magnitudeThreshold
number <optional> 40 spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
maxFrequency
number <optional> 20000 the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minDuration
number <optional> 100 the minimum allowed contour duration [ms]
minFrequency
number <optional> 80 the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
numberHarmonics
number <optional> 20 number of considered harmonics
peakDistributionThreshold
number <optional> 0.9 allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)
peakFrameThreshold
number <optional> 0.9 per-frame salience threshold factor (fraction of the highest peak salience in a frame)
pitchContinuity
number <optional> 27.5625 pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]
referenceFrequency
number <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
timeContinuity
number <optional> 100 time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]
voiceVibrato
boolean <optional> false detect voice vibrato
voicingTolerance
number <optional> 0.2 allowed deviation below the average contour mean salience of all contours (fraction of the standard deviation)
Returns
Details
-
RMS( array ) → {object}
-
Description
This algorithm computes the root mean square (quadratic mean) of an array. RMS is not defined for empty arrays. In such a case, an exception will be thrown. References: [1] Root mean square - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Root_mean_square Check https://essentia.upf.edu/reference/std_RMS.html for more details.
Parameters
Name Type Description array
VectorFloat the input array
Returns
Details
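For reference, the quantity computed here can be written in a few lines of plain JavaScript. The helper `rms` is hypothetical and only mirrors the behavior described above, including the exception on empty input; the error type is an assumption.

```javascript
// Illustrative root mean square; not the essentia.js implementation.
function rms(values) {
  if (values.length === 0) {
    throw new RangeError("RMS is not defined for empty arrays");
  }
  let sumSquares = 0;
  for (const v of values) sumSquares += v * v;
  return Math.sqrt(sumSquares / values.length);
}

console.log(rms([1, -1, 1, -1])); // 1
```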
-
RawMoments( array [, range ] ) → {object}
-
Description
This algorithm computes the first 5 raw moments of an array. The output array is of size 6 because the zeroth moment is used for padding so that the first moment corresponds to index 1. Check https://essentia.upf.edu/reference/std_RawMoments.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array
range
number <optional> 22050 the range of the input array, used for normalizing the results
Returns
Details
-
ReplayGain( signal [, sampleRate ] ) → {object}
-
Description
This algorithm computes the Replay Gain loudness value of an audio signal. The algorithm is described in detail in [1]. The value returned is the 'standard' ReplayGain value, not the value with 6dB preamplification as computed by lame, mp3gain, vorbisgain, and all widely used ReplayGain programs. Check https://essentia.upf.edu/reference/std_ReplayGain.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal (must be longer than 0.05ms)
sampleRate
number <optional> 44100 the sampling rate of the input audio signal [Hz]
Returns
Details
-
Resample( signal [, inputSampleRate [, outputSampleRate [, quality ] ] ] ) → {object}
-
Description
This algorithm resamples the input signal to the desired sampling rate. Check https://essentia.upf.edu/reference/std_Resample.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
inputSampleRate
number <optional> 44100 the sampling rate of the input signal [Hz]
outputSampleRate
number <optional> 44100 the sampling rate of the output signal [Hz]
quality
number <optional> 1 the quality of the conversion, 0 for best quality
Returns
Details
-
ResampleFFT( input [, inSize [, outSize ] ] ) → {object}
-
Description
This algorithm resamples a sequence using FFT / IFFT. The input and output sizes must be even numbers. (It is meant to be equivalent to the resample function in NumPy). Check https://essentia.upf.edu/reference/std_ResampleFFT.html for more details.
Parameters
Name Type Attributes Default Description input
VectorFloat input array
inSize
number <optional> 128 the size of the input sequence. It needs to be even-sized.
outSize
number <optional> 128 the size of the output sequence. It needs to be even-sized.
Returns
Details
-
RhythmDescriptors( signal ) → {object}
-
Description
This algorithm computes rhythm features (bpm, beat positions, beat histogram peaks) for an audio signal. It combines RhythmExtractor2013 for beat tracking and BPM estimation with BpmHistogramDescriptors algorithms. Check https://essentia.upf.edu/reference/std_RhythmDescriptors.html for more details.
Parameters
Name Type Description signal
VectorFloat the audio input signal
Returns
Details
-
RhythmExtractor( signal [, frameHop [, frameSize [, hopSize [, lastBeatInterval [, maxTempo [, minTempo [, numberFrames [, sampleRate [, tempoHints [, tolerance [, useBands [, useOnset ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the tempo in bpm and beat positions given an audio signal. The algorithm combines several periodicity functions and estimates beats using TempoTap and TempoTapTicks. It combines: - onset detection functions based on high-frequency content (see OnsetDetection) - complex-domain spectral difference function (see OnsetDetection) - periodicity function based on energy bands (see FrequencyBands, TempoScaleBands) Check https://essentia.upf.edu/reference/std_RhythmExtractor.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the audio input signal
frameHop
number <optional> 1024 the number of feature frames separating two evaluations
frameSize
number <optional> 1024 the number of audio samples used to compute a feature
hopSize
number <optional> 256 the number of audio samples per feature
lastBeatInterval
number <optional> 0.1 the minimum interval between last beat and end of file [s]
maxTempo
number <optional> 208 the fastest tempo to detect [bpm]
minTempo
number <optional> 40 the slowest tempo to detect [bpm]
numberFrames
number <optional> 1024 the number of feature frames to buffer on
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
tempoHints
Array.<any> <optional> [] the optional list of initial beat locations, to favor the detection of pre-determined tempo period and beats alignment [s]
tolerance
number <optional> 0.24 the minimum interval between two consecutive beats [s]
useBands
boolean <optional> true whether or not to use band energy as periodicity function
useOnset
boolean <optional> true whether or not to use onsets as periodicity function
Returns
Details
-
RhythmExtractor2013( signal [, maxTempo [, method [, minTempo ] ] ] ) → {object}
-
Description
This algorithm extracts the beat positions and estimates their confidence as well as tempo in bpm for an audio signal. The beat locations can be computed using: - 'multifeature', the BeatTrackerMultiFeature algorithm - 'degara', the BeatTrackerDegara algorithm (note that there is no confidence estimation for this method, the output confidence value is always 0) Check https://essentia.upf.edu/reference/std_RhythmExtractor2013.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the audio input signal
maxTempo
number <optional> 208 the fastest tempo to detect [bpm]
method
string <optional> multifeature the method used for beat tracking
minTempo
number <optional> 40 the slowest tempo to detect [bpm]
Returns
Details
-
RhythmTransform( melBands [, frameSize [, hopSize ] ] ) → {object}
-
Description
This algorithm implements the rhythm transform. It computes a tempogram, a representation of rhythmic periodicities in the input signal in the rhythm domain, by using FFT similarly to computation of spectrum in the frequency domain [1]. Additional features, including rhythmic centroid and a rhythmic counterpart of MFCCs, can be derived from this rhythmic representation. Check https://essentia.upf.edu/reference/std_RhythmTransform.html for more details.
Parameters
Name Type Attributes Default Description melBands
VectorVectorFloat the energies in the mel bands
frameSize
number <optional> 256 the frame size to compute the rhythm transform
hopSize
number <optional> 32 the hop size to compute the rhythm transform
Returns
Details
-
RollOff( spectrum [, cutoff [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes the roll-off frequency of a spectrum. The roll-off frequency is defined as the frequency under which some percentage (cutoff) of the total energy of the spectrum is contained. The roll-off frequency can be used to distinguish between harmonic (below roll-off) and noisy sounds (above roll-off). Check https://essentia.upf.edu/reference/std_RollOff.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input audio spectrum (must have more than one element)
cutoff
number <optional> 0.85 the ratio of total energy to attain before yielding the roll-off frequency
sampleRate
number <optional> 44100 the sampling rate of the audio signal (used to normalize rollOff) [Hz]
Returns
Details
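The definition above (the frequency below which a `cutoff` fraction of the total energy lies) can be sketched in plain JavaScript as follows. The helper `rollOff` is hypothetical; measuring energy as the sum of squared magnitudes and spacing the bins evenly from 0 Hz to half the sample rate are assumptions, not confirmed details of the library.

```javascript
// Illustrative roll-off frequency; not the essentia.js implementation.
function rollOff(spectrum, cutoff = 0.85, sampleRate = 44100) {
  if (spectrum.length < 2) {
    throw new RangeError("spectrum must have more than one element");
  }
  // Assumption: energy is the sum of squared magnitudes.
  let total = 0;
  for (const m of spectrum) total += m * m;
  const target = cutoff * total;

  // Find the first bin where the cumulative energy reaches the target.
  let cumulative = 0;
  let bin = 0;
  for (let i = 0; i < spectrum.length; i++) {
    cumulative += spectrum[i] * spectrum[i];
    if (cumulative >= target) {
      bin = i;
      break;
    }
  }
  // Assumption: bins are evenly spaced from 0 Hz to sampleRate / 2.
  return (bin * (sampleRate / 2)) / (spectrum.length - 1);
}

// Flat 5-bin spectrum at 8 kHz: bins sit at 0, 1000, 2000, 3000 and 4000 Hz.
console.log(rollOff([1, 1, 1, 1, 1], 0.5, 8000)); // 2000
```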
-
SNR( frame [, MAAlpha [, MMSEAlpha [, NoiseAlpha [, frameSize [, noiseThreshold [, sampleRate [, useBroadbadNoiseCorrection ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the SNR of the input audio in a frame-wise manner. The algorithm assumes that: 1. The noise is Gaussian. 2. There is a region of noise (without signal) at the beginning of the stream in order to estimate the PSD of the noise.[1] Once the noise PSD is estimated, the algorithm relies on the Ephraim-Malah [2] recursion to estimate the SNR for each frequency bin. The algorithm also returns an overall (a single value for the whole spectrum) SNR estimation and an averaged overall SNR estimation using Exponential Moving Average filtering. This algorithm throws a Warning if fewer than 15 frames are used to estimate the noise PSD. Check https://essentia.upf.edu/reference/std_SNR.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input audio frame
MAAlpha
number <optional> 0.95 Alpha coefficient for the EMA SNR estimation [2]
MMSEAlpha
number <optional> 0.98 Alpha coefficient for the MMSE estimation [1].
NoiseAlpha
number <optional> 0.9 Alpha coefficient for the EMA noise estimation [2]
frameSize
number <optional> 512 the size of the input frame
noiseThreshold
number <optional> -40 Threshold to detect frames without signal
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
useBroadbadNoiseCorrection
boolean <optional> true flag to apply the -10 * log10(BW) broadband noise correction factor
Returns
Details
-
SaturationDetector( frame [, differentialThreshold [, energyThreshold [, frameSize [, hopSize [, minimumDuration [, sampleRate ] ] ] ] ] ] ) → {object}
-
Description
This algorithm outputs the starting/ending locations of the saturated regions in seconds. Saturated regions are found by means of a triple criterion: 1. samples in a saturated region should have more energy than a given threshold. 2. the difference between the samples in a saturated region should be smaller than a given threshold. 3. the duration of the saturated region should be longer than a given threshold. Check https://essentia.upf.edu/reference/std_SaturationDetector.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input audio frame
differentialThreshold
number <optional> 0.001 minimum difference between contiguous samples of the saturated regions
energyThreshold
number <optional> -1 minimum energy of the samples in the saturated regions [dB]
frameSize
number <optional> 512 expected input frame size
hopSize
number <optional> 256 hop size used for the analysis
minimumDuration
number <optional> 0.005 minimum duration of the saturated regions [ms]
sampleRate
number <optional> 44100 sample rate used for the analysis
Returns
Details
-
Scale( signal [, clipping [, factor [, maxAbsValue ] ] ] ) → {object}
-
Description
This algorithm scales the audio by the specified factor using clipping if required. Check https://essentia.upf.edu/reference/std_Scale.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal
clipping
boolean <optional> true boolean flag whether to apply clipping or not
factor
number <optional> 10 the multiplication factor by which the audio will be scaled
maxAbsValue
number <optional> 1 the maximum value above which to apply clipping
Returns
Details
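The interaction between `factor`, `clipping` and `maxAbsValue` can be sketched in plain JavaScript. The helper `scaleSignal` is hypothetical and its argument order differs from the method signature above; symmetric clipping to the range [-maxAbsValue, maxAbsValue] is an assumption about how the limit is applied.

```javascript
// Illustrative scaling with optional clipping; not the essentia.js implementation.
function scaleSignal(signal, factor = 10, clipping = true, maxAbsValue = 1) {
  const out = new Float32Array(signal.length);
  for (let i = 0; i < signal.length; i++) {
    let v = signal[i] * factor;
    if (clipping) {
      // Assumption: clip symmetrically around zero.
      v = Math.max(-maxAbsValue, Math.min(maxAbsValue, v));
    }
    out[i] = v;
  }
  return out;
}

// With the default factor of 10, anything beyond +/-0.1 in the input is clipped.
console.log(Array.from(scaleSignal([0.05, -0.5, 0.2])));
```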
-
SineSubtraction( frame, magnitudes, frequencies, phases [, fftSize [, hopSize [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm subtracts the sinusoids computed with the sine model analysis from an input audio signal. It outputs an audio signal. Check https://essentia.upf.edu/reference/std_SineSubtraction.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input audio frame to subtract from
magnitudes
VectorFloat the magnitudes of the sinusoidal peaks
frequencies
VectorFloat the frequencies of the sinusoidal peaks [Hz]
phases
VectorFloat the phases of the sinusoidal peaks
fftSize
number <optional> 512 the size of the FFT internal process (full spectrum size) and output frame. Minimum twice the hopsize.
hopSize
number <optional> 128 the hop size between frames
sampleRate
number <optional> 44100 the audio sampling rate [Hz]
Returns
Details
-
SingleBeatLoudness( beat [, beatDuration [, beatWindowDuration [, frequencyBands [, onsetStart [, sampleRate ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the spectrum energy of a single beat across the whole frequency range and on each specified frequency band given an audio segment. It detects the onset of the beat within the input segment, computes spectrum on a window starting on this onset, and estimates energy (see Energy and EnergyBandRatio algorithms). The frequency bands used by default are: 0-200 Hz, 200-400 Hz, 400-800 Hz, 800-1600 Hz, 1600-3200 Hz, 3200-22000Hz, following E. Scheirer [1]. Check https://essentia.upf.edu/reference/std_SingleBeatLoudness.html for more details.
Parameters
Name Type Attributes Default Description beat
VectorFloat audio segment containing a beat
beatDuration
number <optional> 0.05 window size for the beat's energy computation (the window starts at the onset) [s]
beatWindowDuration
number <optional> 0.1 window size for the beat's onset detection [s]
frequencyBands
Array.<any> <optional> [0, 200, 400, 800, 1600, 3200, 22000] frequency bands
onsetStart
string <optional> sumEnergy criteria for finding the start of the beat
sampleRate
number <optional> 44100 the audio sampling rate [Hz]
Returns
Details
-
Slicer( audio [, endTimes [, sampleRate [, startTimes [, timeUnits ] ] ] ] ) → {object}
-
Description
This algorithm splits an audio signal into segments given their start and end times. Check https://essentia.upf.edu/reference/std_Slicer.html for more details.
Parameters
Name Type Attributes Default Description audio
VectorFloat the input audio signal
endTimes
Array.<any> <optional> [] the list of end times for the slices you want to extract
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
startTimes
Array.<any> <optional> [] the list of start times for the slices you want to extract
timeUnits
string <optional> seconds the units of time of the start and end times
Returns
Details
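The splitting step can be pictured with the following plain-JavaScript sketch, which assumes `timeUnits` is `seconds`. The helper `sliceAudio` is hypothetical, takes its arguments in a different order from the method signature above, and rounds times to the nearest sample; none of this is confirmed library behavior.

```javascript
// Illustrative slicing by start/end times in seconds; not the essentia.js implementation.
function sliceAudio(audio, startTimes, endTimes, sampleRate = 44100) {
  if (startTimes.length !== endTimes.length) {
    throw new RangeError("startTimes and endTimes must have the same length");
  }
  return startTimes.map((start, i) => {
    const end = endTimes[i];
    if (end < start) {
      throw new RangeError("slice " + i + " ends before it starts");
    }
    // Assumption: convert seconds to the nearest sample index.
    const from = Math.round(start * sampleRate);
    const to = Math.min(Math.round(end * sampleRate), audio.length);
    return audio.slice(from, to);
  });
}

// Ten samples at 10 Hz: 0.0-0.3 s and 0.5-1.0 s.
const ramp = Float32Array.from({ length: 10 }, (_, i) => i);
console.log(sliceAudio(ramp, [0, 0.5], [0.3, 1.0], 10).map((s) => Array.from(s)));
```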
-
SpectralCentroidTime( array [, sampleRate ] ) → {object}
-
Description
This algorithm computes the spectral centroid of a signal in the time domain. A first difference filter is applied to the input signal. Then the centroid is computed by dividing the norm of the resulting signal by the norm of the input signal. The centroid is given in hertz. References: [1] Udo Zölzer (2002). DAFX Digital Audio Effects, pp. 364-365 Check https://essentia.upf.edu/reference/std_SpectralCentroidTime.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array
sampleRate
number <optional> 44100 sampling rate of the input spectrum [Hz]
Returns
Details
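The two steps described above (first difference filter, then a ratio of norms) can be sketched in plain JavaScript. The helper `spectralCentroidTime` below is hypothetical, and the conversion of the ratio to hertz by multiplying with sampleRate / (2π) is an assumption about the scaling rather than a confirmed detail of the library.

```javascript
// Illustrative time-domain spectral centroid; not the essentia.js implementation.
function spectralCentroidTime(signal, sampleRate = 44100) {
  let inputEnergy = signal.length > 0 ? signal[0] * signal[0] : 0;
  let diffEnergy = 0;
  for (let i = 1; i < signal.length; i++) {
    const d = signal[i] - signal[i - 1]; // first difference filter
    diffEnergy += d * d;
    inputEnergy += signal[i] * signal[i];
  }
  if (inputEnergy === 0) return 0; // silent or empty input
  // Assumption: scale the norm ratio by sampleRate / (2 * pi) to obtain hertz.
  const ratio = Math.sqrt(diffEnergy) / Math.sqrt(inputEnergy);
  return ratio * (sampleRate / (2 * Math.PI));
}

// For a pure tone well below the Nyquist frequency, the result is close to the tone's frequency.
const tone = Float32Array.from({ length: 44100 }, (_, i) =>
  Math.sin((2 * Math.PI * 441 * i) / 44100));
console.log(spectralCentroidTime(tone, 44100));
```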
-
SpectralComplexity( spectrum [, magnitudeThreshold [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes the spectral complexity of a spectrum. The spectral complexity is based on the number of peaks in the input spectrum. Check https://essentia.upf.edu/reference/std_SpectralComplexity.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input spectrum
magnitudeThreshold
number <optional> 0.005 the minimum spectral-peak magnitude that contributes to spectral complexity
sampleRate
number <optional> 44100 the audio sampling rate [Hz]
Returns
Details
-
SpectralContrast( spectrum [, frameSize [, highFrequencyBound [, lowFrequencyBound [, neighbourRatio [, numberBands [, sampleRate [, staticDistribution ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the Spectral Contrast feature of a spectrum. It is based on the Octave Based Spectral Contrast feature as described in [1]. The version implemented here is a modified version to improve discriminative power and robustness. The modifications are described in [2]. Check https://essentia.upf.edu/reference/std_SpectralContrast.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the audio spectrum
frameSize
number <optional> 2048 the size of the fft frames
highFrequencyBound
number <optional> 11000 the upper bound of the highest band
lowFrequencyBound
number <optional> 20 the lower bound of the lowest band
neighbourRatio
number <optional> 0.4 the ratio of the bins in the sub band used to calculate the peak and valley
numberBands
number <optional> 6 the number of bands in the filter
sampleRate
number <optional> 22050 the sampling rate of the audio signal
staticDistribution
number <optional> 0.15 the ratio of the bins to distribute equally
Returns
Details
-
SpectralPeaks( spectrum [, magnitudeThreshold [, maxFrequency [, maxPeaks [, minFrequency [, orderBy [, sampleRate ] ] ] ] ] ] ) → {object}
-
Description
This algorithm extracts peaks from a spectrum. It is important to note that the peak algorithm is independent of an input that is linear or in dB, so one has to adapt the threshold to fit with the type of data fed to it. The algorithm relies on PeakDetection algorithm which is run with parabolic interpolation [1]. The exactness of the peak-searching depends heavily on the windowing type. It gives best results with dB input, a blackman-harris 92dB window and interpolation set to true. According to [1], spectral peak frequencies tend to be about twice as accurate when dB magnitude is used rather than just linear magnitude. For further information about the peak detection, see the description of the PeakDetection algorithm. Check https://essentia.upf.edu/reference/std_SpectralPeaks.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input spectrum
magnitudeThreshold
number <optional> 0 peaks below this given threshold are not outputted
maxFrequency
number <optional> 5000 the maximum frequency of the range to evaluate [Hz]
maxPeaks
number <optional> 100 the maximum number of returned peaks
minFrequency
number <optional> 0 the minimum frequency of the range to evaluate [Hz]
orderBy
string <optional> frequency the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
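The parabolic interpolation mentioned in the description can be sketched as follows. This is a plain-JavaScript illustration, not the library code: `pickPeaksSketch` is a hypothetical name, and the bin-to-hertz mapping assumes the spectrum spans 0 Hz to `sampleRate / 2`.

```javascript
// Illustrative sketch only: local maxima refined with a three-point parabola.
function pickPeaksSketch(spectrum, options = {}) {
  const { sampleRate = 44100, magnitudeThreshold = 0, maxPeaks = 100, orderBy = "frequency" } = options;
  // Assumption: bin 0 is 0 Hz and the last bin is sampleRate / 2.
  const binToHz = sampleRate / 2 / (spectrum.length - 1);
  const peaks = [];
  for (let k = 1; k < spectrum.length - 1; k++) {
    const a = spectrum[k - 1];
    const b = spectrum[k];
    const c = spectrum[k + 1];
    if (b > a && b >= c && b >= magnitudeThreshold) {
      // Vertex of the parabola through (k-1, a), (k, b), (k+1, c).
      const offset = (0.5 * (a - c)) / (a - 2 * b + c);
      peaks.push({
        frequency: (k + offset) * binToHz,
        magnitude: b - 0.25 * (a - c) * offset,
      });
    }
  }
  peaks.sort((p, q) =>
    orderBy === "magnitude" ? q.magnitude - p.magnitude : p.frequency - q.frequency
  );
  return peaks.slice(0, maxPeaks);
}

// The right neighbour is larger than the left one, so the peak shifts slightly right of bin 1.
console.log(pickPeaksSketch([1, 3, 2, 0, 0], { sampleRate: 8 }));
```

As the description notes, the threshold has to match the scale of the data: a value that makes sense for linear magnitudes will not carry over to dB input.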
-
SpectralWhitening( spectrum, frequencies, magnitudes [, maxFrequency [, sampleRate ] ] ) → {object}
-
Description
Performs spectral whitening of spectral peaks of a spectrum. The algorithm works in dB scale, but the conversion is done by the algorithm so input should be in linear scale. The concept of 'whitening' refers to 'white noise' or a non-zero flat spectrum. It first computes a spectral envelope similar to the 'true envelope' in [1], and then modifies the amplitude of each peak relative to the envelope. For example, the predominant peaks will have a value close to 0dB because they are very close to the envelope. On the other hand, minor peaks between significant peaks will have lower amplitudes such as -30dB. Check https://essentia.upf.edu/reference/std_SpectralWhitening.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the audio linear spectrum
frequencies
VectorFloat the spectral peaks' linear frequencies
magnitudes
VectorFloat the spectral peaks' linear magnitudes
maxFrequency
number <optional> 5000 max frequency to apply whitening to [Hz]
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
Spectrum( frame [, size ] ) → {object}
-
Description
This algorithm computes the magnitude spectrum of an array of Reals. The resulting magnitude spectrum has a size which is half the size of the input array plus one. Bins contain raw (linear) magnitude values. Check https://essentia.upf.edu/reference/std_Spectrum.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input audio frame
size
number <optional> 2048 the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)
Returns
Details
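The output size stated above (half the input size plus one) can be checked with a naive DFT. The sketch below is only meant to show the size relation and the linear magnitudes; it is not how the library computes the spectrum, and `magnitudeSpectrumSketch` is a hypothetical name.

```javascript
// Illustrative sketch only: naive O(N^2) DFT keeping the non-redundant bins.
function magnitudeSpectrumSketch(frame) {
  const n = frame.length;
  const bins = Math.floor(n / 2) + 1;
  const out = new Float32Array(bins);
  for (let k = 0; k < bins; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(angle);
      im += frame[t] * Math.sin(angle);
    }
    out[k] = Math.hypot(re, im); // raw (linear) magnitude
  }
  return out;
}

// Eight samples of a cosine that completes two cycles per frame.
const frame = Float32Array.from({ length: 8 }, (_, t) => Math.cos((2 * Math.PI * 2 * t) / 8));
const spectrum = magnitudeSpectrumSketch(frame);
console.log(spectrum.length); // 5
```

With this input almost all of the magnitude falls in bin 2, which matches the two cycles per frame.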
-
SpectrumCQ( frame [, binsPerOctave [, minFrequency [, minimumKernelSize [, numberBins [, sampleRate [, scale [, threshold [, windowType [, zeroPhase ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the magnitude of the Constant-Q spectrum. See ConstantQ algorithm for more details. Check https://essentia.upf.edu/reference/std_SpectrumCQ.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input audio frame
binsPerOctave
number <optional> 12 number of bins per octave
minFrequency
number <optional> 32.7 minimum frequency [Hz]
minimumKernelSize
number <optional> 4 minimum size allowed for frequency kernels
numberBins
number <optional> 84 number of frequency bins, starting at minFrequency
sampleRate
number <optional> 44100 FFT sampling rate [Hz]
scale
number <optional> 1 filters scale. Larger values use longer windows
threshold
number <optional> 0.01 bins whose magnitude is below this quantile are discarded
windowType
string <optional> hann the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
zeroPhase
boolean <optional> true a boolean value that enables zero-phase windowing. Input audio frames should be windowed with the same phase mode
Returns
Details
-
SpectrumToCent( spectrum [, bands [, centBinResolution [, inputSize [, log [, minimumFrequency [, normalize [, sampleRate [, type ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes energy in triangular frequency bands of a spectrum equally spaced on the cent scale. Each band is computed to have a constant wideness in the cent scale. For each band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_SpectrumToCent.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input spectrum (must be greater than size one)
bands
number <optional> 720 number of bins to compute. Default is 720 (6 octaves with the default 'centBinResolution')
centBinResolution
number <optional> 10 Width of each band in cents. Default is 10 cents
inputSize
number <optional> 32768 the size of the spectrum
log
boolean <optional> true compute log-energies (log10 (1 + energy))
minimumFrequency
number <optional> 164 central frequency of the first band of the bank [Hz]
normalize
string <optional> unit_sum use unit area or vertex equal to 1 triangles.
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
type
string <optional> power use magnitude or power spectrum
Returns
Details
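The relation between `bands`, `centBinResolution` and `minimumFrequency` can be worked through with a little arithmetic. The helper `centBandCenterHz` below is hypothetical and assumes that band 0 is centred on `minimumFrequency` and that each following band sits `centBinResolution` cents higher.

```javascript
// Illustrative sketch only: centre frequency of a band on the cent scale.
function centBandCenterHz(bandIndex, minimumFrequency = 164, centBinResolution = 10) {
  // 1200 cents make one octave, i.e. a factor of 2 in frequency.
  return minimumFrequency * Math.pow(2, (bandIndex * centBinResolution) / 1200);
}

console.log(centBandCenterHz(0)); // 164
console.log(centBandCenterHz(120)); // 328, one octave above the first band
console.log((720 * 10) / 1200); // 6 octaves covered by the defaults
```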
-
Spline( x [, beta1 [, beta2 [, type [, xPoints [, yPoints ] ] ] ] ] ) → {object}
-
Description
Evaluates a piecewise spline of type b, beta or quadratic. The input value, i.e. the point at which the spline is to be evaluated, typically should be between xPoints[0] and xPoints[size-1]. If the value lies outside this range, extrapolation is used. Regarding spline types: - B: evaluates a cubic B spline approximant. - Beta: evaluates a cubic beta spline approximant. For beta splines, parameters 'beta1' and 'beta2' can be supplied. For no bias set beta1 to 1 and for no tension set beta2 to 0. Note that if beta1=1 and beta2=0, the cubic beta becomes a cubic B spline. On the other hand, if beta1=1 and beta2 is large, the beta spline turns into a linear spline. - Quadratic: evaluates a piecewise quadratic spline at a point. Note that the size of the input must be odd. Check https://essentia.upf.edu/reference/std_Spline.html for more details.
Parameters
Name Type Attributes Default Description x
number the input coordinate (x-axis)
beta1
number <optional> 1 the skew or bias parameter (only available for type beta)
beta2
number <optional> 0 the tension parameter
type
string <optional> b the type of spline to be computed
xPoints
Array.<any> <optional> [0, 1] the x-coordinates where data is specified (the points must be arranged in ascending order and cannot contain duplicates)
yPoints
Array.<any> <optional> [0, 1] the y-coordinates to be interpolated (i.e. the known data)
Returns
Details
-
SprModelAnal( frame [, fftSize [, freqDevOffset [, freqDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, orderBy [, sampleRate ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the sinusoidal plus residual model analysis. Check https://essentia.upf.edu/reference/std_SprModelAnal.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input frame
fftSize
number <optional> 2048 the internal FFT size (full spectrum size)
freqDevOffset
number <optional> 20 minimum frequency deviation at 0Hz
freqDevSlope
number <optional> 0.01 slope increase of minimum frequency deviation
hopSize
number <optional> 512 the hop size between frames
magnitudeThreshold
number <optional> 0 peaks below this given threshold are not outputted
maxFrequency
number <optional> 5000 the maximum frequency of the range to evaluate [Hz]
maxPeaks
number <optional> 100 the maximum number of returned peaks
maxnSines
number <optional> 100 maximum number of sines per frame
minFrequency
number <optional> 0 the minimum frequency of the range to evaluate [Hz]
orderBy
string <optional> frequency the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
SprModelSynth( magnitudes, frequencies, phases, res [, fftSize [, hopSize [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm computes the sinusoidal plus residual model synthesis from SPR model analysis. Check https://essentia.upf.edu/reference/std_SprModelSynth.html for more details.
Parameters
Name Type Attributes Default Description magnitudes
VectorFloat the magnitudes of the sinusoidal peaks
frequencies
VectorFloat the frequencies of the sinusoidal peaks [Hz]
phases
VectorFloat the phases of the sinusoidal peaks
res
VectorFloat the residual frame
fftSize
number <optional> 2048 the size of the output FFT frame (full spectrum size)
hopSize
number <optional> 512 the hop size between frames
sampleRate
number <optional> 44100 the audio sampling rate [Hz]
Returns
Details
-
SpsModelAnal( frame [, fftSize [, freqDevOffset [, freqDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, orderBy [, sampleRate [, stocf ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the sinusoidal plus stochastic model analysis. Check https://essentia.upf.edu/reference/std_SpsModelAnal.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input frame
fftSize
number <optional> 2048 the internal FFT size (full spectrum size)
freqDevOffset
number <optional> 20 minimum frequency deviation at 0Hz
freqDevSlope
number <optional> 0.01 slope increase of minimum frequency deviation
hopSize
number <optional> 512 the hop size between frames
magnitudeThreshold
number <optional> 0 peaks below this given threshold are not outputted
maxFrequency
number <optional> 5000 the maximum frequency of the range to evaluate [Hz]
maxPeaks
number <optional> 100 the maximum number of returned peaks
maxnSines
number <optional> 100 maximum number of sines per frame
minFrequency
number <optional> 0 the minimum frequency of the range to evaluate [Hz]
orderBy
string <optional> frequency the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
stocf
number <optional> 0.2 decimation factor used for the stochastic approximation
Returns
Details
-
SpsModelSynth( magnitudes, frequencies, phases, stocenv [, fftSize [, hopSize [, sampleRate [, stocf ] ] ] ] ) → {object}
-
Description
This algorithm computes the sinusoidal plus stochastic model synthesis from SPS model analysis. Check https://essentia.upf.edu/reference/std_SpsModelSynth.html for more details.
Parameters
Name Type Attributes Default Description magnitudes
VectorFloat the magnitudes of the sinusoidal peaks
frequencies
VectorFloat the frequencies of the sinusoidal peaks [Hz]
phases
VectorFloat the phases of the sinusoidal peaks
stocenv
VectorFloat the stochastic envelope
fftSize
number <optional> 2048 the size of the output FFT frame (full spectrum size)
hopSize
number <optional> 512 the hop size between frames
sampleRate
number <optional> 44100 the audio sampling rate [Hz]
stocf
number <optional> 0.2 decimation factor used for the stochastic approximation
Returns
Details
-
StartStopCut( audio [, frameSize [, hopSize [, maximumStartTime [, maximumStopTime [, sampleRate [, threshold ] ] ] ] ] ] ) → {object}
-
Description
This algorithm reports whether there is a cut at the beginning or at the end of the audio by locating the first and last non-silent frames and comparing their positions to the actual beginning and end of the audio. The input audio is considered to be cut at the beginning (or the end) and the corresponding flag is activated if the first (last) non-silent frame occurs before (after) the configurable time threshold. Check https://essentia.upf.edu/reference/std_StartStopCut.html for more details.
Parameters
Name Type Attributes Default Description audio
VectorFloat the input audio
frameSize
number <optional> 256 the frame size for the internal power analysis
hopSize
number <optional> 256 the hop size for the internal power analysis
maximumStartTime
number <optional> 10 if the first non-silent frame occurs before maximumStartTime startCut is activated [ms]
maximumStopTime
number <optional> 10 if the last non-silent frame occurs after maximumStopTime to the end stopCut is activated [ms]
sampleRate
number <optional> 44100 the sample rate
threshold
number <optional> -60 the threshold below which average energy is defined as silence [dB]
Returns
Details
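The cut detection described above can be sketched in plain JavaScript. This is a hedged illustration rather than the library code: `startStopCutSketch` is a hypothetical name, frame power is taken as the mean of the squared samples, and the stop check assumes `maximumStopTime` is measured back from the end of the audio.

```javascript
// Illustrative sketch only; the flag names follow the parameter descriptions above.
function startStopCutSketch(audio, options = {}) {
  const {
    frameSize = 256, hopSize = 256, maximumStartTime = 10,
    maximumStopTime = 10, sampleRate = 44100, threshold = -60,
  } = options;
  const nonSilentStarts = [];
  for (let start = 0; start + frameSize <= audio.length; start += hopSize) {
    let power = 0;
    for (let i = start; i < start + frameSize; i++) power += audio[i] * audio[i];
    power /= frameSize;
    // 10 * log10(0) is -Infinity, which is always below the threshold.
    if (10 * Math.log10(power) > threshold) nonSilentStarts.push(start);
  }
  if (nonSilentStarts.length === 0) return { startCut: 0, stopCut: 0 };
  const toMs = (samples) => (1000 * samples) / sampleRate;
  const firstMs = toMs(nonSilentStarts[0]);
  const lastEndMs = toMs(nonSilentStarts[nonSilentStarts.length - 1] + frameSize);
  const durationMs = toMs(audio.length);
  return {
    startCut: firstMs < maximumStartTime ? 1 : 0,
    stopCut: durationMs - lastEndMs < maximumStopTime ? 1 : 0,
  };
}

// Sound starts immediately and is followed by about half a second of silence.
const audio = new Float32Array(44100);
audio.fill(0.5, 0, 22016);
console.log(startStopCutSketch(audio)); // { startCut: 1, stopCut: 0 }
```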
-
StartStopSilence( frame [, threshold ] ) → {object}
-
Description
This algorithm outputs the frame at which sound begins and the frame at which sound ends. Check https://essentia.upf.edu/reference/std_StartStopSilence.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input audio frames
threshold
number <optional> -60 the threshold below which average energy is defined as silence [dB]
Returns
Details
-
StochasticModelAnal( frame [, fftSize [, hopSize [, sampleRate [, stocf ] ] ] ] ) → {object}
-
Description
This algorithm computes the stochastic model analysis. It gets the resampled spectral envelope of the stochastic component. Check https://essentia.upf.edu/reference/std_StochasticModelAnal.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input frame
fftSize
number <optional> 2048 the internal FFT size (full spectrum size)
hopSize
number <optional> 512 the hop size between frames
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
stocf
number <optional> 0.2 decimation factor used for the stochastic approximation
Returns
Details
-
StochasticModelSynth( stocenv [, fftSize [, hopSize [, sampleRate [, stocf ] ] ] ] ) → {object}
-
Description
This algorithm computes the stochastic model synthesis. It generates the noisy spectrum from a resampled spectral envelope of the stochastic component. Check https://essentia.upf.edu/reference/std_StochasticModelSynth.html for more details.
Parameters
Name Type Attributes Default Description stocenv
VectorFloat the stochastic envelope input
fftSize
number <optional> 2048 the internal FFT size (full spectrum size)
hopSize
number <optional> 512 the hop size between frames
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
stocf
number <optional> 0.2 decimation factor used for the stochastic approximation
Returns
Details
-
StrongDecay( signal [, sampleRate ] ) → {object}
-
Description
This algorithm computes the Strong Decay of an audio signal. The Strong Decay is built from the non-linear combination of the signal energy and the signal temporal centroid, the latter being the balance of the absolute value of the signal. A signal containing a temporal centroid near its start boundary and a strong energy is said to have a strong decay. Check https://essentia.upf.edu/reference/std_StrongDecay.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
StrongPeak( spectrum ) → {object}
-
Description
This algorithm computes the Strong Peak of a spectrum. The Strong Peak is defined as the ratio between the spectrum's maximum peak's magnitude and the "bandwidth" of the peak above a threshold (half its amplitude). This ratio reveals whether the spectrum presents a very "pronounced" maximum peak (i.e. the thinner and the higher the maximum of the spectrum is, the higher the ratio value). Check https://essentia.upf.edu/reference/std_StrongPeak.html for more details.
Parameters
Name Type Description spectrum
VectorFloat the input spectrum (must be greater than one element and cannot contain negative values)
Returns
Details
-
SuperFluxExtractor( signal [, combine [, frameSize [, hopSize [, ratioThreshold [, sampleRate [, threshold ] ] ] ] ] ] ) → {object}
-
Description
This algorithm detects onsets given an audio signal using the SuperFlux algorithm. This implementation is based on the available reference implementation in Python [2]. The algorithm computes the spectrum of the input signal, summarizes it into triangular band energies, and computes an onset detection function based on spectral flux tracking spectral trajectories with a maximum filter (SuperFluxNovelty). The peaks of the function are then detected (SuperFluxPeaks). Check https://essentia.upf.edu/reference/std_SuperFluxExtractor.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the audio input signal
combine
number <optional> 20 time threshold for double onset detections (ms)
frameSize
number <optional> 2048 the frame size for computing low-level features
hopSize
number <optional> 256 the hop size for computing low-level features
ratioThreshold
number <optional> 16 ratio threshold for peak picking with respect to novelty_signal/novelty_average rate, use 0 to disable it (for low-energy onsets)
sampleRate
number <optional> 44100 the audio sampling rate [Hz]
threshold
number <optional> 0.05 threshold for peak picking with respect to the difference between novelty_signal and average_signal (for onsets in ambient noise)
Returns
Details
-
SuperFluxNovelty( bands [, binWidth [, frameWidth ] ] ) → {object}
-
Description
Onset detection function for the SuperFlux algorithm. See SuperFluxExtractor for more details. Check https://essentia.upf.edu/reference/std_SuperFluxNovelty.html for more details.
Parameters
Name Type Attributes Default Description bands
VectorVectorFloat the input bands spectrogram
binWidth
number <optional> 3 filter width (number of frequency bins)
frameWidth
number <optional> 2 differentiation offset (compute the difference with the N-th previous frame)
Returns
Details
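The roles of `binWidth` (maximum filter across frequency) and `frameWidth` (offset to the earlier frame) can be pictured with the sketch below. It is an assumption-laden illustration: `superFluxNoveltySketch` is a hypothetical name, it returns one value per frame, and how the library aggregates or returns these differences is not stated in this section.

```javascript
// Illustrative sketch only: positive differences against a max-filtered earlier frame.
function superFluxNoveltySketch(bands, binWidth = 3, frameWidth = 2) {
  const half = Math.floor(binWidth / 2);
  const novelty = [];
  for (let t = frameWidth; t < bands.length; t++) {
    const previous = bands[t - frameWidth];
    let sum = 0;
    for (let k = 0; k < bands[t].length; k++) {
      // Maximum filter: largest value among neighbouring bins of the earlier frame.
      let localMax = -Infinity;
      const lo = Math.max(0, k - half);
      const hi = Math.min(previous.length - 1, k + half);
      for (let j = lo; j <= hi; j++) localMax = Math.max(localMax, previous[j]);
      const diff = bands[t][k] - localMax;
      if (diff > 0) sum += diff; // keep energy increases only
    }
    novelty.push(sum);
  }
  return novelty;
}

// Energy moves from bin 1 to bin 2 (for example vibrato), without any new onset.
const drifting = [[0, 1, 0], [0, 1, 0], [0, 0, 1]];
console.log(superFluxNoveltySketch(drifting, 3, 2)); // [ 0 ]
console.log(superFluxNoveltySketch(drifting, 1, 2)); // [ 1 ]
```

With a filter width of 3 the one-bin drift is absorbed by the maximum filter, while a width of 1 reports it as novelty, which is the effect the maximum filter is meant to suppress.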
-
SuperFluxPeaks( novelty [, combine [, frameRate [, pre_avg [, pre_max [, ratioThreshold [, threshold ] ] ] ] ] ] ) → {object}
-
Description
This algorithm detects peaks of an onset detection function computed by the SuperFluxNovelty algorithm. See SuperFluxExtractor for more details. Check https://essentia.upf.edu/reference/std_SuperFluxPeaks.html for more details.
Parameters
Name Type Attributes Default Description novelty
VectorFloat the input onset detection function
combine
number <optional> 30 time threshold for double onset detections (ms)
frameRate
number <optional> 172 frameRate
pre_avg
number <optional> 100 look back duration for moving average filter [ms]
pre_max
number <optional> 30 look back duration for moving maximum filter [ms]
ratioThreshold
number <optional> 16 ratio threshold for peak picking with respect to novelty_signal/novelty_average rate, use 0 to disable it (for low-energy onsets)
threshold
number <optional> 0.05 threshold for peak picking with respect to the difference between novelty_signal and average_signal (for onsets in ambient noise)
Returns
Details
-
TCToTotal( envelope ) → {object}
-
Description
This algorithm calculates the ratio of the temporal centroid to the total length of a signal envelope. This ratio shows how the sound is 'balanced'. Its value is close to 0 if most of the energy lies at the beginning of the sound (e.g. decrescendo or impulsive sounds), close to 0.5 if the sound is symmetric (e.g. 'delta unvarying' sounds), and close to 1 if most of the energy lies at the end of the sound (e.g. crescendo sounds). Check https://essentia.upf.edu/reference/std_TCToTotal.html for more details.
Parameters
Name Type Description envelope
VectorFloat the envelope of the signal (its length must be greater than 1)
Returns
Details
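The ratio described above can be sketched in a few lines. This is an illustration under assumptions, not the library code: `tcToTotalSketch` is a hypothetical name, the temporal centroid is taken as the envelope-weighted mean sample index, and the total length is taken as the index of the last sample.

```javascript
// Illustrative sketch only: temporal centroid divided by the envelope length.
function tcToTotalSketch(envelope) {
  if (envelope.length < 2) throw new Error("envelope must have more than one sample");
  let weighted = 0;
  let total = 0;
  for (let i = 0; i < envelope.length; i++) {
    weighted += i * envelope[i];
    total += envelope[i];
  }
  if (total === 0) throw new Error("envelope sums to zero");
  return weighted / total / (envelope.length - 1);
}

console.log(tcToTotalSketch([1, 0.5, 0.25, 0.125])); // below 0.5: energy at the start
console.log(tcToTotalSketch([1, 1, 1])); // 0.5
console.log(tcToTotalSketch([0.125, 0.25, 0.5, 1])); // above 0.5: energy at the end
```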
-
TempoScaleBands( bands [, bandsGain [, frameTime ] ] ) → {object}
-
Description
This algorithm computes features for tempo tracking to be used with the TempoTap algorithm. See standard_rhythmextractor_tempotap in examples folder. Check https://essentia.upf.edu/reference/std_TempoScaleBands.html for more details.
Parameters
Name Type Attributes Default Description bands
VectorFloat the audio power spectrum divided into bands
bandsGain
Array.<any> <optional> [2, 3, 2, 1, 1.20000004768, 2, 3, 2.5] gain for each band
frameTime
number <optional> 512 the frame rate in samples
Returns
Details
-
TempoTap( featuresFrame [, frameHop [, frameSize [, maxTempo [, minTempo [, numberFrames [, sampleRate [, tempoHints ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the periods and phases of a periodic signal, represented by a sequence of values of any number of detection functions, such as energy bands, onset locations, etc. It must be run sequentially on a vector of such values ("featuresFrame") for each particular audio frame in order to get estimations related to that frame. The estimations are done for each detection function separately, utilizing the latest "frameHop" frames, including the present one, to compute autocorrelation. Empty estimations will be returned until enough frames are accumulated in the algorithm's buffer. The algorithm uses elements of the following beat-tracking methods: - BeatIt, elaborated by Fabien Gouyon and Simon Dixon (input features) [1] - Multi-comb filter with Rayleigh weighting, Mathew Davies [2] Check https://essentia.upf.edu/reference/std_TempoTap.html for more details.
Parameters
Name Type Attributes Default Description featuresFrame
VectorFloat input temporal features of a frame
frameHop
number <optional> 1024 number of feature frames separating two evaluations
frameSize
number <optional> 256 number of audio samples in a frame
maxTempo
number <optional> 208 fastest tempo allowed to be detected [bpm]
minTempo
number <optional> 40 slowest tempo allowed to be detected [bpm]
numberFrames
number <optional> 1024 number of feature frames to buffer on
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
tempoHints
Array.<any> <optional> [] optional list of initial beat locations, to favor the detection of pre-determined tempo period and beats alignment [s]
Returns
Details
-
TempoTapDegara( onsetDetections [, maxTempo [, minTempo [, resample [, sampleRateODF ] ] ] ] ) → {object}
-
Description
This algorithm estimates beat positions given an onset detection function. The detection function is partitioned into 6-second frames with a 1.5-second increment, and the autocorrelation is computed for each frame and weighted by a tempo preference curve [2]. Periodicity estimations are done frame-wise, searching for the best match with the Viterbi algorithm [3]. The estimated periods are then passed to the probabilistic beat tracking algorithm [1], which computes beat positions. Check https://essentia.upf.edu/reference/std_TempoTapDegara.html for more details.
Parameters
Name Type Attributes Default Description onsetDetections
VectorFloat the input frame-wise vector of onset detection values
maxTempo
number <optional> 208 fastest tempo allowed to be detected [bpm]
minTempo
number <optional> 40 slowest tempo allowed to be detected [bpm]
resample
string <optional> none use upsampling of the onset detection function (may increase accuracy)
sampleRateODF
number <optional> 86.1328 the sampling rate of the onset detection function [Hz]
Returns
Details
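When the onset detection function is computed frame by frame, its sampling rate is the audio sampling rate divided by the hop size. The arithmetic below suggests that the default `sampleRateODF` matches 44100 Hz audio analysed with a hop of 512 samples; that pairing is an assumption, and the helper `odfSampleRate` is hypothetical.

```javascript
// Illustrative sketch only: sampling rate of a frame-wise onset detection function.
function odfSampleRate(audioSampleRate, hopSize) {
  return audioSampleRate / hopSize;
}

const rate = odfSampleRate(44100, 512);
console.log(rate); // 86.1328125

// Length of the 6-second analysis frames and the 1.5-second increment, in ODF samples.
console.log(Math.round(6 * rate), Math.round(1.5 * rate));
```

If the detection function was computed with a different hop size, pass the matching rate as `sampleRateODF` rather than relying on the default.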
-
TempoTapMaxAgreement( tickCandidates ) → {object}
-
Description
This algorithm outputs beat positions and confidence of their estimation based on the maximum mutual agreement between beat candidates estimated by different beat trackers (or using different features). Check https://essentia.upf.edu/reference/std_TempoTapMaxAgreement.html for more details.
Parameters
Name Type Description tickCandidates
VectorVectorFloat the tick candidates estimated using different beat trackers (or features) [s]
Returns
Details
-
TempoTapTicks( periods, phases [, frameHop [, hopSize [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm builds the list of ticks from the period and phase candidates given by the TempoTap algorithm. Check https://essentia.upf.edu/reference/std_TempoTapTicks.html for more details.
Parameters
Name Type Attributes Default Description periods
VectorFloat tempo period candidates for the current frame, in frames
phases
VectorFloat tempo ticks phase candidates for the current frame, in frames
frameHop
number <optional> 512 number of feature frames separating two evaluations
hopSize
number <optional> 256 number of audio samples per features
sampleRate
number <optional> 44100 sampling rate of the audio signal [Hz]
Returns
Details
-
TensorflowInputMusiCNN( frame ) → {object}
-
Description
This algorithm computes mel-bands with a particular parametrization specific to MusiCNN based models. Check https://essentia.upf.edu/reference/std_TensorflowInputMusiCNN.html for more details.
Parameters
Name Type Description frame
VectorFloat the audio frame
Returns
Details
-
TensorflowInputVGGish( frame ) → {object}
-
Description
This algorithm computes mel-bands with a particular parametrization specific to VGGish based models. Check https://essentia.upf.edu/reference/std_TensorflowInputVGGish.html for more details.
Parameters
Name Type Description frame
VectorFloat the audio frame
Returns
Details
-
TonalExtractor( signal [, frameSize [, hopSize [, tuningFrequency ] ] ] ) → {object}
-
Description
This algorithm computes tonal features for an audio signal. Check https://essentia.upf.edu/reference/std_TonalExtractor.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the audio input signal
frameSize
number <optional> 4096 the framesize for computing tonal features
hopSize
number <optional> 2048 the hopsize for computing tonal features
tuningFrequency
number <optional> 440 the tuning frequency of the input signal
Returns
Details
-
TonicIndianArtMusic( signal [, binResolution [, frameSize [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxTonicFrequency [, minTonicFrequency [, numberHarmonics [, numberSaliencePeaks [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the tonic frequency of the lead artist in Indian art music. It uses a multipitch representation of the audio signal (pitch salience) to compute a histogram, from which the tonic is identified as one of its peaks. The decision is made based on the distance between the prominent peaks; the classification is done using a decision tree. Check https://essentia.upf.edu/reference/std_TonicIndianArtMusic.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
binResolution
number <optional> 10 salience function bin resolution [cents]
frameSize
number <optional> 2048 the frame size for computing pitch salience
harmonicWeight
number <optional> 0.85 harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
hopSize
number <optional> 512 the hop size with which the pitch salience function was computed
magnitudeCompression
number <optional> 1 magnitude compression parameter (=0 for maximum compression, =1 for no compression)
magnitudeThreshold
number <optional> 40 peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
maxTonicFrequency
number <optional> 375 the maximum allowed tonic frequency [Hz]
minTonicFrequency
number <optional> 100 the minimum allowed tonic frequency [Hz]
numberHarmonics
number <optional> 20 number of considered harmonics
numberSaliencePeaks
number <optional> 5 number of top peaks of the salience function which should be considered for constructing histogram
referenceFrequency
number <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
TriangularBands( spectrum [, frequencyBands [, inputSize [, log [, normalize [, sampleRate [, type [, weighting ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes energy in triangular frequency bands of a spectrum. The arbitrary number of overlapping bands can be specified. For each band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_TriangularBands.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the input spectrum (must be greater than size one)
frequencyBands
Array.<any> <optional> [21.533203125, 43.06640625, 64.599609375, 86.1328125, 107.666015625, 129.19921875, 150.732421875, 172.265625, 193.798828125, 215.33203125, 236.865234375, 258.3984375, 279.931640625, 301.46484375, 322.998046875, 344.53125, 366.064453125, 387.59765625, 409.130859375, 430.6640625, 452.197265625, 473.73046875, 495.263671875, 516.796875, 538.330078125, 559.86328125, 581.396484375, 602.9296875, 624.462890625, 645.99609375, 667.529296875, 689.0625, 710.595703125, 732.12890625, 753.662109375, 775.1953125, 796.728515625, 839.794921875, 861.328125, 882.861328125, 904.39453125, 925.927734375, 968.994140625, 990.52734375, 1012.06054688, 1055.12695312, 1076.66015625, 1098.19335938, 1141.25976562, 1184.32617188, 1205.859375, 1248.92578125, 1270.45898438, 1313.52539062, 1356.59179688, 1399.65820312, 1442.72460938, 1485.79101562, 1528.85742188, 1571.92382812, 1614.99023438, 1658.05664062, 1701.12304688, 1765.72265625, 1808.7890625, 1873.38867188, 1916.45507812, 1981.0546875, 2024.12109375, 2088.72070312, 2153.3203125, 2217.91992188, 2282.51953125, 2347.11914062, 2411.71875, 2497.8515625, 2562.45117188, 2627.05078125, 2713.18359375, 2799.31640625, 2885.44921875, 2950.04882812, 3036.18164062, 3143.84765625, 3229.98046875, 3316.11328125, 3423.77929688, 3509.91210938, 3617.578125, 3725.24414062, 3832.91015625, 3940.57617188, 4069.77539062, 4177.44140625, 4306.640625, 4435.83984375, 4565.0390625, 4694.23828125, 4844.97070312, 4974.16992188, 5124.90234375, 5275.63476562, 5426.3671875, 5577.09960938, 5749.36523438, 5921.63085938, 6093.89648438, 6266.16210938, 6459.9609375, 6653.75976562, 6847.55859375, 7041.35742188, 7256.68945312, 7450.48828125, 7687.35351562, 7902.68554688, 8139.55078125, 8376.41601562, 8613.28125, 8871.6796875, 9130.078125, 9388.4765625, 9668.40820312, 9948.33984375, 10249.8046875, 10551.2695312, 10852.734375, 11175.7324219, 11498.7304688, 11843.2617188, 12187.7929688, 12553.8574219, 12919.921875, 13285.9863281, 13673.5839844, 14082.7148438, 
14491.8457031, 14922.5097656, 15353.1738281, 15805.3710938, 16257.5683594] list of frequency ranges into which the spectrum is divided (these must be in ascending order and cannot contain duplicates); each triangle is built as x(i-1)=0, x(i)=1, x(i+1)=0 over i, and the resulting number of bands is the size of the input array minus 2
inputSize
number <optional> 1025 the size of the spectrum
log
boolean <optional> true compute log-energies (log10 (1 + energy))
normalize
string <optional> unit_sum spectrum bin weights to use for each triangular band: 'unit_max' to make each triangle vertex equal to 1, 'unit_sum' to make each triangle area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle area equal to 1 normalizing the weights of each triangle by its bandwidth
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
type
string <optional> power use magnitude or power spectrum
weighting
string <optional> linear type of weighting function for determining triangle area
Returns
Details
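The band layout described for the frequency-band edges above can be illustrated in plain JavaScript. This is only a minimal sketch, not essentia's implementation: `triangleWeight` and `countBands` are hypothetical helper names, and the `normalize`, `log`, `type` and `weighting` options are ignored.

```javascript
// Hypothetical helpers for illustration only; they are not part of essentia.js.

// Weight of a triangle that is 0 at `lower`, 1 at `center` and 0 at `upper`.
function triangleWeight(freq, lower, center, upper) {
  if (freq <= lower || freq >= upper) return 0;
  if (freq <= center) return (freq - lower) / (center - lower);
  return (upper - freq) / (upper - center);
}

// Each band spans three consecutive edges, so N edges yield N - 2 bands.
function countBands(frequencyBands) {
  return Math.max(0, frequencyBands.length - 2);
}

const edges = [100, 200, 400, 800];
const numBands = countBands(edges);                    // two triangles
const peak = triangleWeight(200, 100, 200, 400);       // at the vertex
const halfway = triangleWeight(300, 100, 200, 400);    // on the falling slope
```

With four edges the sketch yields two bands: one over 100-200-400 Hz and one over 200-400-800 Hz.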
-
TriangularBarkBands( spectrum [, highFrequencyBound [, inputSize [, log [, lowFrequencyBound [, normalize [, numberBands [, sampleRate [, type [, weighting ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes energy in the bark bands of a spectrum. It differs from the regular BarkBands algorithm in that it is more configurable, so that it can be used in the BFCC algorithm to produce output similar to Rastamat (http://www.ee.columbia.edu/ln/rosa/matlab/rastamat/). See the BFCC algorithm documentation for more information on why you might want to choose this over Mel frequency analysis. It is recommended that the input "spectrum" be calculated by the Spectrum algorithm. Check https://essentia.upf.edu/reference/std_TriangularBarkBands.html for more details.
Parameters
Name Type Attributes Default Description spectrum
VectorFloat the audio spectrum
highFrequencyBound
number <optional> 22050 an upper-bound limit for the frequencies to be included in the bands
inputSize
number <optional> 1025 the size of the spectrum
log
boolean <optional> false compute log-energies (log10 (1 + energy))
lowFrequencyBound
number <optional> 0 a lower-bound limit for the frequencies to be included in the bands
normalize
string <optional> unit_sum 'unit_max' makes the vertex of all the triangles equal to 1, 'unit_sum' makes the area of all the triangles equal to 1
numberBands
number <optional> 24 the number of output bands
sampleRate
number <optional> 44100 the sample rate
type
string <optional> power 'power' to output squared units, 'magnitude' to keep it as the input
weighting
string <optional> warping type of weighting function for determining triangle area
Returns
Details
-
Trimmer( signal [, checkRange [, endTime [, sampleRate [, startTime ] ] ] ] ) → {object}
-
Description
This algorithm extracts a segment of an audio signal given its start and end times. Giving "startTime" greater than "endTime" will raise an exception. Check https://essentia.upf.edu/reference/std_Trimmer.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
checkRange
boolean <optional> false check whether the specified time range for a slice fits the size of input signal (throw exception if not)
endTime
number <optional> 1e+06 the end time of the slice you want to extract [s]
sampleRate
number <optional> 44100 the sampling rate of the input audio signal [Hz]
startTime
number <optional> 0 the start time of the slice you want to extract [s]
Returns
Details
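The time-to-sample mapping behind Trimmer can be sketched in plain JavaScript. This is an illustration under assumptions, not essentia's code: `trimSignal` is a hypothetical name, and rounding the times down to whole samples is an assumed convention.

```javascript
// Hypothetical helper for illustration only; it is not part of essentia.js.
// Assumption: start and end times are rounded down to sample indices.
function trimSignal(signal, startTime = 0, endTime = 1e6,
                    sampleRate = 44100, checkRange = false) {
  if (startTime > endTime) {
    throw new RangeError('startTime must not be greater than endTime');
  }
  const startIndex = Math.floor(startTime * sampleRate);
  const endIndex = Math.floor(endTime * sampleRate);
  if (checkRange && (startIndex > signal.length || endIndex > signal.length)) {
    throw new RangeError('requested slice does not fit the input signal');
  }
  // Without checkRange, the slice is silently clamped to the signal length.
  return signal.slice(Math.min(startIndex, signal.length),
                      Math.min(endIndex, signal.length));
}

// 16 samples at 8 Hz is 2 s of audio; keep the span from 0.5 s to 1.0 s.
const ramp = Float32Array.from({ length: 16 }, (_, i) => i);
const segment = trimSignal(ramp, 0.5, 1.0, 8);
```

Here `segment` holds samples 4 through 7 of the ramp.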
-
Tristimulus( frequencies, magnitudes ) → {object}
-
Description
This algorithm calculates the tristimulus of a signal given its harmonic peaks. The tristimulus was introduced as a timbre equivalent of the color attributes in vision. It measures the mixture of harmonics in a given sound, grouped into three sections. The first tristimulus measures the relative weight of the first harmonic; the second tristimulus measures the relative weight of the second, third, and fourth harmonics taken together; and the third tristimulus measures the relative weight of all the remaining harmonics. Check https://essentia.upf.edu/reference/std_Tristimulus.html for more details.
Parameters
Name Type Description frequencies
VectorFloat the frequencies of the harmonic peaks ordered by frequency
magnitudes
VectorFloat the magnitudes of the harmonic peaks ordered by frequency
Returns
Details
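The three-way grouping of harmonics described above can be sketched as follows. This is a plain-JavaScript illustration, not essentia's implementation: `tristimulus` here is a hypothetical standalone function, and using the raw magnitudes as the harmonic weights is an assumption.

```javascript
// Hypothetical standalone sketch; it is not the essentia.js method.
// Assumption: each harmonic is weighted by its raw magnitude.
function tristimulus(magnitudes) {
  const sumRange = (from, to) =>
    Array.from(magnitudes).slice(from, to).reduce((sum, m) => sum + m, 0);
  const total = sumRange(0);
  if (total === 0) return [0, 0, 0];
  return [
    sumRange(0, 1) / total, // first harmonic
    sumRange(1, 4) / total, // second, third and fourth harmonics
    sumRange(4) / total,    // all remaining harmonics
  ];
}

const [t1, t2, t3] = tristimulus([4, 2, 1, 1, 2]);
```

For a non-silent input the three values sum to 1, since every harmonic falls into exactly one group.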
-
TruePeakDetector( signal [, blockDC [, emphasise [, oversamplingFactor [, quality [, sampleRate [, threshold [, version ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm implements a “true-peak” level meter for clipping detection. According to the ITU-R recommendations, “true-peak” values overcoming the full-scale range are potential sources of “clipping in subsequent processes, such as within particular D/A converters or during sample-rate conversion”. The ITU-R BS.1770-4[1] (by default) and the ITU-R BS.1770-2[2] signal-flows can be used. Go to the references for information about the differences. Only the peaks (if any) exceeding the configurable amplitude threshold are returned. Note: the parameters 'blockDC' and 'emphasise' work only when 'version' is set to 2. References: [1] Series, B. S. (2011). Recommendation ITU-R BS.1770-4. Algorithms to measure audio programme loudness and true-peak audio level, https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-4-201510-I!!PDF-E.pdf [2] Series, B. S. (2011). Recommendation ITU-R BS.1770-2. Algorithms to measure audio programme loudness and true-peak audio level, https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-2-201103-S!!PDF-E.pdf Check https://essentia.upf.edu/reference/std_TruePeakDetector.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input audio signal
blockDC
boolean <optional> false flag to activate the optional DC blocker
emphasise
boolean <optional> false flag to activate the optional emphasis filter
oversamplingFactor
number <optional> 4 number of times the signal is oversampled
quality
number <optional> 1 type of interpolation applied (see libresample)
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
threshold
number <optional> -0.0002 threshold to detect peaks [dB]
version
number <optional> 4 algorithm version
Returns
Details
-
TuningFrequency( frequencies, magnitudes [, resolution ] ) → {object}
-
Description
This algorithm estimates the tuning frequency given a sequence/set of spectral peaks. The result is the tuning frequency in Hz, and its distance from 440Hz in cents. This version is slightly adapted from the original algorithm [1], but gives the same results. Check https://essentia.upf.edu/reference/std_TuningFrequency.html for more details.
Parameters
Name Type Attributes Default Description frequencies
VectorFloat the frequencies of the spectral peaks [Hz]
magnitudes
VectorFloat the magnitudes of the spectral peaks
resolution
number <optional> 1 resolution in cents (logarithmic scale, 100 cents = 1 semitone) for tuning frequency determination
Returns
Details
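The "distance from 440Hz in cents" mentioned above follows the usual definition of a cent (100 cents = 1 semitone, 1200 cents = 1 octave). The sketch below covers only that conversion, not the tuning estimator itself; `centsFrom440` is a hypothetical helper name.

```javascript
// Hypothetical helper for illustration only; it is not part of essentia.js.
// Signed distance of a frequency from 440 Hz in cents.
function centsFrom440(frequencyHz) {
  return 1200 * Math.log2(frequencyHz / 440);
}

const octaveUp = centsFrom440(880);
const semitoneUp = centsFrom440(440 * Math.pow(2, 1 / 12));
const slightlyFlat = centsFrom440(438);
```

Frequencies below 440 Hz give negative values, so a tuning of 438 Hz comes out a few cents flat.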
-
TuningFrequencyExtractor( signal [, frameSize [, hopSize ] ] ) → {object}
-
Description
This algorithm extracts the tuning frequency of an audio signal. Check https://essentia.upf.edu/reference/std_TuningFrequencyExtractor.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the audio input signal
frameSize
number <optional> 4096 the frameSize for computing tuning frequency
hopSize
number <optional> 2048 the hopsize for computing tuning frequency
Returns
Details
-
UnaryOperator( array [, scale [, shift [, type ] ] ] ) → {object}
-
Description
This algorithm performs basic arithmetical operations element by element on an array. Notes: log and ln are both the natural logarithm; for log, ln, log10 and lin2db, x is clipped to 1e-30 when x<1e-30; sqrt(x) is invalid for x<0; the scale and shift parameters define a linear transformation applied to the resulting elements. Check https://essentia.upf.edu/reference/std_UnaryOperator.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array
scale
number <optional> 1 multiply result by factor
shift
number <optional> 0 shift result by value (add value)
type
string <optional> identity the type of the unary operator to apply to input array
Returns
Details
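The order of operations described above (apply the operator, then scale, then shift) can be sketched in plain JavaScript. This is an illustration rather than essentia's implementation: `applyUnaryOperator` is a hypothetical name, and only the operator types named in the description are covered.

```javascript
// Hypothetical sketch; it is not the essentia.js method.
const CLIP = 1e-30; // inputs below this are clipped before taking a logarithm

const operators = {
  identity: (x) => x,
  log: (x) => Math.log(Math.max(x, CLIP)), // natural logarithm
  ln: (x) => Math.log(Math.max(x, CLIP)),  // same as log
  log10: (x) => Math.log10(Math.max(x, CLIP)),
  sqrt: (x) => {
    if (x < 0) throw new RangeError('sqrt is invalid for negative input');
    return Math.sqrt(x);
  },
};

function applyUnaryOperator(array, scale = 1, shift = 0, type = 'identity') {
  const op = operators[type];
  if (!op) throw new Error(`unsupported operator type: ${type}`);
  // The linear transformation is applied to the operator's result.
  return Array.from(array, (x) => scale * op(x) + shift);
}

const roots = applyUnaryOperator([1, 4, 9], 2, 1, 'sqrt');
const clipped = applyUnaryOperator([0], 1, 0, 'log10');
```

In this sketch `roots` is `2 * sqrt(x) + 1` for each element, and the zero input to log10 is clipped to 1e-30 instead of producing negative infinity.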
-
UnaryOperatorStream( array [, scale [, shift [, type ] ] ] ) → {object}
-
Description
This algorithm performs basic arithmetical operations element by element on an array. Notes: log and ln are both the natural logarithm; for log, ln, log10 and lin2db, x is clipped to 1e-30 when x<1e-30; sqrt(x) is invalid for x<0; the scale and shift parameters define a linear transformation applied to the resulting elements. Check https://essentia.upf.edu/reference/std_UnaryOperatorStream.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the input array
scale
number <optional> 1 multiply result by factor
shift
number <optional> 0 shift result by value (add value)
type
string <optional> identity the type of the unary operator to apply to input array
Returns
Details
-
Variance( array ) → {object}
-
Description
This algorithm computes the variance of an array. Check https://essentia.upf.edu/reference/std_Variance.html for more details.
Parameters
Name Type Description array
VectorFloat the input array
Returns
Details
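For reference, the quantity being computed can be sketched in plain JavaScript. This sketch assumes the population variance (dividing by the number of elements); whether essentia divides by N or N - 1 is not stated here, so treat that as an assumption. `populationVariance` is a hypothetical name.

```javascript
// Hypothetical helper for illustration only; it is not part of essentia.js.
// Assumption: population variance, i.e. the mean of squared deviations.
function populationVariance(array) {
  if (array.length === 0) throw new RangeError('input array is empty');
  const values = Array.from(array);
  const mean = values.reduce((sum, x) => sum + x, 0) / values.length;
  const squaredDeviations = values.reduce((sum, x) => sum + (x - mean) ** 2, 0);
  return squaredDeviations / values.length;
}

const spread = populationVariance([1, 2, 3, 4]);
const flat = populationVariance([5, 5, 5]);
```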
-
Vibrato( pitch [, maxExtend [, maxFrequency [, minExtend [, minFrequency [, sampleRate ] ] ] ] ] ) → {object}
-
Description
This algorithm detects the presence of vibrato and estimates its parameters given a pitch contour [Hz]. The result is the vibrato frequency in Hz and the extent (peak to peak) in cents. If no vibrato is detected in a frame, the output of both values is zero. Check https://essentia.upf.edu/reference/std_Vibrato.html for more details.
Parameters
Name Type Attributes Default Description pitch
VectorFloat the pitch trajectory [Hz].
maxExtend
number <optional> 250 maximum considered vibrato extent [cents]
maxFrequency
number <optional> 8 maximum considered vibrato frequency [Hz]
minExtend
number <optional> 50 minimum considered vibrato extent [cents]
minFrequency
number <optional> 4 minimum considered vibrato frequency [Hz]
sampleRate
number <optional> 344.531 sample rate of the input pitch contour
Returns
Details
-
WarpedAutoCorrelation( array [, maxLag [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes the warped auto-correlation of an audio signal. The implementation is an adapted version of K. Schmidt's implementation of the matlab algorithm from the 'warped toolbox' by Aki Harma and Matti Karjalainen found in [2]. For a detailed explanation of the algorithm, see [1]. This algorithm is only defined for positive lambda = 1.0674*sqrt(2.0*atan(0.00006583*sampleRate)/PI) - 0.1916, thus it will throw an exception when the supplied sampling rate does not meet this requirement. If maxLag is larger than the size of the input array, an exception is thrown. Check https://essentia.upf.edu/reference/std_WarpedAutoCorrelation.html for more details.
Parameters
Name Type Attributes Default Description array
VectorFloat the array to be analyzed
maxLag
number <optional> 1 the maximum lag for which the auto-correlation is computed (inclusive) (must be smaller than signal size)
sampleRate
number <optional> 44100 the audio sampling rate [Hz]
Returns
Details
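The lambda expression quoted in the description can be evaluated directly to see which sampling rates satisfy the positivity requirement. The sketch below only evaluates that formula; `warpingLambda` is a hypothetical helper name, not an essentia.js function.

```javascript
// Hypothetical helper for illustration only; it is not part of essentia.js.
// Evaluates the lambda formula from the description for a given sample rate.
function warpingLambda(sampleRate) {
  return 1.0674 * Math.sqrt((2.0 * Math.atan(0.00006583 * sampleRate)) / Math.PI) - 0.1916;
}

const lambdaDefault = warpingLambda(44100); // positive: the algorithm is defined
const lambdaTooLow = warpingLambda(500);    // negative: this rate would be rejected
const isSupported = (sampleRate) => warpingLambda(sampleRate) > 0;
```

At the default 44100 Hz the value is roughly 0.756, while very low sampling rates (a few hundred Hz) make it negative.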
-
Welch( frame [, averagingFrames [, fftSize [, frameSize [, sampleRate [, scaling [, windowType ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the Power Spectral Density of the input signal using Welch's method [1]. The input should be fed with overlapped audio frames. The algorithm internally stores the required past frames to compute each output. Call reset() to clear the buffers. This implementation is based on Scipy [2]. Check https://essentia.upf.edu/reference/std_Welch.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input stereo audio signal
averagingFrames
number <optional> 10 amount of frames to average
fftSize
number <optional> 1024 size of the FFT. Zero padding is added if this is larger than the input frame size.
frameSize
number <optional> 512 the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)
sampleRate
number <optional> 44100 the sampling rate of the audio signal [Hz]
scaling
string <optional> density 'density' normalizes the result to the bandwidth while 'power' outputs the unnormalized power spectrum
windowType
string <optional> hann the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
Returns
Details
-
Windowing( frame [, normalized [, size [, type [, zeroPadding [, zeroPhase ] ] ] ] ] ) → {object}
-
Description
This algorithm applies windowing to an audio signal. It optionally applies zero-phase windowing and optionally adds zero-padding. The resulting windowed frame size is equal to the incoming frame size plus the number of padded zeros. By default, the available windows are normalized (to have an area of 1) and then scaled by a factor of 2. Check https://essentia.upf.edu/reference/std_Windowing.html for more details.
Parameters
Name Type Attributes Default Description frame
VectorFloat the input audio frame
normalized
boolean <optional> true a boolean value to specify whether to normalize windows (to have an area of 1) and then scale by a factor of 2
size
number <optional> 1024 the window size
type
string <optional> hann the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
zeroPadding
number <optional> 0 the size of the zero-padding
zeroPhase
boolean <optional> true a boolean value that enables zero-phase windowing
Returns
Details
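The normalization and zero-padding behaviour described above can be sketched in plain JavaScript. This is a simplified illustration, not essentia's implementation: `hannWindow` and `applyWindow` are hypothetical names, the symmetric Hann formula is an assumption, padding is assumed to be appended at the end, and zero-phase reordering is left out.

```javascript
// Hypothetical helpers for illustration only; they are not part of essentia.js.

// Assumption: symmetric Hann window, 0.5 - 0.5 * cos(2 * pi * i / (size - 1)).
function hannWindow(size) {
  if (size < 2) throw new RangeError('window size must be at least 2');
  return Float64Array.from({ length: size }, (_, i) =>
    0.5 - 0.5 * Math.cos((2 * Math.PI * i) / (size - 1)));
}

// Multiplies the frame by the window and appends `zeroPadding` zeros.
function applyWindow(frame, normalized = true, zeroPadding = 0) {
  const window = hannWindow(frame.length);
  if (normalized) {
    // Normalize to an area of 1, then scale by a factor of 2.
    const area = window.reduce((sum, w) => sum + w, 0);
    for (let i = 0; i < window.length; i++) window[i] *= 2 / area;
  }
  const output = new Float64Array(frame.length + zeroPadding); // zero-filled
  for (let i = 0; i < frame.length; i++) output[i] = frame[i] * window[i];
  return output;
}

const windowed = applyWindow(new Float32Array(8).fill(1), true, 4);
const windowArea = windowed.reduce((sum, v) => sum + v, 0);
```

Windowing a frame of ones returns the window itself, so `windowArea` shows the normalized area of 2, and the output is 8 + 4 = 12 samples long.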
-
ZeroCrossingRate( signal [, threshold ] ) → {object}
-
Description
This algorithm computes the zero-crossing rate of an audio signal. It is the number of sign changes between consecutive signal values divided by the total number of values. Noisy signals tend to have a higher zero-crossing rate. To avoid counting small variations around zero caused by noise, a threshold around zero is used, and a zero crossing is considered valid only when the signal crosses that boundary. Check https://essentia.upf.edu/reference/std_ZeroCrossingRate.html for more details.
Parameters
Name Type Attributes Default Description signal
VectorFloat the input signal
threshold
number <optional> 0 the threshold which will be taken as the zero axis in both positive and negative sign
Returns
Details
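The thresholded counting described above can be sketched in plain JavaScript. This is one possible reading of the description, not essentia's implementation: `zeroCrossingRate` here is a hypothetical standalone function, and the exact handling of values inside the threshold band is an assumption.

```javascript
// Hypothetical standalone sketch; it is not the essentia.js method.
// Assumption: a crossing counts only once the signal moves past the threshold
// on the opposite side; values inside the band keep the previous sign.
function zeroCrossingRate(signal, threshold = 0) {
  if (signal.length === 0) throw new RangeError('input signal is empty');
  let crossings = 0;
  let wasPositive = signal[0] > threshold;
  for (let i = 1; i < signal.length; i++) {
    const value = signal[i];
    if (wasPositive && value < -threshold) {
      crossings++;
      wasPositive = false;
    } else if (!wasPositive && value > threshold) {
      crossings++;
      wasPositive = true;
    }
  }
  return crossings / signal.length;
}

const noisy = [1, -0.1, 0.2, -1];
const rateNoThreshold = zeroCrossingRate(noisy);        // every sign change counts
const rateWithThreshold = zeroCrossingRate(noisy, 0.5); // small wiggles ignored
```

With the threshold at 0.5 the small excursions to -0.1 and 0.2 no longer count, so only the final swing to -1 is registered as a crossing.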