Methods
-
<async> getAudioBufferFromURL( audioURL, webAudioCtx ) → {AudioBuffer}
-
Description
Decodes and returns the audio buffer of a given audio URL or blob URI using the Web Audio API. (NOTE: This method doesn't work on the Safari browser.)
Parameters
Name Type Description
audioURL string web URL or blob URI of an audio file
webAudioCtx AudioContext an instance of Web Audio API AudioContext
Returns
Details
-
<async> getAudioChannelDataFromURL( audioURL, webAudioCtx [, channel ] ) → {Float32Array}
-
Description
Decodes and returns the audio channel data from a given audio URL or blob URI using the Web Audio API. (NOTE: This method doesn't work on the Safari browser.)
Parameters
Name Type Attributes Default Description
audioURL string web URL or blob URI of an audio file
webAudioCtx AudioContext an instance of Web Audio API AudioContext
channel number <optional> 0 audio channel number
Returns
Details
-
melSpectrumExtractor( audioFrame, sampleRate [, asVector [, config ] ] ) → {Array}
-
Description
Compute log-scaled mel spectrogram for a given audio signal frame along with an optional extractor profile configuration
Parameters
Name Type Attributes Default Description
audioFrame Float32Array a frame of decoded audio signal as Float32 typed array.
sampleRate number Sample rate of the input audio signal.
asVector boolean <optional> false whether to output the spectrogram as a vector float type for chaining with other essentia algorithms.
config * <optional> this.profile
Returns
Details
-
audioBufferToMonoSignal( buffer ) → {Float32Array}
-
Description
Convert an AudioBuffer object to a mono audio signal array. The audio signal is downmixed to mono using the essentia MonoMixer algorithm if the audio buffer has 2 channels of audio. Throws an exception if the input AudioBuffer object has more than 2 channels of audio.
Parameters
Name Type Description
buffer AudioBuffer AudioBuffer object decoded from an audio file.
Returns
Details
-
shutdown()
-
Description
Method to shut down the essentia algorithm instance after its use
Details
-
hpcpExtractor( audioFrame, sampleRate [, asVector [, config ] ] ) → {Array}
-
Description
Compute HPCP chroma feature for a given audio signal frame along with an optional extractor profile configuration
Parameters
Name Type Attributes Default Description
audioFrame Float32Array a decoded audio signal frame as Float32 typed array.
sampleRate number Sample rate of the input audio signal.
asVector boolean <optional> false whether to output the hpcpgram as a vector float type for chaining with other essentia algorithms.
config * <optional> this.profile
Returns
Details
-
reinstantiate()
-
Description
Method for re-instantiating essentia algorithms instance after using the shutdown method
Details
-
"delete"()
-
Description
Delete essentiajs class instance
Details
-
arrayToVector( inputArray ) → {VectorFloat}
-
Description
Convert an input JS array into VectorFloat type
Parameters
Name Type Description
inputArray Float32Array input JS typed array
Returns
Details
-
vectorToArray( inputVector ) → {Float32Array}
-
Description
Convert an input VectorFloat array into typed JS Float32Array
Parameters
Name Type Description
inputVector VectorFloat input VectorFloat array
Returns
Details
-
FrameGenerator( inputAudioData [, frameSize [, hopSize ] ] ) → {VectorVectorFloat}
-
Description
Cuts audio signal data into overlapping frames given a frame size and a hop size
Parameters
Name Type Attributes Default Description
inputAudioData Float32Array single-channel audio data
frameSize number <optional> 2048 frame size for cutting the audio signal
hopSize number <optional> 1024 hop size, i.e. the number of samples between the starts of consecutive frames
Returns
Details
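The frame cutting described above can be illustrated with a minimal plain-JavaScript sketch. It is not the essentia.js implementation: the helper name `cutFrames` is hypothetical, and starting the first frame at sample 0 and zero-padding a short final frame are assumptions of this sketch.

```javascript
// Plain-JS sketch of cutting a signal into overlapping frames.
// Assumptions (not taken from the essentia.js source): frames start at
// sample 0 and a short final frame is padded with zeros.
function cutFrames(signal, frameSize = 2048, hopSize = 1024) {
  const frames = [];
  for (let start = 0; start < signal.length; start += hopSize) {
    const frame = new Float32Array(frameSize); // zero-filled by default
    // Copy the available samples; any remainder keeps its zero padding.
    frame.set(signal.subarray(start, start + frameSize));
    frames.push(frame);
  }
  return frames;
}

const signal = Float32Array.from({ length: 10 }, (_, i) => i);
const frames = cutFrames(signal, 4, 2);
console.log(frames.length); // 5
console.log(Array.from(frames[4])); // [8, 9, 0, 0]
```

With a hop size of half the frame size, as in the defaults, consecutive frames overlap by 50%.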
-
MonoMixer( leftChannel, rightChannel ) → {object}
-
Description
This algorithm downmixes the signal into a single channel given a stereo signal. It is a wrapper around https://essentia.upf.edu/reference/std_MonoMixer.html.
Parameters
Name Type Description
leftChannel VectorFloat the left channel of the stereo audio signal
rightChannel VectorFloat the right channel of the stereo audio signal
Returns
Details
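As a rough illustration of downmixing, the sketch below averages the two channels sample by sample. The equal-weight average and the helper name `downmixToMono` are assumptions of this sketch; the linked essentia reference is authoritative for the actual behaviour.

```javascript
// Plain-JS sketch of a stereo-to-mono downmix.
// Assumption: the mix is an equal-weight average of left and right.
function downmixToMono(left, right) {
  if (left.length !== right.length) {
    throw new Error('left and right channels must have the same length');
  }
  const mono = new Float32Array(left.length);
  for (let i = 0; i < left.length; i++) {
    mono[i] = 0.5 * (left[i] + right[i]);
  }
  return mono;
}

console.log(Array.from(downmixToMono([1, 0, -1], [0, 0, -1]))); // [0.5, 0, -1]
```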
-
LoudnessEBUR128( leftChannel, rightChannel [, hopSize [, sampleRate [, startAtZero ] ] ] ) → {object}
-
Description
This algorithm computes the EBUR128 loudness descriptors of an audio signal. It is a wrapper around https://essentia.upf.edu/reference/std_LoudnessEBUR128.html.
Parameters
Name Type Attributes Default Description
leftChannel VectorFloat the left channel of the stereo audio signal
rightChannel VectorFloat the right channel of the stereo audio signal
hopSize number <optional> 0.1 the hop size with which the loudness is computed [s]
sampleRate number <optional> 44100 the sampling rate of the audio signal [Hz]
startAtZero boolean <optional> false start momentary/short-term loudness estimation at time 0 (zero-centered loudness estimation windows) if true; otherwise start both windows at time 0 (time positions for momentary and short-term values will not be synchronized)
Returns
Details
-
AfterMaxToBeforeMaxEnergyRatio( pitch ) → {object}
-
Description
This algorithm computes the ratio between the pitch energy after the pitch maximum and the pitch energy before the pitch maximum. Sounds having a monotonically ascending pitch or one unique pitch will show a value of (0,1], while sounds having a monotonically descending pitch will show a value of [1,inf). In case there is no energy before the max pitch, the algorithm will return the energy after the maximum pitch. Check https://essentia.upf.edu/reference/std_AfterMaxToBeforeMaxEnergyRatio.html for more details.
Parameters
Name Type Description
pitch VectorFloat the array of pitch values [Hz]
Returns
Details
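The ratio described above can be sketched in plain JavaScript as follows. This is an illustration, not the essentia implementation: taking energy as the sum of squared pitch values and counting the maximum itself in both sums are assumptions, chosen because they reproduce the value ranges quoted in the description.

```javascript
// Plain-JS sketch of the after-max to before-max energy ratio.
// Assumptions: "energy" is the sum of squared pitch values, and the
// maximum sample is included in both the "before" and "after" sums.
function afterMaxToBeforeMaxEnergyRatio(pitch) {
  if (pitch.length === 0) throw new Error('pitch array must be non-empty');
  let maxIndex = 0;
  for (let i = 1; i < pitch.length; i++) {
    if (pitch[i] > pitch[maxIndex]) maxIndex = i;
  }
  let before = 0;
  let after = 0;
  for (let i = 0; i <= maxIndex; i++) before += pitch[i] * pitch[i];
  for (let i = maxIndex; i < pitch.length; i++) after += pitch[i] * pitch[i];
  // With no energy before the maximum, return the energy after it.
  return before === 0 ? after : after / before;
}

console.log(afterMaxToBeforeMaxEnergyRatio([100, 200, 300]) <= 1); // true (ascending)
console.log(afterMaxToBeforeMaxEnergyRatio([300, 200, 100]) >= 1); // true (descending)
console.log(afterMaxToBeforeMaxEnergyRatio([440])); // 1 (one unique pitch)
```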
-
AllPass( signal [, bandwidth [, cutoffFrequency [, order [, sampleRate ] ] ] ] ) → {object}
-
Description
This algorithm implements an IIR all-pass filter of order 1 or 2. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_AllPass.html for more details.
Parameters
Name Type Attributes Default Description
signal VectorFloat the input signal
bandwidth number <optional> 500 the bandwidth of the filter [Hz] (used only for 2nd-order filters)
cutoffFrequency number <optional> 1500 the cutoff frequency for the filter [Hz]
order number <optional> 1 the order of the filter
sampleRate number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
AudioOnsetsMarker( signal [, onsets [, sampleRate [, type ] ] ] ) → {object}
-
Description
This algorithm creates a wave file in which a given audio signal is mixed with a series of time onsets. The sonification of the onsets can be heard as beeps, or as short white noise pulses if configured to do so. Check https://essentia.upf.edu/reference/std_AudioOnsetsMarker.html for more details.
Parameters
Name Type Attributes Default Description
signal VectorFloat the input signal
onsets Array.<any> <optional> [] the list of onset locations [s]
sampleRate number <optional> 44100 the sampling rate of the output signal [Hz]
type string <optional> beep the type of sound to be added on the event
Returns
Details
-
AutoCorrelation( array [, frequencyDomainCompression [, generalized [, normalization ] ] ] ) → {object}
-
Description
This algorithm computes the autocorrelation vector of a signal. It uses the version most commonly used in signal processing, which doesn't remove the mean from the observations. Using the 'generalized' option this algorithm computes autocorrelation as described in [3]. Check https://essentia.upf.edu/reference/std_AutoCorrelation.html for more details.
Parameters
Name Type Attributes Default Description
array VectorFloat the array to be analyzed
frequencyDomainCompression number <optional> 0.5 factor at which FFT magnitude is compressed (only used if 'generalized' is set to true, see [3])
generalized boolean <optional> false bool value to indicate whether to compute the 'generalized' autocorrelation as described in [3]
normalization string <optional> standard type of normalization to compute: either 'standard' (default) or 'unbiased'
Returns
Details
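For intuition, the sketch below computes a direct (time-domain) autocorrelation without removing the mean, as the description states. It is not the essentia implementation: the helper name is hypothetical, the 'generalized' option is omitted, and the exact scaling essentia applies for each normalization mode is an assumption here ('standard' returns raw sums, 'unbiased' divides each lag by the number of overlapping samples).

```javascript
// Plain-JS sketch of autocorrelation without mean removal.
// Assumptions: 'standard' returns the raw lagged sums; 'unbiased'
// divides the sum at each lag by the number of overlapping samples.
function autoCorrelation(x, normalization = 'standard') {
  const n = x.length;
  const result = new Array(n).fill(0);
  for (let lag = 0; lag < n; lag++) {
    let sum = 0;
    for (let i = 0; i + lag < n; i++) {
      sum += x[i] * x[i + lag];
    }
    result[lag] = normalization === 'unbiased' ? sum / (n - lag) : sum;
  }
  return result;
}

console.log(autoCorrelation([1, 2, 3])); // [14, 8, 3]
```

A direct sum costs O(n²) operations; an FFT-based computation gives the same result faster for long inputs.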
-
BFCC( spectrum [, dctType [, highFrequencyBound [, inputSize [, liftering [, logType [, lowFrequencyBound [, normalize [, numberBands [, numberCoefficients [, sampleRate [, type [, weighting ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the bark-frequency cepstrum coefficients of a spectrum. Bark bands and their subsequent usage in cepstral analysis have been shown to be useful for percussive content [1, 2]. This algorithm is implemented using the Bark scaling approach in the Rastamat version of the MFCC algorithm and in a similar manner to the MFCC-FB40 default specs. Check https://essentia.upf.edu/reference/std_BFCC.html for more details.
Parameters
Name Type Attributes Default Description
spectrum VectorFloat the audio spectrum
dctType number <optional> 2 the DCT type
highFrequencyBound number <optional> 11000 the upper bound of the frequency range [Hz]
inputSize number <optional> 1025 the size of input spectrum
liftering number <optional> 0 the liftering coefficient. Use '0' to bypass it
logType string <optional> dbamp logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes
lowFrequencyBound number <optional> 0 the lower bound of the frequency range [Hz]
normalize string <optional> unit_sum 'unit_max' makes the vertex of all the triangles equal to 1, 'unit_sum' makes the area of all the triangles equal to 1
numberBands number <optional> 40 the number of bark bands in the filter
numberCoefficients number <optional> 13 the number of output cepstrum coefficients
sampleRate number <optional> 44100 the sampling rate of the audio signal [Hz]
type string <optional> power use magnitude or power spectrum
weighting string <optional> warping type of weighting function for determining triangle area
Returns
Details
-
BPF( x [, xPoints [, yPoints ] ] ) → {object}
-
Description
This algorithm implements a break point function which linearly interpolates between discrete xy-coordinates to construct a continuous function. Check https://essentia.upf.edu/reference/std_BPF.html for more details.
Parameters
Name Type Attributes Default Description
x number the input coordinate (x-axis)
xPoints Array.<any> <optional> [0, 1] the x-coordinates of the points forming the break-point function (the points must be arranged in ascending order and cannot contain duplicates)
yPoints Array.<any> <optional> [0, 1] the y-coordinates of the points forming the break-point function
Returns
Details
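A break-point function is piecewise linear interpolation between the given points. The plain-JavaScript sketch below shows the idea; the helper name is hypothetical, and rejecting inputs outside the range covered by `xPoints` is an assumption of this sketch rather than documented behaviour.

```javascript
// Plain-JS sketch of a break-point function (piecewise linear interpolation).
// Assumption: x outside [xPoints[0], xPoints[last]] is rejected.
function breakPointFunction(x, xPoints = [0, 1], yPoints = [0, 1]) {
  if (xPoints.length !== yPoints.length || xPoints.length < 2) {
    throw new Error('xPoints and yPoints need the same length (at least 2)');
  }
  if (x < xPoints[0] || x > xPoints[xPoints.length - 1]) {
    throw new RangeError('x is outside the range covered by xPoints');
  }
  // Find the segment [xPoints[i], xPoints[i + 1]] that contains x.
  let i = 0;
  while (i < xPoints.length - 2 && x > xPoints[i + 1]) i++;
  const t = (x - xPoints[i]) / (xPoints[i + 1] - xPoints[i]);
  return yPoints[i] + t * (yPoints[i + 1] - yPoints[i]);
}

console.log(breakPointFunction(0.5)); // 0.5
console.log(breakPointFunction(1.5, [0, 1, 2], [0, 10, 0])); // 5
```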
-
BandPass( signal [, bandwidth [, cutoffFrequency [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm implements a 2nd order IIR band-pass filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_BandPass.html for more details.
Parameters
Name Type Attributes Default Description
signal VectorFloat the input audio signal
bandwidth number <optional> 500 the bandwidth of the filter [Hz]
cutoffFrequency number <optional> 1500 the cutoff frequency for the filter [Hz]
sampleRate number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
BandReject( signal [, bandwidth [, cutoffFrequency [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm implements a 2nd order IIR band-reject filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_BandReject.html for more details.
Parameters
Name Type Attributes Default Description
signal VectorFloat the input signal
bandwidth number <optional> 500 the bandwidth of the filter [Hz]
cutoffFrequency number <optional> 1500 the cutoff frequency for the filter [Hz]
sampleRate number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
BarkBands( spectrum [, numberBands [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes energy in Bark bands of a spectrum. The band frequencies are: [0.0, 50.0, 100.0, 150.0, 200.0, 300.0, 400.0, 510.0, 630.0, 770.0, 920.0, 1080.0, 1270.0, 1480.0, 1720.0, 2000.0, 2320.0, 2700.0, 3150.0, 3700.0, 4400.0, 5300.0, 6400.0, 7700.0, 9500.0, 12000.0, 15500.0, 20500.0, 27000.0]. The first two Bark bands [0,100] and [100,200] have been split in half for better resolution (because of an observed better performance in beat detection). For each bark band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_BarkBands.html for more details.
Parameters
Name Type Attributes Default Description
spectrum VectorFloat the input spectrum
numberBands number <optional> 27 the number of desired barkbands
sampleRate number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
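The band-energy summation described above can be sketched in plain JavaScript. The band edges are the ones listed in the description; everything else is an assumption of this sketch (the helper name, the spectrum covering 0 Hz to the Nyquist frequency with evenly spaced bins, and rounding each edge to the nearest bin with half-open bin ranges).

```javascript
// Band edges in Hz, copied from the description above.
const BARK_EDGES = [0, 50, 100, 150, 200, 300, 400, 510, 630, 770, 920, 1080,
  1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700,
  9500, 12000, 15500, 20500, 27000];

// Plain-JS sketch: sum the power spectrum (magnitude squared) per band.
// Assumptions: bins are evenly spaced from 0 Hz to Nyquist, each edge is
// rounded to the nearest bin, and bands are half-open [startBin, endBin).
function bandEnergies(spectrum, sampleRate = 44100, numberBands = 27, edges = BARK_EDGES) {
  const binWidth = (sampleRate / 2) / (spectrum.length - 1); // Hz per bin
  const bands = new Array(numberBands).fill(0);
  for (let b = 0; b < numberBands; b++) {
    const startBin = Math.round(edges[b] / binWidth);
    const endBin = Math.min(Math.round(edges[b + 1] / binWidth), spectrum.length);
    for (let i = startBin; i < endBin; i++) {
      bands[b] += spectrum[i] * spectrum[i];
    }
  }
  return bands;
}

// Toy example: 5 bins at 1 Hz spacing, two bands [0, 2) Hz and [2, 4) Hz.
console.log(bandEnergies([1, 2, 3, 4, 5], 8, 2, [0, 2, 4])); // [5, 25]
```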
-
BeatTrackerDegara( signal [, maxTempo [, minTempo ] ] ) → {object}
-
Description
This algorithm estimates the beat positions given an input signal. It computes 'complex spectral difference' onset detection function and utilizes the beat tracking algorithm (TempoTapDegara) to extract beats [1]. The algorithm works with the optimized settings of 2048/1024 frame/hop size for the computation of the detection function, with its posterior x2 resampling. While it has a lower accuracy than BeatTrackerMultifeature (see the evaluation results in [2]), its computational speed is significantly higher, which makes it reasonable to apply this algorithm for batch processing of large amounts of audio signals. Check https://essentia.upf.edu/reference/std_BeatTrackerDegara.html for more details.
Parameters
Name Type Attributes Default Description
signal VectorFloat the audio input signal
maxTempo number <optional> 208 the fastest tempo to detect [bpm]
minTempo number <optional> 40 the slowest tempo to detect [bpm]
Returns
Details
-
BeatTrackerMultiFeature( signal [, maxTempo [, minTempo ] ] ) → {object}
-
Description
This algorithm estimates the beat positions given an input signal. It computes a number of onset detection functions and estimates beat location candidates from them using TempoTapDegara algorithm. Thereafter the best candidates are selected using TempoTapMaxAgreement. The employed detection functions, and the optimal frame/hop sizes used for their computation are:
- complex spectral difference (see 'complex' method in OnsetDetection algorithm, 2048/1024 with posterior x2 upsample of the detection function)
- energy flux (see 'rms' method in OnsetDetection algorithm, the same settings)
- spectral flux in Mel-frequency bands (see 'melflux' method in OnsetDetection algorithm, the same settings)
- beat emphasis function (see 'beat_emphasis' method in OnsetDetectionGlobal algorithm, 2048/512)
- spectral flux between histogrammed spectrum frames, measured by the modified information gain (see 'infogain' method in OnsetDetectionGlobal algorithm, 2048/512)
Check https://essentia.upf.edu/reference/std_BeatTrackerMultiFeature.html for more details.
Parameters
Name Type Attributes Default Description
signal VectorFloat the audio input signal
maxTempo number <optional> 208 the fastest tempo to detect [bpm]
minTempo number <optional> 40 the slowest tempo to detect [bpm]
Returns
Details
-
Beatogram( loudness, loudnessBandRatio [, size ] ) → {object}
-
Description
This algorithm filters the loudness matrix given by BeatsLoudness algorithm in order to keep only the most salient beat band representation. This algorithm has been found to be useful for estimating time signatures. Check https://essentia.upf.edu/reference/std_Beatogram.html for more details.
Parameters
Name Type Attributes Default Description
loudness VectorFloat the loudness at each beat
loudnessBandRatio VectorVectorFloat matrix of loudness ratios at each band and beat
size number <optional> 16 number of beats for dynamic filtering
Returns
Details
-
BeatsLoudness( signal [, beatDuration [, beatWindowDuration [, beats [, frequencyBands [, sampleRate ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the spectrum energy of beats in an audio signal given their positions. The energy is computed both on the whole frequency range and for each of the specified frequency bands. See the SingleBeatLoudness algorithm for a more detailed explanation. Check https://essentia.upf.edu/reference/std_BeatsLoudness.html for more details.
Parameters
Name Type Attributes Default Description
signal VectorFloat the input audio signal
beatDuration number <optional> 0.05 the duration of the window in which the beat will be restricted [s]
beatWindowDuration number <optional> 0.1 the duration of the window in which to look for the beginning of the beat (centered around the positions in 'beats') [s]
beats Array.<any> <optional> [] the list of beat positions (each position is in seconds)
frequencyBands Array.<any> <optional> [20, 150, 400, 3200, 7000, 22000] the list of bands to compute energy ratios [Hz]
sampleRate number <optional> 44100 the audio sampling rate [Hz]
Returns
Details
-
BinaryOperator( array1, array2 [, type ] ) → {object}
-
Description
This algorithm performs basic arithmetical operations element by element given two arrays. Note:
- using this algorithm in streaming mode can cause diamond shape graphs which have not been tested with the current scheduler. There is NO GUARANTEE of its correct work for diamond shape graphs.
- for y<0, x/y is invalid
Check https://essentia.upf.edu/reference/std_BinaryOperator.html for more details.
Parameters
Name Type Attributes Default Description
array1 VectorFloat the first operand input array
array2 VectorFloat the second operand input array
type string <optional> add the type of the binary operator to apply to the input arrays
Returns
Details
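Element-by-element arithmetic on two arrays can be sketched in plain JavaScript as below. This is an illustration only: apart from the documented default `add`, the operator names `subtract`, `multiply` and `divide` are assumed here, and the helper does not apply the validity checks of the essentia algorithm.

```javascript
// Plain-JS sketch of element-wise binary operations on two arrays.
// Assumption: operator names other than the default 'add' are illustrative.
const OPERATIONS = {
  add: (x, y) => x + y,
  subtract: (x, y) => x - y,
  multiply: (x, y) => x * y,
  divide: (x, y) => x / y,
};

function binaryOperator(array1, array2, type = 'add') {
  const operation = OPERATIONS[type];
  if (!operation) throw new Error(`unknown operator type: ${type}`);
  if (array1.length !== array2.length) {
    throw new Error('input arrays must have the same length');
  }
  return Array.from(array1, (x, i) => operation(x, array2[i]));
}

console.log(binaryOperator([1, 2], [3, 4])); // [4, 6]
console.log(binaryOperator([1, 2], [3, 4], 'multiply')); // [3, 8]
```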
-
BinaryOperatorStream( array1, array2 [, type ] ) → {object}
-
Description
This algorithm performs basic arithmetical operations element by element given two arrays. Note:
- using this algorithm in streaming mode can cause diamond shape graphs which have not been tested with the current scheduler. There is NO GUARANTEE of its correct work for diamond shape graphs.
- for y<0, x/y is invalid
Check https://essentia.upf.edu/reference/std_BinaryOperatorStream.html for more details.
Parameters
Name Type Attributes Default Description
array1 VectorFloat the first operand input array
array2 VectorFloat the second operand input array
type string <optional> add the type of the binary operator to apply to the input arrays
Returns
Details
-
BpmHistogramDescriptors( bpmIntervals ) → {object}
-
Description
This algorithm computes beats per minute histogram and its statistics for the highest and second highest peak. Note: histogram vector contains occurrence frequency for each bpm value, 0-th element corresponds to 0 bpm value. Check https://essentia.upf.edu/reference/std_BpmHistogramDescriptors.html for more details.
Parameters
Name Type Description
bpmIntervals VectorFloat the list of bpm intervals [s]
Returns
Details
-
BpmRubato( beats [, longRegionsPruningTime [, shortRegionsMergingTime [, tolerance ] ] ] ) → {object}
-
Description
This algorithm extracts the locations of large tempo changes from a list of beat ticks. Check https://essentia.upf.edu/reference/std_BpmRubato.html for more details.
Parameters
Name Type Attributes Default Description
beats VectorFloat list of detected beat ticks [s]
longRegionsPruningTime number <optional> 20 time for the longest constant tempo region inside a rubato region [s]
shortRegionsMergingTime number <optional> 4 time for the shortest constant tempo region from one tempo region to another [s]
tolerance number <optional> 0.08 minimum tempo deviation to look for
Returns
Details
-
CentralMoments( array [, mode [, range ] ] ) → {object}
-
Description
This algorithm extracts the 0th, 1st, 2nd, 3rd and 4th central moments of an array. It returns a 5-tuple in which the index corresponds to the order of the moment. Check https://essentia.upf.edu/reference/std_CentralMoments.html for more details.
Parameters
Name Type Attributes Default Description
array VectorFloat the input array
mode string <optional> pdf compute central moments considering array values as a probability density function over array index or as sample points of a distribution
range number <optional> 1 the range of the input array, used for normalizing the results in the 'pdf' mode
Returns
Details
-
Centroid( array [, range ] ) → {object}
-
Description
This algorithm computes the centroid of an array. The centroid is normalized to a specified range. This algorithm can be used to compute spectral centroid or temporal centroid. Check https://essentia.upf.edu/reference/std_Centroid.html for more details.
Parameters
Name Type Attributes Default Description
array VectorFloat the input array
range number <optional> 1 the range of the input array, used for normalizing the results
Returns
Details
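A centroid normalized to a range can be sketched as the weighted mean of the array indices, rescaled so that the last index maps to `range`. The helper below is a plain-JavaScript illustration under that assumption, not the essentia implementation; returning 0 for an all-zero input is also a choice made only for this sketch.

```javascript
// Plain-JS sketch of a centroid normalized to a range.
// Assumptions: indices are weighted by the array values and rescaled so
// that index (length - 1) maps to `range`; an all-zero input returns 0.
function centroid(array, range = 1) {
  if (array.length < 2) throw new Error('array needs at least two elements');
  let weightedSum = 0;
  let total = 0;
  for (let i = 0; i < array.length; i++) {
    weightedSum += i * array[i];
    total += array[i];
  }
  if (total === 0) return 0;
  return (weightedSum / total) * (range / (array.length - 1));
}

console.log(centroid([1, 1, 1])); // 0.5
console.log(centroid([0, 0, 1], 22050)); // 22050
```

For a spectral centroid in Hz, `range` would be the Nyquist frequency (half the sample rate), as in the second call.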
-
ChordsDescriptors( chords, key, scale ) → {object}
-
Description
Given a chord progression this algorithm describes it by means of key, scale, histogram, and rate of change. Note:
- chordsHistogram indexes follow the circle of fifths order, while being shifted to the input key and scale
- key and scale are taken from the most frequent chord. In the case where multiple chords are equally frequent, the chord is hierarchically chosen from the circle of fifths.
- chords should follow this name convention: <A-G>[<#/b><m>] (e.g. C, C# or C#m are valid chords). Chord names not fitting this convention will throw an exception.
Check https://essentia.upf.edu/reference/std_ChordsDescriptors.html for more details.
Parameters
Name Type Description
chords VectorString the chord progression
key string the key of the whole song, from A to G
scale string the scale of the whole song (major or minor)
Returns
Details
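Because chord names that do not fit the naming convention cause an exception, it can help to validate them beforehand. The regular expression below is one reading of the convention (a root letter A to G, an optional `#` or `b`, and an optional trailing `m`); both this reading and the helper name are assumptions.

```javascript
// Plain-JS sketch: check chord names against one reading of the convention.
// Assumption: root A-G, optional accidental (# or b), optional 'm' for minor.
const CHORD_NAME = /^[A-G][#b]?m?$/;

function isValidChordName(name) {
  return CHORD_NAME.test(name);
}

console.log(['C', 'C#', 'C#m', 'H', 'Cmaj7'].map(isValidChordName));
// [true, true, true, false, false]
```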
-
ChordsDetection( pcp [, hopSize [, sampleRate [, windowSize ] ] ] ) → {object}
-
Description
This algorithm estimates chords given an input sequence of harmonic pitch class profiles (HPCPs). It finds the best matching major or minor triad and outputs the result as a string (e.g. A#, Bm, G#m, C). This algorithm uses the Sharp versions of each Flatted note (i.e. Bb -> A#). Check https://essentia.upf.edu/reference/std_ChordsDetection.html for more details.
Parameters
Name Type Attributes Default Description
pcp VectorVectorFloat the pitch class profile from which to detect the chord
hopSize number <optional> 2048 the hop size with which the input PCPs were computed
sampleRate number <optional> 44100 the sampling rate of the audio signal [Hz]
windowSize number <optional> 2 the size of the window on which to estimate the chords [s]
Returns
Details
-
ChordsDetectionBeats( pcp, ticks [, chromaPick [, hopSize [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm estimates chords using pitch profile classes on segments between beats. It is similar to ChordsDetection algorithm, but the chords are estimated on audio segments between each pair of consecutive beats. For each segment the estimation is done based on a chroma (HPCP) vector characterizing it, which can be computed by two methods:
- 'interbeat_median', each resulting chroma vector component is a median of all the component values in the segment
- 'starting_beat', chroma vector is sampled from the start of the segment (that is, its starting beat position) using its first frame. It makes sense if chroma is preliminarily smoothed.
Check https://essentia.upf.edu/reference/std_ChordsDetectionBeats.html for more details.
Parameters
Name Type Attributes Default Description
pcp VectorVectorFloat the pitch class profile from which to detect the chord
ticks VectorFloat the list of beat positions (in seconds)
chromaPick string <optional> interbeat_median method of calculating singleton chroma for interbeat interval
hopSize number <optional> 2048 the hop size with which the input PCPs were computed
sampleRate number <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
ChromaCrossSimilarity( queryFeature, referenceFeature [, binarizePercentile [, frameStackSize [, frameStackStride [, noti [, oti [, otiBinary [, streaming ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes a binary cross similarity matrix from two chromagram feature vectors of a query and reference song. Check https://essentia.upf.edu/reference/std_ChromaCrossSimilarity.html for more details.
Parameters
Name Type Attributes Default Description
queryFeature VectorVectorFloat frame-wise chromagram of the query song (e.g., a HPCP)
referenceFeature VectorVectorFloat frame-wise chromagram of the reference song (e.g., a HPCP)
binarizePercentile number <optional> 0.095 maximum percent of distance values to consider as similar in each row and each column
frameStackSize number <optional> 9 number of input frames to stack together and treat as a feature vector for similarity computation. Choose 'frameStackSize=1' to use the original input frames without stacking
frameStackStride number <optional> 1 stride size to form a stack of frames (e.g., 'frameStackStride'=1 to use consecutive frames; 'frameStackStride'=2 for using every second frame)
noti number <optional> 12 number of circular shifts to be checked for Optimal Transposition Index [1]
oti boolean <optional> true whether to transpose the key of the reference song to the query song by Optimal Transposition Index [1]
otiBinary boolean <optional> false whether to use the OTI-based chroma binary similarity method [3]
streaming boolean <optional> false whether to accumulate the input 'queryFeature' in the euclidean similarity matrix calculation on each compute() method call
Returns
Details
-
Chromagram( frame [, binsPerOctave [, minFrequency [, minimumKernelSize [, normalizeType [, numberBins [, sampleRate [, scale [, threshold [, windowType [, zeroPhase ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the Constant-Q chromagram using FFT. See ConstantQ algorithm for more details. Check https://essentia.upf.edu/reference/std_Chromagram.html for more details.
Parameters
Name Type Attributes Default Description
frame VectorFloat the input audio frame
binsPerOctave number <optional> 12 number of bins per octave
minFrequency number <optional> 32.7 minimum frequency [Hz]
minimumKernelSize number <optional> 4 minimum size allowed for frequency kernels
normalizeType string <optional> unit_max normalize type
numberBins number <optional> 84 number of frequency bins, starting at minFrequency
sampleRate number <optional> 44100 FFT sampling rate [Hz]
scale number <optional> 1 filters scale. Larger values use longer windows
threshold number <optional> 0.01 bins whose magnitude is below this quantile are discarded
windowType string <optional> hann the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
zeroPhase boolean <optional> true a boolean value that enables zero-phase windowing. Input audio frames should be windowed with the same phase mode
Returns
Details
-
ClickDetector( frame [, detectionThreshold [, frameSize [, hopSize [, order [, powerEstimationThreshold [, sampleRate [, silenceThreshold ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm detects the locations of impulsive noises (clicks and pops) on the input audio frame. It relies on LPC coefficients to inverse-filter the audio in order to attenuate the stationary part and enhance the prediction error (or excitation noise)[1]. After this, a matched filter is used to further enhance the impulsive peaks. The detection threshold is obtained from a robust estimate of the excitation noise power [2] plus a parametric gain value. Check https://essentia.upf.edu/reference/std_ClickDetector.html for more details.
Parameters
Name Type Attributes Default Description
frame VectorFloat the input frame (must be non-empty)
detectionThreshold number <optional> 30 'detectionThreshold' the threshold is based on the instant power of the noisy excitation signal plus detectionThreshold dBs
frameSize number <optional> 512 the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)
hopSize number <optional> 256 hop size used for the analysis. This parameter must be set correctly as it cannot be obtained from the input data
order number <optional> 12 scalar giving the number of LPCs to use
powerEstimationThreshold number <optional> 10 the noisy excitation is clipped to 'powerEstimationThreshold' times its median.
sampleRate number <optional> 44100 sample rate used for the analysis
silenceThreshold number <optional> -50 threshold to skip silent frames
Returns
Details
-
Clipper( signal [, max [, min ] ] ) → {object}
-
Description
This algorithm clips the input signal to fit its values into a specified interval. Check https://essentia.upf.edu/reference/std_Clipper.html for more details.
Parameters
Name Type Attributes Default Description
signal VectorFloat the input signal
max number <optional> 1 the maximum value above which the signal will be clipped
min number <optional> -1 the minimum value below which the signal will be clipped
Returns
Details
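Clipping to an interval amounts to clamping each sample between `min` and `max`. A minimal plain-JavaScript sketch (the helper name `clip` is hypothetical and this is not the essentia implementation):

```javascript
// Plain-JS sketch: clamp every sample into the interval [min, max].
function clip(signal, max = 1, min = -1) {
  return Array.from(signal, (value) => Math.min(max, Math.max(min, value)));
}

console.log(clip([-2, -0.5, 0.25, 3])); // [-1, -0.5, 0.25, 1]
```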
-
CoverSongSimilarity( inputArray [, alignmentType [, disExtension [, disOnset [, distanceType ] ] ] ] ) → {object}
-
Description
This algorithm computes a cover song similarity measure from a binary cross similarity matrix input between two chroma vectors of a query and reference song using various alignment constraints of smith-waterman local-alignment algorithm. Check https://essentia.upf.edu/reference/std_CoverSongSimilarity.html for more details.
Parameters
Name Type Attributes Default Description
inputArray VectorVectorFloat a 2D binary cross-similarity matrix between two audio chroma vectors (query vs reference song) (refer to the 'ChromaCrossSimilarity' algorithm).
alignmentType string <optional> serra09 choose either one of the given local-alignment constraints for smith-waterman algorithm as described in [2] or [3] respectively.
disExtension number <optional> 0.5 penalty for disruption extension
disOnset number <optional> 0.5 penalty for disruption onset
distanceType string <optional> asymmetric choose the type of distance. By default the algorithm outputs an asymmetric distance which is obtained by normalising the maximum score in the alignment score matrix with length of reference song
Returns
Details
-
Crest( array ) → {object}
-
Description
This algorithm computes the crest of an array. The crest is defined as the ratio between the maximum value and the arithmetic mean of an array. Typically it is used on the magnitude spectrum. Check https://essentia.upf.edu/reference/std_Crest.html for more details.
Parameters
Name Type Description
array VectorFloat the input array (cannot contain negative values, and must be non-empty)
Returns
Details
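The definition above (maximum value divided by the arithmetic mean) translates directly into a few lines of plain JavaScript. The sketch below is illustrative rather than the essentia implementation; rejecting an all-zero input, where the mean is 0, is a choice made only for this sketch.

```javascript
// Plain-JS sketch of the crest: max(array) / mean(array).
// Assumption: an all-zero array is rejected because its mean is 0.
function crest(array) {
  if (array.length === 0) throw new Error('array must be non-empty');
  let max = -Infinity;
  let sum = 0;
  for (const value of array) {
    if (value < 0) throw new Error('array cannot contain negative values');
    if (value > max) max = value;
    sum += value;
  }
  const mean = sum / array.length;
  if (mean === 0) throw new Error('crest is undefined for an all-zero array');
  return max / mean;
}

console.log(crest([1, 2, 3, 6])); // 2
console.log(crest([4, 4, 4])); // 1 (a flat array has the minimum crest)
```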
-
CrossCorrelation( arrayX, arrayY [, maxLag [, minLag ] ] ) → {object}
-
Description
This algorithm computes the cross-correlation vector of two signals. It accepts 2 parameters, minLag and maxLag, which define the range of the computation of the inner product. Check https://essentia.upf.edu/reference/std_CrossCorrelation.html for more details.
Parameters
Name Type Attributes Default Description
arrayX VectorFloat the first input array
arrayY VectorFloat the second input array
maxLag number <optional> 1 the maximum lag to be computed between the two vectors
minLag number <optional> 0 the minimum lag to be computed between the two vectors
Returns
Details
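A direct computation of the cross-correlation over a lag range can be sketched as below. This is a plain-JavaScript illustration, not the essentia implementation: the sign convention (a positive lag shifts `arrayY` forward relative to `arrayX`) and the absence of any normalization are assumptions.

```javascript
// Plain-JS sketch: inner product of x with y shifted by each lag.
// Assumptions: result[k] = sum over i of x[i] * y[i + lag] with
// lag = minLag + k; samples that fall outside y are skipped.
function crossCorrelation(arrayX, arrayY, maxLag = 1, minLag = 0) {
  if (minLag > maxLag) throw new Error('minLag must not exceed maxLag');
  const result = [];
  for (let lag = minLag; lag <= maxLag; lag++) {
    let sum = 0;
    for (let i = 0; i < arrayX.length; i++) {
      const j = i + lag;
      if (j >= 0 && j < arrayY.length) sum += arrayX[i] * arrayY[j];
    }
    result.push(sum);
  }
  return result;
}

console.log(crossCorrelation([1, 2, 3], [0, 1, 2], 1, -1)); // [3, 8, 5]
```

The output has `maxLag - minLag + 1` entries, one per lag in the requested range.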
-
CrossSimilarityMatrix( queryFeature, referenceFeature [, binarize [, binarizePercentile [, frameStackSize [, frameStackStride ] ] ] ] ) → {object}
-
Description
This algorithm computes a euclidean cross-similarity matrix of two sequences of frame features. Similarity values can be optionally binarized. Check https://essentia.upf.edu/reference/std_CrossSimilarityMatrix.html for more details.
Parameters
Name Type Attributes Default Description
queryFeature VectorVectorFloat input frame features of the query song (e.g., a chromagram)
referenceFeature VectorVectorFloat input frame features of the reference song (e.g., a chromagram)
binarize boolean <optional> false whether to binarize the euclidean cross-similarity matrix
binarizePercentile number <optional> 0.095 maximum percent of distance values to consider as similar in each row and each column
frameStackSize number <optional> 1 number of input frames to stack together and treat as a feature vector for similarity computation. Choose 'frameStackSize=1' to use the original input frames without stacking
frameStackStride number <optional> 1 stride size to form a stack of frames (e.g., 'frameStackStride'=1 to use consecutive frames; 'frameStackStride'=2 for using every second frame)
Returns
Details
-
CubicSpline( x [, leftBoundaryFlag [, leftBoundaryValue [, rightBoundaryFlag [, rightBoundaryValue [, xPoints [, yPoints ] ] ] ] ] ] ) → {object}
-
Description
Computes the second derivatives of a piecewise cubic spline. The input value, i.e. the point at which the spline is to be evaluated, should typically be between xPoints[0] and xPoints[size-1]. If the value lies outside this range, extrapolation is used. Regarding the [left/right] boundary condition flag parameters: - 0: the cubic spline should be a quadratic over the first (left) or last (right) interval; - 1: the first derivative at the [left/right] endpoint should be [left/right]BoundaryValue; - 2: the second derivative at the [left/right] endpoint should be [left/right]BoundaryValue. References: [1] Spline interpolation - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Spline_interpolation Check https://essentia.upf.edu/reference/std_CubicSpline.html for more details.
Parameters
Name Type Attributes Default Description xnumber the input coordinate (x-axis)
leftBoundaryFlagnumber <optional> 0 type of boundary condition for the left boundary
leftBoundaryValuenumber <optional> 0 the value to be used in the left boundary, when leftBoundaryFlag is 1 or 2
rightBoundaryFlagnumber <optional> 0 type of boundary condition for the right boundary
rightBoundaryValuenumber <optional> 0 the value to be used in the right boundary, when rightBoundaryFlag is 1 or 2
xPointsArray.<any> <optional> [0, 1] the x-coordinates where data is specified (the points must be arranged in ascending order and cannot contain duplicates)
yPointsArray.<any> <optional> [0, 1] the y-coordinates to be interpolated (i.e. the known data)
Returns
Details
-
DCRemoval( signal [, cutoffFrequency [, sampleRate ] ] ) → {object}
-
Description
This algorithm removes the DC offset from a signal using a 1st order IIR highpass filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_DCRemoval.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input audio signal
cutoffFrequencynumber <optional> 40 the cutoff frequency for the filter [Hz]
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
DCT( array [, dctType [, inputSize [, liftering [, outputSize ] ] ] ] ) → {object}
-
Description
This algorithm computes the Discrete Cosine Transform of an array. It uses the DCT-II form, with the 1/sqrt(2) scaling factor for the first coefficient. Check https://essentia.upf.edu/reference/std_DCT.html for more details.
Parameters
Name Type Attributes Default Description arrayVectorFloat the input array
dctTypenumber <optional> 2 the DCT type
inputSizenumber <optional> 10 the size of the input array
lifteringnumber <optional> 0 the liftering coefficient. Use '0' to bypass it
outputSizenumber <optional> 10 the number of output coefficients
Returns
Details
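The DCT-II form mentioned above can be sketched directly from its definition. The overall sqrt(2/N) normalisation used here is an assumption (it makes the transform orthonormal together with the 1/sqrt(2) factor on the first coefficient); `dct2Sketch` is a hypothetical name and liftering is left out.

```javascript
// Direct O(N * K) evaluation of the DCT-II with 1/sqrt(2) scaling on coefficient 0.
// Assumption: overall sqrt(2 / N) normalisation (orthonormal DCT-II).
function dct2Sketch(input, outputSize = input.length) {
  const n = input.length;
  if (n === 0) throw new Error("input must be non-empty");
  const scale = Math.sqrt(2 / n);
  const out = new Array(outputSize);
  for (let k = 0; k < outputSize; k++) {
    let acc = 0;
    for (let i = 0; i < n; i++) {
      acc += input[i] * Math.cos((Math.PI / n) * (i + 0.5) * k);
    }
    out[k] = scale * (k === 0 ? Math.SQRT1_2 : 1) * acc;
  }
  return out;
}
```

For a constant input all the energy ends up in the first coefficient, and the remaining coefficients are zero up to rounding error.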
-
Danceability( signal [, maxTau [, minTau [, sampleRate [, tauMultiplier ] ] ] ] ) → {object}
-
Description
This algorithm estimates danceability of a given audio signal. The algorithm is derived from Detrended Fluctuation Analysis (DFA) described in [1]. The parameters minTau and maxTau are used to define the range of time over which DFA will be performed. The output of this algorithm is the danceability of the audio signal. These values usually range from 0 to 3 (higher values meaning more danceable). Check https://essentia.upf.edu/reference/std_Danceability.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
maxTaunumber <optional> 8800 maximum segment length to consider [ms]
minTaunumber <optional> 310 minimum segment length to consider [ms]
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
tauMultipliernumber <optional> 1.1 multiplier to increment from min to max tau
Returns
Details
-
Decrease( array [, range ] ) → {object}
-
Description
This algorithm computes the decrease of an array defined as the linear regression coefficient. The range parameter is used to normalize the result. For a spectral centroid, the range should be equal to Nyquist and for an audio centroid the range should be equal to (audiosize - 1) / samplerate. The size of the input array must be at least two elements for "decrease" to be computed, otherwise an exception is thrown. References: [1] Least Squares Fitting -- from Wolfram MathWorld, http://mathworld.wolfram.com/LeastSquaresFitting.html Check https://essentia.upf.edu/reference/std_Decrease.html for more details.
Parameters
Name Type Attributes Default Description arrayVectorFloat the input array
rangenumber <optional> 1 the range of the input array, used for normalizing the results
Returns
Details
-
Derivative( signal ) → {object}
-
Description
This algorithm returns the first-order derivative of an input signal. That is, for each input value it returns the value minus the previous one. Check https://essentia.upf.edu/reference/std_Derivative.html for more details.
Parameters
Name Type Description signalVectorFloat the input signal
Returns
Details
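The first-order difference described above can be sketched as follows. Treating the sample before the first one as zero is an assumption, and `derivativeSketch` is a hypothetical name rather than an essentia.js function.

```javascript
// First-order difference: each output is the current value minus the previous one.
// Assumption: the value preceding the first sample is taken to be 0.
function derivativeSketch(signal) {
  const out = new Float32Array(signal.length);
  let previous = 0;
  for (let i = 0; i < signal.length; i++) {
    out[i] = signal[i] - previous;
    previous = signal[i];
  }
  return out;
}
```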
-
DerivativeSFX( envelope ) → {object}
-
Description
This algorithm computes two descriptors that are based on the derivative of a signal envelope. Check https://essentia.upf.edu/reference/std_DerivativeSFX.html for more details.
Parameters
Name Type Description envelopeVectorFloat the envelope of the signal
Returns
Details
-
DiscontinuityDetector( frame [, detectionThreshold [, energyThreshold [, frameSize [, hopSize [, kernelSize [, order [, silenceThreshold [, subFrameSize ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm uses LPC and some heuristics to detect discontinuities in an audio signal [1]. Check https://essentia.upf.edu/reference/std_DiscontinuityDetector.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input frame (must be non-empty)
detectionThresholdnumber <optional> 8 'detectionThreshold' times the standard deviation plus the median of the frame is used as detection threshold
energyThresholdnumber <optional> -60 threshold in dB to detect silent subframes
frameSizenumber <optional> 512 the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)
hopSizenumber <optional> 256 hop size used for the analysis. This parameter must be set correctly as it cannot be obtained from the input data
kernelSizenumber <optional> 7 scalar giving the size of the median filter window. Must be odd
ordernumber <optional> 3 scalar giving the number of LPCs to use
silenceThresholdnumber <optional> -50 threshold to skip silent frames
subFrameSizenumber <optional> 32 size of the window used to compute silent subframes
Returns
Details
-
Dissonance( frequencies, magnitudes ) → {object}
-
Description
This algorithm computes the sensory dissonance of an audio signal given its spectral peaks. Sensory dissonance (to be distinguished from musical or theoretical dissonance) measures perceptual roughness of the sound and is based on the roughness of its spectral peaks. Given the spectral peaks, the algorithm estimates total dissonance by summing up the normalized dissonance values for each pair of peaks. These values are computed using dissonance curves, which define dissonance between two spectral peaks according to their frequency and amplitude relations. The dissonance curves are based on perceptual experiments conducted in [1]. Exceptions are thrown when the sizes of the input vectors are not equal or if the input frequencies are not in ascending order. References: [1] R. Plomp and W. J. M. Levelt, "Tonal Consonance and Critical Bandwidth," The Journal of the Acoustical Society of America, vol. 38, no. 4, pp. 548–560, 1965. Check https://essentia.upf.edu/reference/std_Dissonance.html for more details.
Parameters
Name Type Description frequenciesVectorFloat the frequencies of the spectral peaks (must be sorted by frequency)
magnitudesVectorFloat the magnitudes of the spectral peaks (must be sorted by frequency)
Returns
Details
-
DistributionShape( centralMoments ) → {object}
-
Description
This algorithm computes the spread (variance), skewness and kurtosis of an array given its central moments. The extracted features are good indicators of the shape of the distribution. For the required input see CentralMoments algorithm. The size of the input array must be at least 5. An exception will be thrown otherwise. Check https://essentia.upf.edu/reference/std_DistributionShape.html for more details.
Parameters
Name Type Description centralMomentsVectorFloat the central moments of a distribution
Returns
Details
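A minimal sketch of how the three descriptors relate to the central moments, assuming the input is laid out as [m0, m1, m2, m3, m4] and that kurtosis is reported as excess kurtosis (with 3 subtracted). Both points, the zero-variance handling, and the name `distributionShapeSketch` are assumptions, not confirmed Essentia behaviour.

```javascript
// Spread, skewness and kurtosis from central moments [m0, m1, m2, m3, m4].
// Assumption: kurtosis is the excess kurtosis (m4 / m2^2 - 3).
function distributionShapeSketch(centralMoments) {
  if (centralMoments.length < 5) {
    throw new Error("at least 5 central moments are required");
  }
  const m2 = centralMoments[2];
  const m3 = centralMoments[3];
  const m4 = centralMoments[4];
  if (m2 === 0) {
    // Assumption: a zero-variance distribution is reported without dividing by zero.
    return { spread: 0, skewness: 0, kurtosis: -3 };
  }
  return {
    spread: m2,
    skewness: m3 / Math.pow(m2, 1.5),
    kurtosis: m4 / (m2 * m2) - 3,
  };
}
```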
-
Duration( signal [, sampleRate ] ) → {object}
-
Description
This algorithm outputs the total duration of an audio signal. Check https://essentia.upf.edu/reference/std_Duration.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
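For reference, the duration of a decoded signal follows directly from its length and sample rate. The sketch below shows that relation in plain JavaScript; `durationSketch` is a hypothetical name.

```javascript
// Duration in seconds = number of samples / sampling rate in Hz.
function durationSketch(signal, sampleRate = 44100) {
  if (!(sampleRate > 0)) throw new Error("sampleRate must be positive");
  return signal.length / sampleRate;
}
```

Note that passing a sample rate that differs from the one the signal was decoded at gives a wrong duration, so the sampleRate argument should match the AudioContext used for decoding.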
-
DynamicComplexity( signal [, frameSize [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes the dynamic complexity defined as the average absolute deviation from the global loudness level estimate on the dB scale. It is related to the dynamic range and to the amount of fluctuation in loudness present in a recording. Silence at the beginning and at the end of a track is ignored in the computation so that it does not degrade the results. Check https://essentia.upf.edu/reference/std_DynamicComplexity.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input audio signal
frameSizenumber <optional> 0.2 the frame size [s]
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
ERBBands( spectrum [, highFrequencyBound [, inputSize [, lowFrequencyBound [, numberBands [, sampleRate [, type [, width ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes energies/magnitudes in ERB bands of a spectrum. The Equivalent Rectangular Bandwidth (ERB) scale is used. The algorithm applies a frequency domain filterbank using gammatone filters. Adapted from matlab code in: D. P. W. Ellis (2009). 'Gammatone-like spectrograms', web resource [1]. Check https://essentia.upf.edu/reference/std_ERBBands.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the audio spectrum
highFrequencyBoundnumber <optional> 22050 an upper-bound limit for the frequencies to be included in the bands
inputSizenumber <optional> 1025 the size of the spectrum
lowFrequencyBoundnumber <optional> 50 a lower-bound limit for the frequencies to be included in the bands
numberBandsnumber <optional> 40 the number of output bands
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
typestring <optional> power use magnitude or power spectrum
widthnumber <optional> 1 filter width with respect to ERB
Returns
Details
-
EffectiveDuration( signal [, sampleRate [, thresholdRatio ] ] ) → {object}
-
Description
This algorithm computes the effective duration of an envelope signal. The effective duration is a measure of the time the signal is perceptually meaningful. This is approximated by the time the envelope is above or equal to a given threshold and is above the -90 dB noise floor. This measure makes it possible to distinguish percussive sounds from sustained sounds, but it depends on the signal length. By default, this algorithm uses 40% of the envelope maximum as the threshold, which is suited for short sounds. Note that the 0% threshold corresponds to the duration of the signal above the -90 dB noise floor, while the 100% threshold corresponds to the number of times the envelope takes its maximum value. References: [1] G. Peeters, "A large set of audio features for sound description (similarity and classification) in the CUIDADO project," CUIDADO I.S.T. Project Report, 2004 Check https://essentia.upf.edu/reference/std_EffectiveDuration.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
thresholdRationumber <optional> 0.4 the ratio of the envelope maximum to be used as the threshold
Returns
Details
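The thresholding idea described above can be sketched as counting the samples whose magnitude reaches the threshold and converting the count to seconds. Interpreting the -90 dB floor as a linear amplitude of 10^(-90/20) is an assumption, and `effectiveDurationSketch` is a hypothetical name.

```javascript
// Time (in seconds) during which the envelope is at or above the threshold.
// Assumption: the -90 dB noise floor is a linear amplitude of 10^(-90/20).
function effectiveDurationSketch(envelope, sampleRate = 44100, thresholdRatio = 0.4) {
  const noiseFloor = Math.pow(10, -90 / 20);
  let max = 0;
  for (const v of envelope) max = Math.max(max, Math.abs(v));
  // The threshold never drops below the noise floor.
  const threshold = Math.max(thresholdRatio * max, noiseFloor);
  let count = 0;
  for (const v of envelope) {
    if (Math.abs(v) >= threshold) count++;
  }
  return count / sampleRate;
}
```

With thresholdRatio set to 1 only the samples equal to the envelope maximum are counted, which matches the 100% case mentioned in the description.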
-
Energy( array ) → {object}
-
Description
This algorithm computes the energy of an array. Check https://essentia.upf.edu/reference/std_Energy.html for more details.
Parameters
Name Type Description arrayVectorFloat the input array
Returns
Details
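As a point of reference, the energy of an array is commonly defined as the sum of its squared values. The sketch below assumes that definition; `energySketch` is a hypothetical name, not an essentia.js function.

```javascript
// Energy as the sum of squared values (assumed definition).
function energySketch(values) {
  let acc = 0;
  for (const v of values) acc += v * v;
  return acc;
}
```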
-
EnergyBand( spectrum [, sampleRate [, startCutoffFrequency [, stopCutoffFrequency ] ] ] ) → {object}
-
Description
This algorithm computes energy in a given frequency band of a spectrum including both start and stop cutoff frequencies. Note that exceptions will be thrown when input spectrum is empty and if startCutoffFrequency is greater than stopCutoffFrequency. Check https://essentia.upf.edu/reference/std_EnergyBand.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input frequency spectrum
sampleRatenumber <optional> 44100 the audio sampling rate [Hz]
startCutoffFrequencynumber <optional> 0 the start frequency from which to sum the energy [Hz]
stopCutoffFrequencynumber <optional> 100 the stop frequency to which to sum the energy [Hz]
Returns
Details
-
EnergyBandRatio( spectrum [, sampleRate [, startFrequency [, stopFrequency ] ] ] ) → {object}
-
Description
This algorithm computes the ratio of the spectral energy in the range [startFrequency, stopFrequency] over the total energy. Check https://essentia.upf.edu/reference/std_EnergyBandRatio.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input audio spectrum
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
startFrequencynumber <optional> 0 the frequency from which to start summing the energy [Hz]
stopFrequencynumber <optional> 100 the frequency up to which to sum the energy [Hz]
Returns
Details
-
Entropy( array ) → {object}
-
Description
This algorithm computes the Shannon entropy of an array. Entropy can be used to quantify the peakiness of a distribution. This has been used for voiced/unvoiced decision in automatic speech recognition. Check https://essentia.upf.edu/reference/std_Entropy.html for more details.
Parameters
Name Type Description arrayVectorFloat the input array (cannot contain negative values, and must be non-empty)
Returns
Details
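A minimal sketch of the Shannon entropy computation, assuming the input is first normalised so that it sums to 1 and that the logarithm is base 2. Both choices, and the name `entropySketch`, are assumptions rather than confirmed Essentia behaviour.

```javascript
// Shannon entropy of a non-negative array treated as an unnormalised distribution.
// Assumptions: values are normalised to sum to 1, and log base 2 is used.
function entropySketch(values) {
  if (values.length === 0) throw new Error("input must be non-empty");
  let sum = 0;
  for (const v of values) {
    if (v < 0) throw new Error("input cannot contain negative values");
    sum += v;
  }
  if (sum === 0) return 0; // assumption: an all-zero input has zero entropy
  let h = 0;
  for (const v of values) {
    const p = v / sum;
    if (p > 0) h -= p * Math.log2(p); // terms with p = 0 contribute nothing
  }
  return h;
}
```

A flat distribution over four bins gives the maximum of 2 bits, while a single peak gives 0, which is why entropy can serve as a peakiness measure.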
-
Envelope( signal [, applyRectification [, attackTime [, releaseTime [, sampleRate ] ] ] ] ) → {object}
-
Description
This algorithm computes the envelope of a signal by applying a non-symmetric lowpass filter on a signal. By default it rectifies the signal, but that is optional. Check https://essentia.upf.edu/reference/std_Envelope.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
applyRectificationboolean <optional> true whether to apply rectification (envelope based on the absolute value of signal)
attackTimenumber <optional> 10 the attack time of the first order lowpass in the attack phase [ms]
releaseTimenumber <optional> 1500 the release time of the first order lowpass in the release phase [ms]
sampleRatenumber <optional> 44100 the audio sampling rate [Hz]
Returns
Details
-
EqualLoudness( signal [, sampleRate ] ) → {object}
-
Description
This algorithm implements an equal-loudness filter. The human ear does not perceive sounds of all frequencies as having equal loudness, and to account for this, the signal is filtered by an inverted approximation of the equal-loudness curves. Technically, the filter is a cascade of a 10th order Yulewalk filter with a 2nd order Butterworth high pass filter. Check https://essentia.upf.edu/reference/std_EqualLoudness.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
Flatness( array ) → {object}
-
Description
This algorithm computes the flatness of an array, which is defined as the ratio between the geometric mean and the arithmetic mean. Check https://essentia.upf.edu/reference/std_Flatness.html for more details.
Parameters
Name Type Description arrayVectorFloat the input array
Returns
Details
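The ratio described above can be sketched in plain JavaScript by computing the geometric mean in the log domain, which avoids overflow for long arrays. The handling of zeros is an assumption and `flatnessSketch` is a hypothetical name.

```javascript
// Flatness = geometric mean / arithmetic mean, for non-negative input.
function flatnessSketch(values) {
  if (values.length === 0) throw new Error("input must be non-empty");
  let sum = 0;
  let logSum = 0;
  for (const v of values) {
    if (v < 0) throw new Error("input cannot contain negative values");
    // Assumption: any zero element makes the geometric mean, and the flatness, 0.
    if (v === 0) return 0;
    sum += v;
    logSum += Math.log(v);
  }
  const geometricMean = Math.exp(logSum / values.length);
  const arithmeticMean = sum / values.length;
  return geometricMean / arithmeticMean;
}
```

The result lies between 0 and 1, with 1 reached only when all values are equal.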
-
FlatnessDB( array ) → {object}
-
Description
This algorithm computes the flatness of an array, which is defined as the ratio between the geometric mean and the arithmetic mean converted to dB scale. Check https://essentia.upf.edu/reference/std_FlatnessDB.html for more details.
Parameters
Name Type Description arrayVectorFloat the input array
Returns
Details
-
FlatnessSFX( envelope ) → {object}
-
Description
This algorithm calculates the flatness coefficient of a signal envelope. Check https://essentia.upf.edu/reference/std_FlatnessSFX.html for more details.
Parameters
Name Type Description envelopeVectorFloat the envelope of the signal
Returns
Details
-
Flux( spectrum [, halfRectify [, norm ] ] ) → {object}
-
Description
This algorithm computes the spectral flux of a spectrum. Flux is defined as the L2-norm [1] or L1-norm [2] of the difference between two consecutive frames of the magnitude spectrum. The frames have to be of the same size in order to yield a meaningful result. The default L2-norm is used more commonly. Check https://essentia.upf.edu/reference/std_Flux.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input spectrum
halfRectifyboolean <optional> false half-rectify the differences in each spectrum bin
normstring <optional> L2 the norm to use for difference computation
Returns
Details
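The definition above can be sketched as a norm of the bin-wise difference between two consecutive magnitude spectra. Unlike the documented method, which takes a single spectrum per call, this hypothetical `fluxSketch` helper takes both frames explicitly so that it stays stateless.

```javascript
// Spectral flux between two consecutive magnitude spectra of equal size.
// norm is "L2" (default) or "L1"; halfRectify keeps only positive differences.
function fluxSketch(previous, current, halfRectify = false, norm = "L2") {
  if (previous.length !== current.length) {
    throw new Error("frames must have the same size");
  }
  let acc = 0;
  for (let i = 0; i < current.length; i++) {
    let d = current[i] - previous[i];
    if (halfRectify && d < 0) d = 0;
    acc += norm === "L1" ? Math.abs(d) : d * d;
  }
  return norm === "L1" ? acc : Math.sqrt(acc);
}
```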
-
FrameCutter( signal [, frameSize [, hopSize [, lastFrameToEndOfFile [, startFromZero [, validFrameThresholdRatio ] ] ] ] ] ) → {object}
-
Description
This algorithm slices the input buffer into frames. It returns a frame of a constant size and jumps a constant amount of samples forward in the buffer on every compute() call until no more frames can be extracted; empty frame vectors are returned afterwards. Incomplete frames (frames starting before the beginning of the input buffer or going past its end) are zero-padded or dropped according to the "validFrameThresholdRatio" parameter. Check https://essentia.upf.edu/reference/std_FrameCutter.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the buffer from which to read data
frameSizenumber <optional> 1024 the output frame size
hopSizenumber <optional> 512 the hop size between frames
lastFrameToEndOfFileboolean <optional> false whether the beginning of the last frame should reach the end of file. Only applicable if startFromZero is true
startFromZeroboolean <optional> false whether to start the first frame at time 0 (centered at frameSize/2) if true, or -frameSize/2 otherwise (zero-centered)
validFrameThresholdRationumber <optional> 0 frames smaller than this ratio will be discarded, those larger will be zero-padded to a full frame (i.e. a value of 0 will never discard frames and a value of 1 will only keep frames that are of length 'frameSize')
Returns
Details
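To illustrate the slicing idea, here is a simplified sketch that starts the first frame at sample 0, advances by hopSize, and zero-pads the frames that run past the end of the buffer. It deliberately ignores lastFrameToEndOfFile and validFrameThresholdRatio, so the number of trailing frames may differ from what Essentia produces; `cutFramesSketch` is a hypothetical name.

```javascript
// Slice a signal into fixed-size frames, zero-padding incomplete ones.
// Simplified policy (assumption): start at sample 0 and keep every frame
// whose first sample lies inside the buffer.
function cutFramesSketch(signal, frameSize = 1024, hopSize = 512) {
  if (frameSize <= 0 || hopSize <= 0) {
    throw new Error("frameSize and hopSize must be positive");
  }
  const frames = [];
  for (let start = 0; start < signal.length; start += hopSize) {
    const frame = new Float32Array(frameSize); // zero-initialised
    const end = Math.min(start + frameSize, signal.length);
    for (let i = start; i < end; i++) frame[i - start] = signal[i];
    frames.push(frame);
  }
  return frames;
}
```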
-
FrameToReal( signal [, frameSize [, hopSize ] ] ) → {object}
-
Description
This algorithm converts a sequence of input audio signal frames into a sequence of audio samples. Check https://essentia.upf.edu/reference/std_FrameToReal.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input audio frame
frameSizenumber <optional> 2048 the frame size for computing the overlap-add process
hopSizenumber <optional> 128 the hop size with which the overlap-add function is computed
Returns
Details
-
FrequencyBands( spectrum [, frequencyBands [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes energy in rectangular frequency bands of a spectrum. The bands are non-overlapping. For each band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_FrequencyBands.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input spectrum (must be greater than size one)
frequencyBandsArray.<any> <optional> [0, 50, 100, 150, 200, 300, 400, 510, 630, 770, 920, 1080, 1270, 1480, 1720, 2000, 2320, 2700, 3150, 3700, 4400, 5300, 6400, 7700, 9500, 12000, 15500, 20500, 27000] list of frequency ranges into which the spectrum is divided (these must be in ascending order and cannot contain duplicates)
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
GFCC( spectrum [, dctType [, highFrequencyBound [, inputSize [, logType [, lowFrequencyBound [, numberBands [, numberCoefficients [, sampleRate [, silenceThreshold [, type ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the Gammatone-frequency cepstral coefficients of a spectrum. This is an equivalent of MFCCs, but using a gammatone filterbank (ERBBands) scaled on an Equivalent Rectangular Bandwidth (ERB) scale. Check https://essentia.upf.edu/reference/std_GFCC.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the audio spectrum
dctTypenumber <optional> 2 the DCT type
highFrequencyBoundnumber <optional> 22050 the upper bound of the frequency range [Hz]
inputSizenumber <optional> 1025 the size of input spectrum
logTypestring <optional> dbamp logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes
lowFrequencyBoundnumber <optional> 40 the lower bound of the frequency range [Hz]
numberBandsnumber <optional> 40 the number of bands in the filter
numberCoefficientsnumber <optional> 13 the number of output cepstrum coefficients
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
silenceThresholdnumber <optional> 1e-10 silence threshold for computing log-energy bands
typestring <optional> power use magnitude or power spectrum
Returns
Details
-
GapsDetector( frame [, attackTime [, frameSize [, hopSize [, kernelSize [, maximumTime [, minimumTime [, postpowerTime [, prepowerThreshold [, prepowerTime [, releaseTime [, sampleRate [, silenceThreshold ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm uses energy and time thresholds to detect gaps in the waveform. A median filter is used to remove spurious silent samples. The power of a small audio region before the detected gaps (prepower) is thresholded to detect intentional pauses as described in [1]. This technique is extended to the region after the gap. The algorithm was designed for framewise use and returns the start and end timestamps relative to the first frame processed. Call configure() or reset() in order to restart the count. Check https://essentia.upf.edu/reference/std_GapsDetector.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input frame (must be non-empty)
attackTimenumber <optional> 0.05 the attack time of the first order lowpass in the attack phase [ms]
frameSizenumber <optional> 2048 frame size used for the analysis. Should match the input frame size. Otherwise, an exception will be thrown
hopSizenumber <optional> 1024 hop size used for the analysis
kernelSizenumber <optional> 11 scalar giving the size of the median filter window. Must be odd
maximumTimenumber <optional> 3500 time of the maximum gap duration [ms]
minimumTimenumber <optional> 10 time of the minimum gap duration [ms]
postpowerTimenumber <optional> 40 time for the postpower calculation [ms]
prepowerThresholdnumber <optional> -30 prepower threshold [dB].
prepowerTimenumber <optional> 40 time for the prepower calculation [ms]
releaseTimenumber <optional> 0.05 the release time of the first order lowpass in the release phase [ms]
sampleRatenumber <optional> 44100 sample rate used for the analysis
silenceThresholdnumber <optional> -50 silence threshold [dB]
Returns
Details
-
GeometricMean( array ) → {object}
-
Description
This algorithm computes the geometric mean of an array of positive values. Check https://essentia.upf.edu/reference/std_GeometricMean.html for more details.
Parameters
Name Type Description arrayVectorFloat the input array
Returns
Details
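A geometric mean is usually computed in the log domain to avoid overflow when many values are multiplied. The sketch below follows that approach; the zero handling is an assumption and `geometricMeanSketch` is a hypothetical name.

```javascript
// Geometric mean computed as exp(mean(log(x))) for non-negative input.
function geometricMeanSketch(values) {
  if (values.length === 0) throw new Error("input must be non-empty");
  let logSum = 0;
  for (const v of values) {
    if (v < 0) throw new Error("input cannot contain negative values");
    if (v === 0) return 0; // assumption: any zero element makes the product zero
    logSum += Math.log(v);
  }
  return Math.exp(logSum / values.length);
}
```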
-
HFC( spectrum [, sampleRate [, type ] ] ) → {object}
-
Description
This algorithm computes the High Frequency Content of a spectrum. It can be computed according to the following techniques: - 'Masri' (default) which does: sum |X(n)|^2*k, - 'Jensen' which does: sum |X(n)|*k^2 - 'Brossier' which does: sum |X(n)|*k Check https://essentia.upf.edu/reference/std_HFC.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input audio spectrum
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
typestring <optional> Masri the type of HFC coefficient to be computed
Returns
Details
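The three weightings listed above can be compared side by side in a short sketch. Here k is taken to be the bin index, which is an assumption (the actual implementation may weight by the bin frequency in Hz instead, which would explain the sampleRate parameter); `hfcSketch` is a hypothetical name.

```javascript
// High Frequency Content under the three weightings named in the description.
// Assumption: k is the zero-based bin index.
function hfcSketch(spectrum, type = "Masri") {
  let acc = 0;
  for (let k = 0; k < spectrum.length; k++) {
    const mag = Math.abs(spectrum[k]);
    if (type === "Masri") acc += mag * mag * k;       // sum |X(n)|^2 * k
    else if (type === "Jensen") acc += mag * k * k;   // sum |X(n)| * k^2
    else if (type === "Brossier") acc += mag * k;     // sum |X(n)| * k
    else throw new Error("unknown HFC type: " + type);
  }
  return acc;
}
```

All three variants give more weight to energy in higher bins, and they differ only in how strongly magnitude and bin position are emphasised.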
-
HPCP( frequencies, magnitudes [, bandPreset [, bandSplitFrequency [, harmonics [, maxFrequency [, maxShifted [, minFrequency [, nonLinear [, normalized [, referenceFrequency [, sampleRate [, size [, weightType [, windowSize ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
Computes a Harmonic Pitch Class Profile (HPCP) from the spectral peaks of a signal. HPCP is a k*12 dimensional vector which represents the intensities of the twelve (k==1) semitone pitch classes (corresponding to notes from A to G#), or subdivisions of these (k>1). Check https://essentia.upf.edu/reference/std_HPCP.html for more details.
Parameters
Name Type Attributes Default Description frequenciesVectorFloat the frequencies of the spectral peaks [Hz]
magnitudesVectorFloat the magnitudes of the spectral peaks
bandPresetboolean <optional> true enables whether to use a band preset
bandSplitFrequencynumber <optional> 500 the split frequency for low and high bands, not used if bandPreset is false [Hz]
harmonicsnumber <optional> 0 number of harmonics for frequency contribution, 0 indicates exclusive fundamental frequency contribution
maxFrequencynumber <optional> 5000 the maximum frequency that contributes to the HPCP [Hz] (the difference between the max and split frequencies must not be less than 200.0 Hz)
maxShiftedboolean <optional> false whether to shift the HPCP vector so that the maximum peak is at index 0
minFrequencynumber <optional> 40 the minimum frequency that contributes to the HPCP [Hz] (the difference between the min and split frequencies must not be less than 200.0 Hz)
nonLinearboolean <optional> false apply non-linear post-processing to the output (use with normalized='unitMax'). Boosts values close to 1, decreases values close to 0.
normalizedstring <optional> unitMax whether to normalize the HPCP vector
referenceFrequencynumber <optional> 440 the reference frequency for semitone index calculation, corresponding to A3 [Hz]
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
sizenumber <optional> 12 the size of the output HPCP (must be a positive nonzero multiple of 12)
weightTypestring <optional> squaredCosine type of weighting function for determining frequency contribution
windowSizenumber <optional> 1 the size, in semitones, of the window used for the weighting
Returns
Details
-
HarmonicBpm( bpms [, bpm [, threshold [, tolerance ] ] ] ) → {object}
-
Description
This algorithm extracts bpms that are harmonically related to the tempo given by the 'bpm' parameter. The algorithm assumes a certain bpm is harmonically related to the 'bpm' parameter when the greatest common divisor between both bpms is greater than 'threshold'. The 'tolerance' parameter is needed to decide whether two bpms are related. For instance, 120, 122 and 236 may or may not be related depending on how much tolerance is given. Check https://essentia.upf.edu/reference/std_HarmonicBpm.html for more details.
Parameters
Name Type Attributes Default Description bpmsVectorFloat list of bpm candidates
bpmnumber <optional> 60 the bpm used to find its harmonics
thresholdnumber <optional> 20 bpm threshold below which greatest common divisors are discarded
tolerancenumber <optional> 5 percentage tolerance to consider two bpms are equal or equal to a harmonic
Returns
Details
-
HarmonicPeaks( frequencies, magnitudes, pitch [, maxHarmonics [, tolerance ] ] ) → {object}
-
Description
This algorithm finds the harmonic peaks of a signal given its spectral peaks and its fundamental frequency. Note: - "tolerance" parameter defines the allowed fixed deviation from ideal harmonics, being a percentage over the F0. For example: if the F0 is 100Hz you may decide to allow a deviation of 20%, that is a fixed deviation of 20Hz; for the harmonic series it is: [180-220], [280-320], [380-420], etc. - If "pitch" is zero, it means its value is unknown, or the sound is unpitched, and in that case the HarmonicPeaks algorithm returns an empty vector. - The output frequency and magnitude vectors are of size "maxHarmonics". If a particular harmonic was not found among spectral peaks, its ideal frequency value is output together with 0 magnitude. This algorithm is intended to receive its "frequencies" and "magnitudes" inputs from the SpectralPeaks algorithm. - When input vectors differ in size or are empty, an exception is thrown. Input vectors must be ordered by ascending frequency excluding DC components and not contain duplicates, otherwise an exception is thrown. Check https://essentia.upf.edu/reference/std_HarmonicPeaks.html for more details.
Parameters
Name Type Attributes Default Description frequenciesVectorFloat the frequencies of the spectral peaks [Hz] (ascending order)
magnitudesVectorFloat the magnitudes of the spectral peaks (ascending frequency order)
pitchnumber an estimate of the fundamental frequency of the signal [Hz]
maxHarmonicsnumber <optional> 20 the number of harmonics to return including F0
tolerancenumber <optional> 0.2 the allowed ratio deviation from ideal harmonics
Returns
Details
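The tolerance rule in the note above (a fixed deviation equal to a percentage of F0 around every ideal harmonic) can be made concrete with a small sketch that lists the search window for each harmonic. This only illustrates the windows, not the peak matching itself; `harmonicSearchRanges` is a hypothetical name.

```javascript
// Search window [low, high] around each ideal harmonic h * f0.
// The allowed deviation is fixed at tolerance * f0 for every harmonic.
function harmonicSearchRanges(f0, maxHarmonics = 20, tolerance = 0.2) {
  if (f0 <= 0) return []; // unknown or unpitched: nothing to search for
  const deviation = tolerance * f0;
  const ranges = [];
  for (let h = 1; h <= maxHarmonics; h++) {
    const ideal = h * f0;
    ranges.push([ideal - deviation, ideal + deviation]);
  }
  return ranges;
}
```

For an F0 of 100 Hz and a tolerance of 0.2, the windows for the second to fourth harmonics are [180, 220], [280, 320] and [380, 420], as in the example from the description.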
-
HighPass( signal [, cutoffFrequency [, sampleRate ] ] ) → {object}
-
Description
This algorithm implements a 1st order IIR high-pass filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_HighPass.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input audio signal
cutoffFrequencynumber <optional> 1500 the cutoff frequency for the filter [Hz]
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
HighResolutionFeatures( hpcp [, maxPeaks ] ) → {object}
-
Description
This algorithm computes high-resolution chroma features from an HPCP vector. The vector's size must be a multiple of 12 and it is recommended that it be larger than 120. In other words, the HPCP's resolution should be 10 cents or finer. The high-resolution features being computed are: Check https://essentia.upf.edu/reference/std_HighResolutionFeatures.html for more details.
Parameters
Name Type Attributes Default Description hpcpVectorFloat the HPCPs, preferably of size >= 120
maxPeaksnumber <optional> 24 maximum number of HPCP peaks to consider when calculating outputs
Returns
Details
-
Histogram( array [, maxValue [, minValue [, normalize [, numberBins ] ] ] ] ) → {object}
-
Description
This algorithm computes a histogram. Values outside the range are ignored. Check https://essentia.upf.edu/reference/std_Histogram.html for more details.
Parameters
Name Type Attributes Default Description arrayVectorFloat the input array
maxValuenumber <optional> 1 the max value of the histogram
minValuenumber <optional> 0 the min value of the histogram
normalizestring <optional> none the normalization setting.
numberBinsnumber <optional> 10 the number of bins
Returns
Details
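As a rough illustration of the binning described above, here is a plain JavaScript sketch. It is not the essentia.js implementation: the function name `histogramSketch`, the options object, and the handling of the upper edge are assumptions, and the `normalize` setting is left out.

```javascript
// Illustrative sketch only; not the essentia.js Histogram implementation.
// Counts how many values fall into each of `numberBins` equal-width bins
// spanning [minValue, maxValue].
function histogramSketch(array, { minValue = 0, maxValue = 1, numberBins = 10 } = {}) {
  const counts = new Array(numberBins).fill(0);
  const binWidth = (maxValue - minValue) / numberBins;
  for (const x of array) {
    // Values outside the range are ignored, as the description states.
    if (x < minValue || x > maxValue) continue;
    // Assumption: a value equal to maxValue is counted in the last bin.
    const bin = Math.min(Math.floor((x - minValue) / binWidth), numberBins - 1);
    counts[bin] += 1;
  }
  return counts;
}
```

For example, `histogramSketch([0.1, 0.3, 0.6, 0.9, 1.0, 1.5, -0.2], { numberBins: 4 })` counts five values and skips the two that lie outside 0 to 1.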
-
HprModelAnal( frame, pitch [, fftSize [, freqDevOffset [, freqDevSlope [, harmDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, nHarmonics [, orderBy [, sampleRate [, stocf ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the harmonic plus residual model analysis. Check https://essentia.upf.edu/reference/std_HprModelAnal.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input frame
pitchnumber external pitch input [Hz].
fftSizenumber <optional> 2048 the size of the internal FFT (full spectrum size)
freqDevOffsetnumber <optional> 20 minimum frequency deviation at 0Hz
freqDevSlopenumber <optional> 0.01 slope increase of minimum frequency deviation
harmDevSlopenumber <optional> 0.01 slope increase of minimum frequency deviation
hopSizenumber <optional> 512 the hop size between frames
magnitudeThresholdnumber <optional> 0 peaks below this given threshold are not outputted
maxFrequencynumber <optional> 5000 the maximum frequency of the range to evaluate [Hz]
maxPeaksnumber <optional> 100 the maximum number of returned peaks
maxnSinesnumber <optional> 100 maximum number of sines per frame
minFrequencynumber <optional> 20 the minimum frequency of the range to evaluate [Hz]
nHarmonicsnumber <optional> 100 maximum number of harmonics per frame
orderBystring <optional> frequency the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
stocfnumber <optional> 0.2 decimation factor used for the stochastic approximation
Returns
Details
-
HpsModelAnal( frame, pitch [, fftSize [, freqDevOffset [, freqDevSlope [, harmDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, nHarmonics [, orderBy [, sampleRate [, stocf ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the harmonic plus stochastic model analysis. Check https://essentia.upf.edu/reference/std_HpsModelAnal.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input frame
pitchnumber external pitch input [Hz].
fftSizenumber <optional> 2048 the size of the internal FFT (full spectrum size)
freqDevOffsetnumber <optional> 20 minimum frequency deviation at 0Hz
freqDevSlopenumber <optional> 0.01 slope increase of minimum frequency deviation
harmDevSlopenumber <optional> 0.01 slope increase of minimum frequency deviation
hopSizenumber <optional> 512 the hop size between frames
magnitudeThresholdnumber <optional> 0 peaks below this given threshold are not outputted
maxFrequencynumber <optional> 5000 the maximum frequency of the range to evaluate [Hz]
maxPeaksnumber <optional> 100 the maximum number of returned peaks
maxnSinesnumber <optional> 100 maximum number of sines per frame
minFrequencynumber <optional> 20 the minimum frequency of the range to evaluate [Hz]
nHarmonicsnumber <optional> 100 maximum number of harmonics per frame
orderBystring <optional> frequency the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
stocfnumber <optional> 0.2 decimation factor used for the stochastic approximation
Returns
Details
-
IDCT( dct [, dctType [, inputSize [, liftering [, outputSize ] ] ] ] ) → {object}
-
Description
This algorithm computes the Inverse Discrete Cosine Transform of an array. It can be configured to perform the inverse DCT-II form, with the 1/sqrt(2) scaling factor for the first coefficient or the inverse DCT-III form based on the HTK implementation. Check https://essentia.upf.edu/reference/std_IDCT.html for more details.
Parameters
Name Type Attributes Default Description dctVectorFloat the discrete cosine transform
dctTypenumber <optional> 2 the DCT type
inputSizenumber <optional> 10 the size of the input array
lifteringnumber <optional> 0 the liftering coefficient. Use '0' to bypass it
outputSizenumber <optional> 10 the number of output coefficients
Returns
Details
-
IIR( signal [, denominator [, numerator ] ] ) → {object}
-
Description
This algorithm implements a standard IIR filter. It filters the data in the input vector with the filter described by parameter vectors 'numerator' and 'denominator' to create the output filtered vector. In the literature, the numerator is often referred to as the 'B' coefficients and the denominator as the 'A' coefficients. Check https://essentia.upf.edu/reference/std_IIR.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
denominatorArray.<any> <optional> [1] the list of coefficients of the denominator. Often referred to as the A coefficient vector.
numeratorArray.<any> <optional> [1] the list of coefficients of the numerator. Often referred to as the B coefficient vector.
Returns
Details
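To make the roles of the 'A' and 'B' coefficients concrete, the following plain JavaScript sketch applies the textbook difference equation y[n] = (Σ b[k]·x[n−k] − Σ a[k]·y[n−k], k ≥ 1) / a[0]. It is an illustration under stated assumptions, not the essentia.js code: `iirSketch` is a hypothetical name, samples before the start of the signal are treated as zero, and the argument order simply mirrors the signature above (denominator first).

```javascript
// Illustrative sketch only; not the essentia.js IIR implementation.
// denominator = 'A' coefficients, numerator = 'B' coefficients.
function iirSketch(signal, denominator = [1], numerator = [1]) {
  const a0 = denominator[0];
  if (a0 === 0) throw new Error("denominator[0] must be non-zero");
  const out = new Array(signal.length).fill(0);
  for (let n = 0; n < signal.length; n++) {
    let acc = 0;
    // Feed-forward part: weighted sum of current and past inputs.
    for (let k = 0; k < numerator.length && k <= n; k++) {
      acc += numerator[k] * signal[n - k];
    }
    // Feedback part: weighted sum of past outputs.
    for (let k = 1; k < denominator.length && k <= n; k++) {
      acc -= denominator[k] * out[n - k];
    }
    out[n] = acc / a0;
  }
  return out;
}
```

With `denominator = [1, -0.5]` and `numerator = [1]`, an impulse decays by half at every sample, which is the behaviour of a one-pole filter.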
-
Inharmonicity( frequencies, magnitudes ) → {object}
-
Description
This algorithm calculates the inharmonicity of a signal given its spectral peaks. The inharmonicity value is computed as an energy weighted divergence of the spectral components from their closest multiple of the fundamental frequency. The fundamental frequency is taken as the first spectral peak from the input. The inharmonicity value ranges from 0 (purely harmonic signal) to 1 (inharmonic signal). Check https://essentia.upf.edu/reference/std_Inharmonicity.html for more details.
Parameters
Name Type Description frequenciesVectorFloat the frequencies of the harmonic peaks [Hz] (in ascending order)
magnitudesVectorFloat the magnitudes of the harmonic peaks (in frequency ascending order)
Returns
Details
-
InstantPower( array ) → {object}
-
Description
This algorithm computes the instant power of an array. That is, the energy of the array over its size. Check https://essentia.upf.edu/reference/std_InstantPower.html for more details.
Parameters
Name Type Description arrayVectorFloat the input array
Returns
Details
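The definition above (energy of the array divided by its size) can be sketched in plain JavaScript as follows. This is only an illustration of the formula; `instantPowerSketch` is a hypothetical name, and rejecting empty input is a choice made for the sketch.

```javascript
// Illustrative sketch only; not the essentia.js InstantPower implementation.
// Instant power = sum of squared samples divided by the number of samples.
function instantPowerSketch(array) {
  if (array.length === 0) throw new Error("input array must be non-empty");
  let energy = 0;
  for (const x of array) energy += x * x;
  return energy / array.length;
}
```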
-
Intensity( signal [, sampleRate ] ) → {object}
-
Description
This algorithm classifies the input audio signal as either relaxed (-1), moderate (0), or aggressive (1). Check https://essentia.upf.edu/reference/std_Intensity.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input audio signal
sampleRatenumber <optional> 44100 the input audio sampling rate [Hz]
Returns
Details
-
Key( pcp [, numHarmonics [, pcpSize [, profileType [, slope [, useMajMin [, usePolyphony [, useThreeChords ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes a key estimate given a pitch class profile (HPCP). The algorithm was severely adapted and changed from the original implementation for readability and speed. Check https://essentia.upf.edu/reference/std_Key.html for more details.
Parameters
Name Type Attributes Default Description pcpVectorFloat the input pitch class profile
numHarmonicsnumber <optional> 4 number of harmonics that should contribute to the polyphonic profile (1 only considers the fundamental harmonic)
pcpSizenumber <optional> 36 number of array elements used to represent a semitone times 12 (this parameter is only a hint, during computation, the size of the input PCP is used instead)
profileTypestring <optional> bgate the type of polyphonic profile to use for correlation calculation
slopenumber <optional> 0.6 value of the slope of the exponential harmonic contribution to the polyphonic profile
useMajMinboolean <optional> false use a third profile called 'majmin' for ambiguous tracks [4]. Only available for the edma, bgate and braw profiles
usePolyphonyboolean <optional> true enables the use of polyphonic profiles to define key profiles (this includes the contributions from triads as well as pitch harmonics)
useThreeChordsboolean <optional> true consider only the 3 main triad chords of the key (T, D, SD) to build the polyphonic profiles
Returns
Details
-
KeyExtractor( audio [, averageDetuningCorrection [, frameSize [, hopSize [, hpcpSize [, maxFrequency [, maximumSpectralPeaks [, minFrequency [, pcpThreshold [, profileType [, sampleRate [, spectralPeaksThreshold [, tuningFrequency [, weightType [, windowType ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm extracts key/scale for an audio signal. It computes HPCP frames for the input signal and applies key estimation using the Key algorithm. Check https://essentia.upf.edu/reference/std_KeyExtractor.html for more details.
Parameters
Name Type Attributes Default Description audioVectorFloat the audio input signal
averageDetuningCorrectionboolean <optional> true shifts a pcp to the nearest tempered bin
frameSizenumber <optional> 4096 the framesize for computing tonal features
hopSizenumber <optional> 4096 the hopsize for computing tonal features
hpcpSizenumber <optional> 12 the size of the output HPCP (must be a positive nonzero multiple of 12)
maxFrequencynumber <optional> 3500 max frequency to apply whitening to [Hz]
maximumSpectralPeaksnumber <optional> 60 the maximum number of spectral peaks
minFrequencynumber <optional> 25 min frequency to apply whitening to [Hz]
pcpThresholdnumber <optional> 0.2 pcp bins below this value are set to 0
profileTypestring <optional> bgate the type of polyphonic profile to use for correlation calculation
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
spectralPeaksThresholdnumber <optional> 0.0001 the threshold for the spectral peaks
tuningFrequencynumber <optional> 440 the tuning frequency of the input signal
weightTypestring <optional> cosine type of weighting function for determining frequency contribution
windowTypestring <optional> hann the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
Returns
Details
-
LPC( frame [, order [, sampleRate [, type ] ] ] ) → {object}
-
Description
This algorithm computes Linear Predictive Coefficients and associated reflection coefficients of a signal. Check https://essentia.upf.edu/reference/std_LPC.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input audio frame
ordernumber <optional> 10 the order of the LPC analysis (typically [8,14])
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
typestring <optional> regular the type of LPC (regular or warped)
Returns
Details
-
Larm( signal [, attackTime [, power [, releaseTime [, sampleRate ] ] ] ] ) → {object}
-
Description
This algorithm estimates the long-term loudness of an audio signal. The LARM model is based on the asymmetrical low-pass filtering of the Peak Program Meter (PPM), combined with Revised Low-frequency B-weighting (RLB) and power mean calculations. LARM has been shown to be a reliable and objective loudness estimate of music and speech. Check https://essentia.upf.edu/reference/std_Larm.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the audio input signal
attackTimenumber <optional> 10 the attack time of the first order lowpass in the attack phase [ms]
powernumber <optional> 1.5 the power used for averaging
releaseTimenumber <optional> 1500 the release time of the first order lowpass in the release phase [ms]
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
Leq( signal ) → {object}
-
Description
This algorithm computes the Equivalent sound level (Leq) of an audio signal. The Leq measure can be derived from the Revised Low-frequency B-weighting (RLB) or from the raw signal as described in [1]. If the signal contains no energy, Leq defaults to Essentia's definition of silence, which is -90dB. This algorithm will throw an exception on empty input. Check https://essentia.upf.edu/reference/std_Leq.html for more details.
Parameters
Name Type Description signalVectorFloat the input signal (must be non-empty)
Returns
Details
-
LevelExtractor( signal [, frameSize [, hopSize ] ] ) → {object}
-
Description
This algorithm extracts the loudness of an audio signal in frames using Loudness algorithm. Check https://essentia.upf.edu/reference/std_LevelExtractor.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the audio input signal
frameSizenumber <optional> 88200 frame size to compute loudness
hopSizenumber <optional> 44100 hop size to compute loudness
Returns
Details
-
LogAttackTime( signal [, sampleRate [, startAttackThreshold [, stopAttackThreshold ] ] ] ) → {object}
-
Description
This algorithm computes the log (base 10) of the attack time of a signal envelope. The attack time is defined as the time duration from when the sound becomes perceptually audible to when it reaches its maximum intensity. By default, the start of the attack is estimated as the point where the signal envelope reaches 20% of its maximum value in order to account for possible noise presence. Also by default, the end of the attack is estimated as the point where the signal envelope has reached 90% of its maximum value, in order to account for the possibility that the max value occurs after the logAttack, as in trumpet sounds. Check https://essentia.upf.edu/reference/std_LogAttackTime.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal envelope (must be non-empty)
sampleRatenumber <optional> 44100 the audio sampling rate [Hz]
startAttackThresholdnumber <optional> 0.2 the percentage of the input signal envelope at which the starting point of the attack is considered
stopAttackThresholdnumber <optional> 0.9 the percentage of the input signal envelope at which the ending point of the attack is considered
Returns
Details
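The thresholding described above can be sketched in plain JavaScript as follows. This is a simplified reading of the description, not the essentia.js implementation: `logAttackTimeSketch` and its options object are hypothetical, the first sample crossing each threshold is used, and the `1e-4` floor that guards against taking the logarithm of zero is an assumption of the sketch.

```javascript
// Illustrative sketch only; not the essentia.js LogAttackTime implementation.
function logAttackTimeSketch(
  envelope,
  { sampleRate = 44100, startAttackThreshold = 0.2, stopAttackThreshold = 0.9 } = {}
) {
  if (envelope.length === 0) throw new Error("envelope must be non-empty");
  // Maximum value of the envelope.
  let max = -Infinity;
  for (const v of envelope) if (v > max) max = v;
  // First sample at or above each fraction of the maximum.
  const startIndex = envelope.findIndex((v) => v >= max * startAttackThreshold);
  const stopIndex = envelope.findIndex((v) => v >= max * stopAttackThreshold);
  // Attack duration in seconds.
  const attackTime = (stopIndex - startIndex) / sampleRate;
  // Assumption: clamp very short attacks so that log10 stays finite.
  return Math.log10(Math.max(attackTime, 1e-4));
}
```

For an envelope of `[0, 0.5, 1]` sampled at 1 Hz, the attack starts at index 1 and stops at index 2, so the attack time is one second and its base-10 logarithm is 0.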
-
LogSpectrum( spectrum [, binsPerSemitone [, frameSize [, rollOn [, sampleRate ] ] ] ] ) → {object}
-
Description
This algorithm computes a spectrum with logarithmically distributed frequency bins. This code is ported from NNLS Chroma [1, 2]. This algorithm also returns a local tuning that is retrieved for the input frame and a global tuning that is updated with a moving average. Check https://essentia.upf.edu/reference/std_LogSpectrum.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat spectrum frame
binsPerSemitonenumber <optional> 3 bins per semitone
frameSizenumber <optional> 1025 the input frame size of the spectrum vector
rollOnnumber <optional> 0 this removes low-frequency noise - useful in quiet recordings
sampleRatenumber <optional> 44100 the input sample rate
Returns
Details
-
LoopBpmConfidence( signal, bpmEstimate [, sampleRate ] ) → {object}
-
Description
This algorithm takes an audio signal and a BPM estimate for that signal and predicts the reliability of the BPM estimate in a value from 0 to 1. The audio signal is assumed to be a musical loop with constant tempo. The confidence returned is based on comparing the duration of the signal with multiples of the BPM estimate (see [1] for more details). Check https://essentia.upf.edu/reference/std_LoopBpmConfidence.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat loop audio signal
bpmEstimatenumber estimated BPM for the audio signal
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
LoopBpmEstimator( signal [, confidenceThreshold ] ) → {object}
-
Description
This algorithm estimates the BPM of audio loops. It internally uses the PercivalBpmEstimator algorithm to produce a BPM estimate and LoopBpmConfidence to assess the reliability of the estimate. If the confidence of the estimate is below the given confidenceThreshold, the algorithm outputs a BPM of 0.0; otherwise it outputs the estimated BPM. For more details on the BPM estimation method and the confidence measure, please check the algorithms used. Check https://essentia.upf.edu/reference/std_LoopBpmEstimator.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
confidenceThresholdnumber <optional> 0.95 confidence threshold below which bpm estimate will be considered unreliable
Returns
Details
-
Loudness( signal ) → {object}
-
Description
This algorithm computes the loudness of an audio signal defined by Stevens' power law. It computes loudness as the energy of the signal raised to the power of 0.67. Check https://essentia.upf.edu/reference/std_Loudness.html for more details.
Parameters
Name Type Description signalVectorFloat the input signal
Returns
Details
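The formula in the description (signal energy raised to the power of 0.67) can be written in plain JavaScript as below. This is a sketch of the stated formula rather than the essentia.js code; `loudnessSketch` is a hypothetical name, and taking energy as the plain sum of squared samples is an assumption.

```javascript
// Illustrative sketch only; not the essentia.js Loudness implementation.
// Assumption: energy is the sum of squared samples.
function loudnessSketch(signal) {
  let energy = 0;
  for (const x of signal) energy += x * x;
  return Math.pow(energy, 0.67);
}
```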
-
LoudnessVickers( signal [, sampleRate ] ) → {object}
-
Description
This algorithm computes Vickers's loudness of an audio signal. Currently, this algorithm only works for signals with a 44100Hz sampling rate. This algorithm is meant to be given frames of audio as input (not entire audio signals). The algorithm described in the paper performs a weighted average of the loudness value computed for each of the given frames, this step is left as a post processing step and is not performed by this algorithm. Check https://essentia.upf.edu/reference/std_LoudnessVickers.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
sampleRatenumber <optional> 44100 the audio sampling rate of the input signal which is used to create the weight vector [Hz] (currently, this algorithm only works on signals with a sampling rate of 44100Hz)
Returns
Details
-
LowLevelSpectralEqloudExtractor( signal [, frameSize [, hopSize [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm extracts a set of low-level spectral features for which it is recommended to apply a preliminary equal-loudness filter over an input audio signal (according to the internal evaluations conducted at Music Technology Group). To this end, you are expected to provide the output of the EqualLoudness algorithm as an input for this algorithm. Still, you are free to provide an unprocessed audio input if you want to compute these features without an equal-loudness filter. Check https://essentia.upf.edu/reference/std_LowLevelSpectralEqloudExtractor.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input audio signal
frameSizenumber <optional> 2048 the frame size for computing low level features
hopSizenumber <optional> 1024 the hop size for computing low level features
sampleRatenumber <optional> 44100 the audio sampling rate
Returns
Details
-
LowLevelSpectralExtractor( signal [, frameSize [, hopSize [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm extracts all low-level spectral features, which do not require an equal-loudness filter for their computation, from an audio signal. Check https://essentia.upf.edu/reference/std_LowLevelSpectralExtractor.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the audio input signal
frameSizenumber <optional> 2048 the frame size for computing low level features
hopSizenumber <optional> 1024 the hop size for computing low level features
sampleRatenumber <optional> 44100 the audio sampling rate
Returns
Details
-
LowPass( signal [, cutoffFrequency [, sampleRate ] ] ) → {object}
-
Description
This algorithm implements a 1st order IIR low-pass filter. Because of its dependence on IIR, IIR's requirements are inherited. References: [1] U. Zölzer, DAFX - Digital Audio Effects, p. 40, John Wiley & Sons, 2002 Check https://essentia.upf.edu/reference/std_LowPass.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input audio signal
cutoffFrequencynumber <optional> 1500 the cutoff frequency for the filter [Hz]
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
MFCC( spectrum [, dctType [, highFrequencyBound [, inputSize [, liftering [, logType [, lowFrequencyBound [, normalize [, numberBands [, numberCoefficients [, sampleRate [, silenceThreshold [, type [, warpingFormula [, weighting ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the mel-frequency cepstrum coefficients of a spectrum. As there is no standard implementation, the MFCC-FB40 is used by default:
- filterbank of 40 bands from 0 to 11000Hz
- take the log value of the spectrum energy in each mel band. Bands energy values below silence threshold will be clipped to its value before computing log-energies
- DCT of the 40 bands down to 13 mel coefficients
There is a paper describing various MFCC implementations [1]. Check https://essentia.upf.edu/reference/std_MFCC.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the audio spectrum
dctTypenumber <optional> 2 the DCT type
highFrequencyBoundnumber <optional> 11000 the upper bound of the frequency range [Hz]
inputSizenumber <optional> 1025 the size of input spectrum
lifteringnumber <optional> 0 the liftering coefficient. Use '0' to bypass it
logTypestring <optional> dbamp logarithmic compression type. Use 'dbpow' if working with power and 'dbamp' if working with magnitudes
lowFrequencyBoundnumber <optional> 0 the lower bound of the frequency range [Hz]
normalizestring <optional> unit_sum spectrum bin weights to use for each mel band: 'unit_max' to make each mel band vertex equal to 1, 'unit_sum' to make each mel band area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle mel band area equal to 1 normalizing the weights of each triangle by its bandwidth
numberBandsnumber <optional> 40 the number of mel-bands in the filter
numberCoefficientsnumber <optional> 13 the number of output mel coefficients
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
silenceThresholdnumber <optional> 1e-10 silence threshold for computing log-energy bands
typestring <optional> power use magnitude or power spectrum
warpingFormulastring <optional> htkMel The scale implementation type: 'htkMel' scale from the HTK toolkit [2, 3] (default) or 'slaneyMel' scale from the Auditory toolbox [4]
weightingstring <optional> warping type of weighting function for determining triangle area
Returns
Details
-
MaxFilter( signal [, causal [, width ] ] ) → {object}
-
Description
This algorithm implements a maximum filter for 1d signal using van Herk/Gil-Werman (HGW) algorithm. Check https://essentia.upf.edu/reference/std_MaxFilter.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat signal to be filtered
causalboolean <optional> true use causal filter (window is behind current element; otherwise it is centered around it)
widthnumber <optional> 3 the window size, has to be odd if the window is centered
Returns
Details
-
MaxMagFreq( spectrum [, sampleRate ] ) → {object}
-
Description
This algorithm computes the frequency with the largest magnitude in a spectrum. Note that a spectrum must contain at least two elements, otherwise an exception is thrown. Check https://essentia.upf.edu/reference/std_MaxMagFreq.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input spectrum (must have more than 1 element)
sampleRatenumber <optional> 44100 the audio sampling rate [Hz]
Returns
Details
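A minimal plain JavaScript sketch of the idea follows. It is not the essentia.js code: `maxMagFreqSketch` is a hypothetical name, and the bin-to-Hz mapping assumes the spectrum bins are spread linearly from 0 Hz (first bin) to `sampleRate / 2` (last bin).

```javascript
// Illustrative sketch only; not the essentia.js MaxMagFreq implementation.
function maxMagFreqSketch(spectrum, sampleRate = 44100) {
  if (spectrum.length < 2) {
    throw new Error("spectrum must have more than 1 element");
  }
  // Index of the first bin holding the largest magnitude.
  let maxBin = 0;
  for (let i = 1; i < spectrum.length; i++) {
    if (spectrum[i] > spectrum[maxBin]) maxBin = i;
  }
  // Assumption: bin 0 is 0 Hz and the last bin is the Nyquist frequency.
  return (maxBin * (sampleRate / 2)) / (spectrum.length - 1);
}
```

Under that assumption, a three-bin spectrum with its peak in the middle bin maps to a quarter of the sampling rate.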
-
MaxToTotal( envelope ) → {object}
-
Description
This algorithm computes the ratio between the index of the maximum value of the envelope of a signal and the total length of the envelope. This ratio shows how much the maximum amplitude is off-center. Its value is close to 0 if the maximum is close to the beginning (e.g. Decrescendo or Impulsive sounds), close to 0.5 if it is close to the middle (e.g. Delta sounds) and close to 1 if it is close to the end of the sound (e.g. Crescendo sounds). This algorithm is intended to be fed by the output of the Envelope algorithm. Check https://essentia.upf.edu/reference/std_MaxToTotal.html for more details.
Parameters
Name Type Description envelopeVectorFloat the envelope of the signal
Returns
Details
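The ratio described above can be sketched in plain JavaScript as the index of the maximum divided by the envelope length. This is an illustration only; `maxToTotalSketch` is a hypothetical name, and using the first occurrence of the maximum and rejecting empty input are choices made for the sketch.

```javascript
// Illustrative sketch only; not the essentia.js MaxToTotal implementation.
function maxToTotalSketch(envelope) {
  if (envelope.length === 0) throw new Error("envelope must be non-empty");
  // Index of the first occurrence of the maximum value.
  let maxIndex = 0;
  for (let i = 1; i < envelope.length; i++) {
    if (envelope[i] > envelope[maxIndex]) maxIndex = i;
  }
  return maxIndex / envelope.length;
}
```

An envelope that peaks at its first sample gives 0, and one that peaks halfway through gives 0.5.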
-
Mean( array ) → {object}
-
Description
This algorithm computes the mean of an array. Check https://essentia.upf.edu/reference/std_Mean.html for more details.
Parameters
Name Type Description arrayVectorFloat the input array
Returns
Details
-
Median( array ) → {object}
-
Description
This algorithm computes the median of an array. When there is an odd number of numbers, the median is simply the middle number. For example, the median of 2, 4, and 7 is 4. When there is an even number of numbers, the median is the mean of the two middle numbers. Thus, the median of the numbers 2, 4, 7, 12 is (4+7)/2 = 5.5. See [1] for more info. Check https://essentia.upf.edu/reference/std_Median.html for more details.
Parameters
Name Type Description arrayVectorFloat the input array (must be non-empty)
Returns
Details
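The two worked examples from the description can be reproduced with a short plain JavaScript sketch. It illustrates the definition only and is not the essentia.js code; `medianSketch` is a hypothetical name.

```javascript
// Illustrative sketch only; not the essentia.js Median implementation.
function medianSketch(array) {
  if (array.length === 0) throw new Error("input array must be non-empty");
  // Sort a copy numerically so the caller's array is left untouched.
  const sorted = Array.from(array).sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  // Odd length: the middle element. Even length: mean of the two middle elements.
  return sorted.length % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
}
```

`medianSketch([2, 4, 7])` returns 4 and `medianSketch([2, 4, 7, 12])` returns 5.5, matching the examples in the description.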
-
MedianFilter( array [, kernelSize ] ) → {object}
-
Description
This algorithm computes the median filtered version of the input signal giving the kernel size as detailed in [1]. Check https://essentia.upf.edu/reference/std_MedianFilter.html for more details.
Parameters
Name Type Attributes Default Description arrayVectorFloat the input array (must be non-empty)
kernelSizenumber <optional> 11 scalar giving the size of the median filter window. Must be odd
Returns
Details
-
MelBands( spectrum [, highFrequencyBound [, inputSize [, log [, lowFrequencyBound [, normalize [, numberBands [, sampleRate [, type [, warpingFormula [, weighting ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes energy in mel bands of a spectrum. It applies a frequency-domain filterbank (MFCC FB-40, [1]), which consists of equal area triangular filters spaced according to the mel scale. The filterbank is normalized in such a way that the sum of coefficients for every filter equals one. It is recommended that the input "spectrum" be calculated by the Spectrum algorithm. Check https://essentia.upf.edu/reference/std_MelBands.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the audio spectrum
highFrequencyBoundnumber <optional> 22050 an upper-bound limit for the frequencies to be included in the bands
inputSizenumber <optional> 1025 the size of the spectrum
logboolean <optional> false compute log-energies (log10 (1 + energy))
lowFrequencyBoundnumber <optional> 0 a lower-bound limit for the frequencies to be included in the bands
normalizestring <optional> unit_sum spectrum bin weights to use for each mel band: 'unit_max' to make each mel band vertex equal to 1, 'unit_sum' to make each mel band area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle mel band area equal to 1 normalizing the weights of each triangle by its bandwidth
numberBandsnumber <optional> 24 the number of output bands
sampleRatenumber <optional> 44100 the sample rate
typestring <optional> power 'power' to output squared units, 'magnitude' to keep it as the input
warpingFormulastring <optional> htkMel The scale implementation type: 'htkMel' scale from the HTK toolkit [2, 3] (default) or 'slaneyMel' scale from the Auditory toolbox [4]
weightingstring <optional> warping type of weighting function for determining triangle area
Returns
Details
-
Meter( beatogram ) → {object}
-
Description
This algorithm estimates the time signature of a given beatogram by finding the highest correlation between beats. Check https://essentia.upf.edu/reference/std_Meter.html for more details.
Parameters
Name Type Description beatogramVectorVectorFloat filtered matrix loudness
Returns
Details
-
MinMax( array [, type ] ) → {object}
-
Description
This algorithm calculates the minimum or maximum value of an array. If the array has more than one minimum or maximum value, the index of the first one is returned. Check https://essentia.upf.edu/reference/std_MinMax.html for more details.
Parameters
Name Type Attributes Default Description arrayVectorFloat the input array
typestring <optional> min the type of the operation
Returns
Details
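The first-occurrence rule mentioned in the description can be sketched in plain JavaScript as follows. This is an illustration, not the essentia.js implementation: `minMaxSketch` and the `{ value, index }` return shape are hypothetical and do not describe the fields of the object returned by `MinMax`.

```javascript
// Illustrative sketch only; not the essentia.js MinMax implementation.
function minMaxSketch(array, type = "min") {
  if (array.length === 0) throw new Error("input array must be non-empty");
  let index = 0;
  for (let i = 1; i < array.length; i++) {
    // Strict comparison keeps the first occurrence when values tie.
    const better = type === "max" ? array[i] > array[index] : array[i] < array[index];
    if (better) index = i;
  }
  return { value: array[index], index };
}
```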
-
MinToTotal( envelope ) → {object}
-
Description
This algorithm computes the ratio between the index of the minimum value of the envelope of a signal and the total length of the envelope. Check https://essentia.upf.edu/reference/std_MinToTotal.html for more details.
Parameters
Name Type Description envelopeVectorFloat the envelope of the signal
Returns
Details
-
MovingAverage( signal [, size ] ) → {object}
-
Description
This algorithm implements a FIR Moving Average filter. Because of its dependence on IIR, IIR's requirements are inherited. Check https://essentia.upf.edu/reference/std_MovingAverage.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input audio signal
sizenumber <optional> 6 the size of the window [audio samples]
Returns
Details
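A causal FIR moving average of this kind can be sketched in plain JavaScript with a running sum. This is an illustration under stated assumptions, not the essentia.js code: `movingAverageSketch` is a hypothetical name, and samples before the start of the signal are treated as zero, so the first `size - 1` outputs ramp up.

```javascript
// Illustrative sketch only; not the essentia.js MovingAverage implementation.
// y[n] = (x[n] + x[n-1] + ... + x[n-size+1]) / size
function movingAverageSketch(signal, size = 6) {
  const out = new Array(signal.length).fill(0);
  let windowSum = 0;
  for (let n = 0; n < signal.length; n++) {
    windowSum += signal[n];
    // Drop the sample that has just left the window.
    if (n >= size) windowSum -= signal[n - size];
    out[n] = windowSum / size;
  }
  return out;
}
```

For a constant input of 3 and `size = 3`, the output ramps through 1 and 2 before settling at 3.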
-
MultiPitchKlapuri( signal [, binResolution [, frameSize [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minFrequency [, numberHarmonics [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates multiple pitch values corresponding to the melodic lines present in a polyphonic music signal (for example, string quartet, piano). This implementation is based on the algorithm in [1]: In each frame, a set of possible fundamental frequency candidates is extracted based on the principle of harmonic summation. In an optimization stage, the number of harmonic sources (polyphony) is estimated and the final set of fundamental frequencies determined. In contrast to the pitch salience function proposed in [2], this implementation uses the pitch salience function described in [1]. The output is a vector for each frame containing the estimated melody pitch values. Check https://essentia.upf.edu/reference/std_MultiPitchKlapuri.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
binResolutionnumber <optional> 10 salience function bin resolution [cents]
frameSizenumber <optional> 2048 the frame size for computing pitch salience
harmonicWeightnumber <optional> 0.8 harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
hopSizenumber <optional> 128 the hop size with which the pitch salience function was computed
magnitudeCompressionnumber <optional> 1 magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)
magnitudeThresholdnumber <optional> 40 spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
maxFrequencynumber <optional> 1760 the maximum allowed frequency for salience function peaks (ignore peaks above) [Hz]
minFrequencynumber <optional> 80 the minimum allowed frequency for salience function peaks (ignore peaks below) [Hz]
numberHarmonicsnumber <optional> 10 number of considered harmonics
referenceFrequencynumber <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
MultiPitchMelodia( signal [, binResolution [, filterIterations [, frameSize [, guessUnvoiced [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minDuration [, minFrequency [, numberHarmonics [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, referenceFrequency [, sampleRate [, timeContinuity ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates multiple fundamental frequency contours from an audio signal. It is a multi pitch version of the MELODIA algorithm described in [1]. While the algorithm is originally designed to extract melody in polyphonic music, this implementation is adapted for multiple sources. The approach is based on the creation and characterization of pitch contours, time continuous sequences of pitch candidates grouped using auditory streaming cues. To this end, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours, and PitchContoursMonoMelody algorithms are employed. It is strongly advised to use the default parameter values which are optimized according to [1] (where further details are provided) except for minFrequency, maxFrequency, and voicingTolerance, which will depend on your application. Check https://essentia.upf.edu/reference/std_MultiPitchMelodia.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
binResolutionnumber <optional> 10 salience function bin resolution [cents]
filterIterationsnumber <optional> 3 number of iterations for the octave errors / pitch outlier filtering process
frameSizenumber <optional> 2048 the frame size for computing pitch salience
guessUnvoicedboolean <optional> false estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
harmonicWeightnumber <optional> 0.8 harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
hopSizenumber <optional> 128 the hop size with which the pitch salience function was computed
magnitudeCompressionnumber <optional> 1 magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)
magnitudeThresholdnumber <optional> 40 spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
maxFrequencynumber <optional> 20000 the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minDurationnumber <optional> 100 the minimum allowed contour duration [ms]
minFrequencynumber <optional> 40 the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
numberHarmonicsnumber <optional> 20 number of considered harmonics
peakDistributionThresholdnumber <optional> 0.9 allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)
peakFrameThresholdnumber <optional> 0.9 per-frame salience threshold factor (fraction of the highest peak salience in a frame)
pitchContinuitynumber <optional> 27.5625 pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]
referenceFrequencynumber <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
timeContinuitynumber <optional> 100 time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]
Returns
Details
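Example
The referenceFrequency and binResolution parameters together define the cent-bin axis of the salience function. The following is a minimal sketch of that mapping, assuming bin 0 sits exactly at referenceFrequency and that frequencies are rounded to the nearest bin. The helper names hzToCents and hzToCentBin are illustrative and are not part of essentia.js; the library's exact rounding may differ.

```javascript
// Distance in cents between a frequency and the reference frequency.
function hzToCents(frequencyHz, referenceFrequency = 55) {
  return 1200 * Math.log2(frequencyHz / referenceFrequency);
}

// Index of the salience bin for a frequency (nearest-bin rounding is an assumption).
function hzToCentBin(frequencyHz, referenceFrequency = 55, binResolution = 10) {
  return Math.round(hzToCents(frequencyHz, referenceFrequency) / binResolution);
}

console.log(hzToCentBin(110)); // one octave above the 55 Hz reference
console.log(hzToCentBin(440)); // three octaves above the 55 Hz reference
```

With the defaults, one octave spans 120 bins, so the 40 Hz to 20000 Hz search range covers several hundred bins.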
-
Multiplexer( [ numberRealInputs [, numberVectorRealInputs ] ] ) → {object}
-
Description
This algorithm returns a single vector from a given number of real values and/or frames. Frames from different inputs are multiplexed onto a single stream in an alternating fashion. Check https://essentia.upf.edu/reference/std_Multiplexer.html for more details.
Parameters
Name Type Attributes Default Description numberRealInputsnumber <optional> 0 the number of inputs of type Real to multiplex
numberVectorRealInputsnumber <optional> 0 the number of inputs of type vector<Real> to multiplex
Returns
Details
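Example
To illustrate what multiplexing onto a single stream means, the sketch below merges several per-frame inputs into one vector per frame in plain JavaScript. It assumes that real values are placed before vector frames within each merged frame. The function name multiplexFrames and that ordering are illustrative assumptions, not the essentia.js implementation.

```javascript
// realInputs: array of streams, each an array of numbers (one value per frame).
// vectorInputs: array of streams, each an array of frames (arrays of numbers).
// Returns one merged frame per time index.
function multiplexFrames(realInputs, vectorInputs) {
  const lengths = [...realInputs, ...vectorInputs].map((stream) => stream.length);
  const frameCount = lengths.length ? Math.min(...lengths) : 0;
  const output = [];
  for (let t = 0; t < frameCount; t++) {
    const mergedFrame = [];
    for (const stream of realInputs) mergedFrame.push(stream[t]);
    for (const stream of vectorInputs) mergedFrame.push(...stream[t]);
    output.push(mergedFrame);
  }
  return output;
}

// One real-valued stream and one vector stream, two frames each.
const demoFrames = multiplexFrames([[0.1, 0.2]], [[[1, 2], [3, 4]]]);
console.log(demoFrames);
```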
-
NNLSChroma( logSpectrogram, meanTuning, localTuning [, chromaNormalization [, frameSize [, sampleRate [, spectralShape [, spectralWhitening [, tuningMode [, useNNLS ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm extracts treble and bass chromagrams from a sequence of log-frequency spectrum frames. On this representation, two processing steps are performed: -tuning, after which each centre bin (i.e. bin 2, 5, 8, ...) corresponds to a semitone, even if the tuning of the piece deviates from 440 Hz standard pitch. -running standardisation: subtraction of the running mean, division by the running standard deviation. This has a spectral whitening effect. This code is ported from NNLS Chroma [1, 2]. To achieve similar results follow this processing chain: frame slicing with sample rate = 44100, frame size = 16384, hop size = 2048 -> Windowing with Hann and no normalization -> Spectrum -> LogSpectrum. Check https://essentia.upf.edu/reference/std_NNLSChroma.html for more details.
Parameters
Name Type Attributes Default Description logSpectrogramVectorVectorFloat log spectrum frames
meanTuningVectorFloat mean tuning frames
localTuningVectorFloat local tuning frames
chromaNormalizationstring <optional> none determines whether or how the chromagrams are normalised
frameSizenumber <optional> 1025 the input frame size of the spectrum vector
sampleRatenumber <optional> 44100 the input sample rate
spectralShapenumber <optional> 0.7 the shape of the notes in the NNLS dictionary
spectralWhiteningnumber <optional> 1 determines how much the log-frequency spectrum is whitened
tuningModestring <optional> global local uses a local average for tuning, global uses all audio frames. Local tuning is only advisable when the tuning is likely to change over the audio
useNNLSboolean <optional> true toggle between NNLS approximate transcription and linear spectral mapping
Returns
Details
-
NoiseAdder( signal [, fixSeed [, level ] ] ) → {object}
-
Description
This algorithm adds noise to an input signal. The average energy of the noise in dB is defined by the level parameter, and is generated using the Mersenne Twister random number generator. Check https://essentia.upf.edu/reference/std_NoiseAdder.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
fixSeedboolean <optional> false if true, 0 is used as the seed for generating random values
levelnumber <optional> -100 power level of the noise generator [dB]
Returns
Details
-
NoiseBurstDetector( frame [, alpha [, silenceThreshold [, threshold ] ] ] ) → {object}
-
Description
This algorithm detects noise bursts in the waveform by thresholding the peaks of the second derivative. The threshold is computed using an Exponential Moving Average filter over the RMS of the second derivative of the input frame. Check https://essentia.upf.edu/reference/std_NoiseBurstDetector.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input frame (must be non-empty)
alphanumber <optional> 0.9 alpha coefficient for the Exponential Moving Average threshold estimation.
silenceThresholdnumber <optional> -50 threshold to skip silent frames
thresholdnumber <optional> 8 factor to control the dynamic threshold
Returns
Details
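Example
The thresholding idea in the description can be sketched in plain JavaScript as follows. This is a simplified illustration, not the library's code: it omits the silenceThreshold check, and the function names and the state object that carries the moving average between frames are assumptions made for this example.

```javascript
// Discrete second derivative: d2[j] = x[j + 2] - 2 * x[j + 1] + x[j].
function secondDerivative(frame) {
  const d2 = [];
  for (let j = 0; j + 2 < frame.length; j++) {
    d2.push(frame[j + 2] - 2 * frame[j + 1] + frame[j]);
  }
  return d2;
}

// state.ema carries the moving average across frames (null before the first frame).
function detectNoiseBursts(frame, state, alpha = 0.9, threshold = 8) {
  const d2 = secondDerivative(frame);
  if (d2.length === 0) return [];
  const rms = Math.sqrt(d2.reduce((sum, v) => sum + v * v, 0) / d2.length);
  state.ema = state.ema === null ? rms : alpha * state.ema + (1 - alpha) * rms;
  const limit = threshold * state.ema;
  const indexes = [];
  d2.forEach((v, j) => {
    if (Math.abs(v) > limit) indexes.push(j + 1); // centre sample of the 3-point stencil
  });
  return indexes;
}

const clickFrame = new Array(512).fill(0);
clickFrame[32] = 1; // a single-sample click
console.log(detectNoiseBursts(clickFrame, { ema: null }));
```

A single-sample click produces a large second derivative at the click and at its two neighbours, so all three samples are flagged.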
-
NoveltyCurve( frequencyBands [, frameRate [, normalize [, weightCurve [, weightCurveType ] ] ] ] ) → {object}
-
Description
This algorithm computes the "novelty curve" (Grosche & Müller, 2009) onset detection function. The algorithm expects as an input a frame-wise sequence of frequency-bands energies or spectrum magnitudes as originally proposed in [1] (see FrequencyBands and Spectrum algorithms). Novelty in each band (or frequency bin) is computed as a derivative between log-compressed energy (magnitude) values in consecutive frames. The overall novelty value is then computed as a weighted sum that can be configured using the 'weightCurve' parameter. The resulting novelty curve can be used for beat tracking and onset detection (see BpmHistogram and Onsets). Check https://essentia.upf.edu/reference/std_NoveltyCurve.html for more details.
Parameters
Name Type Attributes Default Description frequencyBandsVectorVectorFloat the frequency bands
frameRatenumber <optional> 344.531 the sampling rate of the input audio
normalizeboolean <optional> false whether to normalize each band's energy
weightCurveArray.<any> <optional> [] vector containing the weights for each frequency band. Only if weightCurveType==supplied
weightCurveTypestring <optional> hybrid the type of weighting to be used for the bands novelty
Returns
Details
-
NoveltyCurveFixedBpmEstimator( novelty [, hopSize [, maxBpm [, minBpm [, sampleRate [, tolerance ] ] ] ] ] ) → {object}
-
Description
This algorithm outputs a histogram of the most probable bpms assuming the signal has constant tempo given the novelty curve. This algorithm is based on the autocorrelation of the novelty curve (see NoveltyCurve algorithm) and should only be used for signals that have a constant tempo or as a first tempo estimator to be used in conjunction with other algorithms such as BpmHistogram. It is a simplified version of the algorithm described in [1] because, in order to predict the best BPM candidate, it computes the autocorrelation of the entire novelty curve instead of analyzing it on frames and histogramming the peaks over frames. Check https://essentia.upf.edu/reference/std_NoveltyCurveFixedBpmEstimator.html for more details.
Parameters
Name Type Attributes Default Description noveltyVectorFloat the novelty curve of the audio signal
hopSizenumber <optional> 512 the hopSize used to compute the novelty curve from the original signal
maxBpmnumber <optional> 560 the maximum bpm to look for
minBpmnumber <optional> 30 the minimum bpm to look for
sampleRatenumber <optional> 44100 the sampling rate of the original audio signal [Hz]
tolerancenumber <optional> 3 tolerance (in percentage) for considering bpms to be equal
Returns
Details
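Example
The link between an autocorrelation lag and a tempo can be sketched as below. The example autocorrelates the whole novelty curve and converts the strongest lag inside the [minBpm, maxBpm] range to BPM. It returns only a single best candidate rather than a histogram, and estimateFixedBpm is a hypothetical helper, not the essentia.js method.

```javascript
function estimateFixedBpm(novelty, { hopSize = 512, sampleRate = 44100, minBpm = 30, maxBpm = 560 } = {}) {
  const frameRate = sampleRate / hopSize; // novelty samples per second
  const minLag = Math.max(1, Math.ceil((60 * frameRate) / maxBpm));
  const maxLag = Math.min(novelty.length - 1, Math.floor((60 * frameRate) / minBpm));
  let bestLag = -1;
  let bestValue = -Infinity;
  for (let lag = minLag; lag <= maxLag; lag++) {
    let value = 0; // unnormalized autocorrelation at this lag
    for (let i = 0; i + lag < novelty.length; i++) value += novelty[i] * novelty[i + lag];
    if (value > bestValue) {
      bestValue = value;
      bestLag = lag;
    }
  }
  return bestLag > 0 ? (60 * frameRate) / bestLag : 0;
}

// Synthetic novelty curve with one pulse every 43 frames (about 120 BPM at the defaults).
const pulseCurve = Array.from({ length: 430 }, (_, i) => (i % 43 === 0 ? 1 : 0));
console.log(estimateFixedBpm(pulseCurve));
```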
-
OddToEvenHarmonicEnergyRatio( frequencies, magnitudes ) → {object}
-
Description
This algorithm computes the ratio between a signal's odd and even harmonic energy given the signal's harmonic peaks. The odd to even harmonic energy ratio is a measure that distinguishes odd-harmonic-energy predominant sounds (such as from a clarinet) from equally important even-harmonic-energy sounds (such as from a trumpet). The required harmonic frequencies and magnitudes can be computed by the HarmonicPeaks algorithm. In the case when the even energy is zero, which may happen when only even harmonics were found or when only one peak was found, the algorithm outputs the maximum real number possible. Therefore, this algorithm should be used in conjunction with the harmonic peaks algorithm. If no peaks are supplied, the algorithm outputs a value of one, assuming either the spectrum was flat or it was silent. Check https://essentia.upf.edu/reference/std_OddToEvenHarmonicEnergyRatio.html for more details.
Parameters
Name Type Description frequenciesVectorFloat the frequencies of the harmonic peaks (at least two frequencies in frequency ascending order)
magnitudesVectorFloat the magnitudes of the harmonic peaks (at least two magnitudes in frequency ascending order)
Returns
Details
-
OnsetDetection( spectrum, phase [, method [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes various onset detection functions. The output of this algorithm should be post-processed in order to determine whether the frame contains an onset or not. Namely, it could be fed to the Onsets algorithm. It is recommended that the input "spectrum" is generated by the Spectrum algorithm. Six methods are available: - 'hfc', the High Frequency Content detection function which accurately detects percussive events (see HFC algorithm for details). - 'complex', the Complex-Domain spectral difference function [1] taking into account changes in magnitude and phase. It emphasizes note onsets either as a result of significant change in energy in the magnitude spectrum, and/or a deviation from the expected phase values in the phase spectrum, caused by a change in pitch. - 'complex_phase', the simplified Complex-Domain spectral difference function [2] taking into account phase changes, weighted by magnitude. It reacts better on tonal sounds such as bowed string, but tends to over-detect percussive events. - 'flux', the Spectral Flux detection function which characterizes changes in magnitude spectrum. See Flux algorithm for details. - 'melflux', the spectral difference function, similar to spectral flux, but using half-rectified energy changes in Mel-frequency bands of the spectrum [3]. - 'rms', the difference function, measuring the half-rectified change of the RMS of the magnitude spectrum (i.e., measuring overall energy flux) [4]. Check https://essentia.upf.edu/reference/std_OnsetDetection.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input spectrum
phaseVectorFloat the phase vector corresponding to this spectrum (used only by the "complex" method)
methodstring <optional> hfc the method used for onset detection
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
OnsetDetectionGlobal( signal [, frameSize [, hopSize [, method [, sampleRate ] ] ] ] ) → {object}
-
Description
This algorithm computes various onset detection functions. Detection values are computed frame-wise given an input signal. The output of this algorithm should be post-processed in order to determine whether the frame contains an onset or not. Namely, it could be fed to the Onsets algorithm. The following methods are available: - 'infogain', the spectral difference measured by the modified information gain [1]. For each frame, it accounts for energy change in between preceding and consecutive frames, histogrammed together, in order to suppress short-term variations on a frame-by-frame basis. - 'beat_emphasis', the beat emphasis function [1]. This function is a linear combination of onset detection functions (complex spectral differences) in a number of sub-bands, weighted by their beat strength computed over the entire input signal. Note: - 'infogain' onset detection has been optimized for the default sampleRate=44100Hz, frameSize=2048, hopSize=512. - 'beat_emphasis' is optimized for a fixed resolution of 11.6ms, which corresponds to the default sampleRate=44100Hz, frameSize=1024, hopSize=512. Optimal performance of beat detection with TempoTapDegara is not guaranteed for other settings. Check https://essentia.upf.edu/reference/std_OnsetDetectionGlobal.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
frameSizenumber <optional> 2048 the frame size for computing onset detection function
hopSizenumber <optional> 512 the hop size for computing onset detection function
methodstring <optional> infogain the method used for onset detection
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
OnsetRate( signal ) → {object}
-
Description
This algorithm computes the number of onsets per second and their position in time for an audio signal. Onset detection functions are computed using both high frequency content and complex-domain methods available in OnsetDetection algorithm. See OnsetDetection for more information. Please note that due to a dependence on the Onsets algorithm, this algorithm is only valid for audio signals with a sampling rate of 44100Hz. This algorithm throws an exception if the input signal is empty. Check https://essentia.upf.edu/reference/std_OnsetRate.html for more details.
Parameters
Name Type Description signalVectorFloat the input signal
Returns
Details
-
OverlapAdd( signal [, frameSize [, gain [, hopSize ] ] ] ) → {object}
-
Description
This algorithm returns the output of an overlap-add process for a sequence of frames of an audio signal. It considers that the input audio frames are windowed audio signals. Given the size of the frame and the hop size, overlapping and adding consecutive frames will produce a continuous signal. A normalization gain can be passed as a parameter. Check https://essentia.upf.edu/reference/std_OverlapAdd.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the windowed input audio frame
frameSizenumber <optional> 2048 the frame size for computing the overlap-add process
gainnumber <optional> 1 the normalization gain that scales the output signal. Useful for IFFT output
hopSizenumber <optional> 128 the hop size with which the overlap-add function is computed
Returns
Details
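Example
The overlap-add process itself can be sketched offline in plain JavaScript. Unlike the method documented here, which receives one windowed frame per call, this illustrative helper (overlapAddFrames, a hypothetical name) takes all frames at once and returns the reconstructed signal.

```javascript
// frames: array of equally sized windowed frames; hopSize: offset between frames in samples.
function overlapAddFrames(frames, hopSize, gain = 1) {
  if (frames.length === 0) return new Float32Array(0);
  const frameSize = frames[0].length;
  const output = new Float32Array((frames.length - 1) * hopSize + frameSize);
  frames.forEach((frame, index) => {
    const offset = index * hopSize;
    for (let i = 0; i < frameSize; i++) output[offset + i] += gain * frame[i];
  });
  return output;
}

// Three frames of ones, frame size 4, hop size 2: the overlapping regions sum to 2.
const ones = [1, 1, 1, 1];
console.log(Array.from(overlapAddFrames([ones, ones, ones], 2)));
```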
-
PeakDetection( array [, interpolate [, maxPeaks [, maxPosition [, minPeakDistance [, minPosition [, orderBy [, range [, threshold ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm detects local maxima (peaks) in an array. The algorithm finds positive slopes and detects a peak when the slope changes sign and the peak is above the threshold. It optionally interpolates using parabolic curve fitting. When two consecutive peaks are closer than the minPeakDistance parameter, the smaller one is discarded. A value of 0 bypasses this feature. Check https://essentia.upf.edu/reference/std_PeakDetection.html for more details.
Parameters
Name Type Attributes Default Description arrayVectorFloat the input array
interpolateboolean <optional> true boolean flag to enable interpolation
maxPeaksnumber <optional> 100 the maximum number of returned peaks
maxPositionnumber <optional> 1 the maximum value of the range to evaluate
minPeakDistancenumber <optional> 0 minimum distance between consecutive peaks (0 to bypass this feature)
minPositionnumber <optional> 0 the minimum value of the range to evaluate
orderBystring <optional> position the ordering type of the output peaks (ascending by position or descending by value)
rangenumber <optional> 1 the input range
thresholdnumber <optional> -1e+06 peaks below this given threshold are not output
Returns
Details
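Example
The slope-change test and the parabolic interpolation mentioned in the description can be sketched as follows. This is a simplified illustration: it ignores maxPeaks, minPeakDistance, minPosition, maxPosition and orderBy, and the mapping of array indexes to positions through range / (length - 1) is an assumption. detectPeaksSketch is a hypothetical name.

```javascript
function detectPeaksSketch(array, { interpolate = true, range = 1, threshold = -1e6 } = {}) {
  const scale = array.length > 1 ? range / (array.length - 1) : 0; // index -> position (assumed)
  const peaks = [];
  for (let i = 1; i < array.length - 1; i++) {
    const left = array[i - 1];
    const centre = array[i];
    const right = array[i + 1];
    const isPeak = centre > left && centre >= right; // slope changes from positive to non-positive
    if (!isPeak || centre < threshold) continue;
    let offset = 0;
    let value = centre;
    const curvature = left - 2 * centre + right;
    if (interpolate && curvature !== 0) {
      offset = (0.5 * (left - right)) / curvature; // vertex of the parabola through the 3 points
      value = centre - 0.25 * (left - right) * offset;
    }
    peaks.push({ position: (i + offset) * scale, value });
  }
  return peaks;
}

console.log(detectPeaksSketch([0, 1, 3, 1, 0]));
```

For a symmetric peak the interpolated position coincides with the sample position; for an asymmetric one it shifts towards the larger neighbour.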
-
PercivalBpmEstimator( signal [, frameSize [, frameSizeOSS [, hopSize [, hopSizeOSS [, maxBPM [, minBPM [, sampleRate ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the tempo in beats per minute (BPM) from an input signal as described in [1]. Check https://essentia.upf.edu/reference/std_PercivalBpmEstimator.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat input signal
frameSizenumber <optional> 1024 frame size for the analysis of the input signal
frameSizeOSSnumber <optional> 2048 frame size for the analysis of the Onset Strength Signal
hopSizenumber <optional> 128 hop size for the analysis of the input signal
hopSizeOSSnumber <optional> 128 hop size for the analysis of the Onset Strength Signal
maxBPMnumber <optional> 210 maximum BPM to detect
minBPMnumber <optional> 50 minimum BPM to detect
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
PercivalEnhanceHarmonics( array ) → {object}
-
Description
This algorithm implements the 'Enhance Harmonics' step as described in [1]. Given an input autocorrelation signal, two time-stretched versions of it (by factors of 2 and 4) are added to the original. In this way, peaks with a harmonic relation are boosted. For more details check the referenced paper. Check https://essentia.upf.edu/reference/std_PercivalEnhanceHarmonics.html for more details.
Parameters
Name Type Description arrayVectorFloat the input signal
Returns
Details
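Example
One way to read the step described above is that the value at lag i is reinforced by the values at lags 2i and 4i, which is only possible for the first quarter of the signal. The sketch below implements that reading. It is an interpretation of the description, and enhanceHarmonicsSketch is a hypothetical name rather than the library's code.

```javascript
function enhanceHarmonicsSketch(autocorrelation) {
  const output = Array.from(autocorrelation);
  const limit = Math.floor(autocorrelation.length / 4); // lags for which 4 * i is still in range
  for (let i = 0; i < limit; i++) {
    output[i] = autocorrelation[i] + autocorrelation[2 * i] + autocorrelation[4 * i];
  }
  return output;
}

console.log(enhanceHarmonicsSketch([0, 1, 2, 3, 4, 5, 6, 7]));
```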
-
PercivalEvaluatePulseTrains( oss, positions ) → {object}
-
Description
This algorithm implements the 'Evaluate Pulse Trains' step as described in [1]. Given an input onset strength signal (OSS) and a number of candidate tempo lag positions, the OSS is correlated with ideal expected pulse trains (for each candidate tempo lag) shifted in time by different amounts. The candidate tempo lag which generates the pulse train that best correlates with the OSS is returned as the preferred tempo candidate. For more details check the referenced paper. Check https://essentia.upf.edu/reference/std_PercivalEvaluatePulseTrains.html for more details.
Parameters
Name Type Description ossVectorFloat onset strength signal (or other novelty curve)
positionsVectorFloat peak positions of BPM candidates
Returns
Details
-
PitchContourSegmentation( pitch, signal [, hopSize [, minDuration [, pitchDistanceThreshold [, rmsThreshold [, sampleRate [, tuningFrequency ] ] ] ] ] ] ) → {object}
-
Description
This algorithm converts a pitch sequence estimated from an audio signal into a set of discrete note events. Each note is defined by its onset time, duration and MIDI pitch value, quantized to the equal tempered scale. Check https://essentia.upf.edu/reference/std_PitchContourSegmentation.html for more details.
Parameters
Name Type Attributes Default Description pitchVectorFloat estimated pitch contour [Hz]
signalVectorFloat input audio signal
hopSizenumber <optional> 128 hop size of the extracted pitch
minDurationnumber <optional> 0.1 minimum note duration [s]
pitchDistanceThresholdnumber <optional> 60 pitch threshold for note segmentation [cents]
rmsThresholdnumber <optional> -2 zscore threshold for note segmentation
sampleRatenumber <optional> 44100 sample rate of the audio signal
tuningFrequencynumber <optional> 440 tuning reference frequency [Hz]
Returns
Details
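Example
The quantization to MIDI pitch relies on the standard relation between frequency and MIDI note number, with note 69 assigned to the tuning frequency. The sketch below shows that relation together with the distance in cents that pitchDistanceThreshold is expressed in. The function names are illustrative and not part of essentia.js.

```javascript
// Fractional MIDI pitch of a frequency; MIDI note 69 corresponds to tuningFrequency.
function hzToMidi(frequencyHz, tuningFrequency = 440) {
  return 69 + 12 * Math.log2(frequencyHz / tuningFrequency);
}

// Nearest note of the equal tempered scale.
function quantizeToMidiNote(frequencyHz, tuningFrequency = 440) {
  return Math.round(hzToMidi(frequencyHz, tuningFrequency));
}

// Signed distance in cents from f1 to f2 (100 cents = 1 semitone).
function centsBetween(f1, f2) {
  return 1200 * Math.log2(f2 / f1);
}

console.log(quantizeToMidiNote(261.63)); // close to middle C
console.log(centsBetween(440, 466.16)); // roughly one semitone
```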
-
PitchContours( peakBins, peakSaliences [, binResolution [, hopSize [, minDuration [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, sampleRate [, timeContinuity ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm tracks a set of predominant pitch contours of an audio signal. This algorithm is intended to receive its "peakBins" and "peakSaliences" inputs from the PitchSalienceFunctionPeaks algorithm outputs aggregated over all frames in the sequence. The output is a vector of estimated melody pitch values. Check https://essentia.upf.edu/reference/std_PitchContours.html for more details.
Parameters
Name Type Attributes Default Description peakBinsVectorVectorFloat frame-wise array of cent bins corresponding to pitch salience function peaks
peakSaliencesVectorVectorFloat frame-wise array of values of salience function peaks
binResolutionnumber <optional> 10 salience function bin resolution [cents]
hopSizenumber <optional> 128 the hop size with which the pitch salience function was computed
minDurationnumber <optional> 100 the minimum allowed contour duration [ms]
peakDistributionThresholdnumber <optional> 0.9 allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)
peakFrameThresholdnumber <optional> 0.9 per-frame salience threshold factor (fraction of the highest peak salience in a frame)
pitchContinuitynumber <optional> 27.5625 pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
timeContinuitynumber <optional> 100 time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]
Returns
Details
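Example
The pitchContinuity and timeContinuity cues are given per millisecond, while contours are tracked frame by frame. The arithmetic below converts them to per-frame limits for a given hopSize and sampleRate. The helper name contourTrackingLimits and the use of Math.floor for the gap length are assumptions for illustration; the library may round differently.

```javascript
function contourTrackingLimits({
  hopSize = 128,
  sampleRate = 44100,
  pitchContinuity = 27.5625, // cents per ms
  timeContinuity = 100, // ms
} = {}) {
  const frameDurationMs = (1000 * hopSize) / sampleRate;
  return {
    frameDurationMs,
    maxPitchJumpCents: pitchContinuity * frameDurationMs, // allowed change between adjacent frames
    maxGapFrames: Math.floor(timeContinuity / frameDurationMs), // allowed gap inside a contour
  };
}

console.log(contourTrackingLimits());
```

With the default values one frame lasts about 2.9 ms, so the default pitchContinuity corresponds to a jump of about 80 cents between adjacent frames.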
-
PitchContoursMelody( contoursBins, contoursSaliences, contoursStartTimes, duration [, binResolution [, filterIterations [, guessUnvoiced [, hopSize [, maxFrequency [, minFrequency [, referenceFrequency [, sampleRate [, voiceVibrato [, voicingTolerance ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm converts a set of pitch contours into a sequence of predominant f0 values in Hz by taking the value of the most predominant contour in each frame. This algorithm is intended to receive its "contoursBins", "contoursSaliences", and "contoursStartTimes" inputs from the PitchContours algorithm. The "duration" input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values and a vector of confidence values. Check https://essentia.upf.edu/reference/std_PitchContoursMelody.html for more details.
Parameters
Name Type Attributes Default Description contoursBinsVectorVectorFloat array of frame-wise vectors of cent bin values representing each contour
contoursSaliencesVectorVectorFloat array of frame-wise vectors of pitch saliences representing each contour
contoursStartTimesVectorFloat array of the start times of each contour [s]
durationnumber time duration of the input signal [s]
binResolutionnumber <optional> 10 salience function bin resolution [cents]
filterIterationsnumber <optional> 3 number of iterations for the octave errors / pitch outlier filtering process
guessUnvoicedboolean <optional> false estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
hopSizenumber <optional> 128 the hop size with which the pitch salience function was computed
maxFrequencynumber <optional> 20000 the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minFrequencynumber <optional> 80 the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
referenceFrequencynumber <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
voiceVibratoboolean <optional> false detect voice vibrato
voicingTolerancenumber <optional> 0.2 allowed deviation below the average contour mean salience of all contours (fraction of the standard deviation)
Returns
Details
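Example
The core idea of taking the most predominant contour in each frame can be sketched as below. This is a heavily simplified illustration: it performs no voicing detection, vibrato analysis, or octave-error filtering, and it treats "most predominant" as "highest salience in that frame". The function names, the use of 0 Hz for frames without a contour, and the bin-to-Hz mapping (bin 0 at referenceFrequency) are assumptions.

```javascript
// Convert a cent bin back to Hz (assumes bin 0 sits at referenceFrequency).
function centBinToHz(bin, referenceFrequency = 55, binResolution = 10) {
  return referenceFrequency * Math.pow(2, (bin * binResolution) / 1200);
}

function pickMostSalientContour(contoursBins, contoursSaliences, contoursStartTimes, duration,
                                { hopSize = 128, sampleRate = 44100 } = {}) {
  const frameRate = sampleRate / hopSize;
  const frameCount = Math.round(duration * frameRate);
  const pitch = new Array(frameCount).fill(0); // 0 Hz marks a frame without any contour
  const bestSalience = new Array(frameCount).fill(-Infinity);
  contoursBins.forEach((bins, c) => {
    const startFrame = Math.round(contoursStartTimes[c] * frameRate);
    bins.forEach((bin, k) => {
      const frame = startFrame + k;
      if (frame < frameCount && contoursSaliences[c][k] > bestSalience[frame]) {
        bestSalience[frame] = contoursSaliences[c][k];
        pitch[frame] = centBinToHz(bin);
      }
    });
  });
  return pitch;
}

// Two overlapping contours on a toy time grid of 100 frames per second.
const toyPitch = pickMostSalientContour(
  [[120, 120, 120], [240, 240]],
  [[1, 1, 1], [2, 0.5]],
  [0, 0.01],
  0.04,
  { hopSize: 1, sampleRate: 100 }
);
console.log(toyPitch);
```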
-
PitchContoursMonoMelody( contoursBins, contoursSaliences, contoursStartTimes, duration [, binResolution [, filterIterations [, guessUnvoiced [, hopSize [, maxFrequency [, minFrequency [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm converts a set of pitch contours into a sequence of f0 values in Hz by taking the value of the most salient contour in each frame. In contrast to PitchContoursMelody, it assumes a single source. This algorithm is intended to receive its "contoursBins", "contoursSaliences", and "contoursStartTimes" inputs from the PitchContours algorithm. The "duration" input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values and a vector of confidence values. Check https://essentia.upf.edu/reference/std_PitchContoursMonoMelody.html for more details.
Parameters
Name Type Attributes Default Description contoursBinsVectorVectorFloat array of frame-wise vectors of cent bin values representing each contour
contoursSaliencesVectorVectorFloat array of frame-wise vectors of pitch saliences representing each contour
contoursStartTimesVectorFloat array of the start times of each contour [s]
durationnumber time duration of the input signal [s]
binResolutionnumber <optional> 10 salience function bin resolution [cents]
filterIterationsnumber <optional> 3 number of iterations for the octave errors / pitch outlier filtering process
guessUnvoicedboolean <optional> false estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
hopSizenumber <optional> 128 the hop size with which the pitch salience function was computed
maxFrequencynumber <optional> 20000 the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minFrequencynumber <optional> 80 the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
referenceFrequencynumber <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
PitchContoursMultiMelody( contoursBins, contoursSaliences, contoursStartTimes, duration [, binResolution [, filterIterations [, guessUnvoiced [, hopSize [, maxFrequency [, minFrequency [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm post-processes a set of pitch contours into a sequence of multiple f0 values in Hz. This algorithm is intended to receive its "contoursBins", "contoursSaliences", and "contoursStartTimes" inputs from the PitchContours algorithm. The "duration" input corresponds to the time duration of the input signal. The output is a vector of estimated pitch values. Check https://essentia.upf.edu/reference/std_PitchContoursMultiMelody.html for more details.
Parameters
Name Type Attributes Default Description contoursBinsVectorVectorFloat array of frame-wise vectors of cent bin values representing each contour
contoursSaliencesVectorVectorFloat array of frame-wise vectors of pitch saliences representing each contour
contoursStartTimesVectorFloat array of the start times of each contour [s]
durationnumber time duration of the input signal [s]
binResolutionnumber <optional> 10 salience function bin resolution [cents]
filterIterationsnumber <optional> 3 number of iterations for the octave errors / pitch outlier filtering process
guessUnvoicedboolean <optional> false estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
hopSizenumber <optional> 128 the hop size with which the pitch salience function was computed
maxFrequencynumber <optional> 20000 the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minFrequencynumber <optional> 80 the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
referenceFrequencynumber <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
PitchFilter( pitch, pitchConfidence [, confidenceThreshold [, minChunkSize [, useAbsolutePitchConfidence ] ] ] ) → {object}
-
Description
This algorithm corrects the fundamental frequency estimations for a sequence of frames given pitch values together with their confidence values. In particular, it removes non-confident parts and spurious jumps in pitch and applies octave corrections. Check https://essentia.upf.edu/reference/std_PitchFilter.html for more details.
Parameters
Name Type Attributes Default Description pitchVectorFloat vector of pitch values for the input frames [Hz]
pitchConfidenceVectorFloat vector of pitch confidence values for the input frames
confidenceThresholdnumber <optional> 36 ratio between the average confidence of the most confident chunk and the minimum allowed average confidence of a chunk
minChunkSizenumber <optional> 30 minimum number of frames in non-zero pitch chunks
useAbsolutePitchConfidenceboolean <optional> false treat negative pitch confidence values as positive (use with melodia guessUnvoiced=True)
Returns
Details
-
PitchMelodia( signal [, binResolution [, filterIterations [, frameSize [, guessUnvoiced [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minDuration [, minFrequency [, numberHarmonics [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, referenceFrequency [, sampleRate [, timeContinuity ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the fundamental frequency corresponding to the melody of a monophonic music signal based on the MELODIA algorithm. While the algorithm is originally designed to extract the predominant melody from polyphonic music [1], this implementation is adapted for monophonic signals. The approach is based on the creation and characterization of pitch contours, time continuous sequences of pitch candidates grouped using auditory streaming cues. To this end, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours, and PitchContoursMonoMelody algorithms are employed. It is strongly advised to use the default parameter values which are optimized according to [1] (where further details are provided) except for minFrequency and maxFrequency, which will depend on your application. Check https://essentia.upf.edu/reference/std_PitchMelodia.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
binResolutionnumber <optional> 10 salience function bin resolution [cents]
filterIterationsnumber <optional> 3 number of iterations for the octave errors / pitch outlier filtering process
frameSizenumber <optional> 2048 the frame size for computing pitch salience
guessUnvoicedboolean <optional> false estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
harmonicWeightnumber <optional> 0.8 harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
hopSizenumber <optional> 128 the hop size with which the pitch salience function was computed
magnitudeCompressionnumber <optional> 1 magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)
magnitudeThresholdnumber <optional> 40 spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
maxFrequencynumber <optional> 20000 the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minDurationnumber <optional> 100 the minimum allowed contour duration [ms]
minFrequencynumber <optional> 40 the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
numberHarmonicsnumber <optional> 20 number of considered harmonics
peakDistributionThresholdnumber <optional> 0.9 allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)
peakFrameThresholdnumber <optional> 0.9 per-frame salience threshold factor (fraction of the highest peak salience in a frame)
pitchContinuitynumber <optional> 27.5625 pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]
referenceFrequencynumber <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
timeContinuitynumber <optional> 100 time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]
Returns
Details
-
PitchSalience( spectrum [, highBoundary [, lowBoundary [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm computes the pitch salience of a spectrum. The pitch salience is given by the ratio of the highest autocorrelation value of the spectrum to the non-shifted autocorrelation value. Pitch salience was designed as a quick measure of tone sensation. Unpitched sounds (non-musical sound effects) and pure tones have an average pitch salience value close to 0, whereas sounds containing several harmonics in the spectrum tend to have a higher value. Check https://essentia.upf.edu/reference/std_PitchSalience.html for more details.
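As a rough illustration of the ratio described above, the following plain JavaScript sketch computes a brute-force autocorrelation of a magnitude spectrum. The helper name `pitchSalienceSketch` and the way the two boundaries are mapped to lag bins are assumptions made for this example; it is not the essentia.js implementation.

```javascript
// Sketch only: ratio of the highest shifted autocorrelation value of a
// spectrum to its non-shifted (lag 0) autocorrelation value.
function pitchSalienceSketch(spectrum, highBoundary = 5000, lowBoundary = 100, sampleRate = 44100) {
  const n = spectrum.length;
  // Assumption: bins are spread linearly from 0 Hz to sampleRate / 2.
  const binHz = sampleRate / 2 / (n - 1);
  const lowLag = Math.max(1, Math.round(lowBoundary / binHz));
  const highLag = Math.min(n - 1, Math.round(highBoundary / binHz));

  const autocorr = (lag) => {
    let sum = 0;
    for (let i = 0; i + lag < n; i++) sum += spectrum[i] * spectrum[i + lag];
    return sum;
  };

  const nonShifted = autocorr(0);
  if (nonShifted === 0) return 0; // silent spectrum
  let highest = 0;
  for (let lag = lowLag; lag <= highLag; lag++) {
    highest = Math.max(highest, autocorr(lag));
  }
  return highest / nonShifted;
}

// A single spectral peak (pure tone) versus evenly spaced harmonic peaks.
const pureToneSpectrum = new Float32Array(101);
pureToneSpectrum[10] = 1;
const harmonicSpectrum = new Float32Array(101);
for (let bin = 10; bin <= 100; bin += 10) harmonicSpectrum[bin] = 1;

const pureToneSalience = pitchSalienceSketch(pureToneSpectrum);
const harmonicSalience = pitchSalienceSketch(harmonicSpectrum);
```

With these synthetic inputs the single-peak spectrum yields a salience of 0, while the harmonic spectrum yields a clearly higher value, which matches the behaviour the description attributes to pure tones and harmonic sounds.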
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input audio spectrum
highBoundarynumber <optional> 5000 until which frequency we are looking for the minimum (must be smaller than half sampleRate) [Hz]
lowBoundarynumber <optional> 100 from which frequency we are looking for the maximum (must not be larger than highBoundary) [Hz]
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
PitchSalienceFunction( frequencies, magnitudes [, binResolution [, harmonicWeight [, magnitudeCompression [, magnitudeThreshold [, numberHarmonics [, referenceFrequency ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the pitch salience function of a signal frame given its spectral peaks. The salience function covers a pitch range of nearly five octaves (i.e., 6000 cents), starting from the "referenceFrequency", and is quantized into cent bins according to the specified "binResolution". The salience of a given frequency is computed as the sum of the weighted energies found at integer multiples (harmonics) of that frequency. Check https://essentia.upf.edu/reference/std_PitchSalienceFunction.html for more details.
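The mapping from frequency to cent bins can be sketched in a few lines of plain JavaScript. The helper name `hzToCentBinSketch` is hypothetical, and rounding to the nearest bin is an assumption of this example; the library may quantize differently.

```javascript
// Sketch only: map a frequency in Hz to a cent bin index, where bin 0
// corresponds to referenceFrequency and each bin spans binResolution cents.
function hzToCentBinSketch(frequencyHz, referenceFrequency = 55, binResolution = 10) {
  const cents = 1200 * Math.log2(frequencyHz / referenceFrequency);
  return Math.round(cents / binResolution); // assumption: nearest-bin rounding
}

const binOfReference = hzToCentBinSketch(55);    // the 0th cent bin
const binOneOctaveUp = hzToCentBinSketch(110);   // 1200 cents above the reference
const binFiveOctavesUp = hzToCentBinSketch(1760); // 6000 cents above the reference
```

With the default reference of 55 Hz and a resolution of 10 cents, one octave spans 120 bins, so a 6000-cent range corresponds to 600 bins.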
Parameters
Name Type Attributes Default Description frequenciesVectorFloat the frequencies of the spectral peaks [Hz]
magnitudesVectorFloat the magnitudes of the spectral peaks
binResolutionnumber <optional> 10 salience function bin resolution [cents]
harmonicWeightnumber <optional> 0.8 harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
magnitudeCompressionnumber <optional> 1 magnitude compression parameter (=0 for maximum compression, =1 for no compression)
magnitudeThresholdnumber <optional> 40 peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
numberHarmonicsnumber <optional> 20 number of considered harmonics
referenceFrequencynumber <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
Returns
Details
-
PitchSalienceFunctionPeaks( salienceFunction [, binResolution [, maxFrequency [, minFrequency [, referenceFrequency ] ] ] ] ) → {object}
-
Description
This algorithm computes the peaks of a given pitch salience function. Check https://essentia.upf.edu/reference/std_PitchSalienceFunctionPeaks.html for more details.
Parameters
Name Type Attributes Default Description salienceFunctionVectorFloat the array of salience function values corresponding to cent frequency bins
binResolutionnumber <optional> 10 salience function bin resolution [cents]
maxFrequencynumber <optional> 1760 the maximum frequency to evaluate (ignore peaks above) [Hz]
minFrequencynumber <optional> 55 the minimum frequency to evaluate (ignore peaks below) [Hz]
referenceFrequencynumber <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
Returns
Details
-
PitchYin( signal [, frameSize [, interpolate [, maxFrequency [, minFrequency [, sampleRate [, tolerance ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the fundamental frequency given the frame of a monophonic music signal. It is an implementation of the Yin algorithm [1] for computations in the time domain. Check https://essentia.upf.edu/reference/std_PitchYin.html for more details.
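For readers unfamiliar with time-domain Yin, the core idea can be sketched in plain JavaScript: a difference function, its cumulative mean normalization, and a search for the first dip below the tolerance. The function name `yinPitchSketch` is hypothetical, and the sketch omits interpolation and the minFrequency/maxFrequency limits, so it is a simplified illustration rather than the essentia.js implementation.

```javascript
// Sketch only: simplified time-domain Yin without interpolation.
function yinPitchSketch(frame, sampleRate = 44100, tolerance = 0.15) {
  const half = Math.floor(frame.length / 2);

  // Step 1: squared difference between the frame and its lagged copy.
  const difference = new Float64Array(half);
  for (let tau = 1; tau < half; tau++) {
    let sum = 0;
    for (let i = 0; i < half; i++) {
      const delta = frame[i] - frame[i + tau];
      sum += delta * delta;
    }
    difference[tau] = sum;
  }

  // Step 2: cumulative mean normalized difference.
  const normalized = new Float64Array(half);
  normalized[0] = 1;
  let runningSum = 0;
  for (let tau = 1; tau < half; tau++) {
    runningSum += difference[tau];
    normalized[tau] = runningSum > 0 ? (difference[tau] * tau) / runningSum : 1;
  }

  // Step 3: first lag below the tolerance, refined to the local minimum.
  for (let tau = 2; tau < half; tau++) {
    if (normalized[tau] < tolerance) {
      while (tau + 1 < half && normalized[tau + 1] < normalized[tau]) tau++;
      return sampleRate / tau;
    }
  }
  return 0; // no periodicity found
}

// Test tone whose period is exactly 100 samples at 44100 Hz (441 Hz).
const yinTestFrame = Array.from({ length: 2048 }, (_, i) =>
  Math.sin((2 * Math.PI * 441 * i) / 44100)
);
const yinEstimateHz = yinPitchSketch(yinTestFrame);
```

Because the sketch returns `sampleRate / tau` for an integer lag, its resolution is limited to whole-sample periods; the `interpolate` parameter listed below exists to refine this in the real algorithm.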
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal frame
frameSizenumber <optional> 2048 number of samples in the input frame (this is an optional parameter to optimize memory allocation)
interpolateboolean <optional> true enable interpolation
maxFrequencynumber <optional> 22050 the maximum allowed frequency [Hz]
minFrequencynumber <optional> 20 the minimum allowed frequency [Hz]
sampleRatenumber <optional> 44100 sampling rate of the input audio [Hz]
tolerancenumber <optional> 0.15 tolerance for peak detection
Returns
Details
-
PitchYinFFT( spectrum [, frameSize [, interpolate [, maxFrequency [, minFrequency [, sampleRate [, tolerance ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the fundamental frequency given the spectrum of a monophonic music signal. It is an implementation of the YinFFT algorithm [1], which is an optimized version of the Yin algorithm for computation in the frequency domain. It is recommended to window the input spectrum with a Hann window. The raw spectrum can be computed with the Spectrum algorithm. Check https://essentia.upf.edu/reference/std_PitchYinFFT.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input spectrum (preferably created with a Hann window)
frameSizenumber <optional> 2048 number of samples in the input spectrum
interpolateboolean <optional> true boolean flag to enable interpolation
maxFrequencynumber <optional> 22050 the maximum allowed frequency [Hz]
minFrequencynumber <optional> 20 the minimum allowed frequency [Hz]
sampleRatenumber <optional> 44100 sampling rate of the input spectrum [Hz]
tolerancenumber <optional> 1 tolerance for peak detection
Returns
Details
-
PitchYinProbabilistic( signal [, frameSize [, hopSize [, lowRMSThreshold [, outputUnvoiced [, preciseTime [, sampleRate ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the pitch track of a mono audio signal using the probabilistic Yin algorithm. Check https://essentia.upf.edu/reference/std_PitchYinProbabilistic.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input mono audio signal
frameSizenumber <optional> 2048 the frame size of FFT
hopSizenumber <optional> 256 the hop size with which the pitch is computed
lowRMSThresholdnumber <optional> 0.1 the low RMS amplitude threshold
outputUnvoicedstring <optional> negative how to output unvoiced frames. zero: output non-voiced pitch as 0; abs: output non-voiced pitch as absolute values; negative: output non-voiced pitch as negative values
preciseTimeboolean <optional> false use non-standard precise YIN timing (slow).
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
PitchYinProbabilities( signal [, frameSize [, lowAmp [, preciseTime [, sampleRate ] ] ] ] ) → {object}
-
Description
This algorithm estimates the fundamental frequencies and their probabilities given the frame of a monophonic music signal. It is a part of the implementation of the probabilistic Yin algorithm [1]. Check https://essentia.upf.edu/reference/std_PitchYinProbabilities.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal frame
frameSizenumber <optional> 2048 number of samples in the input frame
lowAmpnumber <optional> 0.1 the low RMS amplitude threshold
preciseTimeboolean <optional> false use non-standard precise YIN timing (slow).
sampleRatenumber <optional> 44100 sampling rate of the input audio [Hz]
Returns
Details
-
PitchYinProbabilitiesHMM( pitchCandidates, probabilities [, minFrequency [, numberBinsPerSemitone [, selfTransition [, yinTrust ] ] ] ] ) → {object}
-
Description
This algorithm estimates the smoothed fundamental frequency given the pitch candidates and probabilities using hidden Markov models. It is a part of the implementation of the probabilistic Yin algorithm [1]. Check https://essentia.upf.edu/reference/std_PitchYinProbabilitiesHMM.html for more details.
Parameters
Name Type Attributes Default Description pitchCandidatesVectorVectorFloat the pitch candidates
probabilitiesVectorVectorFloat the pitch probabilities
minFrequencynumber <optional> 61.735 minimum detected frequency
numberBinsPerSemitonenumber <optional> 5 number of bins per semitone
selfTransitionnumber <optional> 0.99 the self transition probabilities
yinTrustnumber <optional> 0.5 the yin trust parameter
Returns
Details
-
PowerMean( array [, power ] ) → {object}
-
Description
This algorithm computes the power mean of an array. It accepts one parameter, p, which is the power (or order or degree) of the Power Mean. Note that if p=-1, the Power Mean is equal to the Harmonic Mean, if p=0, the Power Mean is equal to the Geometric Mean, if p=1, the Power Mean is equal to the Arithmetic Mean, if p=2, the Power Mean is equal to the Root Mean Square. Check https://essentia.upf.edu/reference/std_PowerMean.html for more details.
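The special cases listed above can be checked with a small plain JavaScript sketch. The name `powerMeanSketch` is hypothetical, and treating p=0 as the geometric mean via logarithms is the standard mathematical limit rather than a statement about how the library computes it.

```javascript
// Sketch only: generalized (power) mean of an array of positive numbers.
function powerMeanSketch(array, power = 1) {
  const n = array.length;
  if (power === 0) {
    // Limit case p -> 0: geometric mean, computed through logarithms.
    const logSum = array.reduce((sum, x) => sum + Math.log(x), 0);
    return Math.exp(logSum / n);
  }
  const powerSum = array.reduce((sum, x) => sum + Math.pow(x, power), 0);
  return Math.pow(powerSum / n, 1 / power);
}

const harmonicMean = powerMeanSketch([1, 4], -1);  // p = -1
const geometricMean = powerMeanSketch([2, 8], 0);  // p = 0
const arithmeticMean = powerMeanSketch([1, 4], 1); // p = 1
const quadraticMean = powerMeanSketch([3, 4], 2);  // p = 2 (root mean square)
```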
Parameters
Name Type Attributes Default Description arrayVectorFloat the input array (must contain only positive real numbers)
powernumber <optional> 1 the power to which to elevate each element before taking the mean
Returns
Details
-
PowerSpectrum( signal [, size ] ) → {object}
-
Description
This algorithm computes the power spectrum of an array of Reals. The resulting power spectrum has a size which is half the size of the input array plus one. Bins contain squared magnitude values. Check https://essentia.upf.edu/reference/std_PowerSpectrum.html for more details.
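To make the output size and the "squared magnitude" statement concrete, here is a naive O(N²) DFT sketch in plain JavaScript. The name `powerSpectrumSketch` is hypothetical, and the absence of any normalization is an assumption of this example; the library uses an FFT internally and may scale differently.

```javascript
// Sketch only: naive DFT returning squared magnitudes for the
// non-negative frequency bins (N / 2 + 1 values for an even-sized input).
function powerSpectrumSketch(signal) {
  const n = signal.length;
  const binCount = Math.floor(n / 2) + 1;
  const power = new Float64Array(binCount);
  for (let k = 0; k < binCount; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const angle = (-2 * Math.PI * k * t) / n;
      re += signal[t] * Math.cos(angle);
      im += signal[t] * Math.sin(angle);
    }
    power[k] = re * re + im * im; // squared magnitude of bin k
  }
  return power;
}

// A unit impulse has a flat spectrum: every bin carries the same power.
const impulsePower = powerSpectrumSketch([1, 0, 0, 0, 0, 0, 0, 0]);
```

For an input of 8 samples the result has 5 bins, in line with "half the size of the input array plus one".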
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
sizenumber <optional> 2048 the expected size of the input frame (this is purely optional and only targeted at optimizing the creation time of the FFT object)
Returns
Details
-
PredominantPitchMelodia( signal [, binResolution [, filterIterations [, frameSize [, guessUnvoiced [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxFrequency [, minDuration [, minFrequency [, numberHarmonics [, peakDistributionThreshold [, peakFrameThreshold [, pitchContinuity [, referenceFrequency [, sampleRate [, timeContinuity [, voiceVibrato [, voicingTolerance ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the fundamental frequency of the predominant melody from polyphonic music signals using the MELODIA algorithm. It is specifically suited for music with a predominant melodic element, for example the singing voice melody in an accompanied singing recording. The approach [1] is based on the creation and characterization of pitch contours, time continuous sequences of pitch candidates grouped using auditory streaming cues. It furthermore determines for each frame whether the predominant melody is present or not. To this end, PitchSalienceFunction, PitchSalienceFunctionPeaks, PitchContours, and PitchContoursMelody algorithms are employed. It is strongly advised to use the default parameter values which are optimized according to [1] (where further details are provided) except for minFrequency, maxFrequency, and voicingTolerance, which will depend on your application. Check https://essentia.upf.edu/reference/std_PredominantPitchMelodia.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
binResolutionnumber <optional> 10 salience function bin resolution [cents]
filterIterationsnumber <optional> 3 number of iterations for the octave errors / pitch outlier filtering process
frameSizenumber <optional> 2048 the frame size for computing pitch salience
guessUnvoicedboolean <optional> false estimate pitch for non-voiced segments by using non-salient contours when no salient ones are present in a frame
harmonicWeightnumber <optional> 0.8 harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
hopSizenumber <optional> 128 the hop size with which the pitch salience function was computed
magnitudeCompressionnumber <optional> 1 magnitude compression parameter for the salience function (=0 for maximum compression, =1 for no compression)
magnitudeThresholdnumber <optional> 40 spectral peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
maxFrequencynumber <optional> 20000 the maximum allowed frequency for salience function peaks (ignore contours with peaks above) [Hz]
minDurationnumber <optional> 100 the minimum allowed contour duration [ms]
minFrequencynumber <optional> 80 the minimum allowed frequency for salience function peaks (ignore contours with peaks below) [Hz]
numberHarmonicsnumber <optional> 20 number of considered harmonics
peakDistributionThresholdnumber <optional> 0.9 allowed deviation below the peak salience mean over all frames (fraction of the standard deviation)
peakFrameThresholdnumber <optional> 0.9 per-frame salience threshold factor (fraction of the highest peak salience in a frame)
pitchContinuitynumber <optional> 27.5625 pitch continuity cue (maximum allowed pitch change during 1 ms time period) [cents]
referenceFrequencynumber <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
timeContinuitynumber <optional> 100 time continuity cue (the maximum allowed gap duration for a pitch contour) [ms]
voiceVibratoboolean <optional> false detect voice vibrato
voicingTolerancenumber <optional> 0.2 allowed deviation below the average contour mean salience of all contours (fraction of the standard deviation)
Returns
Details
-
RMS( array ) → {object}
-
Description
This algorithm computes the root mean square (quadratic mean) of an array. RMS is not defined for empty arrays. In such a case, an exception will be thrown. References: [1] Root mean square - Wikipedia, the free encyclopedia, http://en.wikipedia.org/wiki/Root_mean_square Check https://essentia.upf.edu/reference/std_RMS.html for more details.
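The definition, including the empty-array case, can be sketched in plain JavaScript as follows. The name `rmsSketch` and the error message are hypothetical; only the behaviour described above is illustrated.

```javascript
// Sketch only: root mean square with an explicit error for empty input.
function rmsSketch(array) {
  if (array.length === 0) {
    throw new Error("RMS is not defined for empty arrays");
  }
  let sumOfSquares = 0;
  for (const x of array) sumOfSquares += x * x;
  return Math.sqrt(sumOfSquares / array.length);
}

const squareWaveRms = rmsSketch([1, -1, 1, -1]); // full-scale square wave
```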
Parameters
Name Type Description arrayVectorFloat the input array
Returns
Details
-
RawMoments( array [, range ] ) → {object}
-
Description
This algorithm computes the first 5 raw moments of an array. The output array is of size 6 because the zeroth moment is used for padding so that the first moment corresponds to index 1. Check https://essentia.upf.edu/reference/std_RawMoments.html for more details.
Parameters
Name Type Attributes Default Description arrayVectorFloat the input array
rangenumber <optional> 22050 the range of the input array, used for normalizing the results
Returns
Details
-
ReplayGain( signal [, sampleRate ] ) → {object}
-
Description
This algorithm computes the Replay Gain loudness value of an audio signal. The algorithm is described in detail in [1]. The value returned is the 'standard' ReplayGain value, not the value with 6dB preamplification as computed by lame, mp3gain, vorbisgain, and all widely used ReplayGain programs. Check https://essentia.upf.edu/reference/std_ReplayGain.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input audio signal (must be longer than 0.05ms)
sampleRatenumber <optional> 44100 the sampling rate of the input audio signal [Hz]
Returns
Details
-
Resample( signal [, inputSampleRate [, outputSampleRate [, quality ] ] ] ) → {object}
-
Description
This algorithm resamples the input signal to the desired sampling rate. Check https://essentia.upf.edu/reference/std_Resample.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
inputSampleRatenumber <optional> 44100 the sampling rate of the input signal [Hz]
outputSampleRatenumber <optional> 44100 the sampling rate of the output signal [Hz]
qualitynumber <optional> 1 the quality of the conversion, 0 for best quality
Returns
Details
-
ResampleFFT( input [, inSize [, outSize ] ] ) → {object}
-
Description
This algorithm resamples a sequence using FFT / IFFT. The input and output sizes must be even numbers. (It is meant to be equivalent to the resample function in Numpy). Check https://essentia.upf.edu/reference/std_ResampleFFT.html for more details.
Parameters
Name Type Attributes Default Description inputVectorFloat input array
inSizenumber <optional> 128 the size of the input sequence. It needs to be even-sized.
outSizenumber <optional> 128 the size of the output sequence. It needs to be even-sized.
Returns
Details
-
RhythmDescriptors( signal ) → {object}
-
Description
This algorithm computes rhythm features (bpm, beat positions, beat histogram peaks) for an audio signal. It combines RhythmExtractor2013 for beat tracking and BPM estimation with BpmHistogramDescriptors algorithms. Check https://essentia.upf.edu/reference/std_RhythmDescriptors.html for more details.
Parameters
Name Type Description signalVectorFloat the audio input signal
Returns
Details
-
RhythmExtractor( signal [, frameHop [, frameSize [, hopSize [, lastBeatInterval [, maxTempo [, minTempo [, numberFrames [, sampleRate [, tempoHints [, tolerance [, useBands [, useOnset ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the tempo in bpm and beat positions given an audio signal. The algorithm combines several periodicity functions and estimates beats using TempoTap and TempoTapTicks. It combines: - onset detection functions based on high-frequency content (see OnsetDetection) - complex-domain spectral difference function (see OnsetDetection) - periodicity function based on energy bands (see FrequencyBands, TempoScaleBands) Check https://essentia.upf.edu/reference/std_RhythmExtractor.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the audio input signal
frameHopnumber <optional> 1024 the number of feature frames separating two evaluations
frameSizenumber <optional> 1024 the number of audio samples used to compute a feature
hopSizenumber <optional> 256 the number of audio samples per feature
lastBeatIntervalnumber <optional> 0.1 the minimum interval between last beat and end of file [s]
maxTemponumber <optional> 208 the fastest tempo to detect [bpm]
minTemponumber <optional> 40 the slowest tempo to detect [bpm]
numberFramesnumber <optional> 1024 the number of feature frames to buffer on
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
tempoHintsArray.<any> <optional> [] the optional list of initial beat locations, to favor the detection of pre-determined tempo period and beats alignment [s]
tolerancenumber <optional> 0.24 the minimum interval between two consecutive beats [s]
useBandsboolean <optional> true whether or not to use band energy as periodicity function
useOnsetboolean <optional> true whether or not to use onsets as periodicity function
Returns
Details
-
RhythmExtractor2013( signal [, maxTempo [, method [, minTempo ] ] ] ) → {object}
-
Description
This algorithm extracts the beat positions and estimates their confidence as well as tempo in bpm for an audio signal. The beat locations can be computed using: - 'multifeature', the BeatTrackerMultiFeature algorithm - 'degara', the BeatTrackerDegara algorithm (note that there is no confidence estimation for this method, the output confidence value is always 0) Check https://essentia.upf.edu/reference/std_RhythmExtractor2013.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the audio input signal
maxTemponumber <optional> 208 the fastest tempo to detect [bpm]
methodstring <optional> multifeature the method used for beat tracking
minTemponumber <optional> 40 the slowest tempo to detect [bpm]
Returns
Details
-
RhythmTransform( melBands [, frameSize [, hopSize ] ] ) → {object}
-
Description
This algorithm implements the rhythm transform. It computes a tempogram, a representation of rhythmic periodicities in the input signal in the rhythm domain, by using FFT similarly to computation of spectrum in the frequency domain [1]. Additional features, including rhythmic centroid and a rhythmic counterpart of MFCCs, can be derived from this rhythmic representation. Check https://essentia.upf.edu/reference/std_RhythmTransform.html for more details.
Parameters
Name Type Attributes Default Description melBandsVectorVectorFloat the energies in the mel bands
frameSizenumber <optional> 256 the frame size to compute the rhythm transform
hopSizenumber <optional> 32 the hop size to compute the rhythm transform
Returns
Details
-
RollOff( spectrum [, cutoff [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes the roll-off frequency of a spectrum. The roll-off frequency is defined as the frequency under which some percentage (cutoff) of the total energy of the spectrum is contained. The roll-off frequency can be used to distinguish between harmonic (below roll-off) and noisy sounds (above roll-off). Check https://essentia.upf.edu/reference/std_RollOff.html for more details.
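The definition above can be illustrated with a short plain JavaScript sketch that accumulates energy bin by bin until the cutoff ratio is reached. The name `rollOffSketch` is hypothetical; taking energy as the squared magnitude and mapping bins linearly onto 0 to sampleRate / 2 are assumptions of this example.

```javascript
// Sketch only: frequency below which `cutoff` of the total spectral energy lies.
function rollOffSketch(spectrum, cutoff = 0.85, sampleRate = 44100) {
  if (spectrum.length < 2) {
    throw new Error("the input spectrum must have more than one element");
  }
  let totalEnergy = 0;
  for (const magnitude of spectrum) totalEnergy += magnitude * magnitude;

  const threshold = cutoff * totalEnergy;
  let cumulative = 0;
  let rollOffBin = 0;
  for (let i = 0; i < spectrum.length; i++) {
    cumulative += spectrum[i] * spectrum[i];
    if (cumulative >= threshold) {
      rollOffBin = i;
      break;
    }
  }
  // Assumption: the last bin corresponds to the Nyquist frequency.
  return (rollOffBin * (sampleRate / 2)) / (spectrum.length - 1);
}

// Energy is spread evenly over the first four of five bins.
const halfEnergyRollOff = rollOffSketch([1, 1, 1, 1, 0], 0.5);
```

In this example half of the energy is reached at bin 1 of a 5-bin spectrum, which corresponds to one quarter of the Nyquist frequency.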
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input audio spectrum (must have more than one element)
cutoffnumber <optional> 0.85 the ratio of total energy to attain before yielding the roll-off frequency
sampleRatenumber <optional> 44100 the sampling rate of the audio signal (used to normalize rollOff) [Hz]
Returns
Details
-
SNR( frame [, MAAlpha [, MMSEAlpha [, NoiseAlpha [, frameSize [, noiseThreshold [, sampleRate [, useBroadbadNoiseCorrection ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the SNR of the input audio in a frame-wise manner. The algorithm assumes that: 1. The noise is Gaussian. 2. There is a region of noise (without signal) at the beginning of the stream in order to estimate the PSD of the noise [1]. Once the noise PSD is estimated, the algorithm relies on the Ephraim-Malah [2] recursion to estimate the SNR for each frequency bin. The algorithm also returns an overall (a single value for the whole spectrum) SNR estimation and an averaged overall SNR estimation using Exponential Moving Average filtering. This algorithm throws a Warning if fewer than 15 frames are used to estimate the noise PSD. Check https://essentia.upf.edu/reference/std_SNR.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input audio frame
MAAlphanumber <optional> 0.95 Alpha coefficient for the EMA SNR estimation [2]
MMSEAlphanumber <optional> 0.98 Alpha coefficient for the MMSE estimation [1].
NoiseAlphanumber <optional> 0.9 Alpha coefficient for the EMA noise estimation [2]
frameSizenumber <optional> 512 the size of the input frame
noiseThresholdnumber <optional> -40 Threshold to detect frames without signal
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
useBroadbadNoiseCorrectionboolean <optional> true flag to apply the -10 * log10(BW) broadband noise correction factor
Returns
Details
-
SaturationDetector( frame [, differentialThreshold [, energyThreshold [, frameSize [, hopSize [, minimumDuration [, sampleRate ] ] ] ] ] ] ) → {object}
-
Description
This algorithm outputs the starting/ending locations of the saturated regions in seconds. Saturated regions are found by means of a triple criterion: 1. samples in a saturated region should have more energy than a given threshold. 2. the difference between the samples in a saturated region should be smaller than a given threshold. 3. the duration of the saturated region should be longer than a given threshold. Check https://essentia.upf.edu/reference/std_SaturationDetector.html for more details.
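The three criteria can be sketched sample by sample in plain JavaScript. Everything here is an illustrative assumption: the name `findSaturatedRegionsSketch`, the options object, the conversion of the dB threshold to a linear amplitude, and the way run boundaries and durations are counted. The real algorithm works frame-wise with `frameSize` and `hopSize`.

```javascript
// Sketch only: find runs of loud, nearly constant samples.
function findSaturatedRegionsSketch(samples, options = {}) {
  const {
    differentialThreshold = 0.001,
    energyThreshold = -1,    // dB
    minimumDuration = 0.005, // ms
    sampleRate = 44100,
  } = options;
  // Assumption: a dB threshold maps to linear amplitude as 10^(dB / 20).
  const amplitudeThreshold = Math.pow(10, energyThreshold / 20);
  const minimumSeconds = minimumDuration / 1000;

  const regions = [];
  let runStart = -1;
  // Iterate one step past the end so an open run is closed.
  for (let i = 1; i <= samples.length; i++) {
    const saturated =
      i < samples.length &&
      Math.abs(samples[i]) > amplitudeThreshold &&                       // criterion 1
      Math.abs(samples[i] - samples[i - 1]) < differentialThreshold;     // criterion 2
    if (saturated && runStart < 0) {
      runStart = i;
    } else if (!saturated && runStart >= 0) {
      const runEnd = i - 1;
      const duration = (runEnd - runStart + 1) / sampleRate;
      if (duration >= minimumSeconds) {                                  // criterion 3
        regions.push({ start: runStart / sampleRate, end: runEnd / sampleRate });
      }
      runStart = -1;
    }
  }
  return regions;
}

// A clipped plateau in the middle of a short signal, at a toy sample rate.
const clippedSignal = [0, 0.5, 1, 1, 1, 1, 0.5, 0];
const saturatedRegions = findSaturatedRegionsSketch(clippedSignal, { sampleRate: 8 });
```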
Parameters
Name Type Attributes Default Description frameVectorFloat the input audio frame
differentialThresholdnumber <optional> 0.001 minimum difference between contiguous samples of the saturated regions
energyThresholdnumber <optional> -1 minimum energy of the samples in the saturated regions [dB]
frameSizenumber <optional> 512 expected input frame size
hopSizenumber <optional> 256 hop size used for the analysis
minimumDurationnumber <optional> 0.005 minimum duration of the saturated regions [ms]
sampleRatenumber <optional> 44100 sample rate used for the analysis
Returns
Details
-
Scale( signal [, clipping [, factor [, maxAbsValue ] ] ] ) → {object}
-
Description
This algorithm scales the audio by the specified factor using clipping if required. Check https://essentia.upf.edu/reference/std_Scale.html for more details.
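The interaction between `factor`, `clipping`, and `maxAbsValue` can be sketched in plain JavaScript as below. The name `scaleSketch` is hypothetical; the argument order simply mirrors the signature listed for this method.

```javascript
// Sketch only: multiply every sample by `factor`, optionally clipping
// the result to the range [-maxAbsValue, maxAbsValue].
function scaleSketch(signal, clipping = true, factor = 10, maxAbsValue = 1) {
  return Float32Array.from(signal, (sample) => {
    const scaled = sample * factor;
    if (!clipping) return scaled;
    return Math.max(-maxAbsValue, Math.min(maxAbsValue, scaled));
  });
}

const clippedOutput = scaleSketch([0.05, -0.2, 0.01]);      // defaults: factor 10, clip at 1
const unclippedOutput = scaleSketch([0.05, -0.2, 0.01], false);
```

With the defaults, the second sample (-0.2) would scale to -2 and is therefore clipped to -1, while the other two samples stay within range.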
Parameters
Name Type Attributes Default Description signalVectorFloat the input audio signal
clippingboolean <optional> true boolean flag whether to apply clipping or not
factornumber <optional> 10 the multiplication factor by which the audio will be scaled
maxAbsValuenumber <optional> 1 the maximum value above which to apply clipping
Returns
Details
-
SineSubtraction( frame, magnitudes, frequencies, phases [, fftSize [, hopSize [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm subtracts the sinusoids computed with the sine model analysis from an input audio signal. It outputs an audio signal. Check https://essentia.upf.edu/reference/std_SineSubtraction.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input audio frame to subtract from
magnitudesVectorFloat the magnitudes of the sinusoidal peaks
frequenciesVectorFloat the frequencies of the sinusoidal peaks [Hz]
phasesVectorFloat the phases of the sinusoidal peaks
fftSizenumber <optional> 512 the size of the FFT internal process (full spectrum size) and output frame. Minimum twice the hopsize.
hopSizenumber <optional> 128 the hop size between frames
sampleRatenumber <optional> 44100 the audio sampling rate [Hz]
Returns
Details
-
SingleBeatLoudness( beat [, beatDuration [, beatWindowDuration [, frequencyBands [, onsetStart [, sampleRate ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the spectrum energy of a single beat across the whole frequency range and on each specified frequency band given an audio segment. It detects the onset of the beat within the input segment, computes spectrum on a window starting on this onset, and estimates energy (see Energy and EnergyBandRatio algorithms). The frequency bands used by default are: 0-200 Hz, 200-400 Hz, 400-800 Hz, 800-1600 Hz, 1600-3200 Hz, 3200-22000Hz, following E. Scheirer [1]. Check https://essentia.upf.edu/reference/std_SingleBeatLoudness.html for more details.
Parameters
Name Type Attributes Default Description beatVectorFloat audio segment containing a beat
beatDurationnumber <optional> 0.05 window size for the beat's energy computation (the window starts at the onset) [s]
beatWindowDurationnumber <optional> 0.1 window size for the beat's onset detection [s]
frequencyBandsArray.<any> <optional> [0, 200, 400, 800, 1600, 3200, 22000] frequency bands
onsetStartstring <optional> sumEnergy criteria for finding the start of the beat
sampleRatenumber <optional> 44100 the audio sampling rate [Hz]
Returns
Details
-
Slicer( audio [, endTimes [, sampleRate [, startTimes [, timeUnits ] ] ] ] ) → {object}
-
Description
This algorithm splits an audio signal into segments given their start and end times. Check https://essentia.upf.edu/reference/std_Slicer.html for more details.
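A minimal plain JavaScript sketch of the slicing logic is shown below. The name `slicerSketch` is hypothetical, the argument order mirrors the signature listed for this method, and both the rounding of times to sample indices and the `"samples"` unit name are assumptions of this example.

```javascript
// Sketch only: cut one segment per (startTime, endTime) pair.
function slicerSketch(audio, endTimes = [], sampleRate = 44100, startTimes = [], timeUnits = "seconds") {
  if (startTimes.length !== endTimes.length) {
    throw new Error("startTimes and endTimes must have the same length");
  }
  // Assumption: times in seconds are rounded to the nearest sample index.
  const toSampleIndex = (time) =>
    timeUnits === "samples" ? Math.round(time) : Math.round(time * sampleRate);

  return startTimes.map((start, i) =>
    audio.slice(toSampleIndex(start), toSampleIndex(endTimes[i]))
  );
}

// Eight samples at a toy sample rate of 4 Hz, i.e. two seconds of "audio".
const toyAudio = [0, 1, 2, 3, 4, 5, 6, 7];
const toySlices = slicerSketch(toyAudio, [0.5, 2], 4, [0, 1]);
```

Here the first slice covers 0 s to 0.5 s (two samples) and the second covers 1 s to 2 s (four samples).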
Parameters
Name Type Attributes Default Description audioVectorFloat the input audio signal
endTimesArray.<any> <optional> [] the list of end times for the slices you want to extract
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
startTimesArray.<any> <optional> [] the list of start times for the slices you want to extract
timeUnitsstring <optional> seconds the units of time of the start and end times
Returns
Details
-
SpectralCentroidTime( array [, sampleRate ] ) → {object}
-
Description
This algorithm computes the spectral centroid of a signal in the time domain. A first difference filter is applied to the input signal. Then the centroid is computed by dividing the norm of the resulting signal by the norm of the input signal. The centroid is given in hertz. References: [1] Udo Zölzer (2002). DAFX Digital Audio Effects, pp. 364-365. Check https://essentia.upf.edu/reference/std_SpectralCentroidTime.html for more details.
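The norm ratio described above can be sketched in plain JavaScript. The name `spectralCentroidTimeSketch` is hypothetical, and the scaling by sampleRate / (2π) is an assumption chosen so that a sinusoid maps approximately to its own frequency; the library's exact scaling and boundary handling may differ.

```javascript
// Sketch only: time-domain centroid from the ratio of the norm of the
// first-difference signal to the norm of the input signal.
function spectralCentroidTimeSketch(signal, sampleRate = 44100) {
  let differenceEnergy = 0;
  let signalEnergy = 0;
  for (let n = 0; n < signal.length; n++) {
    signalEnergy += signal[n] * signal[n];
    if (n > 0) {
      const delta = signal[n] - signal[n - 1]; // first difference filter
      differenceEnergy += delta * delta;
    }
  }
  if (signalEnergy === 0) return 0; // silent input
  // Assumption: convert the norm ratio (radians per sample) to hertz.
  return (Math.sqrt(differenceEnergy / signalEnergy) * sampleRate) / (2 * Math.PI);
}

// 0.1 s of a 1000 Hz sine at 44100 Hz.
const centroidTestTone = Array.from({ length: 4410 }, (_, n) =>
  Math.sin((2 * Math.PI * 1000 * n) / 44100)
);
const centroidEstimateHz = spectralCentroidTimeSketch(centroidTestTone);
```

For a low-frequency sinusoid the first difference approximates a derivative, so the estimate lands close to the tone's frequency (within a few hertz for this test tone).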
Parameters
Name Type Attributes Default Description arrayVectorFloat the input array
sampleRatenumber <optional> 44100 sampling rate of the input spectrum [Hz]
Returns
Details
-
SpectralComplexity( spectrum [, magnitudeThreshold [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes the spectral complexity of a spectrum. The spectral complexity is based on the number of peaks in the input spectrum. Check https://essentia.upf.edu/reference/std_SpectralComplexity.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input spectrum
magnitudeThresholdnumber <optional> 0.005 the minimum spectral-peak magnitude that contributes to spectral complexity
sampleRatenumber <optional> 44100 the audio sampling rate [Hz]
Returns
Details
-
SpectralContrast( spectrum [, frameSize [, highFrequencyBound [, lowFrequencyBound [, neighbourRatio [, numberBands [, sampleRate [, staticDistribution ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the Spectral Contrast feature of a spectrum. It is based on the Octave Based Spectral Contrast feature as described in [1]. The version implemented here is a modified version to improve discriminative power and robustness. The modifications are described in [2]. Check https://essentia.upf.edu/reference/std_SpectralContrast.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the audio spectrum
frameSizenumber <optional> 2048 the size of the fft frames
highFrequencyBoundnumber <optional> 11000 the upper bound of the highest band
lowFrequencyBoundnumber <optional> 20 the lower bound of the lowest band
neighbourRationumber <optional> 0.4 the ratio of the bins in the sub band used to calculate the peak and valley
numberBandsnumber <optional> 6 the number of bands in the filter
sampleRatenumber <optional> 22050 the sampling rate of the audio signal
staticDistributionnumber <optional> 0.15 the ratio of the bins to distribute equally
Returns
Details
-
SpectralPeaks( spectrum [, magnitudeThreshold [, maxFrequency [, maxPeaks [, minFrequency [, orderBy [, sampleRate ] ] ] ] ] ] ) → {object}
-
Description
This algorithm extracts peaks from a spectrum. It is important to note that the peak algorithm is independent of an input that is linear or in dB, so one has to adapt the threshold to fit with the type of data fed to it. The algorithm relies on PeakDetection algorithm which is run with parabolic interpolation [1]. The exactness of the peak-searching depends heavily on the windowing type. It gives best results with dB input, a blackman-harris 92dB window and interpolation set to true. According to [1], spectral peak frequencies tend to be about twice as accurate when dB magnitude is used rather than just linear magnitude. For further information about the peak detection, see the description of the PeakDetection algorithm. Check https://essentia.upf.edu/reference/std_SpectralPeaks.html for more details.
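Parabolic interpolation around a local maximum can be sketched in plain JavaScript as follows. The name `interpolatePeakSketch` is hypothetical, and the formula is the standard three-point parabola fit; it illustrates the idea rather than reproducing the PeakDetection implementation.

```javascript
// Sketch only: refine a peak at bin k by fitting a parabola through
// the bin and its two neighbours.
function interpolatePeakSketch(values, k) {
  const left = values[k - 1];
  const center = values[k];
  const right = values[k + 1];
  const curvature = left - 2 * center + right;
  if (curvature === 0) return { position: k, value: center }; // flat: nothing to refine
  const offset = (0.5 * (left - right)) / curvature; // in bins, between -0.5 and 0.5 at a true peak
  return {
    position: k + offset,
    value: center - 0.25 * (left - right) * offset,
  };
}

// Samples of the parabola 5 - (x - 2.3)^2, whose true maximum lies between bins.
const parabolaSamples = [0, 1, 2, 3, 4, 5].map((x) => 5 - (x - 2.3) ** 2);
const refinedPeak = interpolatePeakSketch(parabolaSamples, 2);
```

A fractional bin position can then be converted to hertz by multiplying by the bin spacing, for example `(sampleRate / 2) / (spectrum.length - 1)` under the assumption that bins run linearly from 0 Hz to the Nyquist frequency.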
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input spectrum
magnitudeThresholdnumber <optional> 0 peaks below this given threshold are not outputted
maxFrequencynumber <optional> 5000 the maximum frequency of the range to evaluate [Hz]
maxPeaksnumber <optional> 100 the maximum number of returned peaks
minFrequencynumber <optional> 0 the minimum frequency of the range to evaluate [Hz]
orderBystring <optional> frequency the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
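The parabolic interpolation mentioned in the description can be illustrated with a short sketch in plain JavaScript. This is not the Essentia implementation: the function name `findInterpolatedPeaks` and the returned object shape are hypothetical, and the sketch assumes the spectrum covers 0 Hz to `sampleRate / 2`.

```javascript
// Hypothetical helper: picks local maxima and refines each one by fitting
// a parabola through the peak bin and its two neighbours.
function findInterpolatedPeaks(spectrum, sampleRate = 44100, magnitudeThreshold = 0) {
  // Assumption: bin k corresponds to k * (sampleRate / 2) / (spectrum.length - 1) Hz.
  const binToHz = (sampleRate / 2) / (spectrum.length - 1);
  const peaks = [];
  for (let k = 1; k < spectrum.length - 1; k++) {
    const a = spectrum[k - 1];
    const b = spectrum[k];
    const c = spectrum[k + 1];
    if (b > a && b >= c && b >= magnitudeThreshold) {
      const denom = a - 2 * b + c; // strictly negative at a local maximum
      const offset = (0.5 * (a - c)) / denom; // fractional bin offset in [-0.5, 0.5]
      peaks.push({
        frequency: (k + offset) * binToHz,
        magnitude: b - 0.25 * (a - c) * offset,
      });
    }
  }
  return peaks;
}

// Samples of the parabola 10 - (x - 2.3)^2 at x = 1, 2, 3, padded with zeros.
const demoPeaks = findInterpolatedPeaks([0, 8.31, 9.91, 9.51, 0], 8);
console.log(demoPeaks);
```

Because the threshold is compared directly against the bin values, a threshold chosen for linear magnitudes will not behave the same way on dB input, which is the point the description makes.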
-
SpectralWhitening( spectrum, frequencies, magnitudes [, maxFrequency [, sampleRate ] ] ) → {object}
-
Description
Performs spectral whitening of spectral peaks of a spectrum. The algorithm works in dB scale, but the conversion is done by the algorithm so input should be in linear scale. The concept of 'whitening' refers to 'white noise' or a non-zero flat spectrum. It first computes a spectral envelope similar to the 'true envelope' in [1], and then modifies the amplitude of each peak relative to the envelope. For example, the predominant peaks will have a value close to 0dB because they are very close to the envelope. On the other hand, minor peaks between significant peaks will have lower amplitudes such as -30dB. Check https://essentia.upf.edu/reference/std_SpectralWhitening.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the audio linear spectrum
frequenciesVectorFloat the spectral peaks' linear frequencies
magnitudesVectorFloat the spectral peaks' linear magnitudes
maxFrequencynumber <optional> 5000 max frequency to apply whitening to [Hz]
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
Spectrum( frame [, size ] ) → {object}
-
Description
This algorithm computes the magnitude spectrum of an array of Reals. The resulting magnitude spectrum has a size which is half the size of the input array plus one. Bins contain raw (linear) magnitude values. Check https://essentia.upf.edu/reference/std_Spectrum.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input audio frame
sizenumber <optional> 2048 the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)
Returns
Details
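To make the output size concrete, the following sketch computes a magnitude spectrum with a naive O(N^2) DFT in plain JavaScript. It is only an illustration of the "half the input size plus one" relationship; the name `naiveMagnitudeSpectrum` is hypothetical, and the absence of any normalization is an assumption of this sketch rather than a statement about the library.

```javascript
// Hypothetical helper: naive DFT magnitude, kept deliberately simple.
function naiveMagnitudeSpectrum(frame) {
  const n = frame.length;
  const bins = Math.floor(n / 2) + 1; // half the input size plus one
  const out = new Float32Array(bins);
  for (let k = 0; k < bins; k++) {
    let re = 0;
    let im = 0;
    for (let t = 0; t < n; t++) {
      const phase = (-2 * Math.PI * k * t) / n;
      re += frame[t] * Math.cos(phase);
      im += frame[t] * Math.sin(phase);
    }
    out[k] = Math.hypot(re, im); // raw (linear) magnitude
  }
  return out;
}

// An 8-sample cosine with exactly two cycles puts its energy in bin 2.
const demoFrame = Float32Array.from({ length: 8 }, (_, t) => Math.cos((2 * Math.PI * 2 * t) / 8));
const demoSpectrum = naiveMagnitudeSpectrum(demoFrame);
console.log(demoSpectrum.length, demoSpectrum);
```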
-
SpectrumCQ( frame [, binsPerOctave [, minFrequency [, minimumKernelSize [, numberBins [, sampleRate [, scale [, threshold [, windowType [, zeroPhase ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the magnitude of the Constant-Q spectrum. See ConstantQ algorithm for more details. Check https://essentia.upf.edu/reference/std_SpectrumCQ.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input audio frame
binsPerOctavenumber <optional> 12 number of bins per octave
minFrequencynumber <optional> 32.7 minimum frequency [Hz]
minimumKernelSizenumber <optional> 4 minimum size allowed for frequency kernels
numberBinsnumber <optional> 84 number of frequency bins, starting at minFrequency
sampleRatenumber <optional> 44100 FFT sampling rate [Hz]
scalenumber <optional> 1 filters scale. Larger values use longer windows
thresholdnumber <optional> 0.01 bins whose magnitude is below this quantile are discarded
windowTypestring <optional> hann the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
zeroPhaseboolean <optional> true a boolean value that enables zero-phase windowing. Input audio frames should be windowed with the same phase mode
Returns
Details
-
SpectrumToCent( spectrum [, bands [, centBinResolution [, inputSize [, log [, minimumFrequency [, normalize [, sampleRate [, type ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes energy in triangular frequency bands of a spectrum equally spaced on the cent scale. Each band is computed to have a constant wideness in the cent scale. For each band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_SpectrumToCent.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input spectrum (must be greater than size one)
bandsnumber <optional> 720 number of bins to compute. Default is 720 (6 octaves with the default 'centBinResolution')
centBinResolutionnumber <optional> 10 Width of each band in cents. Default is 10 cents
inputSizenumber <optional> 32768 the size of the spectrum
logboolean <optional> true compute log-energies (log10 (1 + energy))
minimumFrequencynumber <optional> 164 central frequency of the first band of the bank [Hz]
normalizestring <optional> unit_sum normalize each triangle to unit area or to a vertex equal to 1
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
typestring <optional> power use magnitude or power spectrum
Returns
Details
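The relationship between `bands`, `centBinResolution` and `minimumFrequency` can be checked with a small calculation. The sketch below is plain JavaScript and only computes the central frequency of each band under the usual definition of a cent (1200 cents per octave); the function name `centBandFrequencies` is hypothetical and the code does not reproduce the filtering itself.

```javascript
// Hypothetical helper: central frequency of each band on the cent scale.
function centBandFrequencies(bands = 720, centBinResolution = 10, minimumFrequency = 164) {
  const freqs = new Array(bands);
  for (let k = 0; k < bands; k++) {
    // Moving up by 1200 cents doubles the frequency.
    freqs[k] = minimumFrequency * Math.pow(2, (k * centBinResolution) / 1200);
  }
  return freqs;
}

const centFreqs = centBandFrequencies();
const octavesCovered = (720 * 10) / 1200; // 6 octaves with the defaults
console.log(centFreqs[0], centFreqs[120], octavesCovered);
```

With the defaults, band 120 sits one octave above the first band, and the 720 bands together span the six octaves mentioned in the `bands` description.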
-
Spline( x [, beta1 [, beta2 [, type [, xPoints [, yPoints ] ] ] ] ] ) → {object}
-
Description
Evaluates a piecewise spline of type b, beta or quadratic. The input value, i.e. the point at which the spline is to be evaluated, should typically be between xPoints[0] and xPoints[size-1]. If the value lies outside this range, extrapolation is used. Regarding spline types: - B: evaluates a cubic B spline approximant. - Beta: evaluates a cubic beta spline approximant. For beta splines, parameters 'beta1' and 'beta2' can be supplied. For no bias set beta1 to 1 and for no tension set beta2 to 0. Note that if beta1=1 and beta2=0, the cubic beta becomes a cubic B spline. On the other hand, if beta1=1 and beta2 is large, the beta spline turns into a linear spline. - Quadratic: evaluates a piecewise quadratic spline at a point. Note that the size of the input must be odd. Check https://essentia.upf.edu/reference/std_Spline.html for more details.
Parameters
Name Type Attributes Default Description xnumber the input coordinate (x-axis)
beta1number <optional> 1 the skew or bias parameter (only available for type beta)
beta2number <optional> 0 the tension parameter
typestring <optional> b the type of spline to be computed
xPointsArray.<any> <optional> [0, 1] the x-coordinates where data is specified (the points must be arranged in ascending order and cannot contain duplicates)
yPointsArray.<any> <optional> [0, 1] the y-coordinates to be interpolated (i.e. the known data)
Returns
Details
-
SprModelAnal( frame [, fftSize [, freqDevOffset [, freqDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, orderBy [, sampleRate ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the sinusoidal plus residual model analysis. Check https://essentia.upf.edu/reference/std_SprModelAnal.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input frame
fftSizenumber <optional> 2048 the size of the internal FFT (full spectrum size)
freqDevOffsetnumber <optional> 20 minimum frequency deviation at 0Hz
freqDevSlopenumber <optional> 0.01 slope increase of minimum frequency deviation
hopSizenumber <optional> 512 the hop size between frames
magnitudeThresholdnumber <optional> 0 peaks below this given threshold are not outputted
maxFrequencynumber <optional> 5000 the maximum frequency of the range to evaluate [Hz]
maxPeaksnumber <optional> 100 the maximum number of returned peaks
maxnSinesnumber <optional> 100 maximum number of sines per frame
minFrequencynumber <optional> 0 the minimum frequency of the range to evaluate [Hz]
orderBystring <optional> frequency the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
SprModelSynth( magnitudes, frequencies, phases, res [, fftSize [, hopSize [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm computes the sinusoidal plus residual model synthesis from SPR model analysis. Check https://essentia.upf.edu/reference/std_SprModelSynth.html for more details.
Parameters
Name Type Attributes Default Description magnitudesVectorFloat the magnitudes of the sinusoidal peaks
frequenciesVectorFloat the frequencies of the sinusoidal peaks [Hz]
phasesVectorFloat the phases of the sinusoidal peaks
resVectorFloat the residual frame
fftSizenumber <optional> 2048 the size of the output FFT frame (full spectrum size)
hopSizenumber <optional> 512 the hop size between frames
sampleRatenumber <optional> 44100 the audio sampling rate [Hz]
Returns
Details
-
SpsModelAnal( frame [, fftSize [, freqDevOffset [, freqDevSlope [, hopSize [, magnitudeThreshold [, maxFrequency [, maxPeaks [, maxnSines [, minFrequency [, orderBy [, sampleRate [, stocf ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes the sinusoidal plus stochastic model analysis. Check https://essentia.upf.edu/reference/std_SpsModelAnal.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input frame
fftSizenumber <optional> 2048 the size of the internal FFT (full spectrum size)
freqDevOffsetnumber <optional> 20 minimum frequency deviation at 0Hz
freqDevSlopenumber <optional> 0.01 slope increase of minimum frequency deviation
hopSizenumber <optional> 512 the hop size between frames
magnitudeThresholdnumber <optional> 0 peaks below this given threshold are not outputted
maxFrequencynumber <optional> 5000 the maximum frequency of the range to evaluate [Hz]
maxPeaksnumber <optional> 100 the maximum number of returned peaks
maxnSinesnumber <optional> 100 maximum number of sines per frame
minFrequencynumber <optional> 0 the minimum frequency of the range to evaluate [Hz]
orderBystring <optional> frequency the ordering type of the outputted peaks (ascending by frequency or descending by magnitude)
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
stocfnumber <optional> 0.2 decimation factor used for the stochastic approximation
Returns
Details
-
SpsModelSynth( magnitudes, frequencies, phases, stocenv [, fftSize [, hopSize [, sampleRate [, stocf ] ] ] ] ) → {object}
-
Description
This algorithm computes the sinusoidal plus stochastic model synthesis from SPS model analysis. Check https://essentia.upf.edu/reference/std_SpsModelSynth.html for more details.
Parameters
Name Type Attributes Default Description magnitudesVectorFloat the magnitudes of the sinusoidal peaks
frequenciesVectorFloat the frequencies of the sinusoidal peaks [Hz]
phasesVectorFloat the phases of the sinusoidal peaks
stocenvVectorFloat the stochastic envelope
fftSizenumber <optional> 2048 the size of the output FFT frame (full spectrum size)
hopSizenumber <optional> 512 the hop size between frames
sampleRatenumber <optional> 44100 the audio sampling rate [Hz]
stocfnumber <optional> 0.2 decimation factor used for the stochastic approximation
Returns
Details
-
StartStopCut( audio [, frameSize [, hopSize [, maximumStartTime [, maximumStopTime [, sampleRate [, threshold ] ] ] ] ] ] ) → {object}
-
Description
This algorithm reports whether there is a cut at the beginning or at the end of the audio by locating the first and last non-silent frames and comparing their positions to the actual beginning and end of the audio. The input audio is considered to be cut at the beginning (or the end), and the corresponding flag is activated, if the first (last) non-silent frame occurs before (after) the configurable time threshold. Check https://essentia.upf.edu/reference/std_StartStopCut.html for more details.
Parameters
Name Type Attributes Default Description audioVectorFloat the input audio
frameSizenumber <optional> 256 the frame size for the internal power analysis
hopSizenumber <optional> 256 the hop size for the internal power analysis
maximumStartTimenumber <optional> 10 if the first non-silent frame occurs before maximumStartTime, startCut is activated [ms]
maximumStopTimenumber <optional> 10 if the last non-silent frame occurs within maximumStopTime of the end, stopCut is activated [ms]
sampleRatenumber <optional> 44100 the sample rate
thresholdnumber <optional> -60 the threshold below which average energy is defined as silence [dB]
Returns
Details
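The logic described above can be sketched in plain JavaScript as follows. This is an illustration under stated assumptions, not the library's code: the function names, the boolean return shape, the handling of an all-silent input (both flags false) and the choice to ignore a trailing partial frame are all hypothetical.

```javascript
// Mean power of one frame in dB, floored to avoid log10(0).
function framePowerDb(audio, start, frameSize) {
  let sum = 0;
  for (let i = start; i < start + frameSize; i++) sum += audio[i] * audio[i];
  return 10 * Math.log10(Math.max(sum / frameSize, 1e-12));
}

// Hypothetical helper mirroring the documented parameters.
function detectCuts(audio, {
  frameSize = 256, hopSize = 256, maximumStartTime = 10,
  maximumStopTime = 10, sampleRate = 44100, threshold = -60,
} = {}) {
  let firstSample = -1; // start of the first non-silent frame
  let lastSample = -1;  // end of the last non-silent frame
  for (let start = 0; start + frameSize <= audio.length; start += hopSize) {
    if (framePowerDb(audio, start, frameSize) > threshold) {
      if (firstSample < 0) firstSample = start;
      lastSample = start + frameSize;
    }
  }
  if (firstSample < 0) return { startCut: false, stopCut: false }; // assumption: silence is not a cut
  const startMs = (1000 * firstSample) / sampleRate;
  const stopMs = (1000 * (audio.length - lastSample)) / sampleRate;
  return { startCut: startMs < maximumStartTime, stopCut: stopMs < maximumStopTime };
}

// Half a second of silence followed by sound that runs to the very end.
const demoAudio = new Float32Array(44100).fill(0.5, 22050);
console.log(detectCuts(demoAudio)); // the sound starts late but ends abruptly
```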
-
StartStopSilence( frame [, threshold ] ) → {object}
-
Description
This algorithm outputs the frame at which sound begins and the frame at which sound ends. Check https://essentia.upf.edu/reference/std_StartStopSilence.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input audio frames
thresholdnumber <optional> -60 the threshold below which average energy is defined as silence [dB]
Returns
Details
-
StochasticModelAnal( frame [, fftSize [, hopSize [, sampleRate [, stocf ] ] ] ] ) → {object}
-
Description
This algorithm computes the stochastic model analysis. It gets the resampled spectral envelope of the stochastic component. Check https://essentia.upf.edu/reference/std_StochasticModelAnal.html for more details.
Parameters
Name Type Attributes Default Description frameVectorFloat the input frame
fftSizenumber <optional> 2048 the size of the internal FFT (full spectrum size)
hopSizenumber <optional> 512 the hop size between frames
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
stocfnumber <optional> 0.2 decimation factor used for the stochastic approximation
Returns
Details
-
StochasticModelSynth( stocenv [, fftSize [, hopSize [, sampleRate [, stocf ] ] ] ] ) → {object}
-
Description
This algorithm computes the stochastic model synthesis. It generates the noisy spectrum from a resampled spectral envelope of the stochastic component. Check https://essentia.upf.edu/reference/std_StochasticModelSynth.html for more details.
Parameters
Name Type Attributes Default Description stocenvVectorFloat the stochastic envelope input
fftSizenumber <optional> 2048 the size of the internal FFT (full spectrum size)
hopSizenumber <optional> 512 the hop size between frames
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
stocfnumber <optional> 0.2 decimation factor used for the stochastic approximation
Returns
Details
-
StrongDecay( signal [, sampleRate ] ) → {object}
-
Description
This algorithm computes the Strong Decay of an audio signal. The Strong Decay is built from the non-linear combination of the signal energy and the signal temporal centroid, the latter being the balance of the absolute value of the signal. A signal containing a temporal centroid near its start boundary and a strong energy is said to have a strong decay. Check https://essentia.upf.edu/reference/std_StrongDecay.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input audio signal
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
-
StrongPeak( spectrum ) → {object}
-
Description
This algorithm computes the Strong Peak of a spectrum. The Strong Peak is defined as the ratio between the spectrum's maximum peak's magnitude and the "bandwidth" of the peak above a threshold (half its amplitude). This ratio reveals whether the spectrum presents a very "pronounced" maximum peak (i.e. the thinner and the higher the maximum of the spectrum is, the higher the ratio value). Check https://essentia.upf.edu/reference/std_StrongPeak.html for more details.
Parameters
Name Type Description spectrumVectorFloat the input spectrum (must be greater than one element and cannot contain negative values)
Returns
Details
-
SuperFluxExtractor( signal [, combine [, frameSize [, hopSize [, ratioThreshold [, sampleRate [, threshold ] ] ] ] ] ] ) → {object}
-
Description
This algorithm detects onsets given an audio signal using the SuperFlux algorithm. This implementation is based on the available reference implementation in Python [2]. The algorithm computes the spectrum of the input signal, summarizes it into triangular band energies, and computes an onset detection function based on spectral flux tracking spectral trajectories with a maximum filter (SuperFluxNovelty). The peaks of the function are then detected (SuperFluxPeaks). Check https://essentia.upf.edu/reference/std_SuperFluxExtractor.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the audio input signal
combinenumber <optional> 20 time threshold for double onsets detections (ms)
frameSizenumber <optional> 2048 the frame size for computing low-level features
hopSizenumber <optional> 256 the hop size for computing low-level features
ratioThresholdnumber <optional> 16 ratio threshold for peak picking with respect to novelty_signal/novelty_average rate, use 0 to disable it (for low-energy onsets)
sampleRatenumber <optional> 44100 the audio sampling rate [Hz]
thresholdnumber <optional> 0.05 threshold for peak picking with respect to the difference between novelty_signal and average_signal (for onsets in ambient noise)
Returns
Details
-
SuperFluxNovelty( bands [, binWidth [, frameWidth ] ] ) → {object}
-
Description
Onset detection function for the SuperFlux algorithm. See SuperFluxExtractor for more details. Check https://essentia.upf.edu/reference/std_SuperFluxNovelty.html for more details.
Parameters
Name Type Attributes Default Description bandsVectorVectorFloat the input bands spectrogram
binWidthnumber <optional> 3 filter width (number of frequency bins)
frameWidthnumber <optional> 2 differentiation offset (compute the difference with the N-th previous frame)
Returns
Details
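The roles of `binWidth` and `frameWidth` can be illustrated with a simplified sketch in plain JavaScript. It is an assumption-laden illustration of the general SuperFlux idea (maximum filter along frequency, then positive difference against an earlier frame), not the library's code: the function names are hypothetical, and returning one value per frame is a choice made for this sketch.

```javascript
// Maximum filter along the frequency axis with the given width (in bins).
function maxFilterBins(row, width) {
  const half = Math.floor(width / 2);
  return row.map((_, b) => {
    let m = -Infinity;
    const from = Math.max(0, b - half);
    const to = Math.min(row.length - 1, b + half);
    for (let j = from; j <= to; j++) m = Math.max(m, row[j]);
    return m;
  });
}

// Hypothetical helper: positive spectral difference against the
// max-filtered frame that lies frameWidth frames in the past.
function superFluxNoveltySketch(bands, binWidth = 3, frameWidth = 2) {
  const novelty = [];
  for (let t = frameWidth; t < bands.length; t++) {
    const reference = maxFilterBins(bands[t - frameWidth], binWidth);
    let sum = 0;
    for (let b = 0; b < bands[t].length; b++) {
      const diff = bands[t][b] - reference[b];
      if (diff > 0) sum += diff; // only energy increases count
    }
    novelty.push(sum);
  }
  return novelty;
}

// Energy appears in the middle band at frame 2 and then stays constant.
const demoBands = [[0, 0, 0], [0, 0, 0], [0, 1, 0], [0, 1, 0], [0, 1, 0]];
const demoNovelty = superFluxNoveltySketch(demoBands);
console.log(demoNovelty);
```

The maximum filter widens each reference bin so that a partial that drifts slightly in frequency between frames is not mistaken for a new onset.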
-
SuperFluxPeaks( novelty [, combine [, frameRate [, pre_avg [, pre_max [, ratioThreshold [, threshold ] ] ] ] ] ] ) → {object}
-
Description
This algorithm detects peaks of an onset detection function computed by the SuperFluxNovelty algorithm. See SuperFluxExtractor for more details. Check https://essentia.upf.edu/reference/std_SuperFluxPeaks.html for more details.
Parameters
Name Type Attributes Default Description noveltyVectorFloat the input onset detection function
combinenumber <optional> 30 time threshold for double onsets detections (ms)
frameRatenumber <optional> 172 the frame rate of the input onset detection function
pre_avgnumber <optional> 100 look back duration for moving average filter [ms]
pre_maxnumber <optional> 30 look back duration for moving maximum filter [ms]
ratioThresholdnumber <optional> 16 ratio threshold for peak picking with respect to novelty_signal/novelty_average rate, use 0 to disable it (for low-energy onsets)
thresholdnumber <optional> 0.05 threshold for peak picking with respect to the difference between novelty_signal and average_signal (for onsets in ambient noise)
Returns
Details
-
TCToTotal( envelope ) → {object}
-
Description
This algorithm calculates the ratio of the temporal centroid to the total length of a signal envelope. This ratio shows how the sound is 'balanced'. Its value is close to 0 if most of the energy lies at the beginning of the sound (e.g. decrescendo or impulsive sounds), close to 0.5 if the sound is symmetric (e.g. 'delta unvarying' sounds), and close to 1 if most of the energy lies at the end of the sound (e.g. crescendo sounds). Check https://essentia.upf.edu/reference/std_TCToTotal.html for more details.
Parameters
Name Type Description envelopeVectorFloat the envelope of the signal (its length must be greater than 1)
Returns
Details
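The three reference values in the description (0, 0.5 and 1) can be reproduced with a short sketch in plain JavaScript. The function name `tcToTotalSketch` is hypothetical, and dividing the centroid index by `length - 1` is an assumption about how "total length" is measured.

```javascript
// Hypothetical helper: temporal centroid (in samples) divided by the
// index of the last sample, so the result lies in [0, 1].
function tcToTotalSketch(envelope) {
  if (envelope.length < 2) throw new RangeError("envelope length must be greater than 1");
  let weighted = 0;
  let total = 0;
  for (let i = 0; i < envelope.length; i++) {
    weighted += i * envelope[i];
    total += envelope[i];
  }
  if (total === 0) throw new RangeError("envelope must not sum to zero");
  return weighted / total / (envelope.length - 1);
}

console.log(tcToTotalSketch([1, 0, 0, 0, 0])); // energy at the start
console.log(tcToTotalSketch([1, 1, 1, 1, 1])); // symmetric envelope
console.log(tcToTotalSketch([0, 0, 0, 0, 1])); // energy at the end
```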
-
TempoScaleBands( bands [, bandsGain [, frameTime ] ] ) → {object}
-
Description
This algorithm computes features for tempo tracking to be used with the TempoTap algorithm. See standard_rhythmextractor_tempotap in examples folder. Check https://essentia.upf.edu/reference/std_TempoScaleBands.html for more details.
Parameters
Name Type Attributes Default Description bandsVectorFloat the audio power spectrum divided into bands
bandsGainArray.<any> <optional> [2, 3, 2, 1, 1.2, 2, 3, 2.5] gain for each band
frameTimenumber <optional> 512 the frame rate in samples
Returns
Details
-
TempoTap( featuresFrame [, frameHop [, frameSize [, maxTempo [, minTempo [, numberFrames [, sampleRate [, tempoHints ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the periods and phases of a periodic signal, represented by a sequence of values of any number of detection functions, such as energy bands, onset locations, etc. It must be run sequentially on a vector of such values ("featuresFrame") for each audio frame in order to get the estimations for that frame. The estimations are done for each detection function separately, utilizing the latest "frameHop" frames, including the present one, to compute autocorrelation. Empty estimations will be returned until enough frames are accumulated in the algorithm's buffer. The algorithm uses elements of the following beat-tracking methods: - BeatIt, elaborated by Fabien Gouyon and Simon Dixon (input features) [1] - Multi-comb filter with Rayleigh weighting, Mathew Davies [2] Check https://essentia.upf.edu/reference/std_TempoTap.html for more details.
Parameters
Name Type Attributes Default Description featuresFrameVectorFloat input temporal features of a frame
frameHopnumber <optional> 1024 number of feature frames separating two evaluations
frameSizenumber <optional> 256 number of audio samples in a frame
maxTemponumber <optional> 208 fastest tempo allowed to be detected [bpm]
minTemponumber <optional> 40 slowest tempo allowed to be detected [bpm]
numberFramesnumber <optional> 1024 number of feature frames to buffer on
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
tempoHintsArray.<any> <optional> [] optional list of initial beat locations, to favor the detection of pre-determined tempo period and beats alignment [s]
Returns
Details
-
TempoTapDegara( onsetDetections [, maxTempo [, minTempo [, resample [, sampleRateODF ] ] ] ] ) → {object}
-
Description
This algorithm estimates beat positions given an onset detection function. The detection function is partitioned into 6-second frames with a 1.5-second increment, and the autocorrelation is computed for each frame and weighted by a tempo preference curve [2]. Periodicity estimations are done frame-wise, searching for the best match with the Viterbi algorithm [3]. The estimated periods are then passed to the probabilistic beat tracking algorithm [1], which computes beat positions. Check https://essentia.upf.edu/reference/std_TempoTapDegara.html for more details.
Parameters
Name Type Attributes Default Description onsetDetectionsVectorFloat the input frame-wise vector of onset detection values
maxTemponumber <optional> 208 fastest tempo allowed to be detected [bpm]
minTemponumber <optional> 40 slowest tempo allowed to be detected [bpm]
resamplestring <optional> none use upsampling of the onset detection function (may increase accuracy)
sampleRateODFnumber <optional> 86.1328 the sampling rate of the onset detection function [Hz]
Returns
Details
-
TempoTapMaxAgreement( tickCandidates ) → {object}
-
Description
This algorithm outputs beat positions and confidence of their estimation based on the maximum mutual agreement between beat candidates estimated by different beat trackers (or using different features). Check https://essentia.upf.edu/reference/std_TempoTapMaxAgreement.html for more details.
Parameters
Name Type Description tickCandidatesVectorVectorFloat the tick candidates estimated using different beat trackers (or features) [s]
Returns
Details
-
TempoTapTicks( periods, phases [, frameHop [, hopSize [, sampleRate ] ] ] ) → {object}
-
Description
This algorithm builds the list of ticks from the period and phase candidates given by the TempoTap algorithm. Check https://essentia.upf.edu/reference/std_TempoTapTicks.html for more details.
Parameters
Name Type Attributes Default Description periodsVectorFloat tempo period candidates for the current frame, in frames
phasesVectorFloat tempo ticks phase candidates for the current frame, in frames
frameHopnumber <optional> 512 number of feature frames separating two evaluations
hopSizenumber <optional> 256 number of audio samples per features
sampleRatenumber <optional> 44100 sampling rate of the audio signal [Hz]
Returns
Details
-
TensorflowInputMusiCNN( frame ) → {object}
-
Description
This algorithm computes mel-bands with a particular parametrization specific to MusiCNN based models. Check https://essentia.upf.edu/reference/std_TensorflowInputMusiCNN.html for more details.
Parameters
Name Type Description frameVectorFloat the audio frame
Returns
Details
-
TensorflowInputVGGish( frame ) → {object}
-
Description
This algorithm computes mel-bands with a particular parametrization specific to VGGish based models. Check https://essentia.upf.edu/reference/std_TensorflowInputVGGish.html for more details.
Parameters
Name Type Description frameVectorFloat the audio frame
Returns
Details
-
TonalExtractor( signal [, frameSize [, hopSize [, tuningFrequency ] ] ] ) → {object}
-
Description
This algorithm computes tonal features for an audio signal. Check https://essentia.upf.edu/reference/std_TonalExtractor.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the audio input signal
frameSizenumber <optional> 4096 the frame size for computing tonal features
hopSizenumber <optional> 2048 the hop size for computing tonal features
tuningFrequencynumber <optional> 440 the tuning frequency of the input signal
Returns
Details
-
TonicIndianArtMusic( signal [, binResolution [, frameSize [, harmonicWeight [, hopSize [, magnitudeCompression [, magnitudeThreshold [, maxTonicFrequency [, minTonicFrequency [, numberHarmonics [, numberSaliencePeaks [, referenceFrequency [, sampleRate ] ] ] ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the tonic frequency of the lead artist in Indian art music. It uses a multipitch representation of the audio signal (pitch salience) to compute a histogram, in which the tonic is identified as one of the peaks. The decision is based on the distances between the prominent peaks, and the classification is done using a decision tree. Check https://essentia.upf.edu/reference/std_TonicIndianArtMusic.html for more details.
Parameters
Name Type Attributes Default Description signalVectorFloat the input signal
binResolutionnumber <optional> 10 salience function bin resolution [cents]
frameSizenumber <optional> 2048 the frame size for computing pitch salience
harmonicWeightnumber <optional> 0.85 harmonic weighting parameter (weight decay ratio between two consecutive harmonics, =1 for no decay)
hopSizenumber <optional> 512 the hop size with which the pitch salience function was computed
magnitudeCompressionnumber <optional> 1 magnitude compression parameter (=0 for maximum compression, =1 for no compression)
magnitudeThresholdnumber <optional> 40 peak magnitude threshold (maximum allowed difference from the highest peak in dBs)
maxTonicFrequencynumber <optional> 375 the maximum allowed tonic frequency [Hz]
minTonicFrequencynumber <optional> 100 the minimum allowed tonic frequency [Hz]
numberHarmonicsnumber <optional> 20 number of considered harmonics
numberSaliencePeaksnumber <optional> 5 number of top peaks of the salience function which should be considered for constructing histogram
referenceFrequencynumber <optional> 55 the reference frequency for Hertz to cent conversion [Hz], corresponding to the 0th cent bin
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
Returns
Details
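Two of the parameters above lend themselves to a quick numerical check: the Hertz to cent conversion governed by `referenceFrequency` and `binResolution`, and the geometric decay implied by `harmonicWeight`. The sketch below is plain JavaScript with hypothetical function names; rounding to the nearest bin is an assumption of this sketch, not a documented behaviour.

```javascript
// Hypothetical helper: index of the cent bin that a frequency falls into,
// with bin 0 located at referenceFrequency.
function hzToCentBin(frequency, referenceFrequency = 55, binResolution = 10) {
  const cents = 1200 * Math.log2(frequency / referenceFrequency);
  return Math.round(cents / binResolution); // assumption: nearest bin
}

// Hypothetical helper: weight of each harmonic when every harmonic is
// harmonicWeight times the weight of the previous one.
function harmonicWeightSeries(numberHarmonics = 20, harmonicWeight = 0.85) {
  return Array.from({ length: numberHarmonics }, (_, h) => Math.pow(harmonicWeight, h));
}

console.log(hzToCentBin(55), hzToCentBin(110)); // reference bin and one octave above it
console.log(harmonicWeightSeries(4));
```

With the defaults, one octave corresponds to 120 bins, and setting `harmonicWeight` to 1 makes every entry of the series equal to 1, which matches the "=1 for no decay" note.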
-
TriangularBands( spectrum [, frequencyBands [, inputSize [, log [, normalize [, sampleRate [, type [, weighting ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes energy in triangular frequency bands of a spectrum. An arbitrary number of overlapping bands can be specified. For each band the power-spectrum (mag-squared) is summed. Check https://essentia.upf.edu/reference/std_TriangularBands.html for more details.
Parameters
Name Type Attributes Default Description spectrumVectorFloat the input spectrum (must be greater than size one)
frequencyBandsArray.<any> <optional> [21.533203125, 43.06640625, 64.599609375, 86.1328125, 107.666015625, 129.19921875, 150.732421875, 172.265625, 193.798828125, 215.33203125, 236.865234375, 258.3984375, 279.931640625, 301.46484375, 322.998046875, 344.53125, 366.064453125, 387.59765625, 409.130859375, 430.6640625, 452.197265625, 473.73046875, 495.263671875, 516.796875, 538.330078125, 559.86328125, 581.396484375, 602.9296875, 624.462890625, 645.99609375, 667.529296875, 689.0625, 710.595703125, 732.12890625, 753.662109375, 775.1953125, 796.728515625, 839.794921875, 861.328125, 882.861328125, 904.39453125, 925.927734375, 968.994140625, 990.52734375, 1012.06054688, 1055.12695312, 1076.66015625, 1098.19335938, 1141.25976562, 1184.32617188, 1205.859375, 1248.92578125, 1270.45898438, 1313.52539062, 1356.59179688, 1399.65820312, 1442.72460938, 1485.79101562, 1528.85742188, 1571.92382812, 1614.99023438, 1658.05664062, 1701.12304688, 1765.72265625, 1808.7890625, 1873.38867188, 1916.45507812, 1981.0546875, 2024.12109375, 2088.72070312, 2153.3203125, 2217.91992188, 2282.51953125, 2347.11914062, 2411.71875, 2497.8515625, 2562.45117188, 2627.05078125, 2713.18359375, 2799.31640625, 2885.44921875, 2950.04882812, 3036.18164062, 3143.84765625, 3229.98046875, 3316.11328125, 3423.77929688, 3509.91210938, 3617.578125, 3725.24414062, 3832.91015625, 3940.57617188, 4069.77539062, 4177.44140625, 4306.640625, 4435.83984375, 4565.0390625, 4694.23828125, 4844.97070312, 4974.16992188, 5124.90234375, 5275.63476562, 5426.3671875, 5577.09960938, 5749.36523438, 5921.63085938, 6093.89648438, 6266.16210938, 6459.9609375, 6653.75976562, 6847.55859375, 7041.35742188, 7256.68945312, 7450.48828125, 7687.35351562, 7902.68554688, 8139.55078125, 8376.41601562, 8613.28125, 8871.6796875, 9130.078125, 9388.4765625, 9668.40820312, 9948.33984375, 10249.8046875, 10551.2695312, 10852.734375, 11175.7324219, 11498.7304688, 11843.2617188, 12187.7929688, 12553.8574219, 12919.921875, 13285.9863281, 13673.5839844, 
14082.7148438, 14491.8457031, 14922.5097656, 15353.1738281, 15805.3710938, 16257.5683594] list of frequency ranges into which the spectrum is divided (these must be in ascending order and cannot contain duplicates); each triangle is built as x(i-1)=0, x(i)=1, x(i+1)=0 over i, and the resulting number of bands is the size of the input array minus 2
inputSizenumber <optional> 1025 the size of the spectrum
logboolean <optional> true compute log-energies (log10 (1 + energy))
normalizestring <optional> unit_sum spectrum bin weights to use for each triangular band: 'unit_max' to make each triangle vertex equal to 1, 'unit_sum' to make each triangle area equal to 1 summing the actual weights of spectrum bins, 'unit_area' to make each triangle area equal to 1 normalizing the weights of each triangle by its bandwidth
sampleRatenumber <optional> 44100 the sampling rate of the audio signal [Hz]
typestring <optional> power use magnitude or power spectrum
weightingstring <optional> linear type of weighting function for determining triangle area
Returns
Details
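The triangle construction described for `frequencyBands` (zero at the two neighbouring edges, one at the centre, and two fewer bands than edges) can be sketched in plain JavaScript. This is a simplified illustration: the function name is hypothetical, and the sketch applies no normalization, no log compression and no weighting options.

```javascript
// Hypothetical helper: sums the power spectrum under each triangle
// defined by three consecutive entries of frequencyBands.
function triangularBandEnergies(spectrum, frequencyBands, sampleRate = 44100) {
  // Assumption: the spectrum covers 0 Hz to sampleRate / 2.
  const binHz = (sampleRate / 2) / (spectrum.length - 1);
  const energies = [];
  for (let i = 1; i < frequencyBands.length - 1; i++) {
    const lo = frequencyBands[i - 1];
    const mid = frequencyBands[i];
    const hi = frequencyBands[i + 1];
    let energy = 0;
    for (let k = 0; k < spectrum.length; k++) {
      const f = k * binHz;
      let weight = 0;
      if (f > lo && f <= mid) weight = (f - lo) / (mid - lo);      // rising edge
      else if (f > mid && f < hi) weight = (hi - f) / (hi - mid);  // falling edge
      energy += weight * spectrum[k] * spectrum[k]; // power = magnitude squared
    }
    energies.push(energy);
  }
  return energies;
}

// Five bins at 0..4 Hz, five band edges, so three triangular bands.
const demoEnergies = triangularBandEnergies([0, 2, 0, 0, 0], [0, 1, 2, 3, 4], 8);
console.log(demoEnergies);
```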
-
TriangularBarkBands( spectrum [, highFrequencyBound [, inputSize [, log [, lowFrequencyBound [, normalize [, numberBands [, sampleRate [, type [, weighting ] ] ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm computes energy in the bark bands of a spectrum. It differs from the regular BarkBands algorithm in that it is more configurable, so that it can be used in the BFCC algorithm to produce output similar to Rastamat (http://www.ee.columbia.edu/ln/rosa/matlab/rastamat/). See the BFCC algorithm documentation for more information as to why you might want to choose this over Mel frequency analysis. It is recommended that the input "spectrum" be calculated by the Spectrum algorithm. Check https://essentia.upf.edu/reference/std_TriangularBarkBands.html for more details.
Parameters
Name Type Attributes Default Description
spectrum VectorFloat the audio spectrum
highFrequencyBound number <optional> 22050 an upper-bound limit for the frequencies to be included in the bands
inputSize number <optional> 1025 the size of the spectrum
log boolean <optional> false compute log-energies (log10 (1 + energy))
lowFrequencyBound number <optional> 0 a lower-bound limit for the frequencies to be included in the bands
normalize string <optional> unit_sum 'unit_max' makes the vertex of all the triangles equal to 1, 'unit_sum' makes the area of all the triangles equal to 1
numberBands number <optional> 24 the number of output bands
sampleRate number <optional> 44100 the sample rate
type string <optional> power 'power' to output squared units, 'magnitude' to keep it as the input
weighting string <optional> warping type of weighting function for determining triangle area
Returns
Details
-
Trimmer( signal [, checkRange [, endTime [, sampleRate [, startTime ] ] ] ] ) → {object}
-
Description
This algorithm extracts a segment of an audio signal given its start and end times. Giving "startTime" greater than "endTime" will raise an exception. Check https://essentia.upf.edu/reference/std_Trimmer.html for more details.
Parameters
Name Type Attributes Default Description
signal VectorFloat the input signal
checkRange boolean <optional> false check whether the specified time range for a slice fits the size of input signal (throw exception if not)
endTime number <optional> 1e+06 the end time of the slice you want to extract [s]
sampleRate number <optional> 44100 the sampling rate of the input audio signal [Hz]
startTime number <optional> 0 the start time of the slice you want to extract [s]
Returns
Details
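The time-to-sample mapping described for Trimmer can be illustrated with a small plain-JavaScript sketch. This is not the essentia.js implementation: the function name `trimSignal`, the options object, and the use of `Math.floor` to turn times into sample indices are assumptions made for illustration only.

```javascript
// Plain-JavaScript sketch of time-based trimming; NOT the essentia.js implementation.
// Assumption: a time in seconds maps to the sample index Math.floor(time * sampleRate).
function trimSignal(signal, { startTime = 0, endTime = 1e6, sampleRate = 44100, checkRange = false } = {}) {
  if (startTime > endTime) {
    throw new RangeError("startTime must not be greater than endTime");
  }
  const startIndex = Math.floor(startTime * sampleRate);
  const endIndex = Math.floor(endTime * sampleRate);
  if (checkRange && (startIndex > signal.length || endIndex > signal.length)) {
    throw new RangeError("requested time range does not fit the input signal");
  }
  // Clamp indices that run past the end of the signal.
  const from = Math.min(startIndex, signal.length);
  const to = Math.min(endIndex, signal.length);
  return signal.slice(from, to);
}

// Ten samples at a hypothetical 10 Hz rate, trimmed to the range 0.2 s to 0.5 s.
const ramp = Float32Array.from({ length: 10 }, (_, i) => i);
const segment = trimSignal(ramp, { startTime: 0.2, endTime: 0.5, sampleRate: 10 });
console.log(Array.from(segment)); // [ 2, 3, 4 ]
```

With the default endTime of 1e+06 seconds and checkRange left at false, the sketch simply returns everything from startTime to the end of the signal.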
-
Tristimulus( frequencies, magnitudes ) → {object}
-
Description
This algorithm calculates the tristimulus of a signal given its harmonic peaks. The tristimulus has been introduced as a timbre equivalent to the color attributes in the vision. Tristimulus measures the mixture of harmonics in a given sound, grouped into three sections. The first tristimulus measures the relative weight of the first harmonic; the second tristimulus measures the relative weight of the second, third, and fourth harmonics taken together; and the third tristimulus measures the relative weight of all the remaining harmonics. Check https://essentia.upf.edu/reference/std_Tristimulus.html for more details.
Parameters
Name Type Description
frequencies VectorFloat the frequencies of the harmonic peaks ordered by frequency
magnitudes VectorFloat the magnitudes of the harmonic peaks ordered by frequency
Returns
Details
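The three tristimulus values described above can be written out directly from the definition. The sketch below is plain JavaScript, not the essentia.js implementation; the function name `tristimulus` is hypothetical, and returning zeros for an all-zero input is an assumption.

```javascript
// Plain-JavaScript sketch of the tristimulus definition; NOT the essentia.js implementation.
// Assumption: magnitudes[0] is the first harmonic and the array is ordered by frequency.
function tristimulus(magnitudes) {
  const total = magnitudes.reduce((acc, m) => acc + m, 0);
  if (total === 0) return [0, 0, 0]; // assumption: silent input maps to zeros

  // Sum of magnitudes[from .. to-1], truncated to the array length.
  const sumRange = (from, to) => {
    let s = 0;
    for (let i = from; i < Math.min(to, magnitudes.length); i++) s += magnitudes[i];
    return s;
  };

  return [
    sumRange(0, 1) / total,                 // first harmonic
    sumRange(1, 4) / total,                 // harmonics 2, 3 and 4 together
    sumRange(4, magnitudes.length) / total, // all remaining harmonics
  ];
}

const [t1, t2, t3] = tristimulus([4, 2, 1, 1, 1, 1]);
console.log(t1, t2, t3); // 0.4 0.4 0.2
```

Because each value is a share of the same total, the three values sum to 1 for any non-silent input.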
-
TruePeakDetector( signal [, blockDC [, emphasise [, oversamplingFactor [, quality [, sampleRate [, threshold [, version ] ] ] ] ] ] ] ) → {object}
-
Description
This algorithm implements a “true-peak” level meter for clipping detection. According to the ITU-R recommendations, “true-peak” values exceeding the full-scale range are potential sources of “clipping in subsequent processes, such as within particular D/A converters or during sample-rate conversion”. The ITU-R BS.1770-4[1] (by default) and the ITU-R BS.1770-2[2] signal-flows can be used. See the references for information about the differences. Only the peaks (if any) exceeding the configurable amplitude threshold are returned. Note: the parameters 'blockDC' and 'emphasise' work only when 'version' is set to 2.
References:
[1] Series, B. S. (2011). Recommendation ITU-R BS.1770-4. Algorithms to measure audio programme loudness and true-peak audio level, https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-4-201510-I!!PDF-E.pdf
[2] Series, B. S. (2011). Recommendation ITU-R BS.1770-2. Algorithms to measure audio programme loudness and true-peak audio level, https://www.itu.int/dms_pubrec/itu-r/rec/bs/R-REC-BS.1770-2-201103-S!!PDF-E.pdf
Check https://essentia.upf.edu/reference/std_TruePeakDetector.html for more details.
Parameters
Name Type Attributes Default Description
signal VectorFloat the input audio signal
blockDC boolean <optional> false flag to activate the optional DC blocker
emphasise boolean <optional> false flag to activate the optional emphasis filter
oversamplingFactor number <optional> 4 number of times the signal is oversampled
quality number <optional> 1 type of interpolation applied (see libresample)
sampleRate number <optional> 44100 the sampling rate of the audio signal [Hz]
threshold number <optional> -0.0002 threshold to detect peaks [dB]
version number <optional> 4 algorithm version
Returns
Details
-
TuningFrequency( frequencies, magnitudes [, resolution ] ) → {object}
-
Description
This algorithm estimates the tuning frequency given a sequence/set of spectral peaks. The result is the tuning frequency in Hz, and its distance from 440 Hz in cents. This version is slightly adapted from the original algorithm [1], but gives the same results. Check https://essentia.upf.edu/reference/std_TuningFrequency.html for more details.
Parameters
Name Type Attributes Default Description
frequencies VectorFloat the frequencies of the spectral peaks [Hz]
magnitudes VectorFloat the magnitudes of the spectral peaks
resolution number <optional> 1 resolution in cents (logarithmic scale, 100 cents = 1 semitone) for tuning frequency determination
Returns
Details
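The "distance from 440 Hz in cents" follows from the logarithmic scale given for the resolution parameter (100 cents = 1 semitone, so 1200 cents = 1 octave). The helper below is a plain-JavaScript sketch of that conversion only; the name `centsFromReference` is hypothetical, and it does not reproduce how TuningFrequency estimates the frequency itself.

```javascript
// Plain-JavaScript sketch of the Hz-to-cents conversion; NOT part of the essentia.js API.
// 1200 cents span one octave, i.e. a doubling of frequency.
function centsFromReference(frequencyHz, referenceHz = 440) {
  if (frequencyHz <= 0 || referenceHz <= 0) {
    throw new RangeError("frequencies must be positive");
  }
  return 1200 * Math.log2(frequencyHz / referenceHz);
}

console.log(centsFromReference(880));            // one octave above 440 Hz: 1200 cents
console.log(centsFromReference(442).toFixed(2)); // a slightly sharp tuning, roughly 7.85 cents
```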
-
TuningFrequencyExtractor( signal [, frameSize [, hopSize ] ] ) → {object}
-
Description
This algorithm extracts the tuning frequency of an audio signal. Check https://essentia.upf.edu/reference/std_TuningFrequencyExtractor.html for more details.
Parameters
Name Type Attributes Default Description
signal VectorFloat the audio input signal
frameSize number <optional> 4096 the frame size for computing tuning frequency
hopSize number <optional> 2048 the hop size for computing tuning frequency
Returns
Details
-
UnaryOperator( array [, scale [, shift [, type ] ] ] ) → {object}
-
Description
This algorithm performs basic arithmetical operations element by element given an array.
Note:
- log and ln are equivalent to the natural logarithm
- for log, ln, log10 and lin2db, x is clipped to 1e-30 for x<1e-30
- for x<0, sqrt(x) is invalid
- scale and shift parameters define a linear transformation to be applied to the resulting elements
Check https://essentia.upf.edu/reference/std_UnaryOperator.html for more details.
Parameters
Name Type Attributes Default Description
array VectorFloat the input array
scale number <optional> 1 multiply result by factor
shift number <optional> 0 shift result by value (add value)
type string <optional> identity the type of the unary operator to apply to input array
Returns
Details
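The notes for UnaryOperator (clipping at 1e-30, invalid sqrt for negative input, and the scale/shift transformation applied to the result) can be sketched in plain JavaScript as follows. This is not the essentia.js implementation: the function name `unaryOperator` and the options object are hypothetical, only a subset of operator types is covered, and the form `scale * f(x) + shift` is an assumption about how the linear transformation is applied.

```javascript
// Plain-JavaScript sketch of an element-wise unary operator; NOT the essentia.js implementation.
// Only a subset of operator types is modelled here.
const CLIP_FLOOR = 1e-30; // inputs below this are clipped before taking a logarithm

const OPERATORS = new Map([
  ["identity", (x) => x],
  ["log", (x) => Math.log(Math.max(x, CLIP_FLOOR))], // natural logarithm
  ["ln", (x) => Math.log(Math.max(x, CLIP_FLOOR))],  // same as "log"
  ["log10", (x) => Math.log10(Math.max(x, CLIP_FLOOR))],
  ["sqrt", (x) => {
    if (x < 0) throw new RangeError("sqrt(x) is invalid for x < 0");
    return Math.sqrt(x);
  }],
]);

function unaryOperator(array, { type = "identity", scale = 1, shift = 0 } = {}) {
  const op = OPERATORS.get(type);
  if (!op) throw new Error(`operator type not covered by this sketch: ${type}`);
  // Assumption: the linear transformation is applied as scale * f(x) + shift.
  return Array.from(array, (x) => scale * op(x) + shift);
}

console.log(unaryOperator([1, 10, 100], { type: "log10", scale: 2, shift: 1 }));
```

With the defaults (type identity, scale 1, shift 0) the sketch returns a copy of the input unchanged.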
-
UnaryOperatorStream( array [, scale [, shift [, type ] ] ] ) → {object}
-
Description
This algorithm performs basic arithmetical operations element by element given an array.
Note:
- log and ln are equivalent to the natural logarithm
- for log, ln, log10 and lin2db, x is clipped to 1e-30 for x<1e-30
- for x<0, sqrt(x) is invalid
- scale and shift parameters define a linear transformation to be applied to the resulting elements
Check https://essentia.upf.edu/reference/std_UnaryOperatorStream.html for more details.
Parameters
Name Type Attributes Default Description
array VectorFloat the input array
scale number <optional> 1 multiply result by factor
shift number <optional> 0 shift result by value (add value)
type string <optional> identity the type of the unary operator to apply to input array
Returns
Details
-
Variance( array ) → {object}
-
Description
This algorithm computes the variance of an array. Check https://essentia.upf.edu/reference/std_Variance.html for more details.
Parameters
Name Type Description
array VectorFloat the input array
Returns
Details
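For reference, the variance of an array can be written out in a few lines of plain JavaScript. The sketch below is not the essentia.js implementation; the function name `variance` is hypothetical, and it assumes the population form (dividing by N rather than N - 1).

```javascript
// Plain-JavaScript sketch of the variance of an array; NOT the essentia.js implementation.
// Assumption: population variance, i.e. the mean of the squared deviations from the mean.
function variance(array) {
  if (array.length === 0) {
    throw new RangeError("variance of an empty array is undefined");
  }
  let mean = 0;
  for (const x of array) mean += x;
  mean /= array.length;

  let sumSquares = 0;
  for (const x of array) sumSquares += (x - mean) * (x - mean);
  return sumSquares / array.length;
}

console.log(variance([1, 2, 3, 4])); // 1.25
```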
-
Vibrato( pitch [, maxExtend [, maxFrequency [, minExtend [, minFrequency [, sampleRate ] ] ] ] ] ) → {object}
-
Description
This algorithm detects the presence of vibrato and estimates its parameters given a pitch contour [Hz]. The result is the vibrato frequency in Hz and the extent (peak to peak) in cents. If no vibrato is detected in a frame, the output of both values is zero. Check https://essentia.upf.edu/reference/std_Vibrato.html for more details.
Parameters
Name Type Attributes Default Description
pitch VectorFloat the pitch trajectory [Hz].
maxExtend number <optional> 250 maximum considered vibrato extent [cents]
maxFrequency number <optional> 8 maximum considered vibrato frequency [Hz]
minExtend number <optional> 50 minimum considered vibrato extent [cents]
minFrequency number <optional> 4 minimum considered vibrato frequency [Hz]
sampleRate number <optional> 344.531 sample rate of the input pitch contour
Returns
Details
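Note that sampleRate here is the rate of the pitch contour, not of the audio. As a worked example, and assuming the contour holds one pitch value per hop of 128 samples from 44100 Hz audio (an assumption, not stated in this reference), the contour rate works out to the default value listed above. The helper name `contourSampleRate` is hypothetical.

```javascript
// Plain-JavaScript arithmetic sketch; NOT part of the essentia.js API.
// Assumption: the pitch contour has one value per analysis hop.
function contourSampleRate(audioSampleRate, hopSize) {
  if (hopSize <= 0) throw new RangeError("hopSize must be positive");
  return audioSampleRate / hopSize;
}

console.log(contourSampleRate(44100, 128)); // 344.53125
```

If the pitch contour was computed with a different hop size or audio sample rate, the value passed as sampleRate would need to change accordingly.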
-
WarpedAutoCorrelation( array [, maxLag [, sampleRate ] ] ) → {object}
-
Description
This algorithm computes the warped auto-correlation of an audio signal. The implementation is an adapted version of K. Schmidt's implementation of the matlab algorithm from the 'warped toolbox' by Aki Harma and Matti Karjalainen found in [2]. For a detailed explanation of the algorithm, see [1]. This algorithm is only defined for positive lambda = 1.0674*sqrt(2.0*atan(0.00006583*sampleRate)/PI) - 0.1916, thus it will throw an exception when the supplied sampling rate does not pass the requirements. If maxLag is larger than the size of the input array, an exception is thrown. Check https://essentia.upf.edu/reference/std_WarpedAutoCorrelation.html for more details.
Parameters
Name Type Attributes Default Description
array VectorFloat the array to be analyzed
maxLag number <optional> 1 the maximum lag for which the auto-correlation is computed (inclusive) (must be smaller than signal size)
sampleRate number <optional> 44100 the audio sampling rate [Hz]
Returns
Details
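The sampling-rate requirement can be checked ahead of time by evaluating the lambda expression from the description. The sketch below is plain JavaScript and only evaluates that formula; the function name `warpingLambda` is hypothetical, and the warped auto-correlation itself is not implemented here.

```javascript
// Plain-JavaScript sketch of the lambda formula quoted in the description;
// NOT the essentia.js implementation of WarpedAutoCorrelation.
function warpingLambda(sampleRate) {
  const lambda =
    1.0674 * Math.sqrt((2.0 * Math.atan(0.00006583 * sampleRate)) / Math.PI) - 0.1916;
  if (lambda <= 0) {
    // Mirrors the documented behaviour: the algorithm is only defined for positive lambda.
    throw new RangeError(`sampleRate ${sampleRate} Hz gives a non-positive lambda`);
  }
  return lambda;
}

console.log(warpingLambda(44100).toFixed(3)); // roughly 0.756 at the default sample rate
```

Very low sampling rates (for example 100 Hz) make the expression negative, which is the case the documented exception guards against.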
-
Welch( frame [, averagingFrames [, fftSize [, frameSize [, sampleRate [, scaling [, windowType ] ] ] ] ] ] ) → {object}
-
Description
This algorithm estimates the Power Spectral Density of the input signal using Welch's method [1]. The input should be fed with overlapped audio frames. The algorithm internally stores the required past frames to compute each output. Call reset() to clear the buffers. This implementation is based on Scipy [2]. Check https://essentia.upf.edu/reference/std_Welch.html for more details.
Parameters
Name Type Attributes Default Description
frame VectorFloat the input audio frame
averagingFrames number <optional> 10 number of frames to average
fftSize number <optional> 1024 size of the FFT. Zero padding is added if this is larger than the input frame size.
frameSize number <optional> 512 the expected size of the input audio signal (this is an optional parameter to optimize memory allocation)
sampleRate number <optional> 44100 the sampling rate of the audio signal [Hz]
scaling string <optional> density 'density' normalizes the result to the bandwidth while 'power' outputs the unnormalized power spectrum
windowType string <optional> hann the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
Returns
Details
-
Windowing( frame [, normalized [, size [, type [, zeroPadding [, zeroPhase ] ] ] ] ] ) → {object}
-
Description
This algorithm applies windowing to an audio signal. It optionally applies zero-phase windowing and optionally adds zero-padding. The resulting windowed frame size is equal to the incoming frame size plus the number of padded zeros. By default, the available windows are normalized (to have an area of 1) and then scaled by a factor of 2. Check https://essentia.upf.edu/reference/std_Windowing.html for more details.
Parameters
Name Type Attributes Default Description
frame VectorFloat the input audio frame
normalized boolean <optional> true a boolean value to specify whether to normalize windows (to have an area of 1) and then scale by a factor of 2
size number <optional> 1024 the window size
type string <optional> hann the window type, which can be 'hamming', 'hann', 'triangular', 'square' or 'blackmanharrisXX'
zeroPadding number <optional> 0 the size of the zero-padding
zeroPhase boolean <optional> true a boolean value that enables zero-phase windowing
Returns
Details
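The normalization and output-size rules described for Windowing can be illustrated with a plain-JavaScript sketch of a Hann window. This is not the essentia.js implementation: the function name `hannWindowed` is hypothetical, the symmetric Hann formula with an N - 1 denominator is an assumption, zero-phase windowing is not modelled, and the padding zeros are simply appended at the end.

```javascript
// Plain-JavaScript sketch of Hann windowing with normalization and zero-padding;
// NOT the essentia.js implementation. Zero-phase windowing is not modelled.
function hannWindowed(frame, { normalized = true, zeroPadding = 0 } = {}) {
  const n = frame.length;
  if (n < 2) throw new RangeError("frame must contain at least 2 samples");

  // Assumption: symmetric Hann window, w[i] = 0.5 - 0.5 * cos(2*pi*i / (n - 1)).
  const weights = new Float64Array(n);
  for (let i = 0; i < n; i++) {
    weights[i] = 0.5 - 0.5 * Math.cos((2 * Math.PI * i) / (n - 1));
  }

  if (normalized) {
    // Normalize to an area of 1, then scale by a factor of 2.
    const area = weights.reduce((acc, w) => acc + w, 0);
    for (let i = 0; i < n; i++) weights[i] *= 2 / area;
  }

  // Output size is the frame size plus the number of padded zeros.
  const out = new Float32Array(n + zeroPadding);
  for (let i = 0; i < n; i++) out[i] = frame[i] * weights[i];
  return out;
}

const windowed = hannWindowed(new Float32Array(8).fill(1), { zeroPadding: 4 });
console.log(windowed.length); // 12
```

For a frame of ones, the windowed samples sum to approximately 2, which reflects the "area of 1, scaled by 2" normalization.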
-
ZeroCrossingRate( signal [, threshold ] ) → {object}
-
Description
This algorithm computes the zero-crossing rate of an audio signal. It is the number of sign changes between consecutive signal values divided by the total number of values. Noisy signals tend to have a higher zero-crossing rate. To ignore small variations around zero caused by noise, a threshold around zero is given, and a zero crossing is considered valid only when that boundary is crossed. Check https://essentia.upf.edu/reference/std_ZeroCrossingRate.html for more details.
Parameters
Name Type Attributes Default Description
signal VectorFloat the input signal
threshold number <optional> 0 the threshold which will be taken as the zero axis in both positive and negative sign
Returns
Details
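The definition above (sign changes between consecutive values, divided by the number of values, with a dead band around zero) can be sketched in plain JavaScript. This is not the essentia.js implementation; the function name `zeroCrossingRate` is hypothetical, and treating every sample whose absolute value is within the threshold as exactly zero is an assumption about how the threshold is applied.

```javascript
// Plain-JavaScript sketch of the zero-crossing rate; NOT the essentia.js implementation.
// Assumption: samples with |x| <= threshold are treated as zero (non-positive).
function zeroCrossingRate(signal, threshold = 0) {
  if (signal.length === 0) {
    throw new RangeError("zero-crossing rate of an empty signal is undefined");
  }
  const clip = (x) => (Math.abs(x) <= threshold ? 0 : x);

  let crossings = 0;
  let wasPositive = clip(signal[0]) > 0;
  for (let i = 1; i < signal.length; i++) {
    const isPositive = clip(signal[i]) > 0;
    if (isPositive !== wasPositive) crossings++;
    wasPositive = isPositive;
  }
  return crossings / signal.length;
}

console.log(zeroCrossingRate([1, -1, 1, -1]));                   // 0.75
console.log(zeroCrossingRate([0.01, -0.01, 0.01, -0.01], 0.05)); // 0
```

The second call shows the effect of the threshold: low-level noise that stays inside the dead band produces no crossings.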