Invertible Constant-Q based on Non-Stationary Gabor frames

A Constant-Q transform is a time/frequency representation where the bins follow a geometric progression. This means that the bins can be chosen to represent the frequencies of the semitones (or fractions of semitones) of an equal-tempered scale. It can be seen as a dimensionality reduction of the Short-Time Fourier Transform, done in a way that matches the human musical interpretation of frequency.
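
As an illustration of this geometric progression, the center frequency of bin k can be written as f_k = f_min * 2^(k / B), where B is the number of bins per octave. The values below (f_min chosen as C2, one bin per semitone) are only a sketch of this relation, not part of any Essentia API:

```python
import numpy as np

# Constant-Q bin center frequencies follow a geometric progression:
# f_k = f_min * 2**(k / B), with B bins per octave.
f_min = 65.41          # C2, in Hz
bins_per_octave = 12   # one bin per semitone
k = np.arange(13)      # one full octave, C2..C3

frequencies = f_min * 2.0 ** (k / bins_per_octave)
print(np.round(frequencies, 2))
# With 12 bins per octave, the last bin lands exactly one octave up (C3).
```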

However, most CQ implementations have drawbacks: they are computationally inefficient, few C/C++ implementations exist, and, most importantly, they are not invertible. This makes them unsuitable for applications such as audio modification or synthesis.

Recently we have implemented an invertible CQ algorithm based on Non-Stationary Gabor frames [1]. The NSGConstantQ reference page contains details about the algorithm and the related research.

Below, we will show how to get CQ spectrograms in Essentia and estimate the reconstruction error in terms of SNR.

Standard computation

Here we run the forward and backward transforms on the entire audio file. We are storing some configuration parameters in a dictionary to make sure that the same setup is used for analysis and synthesis. A list of all available parameters can be found in the NSGConstantQ reference page.

from essentia.standard import MonoLoader, NSGConstantQ, NSGIConstantQ

# Load an audio file
x = MonoLoader(filename='your/audio/file.wav')()

# Parameters
params = {
          # Backward transform needs to know the signal size.
          'inputSize': x.size,
          'minFrequency': 65.41,
          'maxFrequency': 6000,
          'binsPerOctave': 48,
          # Minimum number of FFT bins per CQ channel.
          'minimumWindow': 128
         }

# Forward and backward transforms
constantq, dcchannel, nfchannel = NSGConstantQ(**params)(x)
y = NSGIConstantQ(**params)(constantq, dcchannel, nfchannel)

The algorithm generates three outputs: constantq, dcchannel and nfchannel. The reason for this is that the Constant-Q condition only holds within the (minFrequency, maxFrequency) range, but the information in the DC and Nyquist channels is also required for perfect reconstruction. We were able to run the analysis/synthesis process at 32x realtime on a 3.4GHz i5-3570 CPU.
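
As a back-of-the-envelope check of what the parameters above imply, the number of Constant-Q channels between minFrequency and maxFrequency can be estimated from the geometric progression (the exact count produced by NSGConstantQ may differ slightly depending on how the band edges are handled):

```python
import math

# Approximate number of Constant-Q channels implied by the parameters above:
# binsPerOctave bins per octave over log2(maxFrequency / minFrequency) octaves.
min_frequency = 65.41
max_frequency = 6000
bins_per_octave = 48

octaves = math.log2(max_frequency / min_frequency)
n_bins = math.ceil(bins_per_octave * octaves)
print(n_bins)  # roughly 6.5 octaves * 48 bins
```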

Let’s evaluate the quality of the reconstructed signal in terms of SNR:

import numpy as np
from essentia import lin2db

def SNR(r, t, skip=8192):
    """
    r    : reference
    t    : test
    skip : number of samples to skip from the SNR computation
    """
    difference = ((r[skip: -skip] - t[skip: -skip]) ** 2).sum()
    return lin2db((r[skip: -skip] ** 2).sum() / difference)

cq_snr = SNR(x, y)
print('Reconstruction SNR: {:.3f} dB'.format(cq_snr))
Reconstruction SNR: 127.854 dB

Now let’s plot the transform. Note that as the values are complex, we are only showing their magnitude.

from matplotlib import pyplot as plt
plt.rcParams['figure.figsize'] = (12.0, 8.0)

# Display the magnitude of the transform in dB
plt.imshow(20 * np.log10(np.abs(constantq) + np.finfo(float).eps),
           origin='lower', aspect='auto')
plt.title('Magnitude of the Constant-Q transform (dB)')
plt.show()


Finally, we can listen and compare the original and the reconstructed signals!


from IPython.display import Audio
Audio(x, rate=44100)


Audio(y, rate=44100)

Framewise computation

Additionally, we have implemented a framewise version of the algorithm that works on half-overlapped frames. This can be useful for very long audio signals that cannot be processed in one go. The algorithm is described in [2]. In this case there is no dedicated C++ algorithm, but we provide a Python wrapper with functions to perform the analysis and synthesis.

import essentia.pytools.spectral as sp

# Forward and backward transforms
cq_frames, dc_frames, nf_frames = sp.nsgcqgram(x, frameSize=4096)
y_frames = sp.nsgicqgram(cq_frames, dc_frames, nf_frames,
                         frameSize=4096)

Reconstruction error in this case:

cq_snr = SNR(x, y_frames[:x.size])
print('Reconstruction SNR: {:.3f} dB'.format(cq_snr))
Reconstruction SNR: 133.596 dB

Displaying the framewise transform is slightly trickier, as we have to overlap-add the spectrograms obtained for each frame. To facilitate this we provide a function, as shown in the next example. Note that the framewise Constant-Q spectrogram is not expected to be identical to the one obtained with the standard computation.

# Get the overlap-add version for visualization
cq_overlapped = sp.nsgcq_overlap_add(cq_frames)

plt.imshow(20 * np.log10(np.abs(cq_overlapped) + np.finfo(float).eps),
           origin='lower', aspect='auto')
plt.title('Magnitude of the Framewise Constant-Q transform (dB)')
plt.show()


Note that it is not possible to synthesize the audio from this overlapped version as we cannot retrieve the analysis frames from it. The synthesis has to be performed from the original list of frames output by the nsgcqgram function.
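
The overlap-add idea itself can be sketched in plain NumPy, independently of Essentia. This toy example (hypothetical helper names, 1-D frames instead of spectrogram columns) shows why half-overlapped frames windowed with a periodic Hann window can be summed back together: the shifted windows add up to a constant:

```python
import numpy as np

def overlap_add(frames, hop):
    """Sum equally sized frames into one signal, offset by 'hop' samples."""
    frame_size = len(frames[0])
    out = np.zeros(hop * (len(frames) - 1) + frame_size)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + frame_size] += frame
    return out

# Half-overlapped periodic Hann windows satisfy the constant-overlap-add
# (COLA) condition: w[n] + w[n + N/2] == 1 for all n.
N = 8
n = np.arange(N)
window = 0.5 - 0.5 * np.cos(2 * np.pi * n / N)

x = np.random.randn(64)
frames = [window * x[i:i + N] for i in range(0, len(x) - N + 1, N // 2)]
y = overlap_add(frames, hop=N // 2)

# Interior samples are reconstructed exactly; the edges lack full overlap.
print(np.allclose(x[N // 2:-N // 2], y[N // 2:-N // 2]))
```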


[1] Velasco, G. A., Holighaus, N., Dörfler, M., & Grill, T. (2011). Constructing an invertible constant-Q transform with non-stationary Gabor frames. Proceedings of DAFX11, Paris, 93-99.

[2] Holighaus, N., Dörfler, M., Velasco, G. A., & Grill, T. (2013). A framework for invertible, real-time constant-Q transforms. IEEE Transactions on Audio, Speech, and Language Processing, 21(4), 775-785.

Fingerprinting with Chromaprint algorithm

As we are gradually expanding Essentia with new functionality, we have added a new algorithm for the computation of audio fingerprints, the Chromaprinter. Technically, it is a wrapper of the Chromaprint library, which you will need to install in order to use Chromaprinter.

The fingerprints computed with Chromaprinter can be used to query the AcoustID database for track metadata. Check a few examples of how Chromaprinter can be used in Python here. To start using this algorithm now, build the latest Essentia code from the master branch.
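
Once a fingerprint string has been computed, querying AcoustID is a plain HTTP GET request. The sketch below only builds the lookup URL with the standard library; the fingerprint value is a truncated placeholder (not a real fingerprint), and 'YOUR_API_KEY' must be replaced with your own AcoustID client key:

```python
from urllib.parse import urlencode

def acoustid_lookup_url(fingerprint, duration, client='YOUR_API_KEY'):
    """Build an AcoustID /v2/lookup URL for a Chromaprint fingerprint."""
    params = {
        'client': client,            # your AcoustID API key
        'meta': 'recordings',        # which metadata to return
        'duration': int(duration),   # track length in seconds
        'fingerprint': fingerprint,  # string returned by Chromaprinter
    }
    return 'https://api.acoustid.org/v2/lookup?' + urlencode(params)

# Placeholder fingerprint and duration for illustration only.
url = acoustid_lookup_url('AQADtEmi...', 256)
print(url)
```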

Docker images for Essentia

We now host images on Docker Hub containing Essentia, the Python bindings, and all pre-built examples. This means that you can run Essentia on Linux, Mac, or Windows without having to compile it yourself.

We publish six variants across three operating systems (Ubuntu 16.04 LTS, Ubuntu 17.10, and Debian 9.0 Stretch), with a Python 2 and a Python 3 variant for each OS.

See the documentation for further information on how to run an example or python file using these images.

These images can also be used as a base image for your own docker projects if you want Essentia available.

Updates to cepstral features (MFCC and GFCC)

Working towards the next Essentia release, we have updated our cepstral features. The updates include:

  • Support for extracting MFCCs ‘the htk way’ (python example).

  • In the literature there are two common MFCC ‘standards’, differing in some parameters and in the mel-scale computation itself: the Slaney way (Auditory Toolbox) and the htk way (chapter 5.4 of the htk book).

  • See a python notebook for a comparison with mfcc extracted with librosa and with htk.

  • Support for inverting the computed MFCCs back to spectral (mel) domain (python example).

  • The first MFCC coefficients are standard for describing singing voice timbre; however, the MFCC feature vector does not lend itself well to visualization. Instead, it is common practice to invert the first 12-15 MFCC coefficients back to the mel-bands domain for visualization. We have ported invmelfcc.m as explained here.

  • Support for cent scale.

You can start using these features before the official release by building Essentia from the master branch.
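
The difference between the two mel-scale conventions mentioned above can be illustrated directly. This is a sketch of the formulas as commonly stated (HTK book; Slaney's Auditory Toolbox, which is linear below 1 kHz and logarithmic above), not of Essentia's internal implementation:

```python
import numpy as np

def hz_to_mel_htk(f):
    # HTK convention: a single logarithmic mapping.
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def hz_to_mel_slaney(f):
    # Slaney (Auditory Toolbox) convention: linear below 1 kHz,
    # logarithmic above, joined continuously at 1 kHz (= 15 mel).
    f = np.asarray(f, dtype=float)
    linear = f / (200.0 / 3.0)                      # 3/200 mel per Hz
    log = 15.0 + 27.0 * np.log(f / 1000.0) / np.log(6.4)
    return np.where(f >= 1000.0, log, linear)

f = np.array([500.0, 1000.0, 4000.0])
print(hz_to_mel_htk(f))
print(hz_to_mel_slaney(f))
```

Note that the two scales use different units (they agree only up to a scale factor in the linear region), which is one reason MFCCs from different toolboxes are not directly comparable.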

Essentia 2.1 beta3 released

Keeping Essentia in constant development, we are glad to announce that Essentia 2.1 beta3 has recently been released. It is a preliminary version of the forthcoming 2.1 release and is the recommended version to install. This version includes a significant number of new algorithms, updates, bug fixes and documentation improvements. The full list of changes can be seen here.

Static binaries for command-line extractors for music descriptors

We are now hosting static binaries for a number of command-line extractors, originally developed as examples of how Essentia can be used in applications. These tools make it easy to compute some descriptors given an audio file as input and store them to a file (hence the name “extractors”), without any need to install the Essentia library itself.

In particular these extractors can:

  • compute a large set of spectral, time-domain, rhythm, tonal and high-level descriptors
  • compute MFCC frames
  • extract the pitch of a predominant melody using the MELODIA algorithm
  • extract the pitch of a monophonic signal using the YinFFT algorithm
  • extract beats

Notably, two extractors specifically designed for the AcousticBrainz and Freesound projects are included. They cover large sets of features and are designed for batch processing of large amounts of audio files.

See description of the extractors in the official documentation. Specifically, here you will find details about our music extractor.

Current builds are done for our latest version, Essentia 2.1_beta2 (Linux, OSX, and Windows). Download them here:

Of course, you are welcome to submit your own extractors to be included in our future builds.

AcousticBrainz - a large-scale music analysis project using Essentia

We are very excited to announce AcousticBrainz, a new collaborative project between the MusicBrainz community and the Music Technology Group that uses the power of music analysis with Essentia to build a large-scale, open-content database of music.

The goal of the project is to crowdsource acoustic information for all the music in the world and to make it available to the public, providing a massive database of information about music for music technology researchers and open-source hackers. The project relies on Essentia for extracting acoustic characteristics of music, including low-level spectral information, rhythm, keys, scales, and much more, as well as automatic annotation by genres, moods, and instrumentation.

Since the first public announcement a month ago, the project has gathered more than 1.2 million analyzed tracks. The possibility of running audio analysis at a large scale, with all the data and tools openly accessible and instant feedback from users, makes this project an exciting opportunity to explore the state of the art in music analysis tools and to improve them. See more details about the project on the official AcousticBrainz website.

Essentia 2.0.1 released

We are glad to announce Essentia 2.0.1 release!

This new version includes high-level classifier models and a number of minor updates:

  • Added pre-trained high-level classifier models for genres, moods, rhythm and instrumentation (to be used with the streaming_extractor_archivemusic extractor, see accuracies here)
  • Fixed scheduler in streaming mode
  • Fixed compilation with clang/libc++/c++11
  • PitchYinFFT now supports parabolic interpolation
  • Updated Vamp plugin
  • Updated documentation and tutorials
  • Minor bugfixes, more unittests, etc.

Please compile it, use it and give us feedback! As usual, we are looking forward to your ideas and suggestions!

Essentia presented at ISMIR 2013

Essentia was presented at the International Society for Music Information Retrieval Conference (ISMIR 2013), which took place from November 4th to 8th, 2013 in Curitiba (Brazil). The article presented is available here.

Essentia wins the Open-Source Competition of ACM Multimedia

Essentia has won the Open-Source Software Competition of ACM Multimedia 2013, the premier worldwide multimedia conference, which took place in Barcelona from October 21st to 25th, 2013.

The ACM Multimedia Open-Source Software Competition celebrates the invaluable contribution of researchers and software developers who advance the field by providing the community with implementations of codecs, middleware, frameworks, toolkits, libraries, applications, and other multimedia software. The criteria for judging all submissions include broad applicability and potential impact, novelty, technical depth, demo suitability, and other miscellaneous factors (e.g., maturity, popularity, student-led, no dependence on closed source, etc.).

The article on Essentia presented at ACM Multimedia is available here.

Essentia 2.0 becomes open source!!!

Today, June 27th 2013, Essentia becomes Open Source Software, and we start distributing it under the Affero GPLv3 license (it is also available under a proprietary license upon request).

In the last few months we have been working hard on a major upgrade of the existing code, and we will continue improving it and adding more functionality. We encourage everyone to use it, test it, contribute to it, and help make it better than it already is. You can find the source code, along with all the documentation and complementary information, in the GitHub repository:

We have also opened the Gaia library (under the same license terms), which complements Essentia for indexing and similarity computation tasks. It is available at:

Over the years many people have contributed to Essentia, but we especially recognize the work of Nicolas Wack, who was its main developer for many years at the MTG and who has always supported the idea of opening it. We have many plans to promote and build on Essentia. It is a great tool, and we can all do big things with it. Please compile it, use it and give us feedback!