Demos

English male voice - Rio

Feed-forward Transformer Seq2Seq model, with neural vocoder, effects and background music.

The model predicts timbre and phonetic timings, while F0 and note onsets (vowel onsets) are obtained from a reference recording.
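One way to picture how predicted timings and reference onsets can be combined is sketched below. It is only an illustration under stated assumptions, not the method used for these demos: it assumes the phonemes are grouped from one vowel up to the next, and that each group's predicted durations are uniformly rescaled to fill the interval between consecutive reference vowel onsets. The function name `fit_durations` and its parameters are hypothetical.

```python
from typing import List


def fit_durations(
    vowel_onsets: List[float],
    end_time: float,
    predicted: List[List[float]],
) -> List[List[float]]:
    """Rescale predicted phoneme durations to match reference vowel onsets.

    vowel_onsets[i]: time in seconds of the i-th vowel onset (from the reference).
    end_time: time in seconds at which the last group ends.
    predicted[i]: predicted durations in seconds of the phonemes from the
        i-th vowel up to, but not including, the next vowel (assumed grouping).
    """
    if len(vowel_onsets) != len(predicted):
        raise ValueError("need exactly one duration group per vowel onset")

    # Group i must span the interval [boundaries[i], boundaries[i + 1]).
    boundaries = list(vowel_onsets) + [end_time]
    fitted = []
    for i, group in enumerate(predicted):
        target = boundaries[i + 1] - boundaries[i]
        total = sum(group)
        if target <= 0 or total <= 0:
            raise ValueError("intervals and predicted durations must be positive")
        # Uniform scaling keeps the predicted proportions within the group.
        scale = target / total
        fitted.append([d * scale for d in group])
    return fitted


if __name__ == "__main__":
    # Two vowel onsets at 0.0 s and 1.0 s; the phrase ends at 1.5 s.
    result = fit_durations([0.0, 1.0], 1.5, [[0.3, 0.1, 0.1], [0.2, 0.05]])
    for group in result:
        print([round(d, 3) for d in group])
```

Uniform scaling is the simplest choice here; a real system might instead stretch vowels more than consonants, which this sketch does not attempt.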

Comparison

FFT-NPSS (proposed)
FFT-NPSS with ground-truth durations
FFT-NPSS without self-attention
AR-NPSS (baseline)

Acknowledgments

The proprietary dataset used in these experiments was provided by Voctro Labs.