Sound examples

Comparison of F0 curves generated by different models trained on the NIT-SONG070-F001 dataset.

Phonetic timings are obtained from a reference recording, the timbre is predicted by the NPSS model, and the audio is synthesized with the WORLD vocoder.

Comparison

Example 1 (a cappella)
Example 2 (with music)
Performance driven
P-F0 (proposed)
P-F0 (proposed) trained with 1 song
AR-F0
AR-F0 trained with 1 song

Acknowledgments

This work uses the public version of the NIT-SONG070-F001 dataset by the Nagoya Institute of Technology, licensed under CC BY 3.0.