Sound examples

Comparison of different systems. In all cases, the input mel-spectrogram/WORLD features are obtained by analysis of a recording.

GT = Reference recording

NW = Neural WORLD (our proposed system)

NW-NoAdv = Neural WORLD (our proposed system) without adversarial training

E-PWG = Excited Parallel WaveGAN (non-autoregressive baseline)

WORLD = WORLD vocoder (signal processing based)

AR-WNV = Autoregressive WaveNet vocoder

Seen singers

Columns: Examples, GT, NW, NW-NoAdv, E-PWG, WORLD, AR-WNV
Rows: Dulcinea, Baritone, Feel, R2

Unseen singers

Columns: Examples, GT, NW, NW-NoAdv, E-PWG, WORLD, AR-WNV
Rows: DI 1-17, DI 1-18, F001-029, F001-040, OpenCPop 1, OpenCPop 2