Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Author: Hubert Siuzdak
Paper: arXiv
Code: GitHub
Updates
2023-06-12
Added examples generated with the Bark text-to-audio model; see "Audio reconstruction from Bark tokens" below.
Abstract
Recent advancements in neural vocoding are predominantly driven by Generative Adversarial Networks (GANs) operating in
the time domain. While effective, this approach neglects the inductive bias offered by time-frequency representations,
resulting in redundant and computationally intensive upsampling operations. A Fourier-based time-frequency representation
is an appealing alternative, aligning more accurately with human auditory perception and benefiting from well-established
fast algorithms for its computation. Nevertheless, direct reconstruction of complex-valued spectrograms has been
historically problematic, primarily due to phase recovery issues. This study seeks to close this gap by presenting
Vocos, a new model that addresses the key challenges of modeling spectral coefficients. Vocos demonstrates improved
computational efficiency, achieving an order-of-magnitude increase in speed compared to prevailing time-domain neural
vocoding approaches. As shown by objective evaluation, Vocos not only matches state-of-the-art audio quality but, thanks
to its frequency-aware generator, also effectively mitigates the periodicity issues frequently associated with
time-domain GANs. The source code and model weights have been open-sourced at https://github.com/charactr-platform/vocos.
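The phase problem mentioned in the abstract can be illustrated with a small NumPy sketch. It assumes (this page does not say so) that a network predicts, for every frame and frequency bin, an unconstrained log-magnitude and an unconstrained phase angle; the hypothetical helper `to_complex_spectrum` then turns those raw numbers into complex Fourier coefficients. Exponentiating keeps magnitudes positive, and passing the angle through cosine and sine makes it wrap around automatically, so no output range has to be enforced on the phase.

```python
import numpy as np

def to_complex_spectrum(log_mag: np.ndarray, phase: np.ndarray) -> np.ndarray:
    """Hypothetical output head: build complex STFT coefficients from raw predictions.

    log_mag and phase have shape (frames, bins) and may take any real value.
    """
    magnitude = np.exp(log_mag)  # exp() guarantees a strictly positive magnitude
    # cos/sin make the phase periodic: p and p + 2*pi give the same coefficient
    return magnitude * (np.cos(phase) + 1j * np.sin(phase))

# Two frames, three frequency bins of arbitrary (unconstrained) predictions
log_mag = np.array([[0.0, -1.0, 2.0], [0.5, 0.0, -3.0]])
phase = np.array([[0.0, np.pi / 2, 10.0], [-7.0, np.pi, 1.0]])

spec = to_complex_spectrum(log_mag, phase)
wrapped = to_complex_spectrum(log_mag, phase + 2 * np.pi)
print(np.allclose(spec, wrapped))  # True: the 2*pi shift is invisible
```

Because the angle enters only through cosine and sine, this sketch needs no explicit phase unwrapping or clamping.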
Figure 1: Comparison of a typical time-domain GAN vocoder (a) with the proposed Vocos architecture (b), which maintains
the same temporal resolution across all layers. Time-domain vocoders use transposed convolutions to sequentially
upsample the signal to the desired sample rate. In contrast, Vocos achieves this by using a computationally efficient
inverse Fourier transform.
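To make the contrast in Figure 1 concrete, the sketch below shows how an inverse STFT turns a frame-rate spectrogram back into a waveform through per-frame inverse FFTs and overlap-add, with no learned upsampling layers. It is a minimal NumPy sketch under assumptions not taken from this page: a periodic Hann window, an FFT size of 64 and a hop of 16 samples (illustrative values only), and a complex spectrogram computed here from a known test signal rather than predicted by a network.

```python
import numpy as np

N_FFT, HOP = 64, 16  # illustrative sizes, not the model's configuration
WINDOW = np.hanning(N_FFT + 1)[:-1]  # periodic Hann window

def stft(x: np.ndarray) -> np.ndarray:
    """Windowed real FFT of each frame; returns shape (frames, N_FFT // 2 + 1)."""
    n_frames = 1 + (len(x) - N_FFT) // HOP
    frames = np.stack([x[i * HOP : i * HOP + N_FFT] * WINDOW for i in range(n_frames)])
    return np.fft.rfft(frames, axis=-1)

def istft(spec: np.ndarray) -> np.ndarray:
    """Inverse real FFT per frame, then windowed overlap-add.

    Each frame advances the output by HOP samples, so the frame rate is
    expanded to the sample rate in this one step.
    """
    frames = np.fft.irfft(spec, n=N_FFT, axis=-1) * WINDOW
    length = N_FFT + HOP * (len(frames) - 1)
    out = np.zeros(length)
    norm = np.zeros(length)  # accumulated squared window, used for normalization
    for i, frame in enumerate(frames):
        out[i * HOP : i * HOP + N_FFT] += frame
        norm[i * HOP : i * HOP + N_FFT] += WINDOW ** 2
    return out / np.maximum(norm, 1e-8)  # guards the first sample, where the window is zero

rng = np.random.default_rng(0)
signal = rng.standard_normal(512)
spec = stft(signal)
recon = istft(spec)
print(spec.shape, recon.shape)  # (29, 33) (512,)
```

In this sketch the representation stays at the frame rate (one column per HOP samples) until the final overlap-add, which is the same division of labour Figure 1 draws between per-layer upsampling and an inverse transform at the output.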
Resynthesis from neural audio codec (EnCodec)
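The four bitrates below correspond to different numbers of residual quantizer codebooks in EnCodec. The arithmetic is sketched here assuming the 24 kHz EnCodec model (75 frames per second and 1024-entry codebooks, i.e. 10 bits per code); this page does not state which EnCodec configuration was used, and `codebooks_for_bitrate` is a hypothetical helper.

```python
import math

FRAME_RATE_HZ = 75      # assumed: 24 kHz audio with a 320-sample encoder stride
CODEBOOK_SIZE = 1024    # assumed: entries per residual quantizer codebook
BITS_PER_CODE = int(math.log2(CODEBOOK_SIZE))  # 10 bits per transmitted index

def codebooks_for_bitrate(kbps: float) -> int:
    """Number of residual codebooks needed to reach the given bitrate."""
    bits_per_second = kbps * 1000
    n = bits_per_second / (FRAME_RATE_HZ * BITS_PER_CODE)
    if not n.is_integer():
        raise ValueError(f"{kbps} kbps is not a whole number of codebooks")
    return int(n)

for kbps in (1.5, 3, 6, 12):
    print(kbps, "kbps ->", codebooks_for_bitrate(kbps), "codebooks")
```

Under these assumptions, 1.5, 3, 6 and 12 kbps map to 2, 4, 8 and 16 codebooks, so each step up doubles the number of residual levels the decoder receives.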
1.5 kbps: Ground truth | EnCodec | Vocos
3 kbps: Ground truth | EnCodec | Vocos
6 kbps: Ground truth | EnCodec | Vocos
12 kbps: Ground truth | EnCodec | Vocos
Resynthesis from mel-spectrograms
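All systems in this comparison resynthesize audio from mel-spectrograms. For readers unfamiliar with that input, the sketch below builds a triangular mel filterbank on the common HTK-style mel scale, m = 2595 log10(1 + f/700), and applies it to a power spectrogram. The sample rate, FFT size and number of mel bands are illustrative assumptions; this page does not specify the feature configuration used for Vocos or the baselines.

```python
import numpy as np

def hz_to_mel(f):
    """HTK-style mel scale."""
    return 2595.0 * np.log10(1.0 + np.asarray(f, dtype=float) / 700.0)

def mel_to_hz(m):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (np.asarray(m, dtype=float) / 2595.0) - 1.0)

def mel_filterbank(sr: int, n_fft: int, n_mels: int) -> np.ndarray:
    """Triangular filters of shape (n_mels, n_fft // 2 + 1), evenly spaced in mel."""
    fft_freqs = np.linspace(0.0, sr / 2, n_fft // 2 + 1)
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2), n_mels + 2)
    hz_points = mel_to_hz(mel_points)
    fb = np.zeros((n_mels, len(fft_freqs)))
    for i in range(n_mels):
        left, center, right = hz_points[i], hz_points[i + 1], hz_points[i + 2]
        rising = (fft_freqs - left) / (center - left)
        falling = (right - fft_freqs) / (right - center)
        fb[i] = np.maximum(0.0, np.minimum(rising, falling))  # triangle peaking at center
    return fb

# Illustrative settings only
SR, N_FFT, N_MELS = 24000, 1024, 100
fb = mel_filterbank(SR, N_FFT, N_MELS)
# Random stand-in for |STFT|^2 of real audio, shape (frames, bins)
power_spec = np.random.default_rng(0).standard_normal((10, N_FFT // 2 + 1)) ** 2
log_mel = np.log(np.clip(power_spec @ fb.T, 1e-5, None))  # (frames, n_mels)
print(fb.shape, log_mel.shape)  # (100, 513) (10, 100)
```

The mel projection discards phase entirely, which is why a vocoder driven by these features has to generate phase on its own.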
Ground truth | HiFi-GAN | BigVGAN | iSTFTNet | Vocos
Audio reconstruction from Bark tokens
Token sequences were generated with the Bark text-to-audio model: https://github.com/suno-ai/bark
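Bark produces discrete audio tokens rather than a waveform, so both decoders compared here first have to map the tokens to continuous features. Assuming the tokens are EnCodec residual vector quantization (RVQ) codes, as the EnCodec column suggests, that lookup amounts to summing one embedding per codebook level, as in the NumPy sketch below. The codebook tables are random placeholders, `decode_rvq` is a hypothetical helper, and the sizes (8 codebooks of 1024 entries, 128-dimensional embeddings) are illustrative assumptions.

```python
import numpy as np

N_CODEBOOKS, CODEBOOK_SIZE, DIM = 8, 1024, 128  # illustrative sizes

rng = np.random.default_rng(0)
# One embedding table per quantizer level; a trained codec would learn these
codebooks = rng.standard_normal((N_CODEBOOKS, CODEBOOK_SIZE, DIM))

def decode_rvq(codes: np.ndarray, codebooks: np.ndarray) -> np.ndarray:
    """Map integer codes of shape (n_q, frames) to features of shape (frames, dim).

    Residual VQ quantizes the leftover error at each level, so decoding
    adds up the selected embedding from every level.
    """
    n_q = codes.shape[0]
    levels = [codebooks[q][codes[q]] for q in range(n_q)]  # each (frames, dim)
    return np.sum(levels, axis=0)

tokens = rng.integers(0, CODEBOOK_SIZE, size=(N_CODEBOOKS, 50))  # 50 frames of tokens
features = decode_rvq(tokens, codebooks)
print(features.shape)  # (50, 128)

# Using only the first two levels gives a coarser approximation (a lower bitrate)
coarse = decode_rvq(tokens[:2], codebooks)
```

Once the tokens are summed into features, the choice of decoder (EnCodec's time-domain decoder or Vocos) is the only thing that differs between the two columns.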
Text prompt | EnCodec | Vocos
So, you've heard about neural vocoding? [laughs] We've been messing around with this new model called Vocos.
Ok [clears throat] let's compare the audio outputs. Listen carefully to the differences in each sample's quality and artifacts.
My friend’s bakery burned down last night. [sighs] Now his business is toast.
Schweinsteiger ist ein nationales kulturgut. Wir müssen ihn um jeden preis schützen. (English: Schweinsteiger is a national cultural treasure. We must protect him at all costs.)
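Polecam odwiedzenie Starego Miasta w Szczecinie! Architektura jest piękna, a lokalna kuchnia doskonała! (English: I recommend visiting the Old Town in Szczecin! The architecture is beautiful, and the local cuisine is excellent!)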
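我计划在下周的游泳比赛中和我的朋友托尼比赛。他认为自己可以打败我,但他不知道我一直在浴缸里偷偷练习游泳。我不敢说我会赢,但我很确定我会搞出一片浪花。 (English: I plan to race my friend Tony in next week's swimming competition. He thinks he can beat me, but he doesn't know I've been secretly practicing swimming in the bathtub. I can't promise I'll win, but I'm pretty sure I'll make a splash.)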
Bonjour. Aujourd’hui, nous sommes içi pour manger trop de glace. (English: Hello. Today, we are here to eat too much ice cream.)
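हॉटस्टार पर रुद्र सबसे बेहतरीन शो है! कहानी बेहद शानदार है, और अजय देवगन बहुत खूबसूरत लगते हैं। (English: Rudra is the best show on Hotstar! The story is absolutely brilliant, and Ajay Devgn looks very handsome.)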
¿Estos payasos llamaron a su modelo como un ladrido de perro? [laughs] ¿En serio? (English: These clowns named their model after a dog's bark? [laughs] Seriously?)
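추석은 내가 가장 좋아하는 명절이다. 나는 며칠 동안 휴식을 취하고 친구 및 가족과 시간을 보낼 수 있습니다 (English: Chuseok is my favorite holiday. I get to rest for a few days and spend time with friends and family.)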