Amphion Voice Conversion (VC) Recipe

Quick Start

We provide a beginner recipe that demonstrates how to train a cutting-edge VC model. Specifically, it is the official implementation of the paper "NORO: A Noise-Robust One-Shot Voice Conversion System with Hidden Speaker Representation Capabilities".

Supported Model Architectures

Amphion currently supports one noise-robust VC model, NORO.

It has the following features:

Noise-Robust Voice Conversion: Uses a dual-branch reference encoding module and a noise-agnostic contrastive speaker loss to maintain high-quality voice conversion in noisy environments.
One-Shot Voice Conversion: Achieves timbre conversion from a single reference speech sample.
Speaker Representation Learning: Explores the potential of the reference encoder as a self-supervised speaker encoder.
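To give a rough idea of what a contrastive speaker loss does, here is a minimal NumPy sketch of a generic, symmetric InfoNCE-style loss. It assumes that row i of `clean_emb` and row i of `noisy_emb` are reference embeddings of the clean and noise-augmented versions of the same utterance (a positive pair), and that every other row in the batch belongs to a different speaker (a negative). The function names and the `temperature` value are hypothetical, and this is not NORO's exact formulation; see the paper and the recipe code for the loss Amphion actually uses.

```python
import numpy as np


def l2_normalize(x, axis=-1, eps=1e-8):
    # Project embeddings onto the unit sphere so dot products become cosine similarities.
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)


def _info_nce(anchors, candidates, temperature):
    # (B, B) similarity matrix; entry (i, j) compares anchor i with candidate j.
    logits = anchors @ candidates.T / temperature
    # Numerically stable log-softmax over each row.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Assumption: the positive pair for row i sits on the diagonal.
    return -np.mean(np.diag(log_probs))


def contrastive_speaker_loss(clean_emb, noisy_emb, temperature=0.1):
    """Hypothetical noise-agnostic contrastive loss (generic InfoNCE sketch).

    Pulls the clean and noisy embeddings of the same utterance together and
    pushes apart embeddings of different speakers in the batch.
    """
    clean = l2_normalize(clean_emb)
    noisy = l2_normalize(noisy_emb)
    # Symmetric: clean -> noisy and noisy -> clean directions.
    return 0.5 * (_info_nce(clean, noisy, temperature) + _info_nce(noisy, clean, temperature))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    clean = rng.normal(size=(4, 16))                     # 4 speakers, 16-dim embeddings
    noisy = clean + 0.05 * rng.normal(size=(4, 16))      # simulated noisy references
    print("aligned pairs:   ", contrastive_speaker_loss(clean, noisy))
    print("mismatched pairs:", contrastive_speaker_loss(clean, noisy[::-1]))
```

The loss stays low when each noisy embedding lands next to its clean counterpart and grows when the pairing is broken, which is the property that encourages a reference encoder to ignore noise. Treating all other batch items as negatives assumes each row is a distinct speaker; batches with repeated speakers would need masking.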