Singing Voice Conversion via diffusion model
Latest commit 670ed3ef5c by 红血球AE3803, 2023-03-11 13:09:06 +09:00

This repository is a refactored fork of diff-svc that adds new features such as multi-speaker support, auxiliary scripts, and a new Hubert model. Please evaluate it yourself and use it at your own risk.

We recommend using the stable version: Diff-SVC

The project tutorial can be found in the doc folder. Please do not ask questions about this modified version in the original project channel or Discord.

With the same parameters, Chinese Hubert needs roughly 1.5 to 2 times as many training steps as Soft Hubert, so it is not recommended for beginners.
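As a rough planning aid, the 1.5x to 2x ratio can be turned into a step budget. The function name and the 100,000-step Soft Hubert baseline below are hypothetical examples, not recommendations from this project.

```python
def chinese_hubert_step_range(soft_hubert_steps: int) -> tuple[int, int]:
    """Hypothetical helper: estimated step range for Chinese Hubert, using the quoted 1.5x to 2x ratio."""
    return round(soft_hubert_steps * 1.5), soft_hubert_steps * 2


# Hypothetical baseline: a run that converged after 100,000 steps with Soft Hubert.
print(chinese_hubert_step_range(100_000))  # (150000, 200000)
```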

Change Log


Optimized the speed of nsf-hifigan @diffsinger


Updated config parameters, added multi-speaker model support to flask_api, and removed support for nesting with DiffSinger MIDI-A mode @小狼


Refactored the directory structure, streamlined the code, and removed multiple inheritance @小狼


Made the configuration files cascading: for preprocessing, you only need to modify one of config_nsf or config_ms @小狼
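Cascading means a specific config only lists the keys it changes and inherits everything else from a parent file. The sketch below shows that idea with in-memory dictionaries standing in for YAML files; the `base_config` key, the file names, the keys, and the merge rules are assumptions for illustration and are not taken from this repository's loader.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which keys from `override` replace or extend those in `base`.

    Nested dicts are merged recursively; any other value in `override` wins outright.
    """
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve(name: str, configs: dict) -> dict:
    """Resolve a config by following its (hypothetical) `base_config` parent chain."""
    cfg = dict(configs[name])
    parent = cfg.pop("base_config", None)
    if parent is None:
        return cfg
    return deep_merge(resolve(parent, configs), cfg)


# Hypothetical stand-ins for YAML files; keys and values are illustrative only.
configs = {
    "base.yaml": {"audio_sample_rate": 44100, "train": {"lr": 0.0004, "max_steps": 100000}},
    "config_nsf.yaml": {"base_config": "base.yaml", "train": {"max_steps": 200000}},
}
print(resolve("config_nsf.yaml", configs))
# {'audio_sample_rate': 44100, 'train': {'lr': 0.0004, 'max_steps': 200000}}
```

Recursive merging lets a child override a single nested value (here `max_steps`) without repeating the rest of its section, and the parent dictionary is left unmodified.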


Added multi-speaker support (config_ms.yaml); the preprocessing code is adapted from DiffSinger and was modified by @小狼


Added filtering of the dataset's pitch range (when there is enough data, removing redundant pitch ranges speeds up convergence on high and low pitches)
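One way to read this is as rebalancing: clips whose pitch falls in an over-represented range are dropped so that rare high and low notes make up a larger share of the training data. The sketch below bins clips by their median f0 in MIDI semitones and keeps at most `max_per_bin` clips per bin. The median-pitch criterion, the cap, and the function names are hypothetical; the repository's actual filtering rule may differ.

```python
import math
from collections import defaultdict


def hz_to_midi(f0_hz: float) -> float:
    """Convert a frequency in Hz to a fractional MIDI note number (A4 = 440 Hz = 69)."""
    return 69 + 12 * math.log2(f0_hz / 440.0)


def filter_by_pitch(clip_median_f0: dict[str, float], max_per_bin: int) -> list[str]:
    """Hypothetical rule: keep at most `max_per_bin` clips per semitone bin of median f0."""
    bins: dict[int, list[str]] = defaultdict(list)
    for name, f0 in sorted(clip_median_f0.items()):
        bins[round(hz_to_midi(f0))].append(name)
    kept: list[str] = []
    for note in sorted(bins):
        kept.extend(bins[note][:max_per_bin])  # drop the surplus in crowded bins
    return kept


# Three clips around A3 share one bin; the A5 clip sits alone in another.
clips = {"a.wav": 220.0, "b.wav": 221.0, "c.wav": 219.0, "d.wav": 880.0}
print(filter_by_pitch(clips, max_per_bin=2))  # ['a.wav', 'b.wav', 'd.wav']
```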

Removed the dependencies on the 24 kHz pe and hifigan models, deleted the pitch cwt mode, and reused the preprocessing code for inference @小狼


Added the f0_static parameter, which stores pitch-range statistics, and an adaptive pitch-shift function (requires f0_static; for older model configs, run data_static to add this parameter) @小狼
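One plausible way to compute an adaptive shift is to compare the input's typical pitch with the training set's and round the difference to whole semitones, n = round(12 * log2(f_train / f_input)). The sketch assumes, hypothetically, that the f0_static statistics are reduced to a single median f0 in Hz and that unvoiced frames carry f0 = 0; the real parameter may store richer statistics.

```python
import math
import statistics


def adaptive_key_shift(input_f0_hz: list[float], train_median_f0_hz: float) -> int:
    """Hypothetical helper: semitone shift moving the input's median voiced pitch toward the training median.

    Unvoiced frames are assumed to be marked with f0 == 0 and are ignored.
    """
    voiced = [f for f in input_f0_hz if f > 0]
    if not voiced:
        return 0  # nothing to measure, so do not shift
    input_median = statistics.median(voiced)
    return round(12 * math.log2(train_median_f0_hz / input_median))


# A clip centred on A3 (220 Hz) against a training set centred on A4 (440 Hz): one octave up.
print(adaptive_key_shift([0.0, 218.0, 220.0, 222.0, 0.0], 440.0))  # 12
```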


Dropped support for the 24 kHz sampling rate and pe, removed some parameters, and added a tutorial on specialized models to the documentation; both specialized and nested export modes are supported.

pre_hubert runs preprocessing step by step (for machines with 4 GB of memory or less); data_static computes pitch-range statistics for the dataset (for reference only). Chinese Hubert requires fairseq, which you need to install yourself @小狼


Updated to slicer v2, removed the slicer cache, and simplified parts of the inference process; removed vec support and added Chinese Hubert (base model only, about 1.1 GB) @小狼


Added pre_check to check the environment and data @深夜诗人; improved model simplification @九尾玄狐; code review @小狼


Fixed the Hubert model being loaded repeatedly during inference @小狼


Opened applications for the 44.1 kHz vocoder and added official support for 44.1 kHz.


Added the no_fs2 option, enabled by default, which simplifies part of the network, speeds up training, and reduces model size; it takes effect for newly trained models.


Fixed a major bug that could resample the original ground-truth (gt) audio used for inference to 22.05 kHz. We apologize for any inconvenience; please check your test audio and update to the latest code.
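To check your test audio, you can read the sample rate from each WAV header. The sketch below uses only Python's standard `wave` module, so it covers uncompressed WAV files only; the function name and the 44,100 Hz default are assumptions and should be matched to your model's configuration.

```python
import wave
from pathlib import Path


def find_rate_mismatches(folder: str, expected_rate: int = 44100) -> dict[str, int]:
    """Hypothetical helper: {file name: sample rate} for each .wav in `folder` not at `expected_rate`."""
    mismatches: dict[str, int] = {}
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()  # sample rate stored in the WAV header
        if rate != expected_rate:
            mismatches[path.name] = rate
    return mismatches
```

Running it on your test folder and getting an empty dict means every WAV file there is at the expected rate.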


Fixed many bugs, including several major ones that significantly affected inference results.


Inference now supports most input and output audio formats, so manual conversion with other software is no longer needed.


Fixed the epoch/step display when resuming a model after an interruption, added a disk cache for f0 processing, and added support files for real-time pitch-shifting inference.
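A disk cache for f0 avoids re-running pitch extraction on unchanged audio. The sketch keys each cache file on a SHA-256 hash of the audio bytes plus the extraction settings and stores the f0 curve as JSON. The `f0_cache` directory, the `hop_size` setting, and the `extract` callable are hypothetical; the repository's actual cache format is not documented here.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable


def cached_f0(
    audio_bytes: bytes,
    hop_size: int,
    extract: Callable[[bytes, int], list[float]],
    cache_dir: str = "f0_cache",
) -> list[float]:
    """Return the f0 curve for `audio_bytes`, calling `extract` only on a cache miss."""
    # Hash the audio together with the settings so a new hop size gets its own entry.
    hasher = hashlib.sha256()
    hasher.update(audio_bytes)
    hasher.update(f"hop_size={hop_size}".encode())
    cache_file = Path(cache_dir) / f"{hasher.hexdigest()}.json"
    if cache_file.exists():
        return json.loads(cache_file.read_text())
    f0 = extract(audio_bytes, hop_size)
    cache_file.parent.mkdir(parents=True, exist_ok=True)
    cache_file.write_text(json.dumps(f0))
    return f0
```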


Fixed the duration error during slicing and added support for 44.1 kHz and contentvec.


Added the feature to save mel-spectrograms.


Integrated the new vocoder code and updated the parselmouth algorithm.


Reorganized the inference code and added automatic slicing of long audio files.
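Automatic slicing generally cuts long recordings at silences. The sketch below is a simplified, hypothetical version of that idea, not this repository's slicer: it measures RMS loudness over fixed-size frames, treats frames below `threshold_db` as silent, and cuts in the middle of every silent run of at least `min_silence_frames` frames that lies between two non-silent regions. The function name and all default values are assumptions.

```python
import numpy as np


def slice_on_silence(
    audio: np.ndarray,
    frame_size: int = 1024,
    threshold_db: float = -40.0,
    min_silence_frames: int = 10,
) -> list[tuple[int, int]]:
    """Split a mono signal into (start, end) sample ranges at long silences (simplified sketch)."""
    n_frames = len(audio) // frame_size
    frames = audio[: n_frames * frame_size].reshape(n_frames, frame_size)
    # Per-frame RMS loudness in dB; the epsilon avoids log10(0) on digital silence.
    rms = np.sqrt(np.mean(frames.astype(np.float64) ** 2, axis=1))
    silent = 20 * np.log10(rms + 1e-10) < threshold_db

    cuts: list[int] = []
    run_start = None
    for i, is_silent in enumerate(silent):
        if is_silent and run_start is None:
            run_start = i  # a silent run begins
        elif not is_silent and run_start is not None:
            # Cut only at long silences that sit between two non-silent regions.
            if run_start > 0 and i - run_start >= min_silence_frames:
                cuts.append((run_start + i) // 2 * frame_size)  # middle of the silent run
            run_start = None

    bounds = [0, *cuts, len(audio)]
    return [(start, end) for start, end in zip(bounds[:-1], bounds[1:]) if end > start]
```

A practical slicer would also need to bound chunk lengths and decide how much silence to keep at each edge; this sketch only shows where the cut points come from.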


Migrated Hubert inference from ONNX to PyTorch and reorganized the inference logic. If you previously downloaded the ONNX Hubert model, replace it with the .pt model; the configuration file does not need to change. Preprocessing and inference can now run directly on a GTX 1060 6 GB GPU. See the documentation for details.


Updated dependency files and removed redundant dependencies.


Fixed a severe bug that made Hubert run on the CPU even on GPU servers, slowing it down by a factor of 3 to 5. The bug affected preprocessing and inference, but not training.


Fixed an issue where data preprocessed on Windows could not be used on Linux, and updated some documentation.


Wrote detailed documentation for inference and training, revised and consolidated some code, and added support for the ogg audio format (ogg and wav files are handled the same way and can be used directly).


Added support for training on custom datasets and simplified the code.


Completed training on the opencpop dataset and created a repository.

Notes




This project is intended for academic exchange and is not designed for production use. We are not responsible for any copyright issues arising from audio produced by this project's models.

If you redistribute the code in this repository or publicly release any results produced by this project (including but not limited to video website uploads), please credit the original author and the code source (this repository).

If you plan to use this project for any other purpose, please contact the author of this repository in advance. Thank you very much.

Inference

Refer to  and modify it as needed.

Preprocessing:

CUDA_VISIBLE_DEVICES=0 python preprocessing/ --config configs/config_nsf.yaml

Training:

CUDA_VISIBLE_DEVICES=0 python --config configs/config_nsf.yaml --exp_name <your project name> --reset 


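For the detailed training process and an explanation of each parameter, see 推理与训练说明 (Inference and Training Guide).

中文 hubert 与特化教程 (Chinese Hubert and Specialized Model Tutorial)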

Acknowledgements


This project is based on diffsinger, diffsinger (openvpi-maintained version), and soft-vc. We would also like to thank the openvpi members for their help during development and training.


Note: This project is unrelated to the paper of the same name, DiffSVC; please do not confuse the two.

Tools

Audio slicing reference: audio-slicer