Compare commits

3 Commits

Author | SHA1 | Message | Date
红血球AE3803 | 670ed3ef5c | Delete README_zh.md | 2023-03-11 13:09:06 +09:00
红血球AE3803 | 1ac4ae2e3a | Set the README to the English version by default | 2023-03-11 13:04:09 +09:00
红血球AE3803 | b93abf81c5 | Rename README.md to README_zh.md | 2023-03-11 13:03:24 +09:00
2 changed files with 86 additions and 280 deletions

@@ -1,189 +0,0 @@
# Diff-SVC
Singing Voice Conversion via diffusion model
## This repository is a refactored version of diff-svc fork, with new features such as multi-speaker support, auxiliary scripts, and new Hubert. Please evaluate and assume the risks on your own.
> We recommend using the stable version: [Diff-SVC](https://github.com/prophesier/diff-svc)
>
> The project tutorial can be found in the doc folder. Please do not ask questions about this modified version in the original project channel or Discord.
>
> Under the same parameters, the number of training steps required for the Chinese Hubert is approximately 1.5 to 2 times that of Soft Hubert. It is not recommended for beginners to use.
## Changes Log
> 2023.03.09
>
> Optimized the speed of nsf-hifigan @diffsinger
>
> 2023.02.18
>
> Updated config parameters, added flask_api multi-speaker model, and removed midi a mode diffsinger nesting support @小狼
>
> 2023.01.20
>
> Refactored directory, streamlined code, and removed multiple inheritance @小狼
>
> 2023.01.18
>
> Changed configuration file to cascading, only need to modify config_nsf, config_ms (choose one) for preprocessing @小狼
>
> 2023.01.16
>
> Added multi-speaker support (config_ms.yaml), preprocessing code referenced diffsinger modified by @小狼
>
> 2023.01.09
>
> Added select.py to filter the pitch range of the dataset (when the amount of data is sufficient, remove the duplicate pitch range to speed up the convergence of high and low pitches)
>
> Removed dependencies on 24k pe and hifigan, deleted pitch cwt mode, and reused preprocessing code for inference @小狼
>
> 2023.01.07
>
> Added f0_static parameter for statistical pitch range, and added adaptive pitch shift function (requires f0_static, old model config can use data_static to add this parameter) @小狼
>
> 2023.01.05
>
> Cancelled support for 24k sampling rate and pe, reduced some parameters, added specialized tutorials to the documentation; batch.py supports both specialized and nesting mode export;
>
> pre_hubert is a step-by-step preprocessing (used for preprocessing with 4g or less memory); data_static is for dataset pitch range statistics (for reference only); Chinese Hubert requires fairseq dependency, please install it yourself @小狼
>
> 2023.01.01
>
> Updated slicer v2, removed slicer cache, simplified some infer processes; removed vec support, added Chinese Hubert (only base model, around 1.1g) @小狼
>
> 2022.12.17
>
> Added pre_check to detect environment and data @深夜诗人; improved simplify model @九尾玄狐; supervised code @小狼
>
> 2022.12.16
>
> Fixed the problem of repeatedly loading the hubert model during inference @小狼
>
> 2022.12.04
>
> Opened application for 44.1kHz vocoder and officially provided support for 44.1kHz.
>
> 2022.11.28
>
> An option, no_fs2, has been added by default which can optimize some networks, improve training speed, reduce model size, and be effective for new models trained in the future.
>
> 2022.11.23
>
> A major bug has been fixed that could potentially convert the original gt audio used for inference to a sampling rate of 22.05kHz. We apologize for any inconvenience caused and kindly ask that you check your test audio and use the updated code.
>
> 2022.11.22
>
> Many bugs have been fixed, including several major ones that have a significant impact on inference performance.
>
> 2022.11.20
>
> Added support for most formats of input and output during inference, eliminating the need for manual conversion using other software.
>
> 2022.11.13
>
> Corrected the epoch/steps display issue when reading models after interruption, added disk cache for f0 processing, and added support files for real-time pitch-shifting inference.
>
> 2022.11.11
>
> Corrected the duration error during slicing, added support for 44.1kHz, and added support for contentvec.
>
> 2022.11.04
>
> Added the feature to save mel-spectrograms.
>
> 2022.11.02
>
> Integrated the new vocoder code and updated the parselmouth algorithm.
>
> 2022.10.29
>
> Organized the inference section and added the feature for automatic slicing of long audio files.
>
> 2022.10.28
>
> Migrated the hubert onnx inference to torch inference and reorganized the inference logic. If you have previously downloaded the onnx hubert model, please download and replace it with the pt model. The configuration file does not need to be changed. Currently, direct GPU inference and preprocessing on a 1060 6G GPU is possible. For more details, please refer to the documentation.
>
> 2022.10.27
>
> Updated dependency files and removed redundant dependencies.
>
> 2022.10.27
>
> Fixed a severe bug that caused hubert to use CPU inference on a GPU server, slowing it down by 3-5 times. This issue affects preprocessing and inference, but not training.
>
> 2022.10.26
>
> Fixed the issue of preprocessed data on Windows not being usable on Linux and updated some documentation.
>
> 2022.10.25
>
> Wrote detailed documentation for inference and training, modified and integrated some code, and added support for ogg audio format (no need to distinguish between ogg and wav, can be used directly).
>
> 2022.10.24
>
> Supported training on custom datasets and simplified the code.
>
> 2022.10.22
>
> Completed training on the opencpop dataset and created a repository.
## 注意事项 /Notes
> 本项目是基于学术交流目的建立,并非为生产环境准备,不对由此项目模型产生的任何声音的版权问题负责。
>
> 如将本仓库代码二次分发,或将由此项目产出的任何结果公开发表 (包括但不限于视频网站投稿),请注明原作者及代码来源 (此仓库)。
>
> 如果将此项目用于任何其他企划,请提前联系并告知本仓库作者,十分感谢。
> This project is established for academic exchange purposes and is not intended for production environments. We are not responsible for any copyright issues arising from the sound produced by this project's model.
>
> If you redistribute the code in this repository or publicly publish any results produced by this project (including but not limited to video website submissions), please indicate the original author and source code (this repository).
>
> If you use this project for any other plans, please contact and inform the author of this repository in advance. Thank you very much.
## 推理 /Inference
Refer to `infer.py` and modify it as needed.
## 预处理 /PreProcessing:
```sh
export PYTHONPATH=.
CUDA_VISIBLE_DEVICES=0 python preprocessing/svc_binarizer.py --config configs/config_nsf.yaml
```
## 训练 /Training:
```sh
CUDA_VISIBLE_DEVICES=0 python run.py --config configs/config_nsf.yaml --exp_name <your project name> --reset
```
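The two steps above can be chained in a small wrapper. This is only a sketch under stated assumptions: `run_pipeline`, the `DRY_RUN` switch, and the `GPU` variable are hypothetical names introduced here for illustration and are not part of the repository. The script paths and flags are copied from the commands above.

```sh
# Hypothetical helper (not part of the repository): preprocess, then train.
# DRY_RUN and GPU are assumed convenience variables, not project options.
run_pipeline() {
    config=$1              # e.g. configs/config_nsf.yaml
    exp_name=$2            # your project name
    run=${DRY_RUN:+echo}   # becomes "echo" only when DRY_RUN is set and non-empty

    export PYTHONPATH=.
    export CUDA_VISIBLE_DEVICES="${GPU:-0}"

    # Step 1 binarizes the dataset; step 2 trains. Training is skipped if step 1 fails.
    $run python preprocessing/svc_binarizer.py --config "$config" &&
    $run python run.py --config "$config" --exp_name "$exp_name" --reset
}
```

Calling `DRY_RUN=1 run_pipeline configs/config_nsf.yaml my_project` prints the two commands without running them, which is a cheap way to check the arguments before starting a long job.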
> Links:
>
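> For the detailed training process and parameter descriptions, see the [inference and training guide](./doc/train_and_inference.markdown)
>
> [Chinese Hubert and specialization tutorial](./doc/advanced_skills.markdown)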
## 学术 / Acknowledgements
项目基于 [diffsinger](https://github.com/MoonInTheRiver/DiffSinger)、[diffsinger (openvpi 维护版)](https://github.com/openvpi/DiffSinger)、[soft-vc](https://github.com/bshall/soft-vc)
开发.
同时也十分感谢 openvpi 成员在开发训练过程中给予的帮助。
This project is based on [diffsinger](https://github.com/MoonInTheRiver/DiffSinger), [diffsinger (openvpi maintenance version)](https://github.com/openvpi/DiffSinger), and [soft-vc](https://github.com/bshall/soft-vc). We would also like to thank the openvpi members for their help during the development and training process.
> 注意:此项目与同名论文 [DiffSVC](https://arxiv.org/abs/2105.13871) 无任何联系,请勿混淆!
> Note: This project has no connection with the paper of the same name, [DiffSVC](https://arxiv.org/abs/2105.13871); please do not confuse them!
## 工具 / Tools
音频切片参考 [audio-slicer](https://github.com/openvpi/audio-slicer)
For audio slicing, see [audio-slicer](https://github.com/openvpi/audio-slicer)

README.md

@@ -1,135 +1,130 @@
# Diff-SVC
# Diff-SVC
Singing Voice Conversion via diffusion model
## 本仓库为 diff-svc fork 重构版,新增多说话人、辅助脚本、新 hubert 等,请自行评估并承担风险
> 建议使用稳定版:[Diff-SVC](https://github.com/prophesier/diff-svc)
## This repository is a refactored version of diff-svc fork, with new features such as multi-speaker support, auxiliary scripts, and new Hubert. Please evaluate and assume the risks on your own.
> We recommend using the stable version: [Diff-SVC](https://github.com/prophesier/diff-svc)
>
> 项目教程在 doc 文件夹下此魔改版问题请勿在原项目频道、discord 等询问。
> The project tutorial can be found in the doc folder. Please do not ask questions about this modified version in the original project channel or Discord.
>
> 同参数下,中文 hubert 所需训练步数约为 soft hubert 的 1.5~2 倍,不建议新手使用
## 更新日志 /Changes Log
> Under the same parameters, the number of training steps required for the Chinese Hubert is approximately 1.5 to 2 times that of Soft Hubert. It is not recommended for beginners to use.
## Changes Log
> 2023.03.09
>
> 优化nsf-hifigan速度 @diffsinger
> Optimized the speed of nsf-hifigan @diffsinger
>
> 2023.02.18
>
> 更新config参数增加flask_api多人模型取消midi a模式diffsinger套娃支持 @小狼
> Updated config parameters, added flask_api multi-speaker model, and removed midi a mode diffsinger nesting support @小狼
>
> 2023.01.20
>
> 重构目录,精简代码,去除多层继承 @小狼
>
> Refactored directory, streamlined code, and removed multiple inheritance @小狼
>
> 2023.01.18
>
> 配置文件改为级联,仅需修改 config_nsf、config_ms二选一即可预处理 @小狼
>
> Changed configuration file to cascading, only need to modify config_nsf, config_ms (choose one) for preprocessing @小狼
>
> 2023.01.16
>
> 增加多说话人支持 (config_ms.yaml),预处理代码参考 diffsinger 修改 @小狼
>
> 2023.01.09
> Added multi-speaker support (config_ms.yaml), preprocessing code referenced diffsinger modified by @小狼
>
> 2023.01.09
>
> Added select.py to filter the pitch range of the dataset (when the amount of data is sufficient, remove the duplicate pitch range to speed up the convergence of high and low pitches)
>
> Removed dependencies on 24k pe and hifigan, deleted pitch cwt mode, and reused preprocessing code for inference @小狼
>
> 新增 select.py 筛选数据集音域(数据量足够时,删去重复音域部分,加快高低音收敛)
>
> 删除 24k 的 pe、hifigan 等依赖,删除 pitch cwt 模式infer 复用预处理部分代码 @小狼
>
> 2023.01.07
>
> 预处理新增 f0_static 超参统计音域,新增自适应变调功能 (需 f0_static旧模型 config 可用 data_static 添加此超参) @小狼
>
> 2023.01.05
> Added f0_static parameter for statistical pitch range, and added adaptive pitch shift function (requires f0_static, old model config can use data_static to add this parameter) @小狼
>
> 2023.01.05
>
> Cancelled support for 24k sampling rate and pe, reduced some parameters, added specialized tutorials to the documentation; batch.py supports both specialized and nesting mode export;
>
> pre_hubert is a step-by-step preprocessing (used for preprocessing with 4g or less memory); data_static is for dataset pitch range statistics (for reference only); Chinese Hubert requires fairseq dependency, please install it yourself @小狼
>
> 取消 24k 采样率、pe 支持删减部分参数、doc 新增特化教程batch.py 支持特化、套娃两种模式的导出;
>
> pre_hubert 为分步预处理4g 及以下内存预处理使用data_static 为数据集音域统计(仅供参考);中文 hubert 所需依赖 fairseq 请自行安装 @小狼
>
> 2023.01.01
>
> 更新切片机 v2、取消切片缓存简化部分 infer 流程;取消 vec 支持、增加中文 hubert仅 base 模型1.1g 左右)@小狼
>
>
> Updated slicer v2, removed slicer cache, simplified some infer processes; removed vec support, added Chinese Hubert (only base model, around 1.1g) @小狼
>
> 2022.12.17
>
> 新增 pre_check 检测环境、数据 @深夜诗人;改进 simplify 精简模型 @九尾玄狐;监修代码 @小狼
>
>
> Added pre_check to detect environment and data @深夜诗人; improved simplify model @九尾玄狐; supervised code @小狼
>
> 2022.12.16
>
> 修复推理时 hubert 模型重复加载的问题 @小狼
>
>
> Fixed the problem of repeatedly loading the hubert model during inference @小狼
>
> 2022.12.04
>
> 44.1kHz 声码器开放申请,正式提供对 44.1kHz 的支持
>
>
> Opened application for 44.1kHz vocoder and officially provided support for 44.1kHz.
>
> 2022.11.28
>
> 增加了默认打开的 no_fs2 选项,可优化部分网络,提升训练速度、缩减模型体积,对于未来新训练的模型有效
>
> 2022.11.23
>
> 修复了一个重大 bug曾导致可能将用于推理的原始 gt 音频转变采样率为 22.05kHz, 对于由此造成的影响我们表示十分抱歉,请务必检查自己的测试音频,并使用更新后的代码
>
>
> An option, no_fs2, has been added by default which can optimize some networks, improve training speed, reduce model size, and be effective for new models trained in the future.
>
> 2022.11.23
>
> A major bug has been fixed that could potentially convert the original gt audio used for inference to a sampling rate of 22.05kHz. We apologize for any inconvenience caused and kindly ask that you check your test audio and use the updated code.
>
> 2022.11.22
>
> 修复了很多 bug其中有几个影响推理效果重大的 bug
>
>
> Many bugs have been fixed, including several major ones that have a significant impact on inference performance.
>
> 2022.11.20
>
> 增加对推理时多数格式的输入和保存,无需手动借助其他软件转换
>
>
> Added support for most formats of input and output during inference, eliminating the need for manual conversion using other software.
>
> 2022.11.13
>
> 修正中断后读取模型的 epoch/steps 显示问题,添加 f0 处理的磁盘缓存,添加实时变声推理的支持文件
>
>
> Corrected the epoch/steps display issue when reading models after interruption, added disk cache for f0 processing, and added support files for real-time pitch-shifting inference.
>
> 2022.11.11
>
> 修正切片时长误差,补充对 44.1khz 的适配,增加对 contentvec 的支持
>
>
> Corrected the duration error during slicing, added support for 44.1kHz, and added support for contentvec.
>
> 2022.11.04
>
> 添加梅尔谱保存功能
>
>
> Added the feature to save mel-spectrograms.
>
> 2022.11.02
>
> 整合新声码器代码,更新 parselmouth 算法
>
>
> Integrated the new vocoder code and updated the parselmouth algorithm.
>
> 2022.10.29
>
> 整理推理部分,添加长音频自动切片功能。
>
>
> Organized the inference section and added the feature for automatic slicing of long audio files.
>
> 2022.10.28
> 将 hubert 的 onnx 推理迁移为 torch 推理,并整理推理逻辑。
>
> <font color=#FFA500 > 如原先下载过 onnx 的 hubert 模型需重新下载并替换为 pt 模型 </font>config 不需要改,目前可以实现 1060
> 6G 显存的直接 GPU 推理与预处理,详情请查看文档。
>
>
> Migrated the hubert onnx inference to torch inference and reorganized the inference logic. If you have previously downloaded the onnx hubert model, please download and replace it with the pt model. The configuration file does not need to be changed. Currently, direct GPU inference and preprocessing on a 1060 6G GPU is possible. For more details, please refer to the documentation.
>
> 2022.10.27
>
> 更新依赖文件,去除冗余依赖。
>
>
> Updated dependency files and removed redundant dependencies.
>
> 2022.10.27
>
> 修复了一个严重错误,曾导致在 gpu 服务器上 hubert 仍使用 cpu 推理,速度减慢 3-5 倍,影响预处理与推理,不影响训练
>
>
> Fixed a severe bug that caused hubert to use CPU inference on a GPU server, slowing it down by 3-5 times. This issue affects preprocessing and inference, but not training.
>
> 2022.10.26
>
> 修复 windows 上预处理数据在 linux 上无法使用的问题,更新部分文档
>
>
> Fixed the issue of preprocessed data on Windows not being usable on Linux and updated some documentation.
>
> 2022.10.25
>
> 编写推理 / 训练详细文档,修改整合部分代码,增加对 ogg 格式音频的支持 (无需与 wav 区分,直接使用即可)
>
>
> Wrote detailed documentation for inference and training, modified and integrated some code, and added support for ogg audio format (no need to distinguish between ogg and wav, can be used directly).
>
> 2022.10.24
>
> 支持对自定义数据集的训练,并精简代码
>
>
> Supported training on custom datasets and simplified the code.
>
> 2022.10.22
>
> 完成对 opencpop 数据集的训练并创建仓库
>
> Completed training on the opencpop dataset and created a repository.
## 注意事项 /Notes
> 本项目是基于学术交流目的建立,并非为生产环境准备,不对由此项目模型产生的任何声音的版权问题负责。