# Compare commits

4 commits:

| Author | SHA1 | Message | Date |
|---|---|---|---|
| Stardust·减 | ea3013cbb7 | Eng doc finished | 2023-03-10 23:32:12 +08:00 |
| Stardust·减 | 0f55c55c93 | Create README.en.md | 2023-03-10 23:13:02 +08:00 |
| Geraint-Dou | 00c03c408e | Update README_CN.md | 2023-03-10 22:12:06 +08:00 |
| Geraint-Dou | bba2d7009c | upload README_CN.md | 2023-03-10 21:55:31 +08:00 |

2 changed files with 391 additions and 0 deletions

README.en.md: new file, 189 additions
# Diff-SVC
Singing Voice Conversion via diffusion model
## This repository is a refactored fork of diff-svc, adding multi-speaker support, auxiliary scripts, a new Hubert, and more. Please evaluate it and assume the risks yourself.
> We recommend using the stable version: [Diff-SVC](https://github.com/prophesier/diff-svc)
>
> The project tutorial can be found in the doc folder. Please do not ask questions about this modified version in the original project channel or Discord.
>
> Under the same parameters, the number of training steps required for the Chinese Hubert is approximately 1.5 to 2 times that of Soft Hubert. It is not recommended for beginners to use.
## Changes Log
> 2023.03.09
>
> Optimized the speed of nsf-hifigan @diffsinger
>
> 2023.02.18
>
> Updated config parameters, added a flask_api multi-speaker model, and removed midi-A-mode diffsinger nesting support @小狼
>
> 2023.01.20
>
> Refactored the directory, streamlined the code, and removed multi-level inheritance @小狼
>
> 2023.01.18
>
> Changed configuration files to cascading; only config_nsf or config_ms (choose one) needs to be modified for preprocessing @小狼
>
> 2023.01.16
>
> Added multi-speaker support (config_ms.yaml); preprocessing code adapted from diffsinger @小狼
>
> 2023.01.09
>
> Added select.py to filter the pitch range of the dataset (when the amount of data is sufficient, remove the duplicate pitch range to speed up the convergence of high and low pitches)
>
> Removed dependencies on 24k pe and hifigan, deleted pitch cwt mode, and reused preprocessing code for inference @小狼
>
> 2023.01.07
>
> Added the f0_static hyperparameter for pitch-range statistics, plus an adaptive pitch-shift function (requires f0_static; old model configs can add this hyperparameter via data_static) @小狼
>
> 2023.01.05
>
> Dropped support for the 24k sampling rate and pe, trimmed some parameters, and added specialized tutorials to the documentation; batch.py supports export in both specialized and nesting modes;
>
> pre_hubert performs step-by-step preprocessing (for machines with 4 GB of RAM or less); data_static reports dataset pitch-range statistics (for reference only); the Chinese Hubert requires the fairseq dependency, please install it yourself @小狼
>
> 2023.01.01
>
> Updated slicer v2 and removed the slicer cache, simplifying parts of the infer flow; removed vec support and added Chinese Hubert (base model only, around 1.1 GB) @小狼
>
> 2022.12.17
>
> Added pre_check to check the environment and data @深夜诗人; improved the simplify model pruning @九尾玄狐; code supervision @小狼
>
> 2022.12.16
>
> Fixed the problem of repeatedly loading the hubert model during inference @小狼
>
> 2022.12.04
>
> Opened application for 44.1kHz vocoder and officially provided support for 44.1kHz.
>
> 2022.11.28
>
> Added the no_fs2 option, enabled by default; it optimizes parts of the network, improves training speed, and reduces model size, taking effect for newly trained models.
>
> 2022.11.23
>
> A major bug has been fixed that could potentially convert the original gt audio used for inference to a sampling rate of 22.05kHz. We apologize for any inconvenience caused and kindly ask that you check your test audio and use the updated code.
>
> 2022.11.22
>
> Many bugs have been fixed, including several major ones that have a significant impact on inference performance.
>
> 2022.11.20
>
> Added support for most formats of input and output during inference, eliminating the need for manual conversion using other software.
>
> 2022.11.13
>
> Corrected the epoch/steps display issue when reading models after interruption, added disk cache for f0 processing, and added support files for real-time pitch-shifting inference.
>
> 2022.11.11
>
> Corrected the duration error during slicing, added support for 44.1kHz, and added support for contentvec.
>
> 2022.11.04
>
> Added the feature to save mel-spectrograms.
>
> 2022.11.02
>
> Integrated the new vocoder code and updated the parselmouth algorithm.
>
> 2022.10.29
>
> Organized the inference section and added the feature for automatic slicing of long audio files.
>
> 2022.10.28
>
> Migrated the hubert onnx inference to torch inference and reorganized the inference logic. If you have previously downloaded the onnx hubert model, please download and replace it with the pt model. The configuration file does not need to be changed. Currently, direct GPU inference and preprocessing on a 1060 6G GPU is possible. For more details, please refer to the documentation.
>
> 2022.10.27
>
> Updated dependency files and removed redundant dependencies.
>
> 2022.10.27
>
> Fixed a severe bug that caused hubert to use CPU inference on a GPU server, slowing it down by 3-5 times. This issue affects preprocessing and inference, but not training.
>
> 2022.10.26
>
> Fixed the issue of preprocessed data on Windows not being usable on Linux and updated some documentation.
>
> 2022.10.25
>
> Wrote detailed documentation for inference and training, modified and integrated some code, and added support for ogg audio format (no need to distinguish between ogg and wav, can be used directly).
>
> 2022.10.24
>
> Supported training on custom datasets and simplified the code.
>
> 2022.10.22
>
> Completed training on the opencpop dataset and created a repository.
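The adaptive pitch-shift feature mentioned in the 2023.01.07 entry can be sketched roughly as follows. This is a minimal illustration of the general idea (transposing so the source singer's median f0 lands on the target's median f0), not the repository's actual implementation; the function and variable names here are hypothetical.

```python
import math

def adaptive_keyshift(source_f0_hz, target_f0_hz):
    """Estimate a whole-semitone transposition that moves the source
    singer's median pitch onto the target's median pitch.

    Hypothetical sketch: the real f0_static feature stores precomputed
    pitch-range statistics; here we just take medians of raw f0 tracks.
    """
    # Ignore unvoiced frames (f0 == 0).
    src = sorted(f for f in source_f0_hz if f > 0)
    tgt = sorted(f for f in target_f0_hz if f > 0)
    src_med = src[len(src) // 2]
    tgt_med = tgt[len(tgt) // 2]
    # 12 semitones per octave; round to a whole key shift.
    return round(12 * math.log2(tgt_med / src_med))

# A source singing around 220 Hz (A3) mapped to a target around 440 Hz (A4)
# comes out as +12 semitones (one octave up).
shift = adaptive_keyshift([0.0, 219.0, 220.0, 221.0], [438.0, 440.0, 442.0, 0.0])
```

The rounding to whole semitones reflects the common convention that key shifts are specified in integer keys; a fractional shift would also be possible.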
## 注意事项 /Notes
> This project is established for academic exchange purposes and is not intended for production environments. We are not responsible for any copyright issues arising from sounds produced by this project's model.
>
> If you redistribute the code in this repository or publicly publish any results produced by this project (including but not limited to video website submissions), please credit the original author and source code (this repository).
>
> If you use this project for any other plans, please contact and inform the author of this repository in advance. Thank you very much.
## 推理 /Inference
Refer to `infer.py` and modify it as needed
## 预处理 /PreProcessing:
```sh
export PYTHONPATH=.
CUDA_VISIBLE_DEVICES=0 python preprocessing/svc_binarizer.py --config configs/config_nsf.yaml
```
## 训练 /Training:
```sh
CUDA_VISIBLE_DEVICES=0 python run.py --config configs/config_nsf.yaml --exp_name <your project name> --reset
```
> Links:
>
> For the detailed training process and descriptions of the various parameters, see the [Inference and Training Guide](./doc/train_and_inference.markdown)
>
> [Chinese Hubert and specialization tutorial](./doc/advanced_skills.markdown)
## 学术 / Acknowledgements
This project is based on [diffsinger](https://github.com/MoonInTheRiver/DiffSinger), [diffsinger (openvpi maintenance version)](https://github.com/openvpi/DiffSinger), and [soft-vc](https://github.com/bshall/soft-vc). We would also like to thank the openvpi members for their help during the development and training process.
> Note: This project has no connection with the paper of the same name [DiffSVC](https://arxiv.org/abs/2105.13871); please do not confuse them!
## 工具 / Tools
Audio slicing references [audio-slicer](https://github.com/openvpi/audio-slicer)

doc/README_CN.md: new file, 202 additions
# Diff-SVC
Singing Voice Conversion via diffusion model
## Notes
> This repository is a refactored fork of diff-svc, adding multi-speaker support, auxiliary scripts, a new Hubert, and more. Please evaluate it and assume the risks yourself.
>
> We recommend using the stable version: [Diff-SVC](https://github.com/prophesier/diff-svc)
>
> The project tutorial is in the doc folder. Please do not ask questions about this modified version in the original project's channels, Discord, etc.
>
> Under the same parameters, the Chinese Hubert requires roughly 1.5 to 2 times as many training steps as Soft Hubert; it is not recommended for beginners.
>
> This project is established for academic exchange purposes, for communication and learning only, and is not intended for production environments. We are not responsible for any copyright issues arising from sounds produced by this project's model. Please do not use it for illegal purposes or uses that violate public order and morals.
>
> Continued use is deemed agreement to the relevant terms stated in this repository's README. This README has fulfilled its duty to advise and bears no responsibility for problems that may arise later.
>
> If you redistribute the code in this repository or publicly publish any results produced by this project (including but not limited to video website submissions), please credit the original author and source code (this repository).
>
> If you use this project for any other plans, please contact and inform the author of this repository in advance. Thank you very much.
## Training
For the detailed training process and descriptions of the various parameters, see: [Inference and Training Guide](./doc/train_and_inference.markdown)
For better training results, see: [Chinese Hubert and specialization tutorial](./doc/advanced_skills.markdown)
## Inference
Refer to the comments in `infer.py` and modify it as needed
## Tools
Audio slicing references [audio-slicer](https://github.com/openvpi/audio-slicer)
## Acknowledgements
This project is based on [diffsinger](https://github.com/MoonInTheRiver/DiffSinger), [diffsinger (openvpi maintenance version)](https://github.com/openvpi/DiffSinger), and [soft-vc](https://github.com/bshall/soft-vc). We would also like to thank the openvpi members for their help during the development and training process.
## Changes Log
> 2023.03.09
>
> Optimized nsf-hifigan speed @diffsinger
>
> 2023.02.18
>
> Updated config parameters, added a flask_api multi-speaker model, and removed midi-A-mode diffsinger nesting support @小狼
>
> 2023.01.20
>
> Refactored the directory, streamlined the code, and removed multi-level inheritance @小狼
>
> 2023.01.18
>
> Changed configuration files to cascading; only config_nsf or config_ms (choose one) needs to be modified for preprocessing @小狼
>
> 2023.01.16
>
> Added multi-speaker support (config_ms.yaml); preprocessing code adapted from diffsinger @小狼
>
> 2023.01.09
>
> Added select.py to filter the pitch range of the dataset (when there is enough data, removing overlapping pitch ranges speeds up convergence of high and low notes)
>
> Removed the 24k pe and hifigan dependencies, deleted the pitch cwt mode, and reused preprocessing code for inference @小狼
>
> 2023.01.07
>
> Preprocessing gained the f0_static hyperparameter for pitch-range statistics, plus an adaptive pitch-shift function (requires f0_static; old model configs can add this hyperparameter via data_static) @小狼
>
> 2023.01.05
>
> Dropped support for the 24k sampling rate and pe, trimmed some parameters, and added specialized tutorials to doc; batch.py supports export in both specialized and nesting modes;
>
> pre_hubert performs step-by-step preprocessing (for machines with 4 GB of RAM or less); data_static reports dataset pitch-range statistics (for reference only); the Chinese Hubert requires the fairseq dependency, please install it yourself @小狼
>
> 2023.01.01
>
> Updated slicer v2 and removed the slicer cache, simplifying parts of the infer flow; removed vec support and added Chinese Hubert (base model only, around 1.1 GB) @小狼
>
> 2022.12.17
>
> Added pre_check to check the environment and data @深夜诗人; improved the simplify model pruning @九尾玄狐; code supervision @小狼
>
> 2022.12.16
>
> Fixed repeated loading of the hubert model during inference @小狼
>
> 2022.12.04
>
> Opened applications for the 44.1kHz vocoder and officially provided 44.1kHz support
>
> 2022.11.28
>
> Added the no_fs2 option, enabled by default; it optimizes parts of the network, improves training speed, and reduces model size, taking effect for newly trained models
>
> 2022.11.23
>
> Fixed a major bug that could resample the original gt audio used for inference to 22.05kHz. We apologize for the impact; please be sure to check your test audio and use the updated code
>
> 2022.11.22
>
> Fixed many bugs, including several with a major impact on inference quality
>
> 2022.11.20
>
> Added support for most input and output formats during inference, eliminating manual conversion with other software
>
> 2022.11.13
>
> Fixed the epoch/steps display when loading models after an interruption, added a disk cache for f0 processing, and added support files for real-time voice-changing inference
>
> 2022.11.11
>
> Fixed the slicing duration error, added 44.1kHz support, and added contentvec support
>
> 2022.11.04
>
> Added mel-spectrogram saving
>
> 2022.11.02
>
> Integrated the new vocoder code and updated the parselmouth algorithm
>
> 2022.10.29
>
> Organized the inference section and added automatic slicing of long audio files
>
> 2022.10.28
> Migrated hubert's onnx inference to torch inference and reorganized the inference logic.
>
> If you previously downloaded the onnx hubert model, please re-download and replace it with the pt model; the config does not need to change. Direct GPU inference and preprocessing are now possible on a 1060 6G GPU; see the documentation for details.
>
> 2022.10.27
>
> Updated dependency files and removed redundant dependencies
>
> 2022.10.27
>
> Fixed a severe bug that caused hubert to run CPU inference on GPU servers, slowing it down 3-5x; this affected preprocessing and inference but not training
>
> 2022.10.26
>
> Fixed data preprocessed on Windows being unusable on Linux, and updated some documentation
>
> 2022.10.25
>
> Wrote detailed inference/training documentation, modified and consolidated some code, and added support for ogg audio (no need to distinguish from wav; use it directly)
>
> 2022.10.24
>
> Supported training on custom datasets and streamlined the code
>
> 2022.10.22
>
> Completed training on the opencpop dataset and created the repository
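The dataset filtering that the 2023.01.09 entry attributes to select.py can be sketched roughly as follows. This is a hypothetical illustration of the idea (dropping clips whose pitch range is already well covered, so that high and low notes converge faster), not the script's actual logic; the function name, tuple layout, and coverage threshold are all assumptions.

```python
def filter_redundant_clips(clips, bin_semitones=1.0):
    """Keep a clip only if it still adds coverage to a pitch bin that is
    under-represented so far.

    Hypothetical sketch of the idea behind select.py: trimming duplicated
    pitch ranges from an over-large dataset.

    clips: list of (name, min_f0_midi, max_f0_midi) tuples.
    """
    covered = {}  # midi bin -> how many kept clips already cover it
    kept = []
    for name, lo, hi in clips:
        bins = range(int(lo // bin_semitones), int(hi // bin_semitones) + 1)
        # Keep the clip if any of its pitch bins is seen fewer than twice.
        if any(covered.get(b, 0) < 2 for b in bins):
            kept.append(name)
            for b in bins:
                covered[b] = covered.get(b, 0) + 1
    return kept

clips = [
    ("a", 55, 60), ("b", 55, 60), ("c", 55, 60),  # the same mid range three times
    ("d", 70, 75),                                  # a new high range
]
kept = filter_redundant_clips(clips)  # "c" is dropped as redundant
```

A real script would derive the per-clip pitch range from extracted f0 rather than take it as input; the coverage threshold of two clips per bin is arbitrary here.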
## Some Legal Provisions for Reference
#### Civil Code of the People's Republic of China
##### Article 1019
No organization or individual **may** infringe upon another person's portrait rights by defacement, defamation, forgery by means of information technology, or similar methods. **Without** the consent of the portrait-rights holder, no one may produce, use, or publish the holder's portrait, except as otherwise provided by law.
**Without** the consent of the portrait-rights holder, the holder of rights in a portrait work must not use or publish the portrait by means of publication, reproduction, distribution, rental, exhibition, or the like.
The protection of a natural person's voice is governed, with reference, by the relevant provisions on the protection of portrait rights.
##### Article 1024
[Right to Reputation] Civil subjects enjoy the right to reputation. No organization or individual may infringe upon another's right to reputation by insult, defamation, or similar means.
##### Article 1027
[Works Infringing Reputation] Where a literary or artistic work published by an actor describes real people and events or a specific person, and contains insulting or defamatory content that infringes another's right to reputation, the injured person has the right to demand, in accordance with law, that the actor bear civil liability.
Where a literary or artistic work published by an actor does not describe a specific person, and merely contains plot elements similar to that person's circumstances, the actor does not bear civil liability.