Update README.md

Miuzarte 2023-03-24 16:57:59 +08:00
parent cf5a8fb94c
commit 89a0de2834
2 changed files with 77 additions and 40 deletions

README.md

@@ -2,38 +2,40 @@
[**English**](./README.md) | [**中文简体**](./README_zh_CN.md)
## 📏 Terms of Use
# Warning: Please solve the authorization problem of the dataset on your own. You bear full responsibility for any problem caused by training on unauthorized datasets, and for all consequences thereof. The repository and its maintainer, svc develop team, bear no responsibility for those consequences!
1. This project is established for academic exchange purposes only and is intended for communication and learning. It is not intended for production environments.
2. Any video published on a video platform that is based on sovits must clearly state in its description that it is used for voice conversion, and must specify the input source of the voice or audio. For example, if you convert vocals separated from a video or audio published by someone else, you must provide a clear link to the original video or music; if you convert your own voice, or a voice synthesized by other commercial vocal-synthesis software, you must also state this in the description.
3. You are solely responsible for any infringement caused by the input source. When using other commercial vocal-synthesis software as the input source, make sure you comply with that software's terms of use; note that many vocal-synthesis engines state explicitly in their terms that they may not be used as an input source for conversion.
4. Continuing to use this project is deemed agreement to the terms stated in this README. This README has fulfilled its obligation to advise and is not responsible for any problems that may arise subsequently.
5. If you redistribute the code in this repository, or publicly publish any results produced by this project (including but not limited to posts on video-sharing platforms), please credit the original author and the source of the code (this repository).
6. If you use this project for any other purpose, please contact the author of this repository and inform them in advance. Thank you very much.
## 📝 Model Introduction
The singing voice conversion model uses the SoftVC content encoder to extract speech features from the source audio and feeds them, together with F0, into VITS in place of the original text input, which achieves singing voice conversion. The vocoder is also replaced with [NSF HiFiGAN](https://github.com/openvpi/DiffSinger/tree/refactor/modules/nsf_hifigan) to solve the problem of interrupted sound.
### 🆕 4.0 v2 update content
+ The model architecture is completely changed to [visinger2](https://github.com/zhangyongmao/VISinger2)
+ Everything else is exactly the same as [4.0](https://github.com/svc-develop-team/so-vits-svc/tree/4.0).
### 🆕 4.0 v2 features
+ It is better than 4.0 in some scenarios, for example the electrical noise in breath sounds.
+ But it also regresses in some scenarios. For example, training on vtuber streaming data does not work as well as [4.0](https://github.com/svc-develop-team/so-vits-svc/tree/4.0), and in some cases it produces terrible-sounding output.
+ [4.0-v2](https://github.com/svc-develop-team/so-vits-svc/tree/4.0-v2) is the last version of sovits; there will be no further updates.
## Note
+ [4.0-v2](https://github.com/svc-develop-team/so-vits-svc/tree/4.0-v2) and [4.0](https://github.com/svc-develop-team/so-vits-svc/tree/4.0) follow almost the same process, including preprocessing and requirements.
+ The differences from 4.0 are:
+ The models are **completely different**. Check the version of any pretrained models you are using.
+ The structure of the config file changed a lot. If you are using a dataset preprocessed with 4.0, you only need to run `python preprocess_flist_config.py` to generate a new `config.json`, as sketched below.
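A minimal migration sketch for that case, assuming the `dataset` folder produced by 4.0's preprocessing is already in place:

```shell
# Regenerate config.json (and the file lists) for 4.0-v2 from a dataset
# already preprocessed with 4.0; the other preprocessing steps are not rerun.
python preprocess_flist_config.py
```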
## 📥 Pre-trained Model Files
#### **Required**
@@ -55,11 +57,11 @@ Get them from svc-develop-team(TBD) or anywhere else.
Although pretrained models generally do not cause copyright problems, please pay attention to them anyway: for example, ask the author in advance, or use models whose author has clearly stated the permitted uses in the description.
## 📊 Dataset Preparation
Simply place the dataset in the `dataset_raw` directory with the following file structure.
```
dataset_raw
├───speaker0
│   ├───xxx1-xxx1.wav
@@ -71,9 +73,19 @@ dataset_raw
    └───xxx7-xxx007.wav
```
You can customize the speaker name.
```
dataset_raw
└───suijiSUI
    ├───1.wav
    ├───...
    └───25788785-20221210-200143-856_01_(Vocals)_0_0.wav
```
## 🛠️ Preprocessing
1. Resample to 44100Hz and mono
```shell
python resample.py
@@ -93,7 +105,7 @@ python preprocess_hubert_f0.py
After completing the above steps, the dataset directory will contain the preprocessed data, and the dataset_raw folder can be deleted.
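For reference, the full preprocessing sequence referred to in this README is just these three commands, run in order from the repository root:

```shell
python resample.py                  # 1. resample to 44100Hz mono
python preprocess_flist_config.py   # 2. split the dataset and generate config.json
python preprocess_hubert_f0.py      # 3. extract content features and F0
```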
## 🏋️‍♀️ Training
```shell
python train.py -c configs/config.json -m 44k
@@ -101,7 +113,7 @@ python train.py -c configs/config.json -m 44k
Note: during training, old models are automatically cleared and only the latest three are kept. If you want to prevent overfitting, you need to back up the model checkpoints manually, or set `keep_ckpts` in the configuration file to 0 so they are never cleared.
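One way to make that change from the command line, assuming `jq` is installed and that `keep_ckpts` sits in the `train` section of `config.json` (check your own config; this layout is an assumption):

```shell
# Set keep_ckpts to 0 so old checkpoints are never cleared.
jq '.train.keep_ckpts = 0' configs/config.json > configs/config.json.tmp \
  && mv configs/config.json.tmp configs/config.json
```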
## 🤖 Inference
Use [inference_main.py](https://github.com/svc-develop-team/so-vits-svc/blob/4.0/inference_main.py)
@@ -126,7 +138,7 @@ Optional parameters: see the next section
- -cm, --cluster_model_path: path to the clustering model; fill in any value if no clustering model was trained.
- -cr, --cluster_infer_ratio: proportion of the clustering scheme, range 0-1; fill in 0 if no clustering model was trained.
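Putting the required parameters together, a hypothetical invocation (the checkpoint, input file, and speaker name are illustrative and must match your own run):

```shell
python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" \
  -n "input.wav" -t 0 -s "speaker0"
```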
## 🤔 Optional Settings
If the results from the previous section are satisfactory, or if you didn't understand what the following section is about, you can skip it; it won't affect model usage. (These optional settings have a relatively small impact: they may help on certain specific data, but in most cases the difference is barely noticeable.)
@@ -153,7 +165,7 @@ The existing steps before clustering do not need to be changed. All you need to
#### [23/03/16] Hubert no longer needs to be downloaded manually
## 📤 Exporting to Onnx
Use [onnx_export.py](https://github.com/svc-develop-team/so-vits-svc/blob/4.0/onnx_export.py)
@@ -170,7 +182,9 @@ Use [onnx_export.py](https://github.com/svc-develop-team/so-vits-svc/blob/4.0/on
Note: For Hubert Onnx models, please use the models provided by MoeSS. They currently cannot be exported on your own (Hubert in fairseq has many operators that onnx does not support, plus things involving constants, so exporting either raises errors or produces a model whose input/output shapes and results are wrong.) [Hubert4.0](https://huggingface.co/NaruseMioShirakana/MoeSS-SUBModel)
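A sketch of one possible export layout, based on the steps listed in the Chinese README below; the project name `myproject` and the renamed file names are illustrative assumptions, not the script's confirmed conventions:

```shell
mkdir -p checkpoints/myproject                              # "checkpoints" folder from the export steps
cp logs/44k/G_30400.pth checkpoints/myproject/model.pth     # checkpoint to export (assumed target name)
cp configs/config.json checkpoints/myproject/config.json
python onnx_export.py   # point the project name inside the script at "myproject" first (assumption)
```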
## 📚 Some legal provisions for reference
#### Any country, region, organization, or individual using this project must comply with the following laws.
#### Civil Code of the People's Republic of China (《民法典》)
@@ -188,3 +202,9 @@ Note: For Hubert Onnx models, please use the models provided by MoeSS. Currently
[Works Infringing the Right to Reputation] Where a literary or artistic work published by a person depicts real people and real events, or a specific person, and contains insulting or defamatory content that infringes another person's right to reputation, the injured person has the right to request, in accordance with the law, that the author bear civil liability.
Where a literary or artistic work published by a person does not depict a specific person, and merely contains plot elements resembling that person's circumstances, the author does not bear civil liability.
#### [Constitution of the People's Republic of China (《中华人民共和国宪法》)](http://www.gov.cn/guoqing/2018-03/22/content_5276318.htm)
#### [Criminal Law of the People's Republic of China (《中华人民共和国刑法》)](http://gongbao.court.gov.cn/Details/f8e30d0689b23f57bfc782d21035c3.html?sw=%E4%B8%AD%E5%8D%8E%E4%BA%BA%E6%B0%91%E5%85%B1%E5%92%8C%E5%9B%BD%E5%88%91%E6%B3%95)
#### [Civil Code of the People's Republic of China (《中华人民共和国民法典》)](http://gongbao.court.gov.cn/Details/51eb6750b8361f79be8f90d09bc202.html)

README_zh_CN.md

@@ -2,38 +2,40 @@
[**English**](./README.md) | [**中文简体**](./README_zh_CN.md)
## 📏 Terms of Use
# Warning: Please solve the dataset authorization problem on your own. Training with unauthorized datasets is forbidden. You bear full responsibility for any problem caused by training on unauthorized datasets, and for all consequences, which have nothing to do with the repository, its maintainers, or svc develop team!
1. This project is established for academic exchange purposes and is intended for communication and learning only. It is not intended for production environments.
2. Any video published on a video platform that is made with sovits must clearly state in its description the input source of the singing voice or audio used for the voice conversion. For example, if you convert vocals separated from a video or audio published by someone else, you must give a clear link to the original video or music; if you convert your own voice, or a voice synthesized with another vocal-synthesis engine, you must also state this in the description.
3. You bear full responsibility for any infringement caused by the input source, and for all consequences. When using other commercial vocal-synthesis software as the input source, make sure you comply with that software's terms of use; note that many vocal-synthesis engines state explicitly in their terms of use that they may not be used as an input source for conversion!
4. Continuing to use this project is deemed agreement to the terms stated in this repository's README. This README has fulfilled its obligation to advise and is not responsible for any problems that may arise subsequently.
5. If you redistribute the code in this repository, or publicly publish any results produced by this project (including but not limited to posts on video-sharing sites), please credit the original author and the source of the code (this repository).
6. If you use this project for any other project or plan, please contact the author of this repository and inform them in advance. Thank you very much.
## 📝 Model Introduction
A singing voice conversion model: [Content Vec](https://github.com/auspicious3000/contentvec) extracts content features, which are fed into the visinger2 model to synthesize the target voice.
### 🆕 4.0 v2 update content
+ The model architecture is completely changed to the [visinger2](https://github.com/zhangyongmao/VISinger2) architecture
+ Everything else is identical to 4.0
### 🆕 4.0 v2 features
+ It improves on 4.0 in some scenarios (for example, the electrical noise in breath sounds in some scenarios)
+ But it also regresses in some scenarios; for example, models trained on 猫雷 data do not sound as good as 4.0, and in some cases it synthesizes very distorted audio
+ 4.0-v2 is the last version of sovits; there will be no further updates
## Note
+ 4.0-v2 follows exactly the same process as 4.0, and uses the same environment; data preprocessed with 4.0 and a 4.0 environment can be used directly
+ The differences from 4.0 are:
+ The models are **completely** incompatible: old models cannot be used, and a brand-new base model is required. Make sure you load the correct base model, otherwise training will take extremely long! (See the sketch after this list.)
+ The structure of the config file is very different. Do not use an old config; if you are using a dataset preprocessed with 4.0, you only need to run preprocess_flist_config.py to generate a new config
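A sketch of where the new base model is expected to go before training, assuming the 4.0-style convention of placing `G_0.pth`/`D_0.pth` under `logs/44k` (this layout is an assumption; adjust to what the 4.0-v2 base model release actually ships):

```shell
mkdir -p logs/44k
cp /path/to/base/G_0.pth logs/44k/   # generator base model (assumed name/location)
cp /path/to/base/D_0.pth logs/44k/   # discriminator base model (assumed name/location)
```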
## 📥 Pre-trained Model Files
#### **Required**
@@ -57,11 +59,11 @@ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
The rest of this README is the same as 4.0, with no changes.
## 📊 Dataset Preparation
Simply place the dataset in the `dataset_raw` directory with the following file structure.
```
dataset_raw
├───speaker0
│   ├───xxx1-xxx1.wav
@@ -73,7 +75,17 @@ dataset_raw
    └───xxx7-xxx007.wav
```
You can customize the speaker name.
```
dataset_raw
└───suijiSUI
    ├───1.wav
    ├───...
    └───25788785-20221210-200143-856_01_(Vocals)_0_0.wav
```
## 🛠️ Data Preprocessing
1. Resample to 44100Hz
@@ -95,14 +107,14 @@ python preprocess_hubert_f0.py
After completing the above steps, the dataset directory contains the preprocessed data, and the dataset_raw folder can be deleted.
## 🏋️‍♀️ Training
```shell
python train.py -c configs/config.json -m 44k
```
During training, old models are automatically cleared and only the latest three are kept. If you want to prevent overfitting, back up the model checkpoints manually, or set `keep_ckpts` in the config file to 0 so they are never cleared.
## 🤖 Inference
Use [inference_main.py](inference_main.py)
@@ -125,7 +137,7 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "
+ -cm, --cluster_model_path: path to the clustering model; fill in any value if no clustering model was trained.
+ -cr, --cluster_infer_ratio: proportion of the clustering scheme, range 0-1; fill in 0 if no clustering model was trained. (See the example after this list.)
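A hypothetical invocation that enables clustering on top of the required parameters (the cluster model path `logs/44k/kmeans_10000.pt` is an assumed name; the checkpoint, input file, and speaker are illustrative):

```shell
python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" \
  -n "input.wav" -t 0 -s "speaker0" \
  -cm "logs/44k/kmeans_10000.pt" -cr 0.5
```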
## 🤔 Optional Settings
If the results so far are satisfactory, or if you didn't understand what the following section is about, you can ignore everything below; it won't affect model usage. (These optional settings have a relatively small impact: they may help on certain specific data, but in most cases the difference is barely noticeable.)
@@ -152,7 +164,7 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "
#### [23/03/16] Hubert no longer needs to be downloaded manually
## 📤 Exporting to Onnx
Use [onnx_export.py](onnx_export.py)
+ Create a new folder: `checkpoints`, and open it
@@ -169,21 +181,26 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "
+ Note: for Hubert Onnx models, please use the models provided by MoeSS. They currently cannot be exported on your own (Hubert in fairseq has many operators that onnx does not support, plus things involving constants, so exporting either raises errors or produces a model whose input/output shapes and results are wrong)
[Hubert4.0](https://huggingface.co/NaruseMioShirakana/MoeSS-SUBModel)
## 📚 Some legal provisions for reference
#### Any country, region, organization, or individual using this project must comply with the following laws.
#### Civil Code of the People's Republic of China (《民法典》)
##### Article 1019
No organization or individual may infringe another person's portrait rights by defacing or defaming the portrait, or by forging it through information technology. Unless otherwise provided by law, no one may produce, use, or publish the portrait of the portrait-right holder without that person's consent.
Without the consent of the portrait-right holder, the holder of the rights in a portrait work may not use or publish the portrait by means of publication, reproduction, distribution, rental, exhibition, or otherwise.
The protection of a natural person's voice is governed, mutatis mutandis, by the relevant provisions on the protection of portrait rights.
##### Article 1024
[Right to Reputation] Persons of the civil law enjoy the right to reputation. No organization or individual may infringe another person's right to reputation by insult, defamation, or other such means.
##### Article 1027
[Works Infringing the Right to Reputation] Where a literary or artistic work published by a person depicts real people and real events, or a specific person, and contains insulting or defamatory content that infringes another person's right to reputation, the injured person has the right to request, in accordance with the law, that the author bear civil liability.
Where a literary or artistic work published by a person does not depict a specific person, and merely contains plot elements resembling that person's circumstances, the author does not bear civil liability.
#### [Constitution of the People's Republic of China (《中华人民共和国宪法》)](http://www.gov.cn/guoqing/2018-03/22/content_5276318.htm)
#### [Criminal Law of the People's Republic of China (《中华人民共和国刑法》)](http://gongbao.court.gov.cn/Details/f8e30d0689b23f57bfc782d21035c3.html?sw=中华人民共和国刑法)
#### [Civil Code of the People's Republic of China (《中华人民共和国民法典》)](http://gongbao.court.gov.cn/Details/51eb6750b8361f79be8f90d09bc202.html)