Update README.md. close #64

Miuzarte 2023-03-20 19:07:59 +08:00
parent 849c792bcf
commit 31d6fb7807
2 changed files with 25 additions and 21 deletions

README.md

@@ -2,26 +2,30 @@
[**English**](./README.md) | [**中文简体**](./README_zh_CN.md)
#### ✨ A fork with a greatly improved interface: [34j/so-vits-svc-fork](https://github.com/34j/so-vits-svc-fork)
#### ✨ A client that supports real-time conversion: [w-okada/voice-changer](https://github.com/w-okada/voice-changer)
## 📏 Terms of Use
# Warning: Please solve the authorization problem of the dataset on your own. You shall be solely responsible for any problems caused by the use of non-authorized datasets for training and for all consequences thereof. The repository and its maintainer, svc develop team, have nothing to do with the consequences!
1. This project is established for academic exchange purposes only and is intended for communication and learning. It is not intended for production environments.
2. Any videos based on sovits that are published on video platforms must clearly indicate in the description that they are used for voice changing, and must specify the input source of the voice or audio. For example, if you use videos or audio published by others and convert the separated vocals as the input source, you must provide clear links to the original video or music; if you use your own voice, or voices synthesized by other commercial vocal synthesis software, as the input source, you must also explain this in the description.
3. You shall be solely responsible for any infringement problems caused by the input source. When using other commercial vocal synthesis software as the input source, please ensure that you comply with that software's terms of use. Note that many vocal synthesis engines clearly state in their terms of use that they may not be used as an input source for conversion.
4. Continuing to use this project is deemed as agreeing to the relevant provisions stated in this README. This README has fulfilled its duty to advise and is not responsible for any subsequent problems that may arise.
5. If you distribute this repository's code or publish any results produced by this project publicly (including but not limited to video sharing platforms), please indicate the original author and code source (this repository).
6. If you use this project for any other plan, please contact and inform the author of this repository in advance. Thank you very much.
## 🆕 Update!
> Updated the 4.0-v2 model, the entire process is the same as 4.0. Compared to 4.0, there is some improvement in certain scenarios, but there are also some cases where it has regressed. Please refer to the [4.0-v2 branch](https://github.com/svc-develop-team/so-vits-svc/tree/4.0-v2) for more information.
## 📝 Model Introduction
The singing voice conversion model uses SoftVC content encoder to extract source audio speech features, then the vectors are directly fed into VITS instead of converting to a text based intermediate; thus the pitch and intonations are conserved. Additionally, the vocoder is changed to [NSF HiFiGAN](https://github.com/openvpi/DiffSinger/tree/refactor/modules/nsf_hifigan) to solve the problem of sound interruption.
### 🆕 4.0 Version Update Content
- Feature input is changed to [Content Vec](https://github.com/auspicious3000/contentvec)
- The sampling rate is unified to 44100 Hz
@@ -31,7 +35,7 @@ The singing voice conversion model uses SoftVC content encoder to extract source
- Added option 1: automatic pitch prediction for vc mode, meaning you don't need to manually enter the pitch key when converting speech, and the pitch of male and female voices is converted automatically. However, this mode will cause pitch shift when converting songs.
- Added option 2: reduce timbre leakage through a k-means clustering scheme, making the timbre more similar to the target timbre.
## 📥 Pre-trained Model Files
#### **Required**
@@ -53,7 +57,7 @@ Get them from svc-develop-team (TBD) or anywhere else.
Although the pretrained model generally does not cause any copyright problems, please be mindful of it: for example, ask the author in advance, or make sure the author has clearly indicated the permitted use in the description.
## 📊 Dataset Preparation
Simply place the dataset in the `dataset_raw` directory with the following file structure.
@@ -69,7 +73,7 @@ dataset_raw
└───xxx7-xxx007.wav
```
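For instance, assuming a single speaker folder named `speaker0` (the name is illustrative), the layout can be created like this:

```shell
# Illustrative only: one sub-folder per speaker under dataset_raw,
# each containing that speaker's wav clips
mkdir -p dataset_raw/speaker0
cp /path/to/clips/*.wav dataset_raw/speaker0/
```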
## 🛠️ Preprocessing
1. Resample to 44100 Hz
@@ -91,7 +95,7 @@ python preprocess_hubert_f0.py
After completing the above steps, the dataset directory will contain the preprocessed data, and the dataset_raw folder can be deleted.
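Taken together, a typical preprocessing run might look like the sketch below. Only `preprocess_hubert_f0.py` appears verbatim above; the other script names are assumptions based on the repository layout:

```shell
python resample.py                 # step 1: resample everything to 44100 Hz (assumed script name)
python preprocess_flist_config.py  # step 2: split the dataset and generate configs (assumed script name)
python preprocess_hubert_f0.py     # step 3: generate hubert and f0 features
rm -rf dataset_raw                 # optional: remove the raw folder once the output is verified
```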
## 🏋️‍♀️ Training
```shell
python train.py -c configs/config.json -m 44k
@@ -99,7 +103,7 @@ python train.py -c configs/config.json -m 44k
Note: During training, the old models will be automatically cleared and only the latest three will be kept. If you want to prevent overfitting, manually back up the model checkpoints, or set `keep_ckpts` in the configuration file to 0 to never clear them.
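As a minimal sketch, assuming `keep_ckpts` sits under the `train` section of `configs/config.json`, it could be disabled with `jq` like this:

```shell
# Assumed config layout; back up configs/config.json first
jq '.train.keep_ckpts = 0' configs/config.json > tmp.json && mv tmp.json configs/config.json
```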
## 🤖 Inference
Use [inference_main.py](https://github.com/svc-develop-team/so-vits-svc/blob/4.0/inference_main.py)
@@ -111,7 +115,6 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "
```
Required parameters:
- -m, --model_path: path to the model.
- -c, --config_path: path to the configuration file.
- -n, --clean_names: a list of wav file names located in the raw folder.
@@ -119,19 +122,17 @@ Required parameters:
- -s, --spk_list: target speaker name for synthesis.
Optional parameters: see the next section
- -a, --auto_predict_f0: automatic pitch prediction for voice conversion, do not enable this when converting songs as it can cause serious pitch issues.
- -cm, --cluster_model_path: path to the clustering model, fill in any value if clustering is not trained.
- -cr, --cluster_infer_ratio: proportion of the clustering solution, range 0-1, fill in 0 if the clustering model is not trained.
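Putting these together, a conversion run without a trained clustering model might look like this (the speaker name, input file name, and clustering path are illustrative):

```shell
# Illustrative invocation: -cr 0 disables clustering, so -cm may be any value
python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" \
    -n "song.wav" -s "speaker0" -cm "logs/44k/kmeans.pt" -cr 0
```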
## 🤔 Optional Settings
If the results from the previous section are satisfactory, or if you didn't understand what is being discussed in the following section, you can skip it; this won't affect model usage. (These optional settings have a relatively small impact; they may have some effect on certain specific data, but in most cases the difference may not be noticeable.)
### Automatic f0 prediction
During the 4.0 model training, an f0 predictor is also trained, which can be used for automatic pitch prediction during voice conversion. However, if the results are poor, manual pitch entry can be used instead. Please do not enable this feature when converting singing voice, as it may cause serious pitch shifting!
- Set `auto_predict_f0` to true in `inference_main`.
### Cluster-based timbre leakage control
@@ -151,7 +152,7 @@ The existing steps before clustering do not need to be changed. All you need to
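Assuming the clustering script lives at `cluster/train_cluster.py` as in the repository layout, training the extra clustering model would presumably be:

```shell
# Assumed script path; trains a k-means clustering model on the preprocessed dataset
python cluster/train_cluster.py
```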
#### [23/03/16] No longer need to download hubert manually
## 📤 Exporting to Onnx
Use [onnx_export.py](https://github.com/svc-develop-team/so-vits-svc/blob/4.0/onnx_export.py)
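The export step itself is presumably a plain script invocation once the model paths inside the script have been edited (those details are elided in this diff):

```shell
# Assumed invocation; checkpoint/config paths are configured inside onnx_export.py
python onnx_export.py
```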
@@ -168,7 +169,7 @@ Use [onnx_export.py](https://github.com/svc-develop-team/so-vits-svc/blob/4.0/on
Note: For Hubert Onnx models, please use the models provided by MoeSS. Currently, they cannot be exported on your own (Hubert in fairseq has many unsupported operators and constant-related logic that can cause errors or produce problems with the input/output shapes and results when exported). [Hubert4.0](https://huggingface.co/NaruseMioShirakana/MoeSS-SUBModel)
## 📚 Some legal provisions for reference
#### 《民法典》 (Civil Code of the People's Republic of China)

README_zh_CN.md

@@ -4,8 +4,12 @@
#### ✨ A recommended fork with an improved interface: [34j/so-vits-svc-fork](https://github.com/34j/so-vits-svc-fork)
#### ✨ A client that supports real-time conversion: [w-okada/voice-changer](https://github.com/w-okada/voice-changer)
## 📏 Terms of Use
# Warning: Please solve the authorization problem of the dataset on your own. Training with non-authorized datasets is prohibited, and you shall be solely responsible for any problems caused by doing so and for all consequences thereof; the repository, its maintainers, and svc develop team bear no responsibility!
1. This project is established for academic exchange purposes only and is intended for communication and learning. It is not intended for production environments.
2. Any videos based on sovits that are published to video platforms must clearly state in the description that they are used for voice changing, and must specify the input source of the singing voice or audio. For example, if you use videos or audio published by others and convert the separated vocals as the input source, you must provide clear links to the original video or music; if you use your own voice, or voices synthesized by other singing voice synthesis engines, as the input source, you must also explain this in the description.
3. You shall be solely responsible for any infringement caused by the input source and for all consequences. When using other commercial singing voice synthesis software as the input source, please make sure you comply with that software's terms of use; note that many singing voice synthesis engines explicitly state in their terms of use that they may not be used as an input source for conversion!
@@ -13,7 +17,6 @@
5. If you redistribute this repository's code, or publicly publish any results produced by this project (including but not limited to uploads to video-sharing sites), please credit the original author and the code source (this repository).
6. If you use this project for any other plan, please contact and inform the author of this repository in advance. Thank you very much.
## 🆕 Update!
> Updated the 4.0-v2 model. The entire process is the same as 4.0. Compared to 4.0, there is some improvement in certain scenarios, but there are also some regressions. For details, see the [4.0-v2 branch](https://github.com/svc-develop-team/so-vits-svc/tree/4.0-v2).
@@ -97,6 +100,7 @@ python preprocess_hubert_f0.py
```shell
python train.py -c configs/config.json -m 44k
```
Note: During training, old models are automatically cleared and only the latest three are kept. To prevent overfitting, manually back up model checkpoints, or set `keep_ckpts` in the configuration file to 0 to never clear them.
## 🤖 Inference
@@ -133,8 +137,7 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "
### Cluster-based timbre leakage control
Introduction: The clustering scheme can reduce timbre leakage so that the trained model sounds more like the target timbre (though the effect is not especially obvious). However, clustering alone noticeably degrades the model's articulation (it becomes slurred), so this model uses a fusion approach that can linearly control the ratio between the clustering and non-clustering schemes. In other words, you can manually adjust the balance between "similar to the target timbre" and "clear articulation" to find a suitable trade-off.
The existing steps before clustering do not need any changes; you only need to train an additional clustering model. Although the effect is relatively limited, the training cost is also relatively low.