Update readme.md

ylzz1997 2023-04-09 01:07:12 +08:00
parent ad4f47ab9b
commit ec3ed074b8
2 changed files with 34 additions and 1 deletions


@ -34,7 +34,8 @@ The singing voice conversion model uses SoftVC content encoder to extract source
- The dataset creation and training process are the same as in version 3.0, but models are not interchangeable between versions, and the dataset must be fully preprocessed again.
- Added option 1: automatic pitch prediction for vc mode, so you don't need to enter the pitch key manually when converting speech, and pitch is converted automatically between male and female voices. However, this mode causes songs to go off-key.
- Added option 2: reduce timbre leakage through a k-means clustering scheme, making the timbre closer to the target timbre.
- Added option 3: the [NSF-HIFIGAN Enhancer](https://github.com/yxlllc/DDSP-SVC), which improves sound quality for some models trained on small datasets but degrades well-trained models, so it is disabled by default.
## 💬 About Python Version
After conducting tests, we believe that the project runs stably on Python version 3.8.9.
@ -61,6 +62,20 @@ Get them from svc-develop-team(TBD) or anywhere else.
Although a pretrained model generally does not cause copyright problems, please check before using one: for example, ask the author in advance, or confirm that the author has clearly stated the permitted uses in the model description.
#### **Optional(Select as Required)**
If you use the NSF-HIFIGAN enhancer, you need to download the pre-trained NSF-HIFIGAN model; otherwise, you can skip this download.
- Pre-trained NSF-HIFIGAN Vocoder: [nsf_hifigan_20221211.zip](https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip)
- After unzipping, place the four files in the `pretrain/nsf_hifigan` directory
```shell
# nsf_hifigan: download the archive into pretrain/ (assumes wget is installed)
wget -P pretrain/ https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
# Then unzip it and place the four files in the pretrain/nsf_hifigan directory
# Alternatively, download it manually from https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1
```
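The download-and-unpack step could also be scripted. The following is a minimal sketch, not part of the repository: the helper names `fetch_archive` and `extract_flat` are hypothetical, and because the archive's internal layout is not described here, the sketch simply drops any folder prefix so that every file lands directly in the target directory.

```python
import shutil
import urllib.request
import zipfile
from pathlib import Path
from typing import List

# URL taken from the README; everything else here is an illustrative assumption.
NSF_HIFIGAN_URL = (
    "https://github.com/openvpi/vocoders/releases/download/"
    "nsf-hifigan-v1/nsf_hifigan_20221211.zip"
)


def fetch_archive(url: str, dest: Path) -> Path:
    """Download `url` to `dest` (hypothetical helper; needs network access)."""
    dest.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(url) as response, open(dest, "wb") as out:
        shutil.copyfileobj(response, out)
    return dest


def extract_flat(zip_path: Path, target_dir: Path) -> List[str]:
    """Extract every file of the archive directly into `target_dir`,
    dropping any folder prefix, and return the sorted file names."""
    target_dir.mkdir(parents=True, exist_ok=True)
    names = []
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # skip directory entries, only files are copied
            name = Path(info.filename).name
            with archive.open(info) as src, open(target_dir / name, "wb") as dst:
                shutil.copyfileobj(src, dst)
            names.append(name)
    return sorted(names)
```

A possible call would be `extract_flat(fetch_archive(NSF_HIFIGAN_URL, Path("pretrain/nsf_hifigan_20221211.zip")), Path("pretrain/nsf_hifigan"))`; checking that the returned list has four entries is a quick sanity check against the "four files" mentioned above.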
## 📊 Dataset Preparation
Simply place the dataset in the `dataset_raw` directory with the following file structure.
@ -144,6 +159,7 @@ Optional parameters: see the next section
- `-a` | `--auto_predict_f0`: automatic pitch prediction for voice conversion. Do not enable this when converting songs, as it can cause serious pitch issues.
- `-cm` | `--cluster_model_path`: path to the clustering model. Fill in any value if no clustering model has been trained.
- `-cr` | `--cluster_infer_ratio`: weight of the clustering scheme, range 0-1. Fill in 0 if no clustering model has been trained.
- `-eh` | `--enhance`: whether to use the NSF-HIFIGAN enhancer. It improves sound quality for some models trained on small datasets but degrades well-trained models, so it is turned off by default.
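To make the semantics of these four flags concrete, here is a minimal `argparse` sketch. It is not the project's actual parser: the function names, the argument types, and the default values other than "enhancer off" and "ratio between 0 and 1" are assumptions made for illustration.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the four optional flags listed above."""
    parser = argparse.ArgumentParser(description="Sketch of the optional inference flags")
    parser.add_argument("-a", "--auto_predict_f0", action="store_true",
                        help="predict pitch automatically (speech only, not songs)")
    parser.add_argument("-cm", "--cluster_model_path", type=str, default="",
                        help="path to the clustering model")
    parser.add_argument("-cr", "--cluster_infer_ratio", type=float, default=0.0,
                        help="weight of the clustering scheme, from 0 to 1")
    parser.add_argument("-eh", "--enhance", action="store_true",
                        help="use the NSF-HIFIGAN enhancer (off unless given)")
    return parser


def parse_options(argv):
    """Parse `argv` and reject a clustering ratio outside the documented 0-1 range."""
    args = build_parser().parse_args(argv)
    if not 0.0 <= args.cluster_infer_ratio <= 1.0:
        raise ValueError("cluster_infer_ratio must be between 0 and 1")
    return args
```

With this sketch, `parse_options(["-a", "-cr", "0.5"])` enables pitch prediction, sets the clustering weight to 0.5, and leaves the enhancer off.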
## 🤔 Optional Settings


@ -34,6 +34,7 @@
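+ The dataset creation and training process are the same as in version 3.0, but models are not interchangeable between versions, and the dataset must be fully preprocessed again.
+ Added option 1: automatic pitch (f0) prediction for vc mode, so you don't need to enter the pitch key manually when converting speech, and pitch is converted automatically between male and female voices. This applies to speech conversion only; converting songs in this mode makes them go off-key.
+ Added option 2: reduce timbre leakage through a k-means clustering scheme, making the timbre closer to the target timbre.
+ Added option 3: the [NSF-HIFIGAN Enhancer](https://github.com/yxlllc/DDSP-SVC), which improves sound quality for some models trained on small datasets but degrades well-trained models, so it is disabled by default.
## 💬 About Python Version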
@ -61,6 +62,20 @@ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
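Although a pretrained base model generally does not cause copyright problems, please check before using one: for example, ask the author in advance, or confirm that the author has clearly stated the permitted uses in the model description.
#### **Optional (Select as Required)**
If you use the NSF-HIFIGAN enhancer, you need to download the pre-trained NSF-HIFIGAN model; otherwise, you can skip this download.
+ Pre-trained NSF-HIFIGAN Vocoder: [nsf_hifigan_20221211.zip](https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip)
+ After unzipping, place the four files in the `pretrain/nsf_hifigan` directory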
```shell
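# nsf_hifigan: download the archive into pretrain/ (assumes wget is installed)
wget -P pretrain/ https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
# Then unzip it and place the four files in the pretrain/nsf_hifigan directory
# Alternatively, download it manually from https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1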
```
## 📊 Dataset Preparation
Simply place the dataset in the `dataset_raw` directory with the following file structure.
@ -144,6 +159,7 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "
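+ `-a` | `--auto_predict_f0`: automatic pitch prediction for voice conversion. Do not enable this when converting songs, as they will go badly off-key.
+ `-cm` | `--cluster_model_path`: path to the clustering model. Fill in any value if no clustering model has been trained.
+ `-cr` | `--cluster_infer_ratio`: weight of the clustering scheme, range 0-1. Leave it at the default of 0 if no clustering model has been trained.
+ `-eh` | `--enhance`: whether to use the NSF-HIFIGAN enhancer. It improves sound quality for some models trained on small datasets but degrades well-trained models, so it is turned off by default.
## 🤔 Optional Settings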
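One plausible reading of the `--cluster_infer_ratio` weight is a linear blend between a content feature and its nearest k-means cluster center. The sketch below illustrates that reading only; the blend formula and the names `nearest_center` and `blend_with_cluster` are assumptions, not something stated in this README.

```python
from typing import List, Sequence


def nearest_center(feature: Sequence[float],
                   centers: Sequence[Sequence[float]]) -> Sequence[float]:
    """Return the cluster center closest to `feature` (squared Euclidean distance)."""
    return min(centers, key=lambda c: sum((a - b) ** 2 for a, b in zip(feature, c)))


def blend_with_cluster(feature: Sequence[float],
                       centers: Sequence[Sequence[float]],
                       ratio: float) -> List[float]:
    """Blend `feature` toward its nearest center.

    ratio = 0 keeps the original feature, ratio = 1 replaces it with the center.
    This linear blend is an assumed mechanism, used here for illustration.
    """
    if not 0.0 <= ratio <= 1.0:
        raise ValueError("ratio must be between 0 and 1")
    center = nearest_center(feature, centers)
    return [ratio * c + (1.0 - ratio) * f for f, c in zip(feature, center)]
```

Under this reading, a ratio of 0 disables clustering entirely, which matches the advice to use 0 when no clustering model has been trained.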
@ -170,6 +186,7 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "
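### F0 mean filtering
Introduction: applying a mean filter to F0 effectively reduces the hoarse sound caused by fluctuations in pitch estimation (hoarseness caused by reverb or harmony cannot be removed for now). This feature greatly improves some songs but makes others go off-key. If hoarseness appears after inference on a song, consider enabling it.
+ Set f0_mean_pooling to true in inference_main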
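The idea of mean filtering can be illustrated with a plain moving average over an F0 contour. This is a simplified sketch, not the project's implementation: the function name `mean_filter_f0` and the `kernel_size` parameter are hypothetical, and unvoiced frames are averaged like any other value.

```python
from typing import List, Sequence


def mean_filter_f0(f0: Sequence[float], kernel_size: int = 3) -> List[float]:
    """Smooth an F0 contour with a centered moving average.

    Windows are truncated at the edges, so the output has the same length
    as the input. `kernel_size` must be a positive odd number.
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd number")
    half = kernel_size // 2
    smoothed = []
    for i in range(len(f0)):
        window = f0[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

A single-frame spike such as `[100.0, 200.0, 100.0]` is pulled toward its neighbours, which reduces sudden jumps in the estimated pitch. The same averaging also flattens genuine fast pitch movement, which is consistent with the note above that some songs go off-key.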
### [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kv-3y2DmZo0uya8pEr1xk7cSB-4e_Pct?usp=sharing) [sovits4_for_colab.ipynb](https://colab.research.google.com/drive/1kv-3y2DmZo0uya8pEr1xk7cSB-4e_Pct?usp=sharing)