Update readme

This commit is contained in:
ylzz1997 2023-05-14 21:43:53 +08:00
parent 0cef3f22e4
commit c3ba30534a
2 changed files with 99 additions and 12 deletions


@ -41,6 +41,19 @@ The singing voice conversion model uses SoftVC content encoder to extract source
- Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec); this branch is not compatible with 4.0 models
### 🆕 Questions about compatibility with the main branch model
- You can support main branch models by modifying their config.json: add a speech_encoder field to the model field of config.json, as shown below
```
"model": {
.........
"ssl_dim": 768,
"n_speakers": 200,
"speech_encoder":"vec256l9"
}
```
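The config.json change above can also be applied with a short script. A minimal sketch, assuming the field names shown in the snippet (the file path in the usage comment is hypothetical):

```python
import json

def add_speech_encoder(config: dict, encoder: str = "vec256l9") -> dict:
    """Insert the speech_encoder field into the "model" section of a config dict."""
    config.setdefault("model", {})["speech_encoder"] = encoder
    return config

# Usage (path is an assumption -- point it at the main branch model's config):
#   with open("configs/config.json") as f:
#       cfg = json.load(f)
#   with open("configs/config.json", "w") as f:
#       json.dump(add_speech_encoder(cfg, "vec256l9"), f, indent=2, ensure_ascii=False)
```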
## 💬 About Python Version
After conducting tests, we believe that the project runs stably on `Python 3.8.9`.
@ -49,15 +62,24 @@ After conducting tests, we believe that the project runs stably on `Python 3.8.9
#### **Required**
**Select and use one of the following encoders**
##### **1. If using contentvec as sound encoder**
- ContentVec: [checkpoint_best_legacy_500.pt](https://ibm.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr)
- Place it under the `pretrain` directory
```shell
# contentvec
wget -P pretrain/ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
# Alternatively, you can manually download and place it in the pretrain directory
```
##### **2. If hubertsoft is used as the sound encoder**
- soft vc hubert: [hubert-soft-0d54a1f4.pt](https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt)
- Place it under the `pretrain` directory
#### **Optional (strongly recommended)**
- Pre-trained model files: `G_0.pth` `D_0.pth`
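A quick preflight check that the chosen encoder checkpoint is actually in place can save a failed run. A minimal sketch, with file names taken from the download links above (the flat `pretrain/` layout is an assumption):

```python
import os

# Expected checkpoint per encoder choice (names from the download links above).
ENCODER_CKPT = {
    "vec768l12": "pretrain/checkpoint_best_legacy_500.pt",
    "vec256l9": "pretrain/checkpoint_best_legacy_500.pt",
    "hubertsoft": "pretrain/hubert-soft-0d54a1f4.pt",
}

def missing_files(encoder: str) -> list:
    """Return the checkpoint paths that still need to be downloaded for this encoder."""
    paths = [ENCODER_CKPT[encoder]]
    return [p for p in paths if not os.path.exists(p)]
```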
@ -76,7 +98,7 @@ If you are using the NSF-HIFIGAN enhancer, you will need to download the pre-tra
```shell
# nsf_hifigan
wget -P pretrain/ https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
# Alternatively, you can manually download and place it in the pretrain/nsf_hifigan directory
# URL: https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1
```
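The downloaded archive still needs to be unpacked so the vocoder files end up under `pretrain/nsf_hifigan`. A minimal sketch using the standard library (that the zip's members already sit inside an `nsf_hifigan/` folder is an assumption; check where they actually land):

```python
import zipfile

def extract_vocoder(zip_path: str, dest: str = "pretrain/") -> list:
    """Extract the NSF-HIFIGAN archive into dest and return the extracted member names."""
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(dest)  # assumes members are prefixed with nsf_hifigan/
        return zf.namelist()

# Usage: extract_vocoder("nsf_hifigan_20221211.zip", "pretrain/")
```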
@ -128,15 +150,39 @@ python resample.py
### 2. Automatically split the dataset into training and validation sets, and generate configuration files.
```shell
python preprocess_flist_config.py --speech_encoder vec768l12
```
The speech_encoder argument has three options:
```
vec768l12
vec256l9
hubertsoft
```
If the speech_encoder argument is omitted, the default value is vec768l12
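The vec* names encode the feature dimension and Transformer layer (vec768l12 is the 768-dim layer-12 output, vec256l9 the 256-dim layer-9 output). A minimal sketch that parses them (the 256-dim value for hubertsoft is an assumption, based on the ssl_dim shown in the compatibility snippet above):

```python
def encoder_dims(name: str) -> int:
    """Infer the content-feature dimension from a speech_encoder name.

    vec768l12 -> 768 (layer 12); vec256l9 -> 256 (layer 9).
    The hubertsoft value is an assumption (soft-VC units are 256-dim).
    """
    if name.startswith("vec"):
        dim, _layer = name[3:].split("l")  # e.g. "768l12" -> ("768", "12")
        return int(dim)
    if name == "hubertsoft":
        return 256
    raise ValueError(f"unknown speech_encoder: {name}")
```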
### 3. Generate hubert and f0
```shell
python preprocess_hubert_f0.py --f0_predictor dio
```
The f0_predictor argument has four options:
```
crepe
dio
pm
harvest
```
If the training set is too noisy, use crepe to handle f0
If the f0_predictor parameter is omitted, the default value is dio
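The selection rules above (explicit choice wins, noisy data prefers crepe, otherwise the default dio) can be sketched as a small helper; everything beyond those documented rules is an assumption:

```python
F0_PREDICTORS = ("crepe", "dio", "pm", "harvest")

def pick_f0_predictor(noisy: bool = False, requested: str = None) -> str:
    """Pick an f0 predictor: an explicit request wins, noisy training data
    prefers crepe, and otherwise the documented default dio is used."""
    if requested is not None:
        if requested not in F0_PREDICTORS:
            raise ValueError(f"unknown f0_predictor: {requested}")
        return requested
    return "crepe" if noisy else "dio"
```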
After completing the above steps, the dataset directory will contain the preprocessed data, and the dataset_raw folder can be deleted.
#### You can modify some parameters in the generated config.json


@ -39,7 +39,20 @@
### 🆕 What's new in 4.0-Vec768-Layer12
+ Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec)
### 🆕 About compatibility with the main branch model
+ You can support main branch models by modifying their config.json: add a speech_encoder field to the model field of config.json, as shown below
```
"model": {
.........
"ssl_dim": 768,
"n_speakers": 200,
"speech_encoder":"vec256l9"
}
```
## 💬 About Python Version
@ -49,15 +62,22 @@
#### **Required**
**Select and use one of the following encoders**
##### **1. If using contentvec as the sound encoder**
+ contentvec: [checkpoint_best_legacy_500.pt](https://ibm.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr)
+ Place it under the `pretrain` directory
```shell
# contentvec
wget -P pretrain/ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
# Alternatively, you can manually download and place it in the pretrain directory
```
##### **2. If using hubertsoft as the sound encoder**
+ soft vc hubert: [hubert-soft-0d54a1f4.pt](https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt)
+ Place it under the `pretrain` directory
#### **Optional (strongly recommended)**
+ Pre-trained base model files: `G_0.pth` `D_0.pth`
@ -76,7 +96,7 @@ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
```shell
# nsf_hifigan
wget -P pretrain/ https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
# Alternatively, you can manually download and place it in the pretrain/nsf_hifigan directory
# URL: https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1
```
@ -128,15 +148,36 @@ python resample.py
### 2. Automatically split the dataset into training and validation sets, and generate configuration files
```shell
python preprocess_flist_config.py --speech_encoder vec768l12
```
The speech_encoder argument has three options:
```
vec768l12
vec256l9
hubertsoft
```
If the speech_encoder argument is omitted, the default value is vec768l12
### 3. Generate hubert and f0
```shell
python preprocess_hubert_f0.py --f0_predictor dio
```
The f0_predictor argument has four options:
```
crepe
dio
pm
harvest
```
If the training set is too noisy, use crepe to handle f0
If the f0_predictor argument is omitted, the default value is dio
After completing the above steps, the dataset directory will contain the preprocessed data, and the dataset_raw folder can be deleted
#### You can modify some parameters in the generated config.json