Update readme
This commit is contained in:
parent 0cef3f22e4
commit c3ba30534a

README.md | 56
@@ -41,6 +41,19 @@ The singing voice conversion model uses SoftVC content encoder to extract source

- Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec); this branch is not compatible with the 4.0 model

### 🆕 Questions about compatibility with the main branch model

- You can support a main branch model by modifying its config.json: add a `speech_encoder` field to the `model` field of config.json, see below for details
```
"model": {
    .........
    "ssl_dim": 768,
    "n_speakers": 200,
    "speech_encoder": "vec256l9"
}
```
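As a concrete illustration, the edit above can be scripted. This is a hedged sketch: the `patch_config` helper is hypothetical and not part of the project, and the encoder value simply matches the snippet above.

```python
import json

def patch_config(path: str, encoder: str = "vec256l9") -> dict:
    """Add the "speech_encoder" field to a main-branch config.json.

    Hypothetical helper for illustration; "vec256l9" matches the
    main-branch snippet shown above.
    """
    with open(path) as f:
        config = json.load(f)
    # The new branch looks for this field inside the "model" section.
    config["model"]["speech_encoder"] = encoder
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```

For example, `patch_config("configs/config.json")` would rewrite that file in place (the path is only an example).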

## 💬 About Python Version

After conducting tests, we believe that the project runs stably on `Python 3.8.9`.

@@ -49,15 +62,24 @@ After conducting tests, we believe that the project runs stably on `Python 3.8.9

#### **Required**

**One of the following encoders must be selected**
##### **1. If using contentvec as the speech encoder**

- ContentVec: [checkpoint_best_legacy_500.pt](https://ibm.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr)
- Place it under the `hubert` directory
- Place it under the `pretrain` directory

```shell
# contentvec
wget -P hubert/ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
wget -P pretrain/ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
# Alternatively, you can manually download it and place it in the hubert directory
```

##### **2. If using hubertsoft as the speech encoder**

- soft vc hubert: [hubert-soft-0d54a1f4.pt](https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt)
- Place it under the `pretrain` directory

#### **Optional (strongly recommended)**

- Pre-trained model files: `G_0.pth` `D_0.pth`
@@ -76,7 +98,7 @@ If you are using the NSF-HIFIGAN enhancer, you will need to download the pre-tra

```shell
# nsf_hifigan
https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
wget -P pretrain/ https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
# Alternatively, you can manually download it and place it in the pretrain/nsf_hifigan directory
# URL: https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1
```

@@ -128,15 +150,39 @@ python resample.py

### 2. Automatically split the dataset into training and validation sets, and generate configuration files

```shell
python preprocess_flist_config.py
python preprocess_flist_config.py --speech_encoder vec768l12
```

speech_encoder has three choices:

```
vec768l12
vec256l9
hubertsoft
```

If the speech_encoder argument is omitted, the default value is vec768l12.

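The flag behavior described above (three choices, defaulting to vec768l12) corresponds to an argparse definition along these lines. This is a sketch for illustration only; the project's actual script may define the flag differently.

```python
import argparse

# Sketch of the --speech_encoder flag described above; the real
# preprocess_flist_config.py may differ in details.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--speech_encoder",
    type=str,
    default="vec768l12",
    choices=["vec768l12", "vec256l9", "hubertsoft"],
    help="which content encoder the generated config should use",
)

args = parser.parse_args([])  # no flag given: the default applies
print(args.speech_encoder)    # vec768l12
```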
### 3. Generate hubert and f0

```shell
python preprocess_hubert_f0.py
python preprocess_hubert_f0.py --f0_predictor dio
```

f0_predictor has four options:

```
crepe
dio
pm
harvest
```

If the training set is too noisy, use crepe to extract f0.

If the f0_predictor argument is omitted, the default value is dio.

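To make concrete what an f0 predictor computes, here is a naive autocorrelation-based pitch estimate on a synthetic tone. This is an illustration only; it is not how crepe, dio, pm, or harvest work internally.

```python
import numpy as np

def estimate_f0(frame: np.ndarray, sample_rate: int) -> float:
    """Naive f0 estimate for one voiced frame via autocorrelation."""
    frame = frame - frame.mean()
    # Autocorrelation for non-negative lags 0..N-1.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Skip lag 0 and tiny lags (they would imply > 1000 Hz).
    min_lag = sample_rate // 1000
    lag = min_lag + int(np.argmax(corr[min_lag:]))
    return sample_rate / lag

sr = 16000
t = np.arange(sr // 10) / sr           # 100 ms frame
frame = np.sin(2 * np.pi * 220.0 * t)  # pure 220 Hz tone
f0 = estimate_f0(frame, sr)            # close to 220 Hz
```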
After completing the above steps, the dataset directory will contain the preprocessed data, and the dataset_raw folder can be deleted.

#### You can modify some parameters in the generated config.json
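For instance, training settings commonly live under the `train` section of the generated config. The fragment below is a hedged illustration only: the field names follow the usual layout of this project's config files, and the values are placeholders, not recommendations.

```
"train": {
    .........
    "batch_size": 6,
    "learning_rate": 0.0001
}
```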
@@ -39,7 +39,20 @@

### 🆕 4.0-Vec768-Layer12 Version Update Content

+ Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec); this branch is not compatible with the 4.0 model
+ Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec)

### 🆕 Questions about compatibility with the main branch model

+ You can support a main branch model by modifying its config.json: add a `speech_encoder` field to the `model` field of config.json, see below for details

```
"model": {
    .........
    "ssl_dim": 768,
    "n_speakers": 200,
    "speech_encoder": "vec256l9"
}
```

## 💬 About Python Version

@@ -49,15 +62,22 @@

#### **Required**

**One of the following encoders must be selected**

##### **1. If using contentvec as the speech encoder**

+ contentvec: [checkpoint_best_legacy_500.pt](https://ibm.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr)
+ Place it under the `hubert` directory
+ Place it under the `pretrain` directory

```shell
# contentvec
http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
# Alternatively, you can manually download it and place it in the hubert directory
wget -P pretrain/ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
# Alternatively, you can manually download it and place it in the pretrain directory
```

##### **2. If using hubertsoft as the speech encoder**

+ soft vc hubert: [hubert-soft-0d54a1f4.pt](https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt)
+ Place it under the `pretrain` directory

#### **Optional (strongly recommended)**

+ Pre-trained base model files: `G_0.pth` `D_0.pth`
@@ -76,7 +96,7 @@ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt

```shell
# nsf_hifigan
https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
wget -P pretrain/ https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
# Alternatively, you can manually download it and place it in the pretrain/nsf_hifigan directory
# URL: https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1
```

@@ -128,15 +148,36 @@ python resample.py

### 2. Automatically split the dataset into training and validation sets, and generate configuration files

```shell
python preprocess_flist_config.py
python preprocess_flist_config.py --speech_encoder vec768l12
```

speech_encoder has three choices:

```
vec768l12
vec256l9
hubertsoft
```

If the speech_encoder argument is omitted, the default value is vec768l12.

### 3. Generate hubert and f0

```shell
python preprocess_hubert_f0.py
python preprocess_hubert_f0.py --f0_predictor dio
```

f0_predictor has four options:

```
crepe
dio
pm
harvest
```

If the training set is too noisy, use crepe to extract f0.

If the f0_predictor argument is omitted, the default value is dio.

After completing the above steps, the dataset directory will contain the preprocessed data, and the dataset_raw folder can be deleted.

#### At this point you can modify some parameters in the generated config.json