Update readme
This commit is contained in:
parent 0cef3f22e4
commit c3ba30534a

README.md | 56
@@ -41,6 +41,19 @@ The singing voice conversion model uses SoftVC content encoder to extract source

- Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec); this branch is not compatible with the 4.0 model

### 🆕 Questions about compatibility with the main branch model

- You can support a main branch model by modifying its config.json: add a `speech_encoder` field to the `model` field of config.json, see below for details
```
"model": {
    .........
    "ssl_dim": 768,
    "n_speakers": 200,
    "speech_encoder": "vec256l9"
}
```
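As a concrete illustration, the edit above can be scripted. This is a hedged sketch: the `patch_config` helper is hypothetical and not part of the project, and the encoder value simply matches the snippet above.

```python
import json

def patch_config(path: str, encoder: str = "vec256l9") -> dict:
    """Add the "speech_encoder" field to a main-branch config.json.

    Hypothetical helper for illustration; "vec256l9" matches the
    main-branch snippet shown above.
    """
    with open(path) as f:
        config = json.load(f)
    # The new branch looks for this field inside the "model" section.
    config["model"]["speech_encoder"] = encoder
    with open(path, "w") as f:
        json.dump(config, f, indent=2)
    return config
```

For example, `patch_config("configs/config.json")` would rewrite that file in place (the path is only an example).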

## 💬 About Python Version

After conducting tests, we believe that the project runs stably on `Python 3.8.9`.

@@ -49,15 +62,24 @@ After conducting tests, we believe that the project runs stably on `Python 3.8.9

#### **Required**

**One of the following encoders must be selected**
##### **1. If using contentvec as the speech encoder**

- ContentVec: [checkpoint_best_legacy_500.pt](https://ibm.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr)
- Place it under the `hubert` directory
- Place it under the `pretrain` directory

```shell
# contentvec
wget -P hubert/ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
wget -P pretrain/ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
# Alternatively, you can manually download it and place it in the hubert directory
```

##### **2. If using hubertsoft as the speech encoder**

- soft vc hubert: [hubert-soft-0d54a1f4.pt](https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt)
- Place it under the `pretrain` directory

#### **Optional (strongly recommended)**

- Pre-trained model files: `G_0.pth` `D_0.pth`
@@ -76,7 +98,7 @@ If you are using the NSF-HIFIGAN enhancer, you will need to download the pre-tra

```shell
# nsf_hifigan
https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
wget -P pretrain/ https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
# Alternatively, you can manually download it and place it in the pretrain/nsf_hifigan directory
# URL: https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1
```

@@ -128,15 +150,39 @@ python resample.py

### 2. Automatically split the dataset into training and validation sets, and generate configuration files

```shell
python preprocess_flist_config.py
python preprocess_flist_config.py --speech_encoder vec768l12
```

speech_encoder has three choices:

```
vec768l12
vec256l9
hubertsoft
```

If the speech_encoder argument is omitted, the default value is vec768l12.

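The flag behavior described above (three choices, defaulting to vec768l12) corresponds to an argparse definition along these lines. This is a sketch for illustration only; the project's actual script may define the flag differently.

```python
import argparse

# Sketch of the --speech_encoder flag described above; the real
# preprocess_flist_config.py may differ in details.
parser = argparse.ArgumentParser()
parser.add_argument(
    "--speech_encoder",
    type=str,
    default="vec768l12",
    choices=["vec768l12", "vec256l9", "hubertsoft"],
    help="which content encoder the generated config should use",
)

args = parser.parse_args([])  # no flag given: the default applies
print(args.speech_encoder)    # vec768l12
```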
### 3. Generate hubert and f0

```shell
python preprocess_hubert_f0.py
python preprocess_hubert_f0.py --f0_predictor dio
```

f0_predictor has four options:

```
crepe
dio
pm
harvest
```

If the training set is too noisy, use crepe to extract f0.

If the f0_predictor argument is omitted, the default value is dio.

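To make concrete what an f0 predictor computes, here is a naive autocorrelation-based pitch estimate on a synthetic tone. This is an illustration only; it is not how crepe, dio, pm, or harvest work internally.

```python
import numpy as np

def estimate_f0(frame: np.ndarray, sample_rate: int) -> float:
    """Naive f0 estimate for one voiced frame via autocorrelation."""
    frame = frame - frame.mean()
    # Autocorrelation for non-negative lags 0..N-1.
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    # Skip lag 0 and tiny lags (they would imply > 1000 Hz).
    min_lag = sample_rate // 1000
    lag = min_lag + int(np.argmax(corr[min_lag:]))
    return sample_rate / lag

sr = 16000
t = np.arange(sr // 10) / sr           # 100 ms frame
frame = np.sin(2 * np.pi * 220.0 * t)  # pure 220 Hz tone
f0 = estimate_f0(frame, sr)            # close to 220 Hz
```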
After completing the above steps, the dataset directory will contain the preprocessed data, and the dataset_raw folder can be deleted.

#### You can modify some parameters in the generated config.json
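For instance, training settings commonly live under the `train` section of the generated config. The fragment below is a hedged illustration only: the field names follow the usual layout of this project's config files, and the values are placeholders, not recommendations.

```
"train": {
    .........
    "batch_size": 6,
    "learning_rate": 0.0001
}
```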
@@ -39,7 +39,20 @@

### 🆕 4.0-Vec768-Layer12 Version Update Content

+ Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec); this branch is not compatible with the 4.0 model
+ Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec)

### 🆕 Questions about compatibility with the main branch model

+ You can support a main branch model by modifying its config.json: add a `speech_encoder` field to the `model` field of config.json, see below for details

```
"model": {
    .........
    "ssl_dim": 768,
    "n_speakers": 200,
    "speech_encoder": "vec256l9"
}
```

## 💬 About Python Version

@@ -49,15 +62,22 @@

#### **Required**

**One of the following encoders must be selected**

##### **1. If using contentvec as the speech encoder**

+ contentvec: [checkpoint_best_legacy_500.pt](https://ibm.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr)
+ Place it under the `hubert` directory
+ Place it under the `pretrain` directory

```shell
# contentvec
http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
# Alternatively, you can manually download it and place it in the hubert directory
wget -P pretrain/ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
# Alternatively, you can manually download it and place it in the pretrain directory
```

##### **2. If using hubertsoft as the speech encoder**

+ soft vc hubert: [hubert-soft-0d54a1f4.pt](https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt)
+ Place it under the `pretrain` directory

#### **Optional (strongly recommended)**

+ Pre-trained base model files: `G_0.pth` `D_0.pth`
@@ -76,7 +96,7 @@ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt

```shell
# nsf_hifigan
https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
wget -P pretrain/ https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
# Alternatively, you can manually download it and place it in the pretrain/nsf_hifigan directory
# URL: https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1
```

@@ -128,15 +148,36 @@ python resample.py

### 2. Automatically split the dataset into training and validation sets, and generate configuration files

```shell
python preprocess_flist_config.py
python preprocess_flist_config.py --speech_encoder vec768l12
```

speech_encoder has three choices:

```
vec768l12
vec256l9
hubertsoft
```

If the speech_encoder argument is omitted, the default value is vec768l12.

### 3. Generate hubert and f0

```shell
python preprocess_hubert_f0.py
python preprocess_hubert_f0.py --f0_predictor dio
```

f0_predictor has four options:

```
crepe
dio
pm
harvest
```

If the training set is too noisy, use crepe to extract f0.

If the f0_predictor argument is omitted, the default value is dio.

After completing the above steps, the dataset directory will contain the preprocessed data, and the dataset_raw folder can be deleted.

#### At this point you can modify some parameters in the generated config.json