diff --git a/README.md b/README.md
index f4c8395..bd67c21 100644
--- a/README.md
+++ b/README.md
@@ -41,6 +41,19 @@ The singing voice conversion model uses SoftVC content encoder to extract source
 - Feature input is changed to [Content Vec](https://github.com/auspicious3000/contentvec) Transformer output of 12 layer, the branch is not compatible with 4.0 model
 
+### 🆕 Compatibility with the main branch model
+
+- A model from the main branch can be supported by editing its config.json: add a `speech_encoder` field to the `model` section, as shown below
+
+```
+  "model": {
+    .........
+    "ssl_dim": 768,
+    "n_speakers": 200,
+    "speech_encoder":"vec256l9"
+  }
+```
+
 ## 💬 About Python Version
 
 After conducting tests, we believe that the project runs stably on `Python 3.8.9`.
 
@@ -49,15 +62,24 @@ After conducting tests, we believe that the project runs stably on `Python 3.8.9
 
 #### **Required**
 
+**Choose one of the following encoders to use**
+
+##### **1. If using ContentVec as the speech encoder**
+
 - ContentVec: [checkpoint_best_legacy_500.pt](https://ibm.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr)
-  - Place it under the `hubert` directory
+  - Place it under the `pretrain` directory
 
 ```shell
 # contentvec
-wget -P hubert/ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
-# Alternatively, you can manually download and place it in the hubert directory
+wget -P pretrain/ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
+# Alternatively, you can manually download and place it in the pretrain directory
 ```
 
+##### **2. If using HuBERT-Soft as the speech encoder**
+- SoftVC HuBERT: [hubert-soft-0d54a1f4.pt](https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt)
+  - Place it under the `pretrain` directory
+
+
 #### **Optional(Strongly recommend)**
 
 - Pre-trained model files: `G_0.pth` `D_0.pth`
@@ -76,7 +98,7 @@ If you are using the NSF-HIFIGAN enhancer, you will need to download the pre-tra
 
 ```shell
 # nsf_hifigan
-https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
+wget -P pretrain/ https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
 # Alternatively, you can manually download and place it in the pretrain/nsf_hifigan directory
 # URL:https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1
 ```
@@ -128,15 +150,39 @@ python resample.py
 
 ### 2. Automatically split the dataset into training and validation sets, and generate configuration files.
 
 ```shell
-python preprocess_flist_config.py
+python preprocess_flist_config.py --speech_encoder vec768l12
 ```
 
+`speech_encoder` accepts one of three values:
+
+```
+vec768l12
+vec256l9
+hubertsoft
+```
+
+If the `speech_encoder` argument is omitted, it defaults to `vec768l12`.
+
+
 ### 3. Generate hubert and f0
 
 ```shell
-python preprocess_hubert_f0.py
+python preprocess_hubert_f0.py --f0_predictor dio
 ```
 
+`f0_predictor` accepts one of four values:
+
+```
+crepe
+dio
+pm
+harvest
+```
+
+If the training set is too noisy, use crepe to extract the f0.
+
+If the `f0_predictor` argument is omitted, it defaults to `dio`.
+
 After completing the above steps, the dataset directory will contain the preprocessed data, and the dataset_raw folder can be deleted.
 
 #### You can modify some parameters in the generated config.json
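Taken together, the preprocessing steps added above form a short pipeline. The following is a minimal sketch of a full run, assuming `dataset_raw` has already been prepared as described earlier in the README and the matching encoder checkpoint is already in `pretrain/`; the flag values shown are just the documented default encoder and the documented f0 choice for noisy data:

```shell
# Step 1: resample the raw dataset
python resample.py
# Step 2: split training/validation sets and generate config.json
# (vec768l12 is the default speech encoder)
python preprocess_flist_config.py --speech_encoder vec768l12
# Step 3: extract speech features and f0
# (crepe instead of the default dio, per the advice for noisy training sets)
python preprocess_hubert_f0.py --f0_predictor crepe
```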
diff --git a/README_zh_CN.md b/README_zh_CN.md
index 835bb30..88cd383 100644
--- a/README_zh_CN.md
+++ b/README_zh_CN.md
@@ -39,7 +39,20 @@
 ### 🆕 4.0-Vec768-Layer12 version update content
 
-+ Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec); this branch is not compatible with 4.0 models
++ Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec)
+
+### 🆕 Compatibility with the main branch model
+
++ A model from the main branch can be supported by editing its config.json: add a `speech_encoder` field to the `model` section, as shown below
+
+```
+  "model": {
+    .........
+    "ssl_dim": 768,
+    "n_speakers": 200,
+    "speech_encoder":"vec256l9"
+  }
+```
 
 ## 💬 About Python Version
 
@@ -49,15 +62,22 @@
 
 #### **Required**
 
+**Choose one of the following encoders to use**
+
+##### **1. If using ContentVec as the speech encoder**
+
 contentvec: [checkpoint_best_legacy_500.pt](https://ibm.box.com/s/z1wgl1stco8ffooyatzdwsqn2psd9lrr)
-  + Place it under the `hubert` directory
+  + Place it under the `pretrain` directory
 
 ```shell
 # contentvec
-http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
-# Alternatively, you can manually download and place it in the hubert directory
+wget -P pretrain/ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
+# Alternatively, you can manually download and place it in the pretrain directory
 ```
 
+##### **2. If using HuBERT-Soft as the speech encoder**
++ SoftVC HuBERT: [hubert-soft-0d54a1f4.pt](https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt)
+  + Place it under the `pretrain` directory
+
 #### **Optional (strongly recommended)**
 
 + Pre-trained base model files: `G_0.pth` `D_0.pth`
@@ -76,7 +96,7 @@ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
 
 ```shell
 # nsf_hifigan
-https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
+wget -P pretrain/ https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
 # Alternatively, you can manually download and place it in the pretrain/nsf_hifigan directory
 # URL: https://github.com/openvpi/vocoders/releases/tag/nsf-hifigan-v1
 ```
@@ -128,15 +148,36 @@ python resample.py
 
 ### 2. Automatically split the dataset into training and validation sets, and generate configuration files
 
 ```shell
-python preprocess_flist_config.py
+python preprocess_flist_config.py --speech_encoder vec768l12
 ```
 
+`speech_encoder` accepts one of three values:
+```
+vec768l12
+vec256l9
+hubertsoft
+```
+
+If the `speech_encoder` argument is omitted, it defaults to `vec768l12`.
+
 ### 3. Generate hubert and f0
 
 ```shell
-python preprocess_hubert_f0.py
+python preprocess_hubert_f0.py --f0_predictor dio
 ```
 
+`f0_predictor` accepts one of four values:
+```
+crepe
+dio
+pm
+harvest
+```
+
+If the training set is too noisy, use crepe to extract the f0.
+
+If the `f0_predictor` argument is omitted, it defaults to `dio`.
+
 After completing the above steps, the dataset directory contains the preprocessed data, and the dataset_raw folder can be deleted.
 
 #### At this point you can modify some parameters in the generated config.json
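For convenience, the downloads described in the dependency sections above can be collected into one sketch. Only one of the two encoder checkpoints is needed, and the final `unzip` line is an assumption: it presumes the archive unpacks into an `nsf_hifigan/` folder, since the README only says the files must end up under `pretrain/nsf_hifigan`:

```shell
# Encoder choice 1: ContentVec
wget -P pretrain/ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
# Encoder choice 2: HuBERT-Soft
wget -P pretrain/ https://github.com/bshall/hubert/releases/download/v0.1/hubert-soft-0d54a1f4.pt
# Optional vocoder for the NSF-HIFIGAN enhancer
wget -P pretrain/ https://github.com/openvpi/vocoders/releases/download/nsf-hifigan-v1/nsf_hifigan_20221211.zip
unzip pretrain/nsf_hifigan_20221211.zip -d pretrain/  # assumes the zip contains an nsf_hifigan/ directory
```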