Update FCPE

ylzz1997 2023-07-23 00:25:03 +08:00
parent ec6f7f5ade
commit 7c7496536f
4 changed files with 23 additions and 5 deletions


@ -174,6 +174,13 @@ If you are using the `rmvpe` F0 Predictor, you will need to download the pre-tra
- Download the model: [rmvpe.pt](https://huggingface.co/datasets/ylzz1997/rmvpe_pretrain_model/resolve/main/rmvpe.pt)
- Place it under the `pretrain` directory
##### FCPE
If you are using the `fcpe` F0 Predictor, you will need to download the pre-trained FCPE model.
- Download the model: [fcpe.pt](https://huggingface.co/datasets/ylzz1997/rmvpe_pretrain_model/resolve/main/fcpe.pt)
- Place it under the `pretrain` directory
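
The download step above can also be scripted. The following is a minimal sketch, not part of the repository: the helper names `fcpe_target` and `download_fcpe` are hypothetical, and it assumes the script runs from the repository root so that `pretrain` resolves to the directory named above.

```python
from pathlib import Path
from urllib.request import urlretrieve

# URL taken from the FCPE section above.
FCPE_URL = "https://huggingface.co/datasets/ylzz1997/rmvpe_pretrain_model/resolve/main/fcpe.pt"


def fcpe_target(pretrain_dir="pretrain"):
    """Return the path where the FCPE checkpoint is expected (hypothetical helper)."""
    return Path(pretrain_dir) / "fcpe.pt"


def download_fcpe(pretrain_dir="pretrain"):
    """Download fcpe.pt into pretrain_dir unless it is already present (hypothetical helper)."""
    target = fcpe_target(pretrain_dir)
    target.parent.mkdir(parents=True, exist_ok=True)
    if not target.exists():
        urlretrieve(FCPE_URL, str(target))
    return target
```

Calling `download_fcpe()` from the repository root would place the file at `pretrain/fcpe.pt`; an existing file is left untouched.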
## 📊 Dataset Preparation
Simply place the dataset in the `dataset_raw` directory with the following file structure:
@ -312,6 +319,7 @@ dio
pm
harvest
rmvpe
fcpe
```
If the training set is too noisy, it is recommended to use `crepe` to handle f0
@ -363,7 +371,7 @@ Required parameters:
Optional parameters: see the next section
- `-lg` | `--linear_gradient`: The cross fade length of two audio slices in seconds. If there is a discontinuous voice after forced slicing, you can adjust this value. Otherwise, it is recommended to use the default value of 0.
- `-f0p` | `--f0_predictor`: Select an F0 predictor, options are `crepe`, `pm`, `dio`, `harvest`, `rmvpe`, default value is `pm` (note: f0 mean pooling will be enabled when using `crepe`)
- `-f0p` | `--f0_predictor`: Select an F0 predictor, options are `crepe`, `pm`, `dio`, `harvest`, `rmvpe`, `fcpe`, default value is `pm` (note: f0 mean pooling will be enabled when using `crepe`)
- `-a` | `--auto_predict_f0`: Automatic pitch prediction. Do not enable this when converting singing voices, as it can cause serious pitch issues.
- `-cm` | `--cluster_model_path`: Path to the cluster model or feature retrieval index. If left blank, it is automatically set to the default path of these models. If no cluster model or feature retrieval index has been trained, any value can be entered.
- `-cr` | `--cluster_infer_ratio`: The proportion of the clustering scheme or feature retrieval, ranging from 0 to 1. If no cluster model or feature retrieval index has been trained, leave it at the default of 0.
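
To illustrate how the `-f0p` option list above maps onto an argument parser, here is a minimal sketch. It assumes that restricting the value with argparse's `choices` is desired; the scripts in this commit do not do that (they accept any string), and `F0_PREDICTORS` and `build_parser` are hypothetical names.

```python
import argparse

# Predictor names as listed in the README after this commit.
F0_PREDICTORS = ["crepe", "pm", "dio", "harvest", "rmvpe", "fcpe"]


def build_parser():
    """Build a parser exposing only the -f0p option (illustrative, not the project's parser)."""
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-f0p", "--f0_predictor",
        type=str,
        default="pm",
        choices=F0_PREDICTORS,  # assumption: reject unknown predictor names early
        help="F0 predictor: " + ", ".join(F0_PREDICTORS) + "; default: pm",
    )
    return parser


args = build_parser().parse_args(["-f0p", "fcpe"])
print(args.f0_predictor)  # prints "fcpe"
```

With `choices` set, a misspelled predictor name fails at argument parsing instead of later during inference.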


@ -142,7 +142,7 @@ wget -P pretrain/ https://huggingface.co/lj1995/VoiceConversionWebUI/resolve/mai
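+ Pre-trained base model files: `G_0.pth` `D_0.pth`
+ Place them under the `logs/44k` directory
+ Diffusion model pre-trained base model file: `model_0.pt `
+ Diffusion model pre-trained base model file: `model_0.pt`
+ Place it under the `logs/44k/diffusion` directory
Get the Sovits base model from svc-develop-team (TBD) or anywhere else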
@ -175,6 +175,15 @@ unzip -od pretrain/nsf_hifigan pretrain/nsf_hifigan_20221211.zip
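+ Download the model [rmvpe.pt](https://huggingface.co/datasets/ylzz1997/rmvpe_pretrain_model/resolve/main/rmvpe.pt)
+ Place it under the `pretrain` directory
##### FCPE
> You are right, but [FCPE](https://github.com/CNChTu/MelPE) is a brand-new F0 predictor developed independently by svc-develop-team (I forgot the rest)
If you are using the `fcpe` F0 predictor, you will need to download the pre-trained FCPE model
+ Download the model [fcpe.pt](https://huggingface.co/datasets/ylzz1997/rmvpe_pretrain_model/resolve/main/fcpe.pt)
+ Place it under the `pretrain` directory
## 📊 Dataset Preparation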
@ -313,6 +322,7 @@ dio
pm
harvest
rmvpe
fcpe
```
If the training set is too noisy, use crepe to handle f0
@ -364,7 +374,7 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "
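Optional parameters: see the next section for details
+ `-lg` | `--linear_gradient`: The cross-fade length of two audio slices, in seconds. If the voice is discontinuous after forced slicing, adjust this value; otherwise the default value of 0 is recommended
+ `-f0p` | `--f0_predictor`: Select the F0 predictor; options are crepe, pm, dio, harvest, rmvpe; default is pm (note: crepe applies a mean filter to the original F0)
+ `-f0p` | `--f0_predictor`: Select the F0 predictor; options are crepe, pm, dio, harvest, rmvpe, fcpe; default is pm (note: crepe applies a mean filter to the original F0)
+ `-a` | `--auto_predict_f0`: Automatically predict pitch during voice conversion. Do not enable this when converting singing voices, as it causes severe pitch errors
+ `-cm` | `--cluster_model_path`: Path to the cluster model or feature retrieval index. If left blank, it is automatically set to the default path of each scheme's model. If no cluster model or feature retrieval index has been trained, any value can be entered
+ `-cr` | `--cluster_infer_ratio`: The proportion of the clustering scheme or feature retrieval, range 0-1. If no cluster model or feature retrieval index has been trained, leave it at the default of 0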


@ -29,7 +29,7 @@ def main():
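parser.add_argument('-cm', '--cluster_model_path', type=str, default="", help='Path to the cluster model or feature retrieval index. If left blank, it is automatically set to the default path of the model for each scheme. If no cluster model or feature retrieval index has been trained, any value can be entered')
parser.add_argument('-cr', '--cluster_infer_ratio', type=float, default=0, help='Proportion of the clustering scheme or feature retrieval, range 0-1. If no cluster model or feature retrieval index has been trained, leave it at the default of 0')
parser.add_argument('-lg', '--linear_gradient', type=float, default=0, help='Cross-fade length of two audio slices, in seconds. If the voice is discontinuous after forced slicing, adjust this value; otherwise the default value of 0 is recommended')
parser.add_argument('-f0p', '--f0_predictor', type=str, default="pm", help='Select the F0 predictor; options are crepe, pm, dio, harvest, rmvpe; default is pm (note: crepe applies a mean filter to the original F0)')
parser.add_argument('-f0p', '--f0_predictor', type=str, default="pm", help='Select the F0 predictor; options are crepe, pm, dio, harvest, rmvpe, fcpe; default is pm (note: crepe applies a mean filter to the original F0)')
parser.add_argument('-eh', '--enhance', action='store_true', default=False, help='Whether to use the NSF_HIFIGAN enhancer. It improves audio quality for some models trained on small datasets but has a negative effect on well-trained models. Off by default')
parser.add_argument('-shd', '--shallow_diffusion', action='store_true', default=False, help='Whether to use shallow diffusion, which can resolve some electronic-sound artifacts. Off by default. When this option is enabled, the NSF_HIFIGAN enhancer is disabled')
parser.add_argument('-usm', '--use_spk_mix', action='store_true', default=False, help='Whether to use speaker mixing')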


@ -137,7 +137,7 @@ if __name__ == "__main__":
'--use_diff',action='store_true', help='Whether to use the diffusion model'
)
parser.add_argument(
'--f0_predictor', type=str, default="dio", help='Select F0 predictor, can select crepe,pm,dio,harvest,rmvpe, default dio (note: crepe applies a mean filter to the original F0)'
'--f0_predictor', type=str, default="dio", help='Select F0 predictor, can select crepe,pm,dio,harvest,rmvpe,fcpe, default dio (note: crepe applies a mean filter to the original F0)'
)
parser.add_argument(
'--num_processes', type=int, default=1, help='You are advised to set the number of processes to the same as the number of CPU cores'