Update README.md

2023-04-04 22:44:55 +08:00 · 2023-04-04 22:44:55 +08:00 · be9ed0be95
parent 0002794d54
commit be9ed0be95
3 changed files with 24 additions and 24 deletions
--- a/README.md
+++ b/README.md
@@ -134,9 +134,11 @@ Required parameters:
 - -n, --clean_names: a list of wav file names located in the raw folder.
 - -t, --trans: pitch adjustment, supports positive and negative (semitone) values.
 - -s, --spk_list: target speaker name for synthesis.
- -cl, --clip: voice auto-split,set to 0 to turn off,duration in seconds.
+- -cl, --clip: forced audio slicing; set to 0 (the default) to turn it off; duration in seconds.

 Optional parameters: see the next section
+- -lg, --linear_gradient: the cross-fade length between two audio slices, in seconds. If the voice sounds discontinuous after forced slicing, adjust this value; otherwise the default value of 0 is recommended.
+- -fmp, --f0_mean_pooling: whether to apply a mean filter (pooling) to F0, which may improve some hoarse sounds. Note that enabling this option slows down inference; it is off by default.
 - -a, --auto_predict_f0: automatic pitch prediction for voice conversion, do not enable this when converting songs as it can cause serious pitch issues.
 - -cm, --cluster_model_path: path to the clustering model, fill in any value if clustering is not trained.
 - -cr, --cluster_infer_ratio: proportion of the clustering solution, range 0-1, fill in 0 if the clustering model is not trained.
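
The `-lg, --linear_gradient` option describes a cross fade between two adjacent audio slices. The following is a minimal sketch of a linear cross fade on plain Python lists, not the project's implementation. The function names `seconds_to_samples` and `linear_crossfade`, the sample rate argument, and the choice of weights strictly between 0 and 1 are all assumptions made for illustration.

```python
def seconds_to_samples(seconds, sample_rate):
    """Convert a fade length in seconds to a whole number of samples."""
    return int(round(seconds * sample_rate))


def linear_crossfade(prev_slice, next_slice, fade_len):
    """Join two slices, blending the last fade_len samples of prev_slice
    with the first fade_len samples of next_slice using linear weights."""
    fade_len = min(fade_len, len(prev_slice), len(next_slice))
    if fade_len <= 0:
        # A fade length of 0 means plain concatenation.
        return list(prev_slice) + list(next_slice)

    head = list(prev_slice[:len(prev_slice) - fade_len])
    tail = list(next_slice[fade_len:])
    blended = []
    for i in range(fade_len):
        # Weight of the incoming slice rises linearly across the overlap.
        w = (i + 1) / (fade_len + 1)
        a = prev_slice[len(prev_slice) - fade_len + i]
        b = next_slice[i]
        blended.append(a * (1.0 - w) + b * w)
    return head + blended + tail
```

With this sketch, a value such as `-lg 0.5` at a 44100 Hz sample rate would correspond to `seconds_to_samples(0.5, 44100)` overlapping samples. Note that the joined output is shorter than the two slices combined by the length of the overlap.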
@@ -148,7 +150,7 @@ If the results from the previous section are satisfactory, or if you didn't unde
 ### Automatic f0 prediction

 During the 4.0 model training, an f0 predictor is also trained, which can be used for automatic pitch prediction during voice conversion. However, if the effect is not good, manual pitch prediction can be used instead. But please do not enable this feature when converting singing voice as it may cause serious pitch shifting!
- Set "auto_predict_f0" to true in inference_main.
+- Set `auto_predict_f0` to true in inference_main.

 ### Cluster-based timbre leakage control

@@ -210,9 +212,7 @@ For some reason the author deleted the original repository. Because of the negli

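 ##### Article 1019

-No organization or individual may infringe upon another person's right to likeness by vilifying or defacing the likeness, or by forging it through information technology or other means. Without the consent of the holder of the right to likeness, no one may make, use, or publish that person's likeness, unless otherwise provided by law.
-Without the consent of the holder of the right to likeness, the rights holder of a work containing the likeness may not use or publish the likeness by means such as publication, reproduction, distribution, rental, or exhibition.
-The protection of a natural person's voice is governed, by reference, by the relevant provisions on protection of the right to likeness.
+No organization or individual may infringe upon another person's right to likeness by vilifying or defacing the likeness, or by forging it through information technology or other means. Without the consent of the holder of the right to likeness, no one may make, use, or publish that person's likeness, unless otherwise provided by law. Without the consent of the holder of the right to likeness, the rights holder of a work containing the likeness may not use or publish the likeness by means such as publication, reproduction, distribution, rental, or exhibition. The protection of a natural person's voice is governed, by reference, by the relevant provisions on protection of the right to likeness.

 ##### Article 1024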

@@ -220,8 +220,7 @@ For some reason the author deleted the original repository. Because of the negli

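 ##### Article 1027

-[Works infringing the right to reputation] Where a literary or artistic work published by an actor depicts real people and real events or a specific person, and contains insulting or defamatory content that infringes another person's right to reputation, the injured party has the right to request, in accordance with the law, that the actor bear civil liability.
-Where a literary or artistic work published by an actor does not depict a specific person and only its plot resembles that person's circumstances, the actor bears no civil liability.  
+[Works infringing the right to reputation] Where a literary or artistic work published by an actor depicts real people and real events or a specific person, and contains insulting or defamatory content that infringes another person's right to reputation, the injured party has the right to request, in accordance with the law, that the actor bear civil liability. Where a literary or artistic work published by an actor does not depict a specific person and only its plot resembles that person's circumstances, the actor bears no civil liability.  

 #### [Constitution of the People's Republic of China](http://www.gov.cn/guoqing/2018-03/22/content_5276318.htm)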

--- a/README_zh_CN.md
+++ b/README_zh_CN.md
@@ -129,17 +129,19 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "
 ```

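 Required parameters
-+ -m, --model_path: model path.
-+ -c, --config_path: config file path.
-+ -n, --clean_names: list of wav file names, placed in the raw folder.
-+ -t, --trans: pitch adjustment, supports positive and negative values (semitones).
-+ -s, --spk_list: target speaker name for synthesis.
-+ -cl, --clip: automatic audio slicing; 0 disables slicing; unit: seconds.
+ -m, --model_path: model path
+ -c, --config_path: config file path
+ -n, --clean_names: list of wav file names, placed in the raw folder
+ -t, --trans: pitch adjustment, supports positive and negative values (semitones)
+ -s, --spk_list: target speaker name for synthesis
+ -cl, --clip: forced audio slicing; the default 0 means automatic slicing; unit: seconds

-Optional parameters: see the next section
-+ -a, --auto_predict_f0: automatic pitch prediction for voice conversion; do not enable this when converting singing, as it causes severe pitch errors.
-+ -cm, --cluster_model_path: path to the clustering model; fill in any value if no clustering model was trained.
-+ -cr, --cluster_infer_ratio: proportion of the clustering solution, range 0-1; fill in 0 if no clustering model was trained.
+Optional parameters: some are detailed in the next section
+ -lg, --linear_gradient: cross-fade length between two audio slices; adjust this value if the voice sounds discontinuous after forced slicing, otherwise the default value of 0 is recommended; unit: seconds
+ -fmp, --f0_mean_pooling: whether to apply a mean filter (pooling) to F0, which may improve some hoarse sounds. Note that enabling this option slows down inference; off by default
+ -a, --auto_predict_f0: automatic pitch prediction for voice conversion; do not enable this when converting singing, as it causes severe pitch errors
+ -cm, --cluster_model_path: path to the clustering model; fill in any value if no clustering model was trained
+ -cr, --cluster_infer_ratio: proportion of the clustering solution, range 0-1; leave the default 0 if no clustering model was trained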
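
The `-fmp, --f0_mean_pooling` entry refers to a mean filter (pooling) applied to the F0 contour. As a rough illustration only, the sketch below smooths a list of F0 values with a centered moving average. The function name `mean_pool_f0`, the `kernel_size` parameter, and the shortened windows at the edges are assumptions for this example; the project's actual filter may differ.

```python
def mean_pool_f0(f0, kernel_size=3):
    """Smooth an F0 contour with a centered moving average.

    Windows near the edges are shortened to the available samples,
    so the output has the same length as the input.
    """
    if kernel_size < 1 or kernel_size % 2 == 0:
        raise ValueError("kernel_size must be a positive odd integer")
    half = kernel_size // 2
    smoothed = []
    for i in range(len(f0)):
        window = f0[max(0, i - half): i + half + 1]
        smoothed.append(sum(window) / len(window))
    return smoothed
```

A single-frame pitch spike is spread across its neighbours and reduced in height, which is the kind of smoothing the option describes. The extra pass over every frame is also consistent with the note that enabling it slows inference.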

 ## 🤔 Optional settings

@@ -210,7 +212,7 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "

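 ##### Article 1019

-No organization or individual may infringe upon another person's right to likeness by vilifying or defacing the likeness, or by forging it through information technology or other means. Without the consent of the holder of the right to likeness, no one may make, use, or publish that person's likeness, unless otherwise provided by law.  Without the consent of the holder of the right to likeness, the rights holder of a work containing the likeness may not use or publish the likeness by means such as publication, reproduction, distribution, rental, or exhibition.  The protection of a natural person's voice is governed, by reference, by the relevant provisions on protection of the right to likeness.
+No organization or individual may infringe upon another person's right to likeness by vilifying or defacing the likeness, or by forging it through information technology or other means. Without the consent of the holder of the right to likeness, no one may make, use, or publish that person's likeness, unless otherwise provided by law. Without the consent of the holder of the right to likeness, the rights holder of a work containing the likeness may not use or publish the likeness by means such as publication, reproduction, distribution, rental, or exhibition. The protection of a natural person's voice is governed, by reference, by the relevant provisions on protection of the right to likeness.

 ##### Article 1024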

@@ -218,7 +220,7 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "

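 ##### Article 1027

-[Works infringing the right to reputation] Where a literary or artistic work published by an actor depicts real people and real events or a specific person, and contains insulting or defamatory content that infringes another person's right to reputation, the injured party has the right to request, in accordance with the law, that the actor bear civil liability.  Where a literary or artistic work published by an actor does not depict a specific person and only its plot resembles that person's circumstances, the actor bears no civil liability.
+[Works infringing the right to reputation] Where a literary or artistic work published by an actor depicts real people and real events or a specific person, and contains insulting or defamatory content that infringes another person's right to reputation, the injured party has the right to request, in accordance with the law, that the actor bear civil liability. Where a literary or artistic work published by an actor does not depict a specific person and only its plot resembles that person's circumstances, the actor bears no civil liability.

 #### [Constitution of the People's Republic of China](http://www.gov.cn/guoqing/2018-03/22/content_5276318.htm)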

--- a/inference_main.py
+++ b/inference_main.py
@@ -25,17 +25,16 @@ def main():
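    # Required settings
    parser.add_argument('-m', '--model_path', type=str, default="logs/44k/G_0.pth", help='Model path')
    parser.add_argument('-c', '--config_path', type=str, default="configs/config.json", help='Config file path')
-    parser.add_argument('-cl', '--clip', type=float, default=0, help='Automatic audio slicing; 0 disables slicing; unit: seconds')
+    parser.add_argument('-cl', '--clip', type=float, default=0, help='Forced audio slicing; the default 0 means automatic slicing; unit: seconds')
    parser.add_argument('-n', '--clean_names', type=str, nargs='+', default=["君の知らない物語-src.wav"], help='List of wav file names, placed in the raw folder')
    parser.add_argument('-t', '--trans', type=int, nargs='+', default=[0], help='Pitch adjustment; supports positive and negative values (semitones)')
    parser.add_argument('-s', '--spk_list', type=str, nargs='+', default=['nen'], help='Target speaker name for synthesis')

    # Optional settings
-    parser.add_argument('-a', '--auto_predict_f0', action='store_true', default=False,
-                        help='Automatic pitch prediction for voice conversion; do not enable this when converting singing, as it causes severe pitch errors')
+    parser.add_argument('-a', '--auto_predict_f0', action='store_true', default=False,help='Automatic pitch prediction for voice conversion; do not enable this when converting singing, as it causes severe pitch errors')
    parser.add_argument('-cm', '--cluster_model_path', type=str, default="logs/44k/kmeans_10000.pt", help='Path to the clustering model; fill in any value if no clustering model was trained')
-    parser.add_argument('-cr', '--cluster_infer_ratio', type=float, default=0, help='Proportion of the clustering solution, range 0-1; fill in 0 if no clustering model was trained')
-    parser.add_argument('-lg', '--linear_gradient', type=float, default=0, help='Cross-fade length between two audio slices; adjust this value if the voice sounds discontinuous after automatic slicing, otherwise the default value of 0 is recommended; unit: seconds')
+    parser.add_argument('-cr', '--cluster_infer_ratio', type=float, default=0, help='Proportion of the clustering solution, range 0-1; leave the default 0 if no clustering model was trained')
+    parser.add_argument('-lg', '--linear_gradient', type=float, default=0, help='Cross-fade length between two audio slices; adjust this value if the voice sounds discontinuous after forced slicing, otherwise the default value of 0 is recommended; unit: seconds')
    parser.add_argument('-fmp', '--f0_mean_pooling', type=bool, default=False, help='Whether to apply a mean filter (pooling) to F0, which may improve some hoarse sounds. Note that enabling this option slows down inference; off by default')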

    # Settings that do not need changing
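
One detail of the `-fmp` declaration above may be worth checking: with `type=bool`, argparse passes the raw command-line string to `bool()`, so any non-empty value, including the text `False`, parses as `True`. The sketch below reproduces that behaviour in isolation. The `--fmp_switch` option is a hypothetical alternative added only for comparison and is not part of this commit.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser()
    # Same declaration style as the -fmp option in the diff:
    # the command-line string is converted with bool().
    parser.add_argument('-fmp', '--f0_mean_pooling', type=bool, default=False)
    # Hypothetical alternative (not in the commit): a plain on/off switch.
    parser.add_argument('--fmp_switch', action='store_true', default=False)
    return parser


if __name__ == "__main__":
    parser = build_parser()
    # bool('False') is True because the string is non-empty.
    print(parser.parse_args(['-fmp', 'False']).f0_mean_pooling)
    # The switch form is False unless the flag is present.
    print(parser.parse_args([]).fmp_switch)
```

Under this reading, passing `-fmp` with any non-empty value turns the option on, and only omitting it (or passing an empty string) leaves it off.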