From 2b281bc446f0ccc3b27c6e5aa2380660d2730c2f Mon Sep 17 00:00:00 2001
From: ylzz1997
Date: Tue, 30 May 2023 18:07:07 +0800
Subject: [PATCH] Update Readme

---
 README.md       | 18 ++++++++++--------
 README_zh_CN.md | 17 +++++++++--------
 2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/README.md b/README.md
index a86b3d3..763f6e5 100644
--- a/README.md
+++ b/README.md
@@ -226,24 +226,26 @@ whisper-ppg
 
 If the speech_encoder argument is omitted, the default value is vec768l12
 
-#### You can modify some parameters in the generated config.json and diffusion.yaml
-
-* `keep_ckpts`: Keep the last `keep_ckpts` models during training. Set to `0` will keep them all. Default is `3`.
-
-* `all_in_mem`, `cache_all_data`: Load all dataset to RAM. It can be enabled when the disk IO of some platforms is too low and the system memory is **much larger** than your dataset.
-
-* `batch_size`: The amount of data loaded to the GPU for a single training session can be adjusted to a size lower than the video memory capacity.
-
 **Use loudness embedding**
 
 Add `--vol_aug` if you want to enable loudness embedding:
 
 ```shell
 python preprocess_flist_config.py --speech_encoder vec768l12 --vol_aug
 ```
 
 After enabling loudness embedding, the trained model will match the loudness of the input source; otherwise, it will be the loudness of the training set.
 
+#### You can modify some parameters in the generated config.json and diffusion.yaml
+
+* `keep_ckpts`: Keep the last `keep_ckpts` models during training. Setting it to `0` keeps them all. Default is `3`.
+
+* `all_in_mem`, `cache_all_data`: Load the entire dataset into RAM. Enable this when disk IO on your platform is too slow and system memory is **much larger** than your dataset.
+
+* `batch_size`: The amount of data loaded onto the GPU per training pass; adjust it to a size that fits within your GPU memory.
+
+
+
 ### 3. Generate hubert and f0
 
 ```shell
diff --git a/README_zh_CN.md b/README_zh_CN.md
index 0703333..9fc898e 100644
--- a/README_zh_CN.md
+++ b/README_zh_CN.md
@@ -228,24 +228,25 @@ whisper-ppg
 
 If the speech_encoder argument is omitted, the default value is vec768l12
 
-#### At this point you can modify some parameters in the generated config.json and diffusion.yaml
-
-* `keep_ckpts`: How many recent models to keep during training; `0` keeps them all. By default only the last `3` are kept.
-
-* `all_in_mem`, `cache_all_data`: Load the entire dataset into RAM. Enable this when disk IO on your platform is too slow and system memory is **much larger** than the dataset.
-
-* `batch_size`: The amount of data loaded onto the GPU per training pass; adjust it to a size below your GPU memory capacity.
-
 **Use loudness embedding**
 
 To use loudness embedding, add the `--vol_aug` argument, for example:
 
 ```shell
 python preprocess_flist_config.py --speech_encoder vec768l12 --vol_aug
 ```
 
 Once enabled, the trained model will match the loudness of the input source; otherwise it will match the loudness of the training set.
 
+#### At this point you can modify some parameters in the generated config.json and diffusion.yaml
+
+* `keep_ckpts`: How many recent models to keep during training; `0` keeps them all. By default only the last `3` are kept.
+
+* `all_in_mem`, `cache_all_data`: Load the entire dataset into RAM. Enable this when disk IO on your platform is too slow and system memory is **much larger** than the dataset.
+
+* `batch_size`: The amount of data loaded onto the GPU per training pass; adjust it to a size below your GPU memory capacity.
+
+
 ### 3. Generate hubert and f0
 
 ```shell
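
To make the sizing advice in the moved section concrete, below is a minimal sketch of how the relevant knobs might look in the generated `diffusion.yaml`. This is an illustration only: the nesting under `train:` and the example values are assumptions, not taken from this patch, so check the file your preprocessing run actually generates.

```yaml
# Illustrative diffusion.yaml excerpt -- a sketch, not the file's confirmed layout.
# The `train:` nesting and the values are assumptions for illustration.
train:
  batch_size: 48         # lower this if training exceeds your GPU memory
  cache_all_data: true   # keep the whole dataset in RAM; only sensible when RAM is much larger than the dataset
```

The same sizing considerations apply to the `keep_ckpts`, `all_in_mem`, and `batch_size` entries in the generated `config.json`.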