Update Readme

This commit is contained in:
ylzz1997 2023-05-30 18:07:07 +08:00
parent a2f85c71a0
commit 2b281bc446
2 changed files with 19 additions and 16 deletions


@@ -226,14 +226,6 @@ whisper-ppg
If the speech_encoder argument is omitted, the default value is vec768l12
#### You can modify some parameters in the generated config.json and diffusion.yaml
* `keep_ckpts`: Keep the last `keep_ckpts` models during training. Setting it to `0` keeps them all. Default is `3`.
* `all_in_mem`, `cache_all_data`: Load the entire dataset into RAM. Enable this only when your platform's disk I/O is too slow and system memory is **much larger** than the dataset.
* `batch_size`: The amount of data loaded onto the GPU per training step; reduce it to fit within your GPU memory.
**Use loudness embedding**
Add `--vol_aug` if you want to enable loudness embedding:
@@ -244,6 +236,16 @@ python preprocess_flist_config.py --speech_encoder vec768l12 --vol_aug
After enabling loudness embedding, the trained model will match the loudness of the input source; otherwise, it will be the loudness of the training set.
#### You can modify some parameters in the generated config.json and diffusion.yaml
* `keep_ckpts`: Keep the last `keep_ckpts` models during training. Setting it to `0` keeps them all. Default is `3`.
* `all_in_mem`, `cache_all_data`: Load the entire dataset into RAM. Enable this only when your platform's disk I/O is too slow and system memory is **much larger** than the dataset.
* `batch_size`: The amount of data loaded onto the GPU per training step; reduce it to fit within your GPU memory.
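As a minimal sketch of editing these values programmatically (the nesting of the fields under a `"train"` section is an assumption here; check the layout of your own generated config.json before relying on it):

```python
import json

# Hypothetical minimal config; the real generated config.json has many more fields.
config = {"train": {"keep_ckpts": 3, "batch_size": 6, "all_in_mem": False}}

# Keep the last 10 checkpoints instead of the default 3.
config["train"]["keep_ckpts"] = 10
# Lower batch_size if training runs out of GPU memory.
config["train"]["batch_size"] = 4
# Enable only if system RAM greatly exceeds the dataset size.
config["train"]["all_in_mem"] = True

# Write the adjusted config back to disk.
with open("config.json", "w") as f:
    json.dump(config, f, indent=2)
```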
### 3. Generate hubert and f0
```shell


@@ -228,14 +228,6 @@ whisper-ppg
If the speech_encoder argument is omitted, the default value is vec768l12
#### You can modify some parameters in the generated config.json and diffusion.yaml
* `keep_ckpts`: Keep the last `keep_ckpts` models during training. Setting it to `0` keeps them all. Default is `3`.
* `all_in_mem`, `cache_all_data`: Load the entire dataset into RAM. Enable this only when your platform's disk I/O is too slow and system memory is **much larger** than the dataset.
* `batch_size`: The amount of data loaded onto the GPU per training step; reduce it to fit within your GPU memory.
**Use loudness embedding**
To enable loudness embedding, add the `--vol_aug` argument, for example:
@@ -246,6 +238,15 @@ python preprocess_flist_config.py --speech_encoder vec768l12 --vol_aug
After enabling loudness embedding, the trained model will match the loudness of the input source; otherwise, it will match the loudness of the training set.
#### You can modify some parameters in the generated config.json and diffusion.yaml
* `keep_ckpts`: Keep the last `keep_ckpts` models during training. Setting it to `0` keeps them all. Default is `3`.
* `all_in_mem`, `cache_all_data`: Load the entire dataset into RAM. Enable this only when your platform's disk I/O is too slow and system memory is **much larger** than the dataset.
* `batch_size`: The amount of data loaded onto the GPU per training step; reduce it to fit within your GPU memory.
### 3. Generate hubert and f0
```shell