Update README
This commit is contained in: parent a2f85c71a0, commit 2b281bc446

README.md: 18 lines changed
If the speech_encoder argument is omitted, the default value is vec768l12
**Use loudness embedding**

Add `--vol_aug` if you want to enable loudness embedding:
```shell
python preprocess_flist_config.py --speech_encoder vec768l12 --vol_aug
```

After enabling loudness embedding, the trained model will match the loudness of the input source; otherwise, it will match the loudness of the training set.
#### You can modify some parameters in the generated config.json and diffusion.yaml

* `keep_ckpts`: Keep only the last `keep_ckpts` checkpoints during training. Setting it to `0` keeps them all. The default is `3`.
* `all_in_mem`, `cache_all_data`: Load the entire dataset into RAM. This can be enabled when the disk I/O of your platform is too slow and the system memory is **much larger** than your dataset.
* `batch_size`: The amount of data loaded to the GPU in a single training step; adjust it to a size that fits within your GPU memory (VRAM).
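These parameters are plain JSON values, so they can also be edited from the command line instead of by hand. A minimal sketch, assuming the parameters sit under a `train` key as in typical generated configs (the key layout and file path here are assumptions; check your own config.json first). The sketch writes a stand-in config so it is self-contained:

```shell
# Create a stand-in config.json (your real file will have many more keys).
cat > config.json <<'EOF'
{"train": {"keep_ckpts": 3, "batch_size": 8, "all_in_mem": false}}
EOF

# Rewrite selected training parameters in place.
python -c "
import json
cfg = json.load(open('config.json'))
cfg['train']['keep_ckpts'] = 10   # 0 would keep every checkpoint
cfg['train']['batch_size'] = 4    # lower this if you hit out-of-memory errors
json.dump(cfg, open('config.json', 'w'), indent=2)
"
```

For a real run, skip the `cat` step and point the Python one-liner at your generated config.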
### 3. Generate hubert and f0