Update stft complex

2023-06-16 02:06:51 +08:00 · 2023-06-16 02:06:51 +08:00 · 23a48ff5d6
parent 2def595e02
commit 23a48ff5d6
3 changed files with 4 additions and 4 deletions
--- a/README.md
+++ b/README.md
@@ -265,7 +265,7 @@ After enabling loudness embedding, the trained model will match the loudness of

 * `timesteps`: The total number of steps in the diffusion model, which defaults to 1000.

-* `k_step_max`: Train only 'k_step_max' diffusion steps to save training time. The value must be less than 'timesteps'; 0 trains the entire diffusion model. **Note: if you do not train the entire diffusion model, you will not be able to use only_diffusion!**
+* `k_step_max`: Train only `k_step_max` diffusion steps to save training time. The value must be less than `timesteps`; 0 trains the entire diffusion model. **Note: if you do not train the entire diffusion model, you will not be able to use only_diffusion!**

 ##### **List of Vocoders**

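The `k_step_max` rule above can be expressed as a small validation check. This is a minimal sketch with hypothetical helper names (`trained_diffusion_steps`, `can_use_only_diffusion`); it encodes only what the README states (0 trains the whole model, any other value must be below `timesteps`, and only_diffusion needs the whole model trained) and is not how the project itself validates its config.

```python
def trained_diffusion_steps(k_step_max: int, timesteps: int = 1000) -> int:
    """Return how many diffusion steps get trained (hypothetical helper).

    0 means the entire diffusion model is trained; any other value must be
    positive and strictly less than `timesteps`, per the README.
    """
    if k_step_max == 0:
        return timesteps
    if not 0 < k_step_max < timesteps:
        raise ValueError(
            f"k_step_max must be 0 or in (0, {timesteps}), got {k_step_max}"
        )
    return k_step_max


def can_use_only_diffusion(k_step_max: int, timesteps: int = 1000) -> bool:
    # only_diffusion inference requires the full diffusion model to be trained
    return trained_diffusion_steps(k_step_max, timesteps) == timesteps
```

For example, `k_step_max = 0` trains all 1000 steps and allows only_diffusion, while `k_step_max = 100` trains 100 steps and does not.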
--- a/README_zh_CN.md
+++ b/README_zh_CN.md
@@ -265,7 +265,7 @@ python preprocess_flist_config.py --speech_encoder vec768l12 --vol_aug

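 * `timesteps` : Total number of steps in the diffusion model, defaults to 1000.

-* `k_step_max` : During training, you can train only `k_step_max` diffusion steps to save training time. Note that this value must be less than `timesteps`; 0 means training the whole entire diffusion model. **Note: if you do not train the entire diffusion model, you will not be able to use diffusion-only inference!**
+* `k_step_max` : During training, you can train only `k_step_max` diffusion steps to save training time. Note that this value must be less than `timesteps`; 0 means training the entire diffusion model. **Note: if you do not train the entire diffusion model, you will not be able to use diffusion-model-only inference!**
  
 ##### **List of Vocoders**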

--- a/modules/mel_processing.py
+++ b/modules/mel_processing.py
@@ -64,8 +64,8 @@ def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False)
    y = y.squeeze(1)

    spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[wnsize_dtype_device],
-                      center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=False)
-
+                      center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=True)
+    spec = torch.view_as_real(spec)
    spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)
    return spec
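The patched code path can be exercised in isolation. Below is a minimal sketch, assuming PyTorch is installed; `magnitude_spectrogram` is a hypothetical helper that builds the Hann window locally and skips the reflect pre-padding `spectrogram_torch` applies before `torch.stft`, so its frame count differs from the real function. It shows that `view_as_real` followed by the root-sum-of-squares yields the same magnitude as taking `abs()` of the complex STFT (up to the `1e-6` epsilon).

```python
import torch


def magnitude_spectrogram(y: torch.Tensor, n_fft: int = 512, hop_size: int = 128,
                          win_size: int = 512) -> torch.Tensor:
    """Magnitude STFT via complex output (hypothetical helper).

    Assumes `y` has shape (batch, samples). Unlike spectrogram_torch, no
    reflect pre-padding is applied and the window is not cached.
    """
    window = torch.hann_window(win_size, dtype=y.dtype, device=y.device)
    spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=window,
                      center=False, pad_mode='reflect', normalized=False,
                      onesided=True, return_complex=True)
    # complex (batch, freq, frames) -> real view (batch, freq, frames, 2)
    spec = torch.view_as_real(spec)
    # sqrt(re^2 + im^2 + eps); eps keeps the gradient finite at zero magnitude
    return torch.sqrt(spec.pow(2).sum(-1) + 1e-6)
```

Because `torch.view_as_real` adds a trailing dimension of size 2 (real, imaginary), the existing `spec.pow(2).sum(-1)` line keeps working unchanged. This matters because recent PyTorch releases deprecate `return_complex=False`.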