Update stft complex

ylzz1997 2023-06-16 02:06:51 +08:00
parent 2def595e02
commit 23a48ff5d6
3 changed files with 4 additions and 4 deletions

@@ -265,7 +265,7 @@ After enabling loudness embedding, the trained model will match the loudness of
 * `timesteps`: The total number of steps in the diffusion model, which defaults to 1000.
-* `k_step_max`: Training can only train 'k_step_max' step diffusion to save training time, note that the value must be less than 'timesteps', 0 is to train the entire diffusion model, **Note: if you do not train the entire diffusion model will not be able to use only_diffusion!**
+* `k_step_max`: Training can only train `k_step_max` step diffusion to save training time, note that the value must be less than `timesteps`, 0 is to train the entire diffusion model, **Note: if you do not train the entire diffusion model will not be able to use only_diffusion!**
 ##### **List of Vocoders**
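
The constraint on `k_step_max` described in the changed README line can be sketched as a small check. This is only an illustration under stated assumptions: the helper name `resolve_k_step_max` and its return shape are hypothetical and are not part of the repository; only the parameter names `timesteps` and `k_step_max` and the rule itself come from the README text.

```python
from typing import Tuple


def resolve_k_step_max(timesteps: int, k_step_max: int) -> Tuple[int, bool]:
    """Hypothetical helper, not taken from the repository.

    Returns (steps_trained, only_diffusion_usable) following the README rule:
    k_step_max must be less than timesteps, and 0 means "train everything".
    """
    if timesteps <= 0:
        raise ValueError("timesteps must be positive")
    if k_step_max < 0 or k_step_max >= timesteps:
        raise ValueError("k_step_max must be 0 or a value smaller than timesteps")
    if k_step_max == 0:
        # 0 trains the entire diffusion model, so only_diffusion stays available.
        return timesteps, True
    # A shallow model trains only the first k_step_max steps;
    # per the README note, only_diffusion cannot be used in this case.
    return k_step_max, False


full = resolve_k_step_max(timesteps=1000, k_step_max=0)
shallow = resolve_k_step_max(timesteps=1000, k_step_max=100)
```

Here `full` describes the default full training run and `shallow` a 100-step run that saves training time at the cost of `only_diffusion`.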

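@@ -265,7 +265,7 @@ python preprocess_flist_config.py --speech_encoder vec768l12 --vol_aug
 * `timesteps` : The total number of diffusion model steps, 1000 by default.
-* `k_step_max` : During training, only `k_step_max` diffusion steps may be trained to save training time. Note that this value must be less than `timesteps`; 0 trains the entire whole diffusion model. **Note: if the entire diffusion model is not trained, diffusion-only inference cannot be used!**
+* `k_step_max` : During training, only `k_step_max` diffusion steps may be trained to save training time. Note that this value must be less than `timesteps`; 0 trains the entire diffusion model. **Note: if the entire diffusion model is not trained, inference with only the diffusion model cannot be used!**
 ##### **List of Vocoders**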

@@ -64,8 +64,8 @@ def spectrogram_torch(y, n_fft, sampling_rate, hop_size, win_size, center=False)
     y = y.squeeze(1)
     spec = torch.stft(y, n_fft, hop_length=hop_size, win_length=win_size, window=hann_window[wnsize_dtype_device],
-                      center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=False)
+                      center=center, pad_mode='reflect', normalized=False, onesided=True, return_complex=True)
+    spec = torch.view_as_real(spec)
     spec = torch.sqrt(spec.pow(2).sum(-1) + 1e-6)
     return spec
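
The unchanged magnitude line still works after this change because `torch.view_as_real` exposes each complex STFT bin as a trailing (real, imaginary) pair. That is the same layout `return_complex=False` used to return, and PyTorch has deprecated that option. The arithmetic can be sketched in plain Python without PyTorch. The helper names `as_real_pair` and `magnitude` and the sample bins are made up for illustration; only the `1e-6` offset and the square, sum, square-root sequence come from the diff.

```python
import math

EPS = 1e-6  # same stabilising offset as in the diff


def as_real_pair(z: complex) -> tuple:
    """Hypothetical stand-in for the per-element layout of torch.view_as_real."""
    return (z.real, z.imag)


def magnitude(pair: tuple, eps: float = EPS) -> float:
    """Mirrors spec.pow(2).sum(-1) followed by sqrt(... + eps) for one bin."""
    re, im = pair
    return math.sqrt(re * re + im * im + eps)


# Made-up complex STFT bins, purely for illustration.
bins = [complex(3.0, 4.0), complex(0.0, 0.0), complex(-1.0, 1.0)]
mags = [magnitude(as_real_pair(z)) for z in bins]
```

The offset keeps the square root away from exactly zero for silent bins, which presumably matters for gradients during training. For non-silent bins the result is close to `abs(z)`.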