@@ -17,7 +17,7 @@
## Model Introduction
The singing voice conversion model uses SoftVC content encoder to extract source audio speech features, and inputs them together with F0 into VITS instead of the original text input to achieve the effect of song conversion. At the same time, the vocoder is changed to [NSF HiFiGAN](https://github.com/openvpi/DiffSinger/tree/refactor/modules/nsf_hifigan) to solve the problem of sound interruption.

The singing voice conversion model uses the SoftVC content encoder to extract speech features from the source audio. These feature vectors are fed directly into VITS rather than being converted to a text-based intermediate, so pitch and intonation are preserved. Additionally, the vocoder is changed to [NSF HiFiGAN](https://github.com/openvpi/DiffSinger/tree/refactor/modules/nsf_hifigan) to solve the problem of sound interruption.
### 4.0 Version Update Content
@@ -145,7 +145,7 @@ The existing steps before clustering do not need to be changed. All you need to
- Specify "cluster_model_path" in inference_main.
- Specify "cluster_infer_ratio" in inference_main, where 0 means not using clustering at all, 1 means only using clustering, and usually 0.5 is sufficient.
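
One way to picture what "cluster_infer_ratio" does is as a linear blend between the extracted content features and the nearest cluster centroid. The sketch below is only an illustration under that assumption: the function name `blend_features` and the plain-list representation are hypothetical and are not taken from the project's code. It does reproduce the two endpoints described above (0 leaves the features untouched, 1 uses only the cluster centroid).

```python
def blend_features(content, centroid, cluster_infer_ratio):
    """Linearly mix a content feature vector with a cluster centroid.

    Hypothetical helper for illustration only; the linear mix is an
    assumption about how the ratio is applied.
    """
    if not 0.0 <= cluster_infer_ratio <= 1.0:
        raise ValueError("cluster_infer_ratio must be between 0 and 1")
    if len(content) != len(centroid):
        raise ValueError("content and centroid must have the same length")
    # ratio = 0 keeps the original features, ratio = 1 keeps only the centroid
    return [
        cluster_infer_ratio * k + (1.0 - cluster_infer_ratio) * c
        for c, k in zip(content, centroid)
    ]


if __name__ == "__main__":
    features = [1.0, 2.0]
    nearest_centroid = [3.0, 4.0]
    for ratio in (0.0, 0.5, 1.0):
        print(ratio, blend_features(features, nearest_centroid, ratio))
```

Under this reading, a value such as 0.5 sits halfway between the source features and the target speaker's cluster centroid.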
### [![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kv-3y2DmZo0uya8pEr1xk7cSB-4e_Pct?usp=sharing) [sovits4 for colab.ipynb](https://colab.research.google.com/drive/1kv-3y2DmZo0uya8pEr1xk7cSB-4e_Pct?usp=sharing)
### [![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/drive/1kv-3y2DmZo0uya8pEr1xk7cSB-4e_Pct?usp=sharing) [sovits4_for_colab.ipynb](https://colab.research.google.com/drive/1kv-3y2DmZo0uya8pEr1xk7cSB-4e_Pct?usp=sharing)
## Exporting to ONNX