Update README.md

ylzz1997 2023-05-22 23:28:53 +08:00
parent 95ea8a8021
commit 28dd4fa032
3 changed files with 8 additions and 8 deletions

View File

@@ -41,10 +41,10 @@ This project is only a framework project, which does not have the function of sp
 The singing voice conversion model uses SoftVC content encoder to extract source audio speech features, then the vectors are directly fed into VITS instead of converting to a text based intermediate; thus the pitch and intonations are conserved. Additionally, the vocoder is changed to [NSF HiFiGAN](https://github.com/openvpi/DiffSinger/tree/refactor/modules/nsf_hifigan) to solve the problem of sound interruption.
-### 🆕 4.0-Vec768-Layer12 Version Update Content
+### 🆕 4.1-Stable Version Update Content
-- Feature input is changed to [Content Vec](https://github.com/auspicious3000/contentvec) Transformer output of 12 layer, the branch is not compatible with 4.0 model
+- Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec), and is compatible with 4.0 branches.
-- Update the shallow diffusion, you can use the shallow diffusion model to improve the sound quality
+- Added shallow diffusion; a shallow diffusion model can be used to improve sound quality.
 ### 🆕 Questions about compatibility with the 4.0 model
@@ -53,7 +53,7 @@ The singing voice conversion model uses SoftVC content encoder to extract source
 ```
 "model": {
 .........
-"ssl_dim": 768,
+"ssl_dim": 256,
 "n_speakers": 200,
 "speech_encoder":"vec256l9"
 }
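The hunk above edits `config.json` by hand. As a minimal sketch of the same compatibility change (the helper name is hypothetical, not code from this repo; only the two fields shown in the README are assumed), it can also be applied programmatically:

```python
import json

def patch_config_for_4_0_model(config: dict) -> dict:
    """Hypothetical helper: return a copy of a 4.1-Stable config dict with
    the fields the README says a 4.0-era model needs."""
    patched = dict(config)
    model = dict(patched.get("model", {}))
    model["ssl_dim"] = 256                # 4.0 ContentVec features are 256-dim
    model["speech_encoder"] = "vec256l9"  # speech encoder name used by 4.0 models
    patched["model"] = model
    return patched

# Example: a minimal 4.1-style config fragment before patching.
cfg = {"model": {"ssl_dim": 768, "n_speakers": 200}}
print(json.dumps(patch_config_for_4_0_model(cfg), indent=2))
```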

View File

@@ -39,9 +39,9 @@
 The singing voice conversion model uses the SoftVC content encoder to extract source audio speech features, which are fed into VITS together with F0, replacing the original text input to achieve singing voice conversion. The vocoder is also replaced with [NSF HiFiGAN](https://github.com/openvpi/DiffSinger/tree/refactor/modules/nsf_hifigan) to solve the problem of sound interruption.
-### 🆕 4.0-Vec768-Layer12 Version Update Content
+### 🆕 4.1-Stable Version Update Content
-+ Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec)
++ Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec), and is compatible with 4.0 branches
 + Added shallow diffusion; a shallow diffusion model can be used to improve sound quality
 ### 🆕 About compatibility with 4.0 models
@@ -51,7 +51,7 @@
 ```
 "model": {
 .........
-"ssl_dim": 768,
+"ssl_dim": 256,
 "n_speakers": 200,
 "speech_encoder":"vec256l9"
 }

View File

@@ -77,7 +77,7 @@
 "\n",
 "#@markdown\n",
 "\n",
-"!git clone https://github.com/svc-develop-team/so-vits-svc -b 4.0-Vec768-Layer12\n",
+"!git clone https://github.com/svc-develop-team/so-vits-svc -b 4.1-Stable\n",
 "%pip uninstall -y torchdata torchtext\n",
 "%pip install --upgrade pip setuptools numpy numba\n",
 "%pip install pyworld praat-parselmouth fairseq tensorboardX torchcrepe librosa==0.9.1 pyyaml pynvml pyloudnorm\n",