Update README.md
Parent: 95ea8a8021
Commit: 28dd4fa032
@@ -41,10 +41,10 @@ This project is only a framework project, which does not have the function of sp
 The singing voice conversion model uses the SoftVC content encoder to extract source audio speech features; these vectors are fed directly into VITS instead of being converted to a text-based intermediate, so pitch and intonation are preserved. Additionally, the vocoder is changed to [NSF HiFiGAN](https://github.com/openvpi/DiffSinger/tree/refactor/modules/nsf_hifigan) to solve the problem of sound interruption.

-### 🆕 4.0-Vec768-Layer12 Version Update Content
+### 🆕 4.1-Stable Version Update Content

-- Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec); this branch is not compatible with the 4.0 model
+- Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec), and it is compatible with 4.0 branches.
 - Shallow diffusion is updated; the shallow diffusion model can be used to improve sound quality.

 ### 🆕 Questions about compatibility with the 4.0 model
@@ -53,7 +53,7 @@ The singing voice conversion model uses SoftVC content encoder to extract source
 ```
 "model": {
 .........
-"ssl_dim": 768,
+"ssl_dim": 256,
 "n_speakers": 200,
 "speech_encoder":"vec256l9"
 }
 ```
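The compatibility change above amounts to editing two fields of the project's JSON config so that a 4.0-era model (256-dimensional ContentVec features from layer 9) loads on the new branch. A minimal sketch of that edit, assuming a hypothetical `config.json` path (adjust to your checkout; the stub fields mirror the snippet above):

```python
import json
import os

# Hypothetical path; adjust to your checkout.
CONFIG_PATH = "config.json"

# Start from the existing config if present, otherwise from a stub
# mirroring the fields shown in the README snippet.
if os.path.exists(CONFIG_PATH):
    with open(CONFIG_PATH, "r", encoding="utf-8") as f:
        config = json.load(f)
else:
    config = {"model": {"ssl_dim": 768, "n_speakers": 200}}

# 4.0-era models use 256-dim ContentVec features from layer 9;
# both fields must agree for such a checkpoint to load.
model = config.setdefault("model", {})
model["ssl_dim"] = 256
model["speech_encoder"] = "vec256l9"

with open(CONFIG_PATH, "w", encoding="utf-8") as f:
    json.dump(config, f, indent=2, ensure_ascii=False)
```

Other fields (e.g. `n_speakers`) are left untouched; only the feature dimension and the encoder name change.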
@@ -39,9 +39,9 @@
 The singing voice conversion model extracts source audio speech features with the SoftVC content encoder and feeds them, together with F0, into VITS in place of the original text input to achieve singing voice conversion. The vocoder is also replaced with [NSF HiFiGAN](https://github.com/openvpi/DiffSinger/tree/refactor/modules/nsf_hifigan) to solve the problem of sound interruption.

-### 🆕 4.0-Vec768-Layer12 Version Update Content
+### 🆕 4.1-Stable Version Update Content

-+ Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec)
++ Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec), and it is compatible with 4.0 branches
 + Shallow diffusion is updated; the shallow diffusion model can be used to improve sound quality

 ### 🆕 Questions about compatibility with the 4.0 model
@@ -51,7 +51,7 @@
 ```
 "model": {
 .........
-"ssl_dim": 768,
+"ssl_dim": 256,
 "n_speakers": 200,
 "speech_encoder":"vec256l9"
 }
 ```
@@ -77,7 +77,7 @@
 "\n",
 "#@markdown\n",
 "\n",
-"!git clone https://github.com/svc-develop-team/so-vits-svc -b 4.0-Vec768-Layer12\n",
+"!git clone https://github.com/svc-develop-team/so-vits-svc -b 4.1-Stable\n",
 "%pip uninstall -y torchdata torchtext\n",
 "%pip install --upgrade pip setuptools numpy numba\n",
 "%pip install pyworld praat-parselmouth fairseq tensorboardX torchcrepe librosa==0.9.1 pyyaml pynvml pyloudnorm\n",
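The notebook cell above pins `librosa` to 0.9.1, so a mismatched version is a common source of runtime errors after setup. A small sketch of a post-install sanity check (the `check_pin` helper is hypothetical, not part of the repository):

```python
import importlib.metadata


def check_pin(package: str, expected: str) -> bool:
    """Return True if `package` is installed at exactly `expected`."""
    try:
        return importlib.metadata.version(package) == expected
    except importlib.metadata.PackageNotFoundError:
        # Not installed at all.
        return False


# After running the setup cell, this should hold for the pinned dependency:
# check_pin("librosa", "0.9.1")
```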