From 28dd4fa032a8382db6159cc1f4a9256ed9d6d6e8 Mon Sep 17 00:00:00 2001
From: ylzz1997 <ylzz1997@outlook.com>
Date: Mon, 22 May 2023 23:28:53 +0800
Subject: [PATCH] Update README files and Colab notebook for 4.1-Stable

---
 README.md               | 8 ++++----
 README_zh_CN.md         | 6 +++---
 sovits4_for_colab.ipynb | 2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/README.md b/README.md
index d047f48..dcc026a 100644
--- a/README.md
+++ b/README.md
@@ -41,10 +41,10 @@ This project is only a framework project, which does not have the function of sp
 
 The singing voice conversion model uses SoftVC content encoder to extract source audio speech features, then the vectors are directly fed into VITS instead of converting to a text based intermediate; thus the pitch and intonations are conserved. Additionally, the vocoder is changed to [NSF HiFiGAN](https://github.com/openvpi/DiffSinger/tree/refactor/modules/nsf_hifigan) to solve the problem of sound interruption.
 
-### 🆕 4.0-Vec768-Layer12 Version Update Content
+### 🆕 4.1-Stable Version Updates
 
-- Feature input is changed to [Content Vec](https://github.com/auspicious3000/contentvec) Transformer output of 12 layer, the branch is not compatible with 4.0 model
-- Update the shallow diffusion, you can use the shallow diffusion model to improve the sound quality
+- Feature input is changed to the 12th-layer Transformer output of [Content Vec](https://github.com/auspicious3000/contentvec), and remains compatible with the 4.0 branch.
+- Updated shallow diffusion: the shallow diffusion model can be used to improve sound quality.
   
 ### 🆕 Questions about compatibility with the 4.0 model
 
@@ -53,7 +53,7 @@ The singing voice conversion model uses SoftVC content encoder to extract source
 ```
   "model": {
     .........
-    "ssl_dim": 768,
+    "ssl_dim": 256,
     "n_speakers": 200,
     "speech_encoder":"vec256l9"
   }
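The hunk above corrects the README's example config so that the `vec256l9` speech encoder is paired with `"ssl_dim": 256` instead of 768, matching the 256 in the encoder's name. The following is a minimal Python sketch of a consistency check for that `model` block. The `EXPECTED_SSL_DIM` table and the `check_model_config` helper are hypothetical. The encoder-to-dimension mapping is an assumption inferred from the encoder names and was not read from the so-vits-svc source.

```python
import json

# Hypothetical mapping from "speech_encoder" to the expected "ssl_dim".
# ASSUMPTION: inferred from the names (vec256l9 -> 256-dim features,
# vec768l12 -> 768-dim features), not taken from the so-vits-svc code.
EXPECTED_SSL_DIM = {
    "vec256l9": 256,
    "vec768l12": 768,
}


def check_model_config(config):
    """Return a list of problems in config["model"]; an empty list means it looks consistent."""
    model = config.get("model", {})
    encoder = model.get("speech_encoder")
    ssl_dim = model.get("ssl_dim")
    problems = []
    if encoder is None:
        problems.append('missing "speech_encoder"')
    elif encoder not in EXPECTED_SSL_DIM:
        problems.append(f"unknown speech_encoder {encoder!r}")
    elif ssl_dim != EXPECTED_SSL_DIM[encoder]:
        problems.append(
            f'"ssl_dim" is {ssl_dim}, but {encoder} expects {EXPECTED_SSL_DIM[encoder]}'
        )
    return problems


if __name__ == "__main__":
    # The README's model block before and after this patch.
    before = json.loads('{"model": {"ssl_dim": 768, "n_speakers": 200, "speech_encoder": "vec256l9"}}')
    after = json.loads('{"model": {"ssl_dim": 256, "n_speakers": 200, "speech_encoder": "vec256l9"}}')
    print(check_model_config(before))  # one mismatch reported
    print(check_model_config(after))   # []
```

Running such a check on an edited `config.json` would flag the pre-patch example, where a 256-dim encoder was declared with a 768-dim input.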
diff --git a/README_zh_CN.md b/README_zh_CN.md
index e1c258f..e41c00a 100644
--- a/README_zh_CN.md
+++ b/README_zh_CN.md
@@ -39,9 +39,9 @@
 
 歌声音色转换模型，通过SoftVC内容编码器提取源音频语音特征，与F0同时输入VITS替换原本的文本输入达到歌声转换的效果。同时，更换声码器为 [NSF HiFiGAN](https://github.com/openvpi/DiffSinger/tree/refactor/modules/nsf_hifigan) 解决断音问题
 
-### 🆕 4.0-Vec768-Layer12 版本更新内容
+### 🆕 4.1-Stable 版本更新内容
 
-+ 特征输入更换为 [Content Vec](https://github.com/auspicious3000/contentvec) 的第12层Transformer输出
++ 特征输入更换为 [Content Vec](https://github.com/auspicious3000/contentvec) 的第12层Transformer输出，并兼容4.0分支
 + 更新浅层扩散，可以使用浅层扩散模型提升音质
 
 ### 🆕 关于兼容4.0模型的问题
@@ -51,7 +51,7 @@
 ```
   "model": {
     .........
-    "ssl_dim": 768,
+    "ssl_dim": 256,
     "n_speakers": 200,
     "speech_encoder":"vec256l9"
   }
diff --git a/sovits4_for_colab.ipynb b/sovits4_for_colab.ipynb
index 850409a..394bfd4 100644
--- a/sovits4_for_colab.ipynb
+++ b/sovits4_for_colab.ipynb
@@ -77,7 +77,7 @@
     "\n",
     "#@markdown\n",
     "\n",
-    "!git clone https://github.com/svc-develop-team/so-vits-svc -b 4.0-Vec768-Layer12\n",
+    "!git clone https://github.com/svc-develop-team/so-vits-svc -b 4.1-Stable\n",
     "%pip uninstall -y torchdata torchtext\n",
     "%pip install --upgrade pip setuptools numpy numba\n",
     "%pip install pyworld praat-parselmouth fairseq tensorboardX torchcrepe librosa==0.9.1 pyyaml pynvml pyloudnorm\n",