Compare commits

11 Commits

Author SHA1 Message Date
Miuzarte 58322242ac Update README.md 2023-03-24 16:59:47 +08:00
红血球AE3803 27ef997952 Merge pull request #83 from svc-develop-team/optimize-some-code: Removed some meaningless code 2023-03-24 14:46:45 +09:00
Miuzarte a0f7a031cb Update README.md 2023-03-24 13:42:38 +08:00
Lengyue 32cfec751e remove redundant spec_gen and fix related bug 2023-03-24 01:00:14 -04:00
Miuzarte 75522a6ede Update issues template 2023-03-24 12:58:22 +08:00
Miuzarte 6a953317b9 Update issues template 2023-03-24 12:47:31 +08:00
Lengyue 2854013a8a rm test dataset that is never used 2023-03-24 00:43:29 -04:00
Miuzarte f0ada33687 Update issues template 2023-03-24 12:41:56 +08:00
Miuzarte eb8ef9a305 Update issues template 2023-03-24 12:36:19 +08:00
Miuzarte 4ce3a869f6 Update issues template 2023-03-24 12:33:38 +08:00
Miuzarte 1c7b153285 Update issues template 2023-03-24 12:27:43 +08:00
10 changed files with 100 additions and 71 deletions

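@@ -7,9 +7,9 @@ body:
   - type: markdown
     attributes:
       value: |
-        #### Before asking, please try to solve the problem yourself. You can use chatgpt or search engines such as Google / Bing / New Bing / StackOverflow. If you really cannot solve it yourself, open an issue, and before doing so please read "[How To Ask Questions The Smart Way](https://github.com/ryanhanwu/How-To-Ask-Questions-The-Smart-Way/blob/main/README-zh_CN.md)"
+        #### Before asking, please try to solve the problem yourself. You can use chatgpt or search engines such as Google / Bing / New Bing / StackOverflow. If you really cannot solve it yourself, open an issue, and before doing so please read "[How To Ask Questions The Smart Way](https://github.com/ryanhanwu/How-To-Ask-Questions-The-Smart-Way/blob/main/README-zh_CN.md)"
         ---
-        ### What kind of issues will be closed
+        ### What kind of issue will be closed immediately
         1. Freeloaders
         2. One-click package / environment package related
         3. Incomplete information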
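@@ -22,11 +22,11 @@ body:
     attributes:
       label: Please check the confirmation boxes below.
       options:
-        - label: "I have read README.md carefully"
+        - label: "I have read README.md carefully"
           required: true
-        - label: "I have investigated the problem through various search engines, and the question I want to raise is not a common one"
+        - label: "I have investigated the problem through various search engines, and the question I want to raise is not a common one"
           required: true
-        - label: "I am not using a one-click package / environment package provided by a third-party user"
+        - label: "I am not using a one-click package / environment package provided by a third-party user"
           required: true
   - type: markdown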
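@@ -98,7 +98,7 @@ body:
     id: Log
     attributes:
       label: Log
-      description: All information output from the start of the command to the end of its execution
+      description: All information output from the start of the command to the end of its execution (including the command you executed)
       render: python
     validations:
       required: true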
@@ -106,7 +106,7 @@ body:
   - type: textarea
     id: ValidOneClick
     attributes:
-      label: Screenshot the `so-vits-svc` folder and paste it here
+      label: Screenshot the `so-vits-svc` and `logs/44k` folders and paste them here
     validations:
       required: true

@@ -7,9 +7,9 @@ body:
   - type: markdown
     attributes:
       value: |
-        #### Please try to solve the problem yourself before asking for helpYou can use chatgpt or some search engines like google, bing, new bing and StackOverflow until you really find that you can't solve it by yourself. And before you raise an issue, please understand *[How To Ask Questions The Smart Way](http://www.catb.org/~esr/faqs/smart-questions.html)* in advance
+        #### Please try to solve the problem yourself before asking for help. You can use chatgpt or some search engines like google, bing, new bing and StackOverflow until you really find that you can't solve it by yourself. And before you raise an issue, please understand *[How To Ask Questions The Smart Way](http://www.catb.org/~esr/faqs/smart-questions.html)* in advance.
         ---
-        ### What kind of issue will be close immediately
+        ### What kind of issue will be closed immediately
         1. Beggars or Free Riders
         2. One click package / Environment package (Not using `pip install -r requirement.txt`)
         3. Incomplete information
@@ -22,11 +22,11 @@ body:
     attributes:
       label: Please check the checkboxes below.
       options:
-        - label: "I have read README.md carefully"
+        - label: "I have read README.md carefully."
           required: true
-        - label: "I have been troubleshooting issues through various search engines. The questions I want to ask are not common"
+        - label: "I have been troubleshooting issues through various search engines. The questions I want to ask are not common."
           required: true
-        - label: "I am NOT using one click package / environment package"
+        - label: "I am NOT using one click package / environment package."
           required: true
   - type: markdown
@@ -74,7 +74,7 @@ body:
     id: DatasetSource
     attributes:
       label: Dataset source (Used to judge the dataset quality)
-      description: Like UVR-processed streaming audio / Recorded in recording studio
+      description: Such as UVR-processed streaming audio / Recorded in recording studio
     validations:
       required: true
@@ -82,7 +82,7 @@ body:
     id: WhereOccurs
     attributes:
       label: Where thr problem occurs or what command you executed
-      description: Like Preprocessing / Training / `python preprocess_hubert_f0.py`
+      description: Such as Preprocessing / Training / `python preprocess_hubert_f0.py`
     validations:
       required: true
@@ -98,7 +98,7 @@ body:
     id: Log
     attributes:
       label: Log
-      description: All information output from the command you executed to the end of execution
+      description: All information output from the command you executed to the end of execution (include the command)
       render: python
     validations:
       required: true
@@ -106,7 +106,7 @@ body:
   - type: textarea
     id: ValidOneClick
     attributes:
-      label: Screenshot `so-vits-svc` folder and paste here
+      label: Screenshot `so-vits-svc` and `logs/44k` folders and paste here
     validations:
       required: true

.github/ISSUE_TEMPLATE/default.md

@@ -0,0 +1,7 @@
+---
+name: Default issue
+about: 如果模板中没有你想发起的issue类型可以选择此项但这个issue也许会获得一个较低的处理优先级 / If there is no issue type you want to raise, you can start with this one. But this issue maybe will get a lower priority to deal with.
+title: ''
+labels: 'not urgent'
+assignees: ''
+---

@@ -1,7 +0,0 @@
----
-name: Default issue
-about: 如果模板中没有你想发起的issue类型可以选择此项但这个issue会获得一个较低的处理优先级 / If there is no issue type you want to raise, you can start with this one. But this issue will get a lower priority to deal with.
-title: ''
-labels: 'lower priority'
-assignees: ''
----

@@ -61,7 +61,7 @@ Although the pretrained model generally does not cause any copyright problems, p
 Simply place the dataset in the `dataset_raw` directory with the following file structure.
-```shell
+```
 dataset_raw
 ├───speaker0
 │   ├───xxx1-xxx1.wav
@@ -73,15 +73,25 @@ dataset_raw
     └───xxx7-xxx007.wav
 ```
+You can customize the speaker name.
+```
+dataset_raw
+└───suijiSUI
+    ├───1.wav
+    ├───...
+    └───25788785-20221210-200143-856_01_(Vocals)_0_0.wav
+```
 ## 🛠️ Preprocessing
-1. Resample to 44100hz
+1. Resample to 44100Hz and mono
 ```shell
 python resample.py
 ```
-2. Automatically split the dataset into training, validation, and test sets, and generate configuration files
+2. Automatically split the dataset into training and validation sets, and generate configuration files
 ```shell
 python preprocess_flist_config.py
@@ -170,7 +180,7 @@ Use [onnx_export.py](https://github.com/svc-develop-team/so-vits-svc/blob/4.0/on
 Note: For Hubert Onnx models, please use the models provided by MoeSS. Currently, they cannot be exported on their own (Hubert in fairseq has many unsupported operators and things involving constants that can cause errors or result in problems with the input/output shape and results when exported.) [Hubert4.0](https://huggingface.co/NaruseMioShirakana/MoeSS-SUBModel)
-## Previous contributors
+## ☀️ Previous contributors
 For some reason the author deleted the original repository. Because of the negligence of the organization members, the contributor list was cleared because all files were directly reuploaded to this repository at the beginning of the reconstruction of this repository. Now add a previous contributor list to README.md.

@@ -61,7 +61,7 @@ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt
 Simply place the dataset in the dataset_raw directory with the following file structure.
-```shell
+```
 dataset_raw
 ├───speaker0
 │   ├───xxx1-xxx1.wav
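@@ -73,15 +73,25 @@ dataset_raw
     └───xxx7-xxx007.wav
 ```
+You can customize the speaker name.
+```
+dataset_raw
+└───suijiSUI
+    ├───1.wav
+    ├───...
+    └───25788785-20221210-200143-856_01_(Vocals)_0_0.wav
+```
 ## 🛠️ Data preprocessing
-1. Resample to 44100hz
+1. Resample to 44100Hz, mono
 ```shell
 python resample.py
 ```
-2. Automatically split into training, validation, and test sets, and automatically generate configuration files
+2. Automatically split into training and validation sets, and automatically generate configuration files
 ```shell
 python preprocess_flist_config.py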
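@@ -170,7 +180,7 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "
 + Note: For Hubert Onnx models, please use the models provided by MoeSS. Currently they cannot be exported on your own (Hubert in fairseq has many operators that onnx does not support and things involving constants, which either raise errors during export or produce a model whose input/output shape and results are wrong)
 [Hubert4.0](https://huggingface.co/NaruseMioShirakana/MoeSS-SUBModel)
-## Previous contributors
+## ☀️ Previous contributors
 For some reason the original author deleted the repository. When this repository was first rebuilt, organization members carelessly re-uploaded all files directly, which wiped out all previous contributors. A list of previous contributors is now added back to the README.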

@@ -47,6 +47,8 @@ class TextAudioSpeakerLoader(torch.utils.data.Dataset):
         audio_norm = audio / self.max_wav_value
         audio_norm = audio_norm.unsqueeze(0)
         spec_filename = filename.replace(".wav", ".spec.pt")
+        # Ideally, all data generated after Mar 25 should have .spec.pt
         if os.path.exists(spec_filename):
             spec = torch.load(spec_filename)
         else:
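
The hunk above keeps a fallback: a cached `.spec.pt` file is loaded when it exists, and the spectrogram is computed otherwise. A minimal sketch of that load-or-compute pattern follows. It is an illustration only: `cache_path_for` and `load_or_compute` are hypothetical names, and `pickle` stands in for `torch.save` / `torch.load` so the example runs without PyTorch.

```python
import os
import pickle


def cache_path_for(wav_path: str) -> str:
    # Same naming rule as the hunk above: foo.wav -> foo.spec.pt
    return wav_path.replace(".wav", ".spec.pt")


def load_or_compute(wav_path, compute):
    """Return the cached feature if it exists, otherwise compute and store it.

    `compute` is any callable taking the wav path (assumption: in the real
    code this would be the spectrogram computation).
    """
    path = cache_path_for(wav_path)
    if os.path.exists(path):
        # Cache hit: skip the expensive computation.
        with open(path, "rb") as f:
            return pickle.load(f)
    # Cache miss: compute once and persist for later runs.
    value = compute(wav_path)
    with open(path, "wb") as f:
        pickle.dump(value, f)
    return value
```

With this shape, data preprocessed before the change still works (the miss branch fills the cache), while newly preprocessed data never hits the slow path.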

@@ -25,13 +25,11 @@ if __name__ == "__main__":
     parser = argparse.ArgumentParser()
     parser.add_argument("--train_list", type=str, default="./filelists/train.txt", help="path to train list")
     parser.add_argument("--val_list", type=str, default="./filelists/val.txt", help="path to val list")
-    parser.add_argument("--test_list", type=str, default="./filelists/test.txt", help="path to test list")
     parser.add_argument("--source_dir", type=str, default="./dataset/44k", help="path to source dir")
     args = parser.parse_args()
     train = []
     val = []
-    test = []
     idx = 0
     spk_dict = {}
     spk_id = 0
@@ -51,13 +49,11 @@ if __name__ == "__main__":
             new_wavs.append(file)
         wavs = new_wavs
         shuffle(wavs)
-        train += wavs[2:-2]
+        train += wavs[2:]
         val += wavs[:2]
-        test += wavs[-2:]
     shuffle(train)
     shuffle(val)
-    shuffle(test)
     print("Writing", args.train_list)
     with open(args.train_list, "w") as f:
@@ -70,12 +66,6 @@ if __name__ == "__main__":
         for fname in tqdm(val):
             wavpath = fname
             f.write(wavpath + "\n")
-    print("Writing", args.test_list)
-    with open(args.test_list, "w") as f:
-        for fname in tqdm(test):
-            wavpath = fname
-            f.write(wavpath + "\n")
     config_template["spk"] = spk_dict
     config_template["model"]["n_speakers"] = spk_id
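
After this change the split is per speaker: the file list is shuffled, the first two files go to validation, and everything else goes to training. A small sketch of that logic, under stated assumptions: `split_train_val`, `n_val`, and `seed` are hypothetical names introduced here, and a seeded `random.Random` replaces the module-level `shuffle` only to make the example reproducible.

```python
from random import Random


def split_train_val(wavs_by_speaker, n_val=2, seed=0):
    """Split each speaker's files into train/val as in the diff above.

    wavs_by_speaker maps a speaker name to a list of wav paths
    (hypothetical input shape, for illustration only).
    """
    rng = Random(seed)
    train, val = [], []
    for _speaker, wavs in wavs_by_speaker.items():
        wavs = list(wavs)          # do not mutate the caller's list
        rng.shuffle(wavs)
        val += wavs[:n_val]        # first n_val shuffled files -> validation
        train += wavs[n_val:]      # everything else -> training
    rng.shuffle(train)
    rng.shuffle(val)
    return train, val
```

Compared with the old `wavs[2:-2]` slice, `wavs[n_val:]` returns the two files per speaker that were previously held out for the unused test set to the training data.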

@@ -7,10 +7,12 @@ from random import shuffle
 import torch
 from glob import glob
 from tqdm import tqdm
+from modules.mel_processing import spectrogram_torch
 import utils
 import logging
-logging.getLogger('numba').setLevel(logging.WARNING)
+logging.getLogger("numba").setLevel(logging.WARNING)
 import librosa
 import numpy as np
@@ -29,11 +31,42 @@ def process_one(filename, hmodel):
         wav16k = torch.from_numpy(wav16k).to(device)
         c = utils.get_hubert_content(hmodel, wav_16k_tensor=wav16k)
         torch.save(c.cpu(), soft_path)
     f0_path = filename + ".f0.npy"
     if not os.path.exists(f0_path):
-        f0 = utils.compute_f0_dio(wav, sampling_rate=sampling_rate, hop_length=hop_length)
+        f0 = utils.compute_f0_dio(
+            wav, sampling_rate=sampling_rate, hop_length=hop_length
+        )
         np.save(f0_path, f0)
+    spec_path = filename.replace(".wav", ".spec.pt")
+    if not os.path.exists(spec_path):
+        # Process spectrogram
+        # The following code can't be replaced by torch.FloatTensor(wav)
+        # because load_wav_to_torch return a tensor that need to be normalized
+        audio, sr = utils.load_wav_to_torch(filename)
+        if sr != hps.data.sampling_rate:
+            raise ValueError(
+                "{} SR doesn't match target {} SR".format(
+                    sr, hps.data.sampling_rate
+                )
+            )
+        audio_norm = audio / hps.data.max_wav_value
+        audio_norm = audio_norm.unsqueeze(0)
+        spec = spectrogram_torch(
+            audio_norm,
+            hps.data.filter_length,
+            hps.data.sampling_rate,
+            hps.data.hop_length,
+            hps.data.win_length,
+            center=False,
+        )
+        spec = torch.squeeze(spec, 0)
+        torch.save(spec, spec_path)
 def process_batch(filenames):
     print("Loading hubert for content...")
@@ -46,17 +79,23 @@ def process_batch(filenames):
 if __name__ == "__main__":
     parser = argparse.ArgumentParser()
-    parser.add_argument("--in_dir", type=str, default="dataset/44k", help="path to input dir")
+    parser.add_argument(
+        "--in_dir", type=str, default="dataset/44k", help="path to input dir"
+    )
     args = parser.parse_args()
-    filenames = glob(f'{args.in_dir}/*/*.wav', recursive=True) # [:10]
+    filenames = glob(f"{args.in_dir}/*/*.wav", recursive=True)  # [:10]
     shuffle(filenames)
-    multiprocessing.set_start_method('spawn',force=True)
+    multiprocessing.set_start_method("spawn", force=True)
     num_processes = 1
     chunk_size = int(math.ceil(len(filenames) / num_processes))
-    chunks = [filenames[i:i + chunk_size] for i in range(0, len(filenames), chunk_size)]
+    chunks = [
+        filenames[i : i + chunk_size] for i in range(0, len(filenames), chunk_size)
+    ]
     print([len(c) for c in chunks])
-    processes = [multiprocessing.Process(target=process_batch, args=(chunk,)) for chunk in chunks]
+    processes = [
+        multiprocessing.Process(target=process_batch, args=(chunk,)) for chunk in chunks
+    ]
     for p in processes:
         p.start()
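
The main block above divides the shuffled file list into one chunk per worker process using a ceiling division. The sketch below isolates that chunking step so its behaviour is easy to check. `split_into_chunks` is a hypothetical helper name, and the empty-list guard is an addition of this sketch: as written in the diff, an empty `filenames` list gives `chunk_size == 0`, and `range()` rejects a step of zero.

```python
import math


def split_into_chunks(items, num_processes):
    """Split items into at most num_processes contiguous chunks."""
    if not items:
        # Guard added in this sketch: avoids range() with a step of 0.
        return []
    chunk_size = int(math.ceil(len(items) / num_processes))
    return [items[i : i + chunk_size] for i in range(0, len(items), chunk_size)]


if __name__ == "__main__":
    # Ten files across three workers: chunk_size is ceil(10 / 3) = 4.
    print([len(c) for c in split_into_chunks(list(range(10)), 3)])
```

With `num_processes = 1`, as in the diff, the whole list ends up in a single chunk and a single worker process is started.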

@@ -1,22 +0,0 @@
-from data_utils import TextAudioSpeakerLoader
-import json
-from tqdm import tqdm
-from utils import HParams
-config_path = 'configs/config.json'
-with open(config_path, "r") as f:
-    data = f.read()
-config = json.loads(data)
-hps = HParams(**config)
-train_dataset = TextAudioSpeakerLoader("filelists/train.txt", hps)
-test_dataset = TextAudioSpeakerLoader("filelists/test.txt", hps)
-eval_dataset = TextAudioSpeakerLoader("filelists/val.txt", hps)
-for _ in tqdm(train_dataset):
-    pass
-for _ in tqdm(eval_dataset):
-    pass
-for _ in tqdm(test_dataset):
-    pass