Compare commits
11 Commits
097e4e1ca7
...
58322242ac
Author | SHA1 | Date |
---|---|---|
Miuzarte | 58322242ac | |
红血球AE3803 | 27ef997952 | |
Miuzarte | a0f7a031cb | |
Lengyue | 32cfec751e | |
Miuzarte | 75522a6ede | |
Miuzarte | 6a953317b9 | |
Lengyue | 2854013a8a | |
Miuzarte | f0ada33687 | |
Miuzarte | eb8ef9a305 | |
Miuzarte | 4ce3a869f6 | |
Miuzarte | 1c7b153285 | |
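@@ -7,9 +7,9 @@ body:
   - type: markdown
     attributes:
       value: |
-        #### Before asking, please try to solve the problem yourself first; you can use chatgpt or a search engine (Google/Bing/New Bing/StackOverflow, etc.). Open an issue only if you really cannot solve it yourself, and before opening one, please read "[How To Ask Questions The Smart Way](https://github.com/ryanhanwu/How-To-Ask-Questions-The-Smart-Way/blob/main/README-zh_CN.md)"
+        #### Before asking, please try to solve the problem yourself first; you can use chatgpt or a search engine (Google/Bing/New Bing/StackOverflow, etc.). Open an issue only if you really cannot solve it yourself, and before opening one, please read "[How To Ask Questions The Smart Way](https://github.com/ryanhanwu/How-To-Ask-Questions-The-Smart-Way/blob/main/README-zh_CN.md)".
         ---
-        ### What kind of issues will be closed
+        ### What kind of issue will be closed immediately
         1. Freeloaders
         2. Anything related to one-click packages / environment packages
         3. Incomplete information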
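@@ -22,11 +22,11 @@ body:
     attributes:
       label: Please check the confirmation boxes below.
       options:
-        - label: "I have read README.md carefully"
+        - label: "I have read README.md carefully."
           required: true
-        - label: "I have already investigated the problem with various search engines, and the question I want to ask is not a common one"
+        - label: "I have already investigated the problem with various search engines, and the question I want to ask is not a common one."
           required: true
-        - label: "I am not using a one-click package / environment package provided by a third-party user"
+        - label: "I am not using a one-click package / environment package provided by a third-party user."
           required: true

   - type: markdown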
@@ -98,7 +98,7 @@ body:
     id: Log
     attributes:
       label: Log
-      description: All output from running the command until it finishes
+      description: All output from running the command until it finishes (including the command you ran)
       render: python
     validations:
       required: true
@@ -106,7 +106,7 @@ body:
   - type: textarea
     id: ValidOneClick
     attributes:
-      label: Take a screenshot of the `so-vits-svc` folder and paste it here
+      label: Take screenshots of the `so-vits-svc` and `logs/44k` folders and paste them here
     validations:
       required: true
@@ -7,9 +7,9 @@ body:
   - type: markdown
     attributes:
       value: |
-        #### Please try to solve the problem yourself before asking for help,You can use chatgpt or some search engines like google, bing, new bing and StackOverflow until you really find that you can't solve it by yourself. And before you raise an issue, please understand *[How To Ask Questions The Smart Way](http://www.catb.org/~esr/faqs/smart-questions.html)* in advance
+        #### Please try to solve the problem yourself before asking for help. You can use chatgpt or some search engines like google, bing, new bing and StackOverflow until you really find that you can't solve it by yourself. And before you raise an issue, please understand *[How To Ask Questions The Smart Way](http://www.catb.org/~esr/faqs/smart-questions.html)* in advance.
         ---
-        ### What kind of issue will be close immediately
+        ### What kind of issue will be closed immediately
         1. Beggars or Free Riders
         2. One click package / Environment package (Not using `pip install -r requirement.txt`)
         3. Incomplete information
@@ -22,11 +22,11 @@ body:
     attributes:
       label: Please check the checkboxes below.
       options:
-        - label: "I have read README.md carefully"
+        - label: "I have read README.md carefully."
           required: true
-        - label: "I have been troubleshooting issues through various search engines. The questions I want to ask are not common"
+        - label: "I have been troubleshooting issues through various search engines. The questions I want to ask are not common."
           required: true
-        - label: "I am NOT using one click package / environment package"
+        - label: "I am NOT using one click package / environment package."
           required: true

   - type: markdown
@@ -74,7 +74,7 @@ body:
     id: DatasetSource
     attributes:
       label: Dataset source (Used to judge the dataset quality)
-      description: Like UVR-processed streaming audio / Recorded in recording studio
+      description: Such as UVR-processed streaming audio / Recorded in recording studio
     validations:
       required: true

@@ -82,7 +82,7 @@ body:
     id: WhereOccurs
     attributes:
       label: Where thr problem occurs or what command you executed
-      description: Like Preprocessing / Training / `python preprocess_hubert_f0.py`
+      description: Such as Preprocessing / Training / `python preprocess_hubert_f0.py`
     validations:
       required: true

@@ -98,7 +98,7 @@ body:
     id: Log
     attributes:
       label: Log
-      description: All information output from the command you executed to the end of execution
+      description: All information output from the command you executed to the end of execution (include the command)
       render: python
     validations:
       required: true
@@ -106,7 +106,7 @@ body:
   - type: textarea
     id: ValidOneClick
     attributes:
-      label: Screenshot `so-vits-svc` folder and paste here
+      label: Screenshot `so-vits-svc` and `logs/44k` folders and paste here
     validations:
       required: true
@@ -0,0 +1,7 @@
+---
+name: Default issue
+about: 如果模板中没有你想发起的issue类型,可以选择此项,但这个issue也许会获得一个较低的处理优先级 / If there is no issue type you want to raise, you can start with this one. But this issue maybe will get a lower priority to deal with.
+title: ''
+labels: 'not urgent'
+assignees: ''
+---
@@ -1,7 +0,0 @@
----
-name: Default issue
-about: 如果模板中没有你想发起的issue类型,可以选择此项,但这个issue会获得一个较低的处理优先级 / If there is no issue type you want to raise, you can start with this one. But this issue will get a lower priority to deal with.
-title: ''
-labels: 'lower priority'
-assignees: ''
----
18
README.md
@@ -61,7 +61,7 @@ Although the pretrained model generally does not cause any copyright problems, p

 Simply place the dataset in the `dataset_raw` directory with the following file structure.

-```shell
+```
 dataset_raw
 ├───speaker0
 │   ├───xxx1-xxx1.wav
@@ -73,15 +73,25 @@ dataset_raw
     └───xxx7-xxx007.wav
 ```

+You can customize the speaker name.
+
+```
+dataset_raw
+└───suijiSUI
+    ├───1.wav
+    ├───...
+    └───25788785-20221210-200143-856_01_(Vocals)_0_0.wav
+```
+
 ## 🛠️ Preprocessing

-1. Resample to 44100hz
+1. Resample to 44100Hz and mono

 ```shell
 python resample.py
 ```

-2. Automatically split the dataset into training, validation, and test sets, and generate configuration files
+2. Automatically split the dataset into training and validation sets, and generate configuration files

 ```shell
 python preprocess_flist_config.py
@@ -170,7 +180,7 @@ Use [onnx_export.py](https://github.com/svc-develop-team/so-vits-svc/blob/4.0/on

 Note: For Hubert Onnx models, please use the models provided by MoeSS. Currently, they cannot be exported on their own (Hubert in fairseq has many unsupported operators and things involving constants that can cause errors or result in problems with the input/output shape and results when exported.) [Hubert4.0](https://huggingface.co/NaruseMioShirakana/MoeSS-SUBModel)

-## Previous contributors
+## ☀️ Previous contributors

 For some reason the author deleted the original repository. Because of the negligence of the organization members, the contributor list was cleared because all files were directly reuploaded to this repository at the beginning of the reconstruction of this repository. Now add a previous contributor list to README.md.

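The `dataset_raw` layout described in the README hunks above (one sub-folder per speaker, `.wav` files inside it) can be checked with a short script before preprocessing. This is a minimal sketch and not part of the repository: `list_speakers` is a hypothetical helper, and the demo builds a throwaway directory instead of touching a real dataset.

```python
import tempfile
from pathlib import Path


def list_speakers(root):
    """Map each speaker folder under `root` to its sorted .wav file names.

    Hypothetical helper: it assumes the dataset_raw/<speaker>/<name>.wav
    layout shown in the README and ignores everything else.
    """
    root = Path(root)
    return {
        speaker.name: sorted(p.name for p in speaker.glob("*.wav"))
        for speaker in sorted(root.iterdir())
        if speaker.is_dir()
    }


# Demo on a temporary directory that imitates the README example.
with tempfile.TemporaryDirectory() as tmp:
    speaker_dir = Path(tmp) / "dataset_raw" / "suijiSUI"
    speaker_dir.mkdir(parents=True)
    (speaker_dir / "1.wav").touch()
    (speaker_dir / "notes.txt").touch()  # not a .wav, so it is skipped
    layout = list_speakers(Path(tmp) / "dataset_raw")
    print(layout)  # {'suijiSUI': ['1.wav']}
```

A speaker folder that maps to an empty list is a hint that its files were saved in another format or one level too deep.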
@@ -61,7 +61,7 @@ http://obs.cstcloud.cn/share/obs/sankagenkeshi/checkpoint_best_legacy_500.pt

 Simply place the dataset in the dataset_raw directory with the following file structure

-```shell
+```
 dataset_raw
 ├───speaker0
 │   ├───xxx1-xxx1.wav
@@ -73,15 +73,25 @@ dataset_raw
     └───xxx7-xxx007.wav
 ```

+You can customize the speaker name
+
+```
+dataset_raw
+└───suijiSUI
+    ├───1.wav
+    ├───...
+    └───25788785-20221210-200143-856_01_(Vocals)_0_0.wav
+```
+
 ## 🛠️ Data preprocessing

-1. Resample to 44100hz
+1. Resample to 44100Hz mono

 ```shell
 python resample.py
 ```

-2. Automatically split into training, validation, and test sets, and automatically generate configuration files
+2. Automatically split into training and validation sets, and automatically generate configuration files

 ```shell
 python preprocess_flist_config.py
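@@ -170,7 +180,7 @@ python inference_main.py -m "logs/44k/G_30400.pth" -c "configs/config.json" -n "
 + Note: For Hubert Onnx models, please use the models provided by MoeSS; they currently cannot be exported on your own (Hubert in fairseq has many operators that onnx does not support and things involving constants, which cause errors on export or produce a model whose input/output shapes and results are wrong)
 [Hubert4.0](https://huggingface.co/NaruseMioShirakana/MoeSS-SUBModel)

-## Previous contributors
+## ☀️ Previous contributors

 For some reason the original author deleted the repository. When this repository was first rebuilt, organization members carelessly re-uploaded all files directly, which wiped out all previous contributors; a list of previous contributors is now being re-added to the README
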
@@ -47,6 +47,8 @@ class TextAudioSpeakerLoader(torch.utils.data.Dataset):
         audio_norm = audio / self.max_wav_value
         audio_norm = audio_norm.unsqueeze(0)
         spec_filename = filename.replace(".wav", ".spec.pt")
+
+        # Ideally, all data generated after Mar 25 should have .spec.pt
         if os.path.exists(spec_filename):
             spec = torch.load(spec_filename)
         else:
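The hunk above annotates a cache check: a `.spec.pt` file next to the `.wav` is loaded when it exists, and the spectrogram is computed otherwise. The pattern can be sketched without PyTorch. In this sketch `pickle` stands in for `torch.save`/`torch.load`, `load_or_compute` and `fake_spectrogram` are hypothetical names rather than repository code, and writing the computed value back to disk is an assumption, since the `else` branch is not visible in the diff.

```python
import os
import pickle
import tempfile


def load_or_compute(wav_path, compute):
    """Return (value, cache_hit) for the feature that belongs to `wav_path`.

    Hypothetical helper: the cache file name is derived the same way as in
    the diff (".wav" -> ".spec.pt"); pickle replaces torch.save/torch.load.
    """
    cache_path = wav_path.replace(".wav", ".spec.pt")
    if os.path.exists(cache_path):
        with open(cache_path, "rb") as f:
            return pickle.load(f), True
    value = compute(wav_path)
    # Assumption: the computed value is stored so the next call can reuse it.
    with open(cache_path, "wb") as f:
        pickle.dump(value, f)
    return value, False


calls = []


def fake_spectrogram(path):
    """Stand-in for the real spectrogram computation; records each call."""
    calls.append(path)
    return [0.0, 1.0]


with tempfile.TemporaryDirectory() as tmp:
    wav = os.path.join(tmp, "a.wav")
    first = load_or_compute(wav, fake_spectrogram)
    second = load_or_compute(wav, fake_spectrogram)
    print(first[1], second[1], len(calls))  # False True 1
```

The second call is served from the cache file, which is why the stand-in computation runs only once.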
@@ -25,13 +25,11 @@ if __name__ == "__main__":
     parser = argparse.ArgumentParser()
     parser.add_argument("--train_list", type=str, default="./filelists/train.txt", help="path to train list")
     parser.add_argument("--val_list", type=str, default="./filelists/val.txt", help="path to val list")
-    parser.add_argument("--test_list", type=str, default="./filelists/test.txt", help="path to test list")
     parser.add_argument("--source_dir", type=str, default="./dataset/44k", help="path to source dir")
     args = parser.parse_args()

     train = []
     val = []
-    test = []
     idx = 0
     spk_dict = {}
     spk_id = 0
@@ -51,13 +49,11 @@ if __name__ == "__main__":
             new_wavs.append(file)
         wavs = new_wavs
         shuffle(wavs)
-        train += wavs[2:-2]
+        train += wavs[2:]
         val += wavs[:2]
-        test += wavs[-2:]

     shuffle(train)
     shuffle(val)
-    shuffle(test)

     print("Writing", args.train_list)
     with open(args.train_list, "w") as f:
@@ -70,12 +66,6 @@ if __name__ == "__main__":
         for fname in tqdm(val):
             wavpath = fname
             f.write(wavpath + "\n")
-
-    print("Writing", args.test_list)
-    with open(args.test_list, "w") as f:
-        for fname in tqdm(test):
-            wavpath = fname
-            f.write(wavpath + "\n")

     config_template["spk"] = spk_dict
     config_template["model"]["n_speakers"] = spk_id
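After this change the script keeps only two lists: for each speaker the shuffled files are cut so that the first two go to validation and everything else goes to training. A minimal sketch of that split follows. `split_train_val`, its `n_val` and `seed` parameters, and the file names are hypothetical; the script itself uses the module-level `shuffle` and hard-codes the value 2.

```python
import random


def split_train_val(wavs_by_speaker, n_val=2, seed=None):
    """Split each speaker's files into train and val lists.

    Hypothetical helper mirroring the diff: shuffle per speaker, send the
    first `n_val` files to val and the remainder to train, then shuffle both.
    """
    rng = random.Random(seed)
    train, val = [], []
    for wavs in wavs_by_speaker.values():
        wavs = list(wavs)  # copy so the caller's list is left untouched
        rng.shuffle(wavs)
        val += wavs[:n_val]
        train += wavs[n_val:]
    rng.shuffle(train)
    rng.shuffle(val)
    return train, val


files = {
    "speaker0": [f"dataset/44k/speaker0/{i}.wav" for i in range(10)],
    "speaker1": [f"dataset/44k/speaker1/{i}.wav" for i in range(5)],
}
train, val = split_train_val(files, seed=0)
print(len(train), len(val))  # 11 4
```

With the old `wavs[2:-2]` slice, the last two files of every speaker were held back for a test list; `wavs[2:]` returns them to the training set.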
@@ -7,10 +7,12 @@ from random import shuffle
 import torch
 from glob import glob
 from tqdm import tqdm
+from modules.mel_processing import spectrogram_torch

 import utils
 import logging
-logging.getLogger('numba').setLevel(logging.WARNING)
+
+logging.getLogger("numba").setLevel(logging.WARNING)
 import librosa
 import numpy as np

@@ -29,11 +31,42 @@ def process_one(filename, hmodel):
         wav16k = torch.from_numpy(wav16k).to(device)
         c = utils.get_hubert_content(hmodel, wav_16k_tensor=wav16k)
         torch.save(c.cpu(), soft_path)
+
     f0_path = filename + ".f0.npy"
     if not os.path.exists(f0_path):
-        f0 = utils.compute_f0_dio(wav, sampling_rate=sampling_rate, hop_length=hop_length)
+        f0 = utils.compute_f0_dio(
+            wav, sampling_rate=sampling_rate, hop_length=hop_length
+        )
         np.save(f0_path, f0)
+
+    spec_path = filename.replace(".wav", ".spec.pt")
+    if not os.path.exists(spec_path):
+        # Process spectrogram
+        # The following code can't be replaced by torch.FloatTensor(wav)
+        # because load_wav_to_torch return a tensor that need to be normalized
+
+        audio, sr = utils.load_wav_to_torch(filename)
+        if sr != hps.data.sampling_rate:
+            raise ValueError(
+                "{} SR doesn't match target {} SR".format(
+                    sr, hps.data.sampling_rate
+                )
+            )
+
+        audio_norm = audio / hps.data.max_wav_value
+        audio_norm = audio_norm.unsqueeze(0)
+
+        spec = spectrogram_torch(
+            audio_norm,
+            hps.data.filter_length,
+            hps.data.sampling_rate,
+            hps.data.hop_length,
+            hps.data.win_length,
+            center=False,
+        )
+        spec = torch.squeeze(spec, 0)
+        torch.save(spec, spec_path)


 def process_batch(filenames):
     print("Loading hubert for content...")
@@ -46,17 +79,23 @@ def process_batch(filenames):

 if __name__ == "__main__":
     parser = argparse.ArgumentParser()
-    parser.add_argument("--in_dir", type=str, default="dataset/44k", help="path to input dir")
+    parser.add_argument(
+        "--in_dir", type=str, default="dataset/44k", help="path to input dir"
+    )

     args = parser.parse_args()
-    filenames = glob(f'{args.in_dir}/*/*.wav', recursive=True)  # [:10]
+    filenames = glob(f"{args.in_dir}/*/*.wav", recursive=True)  # [:10]
     shuffle(filenames)
-    multiprocessing.set_start_method('spawn',force=True)
+    multiprocessing.set_start_method("spawn", force=True)

     num_processes = 1
     chunk_size = int(math.ceil(len(filenames) / num_processes))
-    chunks = [filenames[i:i + chunk_size] for i in range(0, len(filenames), chunk_size)]
+    chunks = [
+        filenames[i : i + chunk_size] for i in range(0, len(filenames), chunk_size)
+    ]
     print([len(c) for c in chunks])
-    processes = [multiprocessing.Process(target=process_batch, args=(chunk,)) for chunk in chunks]
+    processes = [
+        multiprocessing.Process(target=process_batch, args=(chunk,)) for chunk in chunks
+    ]
     for p in processes:
         p.start()
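The chunking arithmetic in the hunk above is easier to follow with numbers. The sketch below reuses the same `ceil(len / num_processes)` rule in a standalone function. `chunk_filenames` is a hypothetical name, and the empty-list guard is an addition of this sketch: with no files the original expression yields a `chunk_size` of 0, which `range` rejects as a step.

```python
import math


def chunk_filenames(filenames, num_processes):
    """Split `filenames` into at most `num_processes` contiguous chunks.

    Hypothetical helper using the same rule as the script:
    chunk_size = ceil(len(filenames) / num_processes).
    """
    if not filenames:
        # Guard added in this sketch: a chunk_size of 0 would make range() fail.
        return []
    chunk_size = int(math.ceil(len(filenames) / num_processes))
    return [
        filenames[i : i + chunk_size] for i in range(0, len(filenames), chunk_size)
    ]


names = [f"{i}.wav" for i in range(10)]
print([len(c) for c in chunk_filenames(names, 3)])  # [4, 4, 2]
print([len(c) for c in chunk_filenames(names, 1)])  # [10]
```

Because `num_processes` is set to 1 in the script, all files end up in a single chunk and a single worker process.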
22
spec_gen.py
@@ -1,22 +0,0 @@
-from data_utils import TextAudioSpeakerLoader
-import json
-from tqdm import tqdm
-
-from utils import HParams
-
-config_path = 'configs/config.json'
-with open(config_path, "r") as f:
-    data = f.read()
-config = json.loads(data)
-hps = HParams(**config)
-
-train_dataset = TextAudioSpeakerLoader("filelists/train.txt", hps)
-test_dataset = TextAudioSpeakerLoader("filelists/test.txt", hps)
-eval_dataset = TextAudioSpeakerLoader("filelists/val.txt", hps)
-
-for _ in tqdm(train_dataset):
-    pass
-for _ in tqdm(eval_dataset):
-    pass
-for _ in tqdm(test_dataset):
-    pass