Data Package for TTS

Project description

Dodrio

Data format designed for TTS training

Data preparation

First, prepare a folder containing audio files in wav or mp3 format.

test_data_dir = '/home/jovyan/chenyixiang/workspace/20250324_dodrio/testdata/origin_data/test_data'
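Before packing, it can help to confirm what the input folder actually contains. The following is a minimal sketch, not part of dodrio: the helper names `list_audio_files` and `count_by_suffix` are hypothetical, and the recursive scan is an assumption about how the source folder is laid out.

```python
from pathlib import Path


def list_audio_files(root, exts=(".wav", ".mp3")):
    """Return sorted paths of files under root whose suffix is in exts.

    The match is case-insensitive and the scan is recursive (an assumption;
    adjust if your data lives in a single flat folder).
    """
    wanted = {e.lower() for e in exts}
    return sorted(
        p for p in Path(root).rglob("*")
        if p.is_file() and p.suffix.lower() in wanted
    )


def count_by_suffix(paths):
    """Count files per lower-cased suffix, e.g. {'.wav': 10, '.mp3': 2}."""
    counts = {}
    for p in paths:
        key = p.suffix.lower()
        counts[key] = counts.get(key, 0) + 1
    return counts
```

Calling `count_by_suffix(list_audio_files(test_data_dir))` gives a quick view of whether the folder holds wav, mp3, or a mix of both.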

Then define the output folder paths

import os
outdir = '/home/jovyan/chenyixiang/workspace/20250324_dodrio/testdata/testout'
dataname = 'test'
stockdir = outdir + '/stockdir'
usagedir = outdir + '/usagedir'
parquet_dir = os.path.join(stockdir, dataname, 'parquet_dir')
pack_dir = os.path.join(usagedir, dataname, 'pack_dir')

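parquet_dir is the folder for parquet-format data, pack_dir is the folder for package-format data, and info_outdir (defined later, when packing text information) is the folder that stores text, speaker, and related information.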
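The directory convention used throughout this description can be gathered into one place. This is only a sketch: `dodrio_layout` and `feat_dir` are hypothetical helpers, not dodrio functions, and they simply reproduce the `os.path.join` calls shown in this description.

```python
import os


def dodrio_layout(outdir, dataname):
    """Build the folder paths used in this description for one dataset."""
    stockdir = os.path.join(outdir, "stockdir")  # parquet side
    usagedir = os.path.join(outdir, "usagedir")  # package side
    return {
        "parquet_dir": os.path.join(stockdir, dataname, "parquet_dir"),
        "info_outdir": os.path.join(stockdir, dataname, "info_dir"),
        "pack_dir": os.path.join(usagedir, dataname, "pack_dir"),
        "pack_info_outdir": os.path.join(usagedir, dataname, "info_dir"),
    }


def feat_dir(outdir, dataname, featname):
    """Feature folders sit next to pack_dir and are named '<featname>_dir'."""
    return os.path.join(outdir, "usagedir", dataname, featname + "_dir")
```

Keeping the paths in one dictionary avoids retyping the joins in each later step.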

Data packing

import dodrio

# Generate a parquet data pack from test_data_dir
dodrio.gen_parquet(test_data_dir, parquet_dir, mid_name=dataname, file_type='wav')

# Generate a package data pack from test_data_dir. A sample rate must be specified: the pack stores all audio at one unified sample rate
dodrio.gen_package(test_data_dir, pack_dir, mid_name=dataname, target_sample_rate=48000, file_type='wav')

# A package data pack can also be generated from the parquet data
dodrio.parquet2package(parquet_dir, pack_dir, sample_rate=48000)
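Because the package format stores every file at one sample rate, it can be useful to see beforehand which source files differ from the target. The sketch below uses only the standard-library `wave` module, which reads PCM wav files only (not mp3); the function names are hypothetical and nothing here calls dodrio.

```python
import wave
from pathlib import Path


def wav_sample_rate(path):
    """Read the sample rate from a PCM wav header."""
    with wave.open(str(path), "rb") as wf:
        return wf.getframerate()


def files_needing_resample(wav_dir, target_sample_rate):
    """Map relative wav path -> current sample rate, for files off target."""
    root = Path(wav_dir)
    off_target = {}
    for p in sorted(root.rglob("*.wav")):
        rate = wav_sample_rate(p)
        if rate != target_sample_rate:
            off_target[str(p.relative_to(root))] = rate
    return off_target
```

An empty result from `files_needing_resample(test_data_dir, 48000)` means every wav file is already at the target rate.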

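Restoring audio

Restore the data in a data pack back to audio files

reout = '/home/jovyan/chenyixiang/workspace/20250324_dodrio/testdata/reout'
re_parquet = os.path.join(reout, 'reparquet_dir')
re_pack = os.path.join(reout, 'repack_dir')

# Restoring from parquet yields the original format: audio that was mp3 is restored as mp3, with its sample rate and other properties unchanged
dodrio.parquet2wav(parquet_dir, re_parquet)

# Restoring from a package only yields wav at a fixed sample rate, the one set at packing time; bit rate and channel count are fixed as well
dodrio.package2wav(pack_dir, re_pack)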

Packing text information

Pack and store text and related information

# parquet
info_type = 'libritts'
info_outdir = os.path.join(stockdir, dataname, 'info_dir')
dodrio.gen_infodir(parquet_dir, test_data_dir, info_outdir, info_type, kl=['text', 'unnorm_text'], lang='en', from_type='parquet')

# package
info_type = 'libritts'
pack_info_outdir = os.path.join(usagedir, dataname, 'info_dir')
dodrio.gen_infodir(pack_dir, test_data_dir, pack_info_outdir, info_type, kl=['text', 'unnorm_text'], lang='en', from_type='pack')

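parquet and package use the same function, gen_infodir.

Note that raw datasets store their metadata in many different ways, and text is not stored in any single standard layout, so the loader that actually runs differs per dataset. Several dataset types are preset; the code above, for example, loads data in the libritts layout.

The first argument, parquet_dir, is the folder of packed audio data; it is loaded mainly so that the info files follow the same shard list as the packed data. The second argument, test_data_dir, is the folder where the text and other information is stored. The third argument, info_outdir, is the output folder for the info files.

info_type specifies the dataset type; only a few specific types are currently supported. kl is a keys list: a dataset sometimes carries several versions of the text, so this helper parameter lists the text keys. lang is the default language: data files sometimes carry no language tag, and this parameter lets you set one explicitly.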
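To make the roles of kl and lang concrete, here is a rough sketch of what a loader for a LibriTTS-style folder might collect. It is not dodrio's implementation. It assumes the usual LibriTTS convention of `<uttid>.normalized.txt` and `<uttid>.original.txt` files, and the mapping of those files to the keys `text` and `unnorm_text` is an assumption made for illustration; `read_libritts_text` is a hypothetical name.

```python
from pathlib import Path

# Assumed mapping from LibriTTS transcript suffix to info key.
SUFFIX_TO_KEY = {
    ".normalized.txt": "text",
    ".original.txt": "unnorm_text",
}


def read_libritts_text(root, kl=("text", "unnorm_text"), lang="en"):
    """Collect the requested text versions per utterance id.

    Returns {uttid: {"lang": lang, <key>: <transcript>, ...}} containing
    only the keys named in kl. lang is applied to every utterance because
    the transcript files themselves carry no language tag.
    """
    info = {}
    for path in sorted(Path(root).rglob("*.txt")):
        for suffix, key in SUFFIX_TO_KEY.items():
            if key in kl and path.name.endswith(suffix):
                uttid = path.name[: -len(suffix)]
                entry = info.setdefault(uttid, {"lang": lang})
                entry[key] = path.read_text(encoding="utf-8").strip()
    return info
```

Passing `kl=["text"]` would keep only the normalized transcript under this assumed mapping, which is the kind of selection the keys list is meant to express.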

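Feature extraction

Features can be extracted and stored with extract_feat

dodrio.extract_feat(extractor_func, featname, input_dir, out_dir, from_type, **params)

extractor_func is the feature extraction function, featname is the name of the feature, input_dir is the package data pack, out_dir is the output folder, and from_type is the type of the input data pack (currently only package is supported); params holds any extra arguments the extractor needs.

Taking the cosyvoice embedding as an example, a matching feature extraction function is already built in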

# Load the preset model (onnx_path is not defined in this snippet; set it to the path of your ONNX model file)
from dodrio.afeat.exp_fun import extractor_embedding
tt = extractor_embedding(onnx_path)
extractor_func = tt.extractor

from_type = 'package'
featname = 'embed'
input_dir = pack_dir
out_dir = os.path.join(usagedir, dataname, featname+'_dir')

# Prepare the extra argument that is needed: utt2spk
utt2spk = dodrio.get_utt2spk(info_outdir)

# Extract the embeddings
dodrio.extract_feat(extractor_func, featname, input_dir, out_dir, from_type, utt2spk=utt2spk)

# Compute the mean embedding of each speaker
tt.mean_spk_embedding()

# Save the per-speaker mean embeddings
featname = 'spkembed'
input_dir = pack_dir
out_dir = os.path.join(usagedir, dataname, featname+'_dir')
extractor_func = tt.spk_embedding_save
dodrio.extract_feat(extractor_func, featname, input_dir, out_dir, from_type, utt2spk=utt2spk)
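The per-speaker averaging step can be illustrated without any model. The sketch below is a plain-Python stand-in, not the code behind `tt.mean_spk_embedding()`: it assumes utterance embeddings are available as lists of floats keyed by utterance id, and that utt2spk maps each utterance id to a speaker id.

```python
def mean_spk_embedding(utt_embeds, utt2spk):
    """Average utterance embeddings per speaker.

    utt_embeds: {uttid: [float, ...]}, all vectors of one speaker same length
    utt2spk:    {uttid: spkid}
    Returns {spkid: [float, ...]} with the element-wise mean per speaker.
    """
    sums = {}
    counts = {}
    for uttid, vec in utt_embeds.items():
        spk = utt2spk[uttid]
        if spk not in sums:
            sums[spk] = [0.0] * len(vec)
            counts[spk] = 0
        elif len(sums[spk]) != len(vec):
            raise ValueError(f"embedding size mismatch for speaker {spk}")
        for i, x in enumerate(vec):
            sums[spk][i] += x
        counts[spk] += 1
    return {spk: [s / counts[spk] for s in total] for spk, total in sums.items()}
```

Every utterance of a speaker then shares the same averaged vector, which matches the way the spkembed feature is stored per utterance in the steps above.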

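Preparing the lists needed for training

A preset version of list preparation is also available

# supdir_list may contain several data pack directories
supdir_list = [os.path.join(usagedir, dataname)]
listoutdir = 'listoutdir'
# featlist lists the features to add
featlist = ['embed', 'spkembed', 'speechtoken']

# check_func is the data filtering function; prefix is the prefix of the data list name
dodrio.gen_datalist(supdir_list, listoutdir, featlist, dodrio.check_func, prefix='test')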

Reading data

Taking the list above as an example, a single record can be loaded with load_data_from_line

infoline = '296_142727_000010_000000|/home/jovyan/chenyixiang/workspace/20250324_dodrio/testdata/testout/usagedir/test/pack_dir/wav_test_00000.pack|119946232|121158716|4|This reduction, if admitted, would much facilitate the introduction of emotion into our system, which, being founded on the distinction between the consciousness and the object, is likewise an intellectualist system.|en|embed|/home/jovyan/chenyixiang/workspace/20250324_dodrio/testdata/testout/usagedir/test/embed_dir/wav_test_00000.embed|135168|135936|192|spkembed|/home/jovyan/chenyixiang/workspace/20250324_dodrio/testdata/testout/usagedir/test/spkembed_dir/wav_test_00000.spkembed|135168|135936|192|speechtoken|/home/jovyan/chenyixiang/workspace/20250324_dodrio/testdata/testout/usagedir/test/speechtoken_dir/wav_test_00000.speechtoken|250040|252568|632'

data_dict = dodrio.load_data_from_line(infoline)
# data_dict.keys()
# dict_keys(['uttid', 'audio', 'spkid', 'text', 'language', 'embed', 'spkembed', 'speechtoken'])
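The layout of a list line can be read off the sample above, although it is not documented here, so the following parser is only an illustration and not a replacement for `load_data_from_line`. It assumes seven leading fields followed by groups of five fields per feature (name, file, two offsets, a count). The offset reading is inferred from the sample numbers: for embed, 135936 - 135168 = 768 = 4 × 192, which would be consistent with 192 four-byte values. The meaning of the fifth field (`4` in the sample) is not stated, so it is kept verbatim. The sketch also assumes the text contains no `|` character.

```python
def parse_info_line(line, n_head=7, group=5):
    """Split one list line into a dict, under the assumed field layout."""
    fields = line.rstrip("\n").split("|")
    if len(fields) < n_head or (len(fields) - n_head) % group != 0:
        raise ValueError(f"unexpected number of fields: {len(fields)}")
    head, tail = fields[:n_head], fields[n_head:]
    record = {
        "uttid": head[0],
        # start/end are assumed to be byte offsets into the pack file
        "audio": {"path": head[1], "start": int(head[2]), "end": int(head[3])},
        "field_4": head[4],  # meaning not documented; kept as-is
        "text": head[5],
        "language": head[6],
        "feats": {},
    }
    for i in range(0, len(tail), group):
        name, path, start, end, count = tail[i:i + group]
        record["feats"][name] = {
            "path": path,
            "start": int(start),
            "end": int(end),
            "count": int(count),
        }
    return record
```

Applied to the infoline above, `parse_info_line(infoline)["feats"]` would hold the three entries embed, spkembed, and speechtoken.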

Release history

0.3.9: Mar 9, 2026
0.3.8: Nov 11, 2025
0.3.7: Nov 10, 2025
0.3.6: Aug 18, 2025
0.3.5: Aug 18, 2025
0.3.4: May 12, 2025
0.3.2: Apr 22, 2025
0.3.1: Apr 21, 2025
0.2.1: Apr 3, 2025
0.1.2: Apr 1, 2025
0.1.1: Mar 31, 2025 (this version)
0.0.1: Mar 24, 2025

Download files

Source Distribution

dodrio-0.1.1.tar.gz (12.3 kB view details)

Uploaded Mar 31, 2025 Source

Built Distribution

dodrio-0.1.1-py3-none-any.whl (11.9 kB view details)

Uploaded Mar 31, 2025 Python 3

File details

Details for the file dodrio-0.1.1.tar.gz.

File metadata

Download URL: dodrio-0.1.1.tar.gz
Upload date: Mar 31, 2025
Size: 12.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for dodrio-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`1ab00564851493485fce6d9f0e45860fa65e4e635541bce889a43a2d33448e21`
MD5	`65bc1a56eb93fa46c520d40d5f1b0d8c`
BLAKE2b-256	`69d2d5a4629283cc54b11c97916bf100d0326b70965a1eeb48d3b8e0aa7323d5`

File details

Details for the file dodrio-0.1.1-py3-none-any.whl.

File metadata

Download URL: dodrio-0.1.1-py3-none-any.whl
Upload date: Mar 31, 2025
Size: 11.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.14

File hashes

Hashes for dodrio-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1e98849c9d1c2d248bc3c8bbddd687d72896f368079a8c9dbcfaaa411606e26c`
MD5	`84e5d3873975754d0c8dffa87578e637`
BLAKE2b-256	`4b516818d45497dc380f4a32b8435d07e004e08c7e58ac969c0da8b9729fa1e7`
