tungnaa is a text-to-voice model family and musical instrument
Tungnaa Interactive Voice Instruments
Training and GUI inference for interactive artistic text-to-voice models.
We would love to hear your feedback on using Tungnaá in this short survey: https://forms.gle/F97yhJ1YB5aiZPmn7
Installation
pip install tungnaa[gui] (if you just want to use Tungnaá)
pip install tungnaa[train] (if you want to train your own models)
Usage
tungnaa --help
Running with the Python Audio Engine
- #todo model selection from the Tungnaa gui, including block size and sample rate that match system (currently not possible to have a SR mismatch)
- #todo audio device selection from the Tungnaa gui
tungnaa run --audio-out default
Use tungnaa list-devices to list the available audio devices with their indices.
Using SuperCollider, PureData or Max as Audio Engine
If the --latent-audio switch is enabled, Tungnaa streams RAVE latent trajectories over a single audio-rate channel, which can be piped into another audio engine running the RAVE vocoder. On Linux the piping can be done relatively easily with JACK, and on macOS with BlackHole.
SuperCollider:
sclang supercollider/rtvoice-demo.scd
tungnaa run --latent-audio
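To make the idea of carrying latents on an audio channel concrete, here is a toy sketch in Python. It is not Tungnaá's actual wire format, which this README does not describe. The layout (one latent vector written at the start of each audio block, zero-padded to the block size) and all function names are assumptions chosen purely for illustration.

```python
from typing import List


def pack_latents(frames: List[List[float]], block_size: int) -> List[float]:
    """Write each latent frame into the start of one audio block (assumed layout)."""
    channel: List[float] = []
    for frame in frames:
        if len(frame) > block_size:
            raise ValueError("latent frame does not fit in one block")
        channel.extend(frame)
        channel.extend([0.0] * (block_size - len(frame)))  # zero padding
    return channel


def unpack_latents(channel: List[float], block_size: int,
                   latent_size: int) -> List[List[float]]:
    """Recover latent frames by reading the first values of every block."""
    return [channel[start:start + latent_size]
            for start in range(0, len(channel), block_size)]


# Two hypothetical 4-dimensional latent frames and a small block size.
frames = [[0.1, -0.2, 0.3, 0.0], [0.5, 0.4, -0.1, 0.2]]
signal = pack_latents(frames, block_size=16)
recovered = unpack_latents(signal, block_size=16, latent_size=4)
print(recovered == frames)
```

The receiving engine would need to agree on the block size and latent size, which is one reason a sample-rate or block-size mismatch between the two sides is a problem.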
Training Models
vocoder training
using victor-shepardson RAVE fork
example preprocessing with joining of short files (especially useful for datasets containing many short utterances)
rave preprocess \
--input_path /path/to/audio/directory \
--output_path /path/to/tmp/storage/myravedata \
--num_signal 150000 --sampling_rate 48000 \
--join_short_files
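To get a feel for the numbers in this command, the sketch below converts --num_signal into seconds and into vocoder frames. The block size of 2048 is an assumption read off the b2048 in the pretrained model name voice_multi_b2048_r48000; it is not stated explicitly in this README.

```python
# Values taken from the preprocessing command.
num_signal = 150_000      # samples per training excerpt (--num_signal)
sampling_rate = 48_000    # Hz (--sampling_rate)

# Assumption: 2048 audio samples per latent frame, inferred from "b2048"
# in the model name voice_multi_b2048_r48000.
block_size = 2048

excerpt_seconds = num_signal / sampling_rate
frames_per_excerpt = num_signal // block_size
latent_rate_hz = sampling_rate / block_size

print(f"excerpt length: {excerpt_seconds} s")                    # 3.125 s
print(f"whole latent frames per excerpt: {frames_per_excerpt}")  # 73
print(f"latent frame rate: {latent_rate_hz} Hz")                 # 23.4375 Hz
```

Under that assumption, each excerpt is a little over three seconds long. With many shorter utterances, a dataset would fill few excerpts without --join_short_files.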
example transfer learning using IIL rave-models:
rave train --name 001-my-vocoder-name \
--config rave-models/voice_multi_b2048_r48000/config.gin --config transfer \
--db_path /path/to/tmp/storage/myravedata \
--out_path /path/to/rave/runs \
--transfer_ckpt rave-models/voice_multi_b2048_r48000/version_0/checkpoints/last.ckpt \
--n_signal 150000 \
--gpu 0
example export using sign normalization (latents correlate with louder/brighter sounds):
rave export --run /path/to/rave/runs/001-my-vocoder-name \
--streaming --normalize_sign --latent_size ...
Tungnaá preprocessing
see tungnaa prep --help.
To use datasets other than vctk or hifitts, it may be necessary to add an adapter function in prep.py.
example:
tungnaa prep \
--datasets '{kind:"vctk", path:"/path/to/VCTK"}' \
--rave-path /path/to/rave_streaming.ts \
--out-path /path/to/tmp/dataset_name
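For a dataset that is neither VCTK nor HiFi-TTS, an adapter might look roughly like the sketch below. The function name, the record type, and the assumed layout (one .txt transcript next to each .wav, grouped into one directory per speaker) are all hypothetical. Check the existing adapters in prep.py for the interface Tungnaá actually expects.

```python
from pathlib import Path
from typing import Iterator, NamedTuple


class Utterance(NamedTuple):
    """Hypothetical record type; prep.py may use a different structure."""
    audio_path: Path
    text: str
    speaker: str


def my_dataset_adapter(root: str) -> Iterator[Utterance]:
    """Yield one Utterance per audio file that has a transcript.

    Assumed layout (illustrative only):
        root/<speaker>/<utterance>.wav
        root/<speaker>/<utterance>.txt
    """
    for wav in sorted(Path(root).glob("*/*.wav")):
        transcript = wav.with_suffix(".txt")
        if not transcript.exists():
            continue  # skip audio without a matching transcript
        text = transcript.read_text(encoding="utf-8").strip()
        yield Utterance(audio_path=wav, text=text, speaker=wav.parent.name)
```

Calling list(my_dataset_adapter("/path/to/my_dataset")) would then give one record per transcribed utterance, with the speaker taken from the directory name.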
training
see tungnaa trainer --help
example:
tungnaa trainer --experiment 001-my-tts-name \
--model-dir /path/for/checkpoints \
--log-dir /path/for/logs \
--manifest /path/to/tmp/dataset_name/manifest.json \
--rave-model /path/to/rave_streaming.ts \
--lr 3e-4 --lr-text 3e-5 --epoch-size 200 --save-epochs 20 \
--device cuda:0 \
train
resume a stopped training: add --checkpoint /path/to/checkpoint
transfer learning: add --checkpoint /path/to/checkpoint --resume False
in-text annotations
--speaker_annotate prepends the speaker id determined during preprocessing. With --speaker_dataset, it includes the dataset.
--csv /path/to/file.csv accepts a jvs_labels_encoder_k7.csv-style CSV file. First column contains the audio filename without extension, second column contains an annotation to be prepended to text.
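How these options might combine can be sketched in a few lines of Python. This is not Tungnaá's implementation: the function and parameter names are made up for illustration, the CSV contents are hypothetical, and only the fully combined format csvval:[dataset:speaker] comes from this README. The formatting when just some of the options are set is an assumption.

```python
import csv
import io
from typing import Dict, Optional


def load_annotations(csv_file) -> Dict[str, str]:
    """Map audio filename (no extension) to annotation from a two-column CSV."""
    return {row[0]: row[1] for row in csv.reader(csv_file) if len(row) >= 2}


def annotate(text: str,
             speaker: Optional[str] = None,
             dataset: Optional[str] = None,
             csv_value: Optional[str] = None) -> str:
    """Prepend annotations in the form "csvval:[dataset:speaker] text"."""
    prefix = ""
    if speaker is not None:
        tag = f"{dataset}:{speaker}" if dataset is not None else speaker
        prefix = f"[{tag}] "
    if csv_value is not None:
        # Assumption: the CSV value is always followed by a colon.
        prefix = f"{csv_value}:{prefix}"
    return prefix + text


# Hypothetical CSV contents: filename without extension, then the annotation.
annotations = load_annotations(io.StringIO("utt_001,happy\nutt_002,sad\n"))

print(annotate("original text", speaker="p225", dataset="vctk",
               csv_value=annotations["utt_001"]))
# happy:[vctk:p225] original text
```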
If you were to use all three options, you would get:
"csvval:[dataset:speaker] original text"
Developing
Poetry is used for packaging and dependency management. Conda is used for environments and Python version management, and may be replaced by virtualenv or similar.
cd tungnaa
conda create -n tungnaa python=3.12 ffmpeg
conda activate tungnaa
poetry install
Note that poetry should not be installed in the project environment, but rather from the system package manager, with pipx, or in a separate environment.
To add a dependency, use poetry add, or edit pyproject.toml and then run poetry lock; poetry install.
To add a model, use dvc add /path/to/model, then git add /path/to/model.dvc. Tungnaá models should go in models/tts/, and be accompanied by a model.md file. Vocoders should go in models/vocoders.
docs
run mkdocs serve to build and view documentation
run mkdocs gh-deploy to deploy to github pages
Non-Python Dependencies
- Poetry (https://python-poetry.org/) for packaging
Python Dependencies
See pyproject.toml