Your one-stop solution for voice dataset creation

Project description

An End-to-End Toolkit for Voice Datasets

VocalForge is an open-source toolkit written in Python 🐍 that is meant to cut down the time it takes to create datasets for TTS models, hotword detection models, and more, so you can spend more time training and less time sifting through audio data.

Using NVIDIA's NeMo, PyAnnote, CTC segmentation, and OpenAI's Whisper, this repo will take you from raw audio to a fully formatted dataset, refining both the audio and text automatically.

NOTE: While this does reduce the time spent on dataset curation, verifying the output at each step is important, as it isn't perfect.

This is very much an experimental release, so bugs and updates will be frequent.

[Image: a flow chart of how this repo works]

Features:

`audio_demo.ipynb`

⬇️ Download audio from a YouTube playlist (perfect for podcasts/interviews) OR input your own raw audio files (wav format)
🎵 Remove Non Speech Data
🗣🗣 Remove Overlapping Speech
👥 Split Audio File Into Speakers
👤 Isolate the same speaker across multiple files (voice verification)
🧽 Use DeepFilterNet to reduce background noise
🧮 Normalize Audio
➡️ Export with user defined parameters
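
The normalization step is not described in more detail here. As one illustration of what audio normalization can mean, below is a minimal peak-normalization sketch in plain Python. It assumes samples are floats in the range -1.0 to 1.0; the function name `peak_normalize` and the -1 dBFS target are assumptions for illustration and not necessarily what VocalForge does internally.

```python
def peak_normalize(samples, target_dbfs=-1.0):
    """Scale float samples (range -1.0..1.0) so the loudest peak sits at target_dbfs."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        # Silent or empty input: there is nothing to scale.
        return list(samples)
    target_peak = 10 ** (target_dbfs / 20.0)  # convert dBFS to linear amplitude
    gain = target_peak / peak
    return [s * gain for s in samples]


# Hypothetical quiet clip; every sample is scaled by the same gain.
quiet = [0.0, 0.1, -0.25, 0.05]
normalized = peak_normalize(quiet, target_dbfs=-1.0)
```

Peak normalization only matches the loudest sample across clips; loudness-based methods behave differently on clips with short, sharp peaks.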

`text_demo.ipynb`

📜 Batch transcribe text using OpenAI's Whisper
🧮 Run text normalization
🫶 Use CTC segmentation to line up text to audio
🖖 Split audio based on quality of CTC segmentation confidence
✅ Generate a metadata.csv and dataset in the format of LJSpeech
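
The last two steps above can be sketched roughly as follows: drop segments whose alignment confidence falls below a cutoff, then emit one pipe-delimited line per remaining clip, which is the layout LJSpeech uses for `metadata.csv` (clip ID, transcription, normalized transcription). The segment dictionaries, their keys, the scores, and the `-2.0` cutoff are all made up for illustration; this is not VocalForge's own code.

```python
def ljspeech_rows(segments, min_confidence):
    """Return 'id|text|normalized text' lines for segments at or above the cutoff."""
    rows = []
    for seg in segments:
        if seg["confidence"] < min_confidence:
            # Poorly aligned clip: leave it out of the dataset.
            continue
        rows.append("|".join([seg["clip_id"], seg["text"], seg["normalized_text"]]))
    return rows


# Hypothetical segments; in LJSpeech the ID is the wav file name without ".wav".
segments = [
    {"clip_id": "speaker1_0001", "text": "It cost $5.",
     "normalized_text": "It cost five dollars.", "confidence": -0.4},
    {"clip_id": "speaker1_0002", "text": "uh, what?",
     "normalized_text": "uh, what?", "confidence": -6.3},
]
rows = ljspeech_rows(segments, min_confidence=-2.0)
metadata_csv = "\n".join(rows) + "\n"
```

A stricter cutoff keeps fewer but cleaner clips, so it is worth listening to a few clips near the threshold before settling on a value.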

Setup/Requirements

Python 3.8 has been tested; newer versions should work but are untested.

CUDA is required to run all models

A Hugging Face account is required (it's free and super helpful!)

#install system libraries
apt-get update && apt-get install -y libsndfile1 ffmpeg

conda create -n VocalForge python=3.8 pytorch=1.11.0 torchvision=0.12.0 torchaudio=0.11.0 cudatoolkit=11.3.1 -c pytorch

conda activate VocalForge
#to install from pip
pip install VocalForge
#or, to install from source instead
git clone https://github.com/rioharper/VocalForge
cd VocalForge
pip install -r requirements.txt

#enter huggingface token, token can be found at https://huggingface.co/settings/tokens
huggingface-cli login
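
After the steps above, a quick sanity check can confirm the pieces the requirements mention: the Python version, `ffmpeg` on the PATH, and a PyTorch build that can see CUDA. This is a small sketch using only the standard library plus an optional `torch` import; the function name `check_environment` is hypothetical and not part of VocalForge.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Collect basic facts about the current environment without raising."""
    report = {
        "python_version": "{}.{}".format(*sys.version_info[:2]),
        "ffmpeg_on_path": shutil.which("ffmpeg") is not None,
        "torch_installed": importlib.util.find_spec("torch") is not None,
        "cuda_available": False,
    }
    if report["torch_installed"]:
        try:
            import torch  # imported lazily so the check also runs without PyTorch
            report["cuda_available"] = bool(torch.cuda.is_available())
        except Exception:
            # A broken PyTorch install is reported as "no CUDA" rather than crashing.
            report["cuda_available"] = False
    return report


for name, value in check_environment().items():
    print(f"{name}: {value}")
```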

Pyannote models need to be "signed up for" on Hugging Face for research purposes. Don't worry, all it asks for is your purpose, website, and organization. Each required model page has to be visited manually and given the appropriate info.

[Image: an example of signing up for a model]

API Example

from VocalForge.audio import RefineAudio

refine = RefineAudio(
	input_dir='raw_audio', 
	vad_dir='vad', 
	vad_theshold=0.9
)
refine.VoiceDetection.run()
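
Since the toolkit expects raw audio in wav format, it can help to inspect the input folder before running the example above. The following is a sketch using only the standard library; `summarize_wavs` is a hypothetical helper, not part of the VocalForge API, and `raw_audio` simply mirrors the `input_dir` used above.

```python
import wave
from pathlib import Path


def summarize_wavs(input_dir):
    """Return {file name: (sample_rate_hz, duration_seconds)} for readable .wav files."""
    summary = {}
    for path in sorted(Path(input_dir).glob("*.wav")):
        try:
            with wave.open(str(path), "rb") as wav:
                rate = wav.getframerate()
                summary[path.name] = (rate, wav.getnframes() / rate)
        except (wave.Error, EOFError):
            # Not a PCM wav the standard library can parse; inspect it manually.
            continue
    return summary


for name, (rate, seconds) in summarize_wavs("raw_audio").items():
    print(f"{name}: {rate} Hz, {seconds:.1f} s")
```

Files missing from the summary are either not `.wav` files or use an encoding the `wave` module cannot read, which is worth checking before a long run.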

TODO

Refactor functions for API and toolkit support
"Sync" datasets with the metadata file if audio clips are deleted after being generated
Add a step in the audio refinement process to remove emotional speech (in progress)
Create a model to remove non-speech utterances and portions with background music (in progress)
Update code documentation
Add other normalization methods for audio
Add other dataset formats for generation
Utilize TTS models to automatically generate datasets, with audio augmentation to create diversity
Create a Google Colab Notebook
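
Until the "sync" item above is implemented, the idea can be approximated by hand. The sketch below assumes the LJSpeech layout (a pipe-delimited `metadata.csv` whose first field is the clip name, with clips stored as `<name>.wav` in one folder); `sync_metadata` is a hypothetical helper and not part of VocalForge.

```python
from pathlib import Path


def sync_metadata(metadata_path, wav_dir):
    """Drop metadata rows whose clip is missing from wav_dir; return the number removed."""
    metadata_path = Path(metadata_path)
    wav_dir = Path(wav_dir)
    text = metadata_path.read_text(encoding="utf-8")
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # The first pipe-delimited field is the clip name without the ".wav" suffix.
    kept = [ln for ln in lines if (wav_dir / (ln.split("|", 1)[0] + ".wav")).exists()]
    metadata_path.write_text("".join(ln + "\n" for ln in kept), encoding="utf-8")
    return len(lines) - len(kept)
```

Calling `sync_metadata("dataset/metadata.csv", "dataset/wavs")` after deleting bad clips would rewrite the file in place, so keeping a backup copy of the metadata is a sensible precaution. Both paths here are placeholders.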

Release history

0.1.1 (this version): Jul 20, 2023
0.1.0: Jul 19, 2023
0.0.6: Jun 11, 2023
0.0.5: Jun 11, 2023
0.0.4: Jun 1, 2023
0.0.3 (yanked): Jun 1, 2023. Reason this release was yanked: no pyannote.audio
0.0.1 (yanked): Jun 1, 2023. Reason this release was yanked: requirements error

Download files

Source Distribution

VocalForge-0.1.1.tar.gz (27.0 kB)

Uploaded Jul 20, 2023 Source

Built Distribution

VocalForge-0.1.1-py3-none-any.whl (33.1 kB)

Uploaded Jul 20, 2023 Python 3

File details

Details for the file VocalForge-0.1.1.tar.gz.

File metadata

Download URL: VocalForge-0.1.1.tar.gz
Upload date: Jul 20, 2023
Size: 27.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for VocalForge-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`3965c0ac0b28dd88182126dcb56d0a54826dca6fa1d17d666a4532e53f677ab9`
MD5	`8acc994b93e085fb0cc52c9e33c07a51`
BLAKE2b-256	`fe2b2a5f7bbfa4e277c116870c29686d2d8fa7b2e63228e312c6222121cc63c9`

File details

Details for the file VocalForge-0.1.1-py3-none-any.whl.

File metadata

Download URL: VocalForge-0.1.1-py3-none-any.whl
Upload date: Jul 20, 2023
Size: 33.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.9.13

File hashes

Hashes for VocalForge-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5b2c56571d6b8df7a4bbab40989391e0c6bcc58eccf9ea7ec42187139bbcc985`
MD5	`b542fae41769439cf532cbf75fa5b761`
BLAKE2b-256	`6ee7ef1aa19d4695d6e61e8ab3094e67493450cd9ac2deb8d4f8a3e7e9daa2a3`
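
The SHA256 digests listed above can be checked against a downloaded file with the standard library. This is a minimal sketch: the digests are copied from the tables above, while the helper names and the assumption that the file sits in the current directory are illustrative.

```python
import hashlib

# SHA256 digests as published in the file details above.
EXPECTED_SHA256 = {
    "VocalForge-0.1.1.tar.gz":
        "3965c0ac0b28dd88182126dcb56d0a54826dca6fa1d17d666a4532e53f677ab9",
    "VocalForge-0.1.1-py3-none-any.whl":
        "5b2c56571d6b8df7a4bbab40989391e0c6bcc58eccf9ea7ec42187139bbcc985",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, filename):
    """Return True if the file at path has the digest published for filename."""
    return sha256_of(path) == EXPECTED_SHA256[filename]
```

For example, `matches_published_hash("VocalForge-0.1.1.tar.gz", "VocalForge-0.1.1.tar.gz")` should return `True` for an intact download; a `False` result means the file should be discarded and fetched again.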
