
whisperer

Go from raw audio files to a speaker-separated text-audio dataset automatically, using OpenAI's Whisper.


Summary

This repo takes a directory of audio files and converts them to a text-audio dataset with a normalized distribution of audio lengths. See AnalyzeDataset.ipynb for examples of the dataset distributions across audio and text lengths.

The output is a text-audio dataset that can be used to train a speech-to-text or text-to-speech model. The dataset structure is as follows:

/dataset
├── metadata.txt
└── wavs/
    ├── audio1.wav
    └── audio2.wav

metadata.txt

peters_0.wav|Beautiful is better than ugly.
peters_1.wav|Explicit is better than implicit.
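
This pipe-delimited, LJSpeech-style layout is easy to consume directly. A minimal Python sketch for loading it, assuming the two-column format shown above (stdlib only):

    # Read metadata.txt into (wav_path, transcription) pairs.
    from pathlib import Path

    dataset_dir = Path("dataset")
    pairs = []
    with open(dataset_dir / "metadata.txt", encoding="utf-8") as f:
        for line in f:
            # Split only on the first pipe, in case a transcription contains one.
            filename, text = line.rstrip("\n").split("|", maxsplit=1)
            pairs.append((dataset_dir / "wavs" / filename, text))

    print(pairs[0])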

Key Features

  • Audio files are automatically split by speaker
  • Speakers are auto-labeled across files
  • Audio is split on silences
  • Audio splitting is configurable
  • Dataset creation aims for a Gaussian-like distribution of clip lengths, which in turn can lead to Gaussian-like distributions in the rest of the dataset statistics; of course, this depends heavily on your audio sources (see the sketch below for a quick check)
  • Leverages the GPUs available on your machine; GPUs can also be set explicitly if you only want to use some
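
Because clip length drives those statistics, it is easy to sanity-check the distribution yourself. A minimal stdlib-only sketch, assuming the dataset/ layout shown above:

    # Summarize the clip-length distribution of a finished dataset.
    import wave
    from pathlib import Path
    from statistics import mean, stdev

    durations = []
    for wav_path in sorted(Path("dataset/wavs").glob("*.wav")):
        with wave.open(str(wav_path), "rb") as w:
            durations.append(w.getnframes() / w.getframerate())

    print(f"{len(durations)} clips | mean {mean(durations):.2f}s | stdev {stdev(durations):.2f}s")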

Installation

You have two options:

  1. Install from PyPI with pip:

     pip install whisperer-ml

  2. Use the user-friendly web app, Whisperer Web.

Note: Whisperer Web is still under development, but is ready to be used.

How to use:

  1. Create a data folder and move your audio files into it:

     mkdir data data/raw_files

  2. There are four commands, typically run in this order (an end-to-end example follows this list):

    1. Convert
      whisperer_ml convert path/to/data/raw_files

    2. Diarize
      whisperer_ml diarize path/to/data/raw_files

    3. Auto-label
      whisperer_ml auto-label path/to/data/raw_files number_speakers

    4. Transcribe
      whisperer_ml transcribe path/to/data/raw_files your_dataset_name

    whisperer_ml --help lists all commands, and you can run help on a specific command, e.g. whisperer_ml convert --help.
    
  3. Use the AnalyseDataset.ipynb notebook to visualize the distribution of the dataset.

  4. Use the AnalyseSilence.ipynb notebook to experiment with the silence-detection configuration.
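
Putting the commands together, an end-to-end run over a folder of two-speaker recordings might look like this (the source folder, speaker count, and dataset name are placeholders, and whether convert accepts compressed formats such as mp3 is an assumption):

    mkdir data data/raw_files
    cp ~/recordings/*.mp3 data/raw_files/   # placeholder source files
    whisperer_ml convert data/raw_files
    whisperer_ml diarize data/raw_files
    whisperer_ml auto-label data/raw_files 2
    whisperer_ml transcribe data/raw_files my_dataset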

Using Multiple GPUs

The code automatically detects how many GPUs are available and distributes the audio files in data/wav_files evenly across them. The automatic detection is done through nvidia-smi.

You can make the available GPUs explicit by setting the environment variable CUDA_AVAILABLE_DEVICES.
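
For example, to restrict a run to the first two GPUs (the variable name follows this README; standard CUDA tooling uses CUDA_VISIBLE_DEVICES instead, which is worth trying if this one has no effect):

    CUDA_AVAILABLE_DEVICES=0,1 whisperer_ml transcribe data/raw_files my_dataset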

Configuration

Modify the config.py file to change the parameters of dataset creation, including silence detection.
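
As a purely hypothetical sketch of the kind of knobs such a file exposes (the names below are illustrative, not the actual variables in config.py):

    # Hypothetical silence-detection settings; names are illustrative only,
    # not the actual contents of config.py.
    MIN_SILENCE_LEN_MS = 700    # a pause must last at least this long to be a split point
    SILENCE_THRESH_DBFS = -40   # audio quieter than this is treated as silence
    KEEP_SILENCE_MS = 100       # padding preserved around each cut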

To Do

  • Speech Diarization
  • Replace click with typer
