
Go from raw audio to a text-audio dataset with OpenAI's Whisper


whisperer

Go from raw audio files to a speaker-separated text-audio dataset automatically.



Summary

This repo takes a directory of audio files and converts them to a text-audio dataset with a normalized distribution of audio lengths. See AnalyzeDataset.ipynb for examples of the dataset distributions across audio and text length.

The output is a text-audio dataset that can be used for training a speech-to-text or text-to-speech model. The dataset structure is as follows:

│── /dataset
│   ├── metadata.txt
│   └── wavs/
│      ├── audio1.wav
│      └── audio2.wav

metadata.txt

peters_0.wav|Beautiful is better than ugly.
peters_1.wav|Explicit is better than implicit.
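
Since each line of metadata.txt pairs a clip with its transcript, the file is easy to read back for inspection. The following is a minimal sketch, assuming the `filename|transcript` layout shown above; the helper names `parse_metadata` and `load_metadata` are hypothetical and are not part of whisperer.

```python
from pathlib import Path
from statistics import mean


def parse_metadata(text):
    """Parse `filename|transcript` lines into (filename, transcript) pairs."""
    pairs = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first pipe only, so a transcript may itself contain '|'.
        filename, transcript = line.split("|", 1)
        pairs.append((filename, transcript))
    return pairs


def load_metadata(dataset_dir):
    """Read metadata.txt from a dataset directory (hypothetical helper)."""
    path = Path(dataset_dir) / "metadata.txt"
    return parse_metadata(path.read_text(encoding="utf-8"))


# The two example lines from the README.
sample = (
    "peters_0.wav|Beautiful is better than ugly.\n"
    "peters_1.wav|Explicit is better than implicit.\n"
)
pairs = parse_metadata(sample)
mean_chars = mean(len(transcript) for _, transcript in pairs)
print(pairs[0], mean_chars)
```

The mean transcript length computed here is one of the simplest statistics to check when looking at the text-length distribution of a dataset.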

Key Features

  • Audio files are automatically split by speakers
  • Speakers are auto-labeled across the files
  • Audio splits on silences
  • Audio splitting is configurable
  • The dataset is built so that clip lengths follow a Gaussian-like distribution, which in turn can lead to Gaussian-like distributions in the other dataset statistics. Of course, this depends heavily on your audio sources.
  • Leverages the GPUs available on your machine. GPUs can also be set explicitly if you only want to use some of them.
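
To illustrate what splitting on silences means, here is a minimal sketch that works on a plain list of amplitude values. It is not whisperer's implementation: the function name `split_on_silence` and the parameters `threshold` and `min_silence_len` are assumptions chosen for the example, and real audio would need a proper decoder and energy measure.

```python
def split_on_silence(samples, threshold=0.02, min_silence_len=3):
    """Return (start, end) index pairs of non-silent segments (end exclusive).

    A run of at least `min_silence_len` consecutive samples whose absolute
    value is below `threshold` is treated as a silence that ends a segment.
    """
    segments = []
    start = None    # index where the current segment began
    quiet_run = 0   # length of the current run of quiet samples
    for i, sample in enumerate(samples):
        if abs(sample) >= threshold:
            if start is None:
                start = i
            quiet_run = 0
        elif start is not None:
            quiet_run += 1
            if quiet_run >= min_silence_len:
                # Close the segment at the first quiet sample of the run.
                segments.append((start, i - quiet_run + 1))
                start = None
                quiet_run = 0
    if start is not None:
        # The input ended mid-segment; drop any trailing quiet samples.
        segments.append((start, len(samples) - quiet_run))
    return segments


signal = [0.0, 0.5, 0.4, 0.0, 0.0, 0.0, 0.3, 0.6, 0.0]
segments = split_on_silence(signal)
print(segments)
```

Raising `min_silence_len` merges clips separated by short pauses, which is one way such a parameter could shift the clip-length distribution.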

Installation

You have two options:

  1. Install from PyPI with pip
pip install whisperer-ml
  2. Use the user-friendly web app, Whisperer Web

Note: Under Development but ready to be used

How to use:

  1. Create a data folder and move your audio files into it
mkdir data data/raw_files
  2. Run the four pipeline commands (two help commands are also available)

    1. Convert
      whisperer_ml convert path/to/data/raw_files
      
    2. Diarize
      whisperer_ml diarize path/to/data/raw_files
      
    3. Auto-Label
      whisperer_ml auto-label path/to/data/raw_files number_speakers
      
    4. Transcribe
      whisperer_ml transcribe path/to/data/raw_files your_dataset_name
      
    5. Help lists all commands
      whisperer_ml --help
      
    6. You can run help on a specific command
      whisperer_ml convert --help
    
  3. Use the AnalyseDataset.ipynb notebook to visualize the distribution of the dataset

  4. Use the AnalyseSilence.ipynb notebook to experiment with silence detection configuration
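
The four pipeline commands can also be assembled from a script. The sketch below only builds and prints the command lines listed above, assuming they are meant to run in the listed order; `build_pipeline` is a hypothetical helper, and the speaker count and dataset name are placeholder values.

```python
import shlex


def build_pipeline(raw_dir, number_speakers, dataset_name):
    """Return the four whisperer_ml commands, in order, as argument lists."""
    return [
        ["whisperer_ml", "convert", raw_dir],
        ["whisperer_ml", "diarize", raw_dir],
        ["whisperer_ml", "auto-label", raw_dir, str(number_speakers)],
        ["whisperer_ml", "transcribe", raw_dir, dataset_name],
    ]


commands = build_pipeline("data/raw_files", 2, "my_dataset")
for cmd in commands:
    # Print a shell-safe version of each command line.
    print(shlex.join(cmd))
    # To actually run a step, one could import subprocess and call
    # subprocess.run(cmd, check=True), which stops at the first failure.
```

Keeping the commands as argument lists rather than strings avoids quoting problems when a path contains spaces.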

Using Multiple GPUs

The code automatically detects how many GPUs are available and distributes the audio files in data/wav_files evenly across them. The automatic detection is done through nvidia-smi.

You can make the available GPUs explicit by setting the environment variable CUDA_AVAILABLE_DEVICES.
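
As an illustration of an even split, the sketch below assigns files to devices round-robin. It is not whisperer's own code: `parse_device_ids` and `distribute` are hypothetical helpers, and the comma-separated format of the device list (for example "0,2") is an assumption.

```python
def parse_device_ids(value):
    """Parse a comma-separated device list such as "0,2" into [0, 2]."""
    return [int(part) for part in value.split(",") if part.strip()]


def distribute(files, device_ids):
    """Assign files to devices round-robin so counts differ by at most one."""
    assignment = {device: [] for device in device_ids}
    for index, name in enumerate(files):
        device = device_ids[index % len(device_ids)]
        assignment[device].append(name)
    return assignment


# Assumed value of the device variable: use GPUs 0 and 2 only.
devices = parse_device_ids("0,2")
plan = distribute(["a.wav", "b.wav", "c.wav", "d.wav", "e.wav"], devices)
print(plan)
```

Round-robin balances the number of files, not their total duration; balancing by audio length would need the clip durations as well.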

Configuration

Modify the config.py file to change the parameters of the dataset creation, including silence detection.

To Do

  • Speech Diarization
  • Replace click with typer

