py-speech-gen

A Python library for generating synthetic speech datasets using TTS providers. Supports ElevenLabs, Piper TTS, and Google Cloud TTS out of the box, with an extensible provider system for adding custom backends.

Features

  • Multi-provider support — ElevenLabs, Piper TTS, Google Cloud TTS, or your own custom provider
  • Text preprocessing — cleaning, normalization, number-to-words, sentence segmentation
  • Parameter randomization — per-sample variation for voice diversity
  • Background noise injection — 8 synthetic noise types (white, pink, brown, traffic, cafe, home, crowd, mic)
  • Flexible output formats — WAV, MP3, FLAC at configurable sample rates
  • Reproducible generation — export/load configs for deterministic datasets
  • Export options — JSON, CSV, pandas DataFrame
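To make the randomization, noise injection, and reproducibility features concrete, here is a minimal standard-library sketch of the underlying idea. It is not the library's implementation: the function names, the parameter ranges, and the choice of only two noise types are assumptions made for illustration. A seeded generator draws per-sample parameters, and the noise is scaled so that the speech-to-noise ratio matches the requested value in decibels.

```python
import math
import random


def rms(samples):
    """Root-mean-square level of a list of float samples."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def white_noise(n, rng):
    """Gaussian white noise: independent samples with a flat spectrum."""
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def brown_noise(n, rng):
    """Brown noise as a running sum of white noise, with the mean removed."""
    out, acc = [], 0.0
    for _ in range(n):
        acc += rng.gauss(0.0, 1.0)
        out.append(acc)
    mean = sum(out) / n
    return [x - mean for x in out]


def mix_at_snr(speech, noise, snr_db):
    """Scale noise so 20*log10(rms(speech)/rms(scaled noise)) equals snr_db, then add."""
    gain = rms(speech) / (rms(noise) * 10 ** (snr_db / 20))
    return [s + gain * n for s, n in zip(speech, noise)]


def sample_params(rng):
    """Draw hypothetical per-sample parameters from a seeded generator."""
    return {
        "noise_type": rng.choice(["white", "brown"]),
        "snr_db": round(rng.uniform(5.0, 25.0), 1),
    }


def make_noisy(speech, seed):
    """Return (params, noisy_samples); the same seed gives the same result."""
    rng = random.Random(seed)
    params = sample_params(rng)
    gen = white_noise if params["noise_type"] == "white" else brown_noise
    noisy = mix_at_snr(speech, gen(len(speech), rng), params["snr_db"])
    return params, noisy


# Stand-in for a synthesized utterance: one second of a 220 Hz tone at 16 kHz.
speech = [0.5 * math.sin(2 * math.pi * 220 * t / 16000) for t in range(16000)]
params, noisy = make_noisy(speech, seed=42)
```

Because every random draw goes through one `random.Random(seed)` instance, storing the seed alongside the sampled parameters is enough to regenerate the same sample later.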

Installation

Basic install (core features only)

pip install py-speech-gen

The base installation includes text processing, dataset management, randomization, and noise mixing.

Install with TTS providers

# Piper TTS (local, offline)
pip install "py-speech-gen[piper]"

# ElevenLabs (cloud API)
pip install "py-speech-gen[elevenlabs]"

# Google Cloud TTS (cloud API)
pip install "py-speech-gen[googlecloud]"

# All providers
pip install "py-speech-gen[all]"
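After installing, it can be useful to confirm which optional backends are importable. The sketch below uses only the standard library. The mapping from extras to import names is an assumption based on the usual module names of the upstream packages; the package metadata remains the authority on what each extra installs.

```python
import importlib.util

# Assumed mapping from extras to the import names they provide.
EXTRA_MODULES = {
    "piper": "piper",
    "elevenlabs": "elevenlabs",
    "googlecloud": "google.cloud.texttospeech",
}


def is_importable(module_name):
    """Return True if module_name can be located on the current Python path."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # find_spec imports parent packages first, so a missing parent
        # raises ModuleNotFoundError instead of returning None.
        return False


def available_extras(mapping=EXTRA_MODULES):
    """Map each extra name to whether its module is importable."""
    return {extra: is_importable(mod) for extra, mod in mapping.items()}


for extra, ok in available_extras().items():
    print(f"{extra}: {'installed' if ok else 'missing'}")
```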

Requirements

  • Python 3.11+
  • For Piper TTS: ONNX Runtime (GPU or CPU variant)
  • For ElevenLabs: valid API key
  • For Google Cloud TTS: Google Cloud credentials and enabled Text-to-Speech API
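The requirements above can be checked before a long generation run. The following is a hedged sketch, not part of the library: `preflight` is a hypothetical helper, and `ELEVENLABS_API_KEY` is an assumed variable name. `GOOGLE_APPLICATION_CREDENTIALS` is the variable Google's client libraries read for a service-account key file, though other credential routes exist, so treat a missing value as a warning only.

```python
import os
import sys

MIN_PYTHON = (3, 11)


def preflight(providers, env=None, version=None):
    """Return a list of warnings for the chosen providers; empty means none found."""
    env = os.environ if env is None else env
    version = sys.version_info if version is None else version
    warnings = []
    if tuple(version[:2]) < MIN_PYTHON:
        warnings.append("Python 3.11 or newer is required")
    # Assumed variable name for the ElevenLabs key in this sketch.
    if "elevenlabs" in providers and not env.get("ELEVENLABS_API_KEY"):
        warnings.append("ElevenLabs: ELEVENLABS_API_KEY is not set")
    if "googlecloud" in providers and not env.get("GOOGLE_APPLICATION_CREDENTIALS"):
        warnings.append("Google Cloud TTS: GOOGLE_APPLICATION_CREDENTIALS is not set")
    return warnings


for message in preflight(["piper", "elevenlabs", "googlecloud"]):
    print("warning:", message)
```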

Documentation

  • Usage Guide — Quick start, examples, presets, and detailed API reference
  • Create a Provider — Step-by-step guide to adding custom TTS providers
  • Provider Documentation — Each provider (Piper TTS, ElevenLabs, Google Cloud TTS) has its own docs

Key Components

Component          Description
Providers          TTS backends (PiperProvider, ElevenLabsProvider, GoogleCloudProvider) implementing BaseProvider
DatasetGenerator   Orchestrator that manages generation across multiple providers
Dataset            Data model with save/load/export (JSON, CSV, pandas)
TextProcessor      Text cleaning, normalization, number-to-words, sentence segmentation
Randomizer         Per-sample parameter randomization for voice diversity
NoiseMixer         Background noise injection for realistic conditions
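The relationship between providers and the orchestrator can be illustrated with a small standard-library sketch. The real `BaseProvider` interface is not shown here, so everything below is hypothetical: `SketchBaseProvider`, its `synthesize` method, `ToneProvider`, and `generate` are stand-ins that only show the shape of a pluggable backend. Consult the Create a Provider guide for the actual contract.

```python
import io
import math
import struct
import wave
from abc import ABC, abstractmethod


class SketchBaseProvider(ABC):
    """Hypothetical stand-in for a provider base class; not the library's interface."""

    @abstractmethod
    def synthesize(self, text: str, sample_rate: int) -> bytes:
        """Return mono 16-bit PCM WAV bytes for the given text."""


class ToneProvider(SketchBaseProvider):
    """Toy backend: emits 50 ms of a 440 Hz tone per character instead of speech."""

    def synthesize(self, text: str, sample_rate: int = 16000) -> bytes:
        n = (sample_rate // 20) * max(len(text), 1)
        frames = b"".join(
            struct.pack("<h", int(12000 * math.sin(2 * math.pi * 440 * i / sample_rate)))
            for i in range(n)
        )
        buf = io.BytesIO()
        with wave.open(buf, "wb") as wav:
            wav.setnchannels(1)      # mono
            wav.setsampwidth(2)      # 16-bit samples
            wav.setframerate(sample_rate)
            wav.writeframes(frames)
        return buf.getvalue()


def generate(texts, providers, sample_rate=16000):
    """Assign texts to providers round-robin and return one record per text."""
    records = []
    for i, text in enumerate(texts):
        provider = providers[i % len(providers)]
        audio = provider.synthesize(text, sample_rate)
        records.append({"text": text, "provider": type(provider).__name__, "audio": audio})
    return records


records = generate(["hello", "hi"], [ToneProvider()])
```

Keeping the orchestrator dependent only on the abstract method is what lets a new backend be added without touching the generation loop.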

Running Tests

pip install "py-speech-gen[dev]"
pytest tests/ -v

License

MIT License — see LICENSE file for details.
