A researcher-friendly, declarative speech data processing toolkit

These details have been verified by PyPI

Project links

GitHub Statistics

Maintainers

xiaoqinfeng

These details have not been verified by PyPI

Project links

Documentation

Project description

VoxKitchen

Turn raw speech recordings into clean, inspectable training datasets.

VoxKitchen handles the repetitive audio prep around ASR, TTS, speaker analysis, and data cleaning: convert, segment, label, filter, and export from one Docker-backed YAML pipeline.

Status: Pre-alpha. APIs and Docker image contents may change between releases.

Use VoxKitchen when you want to:

turn long recordings into ASR training data;
prepare and inspect TTS datasets;
diarize speakers, tag languages, or run speech quality checks;
clean, filter, and package audio without maintaining one-off scripts.

Why VoxKitchen

Speech data preparation is usually a chain of fragile scripts: convert audio, split speech, denoise, transcribe, diarize, filter, and export. VoxKitchen makes that chain explicit and repeatable:

Docker-first execution: prebuilt runtimes avoid local dependency conflicts.
One YAML pipeline: define ingest, stages, filters, and output packs in one file.
51 built-in operators: audio prep, VAD, ASR, diarization, TTS, quality metrics, and packing.
Resumable by design: every stage checkpoints under ./work.
Inspectable outputs: reports, cut statistics, provenance, and per-stage errors.

Quick Start

Requirements:

Docker
Python 3.10+ for the lightweight vkit launcher

Install the vkit launcher from PyPI:

pipx install voxkitchen      # recommended: isolates the launcher
# or
pip install voxkitchen

This installs only the lightweight launcher and inspection commands (a few MB, no torch / ASR / TTS dependencies). All pipeline runtime dependencies stay inside the prebuilt Docker images.

Run the included demo with the smallest runtime image. No repository clone is required; the published image includes the demo pipeline and demo audio.

vkit docker pull --tag slim
vkit docker run --tag slim examples/pipelines/demo-no-asr.yaml
vkit inspect run ./work/demo-no-asr

vkit docker run writes run artifacts under ./work and exported datasets under ./output, both owned by your host user ID. It also mounts ./data automatically when that directory exists.

What You Can Build

Goal	Start with	Runtime image
Clean and filter raw speech audio	`vkit init my-cleaning --template cleaning`	`slim`
Build ASR training manifests	`vkit init my-asr --template asr`	`asr`
Analyze speakers and languages	`vkit init my-speakers --template speaker`	`latest`
Prepare TTS training data (quality gate)	`vkit init my-tts --template tts`	`asr`
Synthesize speech in a built-in voice	see Speaker TTS tutorial	`tts`
Clone a voice from a 3–10 s reference	see Voice Cloning & TTS tutorial	`tts` or `fish-speech`

How It Works

Figure: VoxKitchen pipeline overview.

A pipeline is a YAML file. Each stage reads a CutSet, writes a checkpoint, and passes the result to the next stage.

version: "0.1"
name: my-pipeline
work_dir: ./work/${name}-${run_id}

ingest:
  source: dir
  args:
    root: ./data
    recursive: true

stages:
  - name: resample
    op: resample
    args: { target_sr: 16000, target_channels: 1 }

  - name: vad
    op: silero_vad
    args: { threshold: 0.5 }

  - name: asr
    op: faster_whisper_asr
    args: { model: large-v3, compute_type: float16 }

  - name: filter
    op: quality_score_filter
    args:
      conditions: ["duration > 1", "duration < 30", "metrics.snr > 10"]

  - name: pack
    op: pack_jsonl
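
The `conditions` strings in the filter stage read like simple comparisons over cut fields. As an illustration only, the Python sketch below evaluates conditions of that shape against a plain nested dictionary. The helper names (`lookup`, `evaluate`, `keep`), the dictionary layout, and the rule that every condition must hold are assumptions made for this example, not VoxKitchen's actual implementation.

```python
import operator

# Hypothetical helpers: only the condition strings come from the pipeline above.
OPS = {">": operator.gt, "<": operator.lt, ">=": operator.ge, "<=": operator.le}


def lookup(cut: dict, dotted: str):
    """Resolve a dotted field such as 'metrics.snr' in a nested dict."""
    value = cut
    for key in dotted.split("."):
        value = value[key]
    return value


def evaluate(cut: dict, condition: str) -> bool:
    """Evaluate one 'field op number' condition against a cut."""
    field, op, threshold = condition.split()
    return OPS[op](lookup(cut, field), float(threshold))


def keep(cut: dict, conditions: list[str]) -> bool:
    # Assumption: a cut is kept only if every condition holds.
    return all(evaluate(cut, c) for c in conditions)


conditions = ["duration > 1", "duration < 30", "metrics.snr > 10"]
good = {"duration": 4.2, "metrics": {"snr": 18.5}}
noisy = {"duration": 4.2, "metrics": {"snr": 6.0}}
print(keep(good, conditions), keep(noisy, conditions))
```

Under these assumptions the first cut passes all three conditions and the second is rejected by the SNR threshold.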

Interrupted runs resume from completed checkpoints.
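
To make the resume behavior concrete, here is a minimal Python sketch of stage-level checkpointing: each stage writes its result to a file, and a rerun skips any stage whose file already exists. The `run_stages` function, the JSON checkpoint format, and the file naming are hypothetical and chosen for brevity; VoxKitchen's real checkpoint layout under ./work may differ.

```python
import json
import tempfile
from pathlib import Path


def run_stages(items, stages, work_dir: Path):
    """Run stages in order, skipping any stage whose checkpoint file exists.

    Returns the final items and the names of the stages that were executed.
    """
    work_dir.mkdir(parents=True, exist_ok=True)
    executed = []
    for index, (name, fn) in enumerate(stages):
        checkpoint = work_dir / f"{index:02d}_{name}.json"  # hypothetical layout
        if checkpoint.exists():
            # Resume: reuse the stored result instead of recomputing it.
            items = json.loads(checkpoint.read_text())
        else:
            items = fn(items)
            checkpoint.write_text(json.dumps(items))
            executed.append(name)
    return items, executed


stages = [
    ("double", lambda xs: [x * 2 for x in xs]),
    ("drop_small", lambda xs: [x for x in xs if x > 2]),
]

with tempfile.TemporaryDirectory() as tmp:
    work = Path(tmp) / "work"
    first, ran_first = run_stages([1, 2, 3], stages, work)
    second, ran_second = run_stages([1, 2, 3], stages, work)
    print(first, ran_first, ran_second)
```

The second call executes no stages because both checkpoints already exist, yet it returns the same result as the first.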

Create A Project

vkit init my-project --template asr
cd my-project

# Put your audio files in ./data first.
vkit validate pipeline.yaml
vkit docker run --tag asr pipeline.yaml --dry-run
vkit docker run --tag asr pipeline.yaml
vkit inspect run work/

List templates:

vkit init --list-templates

Not sure which image a pipeline needs? Run:

vkit validate pipeline.yaml

It prints the recommended vkit docker pull --tag ... and vkit docker run --tag ... commands for that YAML.

Runtime Images

Every vkit docker command accepts --tag <name>:

Tag	Use when	GPU	Approx. size
`slim`	CPU-friendly cleaning, VAD, quality, pack, enhancement	no	~13 GB
`asr`	Faster-Whisper, FunASR, Qwen3-ASR, forced alignment	yes	~48 GB
`diarize`	Pyannote speaker diarization	yes	~32 GB
`tts`	Kokoro, ChatTTS, CosyVoice	yes	~44 GB
`fish-speech`	Fish-Speech isolated runtime	yes	~57 GB
`latest`	Mixed pipelines across ASR, diarization, TTS, or Fish-Speech	yes	~123 GB

Use latest when one pipeline mixes multiple runtime families, such as ASR plus diarization or ASR plus TTS. Otherwise, prefer the smallest image that contains the operators you need.
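
The selection rule above can be sketched in a few lines of Python. The tag names come from the table; the family labels, the `pick_tag` function, and the assumption that every GPU image also contains the CPU operators from `slim` are illustrative guesses. In practice, vkit validate prints the recommended tag for a given pipeline.

```python
# Tag names come from the Runtime Images table. The family labels and the rule
# that non-slim images also cover the slim operators are assumptions.
FAMILY_TAGS = {
    "asr": "asr",
    "diarize": "diarize",
    "tts": "tts",
    "fish-speech": "fish-speech",
}


def pick_tag(families: set[str]) -> str:
    """Pick the smallest single-family image, or 'latest' for mixed pipelines."""
    known = set(FAMILY_TAGS)
    unknown = families - known - {"slim"}
    if unknown:
        raise ValueError(f"unknown runtime families: {sorted(unknown)}")
    heavy = families & known
    if not heavy:
        return "slim"
    if len(heavy) == 1:
        return FAMILY_TAGS[next(iter(heavy))]
    return "latest"


print(pick_tag({"slim"}), pick_tag({"slim", "asr"}), pick_tag({"asr", "diarize"}))
```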

Useful checks:

vkit docker pull --tag asr
vkit docker doctor --tag asr --expect asr
vkit docker doctor --tag latest

Configuration

Some operators require API tokens. Create ./.env; vkit docker run passes it into the container automatically.

cp .env.example .env

Variable	Required by	Notes
`HF_TOKEN`	`pyannote_diarize`	Accept the pyannote model agreement on Hugging Face first.
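
Before launching a pipeline that needs a token, it can help to confirm that ./.env actually defines it. The Python sketch below is a standalone pre-flight check, not part of vkit: the `read_env` and `missing_vars` helpers are hypothetical, and the parsing assumes simple KEY=VALUE lines, which may not match every detail of how the file is read inside the container.

```python
import tempfile
from pathlib import Path


def read_env(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, ignoring blanks and '#' comments."""
    values = {}
    for line in path.read_text().splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_vars(path: Path, required: list[str]) -> list[str]:
    """Return required variables that are absent or empty in the file."""
    env = read_env(path) if path.exists() else {}
    return [name for name in required if not env.get(name)]


# Demo on a temporary file; point the path at ./.env in a real project.
with tempfile.TemporaryDirectory() as tmp:
    env_file = Path(tmp) / ".env"
    env_file.write_text("# tokens\nHF_TOKEN=\n")
    before = missing_vars(env_file, ["HF_TOKEN"])
    env_file.write_text("HF_TOKEN=placeholder-value\n")
    after = missing_vars(env_file, ["HF_TOKEN"])
    print(before, after)
```

An empty value counts as missing here, so a copied template with a blank `HF_TOKEN=` line is still flagged.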

Common Commands

vkit init <path> --template asr                # Scaffold a project
vkit validate pipeline.yaml                    # Validate YAML and recommend an image
vkit docker run --tag asr pipeline.yaml --dry-run
vkit docker run --tag asr pipeline.yaml
vkit inspect run work/                         # Stage summary
vkit inspect cuts <cuts.jsonl.gz>              # CutSet statistics
vkit inspect errors work/                      # Per-stage failed cuts
vkit operators search <keyword>                # Find operators by name or summary
vkit operators --category quality              # List one category's operators
vkit schema export --out pipeline.schema.json  # Editor autocompletion for YAML
vkit recipes                                   # List dataset recipes
vkit docker download --tag slim librispeech --root ./data/librispeech --subsets dev-clean
vkit docker doctor --tag latest                # Check image health

Documentation

Agent Skill

The repo includes an agent-neutral VoxKitchen skill at skill/. Claude, Codex, and other SKILL.md-compatible agents can copy, symlink, or import that folder into their own skill search path. The skill follows the Docker-first vkit workflow in this README.

License

Apache 2.0. See LICENSE.

Project details

Release history

0.3.0 (this version): May 22, 2026
0.2.1.dev38 (pre-release): May 22, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

voxkitchen-0.3.0.tar.gz (2.0 MB)

Uploaded May 22, 2026 Source

Built Distribution

voxkitchen-0.3.0-py3-none-any.whl (222.8 kB)

Uploaded May 22, 2026 Python 3

File details

Details for the file voxkitchen-0.3.0.tar.gz.

File metadata

Download URL: voxkitchen-0.3.0.tar.gz
Upload date: May 22, 2026
Size: 2.0 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for voxkitchen-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`18c90fe358b8e3e2cc7b07f1adbaa7ebcc7067d9ff61347b781a665c311cd647`
MD5	`bf7a75fa8d18f065ffe97077543ec26e`
BLAKE2b-256	`6ec29cef60f406e3e5dff7a1bdf7de4408a6b1ac1e88fbfd576599e358819554`

See more details on using hashes here.
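
One way to use the published digests is to recompute the SHA256 of a downloaded file and compare it with the value in the table. The sketch below uses only Python's standard `hashlib`; the `sha256_of` and `matches` helper names are illustrative, and the demo runs on a temporary file rather than the real archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path: Path, expected_hex: str) -> bool:
    """Compare a file's digest with a published value, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()


# Self-contained demo on a temporary file. For the real archive, pass the path
# of the downloaded voxkitchen-0.3.0.tar.gz and the SHA256 value listed above.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    expected = hashlib.sha256(b"hello").hexdigest()
    print(matches(sample, expected), matches(sample, "0" * 64))
```

Reading in chunks keeps memory use flat, which matters more for multi-gigabyte files than for a 2.0 MB source archive.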

Provenance

The following attestation bundles were made for voxkitchen-0.3.0.tar.gz:

Publisher: publish.yml on XqFeng-Josie/VoxKitchen

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voxkitchen-0.3.0.tar.gz
- Subject digest: 18c90fe358b8e3e2cc7b07f1adbaa7ebcc7067d9ff61347b781a665c311cd647
- Sigstore transparency entry: 1599786273
- Sigstore integration time: May 22, 2026
Source repository:
- Permalink: XqFeng-Josie/VoxKitchen@6110785e3be09d4c51f213b7bdf33efe42713a92
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/XqFeng-Josie
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6110785e3be09d4c51f213b7bdf33efe42713a92
- Trigger Event: push

File details

Details for the file voxkitchen-0.3.0-py3-none-any.whl.

File metadata

Download URL: voxkitchen-0.3.0-py3-none-any.whl
Upload date: May 22, 2026
Size: 222.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for voxkitchen-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b5dc7179109d18a8f1d479c321936642e1032ab89ea0e02a71bd694bebe25cb3`
MD5	`4e4a3039eecd1af2d72fe638fd0bb849`
BLAKE2b-256	`8ea0c344b1a5efa6f9ec016d09c47c4bcf55aa3f89aa7eb1112a2fd6b718f9a0`

See more details on using hashes here.

Provenance

The following attestation bundles were made for voxkitchen-0.3.0-py3-none-any.whl:

Publisher: publish.yml on XqFeng-Josie/VoxKitchen

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: voxkitchen-0.3.0-py3-none-any.whl
- Subject digest: b5dc7179109d18a8f1d479c321936642e1032ab89ea0e02a71bd694bebe25cb3
- Sigstore transparency entry: 1599786365
- Sigstore integration time: May 22, 2026
Source repository:
- Permalink: XqFeng-Josie/VoxKitchen@6110785e3be09d4c51f213b7bdf33efe42713a92
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/XqFeng-Josie
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@6110785e3be09d4c51f213b7bdf33efe42713a92
- Trigger Event: push
