Local TTS and audio transcription web app
VocalFlow
Local TTS, voice cloning, and transcription for Windows.
Self-hosted web app that runs entirely on your machine. No cloud APIs, no data leaves your PC.
Features
- Voice Cloning — clone any voice from a short audio clip (3-10s), save & reuse voice prompts
- Custom Voice — 9 preset speakers with emotion/tone control ("say it angrily", "whisper softly")
- Voice Design — create new voices from text descriptions, no reference audio needed
- Transcription — Whisper-powered transcription with word-level timestamps, 6 model sizes
- Smart GPU — automatic model load/unload between switches, only one model in VRAM at a time
- Flash Attention 2 for faster inference
- 10+ languages with auto-detection
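The "only one model in VRAM at a time" behaviour can be pictured as a single-slot cache. The sketch below is an illustration under assumptions, not VocalFlow's actual code: `SingleSlotModelManager` and the loader callables are hypothetical names, and the loaders stand in for whatever loads a TTS or Whisper model.

```python
import gc
from typing import Any, Callable, Dict, Optional


class SingleSlotModelManager:
    """Keeps at most one model resident; loading another evicts the current one."""

    def __init__(self, loaders: Dict[str, Callable[[], Any]]) -> None:
        # name -> zero-argument factory (hypothetical stand-ins for real loaders)
        self._loaders = loaders
        self._name: Optional[str] = None
        self._model: Any = None

    @property
    def active(self) -> Optional[str]:
        """Name of the model currently held, or None."""
        return self._name

    def get(self, name: str) -> Any:
        if name not in self._loaders:
            raise KeyError(f"unknown model: {name}")
        if name == self._name:
            return self._model  # already resident, reuse it
        self.unload()  # evict first so two models never coexist
        self._model = self._loaders[name]()
        self._name = name
        return self._model

    def unload(self) -> None:
        self._model = None
        self._name = None
        gc.collect()
        # With PyTorch on CUDA, one would typically also call
        # torch.cuda.empty_cache() here to release cached VRAM.
```

Calling `get("whisper")` after `get("tts")` drops the TTS reference before the Whisper loader runs, which keeps peak memory at roughly one model.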
Requirements
- Windows 10/11 with a CUDA GPU (6+ GB VRAM)
- Python 3.11
- FFmpeg: `winget install ffmpeg`
- SoX: `winget install sox`
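A small preflight script can confirm the command-line requirements before installing. This is a sketch only: the helper names are hypothetical, it assumes the FFmpeg and SoX executables are named `ffmpeg` and `sox` on `PATH`, and it treats "Python 3.11" as an exact major.minor match, which the project may not strictly require. It does not check the GPU.

```python
import shutil
import sys
from typing import List, Sequence


def missing_tools(tools: Sequence[str]) -> List[str]:
    """Return the entries of `tools` that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def python_matches(required: Sequence[int] = (3, 11)) -> bool:
    """True if the running interpreter has the given major.minor version."""
    return tuple(sys.version_info[:2]) == tuple(required)


if __name__ == "__main__":
    missing = missing_tools(["ffmpeg", "sox"])
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    if not python_matches():
        print("Python 3.11 expected, found", sys.version.split()[0])
```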
Quick Start
pip install vocalflow
vocalflow
Open http://localhost:5001. Models download automatically on first use.
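If the page does not load, it can help to check whether anything is listening on the port yet. The helper below is a generic TCP probe, not part of VocalFlow; `is_listening` is a hypothetical name, and port 5001 is taken from the URL above.

```python
import socket


def is_listening(host: str = "127.0.0.1", port: int = 5001, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False
```

A `False` result right after launch may only mean the app is still starting.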
From source
git clone https://github.com/0xBinayak/VocalFlow.git
cd VocalFlow
uv sync
uv run app.py
For auto-reload during development: uv run gradio app.py
Models
| Model | Params | Use |
|---|---|---|
| Qwen3-TTS-1.7B-Base | 1.7B | Voice cloning (best quality) |
| Qwen3-TTS-0.6B-Base | 0.6B | Voice cloning (faster) |
| Qwen3-TTS-1.7B-CustomVoice | 1.7B | Preset speakers + instructions |
| Qwen3-TTS-0.6B-CustomVoice | 0.6B | Preset speakers only |
| Qwen3-TTS-1.7B-VoiceDesign | 1.7B | Voice from text description |
| Whisper (tiny through turbo) | 39M-1.5B | Transcription |
All TTS models run in bfloat16 with SDPA or Flash Attention. Whisper falls back to CPU if no GPU is available.
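The selection logic implied here can be sketched as two small decisions. This is an assumption about how such a choice might be made, not the project's code: the function names are hypothetical, and `"flash_attention_2"` and `"sdpa"` are the `attn_implementation` values accepted by Hugging Face Transformers, which VocalFlow may or may not use in this way.

```python
import importlib.util


def has_module(name: str) -> bool:
    """True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


def tts_attention(flash_attn_installed: bool) -> str:
    """Prefer Flash Attention 2 when its package is present, otherwise use SDPA."""
    return "flash_attention_2" if flash_attn_installed else "sdpa"


def whisper_device(cuda_available: bool) -> str:
    """Run Whisper on the GPU when one is usable, otherwise on the CPU."""
    return "cuda" if cuda_available else "cpu"
```

In practice the inputs would come from `has_module("flash_attn")` and, with PyTorch installed, `torch.cuda.is_available()`.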
Contributing
- Fork & clone, `uv sync`, create a branch
- Ensure `uvx ruff check app.py transcribe.py main.py` passes
- Open a PR against `main`
Open an issue if you find a bug.
License
MIT
Acknowledgments
File details
Details for the file vocalflow-0.7.0.tar.gz.
File metadata
- Download URL: vocalflow-0.7.0.tar.gz
- Upload date:
- Size: 67.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e799603cd2030b4bad3a3a982666099faa7c44e8e1e2d60b4eac4d3cde1fec46 |
| MD5 | 1e640dbe14fe951bc7d9db18358cd32f |
| BLAKE2b-256 | 626ad5dc5457f7eb4795c4adbd63c70e1fd07e31d93972b6a99ae87eef3fef2f |
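A downloaded file can be checked against the SHA256 digest listed above using only the standard library. The helper names here are hypothetical, and the local path passed in is whatever location the file was saved to.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA256 digest with `expected_hex`, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

For example, `matches_digest(Path("vocalflow-0.7.0.tar.gz"), "e799603c...")` with the full digest from the table should return `True` for an intact download.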
Provenance
The following attestation bundles were made for vocalflow-0.7.0.tar.gz:
Publisher: publish.yml on 0xBinayak/VocalFlow
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: vocalflow-0.7.0.tar.gz
  - Subject digest: e799603cd2030b4bad3a3a982666099faa7c44e8e1e2d60b4eac4d3cde1fec46
- Sigstore transparency entry: 1156203328
- Sigstore integration time:
- Permalink: 0xBinayak/VocalFlow@2a660e54fd545061bd27d64e78c46912ed1af100
- Branch / Tag: refs/tags/v0.7.0
- Owner: https://github.com/0xBinayak
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2a660e54fd545061bd27d64e78c46912ed1af100
- Trigger Event: push
File details
Details for the file vocalflow-0.7.0-py3-none-any.whl.
File metadata
- Download URL: vocalflow-0.7.0-py3-none-any.whl
- Upload date:
- Size: 14.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | caec20afcd324e9d3b8eeef04d619f7b7e5c533640f1649cf34ac5685580566d |
| MD5 | aa764098428f7af658ff391dba69561c |
| BLAKE2b-256 | a59633a9ac27de2004531fc37e5056abe1590ae7ab6bdbaaf5fa381ea62bb98a |
Provenance
The following attestation bundles were made for vocalflow-0.7.0-py3-none-any.whl:
Publisher: publish.yml on 0xBinayak/VocalFlow
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: vocalflow-0.7.0-py3-none-any.whl
  - Subject digest: caec20afcd324e9d3b8eeef04d619f7b7e5c533640f1649cf34ac5685580566d
- Sigstore transparency entry: 1156203334
- Sigstore integration time:
- Permalink: 0xBinayak/VocalFlow@2a660e54fd545061bd27d64e78c46912ed1af100
- Branch / Tag: refs/tags/v0.7.0
- Owner: https://github.com/0xBinayak
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@2a660e54fd545061bd27d64e78c46912ed1af100
- Trigger Event: push