# Podvoice
Local-first, open-source CLI that turns simple Markdown scripts into multi-speaker audio using Coqui XTTS v2.
Podvoice is designed for developers who want a practical way to turn podcast-style scripts or conversational content into audio, without cloud services or paid APIs.
## Why this tool exists
- Many TTS tools are tied to proprietary cloud APIs.
- Podcast creators and developers often just want a simple, script-based workflow.
- Running everything locally gives you full control over data, reproducibility, and cost.
Podvoice aims to be a small, honest, hackable starting point: no research complexity, no training code, just a clear command-line tool built on stable open-source components.
## Features

- **Markdown-based scripts.** Write your content as a `.md` file with clear speaker blocks.
- **Multiple logical speakers.** Each speaker name is mapped consistently to a voice in the XTTS model.
- **Single output file.** Podvoice generates one stitched audio file for the whole script.
- **WAV or MP3 export.** WAV by default, MP3 when the output path ends with `.mp3`.
- **Local-only inference.** Uses the pre-trained Coqui XTTS v2 model, downloaded once and cached.
- **CPU-friendly by default.** Runs on CPU out of the box; GPU is optional if available.
- **Beginner-friendly code.** Small, modular Python 3.10+ codebase with comments and clear structure.
## Input format
Podvoice expects a Markdown file with blocks like this:

```
[SpeakerA | calm]
Hello and welcome to the show.

[SpeakerB | excited]
Today we are going to talk about AI.
```
Rules:

- Speaker name is required.
- Emotion is optional and can be any free-form tag.
- Text continues until the next `[Speaker | emotion]` block.
- Blank lines are allowed inside a block.

> In v0.1, the emotion tag is parsed and preserved but not interpreted by XTTS directly. You can still use it for your own tooling or future extensions.
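The rules above can be sketched as a tiny parser. This is an illustrative sketch, not podvoice's actual internals; the names `HEADER_RE` and `parse_script` are hypothetical:

```python
import re

# A line is a header only if the whole line matches "[Speaker]" or
# "[Speaker | emotion]", so bracketed text inside a paragraph is left alone.
HEADER_RE = re.compile(
    r"^\[\s*(?P<speaker>[^\]|]+?)\s*(?:\|\s*(?P<emotion>[^\]]+?)\s*)?\]\s*$"
)

def parse_script(text):
    """Split a script into (speaker, emotion, text) segments."""
    segments = []
    current = None  # (speaker, emotion, accumulated lines)
    for line in text.splitlines():
        match = HEADER_RE.match(line)
        if match:
            if current:
                segments.append((current[0], current[1], "\n".join(current[2]).strip()))
            current = (match["speaker"], match["emotion"], [])
        elif current:
            # Blank lines are kept, so paragraphs inside a block survive.
            current[2].append(line)
    if current:
        segments.append((current[0], current[1], "\n".join(current[2]).strip()))
    return segments
```

Because the emotion group is optional, a bare `[Guest]` header parses with `emotion` set to `None`, matching the "emotion is optional" rule.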
## Quick start

### 1. Prerequisites

- Python 3.10+
- `ffmpeg` installed on your system (required by `pydub`)
- A stable internet connection for the first run only, so that the pre-trained XTTS v2 model can be downloaded and cached locally
- Enough disk space for the model weights (several GB is recommended)

On Ubuntu/Debian, you can typically install ffmpeg with:

```bash
sudo apt-get install ffmpeg
```
### 2. Install dependencies

From the project root:

```bash
pip install -r requirements.txt
```

This will install:

- PyTorch + torchaudio
- Coqui TTS (including XTTS v2)
- pydub
- Typer + Rich
- The `podvoice` package itself (editable install)
### 3. Run the demo

From the project root:

```bash
podvoice render examples/demo.md --out demo.wav
```

or, to export MP3:

```bash
podvoice render examples/demo.md --out demo.mp3
```

On first run, Coqui TTS will download the XTTS v2 model and cache it in your local environment. Subsequent runs reuse the cached model.
## CLI usage

The main command is:

```bash
podvoice render SCRIPT.md --out OUTPUT
```

Basic example:

```bash
podvoice render examples/demo.md --out output.wav
```

With explicit options:

```bash
podvoice render \
  examples/demo.md \
  --out podcast.mp3 \
  --language en \
  --device cpu
```

Options:

- `SCRIPT` (positional): path to the input Markdown file.
- `--out` / `-o`: output audio path. If omitted, Podvoice defaults to `SCRIPT` with a `.wav` extension.
- `--language` / `-l`: language code for XTTS v2 (for example `en`, `de`, `fr`). Default is `en`.
- `--device` / `-d`: Torch device to run on. Default is `cpu`. If you have a compatible GPU, you can try `cuda`.

If anything goes wrong (file not found, invalid Markdown format, model load issue, or synthesis error), the CLI prints a clear error message and exits with a non-zero status code.
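The documented `--out` default (the script path with its extension swapped for `.wav`) amounts to something like the following sketch; `default_out_path` is a hypothetical name, not necessarily podvoice's own helper:

```python
from pathlib import Path

def default_out_path(script_path: str) -> Path:
    # When --out is omitted, the output lands next to the script as a .wav file.
    return Path(script_path).with_suffix(".wav")
```

So `podvoice render examples/demo.md` with no `--out` would write `examples/demo.wav`.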
## How voices are assigned

Podvoice does not train or fine-tune new voices. Instead, it:

- Uses the pre-trained Coqui XTTS v2 model.
- Queries the list of built-in speakers exposed by the model (if available).
- Maps each speaker name from your Markdown script to one of these built-in speakers using a deterministic hash.

This means:

- Each logical speaker name (like `Host`, `Guest`, `Narrator`) gets a consistent voice for the whole script.
- Changing the speaker name (for example, `Alice` vs `Bob`) can change which built-in voice is used.
- If the underlying XTTS speaker list changes between versions, the mapping may also change.

If the model does not expose named speakers, Podvoice falls back to the model's default voice for all segments.
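The mapping described above boils down to hashing the name and indexing into the model's speaker list. This is a minimal sketch of the idea, not podvoice's exact code, and `pick_voice` is a hypothetical name:

```python
import hashlib

def pick_voice(speaker_name, available_speakers):
    """Deterministically map a logical speaker name to a built-in voice."""
    if not available_speakers:
        return None  # no named speakers exposed: fall back to the default voice
    # A stable hash (unlike Python's randomized hash()) keeps the mapping
    # identical across runs for the same name and speaker list.
    digest = hashlib.sha256(speaker_name.encode("utf-8")).hexdigest()
    return available_speakers[int(digest, 16) % len(available_speakers)]
```

Note how the properties in the list above follow: the same name always hashes to the same voice, a different name may land on a different index, and a changed speaker list changes the modulus and therefore possibly the mapping.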
## Hardware requirements

This project is intentionally conservative so it can run on typical developer machines.

- **CPU-only by default.** No GPU is required; the CLI uses `--device cpu` unless you override it.
- **Memory.** 8 GB of RAM is a comfortable minimum. More will help when running larger scripts.
- **Disk space.** Expect several gigabytes of disk usage for the XTTS v2 model weights and cache.
- **Runtime.** On CPU, generating longer podcasts can take a while. You can monitor progress via the Rich progress bar in the terminal.
## Example Markdown script

Here is the example provided in `examples/demo.md`:

```
[Host | calm]
Hello and welcome to the Podvoice demo.
In this short example, we will generate a tiny podcast-style conversation
from a Markdown script.

[Guest | excited]
Today we are going to talk about AI.
All of this audio is being generated on your local machine.

[Host | calm]
Thanks for listening. Happy hacking!
```
You can copy this file and adapt it to your own podcast episodes or conversational content.
## Project structure

```
podvoice/
├── podvoice/
│   ├── __init__.py
│   ├── cli.py        # Typer CLI entrypoint
│   ├── parser.py     # Markdown script parser
│   ├── tts.py        # XTTS loading + inference
│   ├── audio.py      # Audio concatenation/export
│   └── utils.py      # Shared helpers
│
├── examples/
│   └── demo.md       # Sample Markdown script
│
├── requirements.txt
├── pyproject.toml
└── README.md
```
Each module is small and documented so you can easily read and modify it for your own needs.
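To give a feel for what the export step involves: podvoice's `audio.py` uses pydub (backed by ffmpeg) for concatenation and MP3 export, but the core stitching idea can be shown with only the standard library's `wave` module. This is an illustrative sketch, and `stitch_wavs` is a hypothetical name:

```python
import wave

def stitch_wavs(wav_paths, out_path):
    """Concatenate same-format WAV files into one output WAV file."""
    chunks = []
    params = None
    for path in wav_paths:
        with wave.open(path, "rb") as w:
            if params is None:
                # All per-segment files share the synthesizer's format,
                # so the first file's parameters apply to the output.
                params = w.getparams()
            chunks.append(w.readframes(w.getnframes()))
    with wave.open(out_path, "wb") as out:
        out.setparams(params)
        for frames in chunks:
            out.writeframes(frames)
```

pydub additionally handles format conversion, which is how the `.mp3` output path can select MP3 encoding via ffmpeg.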
## Responsible use
Podvoice uses a powerful pre-trained TTS model that can generate natural-sounding speech. Please use it responsibly:
- Do not use generated voices to impersonate real people without their clear, informed consent.
- Do not use this tool for harassment, fraud, or misleading activities.
- Make it clear to listeners when content has been generated or synthesized.
You are responsible for how you use the tool and for complying with the licenses of all dependencies, including the Coqui XTTS v2 model.
## Contributing
This is an early, practical v0.1. Bug reports, small improvements, and clear documentation fixes are especially welcome.
Feel free to:
- Open issues with script examples that fail to parse.
- Suggest better defaults for audio normalization or silence between segments.
- Improve error messages and CLI UX.
The goal is to keep Podvoice simple, understandable, and genuinely useful for local-first workflows.