A model for speech generation with an AR + diffusion architecture.

VibeVoice OpenAI-Compatible TTS API

An OpenAI-compatible text-to-speech (TTS) API server for VibeVoice.

Community

Join the unofficial Discord community: https://discord.gg/ZDEYTTRxWG - share samples, ask questions, discuss fine-tuning, etc.

Installation

git clone https://github.com/vibevoice-community/VibeVoice-API
cd VibeVoice-API/

uv pip install -e .

Model Zoo

Model	Context Length	Generation Length	Weight
VibeVoice-1.5B	64K	~90 min	HF link
VibeVoice-Large	32K	~45 min	HF link

Getting Started

Run a local server that is compatible with the OpenAI audio API (client.audio.speech.create). It wraps VibeVoice to synthesize speech from text.

Start the server

python -m vibevoice_api.server --model_path vibevoice/VibeVoice-1.5B --port 8000

API base path (default: `/v1`)

All routes are mounted on /v1 by default. To override the prefix, set VIBEVOICE_API_BASE_PATH (leading slash required) before launching the server:

export VIBEVOICE_API_BASE_PATH=/api
python -m vibevoice_api.server --model_path vibevoice/VibeVoice-1.5B --port 8000

Clients must include the same prefix when constructing URLs. The static console is served at <base_path>/web/console.html.
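
As a rough illustration of the prefix rule above, a client could derive its URLs from the same environment variable the server reads. The helper name `api_url` and the default host are assumptions made for this sketch; they are not part of the package.

```python
import os


def api_url(route: str, host: str = "http://127.0.0.1:8000", env=None) -> str:
    """Join host, base path, and route the way a client would (hypothetical helper)."""
    env = os.environ if env is None else env
    # Fall back to the documented default prefix when the variable is unset.
    base_path = env.get("VIBEVOICE_API_BASE_PATH", "/v1")
    if not base_path.startswith("/"):
        raise ValueError("VIBEVOICE_API_BASE_PATH must start with a slash")
    return f"{host.rstrip('/')}{base_path.rstrip('/')}/{route.lstrip('/')}"


print(api_url("audio/speech", env={}))
print(api_url("web/console.html", env={"VIBEVOICE_API_BASE_PATH": "/api"}))
```

Passing `env` explicitly keeps the helper easy to test; in normal use it reads `os.environ`.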

Endpoints

POST `<base_path>/audio/speech`

Synthesize speech from text.

Request fields (OpenAI-compatible):

model (string): model id or local path (e.g., vibevoice/VibeVoice-1.5B).
voice (string): name mapped to a reference voice, a filesystem path (prefix with path: or absolute), or an alias from a voice map.
input (string): the input text.
response_format (string): wav, pcm (native), or mp3 / opus / aac (require ffmpeg).
stream_format (string, optional): set to sse for Server-Sent Events (streamed base64 PCM chunks).
extra_body (object, optional):
- voice_path: absolute/relative path to a reference audio file.
- voice_data: base64-encoded WAV bytes (optionally as a data URL).

Python example (OpenAI SDK ≥ 1.40):

from openai import OpenAI

base_path = "/v1"  # or your VIBEVOICE_API_BASE_PATH
client = OpenAI(base_url=f"http://127.0.0.1:8000{base_path}", api_key="<YOUR_API_KEY>")

speech = client.audio.speech.create(
    model="vibevoice/VibeVoice-1.5B",
    voice="Andrew",
    input="Hello from VibeVoice!",
    response_format="wav",
)

with open("out.wav", "wb") as f:
    f.write(speech.read())
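
To pass a one-off reference voice instead of a named one, the `voice_data` field described above can be built from raw WAV bytes. The following is a minimal sketch: the helper name `voice_data_body` and the `audio/wav` MIME type in the data URL are assumptions, and this README does not state exactly how the server reads the field, so treat the commented call as illustrative only.

```python
import base64


def voice_data_body(wav_bytes: bytes, as_data_url: bool = False) -> dict:
    """Build an extra_body dict carrying base64-encoded WAV bytes (hypothetical helper)."""
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    if as_data_url:
        # The MIME type here is an assumption; the README only says "data URL".
        encoded = f"data:audio/wav;base64,{encoded}"
    return {"voice_data": encoded}


# In practice the bytes would come from a file: open("reference.wav", "rb").read()
body = voice_data_body(b"RIFF\x00\x00\x00\x00WAVE")
print(sorted(body))

# With the OpenAI SDK, extra request fields go through the extra_body argument:
# client.audio.speech.create(model=..., voice=..., input=..., extra_body=body)
```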

Pure HTTP example (cURL):

curl -X POST "http://127.0.0.1:8000/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -d '{
    "model": "vibevoice/VibeVoice-1.5B",
    "voice": "alloy",
    "input": "Hello!",
    "response_format": "mp3"
  }' --output out.mp3

Streaming (SSE): Set "stream_format": "sse" in the request body to receive a stream of SSE events carrying base64-encoded PCM audio chunks. A JS example client is provided in scripts/js/openai_sse_client.mjs.
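
For Python clients, the SSE stream could be reassembled along these lines. This is a sketch under stated assumptions: the event payloads are assumed to be JSON objects with the base64 chunk in an `audio` field (mirroring OpenAI's `speech.audio.delta` events), and a `[DONE]` sentinel is tolerated if present. Check the bundled JS client for the actual event shape before relying on it.

```python
import base64
import json
from typing import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each SSE event; events end at a blank line."""
    buf = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("data:"):
            value = line[5:]
            # The SSE format strips one optional space after the colon.
            buf.append(value[1:] if value.startswith(" ") else value)
        elif line == "" and buf:
            yield "\n".join(buf)
            buf = []
    if buf:
        yield "\n".join(buf)


def pcm_from_events(lines: Iterable[str], audio_key: str = "audio") -> bytes:
    """Concatenate decoded PCM chunks; the field name is an assumption."""
    pcm = bytearray()
    for payload in iter_sse_data(lines):
        if payload == "[DONE]":
            break
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # skip keep-alives or non-JSON payloads
        chunk = event.get(audio_key) if isinstance(event, dict) else None
        if chunk:
            pcm.extend(base64.b64decode(chunk))
    return bytes(pcm)


# Hand-written events standing in for a real response stream.
demo_lines = [
    'data: {"type": "speech.audio.delta", "audio": "AAEC"}', "",
    'data: {"type": "speech.audio.delta", "audio": "AwQF"}', "",
    'data: {"type": "speech.audio.done"}', "",
]
print(len(pcm_from_events(demo_lines)))
```

With a real connection, `lines` would be the decoded text lines of the HTTP response body, read incrementally.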

Voice Mapping

Define stable, human-friendly voice names in a YAML file that is auto-loaded on each request. The file can declare aliases and list folders to scan automatically for voices (see the example below).

Search order (first found):

Path from VIBEVOICE_VOICE_MAP (relative to repo root or absolute)
./voice_map.yaml
./config/voice_map.yaml

Example (voice_map.yaml):

alloy: en-Frank_man
ash: en-Carter_man

aliases:
  promo_female: demo/voices/en-Alice_woman.wav

directories:
  - demo/custom_voices

Then call with voice: "alloy", or use extra_body.voice_path / extra_body.voice_data per request.
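
The search order above can be sketched as a small lookup function. This is not the server's implementation: the name `find_voice_map` is hypothetical, and resolving `./voice_map.yaml` and `./config/voice_map.yaml` against the repo root (rather than the working directory) is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def find_voice_map(repo_root: Path, env: Optional[Mapping[str, str]] = None) -> Optional[Path]:
    """Return the first voice map file that exists, following the documented order."""
    env = os.environ if env is None else env
    candidates = []
    override = env.get("VIBEVOICE_VOICE_MAP")
    if override:
        path = Path(override)
        # Relative overrides are resolved against the repo root.
        candidates.append(path if path.is_absolute() else repo_root / path)
    candidates.append(repo_root / "voice_map.yaml")
    candidates.append(repo_root / "config" / "voice_map.yaml")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


print(find_voice_map(Path.cwd(), env={}))
```

If the override points at a missing file, this sketch falls through to the default locations; whether the server does the same is not stated here.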

Formats

wav, pcm: native outputs (no extra dependencies).
mp3, opus, aac: require a working ffmpeg binary. Either ensure ffmpeg is on PATH or set VIBEVOICE_FFMPEG to the binary path.
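
Before requesting a compressed format, a deployment script might check whether ffmpeg is reachable. The sketch below mirrors the rule above using only the standard library; the function names are hypothetical, and the check only confirms that a file exists, not that it is a working ffmpeg build.

```python
import os
import shutil
from typing import Mapping, Optional

NATIVE_FORMATS = {"wav", "pcm"}
FFMPEG_FORMATS = {"mp3", "opus", "aac"}


def resolve_ffmpeg(env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Prefer VIBEVOICE_FFMPEG when set, otherwise look for ffmpeg on PATH."""
    env = os.environ if env is None else env
    explicit = env.get("VIBEVOICE_FFMPEG")
    if explicit:
        return explicit if os.path.isfile(explicit) else None
    return shutil.which("ffmpeg", path=env.get("PATH"))


def can_serve(fmt: str, env: Optional[Mapping[str, str]] = None) -> bool:
    """Report whether a response_format should be available in this environment."""
    fmt = fmt.lower()
    if fmt in NATIVE_FORMATS:
        return True
    if fmt in FFMPEG_FORMATS:
        return resolve_ffmpeg(env) is not None
    return False


for name in ("wav", "mp3"):
    print(name, can_serve(name))
```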

Authentication & Admin (optional)

By default, API-key auth is disabled. To enable:

export VIBEVOICE_REQUIRE_API_KEY=1

With auth enabled, include Authorization: Bearer <YOUR_API_KEY> in client requests.

Admin key management (requires VIBEVOICE_ADMIN_TOKEN; routes respect your <base_path> and default to /v1):

List stored key hashes

curl -sS -H "Authorization: Bearer $VIBEVOICE_ADMIN_TOKEN" \
  http://127.0.0.1:8000/v1/admin/keys

Create/import a key (omit body to auto-generate with the given prefix)

curl -sS -X POST -H "Authorization: Bearer $VIBEVOICE_ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prefix": "sk-"}' \
  http://127.0.0.1:8000/v1/admin/keys

Revoke a key by stored hash

curl -sS -X DELETE -H "Authorization: Bearer $VIBEVOICE_ADMIN_TOKEN" \
  http://127.0.0.1:8000/v1/admin/keys/<key_hash>
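
The same three admin calls can be prepared from Python with the standard library. This sketch only builds the requests and does not send them; the helper name `admin_request` and the placeholder values are assumptions, and the response format is not documented here, so no parsing is shown.

```python
import json
import urllib.request
from typing import Optional


def admin_request(method: str, admin_token: str, host: str = "http://127.0.0.1:8000",
                  base_path: str = "/v1", key_hash: Optional[str] = None,
                  body: Optional[dict] = None) -> urllib.request.Request:
    """Build (but do not send) a request against <base_path>/admin/keys."""
    url = f"{host}{base_path}/admin/keys"
    if key_hash is not None:
        url += f"/{key_hash}"
    headers = {"Authorization": f"Bearer {admin_token}"}
    data = None
    if body is not None:
        data = json.dumps(body).encode("utf-8")
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(url, data=data, headers=headers, method=method)


token = "example-admin-token"  # placeholder; use your VIBEVOICE_ADMIN_TOKEN value
list_req = admin_request("GET", token)
create_req = admin_request("POST", token, body={"prefix": "sk-"})
revoke_req = admin_request("DELETE", token, key_hash="abc123")  # placeholder hash

for req in (list_req, create_req, revoke_req):
    print(req.get_method(), req.full_url)

# To actually send one: urllib.request.urlopen(list_req)
```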

Logs are written under logs/ and can be configured via:

VIBEVOICE_LOG_DIR
VIBEVOICE_LOG_PROMPTS=1
VIBEVOICE_PROMPT_MAXLEN=4096

Notes

Only TTS (/audio/speech) is implemented; there are no STT endpoints.
Legacy root routes (e.g., /audio/speech, /metrics) remain for backwards compatibility, but new integrations should prefer the explicit <base_path>.

License

The source code and models are licensed under the MIT License. See the LICENSE file for details.

Note: Microsoft has removed the original repo and models. This fork is based on the MIT-licensed code from Microsoft.

Release history

0.0.1 (Sep 27, 2025)

Download files

Source Distribution

vibevoice_api-0.0.1.tar.gz (34.8 kB), uploaded Sep 27, 2025 via twine/6.0.1 CPython/3.12.10 (Trusted Publishing: No)

Algorithm	Hash digest
SHA256	`612a48efe1b957d58189761d98be31ff30937ce5d6bc1e6c19731c27fbeaa227`
MD5	`efd7aad0f46d3ccb88bfe1013c67b848`
BLAKE2b-256	`f959b690fa854702c7b01c2716ee3223294bf9ebecf6fa37322ae2d37dfcf529`

Built Distribution

vibevoice_api-0.0.1-py3-none-any.whl (39.8 kB, Python 3), uploaded Sep 27, 2025 via twine/6.0.1 CPython/3.12.10 (Trusted Publishing: No)

Algorithm	Hash digest
SHA256	`545988a9eaca9d5a8962028c565bb8d93bc63b334a9acd47643e9fb65f5d7d9d`
MD5	`7122e2526b9fa869d8d72800bcb05464`
BLAKE2b-256	`f4156e63313d77cbd10b5b0a763a7a840eb9be0f9d29bee5deafbe6236c672d6`
