A model for speech generation with an AR + diffusion architecture.

VibeVoice OpenAI-Compatible TTS API

An OpenAI-compatible text-to-speech (TTS) API server for VibeVoice.

Community

Join the unofficial Discord community: https://discord.gg/ZDEYTTRxWG - share samples, ask questions, discuss fine-tuning, etc.

Installation

git clone https://github.com/vibevoice-community/VibeVoice-API
cd VibeVoice-API/

uv pip install -e .

Model Zoo

Model            Context Length  Generation Length  Weight
VibeVoice-1.5B   64K             ~90 min            HF link
VibeVoice-Large  32K             ~45 min            HF link

Getting Started

Run a local server that is compatible with the OpenAI audio API (client.audio.speech.create). It wraps VibeVoice to synthesize speech from text.

Start the server

python -m vibevoice_api.server --model_path vibevoice/VibeVoice-1.5B --port 8000

API base path (default: /v1)

All routes are mounted on /v1 by default. To override the prefix, set VIBEVOICE_API_BASE_PATH (leading slash required) before launching the server:

export VIBEVOICE_API_BASE_PATH=/api
python -m vibevoice_api.server --model_path vibevoice/VibeVoice-1.5B --port 8000

Clients must include the same prefix when constructing URLs. The static console is served at <base_path>/web/console.html.
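
As a rough illustration of the prefix rule above, a client can build every URL from one helper so that the host, base path, and route stay consistent. The helper name api_url and its default host are assumptions made for this sketch; they are not part of the package.

```python
import os
from typing import Optional


def api_url(route: str,
            host: str = "http://127.0.0.1:8000",
            base_path: Optional[str] = None) -> str:
    """Join host, base path, and route into one URL (hypothetical helper)."""
    if base_path is None:
        # Fall back to the same variable the server reads, then to /v1.
        base_path = os.environ.get("VIBEVOICE_API_BASE_PATH", "/v1")
    if not base_path.startswith("/"):
        raise ValueError("base path must start with a leading slash")
    return f"{host.rstrip('/')}{base_path.rstrip('/')}/{route.lstrip('/')}"


speech_url = api_url("audio/speech", base_path="/api")
console_url = api_url("web/console.html", base_path="/v1")
print(speech_url)
print(console_url)
```

Reading the prefix from the environment keeps client and server in sync when both run in the same shell; pass base_path explicitly otherwise.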

Endpoints

POST <base_path>/audio/speech

Synthesize speech from text.

Request fields (OpenAI-compatible):

  • model (string): model id or local path (e.g., vibevoice/VibeVoice-1.5B).
  • voice (string): name mapped to a reference voice, a filesystem path (prefix with path: or absolute), or an alias from a voice map.
  • input (string): the input text.
  • response_format (string): wav, pcm (native), or mp3 / opus / aac (require ffmpeg).
  • stream_format (string, optional): set to sse for Server-Sent Events (streamed base64 PCM chunks).
  • extra_body (object, optional):
    • voice_path: absolute/relative path to a reference audio file.
    • voice_data: base64-encoded WAV bytes (optionally as a data URL).

Python example (OpenAI SDK ≥ 1.40):

from openai import OpenAI

base_path = "/v1"  # or your VIBEVOICE_API_BASE_PATH
client = OpenAI(base_url=f"http://127.0.0.1:8000{base_path}", api_key="<YOUR_API_KEY>")

speech = client.audio.speech.create(
    model="vibevoice/VibeVoice-1.5B",
    voice="Andrew",
    input="Hello from VibeVoice!",
    response_format="wav",
)

with open("out.wav", "wb") as f:
    f.write(speech.read())
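
To supply a reference voice inline through voice_data, the WAV bytes have to be base64-encoded first. The sketch below shows one way to do that. The function name encode_voice_data is hypothetical, and the audio/wav MIME type in the data URL is an assumption, since the accepted data URL form is not spelled out here.

```python
import base64


def encode_voice_data(wav_bytes: bytes, as_data_url: bool = False) -> str:
    """Base64-encode WAV bytes, optionally wrapped as a data URL."""
    encoded = base64.b64encode(wav_bytes).decode("ascii")
    if as_data_url:
        # Assumed MIME type; adjust if the server expects something else.
        return f"data:audio/wav;base64,{encoded}"
    return encoded


plain = encode_voice_data(b"abc")
wrapped = encode_voice_data(b"abc", as_data_url=True)
print(plain)
print(wrapped)
```

With the OpenAI SDK, the result would typically be passed as extra_body={"voice_data": ...} in the client.audio.speech.create call, with real file contents read via open(path, "rb").read() in place of the placeholder bytes.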

Pure HTTP example (cURL):

curl -X POST "http://127.0.0.1:8000/v1/audio/speech" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer <YOUR_API_KEY>" \
  -d '{
    "model": "vibevoice/VibeVoice-1.5B",
    "voice": "alloy",
    "input": "Hello!",
    "response_format": "mp3"
  }' --output out.mp3

Streaming (SSE): Set "stream_format": "sse" in the request body to receive a stream of SSE events carrying base64-encoded PCM audio chunks. A JS example client is provided in scripts/js/openai_sse_client.mjs.
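
For readers working in Python rather than JS, the following sketch shows how base64 PCM chunks could be pulled out of an SSE stream. The event layout is an assumption: it presumes each data: line carries a JSON object with the chunk in an "audio" field, as in OpenAI-style speech.audio.delta events. Check the server's actual events (or the JS client) before relying on it. The function name collect_pcm is hypothetical.

```python
import base64
import json
from typing import Iterable


def collect_pcm(sse_lines: Iterable[str], audio_field: str = "audio") -> bytes:
    """Concatenate base64 PCM chunks found in SSE "data:" lines."""
    pcm = bytearray()
    for line in sse_lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators, comments, and "event:" lines
        payload = line[len("data:"):].strip()
        try:
            event = json.loads(payload)
        except json.JSONDecodeError:
            continue  # ignore non-JSON payloads such as end-of-stream markers
        if not isinstance(event, dict):
            continue
        chunk = event.get(audio_field)  # assumed field name
        if chunk:
            pcm.extend(base64.b64decode(chunk))
    return bytes(pcm)


example_stream = [
    'data: {"type": "speech.audio.delta", "audio": "AAEC"}',
    "",
    'data: {"type": "speech.audio.done"}',
]
audio = collect_pcm(example_stream)
print(len(audio), "bytes of PCM")
```

The function accepts any iterable of text lines, so it can be fed from a streaming HTTP response's line iterator. The result is raw PCM; to play it, wrap it in a WAV container using the sample rate and sample width the server produces.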

Voice Mapping

You can define stable, human-friendly voice names in a YAML voice map that the server auto-loads on each request. The map can declare aliases for individual voices and list directories to scan for voice files automatically.

Search order (first found):

  1. Path from VIBEVOICE_VOICE_MAP (relative to repo root or absolute)
  2. ./voice_map.yaml
  3. ./config/voice_map.yaml
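
The search order above can be sketched as a small lookup function. This is an illustration only, not the server's code: the name find_voice_map is hypothetical, "./" is taken to mean the repo root, and falling through to the defaults when the VIBEVOICE_VOICE_MAP path does not exist is an assumption.

```python
import os
import tempfile
from pathlib import Path
from typing import Mapping, Optional


def find_voice_map(repo_root: Path,
                   env: Mapping[str, str] = os.environ) -> Optional[Path]:
    """Return the first existing voice map in the documented search order."""
    candidates = []
    env_path = env.get("VIBEVOICE_VOICE_MAP")
    if env_path:
        path = Path(env_path)
        # Relative paths are resolved against the repo root.
        candidates.append(path if path.is_absolute() else repo_root / path)
    candidates.append(repo_root / "voice_map.yaml")
    candidates.append(repo_root / "config" / "voice_map.yaml")
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


# Demo in a throwaway directory with only config/voice_map.yaml present.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "config").mkdir()
    (root / "config" / "voice_map.yaml").write_text("alloy: en-Frank_man\n")
    found_rel = find_voice_map(root, env={}).relative_to(root)
    # A map at the repo root takes precedence once it exists.
    (root / "voice_map.yaml").write_text("ash: en-Carter_man\n")
    found_root_rel = find_voice_map(root, env={}).relative_to(root)
    print(found_rel, found_root_rel)
```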

Example (voice_map.yaml):

alloy: en-Frank_man
ash: en-Carter_man

aliases:
  promo_female: demo/voices/en-Alice_woman.wav

directories:
  - demo/custom_voices

Then call with voice: "alloy", or use extra_body.voice_path / extra_body.voice_data per request.

Formats

  • wav, pcm: native outputs (no extra dependencies).
  • mp3, opus, aac: require a working ffmpeg binary. Either ensure ffmpeg is on PATH or set VIBEVOICE_FFMPEG to the binary path.
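
Before requesting a compressed format, it can help to check that ffmpeg is discoverable on the machine that runs the server. The sketch below follows the lookup described above (VIBEVOICE_FFMPEG first, then PATH). The function names are hypothetical, and treating an invalid VIBEVOICE_FFMPEG path as "not found" rather than falling back to PATH is an assumption.

```python
import os
import shutil
from typing import Mapping, Optional

NATIVE_FORMATS = {"wav", "pcm"}          # no extra dependencies
FFMPEG_FORMATS = {"mp3", "opus", "aac"}  # need an ffmpeg binary


def find_ffmpeg(env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Locate ffmpeg via VIBEVOICE_FFMPEG, falling back to PATH."""
    explicit = env.get("VIBEVOICE_FFMPEG")
    if explicit:
        return explicit if os.path.isfile(explicit) else None
    return shutil.which("ffmpeg")


def format_available(response_format: str,
                     env: Mapping[str, str] = os.environ) -> bool:
    """Report whether a response_format should be servable locally."""
    fmt = response_format.lower()
    if fmt in NATIVE_FORMATS:
        return True
    if fmt in FFMPEG_FORMATS:
        return find_ffmpeg(env) is not None
    raise ValueError(f"unknown response_format: {response_format!r}")


for fmt in ("wav", "mp3"):
    print(fmt, format_available(fmt))
```

This check is only meaningful when run in the server's environment; a remote client cannot see the server's PATH.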

Authentication & Admin (optional)

By default, API-key auth is disabled. To enable:

export VIBEVOICE_REQUIRE_API_KEY=1

With auth enabled, include Authorization: Bearer <YOUR_API_KEY> in client requests.
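
For clients that do not use the OpenAI SDK, the header can be attached by hand. The following is a minimal standard-library sketch that builds (but does not send) a speech request. The function name build_speech_request and the default base URL are assumptions for illustration.

```python
import json
import urllib.request
from typing import Optional


def build_speech_request(text: str,
                         voice: str,
                         model: str = "vibevoice/VibeVoice-1.5B",
                         response_format: str = "wav",
                         api_key: Optional[str] = None,
                         base_url: str = "http://127.0.0.1:8000/v1"
                         ) -> urllib.request.Request:
    """Build a POST request for <base_path>/audio/speech."""
    body = {
        "model": model,
        "voice": voice,
        "input": text,
        "response_format": response_format,
    }
    request = urllib.request.Request(
        f"{base_url}/audio/speech",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
    )
    request.add_header("Content-Type", "application/json")
    if api_key:
        # Only needed when the server runs with VIBEVOICE_REQUIRE_API_KEY=1.
        request.add_header("Authorization", f"Bearer {api_key}")
    return request


req = build_speech_request("Hello!", "alloy", api_key="<YOUR_API_KEY>")
print(req.get_method(), req.full_url)
```

Sending it is then a matter of calling urllib.request.urlopen(req) and writing the response bytes to a file.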

Admin key management (requires VIBEVOICE_ADMIN_TOKEN; routes respect your <base_path> and default to /v1):

List stored key hashes

curl -sS -H "Authorization: Bearer $VIBEVOICE_ADMIN_TOKEN" \
  http://127.0.0.1:8000/v1/admin/keys

Create/import a key (omit body to auto-generate with the given prefix)

curl -sS -X POST -H "Authorization: Bearer $VIBEVOICE_ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"prefix": "sk-"}' \
  http://127.0.0.1:8000/v1/admin/keys

Revoke a key by stored hash

curl -sS -X DELETE -H "Authorization: Bearer $VIBEVOICE_ADMIN_TOKEN" \
  http://127.0.0.1:8000/v1/admin/keys/<key_hash>
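
The three admin calls above can also be scripted from Python. The sketch below only constructs the requests, mirroring the cURL commands. The helper name admin_request and the placeholder token are hypothetical, and response formats are not assumed.

```python
import json
import urllib.request
from typing import Optional


def admin_request(method: str,
                  admin_token: str,
                  key_hash: Optional[str] = None,
                  body: Optional[dict] = None,
                  base_url: str = "http://127.0.0.1:8000/v1"
                  ) -> urllib.request.Request:
    """Build a request against <base_path>/admin/keys."""
    url = f"{base_url}/admin/keys"
    if key_hash:
        url += f"/{key_hash}"  # target one stored key hash
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(url, data=data, method=method)
    request.add_header("Authorization", f"Bearer {admin_token}")
    if data is not None:
        request.add_header("Content-Type", "application/json")
    return request


token = "example-admin-token"  # placeholder; use VIBEVOICE_ADMIN_TOKEN
list_keys = admin_request("GET", token)
create_key = admin_request("POST", token, body={"prefix": "sk-"})
revoke_key = admin_request("DELETE", token, key_hash="<key_hash>")
for r in (list_keys, create_key, revoke_key):
    print(r.get_method(), r.full_url)
```

Each request can be sent with urllib.request.urlopen; adjust base_url if the server uses a custom VIBEVOICE_API_BASE_PATH.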

Logs are written under logs/ and can be configured via:

  • VIBEVOICE_LOG_DIR
  • VIBEVOICE_LOG_PROMPTS=1
  • VIBEVOICE_PROMPT_MAXLEN=4096

Notes

  • Only TTS (/audio/speech) is implemented; there are no STT endpoints.
  • Legacy root routes (e.g., /audio/speech, /metrics) remain for backwards compatibility, but new integrations should prefer the explicit <base_path>.

License

The source code and models are licensed under the MIT License. See the LICENSE file for details.

Note: Microsoft has removed the original repo and models. This fork is based on the MIT-licensed code that Microsoft originally released.
