Vox Box

A text-to-speech and speech-to-text server compatible with the OpenAI API, with Whisper, FunASR, Bark, Dia, and CosyVoice as backends.

Requirements

Installation

You can install the project using pip:

pip install vox-box

# On macOS, install `openfst`, `pynini`, and `wetextprocessing` manually after installing `vox-box` so that `cosyvoice` works:
brew install openfst
export CPLUS_INCLUDE_PATH=$(brew --prefix openfst)/include
export LIBRARY_PATH=$(brew --prefix openfst)/lib
pip install pynini==2.1.6
pip install wetextprocessing==1.0.4.1

Usage

vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir ./cache/data-dir --host 0.0.0.0 --port 80

# Windows
vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir C:\Users\michelia\AppData\Roaming\vox-box --host 0.0.0.0 --port 8082

Options

  • -d, --debug: Enable debug mode.
  • --host: Host to bind the server to. Default is 0.0.0.0.
  • --port: Port to bind the server to. Default is 80.
  • --model: Path to the model.
  • --device: Device to run the model on, e.g., cuda:0. Default is cpu.
  • --huggingface-repo-id: Hugging Face repo ID of the model.
  • --model-scope-model-id: ModelScope model ID of the model.
  • --data-dir: Directory to store downloaded model data. The default is OS-specific.
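The options above can also be assembled programmatically, for example when a test harness or supervisor script launches the server. The following is a minimal sketch, not part of Vox Box itself: the helper names `build_start_command` and `start_server` are hypothetical, and only the flags listed above are used.

```python
import subprocess


def build_start_command(repo_id, data_dir=None, host="0.0.0.0", port=80,
                        device=None, debug=False):
    """Return the argv list for `vox-box start`, using only documented flags."""
    cmd = ["vox-box", "start",
           "--huggingface-repo-id", repo_id,
           "--host", host,
           "--port", str(port)]
    if data_dir is not None:
        cmd += ["--data-dir", str(data_dir)]
    if device is not None:
        cmd += ["--device", device]
    if debug:
        cmd.append("--debug")
    return cmd


def start_server(cmd):
    """Launch the server as a child process; the caller must stop it later."""
    return subprocess.Popen(cmd)
```

Passing the arguments as a list avoids shell quoting problems with paths such as the Windows data directory shown in the Usage section. A call like `start_server(build_start_command("Systran/faster-whisper-small", data_dir="./cache/data-dir"))` would mirror the first usage example.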

Supported Models

Model | Type | Link | Verified Platforms
Faster-whisper-large-v3 | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, macOS ✅
Faster-whisper-large-v2 | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, macOS ✅
Faster-whisper-large-v1 | speech-to-text | Hugging Face, ModelScope |
Faster-whisper-medium | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, macOS ✅
Faster-whisper-medium.en | speech-to-text | Hugging Face, ModelScope |
Faster-whisper-small | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, macOS ✅
Faster-whisper-small.en | speech-to-text | Hugging Face, ModelScope |
Faster-distil-whisper-large-v3 | speech-to-text | Hugging Face, ModelScope | macOS ✅
Faster-distil-whisper-large-v2 | speech-to-text | Hugging Face, ModelScope | macOS ✅
Faster-distil-whisper-medium.en | speech-to-text | Hugging Face, ModelScope |
Faster-whisper-tiny | speech-to-text | Hugging Face, ModelScope |
Faster-whisper-tiny.en | speech-to-text | Hugging Face, ModelScope |
Paraformer-zh | speech-to-text | Hugging Face, ModelScope |
Paraformer-zh-streaming | speech-to-text | Hugging Face, ModelScope | Linux ✅, macOS ✅
Paraformer-en | speech-to-text | Hugging Face, ModelScope |
Conformer-en | speech-to-text | Hugging Face, ModelScope |
SenseVoiceSmall | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, macOS ✅
Bark | text-to-speech | Hugging Face | Linux ✅, Windows, macOS ✅
Bark-small | text-to-speech | Hugging Face | Linux ✅, Windows, macOS ✅
CosyVoice2-0.5B | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported) ✅, Windows (not supported), macOS ✅
CosyVoice-300M-Instruct | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported) ✅, Windows (not supported), macOS ✅
CosyVoice-300M-SFT | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported) ✅, Windows (not supported), macOS ✅
CosyVoice-300M | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported) ✅, Windows (not supported), macOS ✅
CosyVoice-300M-25Hz | text-to-speech | ModelScope | Linux (ARM not supported) ✅, Windows (not supported), macOS ✅
Dia-1.6B | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported) ✅, Windows (not supported), macOS ✅

Supported APIs

Create speech

Endpoint: POST /v1/audio/speech

Generates audio from the input text. Compatible with the OpenAI audio/speech API.

Example Request:

curl http://localhost/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cosyvoice",
    "input": "Hello world",
    "voice": "English Female"
  }' \
  --output speech.mp3

Response: The audio file content.
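For callers that prefer Python over curl, the same request can be sketched with the standard library alone. This is an illustrative sketch under assumptions: the helper names `build_speech_request` and `synthesize` are hypothetical, the server is assumed to be reachable at the given base URL, and the API key is treated as optional.

```python
import json
import urllib.request


def build_speech_request(base_url, model, text, voice, api_key=None):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"model": model, "input": text, "voice": voice}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def synthesize(base_url, model, text, voice, out_path, api_key=None):
    """Send the request and write the returned audio bytes to out_path."""
    request = build_speech_request(base_url, model, text, voice, api_key)
    with urllib.request.urlopen(request) as response:
        audio = response.read()
    with open(out_path, "wb") as handle:
        handle.write(audio)
    return len(audio)
```

With a server running on port 80, `synthesize("http://localhost", "cosyvoice", "Hello world", "English Female", "speech.mp3")` would correspond to the curl example above.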

Create transcription

Endpoint: POST /v1/audio/transcriptions

Transcribes audio into the input language. Compatible with the OpenAI audio/transcriptions API.

Example Request:

curl http://localhost/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-large-v3"

Response:

{
  "text": "Hello world."
}
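The transcription endpoint takes a multipart/form-data upload, which the Python standard library does not encode for you. The sketch below builds the body by hand to show what curl's `-F` flags produce. The helper names are hypothetical, the file part is labeled `application/octet-stream` as an assumption, and the response is assumed to carry the `text` field shown above.

```python
import json
import os
import urllib.request
import uuid


def encode_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one `file` part as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts.append(f"--{boundary}\r\n".encode())
        parts.append(
            f'Content-Disposition: form-data; name="{name}"\r\n\r\n'.encode())
        parts.append(f"{value}\r\n".encode())
    parts.append(f"--{boundary}\r\n".encode())
    parts.append(
        f'Content-Disposition: form-data; name="file"; '
        f'filename="{filename}"\r\n'.encode())
    parts.append(b"Content-Type: application/octet-stream\r\n\r\n")
    parts.append(file_bytes)
    parts.append(f"\r\n--{boundary}--\r\n".encode())
    return b"".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(base_url, model, filename, file_bytes,
                                api_key=None):
    """Build (but do not send) a POST request for /v1/audio/transcriptions."""
    body, content_type = encode_multipart({"model": model}, filename, file_bytes)
    headers = {"Content-Type": content_type}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body, headers=headers, method="POST")


def transcribe(base_url, model, audio_path, api_key=None):
    """Upload an audio file and return the `text` field of the JSON response."""
    with open(audio_path, "rb") as handle:
        audio = handle.read()
    request = build_transcription_request(
        base_url, model, os.path.basename(audio_path), audio, api_key)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["text"]
```

The boundary is generated per request so that it is unlikely to collide with bytes inside the audio file. A third-party HTTP client would hide this encoding step, but the hand-rolled version keeps the example dependency-free.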

List Models

Endpoint: GET /v1/models

Returns the currently running models.

Get Model

Endpoint: GET /v1/models/{model_id}

Returns the currently running model with the given model_id.

Get Voices

Endpoint: GET /v1/voices

Returns the voices supported by the currently running model.
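The three read-only endpoints (/v1/models, /v1/models/{model_id}, and /v1/voices) can be queried with plain GET requests. The sketch below is illustrative only: the helper names are hypothetical, the responses are assumed to be JSON (their exact shape is not documented here, so the parsed value is returned as-is), and percent-encoding the model ID is an assumption about how the server parses the path.

```python
import json
import urllib.parse
import urllib.request


def models_url(base_url):
    """URL of the List Models endpoint."""
    return base_url.rstrip("/") + "/v1/models"


def model_url(base_url, model_id):
    """URL of the Get Model endpoint; the ID is percent-encoded (assumption)."""
    return models_url(base_url) + "/" + urllib.parse.quote(model_id, safe="")


def voices_url(base_url):
    """URL of the Get Voices endpoint."""
    return base_url.rstrip("/") + "/v1/voices"


def get_json(url, api_key=None, timeout=10.0):
    """GET a URL and parse the body as JSON (assumed response format)."""
    request = urllib.request.Request(url, method="GET")
    if api_key:
        request.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())
```

For example, `get_json(voices_url("http://localhost"))` would fetch the voice list before choosing a `voice` value for a speech request.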

Health Check

Endpoint: GET /health

Returns the health check result of Vox Box.
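Because a model may take a while to download and load, a client can poll /health before sending real requests. The following sketch assumes that an HTTP 200 response means the server is ready, which the description above does not state explicitly. The function names and the retry parameters are hypothetical.

```python
import time
import urllib.request


def probe_health(base_url, timeout=2.0):
    """Return True if GET /health answers with HTTP 200 (assumed to mean ready)."""
    url = base_url.rstrip("/") + "/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except OSError:
        # URLError, HTTPError, timeouts and refused connections all derive
        # from OSError; any of them is treated as "not healthy yet".
        return False


def wait_until_healthy(base_url, attempts=30, delay=1.0,
                       probe=probe_health, sleep=time.sleep):
    """Poll the health endpoint up to `attempts` times, pausing between tries."""
    for attempt in range(attempts):
        if probe(base_url):
            return True
        if attempt < attempts - 1:
            sleep(delay)
    return False
```

The `probe` and `sleep` parameters are injectable so the retry loop can be exercised without a running server.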
