Vox Box

A text-to-speech and speech-to-text server compatible with the OpenAI API, with Whisper, FunASR, Bark, Dia, and CosyVoice as backends.

Requirements

Python 3.10 or greater
For NVIDIA GPU support, the following NVIDIA libraries must be installed:
- cuBLAS for CUDA 12
- cuDNN 9 for CUDA 12

Installation

You can install the project using pip:

pip install vox-box

# On macOS, manually install `openfst`, `pynini`, and `wetextprocessing` after installing `vox-box` so that `cosyvoice` works:
brew install openfst
export CPLUS_INCLUDE_PATH=$(brew --prefix openfst)/include
export LIBRARY_PATH=$(brew --prefix openfst)/lib
pip install pynini==2.1.6
pip install wetextprocessing==1.0.4.1

Usage

vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir ./cache/data-dir --host 0.0.0.0 --port 80

# Windows
vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir C:\Users\michelia\AppData\Roaming\vox-box --host 0.0.0.0 --port 8082

Options

-d, --debug: Enable debug mode.
--host: Host to bind the server to. Default is 0.0.0.0.
--port: Port to bind the server to. Default is 80.
--model: Path to the model.
--device: Device to bind to, e.g., cuda:0. Default is cpu.
--huggingface-repo-id: Hugging Face repo ID of the model.
--model-scope-model-id: ModelScope model ID of the model.
--data-dir: Directory to store downloaded model data. Default is OS specific.
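The options above can also be assembled programmatically, for example when launching the server from a supervisor script. The following is a minimal sketch, not part of Vox Box itself: `build_start_command` is a hypothetical helper, and it assumes the flags are accepted after the `start` subcommand as in the Usage examples.

```python
import shlex


def build_start_command(repo_id, data_dir=None, host="0.0.0.0", port=80,
                        device=None, debug=False):
    """Return the argv list for `vox-box start`, using only flags listed under Options."""
    cmd = ["vox-box", "start",
           "--huggingface-repo-id", repo_id,
           "--host", host,
           "--port", str(port)]
    if data_dir is not None:
        cmd += ["--data-dir", data_dir]   # otherwise the OS-specific default is used
    if device is not None:
        cmd += ["--device", device]       # e.g. "cuda:0"; the documented default is cpu
    if debug:
        cmd.append("--debug")
    return cmd


cmd = build_start_command("Systran/faster-whisper-small",
                          data_dir="./cache/data-dir", device="cuda:0")
print(shlex.join(cmd))  # shell-quoted form of the command, for logging
```

The returned list can be passed directly to `subprocess.Popen(cmd)`, which avoids shell quoting problems with paths such as the Windows data directory shown under Usage.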

Supported Models

Model	Type	Link	Verified Platforms
Faster-whisper-large-v3	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, MacOS ✅
Faster-whisper-large-v2	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, MacOS ✅
Faster-whisper-large-v1	speech-to-text	Hugging Face, ModelScope
Faster-whisper-medium	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, MacOS ✅
Faster-whisper-medium.en	speech-to-text	Hugging Face, ModelScope
Faster-whisper-small	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, MacOS ✅
Faster-whisper-small.en	speech-to-text	Hugging Face, ModelScope
Faster-distil-whisper-large-v3	speech-to-text	Hugging Face, ModelScope	MacOS ✅
Faster-distil-whisper-large-v2	speech-to-text	Hugging Face, ModelScope	MacOS ✅
Faster-distil-whisper-medium.en	speech-to-text	Hugging Face, ModelScope
Faster-whisper-tiny	speech-to-text	Hugging Face, ModelScope
Faster-whisper-tiny.en	speech-to-text	Hugging Face, ModelScope
Paraformer-zh	speech-to-text	Hugging Face, ModelScope
Paraformer-zh-streaming	speech-to-text	Hugging Face, ModelScope	Linux ✅, MacOS ✅
Paraformer-en	speech-to-text	Hugging Face, ModelScope
Conformer-en	speech-to-text	Hugging Face, ModelScope
SenseVoiceSmall	speech-to-text	Hugging Face, ModelScope	Linux ✅, Windows ✅, MacOS ✅
Bark	text-to-speech	Hugging Face	Linux ✅, Windows, MacOS ✅
Bark-small	text-to-speech	Hugging Face	Linux ✅, Windows, MacOS ✅
CosyVoice2-0.5B	text-to-speech	Hugging Face, ModelScope	Linux(ARM not supported) ✅, Windows(Not supported), macOS ✅
CosyVoice-300M-Instruct	text-to-speech	Hugging Face, ModelScope	Linux(ARM not supported) ✅, Windows(Not supported), macOS ✅
CosyVoice-300M-SFT	text-to-speech	Hugging Face, ModelScope	Linux(ARM not supported) ✅, Windows(Not supported), macOS ✅
CosyVoice-300M	text-to-speech	Hugging Face, ModelScope	Linux(ARM not supported) ✅, Windows(Not supported), macOS ✅
CosyVoice-300M-25Hz	text-to-speech	ModelScope	Linux(ARM not supported) ✅, Windows(Not supported), macOS ✅
Dia-1.6B	text-to-speech	Hugging Face, ModelScope	Linux(ARM not supported) ✅, Windows(Not supported), macOS ✅

Supported APIs

Create speech

Endpoint: POST /v1/audio/speech

Generates audio from the input text. Compatible with the OpenAI audio/speech API.

Example Request:

curl http://localhost/v1/audio/speech \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "cosyvoice",
    "input": "Hello world",
    "voice": "English Female"
  }' \
  --output speech.mp3

Response: The audio file content.
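The same request can be made from Python with only the standard library. This is a sketch under assumptions: the helper names `build_speech_request` and `save_speech` are hypothetical, the server is assumed to be reachable at `http://localhost`, and the model and voice values are copied from the curl example above.

```python
import json
import urllib.request


def build_speech_request(text, model="cosyvoice", voice="English Female",
                         base_url="http://localhost", api_key=None):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"model": model, "input": text, "voice": voice}
    headers = {"Content-Type": "application/json"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def save_speech(text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    req = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())
    return out_path
```

Calling `save_speech("Hello world")` against a running server should produce the same `speech.mp3` file as the curl command.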

Create transcription

Endpoint: POST /v1/audio/transcriptions

Transcribes audio into the input language. Compatible with the OpenAI audio/transcriptions API.

Example Request:

curl http://localhost/v1/audio/transcriptions \
  -H "Authorization: Bearer $OPENAI_API_KEY" \
  -H "Content-Type: multipart/form-data" \
  -F file="@/path/to/file/audio.mp3" \
  -F model="whisper-large-v3"

Response:

{
  "text": "Hello world."
}
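For clients that cannot shell out to curl, the multipart upload can be built by hand with the Python standard library. The sketch below is illustrative only: `encode_multipart`, `build_transcription_request`, and `transcribe` are hypothetical helpers, the base URL is assumed, and the `text` key is taken from the example response above.

```python
import json
import os
import urllib.request
import uuid


def encode_multipart(fields, filename, file_bytes, boundary=None):
    """Encode text fields plus one part named "file" as multipart/form-data."""
    boundary = boundary or uuid.uuid4().hex
    parts = []
    for name, value in fields.items():
        parts += [
            f"--{boundary}".encode(),
            f'Content-Disposition: form-data; name="{name}"'.encode(),
            b"",
            str(value).encode("utf-8"),
        ]
    parts += [
        f"--{boundary}".encode(),
        f'Content-Disposition: form-data; name="file"; filename="{filename}"'.encode(),
        b"Content-Type: application/octet-stream",
        b"",
        file_bytes,
        f"--{boundary}--".encode(),  # closing boundary
        b"",
    ]
    return b"\r\n".join(parts), f"multipart/form-data; boundary={boundary}"


def build_transcription_request(model, filename, file_bytes,
                                base_url="http://localhost", api_key=None):
    """Build (but do not send) a POST request for /v1/audio/transcriptions."""
    body, content_type = encode_multipart({"model": model}, filename, file_bytes)
    headers = {"Content-Type": content_type}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=body, headers=headers, method="POST",
    )


def transcribe(path, model="whisper-large-v3", **kwargs):
    """Upload the audio file at path and return the "text" field of the response."""
    with open(path, "rb") as f:
        audio = f.read()
    req = build_transcription_request(model, os.path.basename(path), audio, **kwargs)
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]
```

The boundary is generated randomly per request so that it is very unlikely to collide with bytes inside the audio file.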

List Models

Endpoint: GET /v1/models

Returns the currently running models.

Get Model

Endpoint: GET /v1/models/{model_id}

Returns the currently running model identified by model_id.

Get Voices

Endpoint: GET /v1/voices

Returns the supported voices for the currently running model.

Health Check

Endpoint: GET /health

Returns the health check result for Vox Box.
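The four GET endpoints (models, a single model, voices, and health) can be queried with a small standard-library client. The following is a rough sketch: all helper names are hypothetical, the response formats are not specified in this document, so the code assumes JSON bodies for the `/v1/...` endpoints and treats an HTTP 200 from `/health` as healthy.

```python
import json
import urllib.parse
import urllib.request


def endpoint_url(base_url, *segments):
    """Join base_url with percent-encoded path segments."""
    path = "/".join(urllib.parse.quote(s, safe="") for s in segments)
    return base_url.rstrip("/") + "/" + path


def get(base_url, *segments):
    """Send a GET request and return (status_code, body_bytes)."""
    with urllib.request.urlopen(endpoint_url(base_url, *segments)) as resp:
        return resp.status, resp.read()


def list_models(base_url="http://localhost"):
    # Assumption: the body is JSON, as in the OpenAI-compatible endpoints.
    return json.loads(get(base_url, "v1", "models")[1])


def get_model(model_id, base_url="http://localhost"):
    return json.loads(get(base_url, "v1", "models", model_id)[1])


def list_voices(base_url="http://localhost"):
    return json.loads(get(base_url, "v1", "voices")[1])


def is_healthy(base_url="http://localhost"):
    # Assumption: a reachable server answering 200 on /health is healthy.
    try:
        status, _ = get(base_url, "health")
    except OSError:  # URLError and HTTPError are both OSError subclasses
        return False
    return status == 200
```

Polling `is_healthy()` after starting the server is one way to wait until the model has finished loading before sending speech or transcription requests.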

Release history

Version	Release date
0.0.21 (this version)	Dec 23, 2025
0.0.20	Jul 19, 2025
0.0.19	Jul 18, 2025
0.0.18	Jul 9, 2025
0.0.17	Jun 8, 2025
0.0.16	Jun 3, 2025
0.0.15	May 26, 2025
0.0.14	May 14, 2025
0.0.13	Apr 15, 2025
0.0.12	Apr 14, 2025
0.0.11	Jan 14, 2025
0.0.10	Jan 2, 2025
0.0.9	Dec 12, 2024
0.0.8	Dec 12, 2024
0.0.7	Dec 3, 2024
0.0.6	Dec 2, 2024
0.0.5	Nov 28, 2024
0.0.4	Nov 28, 2024
0.0.3	Nov 27, 2024
0.0.2	Nov 27, 2024
0.0.1	Nov 22, 2024
0.0.0	Nov 28, 2024

Download files

Source Distribution

File	Size	Uploaded	Tags
vox_box-0.0.21.tar.gz	2.1 MB	Dec 23, 2025	Source

Built Distributions

File	Size	Uploaded	Tags
vox_box-0.0.21-py3-none-manylinux2014_x86_64.whl	2.2 MB	Dec 23, 2025	Python 3
vox_box-0.0.21-py3-none-manylinux2014_aarch64.whl	2.2 MB	Dec 23, 2025	Python 3

File details

Details for the file vox_box-0.0.21.tar.gz.

File metadata

Download URL: vox_box-0.0.21.tar.gz
Upload date: Dec 23, 2025
Size: 2.1 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.14

File hashes

Hashes for vox_box-0.0.21.tar.gz
Algorithm	Hash digest
SHA256	`941e379243cf3f7f3cd1a095946e0184c69758c67e90306d626c557c71411816`
MD5	`dfaa5df456011d7422d906fb0287b7bd`
BLAKE2b-256	`a29b588fab6951a1ab6f901179ba23b741587314970efec8a0d0f3fb2097c17c`
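A downloaded file can be checked against the SHA256 digest listed above before installing it. This is a minimal sketch using Python's `hashlib`; the helper names `sha256_of` and `verify_sha256` are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of the file at path, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path, expected_hex):
    """Return True if the file's SHA256 digest matches expected_hex."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify_sha256("vox_box-0.0.21.tar.gz", "941e379243cf3f7f3cd1a095946e0184c69758c67e90306d626c557c71411816")` should return `True` for an intact copy of the source distribution.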


File details

Details for the file vox_box-0.0.21-py3-none-manylinux2014_x86_64.whl.

File metadata

Download URL: vox_box-0.0.21-py3-none-manylinux2014_x86_64.whl
Upload date: Dec 23, 2025
Size: 2.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.14

File hashes

Hashes for vox_box-0.0.21-py3-none-manylinux2014_x86_64.whl
Algorithm	Hash digest
SHA256	`6da6ab33604e99f5a1b5766a36f5dc7870908cc90e8a8a815f314c471853425c`
MD5	`054fbfb5a5cd52b67c3fff442d65b0ff`
BLAKE2b-256	`725f05390954b1141f4b307bdd96c2eb388bf45722437dc690940a7142972597`


File details

Details for the file vox_box-0.0.21-py3-none-manylinux2014_aarch64.whl.

File metadata

Download URL: vox_box-0.0.21-py3-none-manylinux2014_aarch64.whl
Upload date: Dec 23, 2025
Size: 2.2 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.1.1 CPython/3.11.14

File hashes

Hashes for vox_box-0.0.21-py3-none-manylinux2014_aarch64.whl
Algorithm	Hash digest
SHA256	`be41621db841dfa5fdf3376f651fbb91006a25ffc36ec4da4be7e10cd5dc535b`
MD5	`a180bc2383474c5e3c1106c346786997`
BLAKE2b-256	`850cb6c41274c274b90d69aa76261c858ce7d07fd96ecbda6229249119a27935`
