Vox Box
A text-to-speech and speech-to-text server compatible with the OpenAI API, backed by Whisper, FunASR, Bark, Dia, and CosyVoice.
Requirements
- Python 3.10 or greater
- For NVIDIA GPU support, the following NVIDIA libraries must be installed:
Installation
You can install the project using pip:
pip install vox-box
# On macOS, manually install `openfst`, `pynini`, and `wetextprocessing` after installing `vox-box` so that `cosyvoice` works:
brew install openfst
export CPLUS_INCLUDE_PATH=$(brew --prefix openfst)/include
export LIBRARY_PATH=$(brew --prefix openfst)/lib
pip install pynini==2.1.6
pip install wetextprocessing==1.0.4.1
Usage
vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir ./cache/data-dir --host 0.0.0.0 --port 80
# Windows
vox-box start --huggingface-repo-id Systran/faster-whisper-small --data-dir C:\Users\michelia\AppData\Roaming\vox-box --host 0.0.0.0 --port 8082
Options
- -d, --debug: Enable debug mode.
- --host: Host to bind the server to. Default is 0.0.0.0.
- --port: Port to bind the server to. Default is 80.
- --model: Path to the model.
- --device: Device to run the model on, e.g., cuda:0. Default is cpu.
- --huggingface-repo-id: Hugging Face repo ID of the model.
- --model-scope-model-id: ModelScope model ID of the model.
- --data-dir: Directory to store downloaded model data. Default is OS-specific.
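The options above can be combined programmatically. The following is a minimal sketch, assuming the flags are accepted by the `start` subcommand as in the Usage section; `build_start_command` is a hypothetical helper, not part of vox-box, and it only assembles the argument list without launching the server.

```python
import shlex


def build_start_command(repo_id, data_dir=None, host="0.0.0.0", port=80,
                        device="cpu", debug=False):
    """Assemble the argv list for `vox-box start` (hypothetical helper).

    Only flags listed in the Options section are used; the defaults mirror
    the documented ones.
    """
    cmd = [
        "vox-box", "start",
        "--huggingface-repo-id", repo_id,
        "--host", host,
        "--port", str(port),
        "--device", device,
    ]
    if data_dir is not None:
        # Omit the flag to fall back to the OS-specific default directory.
        cmd += ["--data-dir", data_dir]
    if debug:
        cmd.append("--debug")
    return cmd


cmd = build_start_command(
    "Systran/faster-whisper-small",
    data_dir="./cache/data-dir",
    device="cuda:0",
)
# Print a shell-safe command line; pass `cmd` to subprocess.run to launch it.
print(shlex.join(cmd))
```

Building the command as a list rather than a single string avoids quoting problems with paths that contain spaces, such as some Windows data directories.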
Supported Models
| Model | Type | Link | Verified Platforms |
|---|---|---|---|
| Faster-whisper-large-v3 | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, macOS ✅ |
| Faster-whisper-large-v2 | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, macOS ✅ |
| Faster-whisper-large-v1 | speech-to-text | Hugging Face, ModelScope | |
| Faster-whisper-medium | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, macOS ✅ |
| Faster-whisper-medium.en | speech-to-text | Hugging Face, ModelScope | |
| Faster-whisper-small | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, macOS ✅ |
| Faster-whisper-small.en | speech-to-text | Hugging Face, ModelScope | |
| Faster-distil-whisper-large-v3 | speech-to-text | Hugging Face, ModelScope | macOS ✅ |
| Faster-distil-whisper-large-v2 | speech-to-text | Hugging Face, ModelScope | macOS ✅ |
| Faster-distil-whisper-medium.en | speech-to-text | Hugging Face, ModelScope | |
| Faster-whisper-tiny | speech-to-text | Hugging Face, ModelScope | |
| Faster-whisper-tiny.en | speech-to-text | Hugging Face, ModelScope | |
| Paraformer-zh | speech-to-text | Hugging Face, ModelScope | |
| Paraformer-zh-streaming | speech-to-text | Hugging Face, ModelScope | Linux ✅, macOS ✅ |
| Paraformer-en | speech-to-text | Hugging Face, ModelScope | |
| Conformer-en | speech-to-text | Hugging Face, ModelScope | |
| SenseVoiceSmall | speech-to-text | Hugging Face, ModelScope | Linux ✅, Windows ✅, macOS ✅ |
| Bark | text-to-speech | Hugging Face | Linux ✅, Windows, macOS ✅ |
| Bark-small | text-to-speech | Hugging Face | Linux ✅, Windows, macOS ✅ |
| CosyVoice2-0.5B | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported) ✅, Windows (not supported), macOS ✅ |
| CosyVoice-300M-Instruct | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported) ✅, Windows (not supported), macOS ✅ |
| CosyVoice-300M-SFT | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported) ✅, Windows (not supported), macOS ✅ |
| CosyVoice-300M | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported) ✅, Windows (not supported), macOS ✅ |
| CosyVoice-300M-25Hz | text-to-speech | ModelScope | Linux (ARM not supported) ✅, Windows (not supported), macOS ✅ |
| Dia-1.6B | text-to-speech | Hugging Face, ModelScope | Linux (ARM not supported) ✅, Windows (not supported), macOS ✅ |
Supported APIs
Create speech
Endpoint: POST /v1/audio/speech
Generates audio from the input text. Compatible with the OpenAI audio/speech API.
Example Request:
curl http://localhost/v1/audio/speech \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "cosyvoice",
"input": "Hello world",
"voice": "English Female"
}' \
--output speech.mp3
Response: The audio file content.
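The same request can be issued from Python. This is a minimal sketch using only the standard library, assuming the server listens on `http://localhost` and accepts the JSON fields shown in the curl example; `build_speech_request` and `save_speech` are hypothetical helper names, and whether the server actually checks the `Authorization` header is not stated here, so the key is optional.

```python
import json
import urllib.request


def build_speech_request(text, model="cosyvoice", voice="English Female",
                         base_url="http://localhost", api_key=None):
    """Build (but do not send) a POST request for /v1/audio/speech."""
    payload = {"model": model, "input": text, "voice": voice}
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Mirrors the curl example; assumed optional for a local server.
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/audio/speech",
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def save_speech(text, out_path="speech.mp3", **kwargs):
    """Send the request and write the returned audio bytes to out_path."""
    req = build_speech_request(text, **kwargs)
    with urllib.request.urlopen(req) as resp, open(out_path, "wb") as f:
        f.write(resp.read())


req = build_speech_request("Hello world")
print(req.get_method(), req.full_url)
```

With a server running, `save_speech("Hello world")` would write the response body to `speech.mp3`. The voice name must be one the running model supports.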
Create transcription
Endpoint: POST /v1/audio/transcriptions
Transcribes audio into the input language. Compatible with the OpenAI audio/transcriptions API.
Example Request:
curl http://localhost/v1/audio/transcriptions \
-H "Authorization: Bearer $OPENAI_API_KEY" \
-H "Content-Type: multipart/form-data" \
-F file="@/path/to/file/audio.mp3" \
-F model="whisper-large-v3"
Response:
{
"text": "Hello world."
}
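A transcription request is a multipart form upload. The sketch below builds one with the Python standard library, assuming the same `file` and `model` form fields as the curl example and a JSON response with a `text` key; `build_transcription_request` and `transcribe` are hypothetical helpers, and the `application/octet-stream` part type is an assumption.

```python
import json
import os
import urllib.request
import uuid


def build_transcription_request(audio_bytes, filename="audio.mp3",
                                model="whisper-large-v3",
                                base_url="http://localhost", api_key=None):
    """Build (but do not send) a multipart POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex  # random separator between form fields
    model_part = (
        f"--{boundary}\r\n"
        'Content-Disposition: form-data; name="model"\r\n\r\n'
        f"{model}\r\n"
    ).encode("utf-8")
    file_header = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    closing = f"\r\n--{boundary}--\r\n".encode("utf-8")
    body = model_part + file_header + audio_bytes + closing

    headers = {"Content-Type": f"multipart/form-data; boundary={boundary}"}
    if api_key:
        headers["Authorization"] = f"Bearer {api_key}"
    return urllib.request.Request(
        f"{base_url}/v1/audio/transcriptions",
        data=body,
        headers=headers,
        method="POST",
    )


def transcribe(path, **kwargs):
    """Upload the audio file at `path` and return the transcribed text."""
    with open(path, "rb") as f:
        req = build_transcription_request(
            f.read(), filename=os.path.basename(path), **kwargs
        )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["text"]


# Placeholder bytes stand in for real audio so the request can be inspected.
req = build_transcription_request(b"fake-audio")
print(req.get_method(), req.full_url)
```

With a server running, `transcribe("/path/to/file/audio.mp3")` would return the `text` field of the response.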
List Models
Endpoint: GET /v1/models
Returns the currently running models.
Get Model
Endpoint: GET /v1/models/{model_id}
Returns the currently running model identified by model_id.
Get Voices
Endpoint: GET /v1/voices
Returns the voices supported by the currently running model.
Health Check
Endpoint: GET /health
Returns the health check result of Vox Box.
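The four GET endpoints above can be queried with a small helper. This is a sketch under stated assumptions: the server is reachable at `http://localhost`, and the response bodies are returned as raw text because their exact format is not described here. `endpoint_url` and `fetch` are hypothetical names.

```python
import urllib.request

# Paths taken from the endpoint list above.
ENDPOINTS = {
    "models": "/v1/models",
    "voices": "/v1/voices",
    "health": "/health",
}


def endpoint_url(name, base_url="http://localhost", model_id=None):
    """Return the full URL for one of the GET endpoints."""
    if name == "model":
        if model_id is None:
            raise ValueError("model_id is required for the 'model' endpoint")
        # model_id is inserted as-is; no URL escaping is applied.
        return f"{base_url}/v1/models/{model_id}"
    return base_url + ENDPOINTS[name]


def fetch(name, **kwargs):
    """Send the GET request and return the response body as text."""
    with urllib.request.urlopen(endpoint_url(name, **kwargs)) as resp:
        return resp.read().decode("utf-8")


for name in ("models", "voices", "health"):
    print(name, "->", endpoint_url(name))
```

Calling `fetch("health")` before sending speech or transcription requests is one way to confirm the server has finished loading its model.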