A Speech-to-Text API.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3 :: Only
- Python :: 3.11

Project description

speech-recognition-inference

This repository provides a command-line tool for automatic speech-to-text transcription using open-source models from the Hugging Face Hub. For interactive use, it can launch an inference API; for bulk processing, it provides a pipeline that runs inference on the contents of a specified directory.

Quickstart

Pip

First, install ffmpeg if it is not already installed on your machine.

Next, clone the repository and install it:

git clone https://github.com/princeton-ddss/speech-recognition-inference.git
cd speech-recognition-inference
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install . # or pip install -e . for development

To start the API from the command line, run:

speech_recognition launch \
  --port 8000 \
  --model-id openai/whisper-tiny \
  --model-dir $HOME/.cache/huggingface/hub
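If you script against the API, you may want to wait until the server accepts connections before sending requests. The following is a minimal sketch using only the Python standard library; `wait_for_port` is a hypothetical helper, not part of speech-recognition-inference. An open port only shows that the server is listening; it does not by itself confirm that the model has finished loading.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 0.5) -> bool:
    """Hypothetical helper: return True once host:port accepts a TCP connection,
    or False if `timeout` seconds pass first."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: the server is not up yet, retry shortly.
            time.sleep(interval)
    return False
```

For example, `wait_for_port("localhost", 8000, timeout=120)` returns `True` once the API launched above is listening on port 8000.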

Once the application startup is complete, you can submit requests using any HTTP request library or tool, e.g.,

curl localhost:8000/transcribe \
  -X POST \
  -d '{"audio_file": "/tmp/female.wav", "response_format": "json"}' \
  -H 'Content-Type: application/json'
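The same request can be sent from Python. This is a minimal sketch using the standard library's `urllib.request`; `build_transcribe_request` and `transcribe` are hypothetical helper names. As in the curl example, `audio_file` is a path that the server itself must be able to read (see the Docker notes below), and the sketch assumes the response body is JSON when `response_format` is `"json"`.

```python
import json
import urllib.request
from typing import Any


def build_transcribe_request(
    audio_file: str,
    response_format: str = "json",
    base_url: str = "http://localhost:8000",
) -> urllib.request.Request:
    """Hypothetical helper: build a POST /transcribe request mirroring the curl example."""
    payload = {"audio_file": audio_file, "response_format": response_format}
    return urllib.request.Request(
        url=f"{base_url}/transcribe",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def transcribe(audio_file: str, base_url: str = "http://localhost:8000") -> Any:
    """Hypothetical helper: send the request and decode the body.
    Assumes the server replies with JSON for response_format="json"."""
    request = build_transcribe_request(audio_file, "json", base_url)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the API running, `transcribe("/tmp/female.wav")` sends the same request as the curl command above.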

To process a directory of audio files in batch, run the pipeline command:

speech_recognition pipeline /data/audio \
  --model-id openai/whisper-tiny \
  --model-dir $HOME/.cache/huggingface/hub
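To drive the pipeline from a Python script (for example, to process several directories in turn), you can call the CLI with `subprocess`. This sketch uses only the options shown above; `pipeline_command` and `run_pipeline` are hypothetical helpers, not part of the package.

```python
import shlex
import subprocess


def pipeline_command(audio_dir: str, model_id: str, model_dir: str) -> list[str]:
    """Hypothetical helper: build the argv for `speech_recognition pipeline`
    using only the options documented in this README."""
    return [
        "speech_recognition",
        "pipeline",
        audio_dir,
        "--model-id",
        model_id,
        "--model-dir",
        model_dir,
    ]


def run_pipeline(audio_dir: str, model_id: str, model_dir: str) -> None:
    """Hypothetical helper: run the batch pipeline, raising CalledProcessError on failure."""
    cmd = pipeline_command(audio_dir, model_id, model_dir)
    print("Running:", shlex.join(cmd))
    subprocess.run(cmd, check=True)
```

Unlike a shell, `subprocess.run` does not expand `$HOME`, so pass an already expanded path such as `os.path.expanduser("~/.cache/huggingface/hub")` as `model_dir`.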

Docker

We also provide a Docker image, ghcr.io/princeton-ddss/speech-recognition-inference. To run the API via Docker, use the following command:

docker run \
  -p 8000:8000 \
  -v $HOME/.cache/huggingface/hub:/data/models \
  -v /tmp:/data/audio \
  ghcr.io/princeton-ddss/speech-recognition-inference:latest \
  launch \
  --port 8000 \
  --model-id openai/whisper-large-v3 \
  --model-dir /data/models

This command publishes the API on localhost:8000 and makes the host's model and audio files available to the container via bind mounts. Requests are sent the same way as above, but the container can only access bind-mounted files, so in this example requests must refer to /data/audio instead of /tmp:

curl localhost:8000/transcribe \
  -X POST \
  -d '{"audio_file": "/data/audio/female.wav", "response_format": "json"}' \
  -H 'Content-Type: application/json'
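Because the container only sees bind-mounted files, a client running on the host has to translate host paths into container paths. The sketch below does this for a dict of `{host_dir: container_dir}` mounts mirroring the `-v` flags; `to_container_path` is a hypothetical helper, and it assumes POSIX-style paths on both sides.

```python
from pathlib import PurePosixPath


def to_container_path(host_path: str, mounts: dict[str, str]) -> str:
    """Hypothetical helper: map a host path to its container path.

    `mounts` mirrors docker's `-v host_dir:container_dir` flags (POSIX paths assumed).
    Raises ValueError if the path is not under any mounted host directory.
    """
    path = PurePosixPath(host_path)
    # Try the most specific (deepest) host directory first, in case mounts are nested.
    for host_dir in sorted(mounts, key=lambda d: len(PurePosixPath(d).parts), reverse=True):
        try:
            relative = path.relative_to(host_dir)
        except ValueError:
            continue  # host_path is not inside this mount
        return str(PurePosixPath(mounts[host_dir]) / relative)
    raise ValueError(f"{host_path} is not inside any bind-mounted directory")
```

For the command above, `to_container_path("/tmp/female.wav", {"/tmp": "/data/audio"})` gives `/data/audio/female.wav`, the path used in the request.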

To run batch processing via Docker, replace the launch command with pipeline and update the options:

docker run \
  -v $HOME/.cache/huggingface/hub:/data/models \
  -v /tmp:/data/audio \
  ghcr.io/princeton-ddss/speech-recognition-inference:latest \
  pipeline \
  /data/audio \
  --model-id openai/whisper-large-v3 \
  --model-dir /data/models

Again, note that /tmp on the host is bound to /data/audio in the container, so this command runs inference on all audio files in /tmp on the host.

Detailed Usage

Full usage details are available via the --help option. For example,

❯ speech_recognition --help

 Usage: speech_recognition [OPTIONS] COMMAND [ARGS]...

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.                                                                          │
│ --show-completion             Show completion for the current shell, to copy it or customize the installation.                                   │
│ --help                        Show this message and exit.                                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ launch     Launch a speech-to-text API.                                                                                                          │
│ pipeline   Perform batch speech-to-text inference.                                                                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯

Environment Variables

In addition to specifying settings at the command line, some settings can be provided through environment variables. These settings are:

SRI_HOST - Host to use for the API.
SRI_PORT - Port to use for the API.
SRI_TOKEN - Authentication token for accessing the API.
HF_ACCESS_TOKEN - Token for authenticating with the Hugging Face Hub API.

Users can set these variables by exporting them in the shell or by defining them in a .env file. Command-line arguments always take precedence over environment variables.
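The precedence rule can be sketched as follows. Only "command-line arguments win" comes from the text above; the relative order of exported variables and .env entries, and the simplified `KEY=VALUE` parsing (no quoting, no `export` prefix), are assumptions of this sketch. `parse_dotenv` and `resolve_setting` are hypothetical helpers, not the package's own code.

```python
import os
from typing import Mapping, Optional


def parse_dotenv(text: str) -> dict[str, str]:
    """Hypothetical, simplified .env parser: KEY=VALUE lines; blanks and # comments skipped.
    Assumption: no quoting or `export` prefix handling."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def resolve_setting(
    cli_value: Optional[str],
    env_name: str,
    environ: Optional[Mapping[str, str]] = None,
    dotenv: Optional[Mapping[str, str]] = None,
) -> Optional[str]:
    """Hypothetical helper: pick a setting's value by precedence."""
    # Documented rule: a command-line argument always wins.
    if cli_value is not None:
        return cli_value
    environ = os.environ if environ is None else environ
    # Assumption: variables exported in the shell override entries from the .env file.
    if env_name in environ:
        return environ[env_name]
    if dotenv is not None and env_name in dotenv:
        return dotenv[env_name]
    return None
```

For example, with `SRI_PORT=9000` in a .env file, running with `--port 8000` still resolves the port to `8000`.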

Release history

This version

0.2.0

May 2, 2025

Download files

Source Distribution

speech_recognition_inference-0.2.0.tar.gz (131.8 kB)

Uploaded May 2, 2025 Source

Built Distribution

speech_recognition_inference-0.2.0-py3-none-any.whl (14.5 kB)

Uploaded May 2, 2025 Python 3

File details

Details for the file speech_recognition_inference-0.2.0.tar.gz.

File metadata

Download URL: speech_recognition_inference-0.2.0.tar.gz
Upload date: May 2, 2025
Size: 131.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.6

File hashes

Hashes for speech_recognition_inference-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`278286483b822832558112b0ac4c6523b9befe491e3a09ab3dbf126102a368e7`
MD5	`0d8c9a6a8d1b3a93786ab851e76d6688`
BLAKE2b-256	`d8cc272da91a8fc85d5a108994e3b7f7e8b7520364430615b53d03e6b3d4e780`

File details

Details for the file speech_recognition_inference-0.2.0-py3-none-any.whl.

File metadata

Download URL: speech_recognition_inference-0.2.0-py3-none-any.whl
Upload date: May 2, 2025
Size: 14.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.6.6

File hashes

Hashes for speech_recognition_inference-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`396f2a9a0970b1085d0341bb61e2765dc6eed9d61029824921415a918c889670`
MD5	`0f3be15b756e55926c94b70138205bb3`
BLAKE2b-256	`7012db493711cd8bd5f7497ffadc4a5686b5a450a0ccdadf525620497537e427`
