
A Speech-to-Text API.

Project description

speech-recognition-inference

This repository provides a command-line tool for automatic speech-to-text transcription using open-source models from the Hugging Face Hub. For interactive use, it lets users spin up an inference API; for bulk processing, it provides a pipeline that runs inference on the contents of a specified directory.

Quickstart

Pip

First, install ffmpeg if it is not already installed on your machine.
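
For example, on macOS with Homebrew or on Debian/Ubuntu:

# macOS (Homebrew)
brew install ffmpeg

# Debian/Ubuntu
sudo apt install ffmpeg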

Next, clone the repository and install it:

git clone https://github.com/princeton-ddss/speech-recognition-inference.git
cd speech-recognition-inference
python -m venv .venv
source .venv/bin/activate
pip install --upgrade pip
pip install . # or pip install -e . for development
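
To confirm the installation succeeded, check that the CLI is available:

speech_recognition --help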

To start an API from the command line, run:

speech_recognition launch \
  --port 8000 \
  --model-id openai/whisper-tiny \
  --model-dir $HOME/.cache/huggingface/hub

Once the application has started, you can submit requests using any HTTP client, e.g.,

curl localhost:8000/transcribe \
  -X POST \
  -d '{"audio_file": "/tmp/female.wav", "response_format": "json"}' \
  -H 'Content-Type: application/json'
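
To extract just the transcript on the command line, you can pipe the response through jq. This sketch assumes the JSON response includes a text field holding the transcription; the actual field names may differ depending on the response format:

curl -s localhost:8000/transcribe \
  -X POST \
  -d '{"audio_file": "/tmp/female.wav", "response_format": "json"}' \
  -H 'Content-Type: application/json' \
  | jq -r '.text' # assumes a "text" field in the response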

To run batch inference on a directory of audio files:

speech_recognition pipeline /data/audio \
  --model-id openai/whisper-tiny \
  --model-dir $HOME/.cache/huggingface/hub

Docker

We also provide a Docker image, ghcr.io/princeton-ddss/speech-recognition-inference. To run the API via Docker, use the following command:

docker run \
  -p 8000:8000 \
  -v $HOME/.cache/huggingface/hub:/data/models \
  -v /tmp:/data/audio \
  ghcr.io/princeton-ddss/speech-recognition-inference:latest \
  launch \
  --port 8000 \
  --model-id openai/whisper-large-v3 \
  --model-dir /data/models

This command makes the API available on localhost:8000 and bind-mounts the host's model cache and audio files into the container. Sending requests works the same as above, but note that the container only has access to bind-mounted files. Since the host's /tmp is mounted at /data/audio in the container, requests should replace /tmp with /data/audio:

curl localhost:8000/transcribe \
  -X POST \
  -d '{"audio_file": "/data/audio/female.wav", "response_format": "json"}' \
  -H 'Content-Type: application/json'

To run batch processing via Docker, replace the launch command with pipeline and update the options:

docker run \
  -v $HOME/.cache/huggingface/hub:/data/models \
  -v /tmp:/data/audio \
  ghcr.io/princeton-ddss/speech-recognition-inference:latest \
  pipeline \
  /data/audio \
  --model-id openai/whisper-large-v3 \
  --model-dir /data/models

Again, note that /tmp on the host is mounted at /data/audio in the container, so this command runs inference on all audio files in the host's /tmp directory.
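
To process a different directory, change the host side of the bind mount. For example, with a hypothetical host directory /home/user/recordings:

docker run \
  -v $HOME/.cache/huggingface/hub:/data/models \
  -v /home/user/recordings:/data/audio \
  ghcr.io/princeton-ddss/speech-recognition-inference:latest \
  pipeline \
  /data/audio \
  --model-id openai/whisper-large-v3 \
  --model-dir /data/models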

Detailed Usage

Full usage details are available via the --help option. For example,

 speech_recognition --help

 Usage: speech_recognition [OPTIONS] COMMAND [ARGS]...

╭─ Options ────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ --install-completion          Install completion for the current shell.                                                                          │
│ --show-completion             Show completion for the current shell, to copy it or customize the installation.                                   │
│ --help                        Show this message and exit.                                                                                        │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
╭─ Commands ───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╮
│ launch     Launch a speech-to-text API.                                                                                                          │
│ pipeline   Perform batch speech-to-text inference.                                                                                               │
╰──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────╯
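
Each subcommand accepts its own options; pass --help to a subcommand to list them:

speech_recognition launch --help
speech_recognition pipeline --help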

Environment Variables

In addition to command-line options, some settings can be provided through environment variables:

  • SRI_HOST - Host address on which to serve the API.
  • SRI_PORT - Port on which to serve the API.
  • SRI_TOKEN - Token used to authenticate requests to the API.
  • HF_ACCESS_TOKEN - Access token used to authenticate with the Hugging Face Hub.

Users can set environment variables by exporting them in the shell or via a .env file. Command-line arguments always take precedence over environment variables.
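
For example (the values below are illustrative):

# Export variables directly in the shell...
export SRI_HOST=0.0.0.0
export SRI_PORT=8000

# ...or collect the same settings in a .env file
cat > .env <<EOF
SRI_HOST=0.0.0.0
SRI_PORT=8000
SRI_TOKEN=my-secret-token
EOF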

Download files

Source Distribution

speech_recognition_inference-0.2.0.tar.gz (131.8 kB)

Built Distribution

speech_recognition_inference-0.2.0-py3-none-any.whl (14.5 kB)

File details

Details for the file speech_recognition_inference-0.2.0.tar.gz.

File hashes

Hashes for speech_recognition_inference-0.2.0.tar.gz

Algorithm    Hash digest
SHA256       278286483b822832558112b0ac4c6523b9befe491e3a09ab3dbf126102a368e7
MD5          0d8c9a6a8d1b3a93786ab851e76d6688
BLAKE2b-256  d8cc272da91a8fc85d5a108994e3b7f7e8b7520364430615b53d03e6b3d4e780

File details

Details for the file speech_recognition_inference-0.2.0-py3-none-any.whl.

File hashes

Hashes for speech_recognition_inference-0.2.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       396f2a9a0970b1085d0341bb61e2765dc6eed9d61029824921415a918c889670
MD5          0f3be15b756e55926c94b70138205bb3
BLAKE2b-256  7012db493711cd8bd5f7497ffadc4a5686b5a450a0ccdadf525620497537e427
