A small Python transcription helper using OpenAI speech-to-text APIs.

whisper-smith

License: MIT

whisper-smith is a small Python CLI/app helper for transcribing audio files with OpenAI speech-to-text models.

Features

  • Transcribe local audio files
  • CLI-first workflow for quick terminal use
  • Output as txt, json, srt, or vtt
  • Automatically infer output format from output file extension
  • Load environment variables from .env

Requirements

  • Python 3.10+
  • An OpenAI API key (OPENAI_API_KEY)
  • For large-file fallback: either system ffmpeg in PATH, or Python package imageio-ffmpeg
  • For optional speaker diarization: a Hugging Face token (HUGGINGFACE_TOKEN) and pyannote.audio

Installation

Option 1: uv (recommended)

uv sync

Option 2: pip

pip install -e .

Optional speaker diarization dependencies

uv sync --extra diarize

or:

pip install -e ".[diarize]"

Configuration

Set your API keys in the environment or in a .env file. HUGGINGFACE_TOKEN is needed only for speaker diarization:

export OPENAI_API_KEY="your_api_key_here"
export HUGGINGFACE_TOKEN="your_huggingface_token_here"

Or create a .env file in the project root:

OPENAI_API_KEY=your_api_key_here
HUGGINGFACE_TOKEN=your_huggingface_token_here
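
For illustration, the sketch below shows one way KEY=value lines like those above can be read into the process environment using only the standard library. It is not whisper-smith's actual loader: parse_env_text and load_env_file are hypothetical helpers, and the rule that variables already set in the environment win over the file is an assumption.

```python
import os
from pathlib import Path


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=value lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        value = value.strip()
        # Drop one pair of matching surrounding quotes, if present.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_file(path: Path = Path(".env")) -> None:
    """Hypothetical helper: copy file values into os.environ.

    Assumption: values already present in the environment are not overridden.
    """
    if not path.is_file():
        return
    for key, value in parse_env_text(path.read_text(encoding="utf-8")).items():
        os.environ.setdefault(key, value)
```

Calling load_env_file() before reading os.environ["OPENAI_API_KEY"] would make both configuration styles shown above behave the same way.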

CLI Usage Guide

Basic command:

whisper-smith <audio_path>

Show help:

whisper-smith --help

1) Print transcript to terminal (default txt)

whisper-smith data/sample.m4a

2) Save transcript to a file

whisper-smith data/sample.m4a --output data/sample.txt

3) Choose output format explicitly

whisper-smith data/sample.m4a --format json --output data/sample.json

Supported CLI formats: txt, json, srt, vtt

4) Let format be inferred from output extension

whisper-smith data/sample.m4a --output data/sample.srt

5) Overwrite existing file

whisper-smith data/sample.m4a --output data/sample.txt --overwrite
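
When writing transcripts from your own Python code, a similar guard is easy to add. The helper below is a hypothetical sketch; the exact error the CLI reports when the file exists and --overwrite is missing is not documented here, so FileExistsError is an assumption.

```python
from pathlib import Path


def write_output(path: Path, content: str, overwrite: bool = False) -> None:
    """Write content to path, refusing to replace an existing file by default."""
    if path.exists() and not overwrite:
        # Assumed behavior: fail loudly instead of silently replacing the file.
        raise FileExistsError(f"{path} already exists; pass overwrite=True to replace it")
    path.write_text(content, encoding="utf-8")
```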

6) Run speaker diarization

whisper-smith data/sample.m4a --diarize --output data/sample.diarization.json

Diarization currently supports JSON output only. To hint the expected number of speakers, pass --num-speakers:

whisper-smith data/sample.m4a --diarize --format json --num-speakers 2

7) Create speaker-aligned transcript JSON

Run the full pipeline from one audio file:

whisper-smith data/sample.m4a --align --output data/sample.aligned.json

This writes the main aligned transcript JSON to data/sample.aligned.json and also writes intermediate artifacts beside it:

data/sample.transcript.json
data/sample.diarization.json

To put the intermediate artifacts in a separate directory:

whisper-smith data/sample.m4a --align --output data/sample.aligned.json --artifacts-dir data/artifacts
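
To give an idea of what speaker alignment involves, here is a minimal sketch of one common approach: each transcript segment is labeled with the speaker whose diarization turns overlap it for the longest total time. Whether whisper-smith aligns this way is an assumption; TextSegment, SpeakerTurn, assign_speakers, and the output keys are hypothetical and do not describe the real aligned JSON schema.

```python
from dataclasses import dataclass


@dataclass
class TextSegment:
    start: float
    end: float
    text: str


@dataclass
class SpeakerTurn:
    start: float
    end: float
    speaker: str


def assign_speakers(segments, turns, unknown="UNKNOWN"):
    """Label each transcript segment with its maximum-overlap speaker."""
    aligned = []
    for seg in segments:
        totals: dict[str, float] = {}
        for turn in turns:
            # Length of the intersection of the two time intervals.
            overlap = min(seg.end, turn.end) - max(seg.start, turn.start)
            if overlap > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + overlap
        # Segments that overlap no turn at all get the placeholder label.
        speaker = max(totals, key=totals.get) if totals else unknown
        aligned.append(
            {"start": seg.start, "end": seg.end, "speaker": speaker, "text": seg.text}
        )
    return aligned
```

Maximum overlap is simple and tolerant of small boundary mismatches between the two models, at the cost of assigning a single speaker to segments where two people talk.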

Python Usage

from pathlib import Path
from whisper_smith.transcribe import transcribe_audio
from whisper_smith.exporters import export_transcript

result = transcribe_audio(Path("data/sample.m4a"))
print(result.text)

srt = export_transcript(result, "srt")
Path("data/sample.srt").write_text(srt, encoding="utf-8")
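
For readers unfamiliar with the SRT layout that the "srt" format refers to, the sketch below renders timed segments as numbered cues with HH:MM:SS,mmm timestamps. It is an independent illustration, not the code behind export_transcript; srt_timestamp and to_srt are hypothetical names, and the (start, end, text) tuples are an assumed input shape.

```python
def srt_timestamp(seconds: float) -> str:
    """Format seconds as HH:MM:SS,mmm, the timestamp form used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, millis = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{millis:03d}"


def to_srt(segments: list[tuple[float, float, str]]) -> str:
    """Render (start, end, text) tuples as numbered SRT cues."""
    cues = []
    for index, (start, end, text) in enumerate(segments, start=1):
        cues.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text.strip()}\n"
        )
    # Cues are separated by a blank line.
    return "\n".join(cues)
```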

Speaker diarization

from pathlib import Path
from whisper_smith.diarize import diarize_audio

result = diarize_audio(Path("data/sample.m4a"))

for segment in result.segments:
    print(segment.start, segment.end, segment.speaker)

diarize_audio uses HUGGINGFACE_TOKEN from the environment, or accepts hf_token="..." explicitly.

The default local model is pyannote/speaker-diarization-3.1, which is compatible with the Intel macOS dependency set. You may pass a different model explicitly from Python when running on a newer platform.
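
Because each diarization segment exposes start, end, and speaker, simple summaries are easy to compute. The sketch below totals speaking time per speaker; Segment is a stand-in dataclass with the same three attributes so that the example runs without pyannote.audio, and speaking_time is a hypothetical helper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Segment:
    """Stand-in with the attributes used in the loop above."""
    start: float
    end: float
    speaker: str


def speaking_time(segments) -> dict[str, float]:
    """Sum the duration in seconds of all segments for each speaker."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += seg.end - seg.start
    return dict(totals)
```

In practice you would pass result.segments from diarize_audio instead of hand-built Segment objects.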

Notes

  • If --output is omitted, the transcript is printed to stdout.
  • If --format is omitted, the format is inferred from the --output file extension when possible.
  • If an output file already exists, add --overwrite to replace it.
  • Transcription uses a timestamp-capable OpenAI model by default so JSON, SRT, and VTT outputs have segment timestamps.
  • For large audio files, whisper-smith automatically splits audio into chunks and merges transcript text.
  • If diarization fails with an error about torchaudio missing AudioMetaData, refresh the optional diarization dependencies by running uv lock --upgrade-package torch --upgrade-package torchaudio, then uv sync --extra diarize.
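
The chunking note above can be pictured with a small sketch of the merge step. How whisper-smith actually merges chunks is not documented here, so this is an assumption: each chunk result is taken to carry its offset from the start of the original audio, and chunk-relative timestamps are shifted by that offset. merge_chunks and the dictionary keys are hypothetical.

```python
def merge_chunks(chunks):
    """Merge per-chunk results into one transcript.

    Each chunk is a dict with "offset" (seconds from the start of the
    original audio), "text", and "segments" (dicts with chunk-relative
    "start" and "end" plus "text").
    """
    texts = []
    segments = []
    # Process chunks in time order even if they finished out of order.
    for chunk in sorted(chunks, key=lambda c: c["offset"]):
        texts.append(chunk["text"].strip())
        for seg in chunk["segments"]:
            segments.append(
                {
                    "start": seg["start"] + chunk["offset"],
                    "end": seg["end"] + chunk["offset"],
                    "text": seg["text"],
                }
            )
    return {"text": " ".join(t for t in texts if t), "segments": segments}
```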

Development

Run tests:

pytest
