A small Python transcription helper using OpenAI speech-to-text APIs.

whisper-smith

whisper-smith is a small Python CLI and library for transcribing audio files with OpenAI speech-to-text models.

Features

Transcribe local audio files
CLI-first workflow for quick terminal use
Output as txt, json, srt, or vtt
Automatically infer output format from output file extension
Load environment variables from .env

Run on Google Colab (free GPU)

No local setup needed. Open the notebook directly in Colab and run the full speaker-aligned transcript pipeline on a free T4 GPU.

The notebook covers: install → set API keys → upload audio → run pipeline → download result. Use a GPU runtime for the best diarization performance. The notebook also includes an advanced GPU pipeline for explicitly moving the pyannote model to CUDA.

Requirements

Python 3.10+
An OpenAI API key (OPENAI_API_KEY)
For the large-file fallback: either a system ffmpeg on PATH or the imageio-ffmpeg Python package
For optional speaker diarization: a Hugging Face token (HUGGINGFACE_TOKEN) and pyannote.audio
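The ffmpeg requirement above implies a lookup order: a system binary first, then the binary bundled with imageio-ffmpeg. The sketch below shows one way to resolve that order. `find_ffmpeg` is a hypothetical helper, not part of whisper-smith's documented API, and the package's real lookup may differ; `imageio_ffmpeg.get_ffmpeg_exe()` is the imageio-ffmpeg call that returns its bundled executable.

```python
import shutil
from typing import Optional


def find_ffmpeg() -> Optional[str]:
    """Return a path to an ffmpeg executable, or None if none is available.

    Hypothetical helper: prefer a system ffmpeg on PATH, then fall back to
    the binary shipped with the optional imageio-ffmpeg package.
    """
    system_ffmpeg = shutil.which("ffmpeg")
    if system_ffmpeg:
        return system_ffmpeg
    try:
        import imageio_ffmpeg  # optional dependency
    except ImportError:
        return None
    try:
        # get_ffmpeg_exe() raises RuntimeError if it cannot locate a usable binary
        return imageio_ffmpeg.get_ffmpeg_exe()
    except RuntimeError:
        return None
```

If this returns None, installing ffmpeg system-wide or running `pip install imageio-ffmpeg` satisfies the requirement.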

Installation

Option 1: uv (recommended)

uv sync

Option 2: pip

pip install -e .

Optional speaker diarization dependencies

uv sync --extra diarize

or:

pip install -e ".[diarize]"

Configuration

Set your API keys in the environment or in a .env file (HUGGINGFACE_TOKEN is only needed for speaker diarization):

export OPENAI_API_KEY="your_api_key_here"
export HUGGINGFACE_TOKEN="your_huggingface_token_here"

Or create a .env file in the project root:

OPENAI_API_KEY=your_api_key_here
HUGGINGFACE_TOKEN=your_huggingface_token_here
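How whisper-smith reads the .env file is not specified here. A common convention (python-dotenv's `load_dotenv` follows it by default) is that variables already exported in the shell win over values in the file. The stdlib-only sketch below follows that convention; `load_env_file` and `missing_keys` are hypothetical helpers for illustration, and they handle only simple `KEY=VALUE` lines, not the full dotenv syntax.

```python
import os
from pathlib import Path


def load_env_file(path: Path, override: bool = False) -> dict[str, str]:
    """Load simple KEY=VALUE lines from a .env file into os.environ.

    Hypothetical helper. Skips blank lines, comments, and lines without '=';
    strips one pair of matching surrounding quotes from values. Variables that
    are already set (e.g. via `export`) are kept unless override=True.
    Returns the variables it actually set.
    """
    loaded: dict[str, str] = {}
    if not path.is_file():
        return loaded
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        # Tolerate shell-style "export KEY=value" lines
        key = key.strip().removeprefix("export ").strip()
        value = value.strip()
        if not key:
            continue
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "'\"":
            value = value[1:-1]
        if override or key not in os.environ:
            os.environ[key] = value
            loaded[key] = value
    return loaded


def missing_keys(diarize: bool = False) -> list[str]:
    """Return the required keys that are still unset or empty."""
    required = ["OPENAI_API_KEY"]
    if diarize:
        required.append("HUGGINGFACE_TOKEN")
    return [key for key in required if not os.environ.get(key)]
```

Calling `load_env_file(Path(".env"))` and then `missing_keys(diarize=True)` gives an early, readable error before any API request is made.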

CLI Usage Guide

Basic command:

whisper-smith <audio_path>

Show help:

whisper-smith --help

1) Print transcript to terminal (default `txt`)

whisper-smith data/sample.m4a

2) Save transcript to a file

whisper-smith data/sample.m4a --output data/sample.txt

3) Choose output format explicitly

whisper-smith data/sample.m4a --format json --output data/sample.json

Supported CLI formats: txt, json, srt, vtt
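The srt and vtt formats differ mainly in cue layout and timestamp separator: SRT numbers each cue and writes `HH:MM:SS,mmm`, while WebVTT starts with a `WEBVTT` header and writes `HH:MM:SS.mmm`. The sketch below illustrates that difference with a hypothetical `Segment` type. In whisper-smith itself, `export_transcript(result, "srt")` produces these outputs, and its exact layout may differ from this sketch.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for a timestamped transcript segment."""
    start: float  # seconds
    end: float    # seconds
    text: str


def format_timestamp(seconds: float, sep: str) -> str:
    """Format seconds as HH:MM:SS<sep>mmm (SRT uses ',', WebVTT uses '.')."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


def to_srt(segments: list[Segment]) -> str:
    """Numbered cues separated by blank lines."""
    cues = []
    for i, seg in enumerate(segments, start=1):
        start = format_timestamp(seg.start, ",")
        end = format_timestamp(seg.end, ",")
        cues.append(f"{i}\n{start} --> {end}\n{seg.text.strip()}\n")
    return "\n".join(cues)


def to_vtt(segments: list[Segment]) -> str:
    """WEBVTT header followed by un-numbered cues."""
    cues = ["WEBVTT\n"]
    for seg in segments:
        start = format_timestamp(seg.start, ".")
        end = format_timestamp(seg.end, ".")
        cues.append(f"{start} --> {end}\n{seg.text.strip()}\n")
    return "\n".join(cues)
```

This is also why the timestamp-capable default model matters: without segment timings there is nothing to put in the cue lines.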

4) Let format be inferred from output extension

whisper-smith data/sample.m4a --output data/sample.srt
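The inference rule can be summarized as: an explicit --format wins, otherwise the --output extension is used if it names a supported format. The sketch below encodes that rule. `resolve_format` is hypothetical; letting --format take precedence over a conflicting extension, and falling back to txt for unknown or missing extensions, are assumptions based on txt being the default when printing to the terminal.

```python
from pathlib import Path
from typing import Optional

SUPPORTED_FORMATS = ("txt", "json", "srt", "vtt")


def resolve_format(output: Optional[str] = None, fmt: Optional[str] = None) -> str:
    """Choose an output format (hypothetical helper).

    1. An explicit --format value wins (assumed precedence).
    2. Otherwise use the --output extension if it names a supported format.
    3. Otherwise fall back to txt (assumption: mirrors the terminal default).
    """
    if fmt is not None:
        fmt = fmt.lower()
        if fmt not in SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format {fmt!r}; expected one of {SUPPORTED_FORMATS}")
        return fmt
    if output is not None:
        # Path.suffix is the last extension only: "sample.aligned.json" -> ".json"
        suffix = Path(output).suffix.lower().lstrip(".")
        if suffix in SUPPORTED_FORMATS:
            return suffix
    return "txt"
```

Using only the last suffix is what makes multi-dot names such as data/sample.diarization.json resolve to json.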

5) Overwrite existing file

whisper-smith data/sample.m4a --output data/sample.txt --overwrite
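A guard like --overwrite can be implemented with Python's exclusive-create file mode, which fails if the file already exists instead of checking first and then writing (a check-then-write sequence can race with another process). This is a sketch of that pattern with a hypothetical `write_output` helper, not whisper-smith's actual code.

```python
from pathlib import Path


def write_output(path: Path, content: str, overwrite: bool = False) -> None:
    """Write content to path; refuse to replace an existing file unless overwrite=True."""
    # "x" opens for exclusive creation and raises FileExistsError if the path exists;
    # "w" truncates and replaces any existing file.
    mode = "w" if overwrite else "x"
    try:
        with path.open(mode, encoding="utf-8") as fh:
            fh.write(content)
    except FileExistsError:
        raise FileExistsError(f"{path} already exists; add --overwrite to replace it") from None
```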

6) Run speaker diarization

whisper-smith data/sample.m4a --diarize --output data/sample.diarization.json

Diarization currently supports JSON output only. Optional speaker hints:

whisper-smith data/sample.m4a --diarize --format json --num-speakers 2
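When scripting batch runs, it can help to assemble the flags above as an argument list and enforce the JSON-only rule for diarization before launching anything. `build_command` below is a hypothetical helper that uses only the flags shown in this guide; its validation mirrors the documented JSON-only restriction and is otherwise an assumption.

```python
from typing import Optional


def build_command(
    audio: str,
    output: Optional[str] = None,
    fmt: Optional[str] = None,
    diarize: bool = False,
    num_speakers: Optional[int] = None,
    overwrite: bool = False,
) -> list[str]:
    """Assemble a whisper-smith argv list from the documented flags (hypothetical helper)."""
    if diarize:
        # Diarization currently supports JSON output only, whether the format
        # is given explicitly or would be inferred from the output extension.
        if fmt is not None and fmt != "json":
            raise ValueError("--diarize supports JSON output only")
        if fmt is None and output is not None and not output.lower().endswith(".json"):
            raise ValueError("--diarize output path should end in .json")
    argv = ["whisper-smith", audio]
    if diarize:
        argv.append("--diarize")
    if fmt is not None:
        argv += ["--format", fmt]
    if output is not None:
        argv += ["--output", output]
    if num_speakers is not None:
        argv += ["--num-speakers", str(num_speakers)]
    if overwrite:
        argv.append("--overwrite")
    return argv
```

The returned list can be passed to `subprocess.run(argv, check=True)`, which avoids shell quoting issues with paths that contain spaces.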

7) Create speaker-aligned transcript JSON

Run the full pipeline from one audio file:

whisper-smith data/sample.m4a --align --output data/sample.aligned.json

This writes the main aligned transcript JSON to data/sample.aligned.json and also writes intermediate artifacts beside it:

data/sample.transcript.json
data/sample.diarization.json

To put the intermediate artifacts in a separate directory:

whisper-smith data/sample.m4a --align --output data/sample.aligned.json --artifacts-dir data/artifacts
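The aligned JSON combines the two intermediate artifacts: timestamped transcript segments and speaker turns. How whisper-smith matches them, and the schema of its aligned JSON, are not documented here. A common heuristic is to give each transcript segment the speaker whose turns overlap it for the longest total time; the sketch below implements that heuristic with hypothetical types and field names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TranscriptSegment:
    start: float  # seconds
    end: float
    text: str


@dataclass
class SpeakerTurn:
    start: float  # seconds
    end: float
    speaker: str


def overlap(a_start: float, a_end: float, b_start: float, b_end: float) -> float:
    """Length in seconds of the intersection of two intervals (0 if disjoint)."""
    return max(0.0, min(a_end, b_end) - max(a_start, b_start))


def align(segments: list[TranscriptSegment], turns: list[SpeakerTurn]) -> list[dict]:
    """Label each transcript segment with the speaker whose turns overlap it most."""
    aligned = []
    for seg in segments:
        totals: dict[str, float] = {}
        for turn in turns:
            ov = overlap(seg.start, seg.end, turn.start, turn.end)
            if ov > 0:
                totals[turn.speaker] = totals.get(turn.speaker, 0.0) + ov
        # None when no speaker turn overlaps the segment at all
        speaker: Optional[str] = max(totals, key=totals.get) if totals else None
        aligned.append({"start": seg.start, "end": seg.end, "speaker": speaker, "text": seg.text})
    return aligned
```

The nested loop is O(segments × turns), which is fine for typical recordings; for very long files, sorting both lists and sweeping once would scale better.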

Python Usage

from pathlib import Path
from whisper_smith.transcribe import transcribe_audio
from whisper_smith.exporters import export_transcript

result = transcribe_audio(Path("data/sample.m4a"))
print(result.text)

srt = export_transcript(result, "srt")
Path("data/sample.srt").write_text(srt, encoding="utf-8")

Speaker diarization

from pathlib import Path
from whisper_smith.diarize import diarize_audio

result = diarize_audio(Path("data/sample.m4a"))

for segment in result.segments:
    print(segment.start, segment.end, segment.speaker)

diarize_audio uses HUGGINGFACE_TOKEN from the environment, or accepts hf_token="..." explicitly.

The default local model is pyannote/speaker-diarization-3.1, which is compatible with the Intel macOS dependency set. You may pass a different model explicitly from Python or with --diarization-model when running on a newer platform.
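The segments returned by `diarize_audio` can be post-processed directly. As an example, this sketch totals speaking time per speaker. It assumes `start` and `end` are in seconds, and `DiarizationSegment` is a hypothetical stand-in with the same attribute names used in the loop above.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Iterable


@dataclass
class DiarizationSegment:
    """Hypothetical stand-in with the attributes used in the diarization example."""
    start: float  # seconds
    end: float    # seconds
    speaker: str


def speaking_time(segments: Iterable[DiarizationSegment]) -> dict[str, float]:
    """Total seconds per speaker, largest first. Overlapping speech counts for each speaker."""
    totals: dict[str, float] = defaultdict(float)
    for seg in segments:
        totals[seg.speaker] += max(0.0, seg.end - seg.start)
    return dict(sorted(totals.items(), key=lambda item: item[1], reverse=True))
```

With a real result, the call would be `speaking_time(result.segments)`, provided those segments expose numeric `start`, `end`, and `speaker` attributes.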

Notes

If --output is omitted, the transcript is printed to stdout.
If --format is omitted, the format is inferred from the --output extension when possible.
If the output file already exists, add --overwrite to replace it.
Transcription uses a timestamp-capable OpenAI model by default, so JSON, SRT, and VTT outputs include segment timestamps.
For large audio files, whisper-smith automatically splits the audio into chunks and merges the transcript text.
If diarization fails with torchaudio missing AudioMetaData, refresh the optional diarization dependencies with uv lock --upgrade-package torch --upgrade-package torchaudio and then uv sync --extra diarize.
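The large-file behavior can be pictured as two steps: cut the recording into consecutive time ranges, then shift each chunk's segment timestamps by the chunk's start offset before joining the text. The OpenAI transcription endpoint limits uploads to 25 MB, so in practice the chunk length would be chosen to keep each piece under that size. This is a sketch under assumptions: `plan_chunks` and `merge_segments` are hypothetical, the notes above only say that whisper-smith merges the transcript text, and the timestamp shifting shown here is an illustration rather than its confirmed behavior.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    index: int
    start: float  # offset into the original recording, seconds
    end: float


def plan_chunks(duration: float, max_chunk_seconds: float) -> list[Chunk]:
    """Cover [0, duration) with consecutive, non-overlapping chunks of at most max_chunk_seconds."""
    if max_chunk_seconds <= 0:
        raise ValueError("max_chunk_seconds must be positive")
    chunks: list[Chunk] = []
    start = 0.0
    while start < duration:
        end = min(start + max_chunk_seconds, duration)
        chunks.append(Chunk(len(chunks), start, end))
        start = end
    return chunks


def merge_segments(chunks: list[Chunk], per_chunk: list[list[dict]]) -> tuple[str, list[dict]]:
    """Offset each chunk's segment times by the chunk start, then join the text in order."""
    merged: list[dict] = []
    for chunk, segments in zip(chunks, per_chunk):
        for seg in segments:
            merged.append({**seg, "start": seg["start"] + chunk.start, "end": seg["end"] + chunk.start})
    text = " ".join(seg["text"].strip() for seg in merged if seg["text"].strip())
    return text, merged
```

Fixed-length cuts can split a word at a boundary; cutting at silences or overlapping chunks slightly are common refinements, at the cost of de-duplicating text at the seams.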

Development

Run tests:

pytest

Project details

Release history

0.2.1 (Jun 10, 2026)
0.2.0 (Jun 9, 2026)
0.1.0 (Jun 8, 2026)

Download files

Source Distribution

whisper_smith-0.2.0.tar.gz (296.0 kB)

Uploaded Jun 9, 2026 Source

Built Distribution

whisper_smith-0.2.0-py3-none-any.whl (16.4 kB)

Uploaded Jun 9, 2026 Python 3

File details

Details for the file whisper_smith-0.2.0.tar.gz.

File metadata

Download URL: whisper_smith-0.2.0.tar.gz
Upload date: Jun 9, 2026
Size: 296.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for whisper_smith-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`994cfab7dc98a4565542e9db7ded92b60078c354c4add65a5b58aa3f6fb6c157`
MD5	`3e2a4a6f80a60f4a0f62f8bbf2d6e0f1`
BLAKE2b-256	`67f64b8455f9d3a302a0c191319e4b0d0d5546c9ef2bb34878aa2857b91d7b31`

File details

Details for the file whisper_smith-0.2.0-py3-none-any.whl.

File metadata

Download URL: whisper_smith-0.2.0-py3-none-any.whl
Upload date: Jun 9, 2026
Size: 16.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for whisper_smith-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`925ef38870e62678c7fc707f787c3345b913792db968af92a806a0e588bc34c8`
MD5	`86f0a4755efd10e10e638b8da09d5d0d`
BLAKE2b-256	`f7c387d4a5ec4af775ea3635bdb17bd025faf5fec7083686e17ed730704314c2`

whisper-smith 0.2.0
