
Speech to Text (s2t): Record audio, run Whisper, export formats, and copy transcript to clipboard.

s2t

Record audio from your microphone, run Whisper to transcribe it, export common formats, and optionally copy the transcript to your clipboard.

Install

Two distinct paths — pick the one matching your role. Do not mix them.

  • Use s2t from anywhere in the terminal (recommended for end users): install via pipx, which keeps s2t in its own isolated venv and exposes it on your PATH.
    • From PyPI: pipx install s2t
    • From a local wheel built in this checkout: make install-pipx
  • Develop on the working tree: install editable into the project .venv (not on PATH outside the project; intentional).
    • make setup (creates .venv and installs dev extras), then either source .venv/bin/activate or call .venv/bin/s2t directly.

Rule of thumb: development = project .venv, daily use = pipx. The two installs coexist cleanly because pipx exposes its copy at ~/.local/bin/s2t while the dev install lives at ./.venv/bin/s2t. Which one your shell runs depends on whether the project venv is active.

Requirements: Python 3.11–3.12. No mandatory external binaries. ffmpeg is optional (only for MP3 encoding/decoding).

System requirements (Linux)

  • Some environments need system libraries for audio I/O:
    • Debian/Ubuntu: sudo apt-get install libportaudio2 libsndfile1
    • Fedora/RHEL: sudo dnf install portaudio libsndfile
  • Optional for MP3: ffmpeg (sudo apt-get install ffmpeg on Debian/Ubuntu, or brew install ffmpeg with Homebrew).
  • Optional backends:
    • faster-whisper (CTranslate2): pip install faster-whisper (GPU via CUDA on NVIDIA; CPU works well with int8).
    • whisper.cpp (Metal/CPU): pip install whispercpp (requires local gguf models; GPU support on Apple hardware is experimental and varies by build).

Usage

  • Start interactive recording and transcribe:
    • s2t
  • Options:
    • Language: -l de (long: --lang de)
    • Model: -m large-v3 (long: --model large-v3)
    • Backend: --backend whisper|faster|whispercpp (default: whisper)
    • Device: --device auto|cpu|cuda|mps (default: auto)
    • Sample rate: -r 48000 (long: --rate 48000)
    • Channels: -c 2 (long: --channels 2)
    • Output dir: -o transcripts (long: --outdir transcripts) — default is transcripts/ if omitted
    • Translate to English: -t (long: --translate). You may still provide --lang as an input-language hint if you want.
    • List available models and exit: -L (long: --list-models)
    • Recording format: -f flac|wav|mp3 (long: --recording-format), default flac. MP3 requires ffmpeg; if absent, it falls back to FLAC with a warning.
    • Observation window (for block-based splitting): -b 20.0 or --buffer-sec 20.0 (default 20.0). Cuts at the longest pause within each window.
    • Note: There is no minimum chunk duration; each cut is placed at the longest pause within the observation window.
    • Chunk segmentation: by default each recorded chunk becomes one Whisper segment; pass --no-chunk-segmentation to keep Whisper's native segmentation per chunk.
    • Written prompt: -p "TEXT" or --prompt="TEXT" — pass a text prompt to guide Whisper's transcription (terminology, spelling, style). Use the = syntax for prompts starting with a dash. Just -p / --prompt (without argument) asks interactively.
    • Spoken prompt: --speak-prompt — speak your prompt first, then press SPACE to use it as prompt and continue with your main content. If you press ENTER instead of SPACE, no prompt is used; the spoken audio is transcribed as normal payload and the session ends.
    • Keep chunk files: --keep-chunks — by default, per‑chunk audio and per‑chunk Whisper outputs are deleted after the final merge.
    • Open transcript for editing: -e (long: --edit) — opens the generated .txt in your shell editor ($VISUAL/$EDITOR).
  • Examples:
    • Transcribe in German using large-v3: s2t -l de -m large-v3
    • Translate any input to English: s2t -t
    • Write outputs under transcripts/: s2t -o transcripts
    • List local model names: s2t -L

Interactive Controls

  • Key bindings (while recording)
    • ENTER: Split now (manual cut). Ends the current segment immediately.
    • Q (or q): Finish the session and process final outputs.
    • SPACE: Toggle pause/resume. On pause, the current buffer is drained (single best cut), then a PAUSED marker is shown.
    • c (lowercase): Copy the source-language transcript produced since the last c or C action to the clipboard. Prints a visible console marker.
    • C (uppercase): Copy the full source-language transcript (since the beginning) to the clipboard. Prints a distinct console marker.
    • t (lowercase): Copy the translated transcript (e.g., English when using -t) produced since the last t or T action to the clipboard.
    • T (uppercase): Copy the full translated transcript (since the beginning) to the clipboard. Requires translation mode (-t or --translate).
  • Written prompt (-p/--prompt)
    • Pass a text prompt to guide Whisper (e.g. s2t --prompt="Dr. Müller, Angiographie"), or just s2t --prompt to type it interactively.
  • Spoken prompt (--speak-prompt)
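    • Speak your prompt first, then press SPACE. The app waits until your prompt is transcribed, prints a separator, and then you start speaking your main content. If you press ENTER instead, no prompt is used: the spoken audio is transcribed as normal content and the session ends.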
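
The difference between the lowercase and uppercase copy keys listed above comes down to tracking where the last copy action stopped. The sketch below illustrates that bookkeeping only. The class TranscriptBuffer and its methods are hypothetical names, not part of s2t, and the code returns the text instead of touching a real clipboard.

```python
class TranscriptBuffer:
    """Collects transcribed chunks and remembers where the last copy ended."""

    def __init__(self) -> None:
        self._chunks: list[str] = []
        self._last_copied = 0  # index of the first chunk not yet copied

    def append(self, text: str) -> None:
        """Add the text of a newly transcribed chunk."""
        self._chunks.append(text)

    def copy_recent(self) -> str:
        """Lowercase key: text added since the last copy action of either kind."""
        text = " ".join(self._chunks[self._last_copied:])
        self._last_copied = len(self._chunks)
        return text

    def copy_full(self) -> str:
        """Uppercase key: everything since the beginning; also resets the marker."""
        self._last_copied = len(self._chunks)
        return " ".join(self._chunks)
```

In this sketch a translated transcript would use a second buffer with its own marker, so that t/T and c/C do not interfere with each other. That separation is an assumption drawn from the key descriptions, not a confirmed design detail.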

Segmentation Behavior

  • Windowed splitting (default): The recorder analyzes a sliding window of length --buffer-sec (default 20 seconds) and cuts at the longest detected pause.
    • If no suitable pause is found within the window, a hard cut occurs at the window boundary.
    • A small audio overlap (--overlap-ms, default 200) is applied between consecutive segments to avoid trimming syllables at cut points.
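
The windowed splitting described above can be sketched as follows. This is an illustration under stated assumptions, not s2t's implementation: the function names choose_cut and overlap_start, the per-frame RMS input, and the silence threshold of 0.01 are all invented for the example.

```python
from typing import Sequence


def choose_cut(frame_rms: Sequence[float], silence_threshold: float = 0.01) -> int:
    """Return the frame index at which to cut one observation window.

    The cut lands in the middle of the longest run of frames whose level is
    below `silence_threshold`. If the window has no quiet frame at all, the
    function falls back to a hard cut at the window boundary.
    """
    best_start, best_len = 0, 0
    run_start, run_len = 0, 0
    for i, level in enumerate(frame_rms):
        if level < silence_threshold:
            if run_len == 0:
                run_start = i  # a new quiet run begins here
            run_len += 1
            if run_len > best_len:
                best_start, best_len = run_start, run_len
        else:
            run_len = 0
    if best_len == 0:
        return len(frame_rms)  # hard cut at the window boundary
    return best_start + best_len // 2


def overlap_start(cut_sample: int, sample_rate: int, overlap_ms: int = 200) -> int:
    """Sample index where the next segment starts so it repeats `overlap_ms` of audio."""
    return max(0, cut_sample - sample_rate * overlap_ms // 1000)
```

With a 48000 Hz recording and the default 200 ms overlap, the next segment in this sketch would begin 9600 samples before the cut point.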

Outputs are written into a timestamped folder under the chosen output directory (default is transcripts/), e.g. transcripts/2025-01-31T14-22-05+0200/, containing:

  • Per‑chunk outputs: chunk_####.flac/.wav plus chunk_####.txt/.srt/.vtt/.tsv/.json (deleted by default unless --keep-chunks)
  • Final outputs: transcription.flac/.wav (and transcription.mp3 if requested and ffmpeg available), plus transcription.txt/.srt/.vtt/.tsv/.json
    • Transcript is written to .txt; clipboard copying is optional and disabled by default.
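
A folder name in the format shown above can be produced with the standard library alone. The helper name session_dir is an assumption for this sketch; how s2t builds the name internally is not stated here.

```python
from datetime import datetime, timedelta, timezone
from pathlib import Path


def session_dir(outdir: str, now: datetime) -> Path:
    """Build a path such as transcripts/2025-01-31T14-22-05+0200.

    `now` should be timezone-aware; for a naive datetime, %z expands to an
    empty string and the UTC offset is missing from the folder name.
    """
    return Path(outdir) / now.strftime("%Y-%m-%dT%H-%M-%S%z")


# Example with a fixed timestamp at UTC+02:00.
example = session_dir(
    "transcripts",
    datetime(2025, 1, 31, 14, 22, 5, tzinfo=timezone(timedelta(hours=2))),
)
```

For the current local time, datetime.now().astimezone() yields a timezone-aware value that can be passed as now. Colons are avoided in the time part, which keeps the name valid on filesystems that do not allow them.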

Auto-splitting details

  • ENTER splits immediately; Q finishes the recording.
  • Windowed: cuts at the longest pause within the selected window (fallback: window boundary).
  • There is no fixed minimum duration per chunk.

Makefile (optional)

  • Setup venv + dev deps: make setup
  • Lint/format/test: make lint, make format, make test; combined gate: make check
  • Build sdist/wheel: make build (runs check first)
  • Publish to PyPI/TestPyPI: make publish, make publish-test (run after build)
  • Install current working tree as a global CLI via pipx: make install-pipx
  • Verify the published PyPI release via pipx: make verify-pypi
  • The two pipx targets (install-pipx and verify-pypi) use the project venv's pipx (declared in the [dev] extras and auto-installed if missing), so no system-level pipx is required for development.
  • Run CLI: make record ARGS='-l de -t -o transcripts'
  • List models: make list-models
  • Show package version: make version

Notes on models

  • The local openai-whisper CLI supports models such as tiny, base, small, and medium (each also available as an English-only .en variant), plus large-v1, large-v2, and large-v3.
  • The name turbo refers to OpenAI’s hosted model family and is not provided by the local whisper CLI. If you pass -m turbo, the command may fail; choose a supported local model instead.

Development & Release

  • For developer setup and contribution guidelines, see CONTRIBUTING.md.
  • For the release process, see docs/RELEASING.md.
