
Portable, resumable, multi-backend Whisper transcription — runs anywhere, resumes after crashes.


ScribeFlow — Whisper transcription everywhere


Local CPU/GPU, Apple Silicon, or Google Colab. Input from a file, folder, Google Drive, or URL. ScribeFlow auto-selects the right model for the hardware it finds, writes durable checkpoints as it goes, and resumes cleanly after a crash — no duplicated or corrupted output.

Python 3.10–3.12 · License: Apache-2.0 · Platforms: Linux, macOS, Google Colab




Install

Base install is intentionally small — the pure-Python engine plus the default faster-whisper backend, which runs on CPU out of the box:

pip install scribeflow

ffmpeg is the one required system dependency:

# macOS (Homebrew)
brew install ffmpeg

# Debian / Ubuntu
sudo apt-get install -y ffmpeg

# Fedora
sudo dnf install -y ffmpeg

# Windows (winget)
winget install Gyan.FFmpeg

Optional extras layer in heavier backends, the web UI, and remote sources:

pip install 'scribeflow[gpu]'      # torch + CUDA (documented, not hard-pinned)
pip install 'scribeflow[cpp]'      # whisper.cpp via pywhispercpp — Apple-Silicon Metal path
pip install 'scribeflow[openai]'   # openai-whisper (PyTorch reference baseline)
pip install 'scribeflow[web]'      # FastAPI web UI: scribeflow web
pip install 'scribeflow[url]'      # yt-dlp — transcribe straight from a URL
pip install 'scribeflow[drive]'    # Google Drive API source
pip install 'scribeflow[dev]'      # pytest + ruff + mypy + nbformat

From a clone (editable, with the dev toolchain):

git clone https://github.com/htahaozlu/scribeflow
cd scribeflow
pip install -e '.[dev]'

Run without installing (the npx equivalent)

ScribeFlow is a Python CLI — there is no npm/npx; the equivalents are pipx and uv. Once published to PyPI:

pipx install scribeflow                 # isolated global install
uvx scribeflow transcribe lecture.mp4   # run once, no install (like npx)

Before the PyPI release you can run it straight from a clone:

pipx install .            # from the cloned repo

Homebrew (macOS) — planned

After the first PyPI release, a tap will provide a one-liner:

brew install htahaozlu/tap/scribeflow   # planned (post-PyPI)

Note — the [gpu] extra documents the torch + CUDA path but does not hard-pin a CUDA build, so you stay in control of the wheel that matches your driver. See docs/CONFIG.md for the recommended install line.


Quickstart

pip install scribeflow                       # base install (CPU-capable)

scribeflow doctor                            # check ffmpeg / device / backends
scribeflow transcribe ./lecture.mp4          # auto-selects backend + model for this host
scribeflow transcribe ./lecture.mp4 --format srt,vtt

That's it. The .txt transcript is always written; --format adds subtitle and JSON outputs. If the run is interrupted, re-run the same command and ScribeFlow picks up from the last completed chunk.


What it does

ScribeFlow is a single tool that gives you the same transcription pipeline everywhere:

  • Runs anywhere — local CPU or NVIDIA GPU, Apple Silicon (Metal via whisper.cpp), or Google Colab — and adapts to the hardware it detects.
  • Input from anywhere — a local file or folder, a URL (yt-dlp), a mounted Google Drive path, or an upload through the web UI.
  • Auto-selects the model for the detected hardware, with sensible defaults tuned for quality (global default: large-v3-turbo).
  • Crash-safe and resumable — durable per-chunk checkpoints mean an interrupted run resumes from where it stopped, with no duplicated or corrupted output.
  • Honest scope — local models only (no cloud APIs in v1); transcripts are best-effort ASR, not verbatim legal records.

Backends & hardware

Every backend normalizes to one output shape, so you can swap them without changing your workflow. ScribeFlow auto-selects based on the host; you can always override with --backend, --model, --device, and --compute-type.

| Backend | Best for | Install | Notes |
|---|---|---|---|
| faster-whisper | CPU and NVIDIA CUDA (the default) | base install | CUDA → float16 (≥8 GB VRAM) or int8_float16; CPU → int8. |
| whispercpp | Apple Silicon, Metal GPU | pip install 'scribeflow[cpp]' | Needs a whisper-cli binary + a ggml model (see env vars below). |
| openai-whisper | PyTorch reference baseline | pip install 'scribeflow[openai]' | The reference implementation; slower, useful for comparison. |

Hardware auto-select rules:

  • Apple Silicon → whisper.cpp on Metal when its binary is available, otherwise faster-whisper CPU int8. ScribeFlow never offers cuda/mps to faster-whisper on macOS-arm64, because that path does not exist.
  • CUDA → float16 for ≥8 GB VRAM, otherwise int8_float16.
  • CPU → int8.
  • It never auto-selects tiny/base/distil for Turkish; the global default is large-v3-turbo.
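
The rules above can be read as a first-match decision chain. The following is a minimal sketch of that reading, not ScribeFlow's actual selector: the `Host` and `Pick` types, the `auto_pick` name, and the `"metal"` device label are all hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Host:
    """Hypothetical description of the detected hardware."""
    apple_silicon: bool = False
    whispercpp_available: bool = False    # whisper-cli binary was found
    cuda_vram_gb: Optional[float] = None  # None means no CUDA device


@dataclass(frozen=True)
class Pick:
    backend: str
    device: str
    compute_type: Optional[str]
    model: str = "large-v3-turbo"  # global default stated in this README


def auto_pick(host: Host) -> Pick:
    """Apply the documented rules in order; the first match wins."""
    if host.apple_silicon:
        if host.whispercpp_available:
            # No compute type is documented for whisper.cpp, so it stays unset.
            # The "metal" device label is an assumption of this sketch.
            return Pick("whispercpp", "metal", None)
        # cuda/mps are never offered to faster-whisper on macOS-arm64.
        return Pick("faster-whisper", "cpu", "int8")
    if host.cuda_vram_gb is not None:
        compute = "float16" if host.cuda_vram_gb >= 8 else "int8_float16"
        return Pick("faster-whisper", "cuda", compute)
    return Pick("faster-whisper", "cpu", "int8")
```

Explicit --backend, --model, --device, and --compute-type flags would bypass a chain like this entirely.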

To enable the Apple-Silicon Metal path, point ScribeFlow at your whisper.cpp binary and ggml models:

export SCRIBEFLOW_WHISPERCPP_BIN=/path/to/whisper-cli
export SCRIBEFLOW_WHISPERCPP_MODELS=/path/to/ggml-models

See your host's pick at any time:

scribeflow models           # lists the catalog + this host's auto-pick
scribeflow doctor           # ffmpeg / device / VRAM / RAM / backends checklist

Usage

scribeflow transcribe <source> [options]
scribeflow models      [--want default|speed|quality] [--json] [--ui-lang en|tr]
scribeflow doctor      [--json] [--ui-lang en|tr]
scribeflow gen-notebook <source> -o nb.ipynb [options]
scribeflow web         [--host 127.0.0.1] [--port 8000]
scribeflow --version

scribeflow transcribe

The source is a local file/folder, a http(s):// URL, or a drive: path. The kind is inferred from the argument; override it with --source-kind.
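
As a rough illustration of how the kind could be inferred from the argument, here is a small sketch. `infer_source_kind` is a hypothetical helper, not ScribeFlow's real function; it only mirrors the three argument shapes described above and ignores the `upload` kind, which presumably arrives through the web UI rather than the command line.

```python
def infer_source_kind(source: str) -> str:
    """Guess the source kind from the shape of the CLI argument."""
    lowered = source.lower()
    if lowered.startswith(("http://", "https://")):
        return "url"
    if lowered.startswith("drive:"):
        return "drive"
    # Anything else is treated as a local file or folder path.
    return "local"
```

Passing --source-kind would skip this guess and use the given value directly.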

Key flags:

| Flag | Purpose |
|---|---|
| --backend | faster-whisper · whispercpp · openai-whisper |
| --model | Override the auto-selected model (default: large-v3-turbo). |
| --device / --compute-type | Device (cpu · cuda) and compute type (e.g. float16, int8, int8_float16). |
| --want default\|speed\|quality | Bias the auto-pick toward speed or quality. |
| --language / -l | Audio language (Turkish tr by default; auto to detect). |
| --chunk-minutes | Chunk length for checkpointing (default 20). |
| --beam-size | Decoder beam width (Turkish default: 5). |
| --format | txt,srt,vtt,json (comma-separated; txt is always written). |
| --out | Durable output dir (transcripts + checkpoints). |
| --workspace | Scratch dir for heavy audio-chunk I/O. |
| --cache-dir | Model download cache. |
| --runtime auto\|local\|colab | Execution target (owns the scratch-vs-durable split). |
| --source-kind local\|url\|drive\|upload | Force the source kind instead of inferring it. |
| --overwrite | Discard any existing run and start fresh. |
| --config FILE | Path to a scribeflow.toml. |
| --json | Machine-readable JSON output. |
| --ui-lang / --lang en\|tr | Interface language (separate from --language). |

Examples:

# A whole folder, Turkish, with subtitles
scribeflow transcribe ./lectures/ --format srt,vtt

# A URL (needs the [url] extra), auto-detect language, quality bias
scribeflow transcribe "https://example.com/talk.mp4" -l auto --want quality

# Force a backend/model on capable hardware
scribeflow transcribe ./talk.wav --backend faster-whisper --model large-v3 --device cuda --compute-type float16

# Split scratch vs. durable storage explicitly
scribeflow transcribe ./lecture.mp4 --out ./out --workspace /tmp/scribeflow-scratch

Web UI

With the [web] extra installed, launch a small FastAPI app to upload media and transcribe from the browser:

pip install 'scribeflow[web]'
scribeflow web                                   # http://127.0.0.1:8000
scribeflow web --host 0.0.0.0 --port 8080 --out ./out --workspace /tmp/scribeflow-scratch

Colab

scribeflow gen-notebook emits a runnable .ipynb that mounts Drive, pip-installs ScribeFlow, transcribes, and resumes — top-to-bottom, no editing required:

scribeflow gen-notebook ./lecture.mp4 -o scribeflow_colab.ipynb
scribeflow gen-notebook "https://example.com/talk.mp4" -o talk.ipynb        # url extra auto-wired
scribeflow gen-notebook "drive:My Drive/lectures/week1.mp4" -o week1.ipynb   # drive extra auto-wired

Open the notebook in Colab and run the cells in order.

The Errno-107 split. On Colab, a Google Drive FUSE mount can drop mid-write and raise OSError: [Errno 107] Transport endpoint is not connected. ScribeFlow sidesteps this by keeping heavy, churny I/O (audio chunks, temp files) on local /content scratch (the --workspace), and writing only durable transcripts and checkpoints to Drive (the --out). If the mount blips, your committed transcripts are already safe and the run resumes.
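
The split can be pictured as "work in scratch, publish to durable". The sketch below is only an illustration under stated assumptions: `is_mount_drop` and `publish` are hypothetical names, and returning `None` on a dropped mount is this sketch's choice, not documented ScribeFlow behavior. It compares against `errno.ENOTCONN`, which is 107 on Linux, the platform Colab runs on.

```python
import errno
import shutil
from pathlib import Path
from typing import Optional


def is_mount_drop(exc: OSError) -> bool:
    """True for 'Transport endpoint is not connected' (Errno 107 on Linux)."""
    return exc.errno == errno.ENOTCONN


def publish(scratch_file: Path, out_dir: Path) -> Optional[Path]:
    """Copy one finished artifact from scratch to durable storage.

    Returns the destination path, or None if the durable mount dropped,
    in which case the scratch copy is still intact and can be retried.
    """
    try:
        out_dir.mkdir(parents=True, exist_ok=True)
        target = out_dir / scratch_file.name
        shutil.copyfile(scratch_file, target)
        return target
    except OSError as exc:
        if is_mount_drop(exc):
            return None
        raise
```

Only small, finished files cross into the Drive-backed directory here, so a dropped mount can interrupt at most one short copy rather than a long stream of chunk writes.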


Output formats

The .txt transcript is always written. Add more with --format (comma-separated):

| Format | Flag value | Description |
|---|---|---|
| Text | txt | Plain transcript (always produced). |
| SubRip | srt | Subtitles with timecodes. |
| WebVTT | vtt | Web-native subtitles with timecodes. |
| JSON | json | Structured segments (text + timing) for downstream tooling. |

Subtitle timecodes are global: each chunk's local times are shifted by chunk_index * chunk_seconds, so timing stays correct across the whole file.

scribeflow transcribe ./lecture.mp4 --format txt,srt,vtt,json
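
The global-timecode shift described above is simple arithmetic. Here is a minimal sketch of it, assuming segments are `(start, end, text)` tuples in chunk-local seconds; that tuple shape and the helper names `shift_to_global` and `srt_timestamp` are assumptions of this example, not ScribeFlow's internal API.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_seconds, end_seconds, text)


def shift_to_global(segments: List[Segment], chunk_index: int,
                    chunk_seconds: float) -> List[Segment]:
    """Shift chunk-local times by chunk_index * chunk_seconds."""
    offset = chunk_index * chunk_seconds
    return [(start + offset, end + offset, text) for start, end, text in segments]


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timecode, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


# Third chunk (index 2) of a run with the default 20-minute chunks:
local = [(1.5, 4.0, "Merhaba")]
shifted = shift_to_global(local, chunk_index=2, chunk_seconds=20 * 60)
print(srt_timestamp(shifted[0][0]))  # 00:40:01,500
```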

How resume works

Resume isn't a bolt-on — it's how the engine runs.

  • Chunk-by-chunk checkpoints. The media is split into chunks; each completed chunk is committed durably to progress.json + chunk_outputs/, using atomic temp-then-replace writes (never a half-written file).
  • Just re-run. Kill the process and run the same command again → ScribeFlow resumes from the last completed chunk. No duplicated work, no corrupted output.
  • RunIdentity guard. A resume refuses to silently mix a different backend/model/chunking/options into an existing run — it raises CheckpointIdentityError. Want a clean slate with new settings? Pass --overwrite.
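
The three points above can be sketched with the standard library alone. This is an illustration under assumptions, not the project's code: the `progress.json` layout (`identity` plus a `done` list), the helper names, and the local `CheckpointIdentityError` stand-in are all hypothetical. The one fixed ingredient is the technique itself: write to a temp file in the same directory, then `os.replace` it over the target.

```python
import json
import os
import tempfile
from pathlib import Path


class CheckpointIdentityError(RuntimeError):
    """Stand-in for the error named above; the real class may differ."""


def atomic_write_json(path: Path, payload: dict) -> None:
    """Temp-then-replace: readers see the old file or the new one, never half."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as handle:
            json.dump(payload, handle)
            handle.flush()
            os.fsync(handle.fileno())
        os.replace(tmp_name, path)  # atomic when both paths share a filesystem
    except BaseException:
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


def load_or_start(progress_path: Path, identity: dict) -> dict:
    """Resume an existing run, refusing to mix in different settings."""
    if not progress_path.exists():
        return {"identity": identity, "done": []}
    state = json.loads(progress_path.read_text(encoding="utf-8"))
    if state["identity"] != identity:
        raise CheckpointIdentityError("existing run used different settings")
    return state


def commit_chunk(progress_path: Path, state: dict, index: int) -> None:
    """Record one completed chunk durably."""
    if index not in state["done"]:
        state["done"].append(index)
    atomic_write_json(progress_path, state)
```

On a re-run, a loop over chunk indices would skip anything already in `state["done"]`, which is what makes "just re-run" safe in this model.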

Determinism makes this safe: Turkish defaults use temperature=0.0, condition_on_previous_text=False (with a tail-prompt continuity hint), vad_filter=True, and beam_size=5, so re-running a chunk reproduces the same result.
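
For readers who want to see those defaults as data, here is a small sketch that collects them into keyword arguments. The function name is hypothetical, and mapping the "tail-prompt continuity hint" onto `initial_prompt` is an assumption of this example; the keyword names themselves match parameters accepted by faster-whisper's `WhisperModel.transcribe`.

```python
def turkish_decode_options(tail_prompt: str = "") -> dict:
    """Collect the deterministic Turkish defaults listed above."""
    options = {
        "language": "tr",
        "temperature": 0.0,                   # no sampling
        "condition_on_previous_text": False,  # each chunk decodes independently
        "vad_filter": True,
        "beam_size": 5,
    }
    if tail_prompt:
        # Assumed mapping: pass the tail of the previous chunk as a prompt.
        options["initial_prompt"] = tail_prompt
    return options


# With faster-whisper installed, the options could be used like this:
#   from faster_whisper import WhisperModel
#   model = WhisperModel("large-v3-turbo", device="cpu", compute_type="int8")
#   segments, info = model.transcribe("chunk_0000.wav", **turkish_decode_options())
```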


Config

Configuration is resolved in order of precedence: CLI flags first, then a scribeflow.toml, then environment variables, with built-in defaults underneath. Full reference: docs/CONFIG.md.

Common environment variables:

| Variable | Purpose |
|---|---|
| SCRIBEFLOW_LANG | Default interface language (en / tr). |
| SCRIBEFLOW_WHISPERCPP_BIN | Path to the whisper-cli binary (Apple Silicon). |
| SCRIBEFLOW_WHISPERCPP_MODELS | Directory holding ggml models for whisper.cpp. |
| NO_COLOR | Disable ANSI colors (also auto-off when piped). |

A project-local scribeflow.toml lets you pin defaults:

# scribeflow.toml
backend = "faster-whisper"
model = "large-v3-turbo"
language = "tr"
chunk_minutes = 20
beam_size = 5
formats = ["txt", "srt"]

scribeflow transcribe ./lecture.mp4 --config scribeflow.toml
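
One way to picture the layering described at the top of this section is a chain of dictionaries searched front to back. The sketch below assumes the listed order is a precedence order (CLI flags win, then the TOML file, then environment variables, then defaults) and that an unset value is represented as `None`; `resolve` and `DEFAULTS` are hypothetical names, with default values taken from this README.

```python
from collections import ChainMap

# Defaults quoted elsewhere in this README.
DEFAULTS = {
    "model": "large-v3-turbo",
    "language": "tr",
    "chunk_minutes": 20,
    "beam_size": 5,
}


def resolve(cli: dict, toml_file: dict, env: dict) -> dict:
    """Merge the layers; earlier layers override later ones."""
    def present(layer: dict) -> dict:
        # Drop unset values so they fall through to the next layer.
        return {key: value for key, value in layer.items() if value is not None}

    return dict(ChainMap(present(cli), present(toml_file), present(env), DEFAULTS))


settings = resolve(
    cli={"model": "large-v3", "language": None},
    toml_file={"language": "en", "beam_size": 3},
    env={"beam_size": 1, "chunk_minutes": 10},
)
print(settings["model"], settings["language"], settings["beam_size"])  # large-v3 en 3
```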

Contributing

Contributions are welcome — see CONTRIBUTING.md for the dev setup, test, and lint workflow:

pip install -e '.[dev]'
pytest
ruff check .
mypy

License

Licensed under the Apache License 2.0 — see LICENSE.
