Portable, resumable, multi-backend Whisper transcription — runs anywhere, resumes after crashes.
Local CPU/GPU, Apple Silicon, or Google Colab. Input from a file, folder, Google Drive, or URL. ScribeFlow auto-selects the right model for the hardware it finds, writes durable checkpoints as it goes, and resumes cleanly after a crash — no duplicated or corrupted output.
Install
Base install is intentionally small — the pure-Python engine plus the default faster-whisper backend, which runs on CPU out of the box:
pip install scribeflow
ffmpeg is the one required system dependency:
# macOS (Homebrew)
brew install ffmpeg
# Debian / Ubuntu
sudo apt-get install -y ffmpeg
# Fedora
sudo dnf install -y ffmpeg
# Windows (winget)
winget install Gyan.FFmpeg
Optional extras layer in heavier backends, the web UI, and remote sources:
pip install 'scribeflow[gpu]' # torch + CUDA (documented, not hard-pinned)
pip install 'scribeflow[cpp]' # whisper.cpp via pywhispercpp — Apple-Silicon Metal path
pip install 'scribeflow[openai]' # openai-whisper (PyTorch reference baseline)
pip install 'scribeflow[web]' # FastAPI web UI: scribeflow web
pip install 'scribeflow[url]' # yt-dlp — transcribe straight from a URL
pip install 'scribeflow[drive]' # Google Drive API source
pip install 'scribeflow[dev]' # pytest + ruff + mypy + nbformat
From a clone (editable, with the dev toolchain):
git clone https://github.com/htahaozlu/scribeflow
cd scribeflow
pip install -e '.[dev]'
Run without installing (the npx equivalent)
ScribeFlow is a Python CLI, so there is no npm/npx package; the equivalents are pipx and uv. Once published to PyPI:
pipx install scribeflow # isolated global install
uvx scribeflow transcribe lecture.mp4 # run once, no install (like npx)
Before the PyPI release you can install it straight from a clone:
pipx install . # from the cloned repo
Homebrew (macOS) — planned
After the first PyPI release, a tap will provide a one-liner:
brew install htahaozlu/tap/scribeflow # planned (post-PyPI)
Note: the [gpu] extra documents the torch + CUDA path but does not hard-pin a CUDA build, so you stay in control of the wheel that matches your driver. See docs/CONFIG.md for the recommended install line.
Quickstart
pip install scribeflow # base install (CPU-capable)
scribeflow doctor # check ffmpeg / device / backends
scribeflow transcribe ./lecture.mp4 # auto-selects backend + model for this host
scribeflow transcribe ./lecture.mp4 --format srt,vtt
That's it. The .txt transcript is always written; --format adds subtitle and
JSON outputs. If the run is interrupted, re-run the same command and ScribeFlow
picks up from the last completed chunk.
What it does
ScribeFlow is a single tool that gives you the same transcription pipeline everywhere:
- Runs anywhere — local CPU or NVIDIA GPU, Apple Silicon (Metal via whisper.cpp), or Google Colab — and adapts to the hardware it detects.
- Input from anywhere — a local file or folder, a URL (yt-dlp), a mounted Google Drive path, or an upload through the web UI.
- Auto-selects the model for the detected hardware, with sensible defaults tuned for quality (global default: large-v3-turbo).
- Crash-safe and resumable — durable per-chunk checkpoints mean an interrupted run resumes from where it stopped, with no duplicated or corrupted output.
- Honest scope — local models only (no cloud APIs in v1); transcripts are best-effort ASR, not verbatim legal records.
Backends & hardware
Every backend normalizes to one output shape, so you can swap them without changing
your workflow. ScribeFlow auto-selects based on the host; you can always override with
--backend, --model, --device, and --compute-type.
| Backend | Best for | Install | Notes |
|---|---|---|---|
| faster-whisper | CPU and NVIDIA CUDA (the default) | base install | CUDA → float16 (≥8 GB VRAM) or int8_float16; CPU → int8. |
| whispercpp | Apple Silicon (Metal GPU) | pip install 'scribeflow[cpp]' | Needs a whisper-cli binary + a ggml model (see env vars below). |
| openai-whisper | PyTorch reference baseline | pip install 'scribeflow[openai]' | The reference implementation; slower, useful for comparison. |
Hardware auto-select rules:
- Apple Silicon → whisper.cpp on Metal when its binary is available, otherwise faster-whisper CPU int8. ScribeFlow never offers cuda/mps to faster-whisper on macOS-arm64: that path doesn't exist, so ScribeFlow doesn't pretend it does.
- CUDA → float16 for ≥8 GB VRAM, otherwise int8_float16.
- CPU → int8.
- It never auto-selects tiny/base/distil models for Turkish; the global default is large-v3-turbo.
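The rules above can be read as a small decision function. The sketch below is an illustration under stated assumptions, not ScribeFlow's code: the Host and Pick types, the "metal" device label, and the allowed_for_language helper are hypothetical names chosen for this example.

```python
from dataclasses import dataclass
from typing import Optional

SMALL_MODEL_PREFIXES = ("tiny", "base", "distil")


@dataclass
class Host:
    """Hypothetical summary of what hardware detection found."""
    apple_silicon: bool = False
    whispercpp_bin: Optional[str] = None  # e.g. from SCRIBEFLOW_WHISPERCPP_BIN
    cuda_vram_gb: Optional[float] = None  # None when no CUDA device is present


@dataclass
class Pick:
    backend: str
    device: str
    compute_type: Optional[str]
    model: str = "large-v3-turbo"  # the global default named in this README


def auto_pick(host: Host) -> Pick:
    """Apply the auto-select rules in order (an illustration, not ScribeFlow's code)."""
    if host.apple_silicon:
        if host.whispercpp_bin:
            # whisper.cpp drives Metal itself; "metal" is just a label in this sketch
            return Pick("whispercpp", "metal", None)
        # faster-whisper has no cuda/mps path on macOS-arm64, so fall back to CPU int8
        return Pick("faster-whisper", "cpu", "int8")
    if host.cuda_vram_gb is not None:
        compute = "float16" if host.cuda_vram_gb >= 8 else "int8_float16"
        return Pick("faster-whisper", "cuda", compute)
    return Pick("faster-whisper", "cpu", "int8")


def allowed_for_language(model: str, language: str) -> bool:
    """Turkish never gets tiny/base/distil models from the auto-picker."""
    return not (language == "tr" and model.startswith(SMALL_MODEL_PREFIXES))
```

The order of the checks mirrors the list: Apple Silicon is decided first, so a Mac never reaches the CUDA branch even by accident.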
To enable the Apple-Silicon Metal path, point ScribeFlow at your whisper.cpp binary and ggml models:
export SCRIBEFLOW_WHISPERCPP_BIN=/path/to/whisper-cli
export SCRIBEFLOW_WHISPERCPP_MODELS=/path/to/ggml-models
See your host's pick at any time:
scribeflow models # lists the catalog + this host's auto-pick
scribeflow doctor # ffmpeg / device / VRAM / RAM / backends checklist
Usage
scribeflow transcribe <source> [options]
scribeflow models [--want default|speed|quality] [--json] [--ui-lang en|tr]
scribeflow doctor [--json] [--ui-lang en|tr]
scribeflow gen-notebook <source> -o nb.ipynb [options]
scribeflow web [--host 127.0.0.1] [--port 8000]
scribeflow --version
scribeflow transcribe
The source is a local file/folder, a http(s):// URL, or a drive: path. The kind
is inferred from the argument; override it with --source-kind.
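One plausible way to infer the kind from the argument is prefix matching, sketched below. The function name infer_source_kind is hypothetical and ScribeFlow's real rules may differ; --source-kind always overrides the guess.

```python
def infer_source_kind(source: str) -> str:
    """Guess the source kind from the argument's shape (hypothetical rules)."""
    if source.lower().startswith(("http://", "https://")):
        return "url"
    if source.startswith("drive:"):
        return "drive"
    # Anything else is treated as a local file or folder; "upload" comes from the web UI
    return "local"
```

With these rules, "https://example.com/talk.mp4" maps to url, "drive:My Drive/lectures/week1.mp4" to drive, and ./lectures/ to local.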
Key flags:
| Flag | Purpose |
|---|---|
| --backend | faster-whisper · whispercpp · openai-whisper |
| --model | Override the auto-selected model (default: large-v3-turbo). |
| --device / --compute-type | cpu · cuda; e.g. float16, int8, int8_float16. |
| --want default\|speed\|quality | Bias the auto-pick toward speed or quality. |
| --language / -l | Audio language (Turkish tr by default; auto to detect). |
| --chunk-minutes | Chunk length for checkpointing (default 20). |
| --beam-size | Decoder beam width (Turkish default: 5). |
| --format | txt,srt,vtt,json (comma-separated; txt is always written). |
| --out | Durable output dir (transcripts + checkpoints). |
| --workspace | Scratch dir for heavy audio-chunk I/O. |
| --cache-dir | Model download cache. |
| --runtime auto\|local\|colab | Execution target (owns the scratch-vs-durable split). |
| --source-kind local\|url\|drive\|upload | Force the source kind instead of inferring it. |
| --overwrite | Discard any existing run and start fresh. |
| --config FILE | Path to a scribeflow.toml. |
| --json | Machine-readable JSON output. |
| --ui-lang / --lang en\|tr | Interface language (separate from --language). |
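Because txt is always written, a --format value effectively adds formats on top of it. A hedged sketch of that parsing; the parse_formats name and its error message are hypothetical.

```python
KNOWN_FORMATS = ("txt", "srt", "vtt", "json")


def parse_formats(value: str) -> list[str]:
    """Turn a --format value like 'srt,vtt' into an ordered list that always includes txt."""
    requested = [part.strip().lower() for part in value.split(",") if part.strip()]
    unknown = [fmt for fmt in requested if fmt not in KNOWN_FORMATS]
    if unknown:
        raise ValueError(f"unknown format(s): {', '.join(unknown)}")
    # txt goes first; dict.fromkeys drops duplicates while keeping first-seen order
    return list(dict.fromkeys(["txt", *requested]))
```

So --format srt,vtt yields txt, srt, and vtt outputs, and repeating txt is harmless.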
Examples:
# A whole folder, Turkish, with subtitles
scribeflow transcribe ./lectures/ --format srt,vtt
# A URL (needs the [url] extra), auto-detect language, quality bias
scribeflow transcribe "https://example.com/talk.mp4" -l auto --want quality
# Force a backend/model on capable hardware
scribeflow transcribe ./talk.wav --backend faster-whisper --model large-v3 --device cuda --compute-type float16
# Split scratch vs. durable storage explicitly
scribeflow transcribe ./lecture.mp4 --out ./out --workspace /tmp/scribeflow-scratch
Web UI
With the [web] extra installed, launch a small FastAPI app to upload media and
transcribe from the browser:
pip install 'scribeflow[web]'
scribeflow web # http://127.0.0.1:8000
scribeflow web --host 0.0.0.0 --port 8080 --out ./out --workspace /tmp/scribeflow-scratch
Colab
scribeflow gen-notebook emits a runnable .ipynb that mounts Drive, pip-installs ScribeFlow,
transcribes, and resumes — top-to-bottom, no editing required:
scribeflow gen-notebook ./lecture.mp4 -o scribeflow_colab.ipynb
scribeflow gen-notebook "https://example.com/talk.mp4" -o talk.ipynb # url extra auto-wired
scribeflow gen-notebook "drive:My Drive/lectures/week1.mp4" -o week1.ipynb # drive extra auto-wired
Open the notebook in Colab and run the cells in order.
The Errno-107 split. On Colab, a Google Drive FUSE mount can drop mid-write and
raise OSError: [Errno 107] Transport endpoint is not connected. ScribeFlow sidesteps this
by keeping heavy, churny I/O (audio chunks, temp files) on local /content scratch
(the --workspace), and writing only durable transcripts and checkpoints to Drive
(the --out). If the mount blips, your committed transcripts are already safe and the
run resumes.
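The split can be sketched per chunk: the audio slice and decoding touch only the workspace, and the one write that reaches the durable directory is a small, atomically replaced text file. This is an illustration under assumptions: process_chunk, the chunk_outputs/NNNNN.txt layout, and the make_audio/transcribe callables are hypothetical stand-ins.

```python
import os
import shutil
from pathlib import Path
from typing import Callable


def process_chunk(workspace: Path, out: Path, index: int,
                  make_audio: Callable[[Path], None],
                  transcribe: Callable[[Path], str]) -> Path:
    """Hypothetical per-chunk step: heavy I/O on scratch, one small durable write."""
    scratch = workspace / f"chunk_{index:05d}"
    scratch.mkdir(parents=True, exist_ok=True)
    audio = scratch / "audio.wav"
    make_audio(audio)          # e.g. an ffmpeg slice, written to local scratch only
    text = transcribe(audio)   # decoding reads from scratch, never from the mount

    final = out / "chunk_outputs" / f"{index:05d}.txt"
    final.parent.mkdir(parents=True, exist_ok=True)
    tmp = final.with_name(final.name + ".tmp")  # same directory, same filesystem
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(text)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, final)     # the durable copy is either absent or complete
    shutil.rmtree(scratch)     # scratch is disposable once the transcript is committed
    return final
```

Keeping the temp file beside its destination matters: os.replace is only atomic within one filesystem and fails across mounts, so staging the transcript in /content scratch and then replacing it onto Drive would not work.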
Output formats
The .txt transcript is always written. Add more with --format
(comma-separated):
| Format | Flag value | Description |
|---|---|---|
| Text | txt | Plain transcript (always produced). |
| SubRip | srt | Subtitles with timecodes. |
| WebVTT | vtt | Web-native subtitles with timecodes. |
| JSON | json | Structured segments (text + timing) for downstream tooling. |
Subtitle timecodes are global: each chunk's local times are shifted by
chunk_index * chunk_seconds, so timing stays correct across the whole file.
scribeflow transcribe ./lecture.mp4 --format txt,srt,vtt,json
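The timecode shift above can be worked through in code, assuming zero-based chunk indexes (so the first chunk's offset is 0) and the default 20-minute chunks. A segment at 5.5 s inside the third chunk (index 2) lands at 2 × 1200 + 5.5 = 2405.5 s, or 00:40:05,500 in SRT notation. The helper names below are hypothetical.

```python
def global_seconds(chunk_index: int, chunk_seconds: float, local_seconds: float) -> float:
    """Shift a chunk-local timestamp onto the whole file's timeline."""
    return chunk_index * chunk_seconds + local_seconds


def timecode(seconds: float, sep: str = ",") -> str:
    """Format seconds as HH:MM:SS,mmm (SRT); pass sep='.' for WebVTT's HH:MM:SS.mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}{sep}{ms:03d}"


print(timecode(global_seconds(2, 20 * 60, 5.5)))  # 00:40:05,500
```

Only the index matters for the offset because every chunk before the current one has the full chunk length; a shorter final chunk never shifts anything after it.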
How resume works
Resume isn't a bolt-on — it's how the engine runs.
- Chunk-by-chunk checkpoints. The media is split into chunks; each completed chunk is committed durably to progress.json + chunk_outputs/, using atomic temp-then-replace writes (never a half-written file).
- Just re-run. Kill the process and run the same command again → ScribeFlow resumes from the last completed chunk. No duplicated work, no corrupted output.
- RunIdentity guard. A resume refuses to silently mix a different backend/model/chunking/options into an existing run; it raises CheckpointIdentityError. Want a clean slate with new settings? Pass --overwrite.
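A minimal sketch of that commit-and-skip loop, assuming a progress.json shaped like {"completed": [0, 1, ...]} and one text file per chunk under chunk_outputs/. Both layouts, and the run and atomic_write_text names, are hypothetical rather than ScribeFlow's actual on-disk format.

```python
import json
import os
from pathlib import Path
from typing import Callable


def atomic_write_text(path: Path, text: str) -> None:
    """Write to a temp file in the same directory, fsync, then atomically replace."""
    tmp = path.with_name(path.name + ".tmp")
    with open(tmp, "w", encoding="utf-8") as fh:
        fh.write(text)
        fh.flush()
        os.fsync(fh.fileno())
    os.replace(tmp, path)  # readers see the old file or the new one, never a partial one


def run(out: Path, n_chunks: int, transcribe_chunk: Callable[[int], str]) -> list[int]:
    """Hypothetical resume loop: skip chunks already recorded in progress.json."""
    chunk_dir = out / "chunk_outputs"
    chunk_dir.mkdir(parents=True, exist_ok=True)
    progress_file = out / "progress.json"
    if progress_file.exists():
        done = set(json.loads(progress_file.read_text(encoding="utf-8"))["completed"])
    else:
        done = set()
    processed = []
    for index in range(n_chunks):
        if index in done:
            continue  # committed by an earlier run: no duplicated work
        text = transcribe_chunk(index)
        # Commit the chunk output first, then record it; a crash between the
        # two writes only means this chunk is redone on the next run.
        atomic_write_text(chunk_dir / f"{index:05d}.txt", text)
        done.add(index)
        atomic_write_text(progress_file, json.dumps({"completed": sorted(done)}))
        processed.append(index)
    return processed
```

The write order is the design choice: the chunk output lands before progress.json records it, so the worst case after a crash is redoing one chunk, never a recorded chunk with no output.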
Determinism makes this safe: Turkish defaults use temperature=0.0,
condition_on_previous_text=False (with a tail-prompt continuity hint),
vad_filter=True, and beam_size=5, so re-running a chunk reproduces the same result.
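Since those settings decide what a chunk decodes to, a resume guard can fingerprint them together with backend, model, and chunking. The sketch below is hypothetical: the field list, the SHA-256 fingerprint, and check_resume are assumptions, and CheckpointIdentityError is a local stand-in named after the error ScribeFlow raises.

```python
import hashlib
import json
from dataclasses import asdict, dataclass


class CheckpointIdentityError(RuntimeError):
    """Local stand-in: raised when a resume would mix settings from two different runs."""


@dataclass(frozen=True)
class RunIdentity:
    # Hypothetical field set covering backend/model/chunking/decoding options
    backend: str
    model: str
    chunk_minutes: int
    language: str
    beam_size: int = 5
    temperature: float = 0.0
    condition_on_previous_text: bool = False
    vad_filter: bool = True

    def fingerprint(self) -> str:
        # sort_keys makes the hash independent of field order
        blob = json.dumps(asdict(self), sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()


def check_resume(stored: RunIdentity, requested: RunIdentity, overwrite: bool = False) -> None:
    if overwrite:
        return  # the caller discards the old run and starts fresh
    if stored.fingerprint() != requested.fingerprint():
        old, new = asdict(stored), asdict(requested)
        changed = [key for key in new if old[key] != new[key]]
        raise CheckpointIdentityError(f"settings differ from the existing run: {', '.join(changed)}")
```

Naming the changed fields in the error tells the user exactly which flag to revert, or that --overwrite is what they actually want.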
Config
Configuration resolves from CLI flags → a scribeflow.toml → environment variables, with
sensible defaults underneath. Full reference: docs/CONFIG.md.
Common environment variables:
| Variable | Purpose |
|---|---|
| SCRIBEFLOW_LANG | Default interface language (en / tr). |
| SCRIBEFLOW_WHISPERCPP_BIN | Path to the whisper-cli binary (Apple Silicon). |
| SCRIBEFLOW_WHISPERCPP_MODELS | Directory holding ggml models for whisper.cpp. |
| NO_COLOR | Disable ANSI colors (also auto-off when piped). |
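A sketch of the color rule, assuming the common no-color.org convention that any non-empty NO_COLOR value disables color; use_color is a hypothetical name, and ScribeFlow's exact check may differ.

```python
import os
import sys
from typing import Mapping, Optional, TextIO


def use_color(stream: Optional[TextIO] = None, env: Optional[Mapping[str, str]] = None) -> bool:
    """Decide whether to emit ANSI colors (hypothetical helper)."""
    stream = sys.stdout if stream is None else stream
    env = os.environ if env is None else env
    # no-color.org convention: any non-empty NO_COLOR value disables color
    if env.get("NO_COLOR"):
        return False
    # Piped or redirected output is not a terminal, so skip ANSI escapes there too
    return hasattr(stream, "isatty") and stream.isatty()
```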
A project-local scribeflow.toml lets you pin defaults:
# scribeflow.toml
backend = "faster-whisper"
model = "large-v3-turbo"
language = "tr"
chunk_minutes = 20
beam_size = 5
formats = ["txt", "srt"]
scribeflow transcribe ./lecture.mp4 --config scribeflow.toml
Contributing
Contributions are welcome — see CONTRIBUTING.md for the dev setup, test, and lint workflow:
pip install -e '.[dev]'
pytest
ruff check .
mypy
License
Licensed under the Apache License 2.0 — see LICENSE.
File details
Details for the file scribeflow-0.1.0.tar.gz.
File metadata
- Download URL: scribeflow-0.1.0.tar.gz
- Upload date:
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b4086fd493bc5e9a3ca3c72d27c4edf976a6555bbe2fa3ef067b05287a00ce18 |
| MD5 | 7ef61fe5a7f0760779659da88557ac6c |
| BLAKE2b-256 | 5f8f47d45371bc50cc0027f0605c7e5161c9136e9700cb723c24e468a9d3206d |
Provenance
The following attestation bundles were made for scribeflow-0.1.0.tar.gz:
- Publisher: publish.yml on htahaozlu/scribeflow
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scribeflow-0.1.0.tar.gz
- Subject digest: b4086fd493bc5e9a3ca3c72d27c4edf976a6555bbe2fa3ef067b05287a00ce18
- Sigstore transparency entry: 1708597366
- Sigstore integration time:
- Permalink: htahaozlu/scribeflow@b079de60cb2eac322dc099eaf31e0dfbe64c89aa
- Branch / Tag: refs/heads/main
- Owner: https://github.com/htahaozlu
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b079de60cb2eac322dc099eaf31e0dfbe64c89aa
- Trigger Event: workflow_dispatch
File details
Details for the file scribeflow-0.1.0-py3-none-any.whl.
File metadata
- Download URL: scribeflow-0.1.0-py3-none-any.whl
- Upload date:
- Size: 256.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ec90a0e3d146e9dc8ce3837649339c46b4ba16c9926db73ac13ace1da2048ce4 |
| MD5 | ddbee9325157c79a3e4a3e95745f6ba7 |
| BLAKE2b-256 | 49173958674a7165c6cbbd29900adb253a7025d789a28f35ac0fcf8c3453a15c |
Provenance
The following attestation bundles were made for scribeflow-0.1.0-py3-none-any.whl:
- Publisher: publish.yml on htahaozlu/scribeflow
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: scribeflow-0.1.0-py3-none-any.whl
- Subject digest: ec90a0e3d146e9dc8ce3837649339c46b4ba16c9926db73ac13ace1da2048ce4
- Sigstore transparency entry: 1708597375
- Sigstore integration time:
- Permalink: htahaozlu/scribeflow@b079de60cb2eac322dc099eaf31e0dfbe64c89aa
- Branch / Tag: refs/heads/main
- Owner: https://github.com/htahaozlu
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b079de60cb2eac322dc099eaf31e0dfbe64c89aa
- Trigger Event: workflow_dispatch