Local CrispASR transcription workflow for Codex and MCP agents.

crispasr-agent-transcriber

Local-only transcription for Codex and MCP-based AI agents, powered by CrispASR. No cloud uploads, no API keys required for transcription.

What it does

Give it a local audio or video file. It:

  1. Probes the spoken language (English or Chinese) using CrispASR's FireRed LID.
  2. Starts a local CrispASR server with the right backend -- Cohere Transcribe for English, Qwen3-ASR for Chinese.
  3. Extracts audio from video with ffmpeg when needed.
  4. Calls CrispASR's /v1/audio/transcriptions endpoint.
  5. Writes the transcript and metadata to disk.

Everything runs on your machine. Media never leaves it.
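Step 4 can be illustrated with a standard-library sketch that builds, but does not send, a multipart POST for the endpoint. The form field names (file, response_format, language) are assumed from the OpenAI-style transcription API that the endpoint path resembles; they are not confirmed for CrispASR, and the function name is hypothetical.

```python
import uuid
from urllib.request import Request


def build_transcription_request(base_url, filename, audio_bytes,
                                response_format="verbose_json", language=None):
    """Build (without sending) a multipart POST for /v1/audio/transcriptions."""
    boundary = uuid.uuid4().hex
    # ASSUMPTION: field names follow the OpenAI-style API, not verified for CrispASR.
    fields = {"response_format": response_format}
    if language:
        fields["language"] = language

    parts = []
    for name, value in fields.items():
        parts.append(
            f'--{boundary}\r\nContent-Disposition: form-data; '
            f'name="{name}"\r\n\r\n{value}\r\n'.encode()
        )
    # The file part carries the raw audio bytes; filename is assumed quote-free.
    parts.append(
        (
            f'--{boundary}\r\nContent-Disposition: form-data; name="file"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
        + audio_bytes
        + b"\r\n"
    )
    parts.append(f"--{boundary}--\r\n".encode())

    return Request(
        base_url.rstrip("/") + "/v1/audio/transcriptions",
        data=b"".join(parts),
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it to a server that is already running on the given base URL.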

Quick install for Codex

The plugin includes the Codex Skill, command-line tool, and MCP server. Media stays on your computer. Model files are never downloaded automatically.

1. Install prerequisites

Install Node.js 20 or newer, uv, and ffmpeg. The installer uses uv to provide Python.

node --version
uv --version
ffmpeg -version

2. Run the installer

npx @emiyakatuz/crispasr-agent-transcriber@latest install

The installer:

  • downloads the matching GitHub Release and verifies its SHA-256 checksum;
  • installs the plugin under ~/plugins/crispasr-agent-transcriber;
  • installs the Python and MCP dependencies;
  • detects CUDA, Vulkan, or CPU and installs the best CrispASR build;
  • registers the plugin in the Codex Personal marketplace;
  • preserves existing models, binaries, and outputs during updates.
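
The checksum step in the first bullet can be reproduced by hand. The sketch below is written in Python for illustration only; the installer itself runs on Node.js and its actual code is not shown here. The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_checksum(actual_hex, expected_hex):
    """Compare two hex digests, ignoring case and surrounding whitespace."""
    return hmac.compare_digest(actual_hex.strip().lower(), expected_hex.strip().lower())
```

A caller would pass the digest published with the release as expected_hex and refuse to unpack the archive when matches_checksum returns False.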

3. Add the local models

When a model is missing, the installer prints its official source and stops. Download the three files listed under Required models into:

~/plugins/crispasr-agent-transcriber/models/

Then verify the complete installation:

npx @emiyakatuz/crispasr-agent-transcriber@latest doctor

4. Enable the plugin

With a Codex build that supports plugin commands, run:

codex plugin add crispasr-agent-transcriber@personal

If the CLI has no codex plugin command, open the Codex desktop Plugins view and install CrispASR Transcriber from the Personal marketplace. Start a new conversation, then ask:

Transcribe C:\path\to\sample.mp4 with CrispASR using auto language detection.
Save a verbose JSON transcript and an SRT subtitle file.

Update or uninstall

npx @emiyakatuz/crispasr-agent-transcriber@latest update
npx @emiyakatuz/crispasr-agent-transcriber@latest uninstall

Uninstall preserves local models, CrispASR binaries, and outputs. Use uninstall --purge-data only when those files should also be deleted. See Plugin installation for manual installation and troubleshooting.

Direct command-line use

After installation, you can run the transcription script without Codex:

Set-Location (Join-Path $HOME "plugins\crispasr-agent-transcriber")
uv run python scripts/transcribe.py sample.mp4 --profile auto `
  --manage-server `
  --lid-backend firered --lid-model models\firered-lid-q2_k.gguf `
  --model models\cohere-transcribe.gguf `
  --format verbose_json

Use with other AI agents

The MCP server is the cross-agent interface. Any agent that supports MCP stdio can run the released package directly from GitHub:

uvx --from "crispasr-agent-transcriber[mcp] @ git+https://github.com/EmiyaKatuz/crispasr-agent-transcriber.git@v0.3.4" crispasr-agent-mcp

Use the same command and arguments in Claude Desktop, Cursor, or another MCP client. See AI agent integrations for a generic MCP configuration and Codex CLI command.
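
As a rough illustration, the sketch below prints a client configuration entry for that command. The mcpServers layout is the convention used by Claude Desktop-style configuration files; the server name "crispasr" is a hypothetical choice, and the project's AI agent integrations page remains the authoritative reference.

```python
import json

# Package spec copied from the uvx command above.
PACKAGE_SPEC = (
    "crispasr-agent-transcriber[mcp] @ "
    "git+https://github.com/EmiyaKatuz/crispasr-agent-transcriber.git@v0.3.4"
)


def mcp_server_entry(name="crispasr"):
    """Return an mcpServers-style entry; the server name is a hypothetical label."""
    return {
        "mcpServers": {
            name: {
                "command": "uvx",
                "args": ["--from", PACKAGE_SPEC, "crispasr-agent-mcp"],
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(mcp_server_entry(), indent=2))
```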

Maintainer publishing

End users do not need the release steps. Maintainers should follow the publishing guide for Codex Marketplace, PyPI, MCP Registry, and cross-agent distribution.

Required models

This tool does not download models automatically. Download these three GGUF files and keep them in a local directory (the repo's models/ folder works well):

Purpose             File                      Approx. size  Source
English ASR         cohere-transcribe.gguf    3.9 GB        Cohere on HuggingFace
Chinese ASR         qwen3-asr-1.7b-q4_k.gguf  1.3 GB        Qwen3-ASR GGUF
Language detection  firered-lid-q2_k.gguf     350 MB        FireRed LID GGUF

Pass them on every run:

--model models\cohere-transcribe.gguf
--lid-backend firered --lid-model models\firered-lid-q2_k.gguf

CrispASR binary management

The tool auto-detects, installs, and updates the CrispASR binary from GitHub releases.

Flag                     Effect
--install-crispasr       Download latest platform binary to bin/
--update-crispasr        Upgrade to newest release
--crispasr-status        Show installed version and update availability
--crispasr-bin-dir PATH  Custom directory (default ./bin)
--crispasr-bin PATH      Exact path to crispasr.exe

When --manage-server is set and no binary is found, the tool installs one automatically before starting the server.

GPU detection

On install and update, the tool checks your hardware:

  1. CUDA -- nvidia-smi available, or CUDA_PATH / CUDA_HOME set, or CUDA in PATH -> downloads crispasr-*-cuda variant.
  2. Vulkan -- vulkaninfo or VULKAN_SDK set (only when CUDA is absent) -> downloads crispasr-*-vulkan variant.
  3. CPU -- fallback when no GPU toolkit is detected.

macOS always uses the universal binary.
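
The detection order above can be sketched as a small function. This is an illustration under stated assumptions, not the tool's code: the function name and return labels are hypothetical, and "CUDA in PATH" is interpreted here as any PATH entry containing the substring "cuda".

```python
import os
import shutil
import sys


def pick_variant(env=None, which=shutil.which, platform=sys.platform):
    """Return 'universal', 'cuda', 'vulkan', or 'cpu' following the order above."""
    env = os.environ if env is None else env
    if platform == "darwin":
        return "universal"  # macOS always uses the universal binary

    # ASSUMPTION: "CUDA in PATH" means some PATH entry mentions cuda.
    path_entries = env.get("PATH", "").split(os.pathsep)
    cuda_in_path = any("cuda" in entry.lower() for entry in path_entries)

    if which("nvidia-smi") or env.get("CUDA_PATH") or env.get("CUDA_HOME") or cuda_in_path:
        return "cuda"
    # Vulkan is considered only when no CUDA signal was found.
    if which("vulkaninfo") or env.get("VULKAN_SDK"):
        return "vulkan"
    return "cpu"
```

The env, which, and platform parameters are injectable so the ordering can be checked without the corresponding hardware.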

Profiles

Profile  Backend            ASR model                  Language hint
english  cohere             Cohere Transcribe 03-2026  en
chinese  qwen3-1.7b         Qwen3-ASR 1.7B             zh
auto     determined by LID  determined by LID          detected

The auto profile runs FireRed language detection on the media, then routes English to Cohere and Chinese to Qwen3-1.7B. Mixed or uncertain content stops with an error asking you to re-run with --profile english or --profile chinese.
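
A minimal sketch of that routing follows. The backend names and language hints come from the profile table; the confidence score, the 0.8 threshold, and all function and class names are hypothetical, since the tool's actual criteria for "mixed or uncertain" are not documented here.

```python
PROFILES = {
    "english": {"backend": "cohere", "language": "en"},
    "chinese": {"backend": "qwen3-1.7b", "language": "zh"},
}
LID_TO_PROFILE = {"en": "english", "zh": "chinese"}


class AmbiguousLanguageError(ValueError):
    """Raised when language detection cannot pick English or Chinese."""


def resolve_profile(profile, detected=None, confidence=0.0, min_confidence=0.8):
    """Map a requested profile (and an LID result, for auto) to a backend and hint."""
    if profile != "auto":
        return PROFILES[profile]
    name = LID_TO_PROFILE.get(detected)
    # ASSUMPTION: the threshold value is illustrative only.
    if name is None or confidence < min_confidence:
        raise AmbiguousLanguageError(
            "Language detection was inconclusive; "
            "re-run with --profile english or --profile chinese."
        )
    return PROFILES[name]
```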

Usage

Managed server (tool starts CrispASR for you)

uv run python scripts/transcribe.py sample.wav `
  --profile auto `
  --manage-server `
  --model models\qwen3-asr-1.7b-q4_k.gguf `
  --lid-backend firered --lid-model models\firered-lid-q2_k.gguf `
  --format srt `
  --out-dir outputs

Add --keep-server to leave the server running after transcription.

Manual server (you start CrispASR)

# Terminal 1 -- start the server
crispasr --server --backend cohere `
  -m models\cohere-transcribe.gguf `
  --port 8080

# Terminal 2 -- transcribe
uv run python scripts/transcribe.py sample.mp4 `
  --profile english `
  --server-url http://127.0.0.1:8080 `
  --format verbose_json

If the running server's backend doesn't match the selected profile, the tool prints the exact command you need to start the correct server.

Output formats

--format      File extension  Contents
text          .txt            Plain transcript
verbose_json  .json           Full response with segments
srt           .srt            SubRip subtitles
vtt           .vtt            WebVTT subtitles

A .metadata.json sidecar is always written alongside the transcript.
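
The mapping from format to output file can be sketched as below. The extensions come from the table above; the exact file naming (media stem plus extension, and stem plus .metadata.json for the sidecar) is an assumption, as is the function name.

```python
from pathlib import Path

EXTENSIONS = {"text": ".txt", "verbose_json": ".json", "srt": ".srt", "vtt": ".vtt"}


def output_paths(media_path, fmt, out_dir="outputs"):
    """Return (transcript_path, sidecar_path) for a media file and a --format value."""
    stem = Path(media_path).stem
    out = Path(out_dir)
    transcript = out / f"{stem}{EXTENSIONS[fmt]}"
    # ASSUMPTION: the sidecar is named <stem>.metadata.json next to the transcript.
    sidecar = out / f"{stem}.metadata.json"
    return transcript, sidecar
```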

Video files

Video files are detected automatically. ffmpeg extracts the audio track to a temporary mono 16 kHz WAV before sending it to CrispASR. The temporary file is deleted when transcription finishes.
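
The extraction step can be approximated as follows. The ffmpeg options used (-vn, -ac 1, -ar 16000, -f wav) are standard ffmpeg flags for dropping video and producing mono 16 kHz WAV; whether the tool passes exactly these arguments is an assumption, and the function names are hypothetical.

```python
import contextlib
import os
import subprocess
import tempfile


def ffmpeg_extract_args(src, dst):
    """Argument list for ffmpeg: drop video, downmix to mono, resample to 16 kHz."""
    return [
        "ffmpeg", "-nostdin", "-y",
        "-i", str(src),
        "-vn",            # ignore the video stream
        "-ac", "1",       # one audio channel (mono)
        "-ar", "16000",   # 16 kHz sample rate
        "-f", "wav",
        str(dst),
    ]


@contextlib.contextmanager
def extracted_wav(src):
    """Yield a temporary WAV path for src and delete it on exit."""
    fd, dst = tempfile.mkstemp(suffix=".wav")
    os.close(fd)
    try:
        # An argument list with shell=False avoids shell interpolation of the path.
        subprocess.run(ffmpeg_extract_args(src, dst), check=True, shell=False)
        yield dst
    finally:
        if os.path.exists(dst):
            os.remove(dst)
```

Used as "with extracted_wav(path) as wav:", the temporary file is removed whether or not the block inside the with statement succeeds.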

All CLI flags

--profile auto|english|chinese
--format text|verbose_json|srt|vtt
--out-dir PATH
--server-url URL
--allow-remote-server
--manage-server
--keep-server
--model PATH               Local GGUF model path
--allow-model-auto-download
--lid-model PATH           Local LID model path
--lid-backend firered|silero|ecapa|whisper
--host HOST                Managed server host (default 127.0.0.1)
--port PORT                Managed server port (default 8080)
--language CODE            Language hint for transcription
--prompt TEXT              Initial prompt/context
--vad                      Enable voice activity detection
--diarize                  Enable speaker diarization
--diarize-method METHOD
--hotwords WORD,WORD       Comma-separated hotwords
--no-timestamps
--preprocess auto|always|never
--api-key KEY              If CRISPASR_API_KEYS is enabled
--crispasr-bin-dir PATH
--crispasr-bin PATH
--install-crispasr
--update-crispasr
--crispasr-status

MCP server

uv sync --extra mcp
uv run --extra mcp crispasr-agent-mcp

Exposed tools:

Tool                      Description
crispasr_health           Check CrispASR server health
crispasr_backends         List available backends
crispasr_detect_language  Run language detection on a file
transcribe_audio          Transcribe an audio file
transcribe_video          Transcribe a video file
transcribe_folder         Batch-transcribe a folder

Security model

  • No cloud uploads. Media files stay on the local filesystem.
  • No remote servers by default. --server-url only accepts localhost unless --allow-remote-server is explicitly passed.
  • No URL inputs. Only local file paths are accepted. URLs, S3, and other remote schemes are rejected.
  • No shell injection. ffmpeg is called with argument lists and shell=False. No user-controlled strings are interpolated into shell commands.
  • No model downloads by default. CrispASR model auto-download (-m auto) requires --allow-model-auto-download. The same guard applies to language detection models.
  • Temporary files are cleaned up. Converted WAV files and LID probe windows are deleted when transcription finishes.
  • Binary downloads are explicit. CrispASR binary installs only from the official CrispStrobe/CrispASR GitHub releases.
  • Verified plugin releases. The npm installer requires the plugin ZIP to match the SHA-256 value published in the same GitHub Release.
  • Narrow installer writes. The installer manages only its plugin directory and the named Personal marketplace entry. Updates preserve local models, binaries, and outputs.
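
Two of the guards above, the localhost-only server URL and the rejection of URL inputs, can be sketched with the standard library. This is an illustration of the idea rather than the tool's implementation; the function names and the exact matching rules are assumptions.

```python
import ipaddress
from urllib.parse import urlparse


def is_local_server_url(url):
    """True when the URL's host is localhost or a loopback IP address."""
    host = urlparse(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a non-IP hostname other than localhost


def check_server_url(url, allow_remote=False):
    """Return url, or raise unless it is local or remote use was explicitly allowed."""
    if not allow_remote and not is_local_server_url(url):
        raise ValueError("Remote server refused; pass --allow-remote-server to override.")
    return url


def is_remote_input(value):
    """Treat anything with a URL scheme separator (https://, s3://, ...) as remote."""
    return "://" in value
```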

Verify

uv run pytest
uv run ruff check .  # zero lint warnings

License

This project is licensed under the MIT License.

Third-party components and attribution

This tool orchestrates several independently licensed projects. It does not bundle, fork, or redistribute their code; it downloads pre-built binaries and calls them as subprocesses or HTTP services at runtime.

Component                  License               Role
CrispASR                   MIT                   ASR engine, server, language detection
ffmpeg                     LGPL 2.1+ / GPL 2+    Media decoding and audio extraction
Cohere Transcribe 03-2026  Cohere model license  English ASR model (loaded by CrispASR)
Qwen3-ASR 1.7B             Apache 2.0            Chinese ASR model (loaded by CrispASR)
FireRed LID                Apache 2.0            Language detection model (loaded by CrispASR)
httpx                      BSD                   HTTP client for CrispASR API
MCP Python SDK             MIT                   MCP server framework
Node.js                    MIT                   npm installer runtime
adm-zip                    MIT                   Verified plugin ZIP extraction

Model files must be downloaded separately by the user from their respective HuggingFace repositories. See Required models above.

Related projects

  • CrispASR -- the ASR engine this tool wraps
  • CrisperWeaver -- CrispASR's desktop GUI (not used by this tool)

