
Paratran

CLI, REST API, and MCP server for audio transcription on Apple Silicon, powered by parakeet-mlx.

The default model (parakeet-tdt-0.6b-v3) achieves 6.34% average WER across 8 English benchmarks and supports 25 languages. Runs ~30x faster than Whisper on Apple Silicon via MLX.

Requirements

  • macOS with Apple Silicon (M1/M2/M3/M4)
  • Python 3.11+
  • ~2 GB memory for the default model

Quick Start

Transcribe audio files directly:

uvx paratran recording.wav

Or start the REST API server and transcribe via client mode (no model reload per file):

uvx paratran serve
uvx paratran -s http://localhost:8000 recording.wav

Install

uv (recommended)

uv tool install paratran

pip

pip install paratran

From source

git clone https://github.com/briansunter/paratran.git
cd paratran
uv sync
uv run paratran

CLI Usage

# Transcribe a single file
paratran recording.wav

# Transcribe multiple files with verbose output
paratran -v file1.wav file2.mp3 file3.m4a

# Output as SRT subtitles
paratran --output-format srt recording.wav

# Output all formats (txt, json, srt, vtt)
paratran --output-format all --output-dir ./output recording.wav

# Use beam search decoding
paratran --decoding beam recording.wav

# Custom model and cache directory
paratran --model mlx-community/parakeet-tdt-1.1b-v2 --cache-dir /Volumes/Storage/models recording.wav

Client Mode

Use --server / -s to send files to a running paratran server instead of transcribing locally. This avoids model loading time on every invocation — start the server once, then transcribe instantly.

# Start the server (loads model once)
paratran serve

# Transcribe via the server
paratran -s http://localhost:8000 recording.wav

# All the same options work
paratran -s http://localhost:8000 --output-format all --output-dir ./output -v recording.wav

# Set the server URL via environment variable
export PARATRAN_SERVER=http://localhost:8000
paratran recording.wav  # automatically uses the server

CLI Options

| Flag | Default | Description |
| --- | --- | --- |
| -s, --server | | URL of a running paratran server |
| --model | mlx-community/parakeet-tdt-0.6b-v3 | HF model ID or local path |
| --cache-dir | HuggingFace default | Model cache directory |
| --output-dir | . | Output directory |
| --output-format | txt | txt, json, srt, vtt, or all |
| --decoding | greedy | greedy or beam |
| --chunk-duration | 120 | Chunk duration in seconds (0 to disable) |
| --overlap-duration | 15 | Overlap between chunks |
| --beam-size | 5 | Beam size (beam decoding) |
| --length-penalty | 0.013 | Length penalty (beam decoding) |
| --patience | 3.5 | Patience (beam decoding) |
| --duration-reward | 0.67 | Duration reward (beam decoding) |
| --max-words | | Max words per sentence |
| --silence-gap | | Split at silence gaps (seconds) |
| --max-duration | | Max sentence duration (seconds) |
| --fp32 | | Use FP32 precision instead of BF16 |
| -v | | Verbose output |

Environment variables: PARATRAN_MODEL, PARATRAN_MODEL_DIR, PARATRAN_SERVER.
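The chunking flags split long recordings into overlapping windows before transcription. A sketch of the boundary arithmetic implied by the defaults (--chunk-duration 120, --overlap-duration 15); the exact windowing inside paratran/parakeet-mlx may differ:

```python
# Illustrative chunk-boundary arithmetic, assuming each window advances
# by (chunk - overlap) seconds. Not paratran's implementation.
def chunk_bounds(total: float, chunk: float = 120.0, overlap: float = 15.0):
    """Return (start, end) windows covering `total` seconds of audio."""
    if chunk <= 0:  # --chunk-duration 0 disables chunking
        return [(0.0, total)]
    step = chunk - overlap
    bounds, start = [], 0.0
    while start < total:
        bounds.append((start, min(start + chunk, total)))
        if start + chunk >= total:
            break
        start += step
    return bounds


# A 5-minute file becomes three windows, each overlapping its
# predecessor by 15 seconds:
print(chunk_bounds(300.0))  # [(0.0, 120.0), (105.0, 225.0), (210.0, 300.0)]
```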

REST API Server

# Start server with default settings
paratran serve

# Custom host, port, and model cache
paratran serve --host 127.0.0.1 --port 9000 --cache-dir /Volumes/Storage/models

API

GET /health

curl http://localhost:8000/health
{
  "status": "ok",
  "model": "mlx-community/parakeet-tdt-0.6b-v3",
  "model_dir": "/Volumes/Storage/models"
}

POST /transcribe

Upload an audio file (wav, mp3, flac, m4a, ogg, webm):

curl -X POST http://localhost:8000/transcribe -F "file=@recording.m4a"
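The same upload can be done from Python with only the standard library. This is an illustrative client sketch (the hand-built multipart body and `transcribe` helper are not part of paratran):

```python
# Upload an audio file to POST /transcribe without third-party deps.
# Sketch only: assumes the endpoint shown above and a default server URL.
import io
import json
import urllib.request
import uuid


def build_multipart(filename: str, data: bytes, field: str = "file"):
    """Build a multipart/form-data body and its Content-Type header."""
    boundary = uuid.uuid4().hex
    body = io.BytesIO()
    body.write(f"--{boundary}\r\n".encode())
    body.write(
        (
            f'Content-Disposition: form-data; name="{field}"; '
            f'filename="{filename}"\r\n'
            "Content-Type: application/octet-stream\r\n\r\n"
        ).encode()
    )
    body.write(data)
    body.write(f"\r\n--{boundary}--\r\n".encode())
    return body.getvalue(), f"multipart/form-data; boundary={boundary}"


def transcribe(path: str, server: str = "http://localhost:8000") -> dict:
    """POST an audio file to a running paratran server, return the JSON."""
    with open(path, "rb") as f:
        payload, content_type = build_multipart(path, f.read())
    req = urllib.request.Request(
        f"{server}/transcribe",
        data=payload,
        headers={"Content-Type": content_type},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())
```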

Optional query parameters:

| Parameter | Default | Description |
| --- | --- | --- |
| decoding | greedy | greedy or beam |
| beam_size | 5 | Beam size (beam decoding) |
| length_penalty | 1.0 | Length penalty (beam decoding) |
| patience | 1.0 | Patience (beam decoding) |
| duration_reward | 0.7 | Duration reward (beam decoding) |
| max_words | | Max words per sentence |
| silence_gap | | Split at silence gaps (seconds) |
| max_duration | | Max sentence duration (seconds) |
| chunk_duration | | Chunk duration for long audio (seconds) |
| overlap_duration | 15.0 | Overlap between chunks (seconds) |
| fp32 | false | Use FP32 instead of BF16 |

Example response:
{
  "text": "Hello world, this is a test.",
  "duration": 3.52,
  "processing_time": 0.176,
  "sentences": [
    {
      "text": "Hello world, this is a test.",
      "start": 0.0,
      "end": 3.52,
      "tokens": [
        { "text": "Hello", "start": 0.0, "end": 0.48 },
        { "text": " world", "start": 0.48, "end": 0.8 }
      ]
    }
  ]
}
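The per-sentence timestamps make the JSON easy to post-process yourself, for example into SRT. A sketch using the field names from the response above (note paratran's own `srt` output format already does this for you):

```python
# Convert a /transcribe JSON response into SRT subtitles.
# Field names (sentences, text, start, end) follow the example response;
# this is an illustrative post-processing sketch, not paratran's formatter.
def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp: HH:MM:SS,mmm."""
    ms = round(seconds * 1000)
    h, rem = divmod(ms, 3_600_000)
    m, rem = divmod(rem, 60_000)
    s, ms = divmod(rem, 1000)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def to_srt(response: dict) -> str:
    """Render each sentence as a numbered SRT cue."""
    blocks = []
    for i, sent in enumerate(response["sentences"], start=1):
        blocks.append(
            f"{i}\n{srt_timestamp(sent['start'])} --> "
            f"{srt_timestamp(sent['end'])}\n{sent['text']}\n"
        )
    return "\n".join(blocks)


example = {
    "sentences": [
        {"text": "Hello world, this is a test.", "start": 0.0, "end": 3.52}
    ]
}
print(to_srt(example))
# 1
# 00:00:00,000 --> 00:00:03,520
# Hello world, this is a test.
```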

Interactive API docs are available at http://localhost:8000/docs.

MCP Server

Paratran includes an MCP server so Claude Code, Claude Desktop, or any MCP client can transcribe audio files directly. Supports both stdio and streamable HTTP transports.

Claude Code (stdio)

Add to .claude/settings.json:

{
  "mcpServers": {
    "paratran": {
      "command": "uvx",
      "args": ["--from", "paratran", "paratran-mcp"]
    }
  }
}

Claude Desktop (stdio)

Add to ~/Library/Application Support/Claude/claude_desktop_config.json:

{
  "mcpServers": {
    "paratran": {
      "command": "uvx",
      "args": ["--from", "paratran", "paratran-mcp"]
    }
  }
}

Optionally set PARATRAN_MODEL_DIR in the env block to customize the model cache location.

Streamable HTTP

Run the MCP server over HTTP for remote or multi-client access:

paratran-mcp --transport streamable-http --host 0.0.0.0 --port 8000

The MCP endpoint is available at http://localhost:8000/mcp.

MCP Tool

The transcribe tool accepts a file path and all the same options as the REST API (decoding, beam search, sentence splitting, chunking, precision).

License

MIT
