
Transcribe your .wav .mp4 .mp3 .flac files to text or record your own audio!


Audio Transcriber

CLI or API | MCP | Agent


Version: 1.0.1

Documentation — Installation, deployment, and usage across the CLI, Python API, MCP server, and A2A agent are maintained in the official documentation.


Overview

Audio Transcriber is an agent and Model Context Protocol (MCP) server that transcribes .wav, .mp4, .mp3, and .flac files to text, or records audio from the microphone and transcribes it.


Key Features

  • Consolidated Action-Routed MCP Tools: Minimizes token overhead and eliminates tool bloat in LLM contexts by grouping methods into optimized, togglable tool modules.
  • Enterprise-Grade Security: Comprehensive support for Eunomia policies, OIDC token delegation, and granular execution context tracking.
  • Integrated Graph Agent: Built-in Pydantic AI agent supporting the Agent Control Protocol (ACP) and standard Web interfaces (AG-UI).
  • Native Telemetry & Tracing: Out-of-the-box OpenTelemetry exports and native Langfuse tracing.

CLI or API

This agent wraps the AudioTranscriber API. You can call the API programmatically or use the packaged entrypoints.

Detailed instructions on how to use the underlying API wrappers, extended schema bindings, and developer SDK references are maintained in docs/index.md.


MCP

This server exposes action-routed tools to reduce token overhead and improve IDE compatibility.

Available MCP Tools

The table below is auto-generated from the live server — do not edit by hand.

Condensed action-routed tools (default — MCP_TOOL_MODE=condensed)

| MCP Tool | Toggle Env Var | Description |
| --- | --- | --- |
| health_check | MISCTOOL | |
| transcribe_audio | AUDIO_PROCESSINGTOOL | Transcribes audio from a provided file or by recording from the microphone. |

Verbose 1:1 API-mapped tools (MCP_TOOL_MODE=verbose or both)

7 per-operation tools, one per public API method:

| MCP Tool | Toggle Env Var | Description |
| --- | --- | --- |
| audio_transcriber_export | AUDIO_TRANSCRIBERTOOL | Export transcription to specified formats. |
| audio_transcriber_initiate_stream | AUDIO_TRANSCRIBERTOOL | Initiate the audio input stream. |
| audio_transcriber_interact | AUDIO_TRANSCRIBERTOOL | Interact with PersonaPlex server via WebSocket. |
| audio_transcriber_record | AUDIO_TRANSCRIBERTOOL | Record audio for a specified duration or until stopped. |
| audio_transcriber_save_stream | AUDIO_TRANSCRIBERTOOL | Save the recorded frames to a WAV file. |
| audio_transcriber_stop_stream | AUDIO_TRANSCRIBERTOOL | Stop and close the audio stream. |
| audio_transcriber_transcribe | AUDIO_TRANSCRIBERTOOL | Transcribe the audio file using the initialized backend. |

2 action-routed tool(s) (default) · 7 verbose 1:1 tool(s). Each is enabled unless its <DOMAIN>TOOL toggle is set false; MCP_TOOL_MODE selects the surface (condensed default · verbose 1:1 · both). Auto-generated — do not edit.
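
The selection rule in the summary above can be sketched in Python. This is an illustration of the documented behaviour, not the server's own code: the tool names and toggle variables are copied from the tables above, while `exposed_tools`, the `FALSY` set, and the case-insensitive parsing of toggle values are assumptions made for the example.

```python
from typing import List, Mapping

# (tool name, toggle env var, surface), copied from the tables above.
TOOLS = [
    ("health_check", "MISCTOOL", "condensed"),
    ("transcribe_audio", "AUDIO_PROCESSINGTOOL", "condensed"),
    ("audio_transcriber_export", "AUDIO_TRANSCRIBERTOOL", "verbose"),
    ("audio_transcriber_initiate_stream", "AUDIO_TRANSCRIBERTOOL", "verbose"),
    ("audio_transcriber_interact", "AUDIO_TRANSCRIBERTOOL", "verbose"),
    ("audio_transcriber_record", "AUDIO_TRANSCRIBERTOOL", "verbose"),
    ("audio_transcriber_save_stream", "AUDIO_TRANSCRIBERTOOL", "verbose"),
    ("audio_transcriber_stop_stream", "AUDIO_TRANSCRIBERTOOL", "verbose"),
    ("audio_transcriber_transcribe", "AUDIO_TRANSCRIBERTOOL", "verbose"),
]

# Assumption: which spellings count as "false" is not documented here.
FALSY = {"false", "0", "no", "off"}


def exposed_tools(env: Mapping[str, str]) -> List[str]:
    """Return the tool names that would be visible for a given environment."""
    mode = env.get("MCP_TOOL_MODE", "condensed").strip().lower()
    surfaces = {"condensed", "verbose"} if mode == "both" else {mode}
    visible = []
    for name, toggle, surface in TOOLS:
        if surface not in surfaces:
            continue  # tool belongs to a surface that is not selected
        if env.get(toggle, "True").strip().lower() in FALSY:
            continue  # tool is switched off by its toggle variable
        visible.append(name)
    return visible
```

With an empty environment the sketch returns the two condensed tools; setting `MCP_TOOL_MODE=both` returns all nine.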

Detailed tool schemas, parameter shapes, and validation constraints are preserved in docs/mcp.md.

Dynamic Tool Selection & Visibility

This MCP server supports dynamic toolset selection and visibility filtering at runtime. You can restrict the set of exposed tools so that unused tool definitions do not consume the LLM's context window.

You can configure tool filtering via multiple input channels:

  • CLI Arguments: Pass --tools or --toolsets (or their disabled counterparts --disabled-tools and --disabled-toolsets) during startup.
  • Environment Variables: Define standard environment variables:
    • MCP_ENABLED_TOOLS / MCP_DISABLED_TOOLS
    • MCP_ENABLED_TAGS / MCP_DISABLED_TAGS
  • HTTP SSE Request Headers: Pass custom headers during transport initialization:
    • x-mcp-enabled-tools / x-mcp-disabled-tools
    • x-mcp-enabled-tags / x-mcp-disabled-tags
  • HTTP SSE Request Query Parameters: Append query parameters directly to your transport connection URL:
    • ?tools=tool1,tool2
    • ?tags=tag1

When query strings or parameters are supplied, an LLM-free Knowledge Graph resolution layer (using DynamicToolOrchestrator) matches query intents against known tool tags, names, or descriptions, with safe fallback and automated 24-hour background cache refreshing.
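
As a rough sketch of the query-parameter and header channels listed above, the helpers below build a filtered connection URL and the matching request headers using only the standard library. The function names, the base URL `http://localhost:8000/mcp`, and the tag value `audio` are illustrative assumptions; only the parameter and header names come from the list above.

```python
from typing import Dict, Iterable
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_tool_filter(base_url: str,
                     tools: Iterable[str] = (),
                     tags: Iterable[str] = ()) -> str:
    """Append ?tools=... and ?tags=... to a transport URL, keeping existing params."""
    parts = urlsplit(base_url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    tools, tags = list(tools), list(tags)
    if tools:
        query.append(("tools", ",".join(tools)))
    if tags:
        query.append(("tags", ",".join(tags)))
    # safe="," keeps the comma-separated lists readable in the URL.
    return urlunsplit(parts._replace(query=urlencode(query, safe=",")))


def filter_headers(enabled_tools: Iterable[str] = (),
                   disabled_tools: Iterable[str] = ()) -> Dict[str, str]:
    """Build the x-mcp-* headers for the same filter."""
    headers = {}
    enabled, disabled = list(enabled_tools), list(disabled_tools)
    if enabled:
        headers["x-mcp-enabled-tools"] = ",".join(enabled)
    if disabled:
        headers["x-mcp-disabled-tools"] = ",".join(disabled)
    return headers


print(with_tool_filter("http://localhost:8000/mcp",
                       tools=["transcribe_audio"], tags=["audio"]))
```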


MCP Configuration Examples

Install the slim [mcp] extra. All examples install audio-transcriber[mcp] — the MCP-server extra that pulls only the FastMCP / FastAPI tooling (agent-utilities[mcp]). It deliberately excludes the heavy agent runtime (pydantic-ai, the epistemic-graph engine, dspy, llama-index), so uvx / container installs are far smaller. Use the full [agent] extra only when you need the integrated Pydantic AI agent.

stdio Transport (local IDEs — Cursor, Claude Desktop, VS Code)

{
  "mcpServers": {
    "audio-transcriber-mcp": {
      "command": "uvx",
      "args": [
        "--from",
        "audio-transcriber[mcp]",
        "audio-transcriber-mcp"
      ],
      "env": {
        "MCP_TOOL_MODE": "condensed",
        "AUDIO_PROCESSINGTOOL": "True",
        "MISCTOOL": "True",
        "WHISPER_MODEL": "base"
      }
    }
  }
}
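
If you keep several variants of this file (for example one per Whisper model), generating it can be less error-prone than editing JSON by hand. The sketch below reproduces the stdio entry above; `stdio_config` and `write_config` are hypothetical helper names, and every env value is kept as a string, as in the JSON above.

```python
import json


def stdio_config(whisper_model: str = "base", tool_mode: str = "condensed") -> dict:
    """Return the stdio mcpServers entry shown above as a Python dict."""
    env = {
        "MCP_TOOL_MODE": tool_mode,
        "AUDIO_PROCESSINGTOOL": "True",
        "MISCTOOL": "True",
        "WHISPER_MODEL": whisper_model,
    }
    return {
        "mcpServers": {
            "audio-transcriber-mcp": {
                "command": "uvx",
                "args": ["--from", "audio-transcriber[mcp]", "audio-transcriber-mcp"],
                "env": env,
            }
        }
    }


def write_config(path: str, config: dict) -> None:
    """Write the config as indented JSON with a trailing newline."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(config, fh, indent=2)
        fh.write("\n")


print(json.dumps(stdio_config(whisper_model="small"), indent=2))
```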

Streamable-HTTP Transport (networked / production)

{
  "mcpServers": {
    "audio-transcriber-mcp": {
      "command": "uvx",
      "args": [
        "--from",
        "audio-transcriber[mcp]",
        "audio-transcriber-mcp",
        "--transport",
        "streamable-http",
        "--port",
        "8000"
      ],
      "env": {
        "TRANSPORT": "streamable-http",
        "HOST": "0.0.0.0",
        "PORT": "8000",
        "MCP_TOOL_MODE": "condensed",
        "AUDIO_PROCESSINGTOOL": "True",
        "MISCTOOL": "True",
        "WHISPER_MODEL": "base"
      }
    }
  }
}

Alternatively, connect to a pre-deployed Streamable-HTTP instance by url:

{
  "mcpServers": {
    "audio-transcriber-mcp": {
      "url": "http://localhost:8000/audio-transcriber-mcp/mcp"
    }
  }
}

Deploying the Streamable-HTTP server via Docker:

docker run -d \
  --name audio-transcriber-mcp-mcp \
  -p 8000:8000 \
  -e TRANSPORT=streamable-http \
  -e HOST=0.0.0.0 \
  -e PORT=8000 \
  -e MCP_TOOL_MODE=condensed \
  -e AUDIO_PROCESSINGTOOL=True \
  -e MISCTOOL=True \
  -e WHISPER_MODEL=base \
  knucklessg1/audio-transcriber:mcp

Auto-generated from the code-read env surface (MCP_TOOL_MODE + package vars) — do not edit.

Additional Deployment Options

audio-transcriber can also run as a local container (Docker / Podman / uv) or be consumed from a remote deployment. The Deployment guide has full, copy-paste mcp_config.json for all four transports — stdio, streamable-http, local container / uv, and remote URL:

  • Local container / uv — launch the server from mcp_config.json via uvx, docker run, or podman run, or point at a local streamable-http container by url.
  • Remote URL — connect to a server deployed behind Caddy at http://audio-transcriber-mcp.arpa/mcp using the "url" key.

Agent

This repository includes an integrated Pydantic AI Graph Agent. It communicates over the Agent Control Protocol (ACP) and works with the Agent Web UI (AG-UI) and the terminal interface.

Running the Agent CLI

To start the interactive command-line agent:

# Configure transcription (optional)
export WHISPER_MODEL="base"
export TRANSCRIBE_DIRECTORY="/path/to/transcribe_directory"

# Run the agent server
audio-transcriber-agent --provider openai --model-id gpt-4o

Docker Compose Orchestration

The following docker/agent.compose.yml configures the Agent, Web UI, and Terminal Interface together:

version: '3.8'

services:
  audio-transcriber-mcp:
    image: knucklessg1/audio-transcriber:mcp
    container_name: audio-transcriber-mcp
    hostname: audio-transcriber-mcp
    restart: always
    env_file:
      - ../.env
    environment:
      - PYTHONUNBUFFERED=1
      - HOST=0.0.0.0
      - PORT=8000
      - TRANSPORT=streamable-http
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

  audio-transcriber-agent:
    image: knucklessg1/audio-transcriber:latest
    container_name: audio-transcriber-agent
    hostname: audio-transcriber-agent
    restart: always
    depends_on:
      - audio-transcriber-mcp
    env_file:
      - ../.env
    command: [ "audio-transcriber-agent" ]
    environment:
      - PYTHONUNBUFFERED=1
      - HOST=0.0.0.0
      - PORT=9014
      - MCP_URL=http://audio-transcriber-mcp:8000/mcp
      - PROVIDER=${PROVIDER:-openai}
      - MODEL_ID=${MODEL_ID:-gpt-4o}
      - ENABLE_WEB_UI=True
      - ENABLE_OTEL=True
    ports:
      - "9014:9014"
    healthcheck:
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:9014/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

Detailed graph node architecture explanations, custom skill configurations, and agentic trace guides are available in docs/agent.md.
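
The Compose healthchecks above call `urllib.request.urlopen` against `/health`. The same probe can be reused from a deployment script to wait for the stack to come up. The sketch below is one way to do that; the function names and the injectable `probe` and `sleep` parameters are assumptions added so the loop can be exercised without a running server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def is_healthy(url: str, timeout: float = 10.0) -> bool:
    """Return True if the URL answers with a 2xx status, False on any error."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (urllib.error.URLError, OSError):
        # Covers refused connections, timeouts, and HTTP error statuses.
        return False


def wait_until_healthy(url: str,
                       retries: int = 3,
                       interval: float = 30.0,
                       probe: Callable[[str], bool] = is_healthy,
                       sleep: Callable[[float], None] = time.sleep) -> bool:
    """Probe the URL up to `retries` times, sleeping `interval` seconds between tries."""
    for attempt in range(retries):
        if probe(url):
            return True
        if attempt < retries - 1:
            sleep(interval)
    return False
```

For the services above you would call `wait_until_healthy("http://localhost:8000/health")` for the MCP server and `wait_until_healthy("http://localhost:9014/health")` for the agent. The default `retries` and `interval` mirror the Compose healthcheck settings.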


Security & Governance

Audio Transcriber is built on the agent-utilities core and supports its standard security parameters:

Access Control & Policy Enforcement

  • Eunomia Policies: Fine-grained, policy-driven tool authorization. Supports none, local embedded (mcp_policies.json), or centralized remote modes.
  • OIDC Token Delegation: Uses RFC 8693 token exchange to pass the authenticated user's credentials from Web UI / ACP → Agent → MCP.
  • Scoped Credentials: Each execution context is restricted to the identity of the specific caller.
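
For readers unfamiliar with RFC 8693, the sketch below builds the form body of a token-exchange request. The grant type and token-type URNs are defined by the RFC; the helper name and the audience value `audio-transcriber-mcp` are illustrative assumptions, and this does not show how agent-utilities performs the exchange internally.

```python
from typing import Optional
from urllib.parse import urlencode

# URNs defined by RFC 8693.
TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def token_exchange_form(subject_token: str,
                        audience: str,
                        scope: Optional[str] = None) -> str:
    """Return the application/x-www-form-urlencoded body for a token exchange."""
    params = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,            # the caller's existing token
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
        "audience": audience,                      # hypothetical downstream service name
    }
    if scope:
        params["scope"] = scope
    return urlencode(params)


body = token_exchange_form("user-access-token", audience="audio-transcriber-mcp")
print(body)
```

The body would be POSTed to the identity provider's token endpoint together with the client's own credentials; the endpoint URL depends on your provider.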

Runtime Security Grid

| Feature | Functionality | Enablement |
| --- | --- | --- |
| Tool Guard | Sensitivity inspection with human-in-the-loop validation | Enabled by default |
| Prompt Injection Defense | Input scanning, repetition monitoring, and recursive loop blocks | Enabled by default |
| Context Safety Guard | Stuck-loop detectors and contextual overflow preemptive alerts | Enabled by default |

Environment Variables

Package environment variables

| Variable | Example | Description |
| --- | --- | --- |
| HOST | 0.0.0.0 | |
| PORT | 8000 | |
| TRANSPORT | stdio | options: stdio, streamable-http, sse |
| ENABLE_OTEL | True | |
| OTEL_EXPORTER_OTLP_ENDPOINT | http://localhost:8080/api/public/otel | |
| OTEL_EXPORTER_OTLP_PUBLIC_KEY | pk-... | |
| OTEL_EXPORTER_OTLP_SECRET_KEY | sk-... | |
| OTEL_EXPORTER_OTLP_PROTOCOL | http/protobuf | |
| EUNOMIA_TYPE | none | options: none, embedded, remote |
| EUNOMIA_POLICY_FILE | mcp_policies.json | |
| EUNOMIA_REMOTE_URL | http://eunomia-server:8000 | |
| TRANSCRIBE_DIRECTORY | /path/to/transcribe_directory | Directory where transcripts are written (defaults to the data dir under audio-transcriber) |
| MISCTOOL | True | |
| AUDIO_PROCESSINGTOOL | True | |
| WHISPER_MODEL | base | Standard OpenAI Whisper model to use for local transcription (e.g., base, tiny, small) |

Inherited agent-utilities variables (apply to every connector)

| Variable | Example | Description |
| --- | --- | --- |
| MCP_TOOL_MODE | condensed | Tool surface: condensed, verbose, or both |
| MCP_ENABLED_TOOLS | | Comma-separated tool allow-list |
| MCP_DISABLED_TOOLS | | Comma-separated tool deny-list |
| MCP_ENABLED_TAGS | | Comma-separated tag allow-list |
| MCP_DISABLED_TAGS | | Comma-separated tag deny-list |
| MCP_CLIENT_AUTH | | Outbound MCP auth (oidc-client-credentials for fleet calls) |
| OIDC_CLIENT_ID | | OIDC client id (service-account auth) |
| OIDC_CLIENT_SECRET | | OIDC client secret (service-account auth) |
| DEBUG | False | Verbose logging |
| PYTHONUNBUFFERED | 1 | Unbuffered stdout (recommended in containers) |
| MCP_URL | http://localhost:8000/mcp | URL of the MCP server the agent connects to |
| PROVIDER | openai | LLM provider for the agent |
| MODEL_ID | gpt-4o | Model id for the agent |
| ENABLE_WEB_UI | True | Serve the AG-UI web interface |

15 package + 14 inherited variable(s). Auto-generated from .env.example + the shared agent-utilities set — do not edit.

Every variable the server reads, grouped by purpose.

Transcription

| Variable | Description | Default |
| --- | --- | --- |
| WHISPER_MODEL | Local OpenAI Whisper model (e.g. base, tiny, small) | base |
| TRANSCRIBE_DIRECTORY | Directory where transcripts are written | data dir |

MCP server / transport

| Variable | Description | Default |
| --- | --- | --- |
| TRANSPORT | stdio, streamable-http, or sse | stdio |
| HOST | Bind host (HTTP transports) | 0.0.0.0 |
| PORT | Bind port (HTTP transports) | 8000 |
| MCP_TOOL_MODE | Tool surface: condensed, verbose, or both | condensed |
| MCP_ENABLED_TOOLS / MCP_DISABLED_TOOLS | Comma-separated tool allow/deny list | |
| MCP_ENABLED_TAGS / MCP_DISABLED_TAGS | Comma-separated tag allow/deny list | |
| DEBUG | Verbose logging | False |
| PYTHONUNBUFFERED | Unbuffered stdout (recommended in containers) | 1 |

Tool toggles

Each action-routed tool can be disabled individually via its toggle env var (set to false). See the Available MCP Tools table above for the authoritative names.

| Variable | Description | Default |
| --- | --- | --- |
| MISCTOOL | Toggle the miscellaneous / health-check tool | True |
| AUDIO_PROCESSINGTOOL | Toggle the audio-processing (transcription) tool | True |

Telemetry & governance

| Variable | Description | Default |
| --- | --- | --- |
| ENABLE_OTEL | Enable OpenTelemetry export | True |
| OTEL_EXPORTER_OTLP_ENDPOINT | OTLP collector endpoint | |
| OTEL_EXPORTER_OTLP_PUBLIC_KEY / OTEL_EXPORTER_OTLP_SECRET_KEY | OTLP auth keys | |
| OTEL_EXPORTER_OTLP_PROTOCOL | OTLP protocol (e.g. http/protobuf) | |
| EUNOMIA_TYPE | Authorization mode: none, embedded, remote | none |
| EUNOMIA_POLICY_FILE | Embedded policy file | mcp_policies.json |
| EUNOMIA_REMOTE_URL | Remote Eunomia server URL | |

Agent CLI (full [agent] runtime only)

| Variable | Description | Default |
| --- | --- | --- |
| MCP_URL | URL of the MCP server the agent connects to | http://localhost:8000/mcp |
| PROVIDER | LLM provider (e.g. openai) | openai |
| MODEL_ID | Model id (e.g. gpt-4o) | gpt-4o |
| ENABLE_WEB_UI | Serve the AG-UI web interface | True |

See .env.example for a copy-paste starting point.
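
When wrapping the server in your own launcher, it can help to validate these variables before starting it. The following is a minimal sketch that reads a subset of them with the defaults listed above. The `Settings` class, `load_settings`, and the accepted boolean spellings are assumptions for illustration and are not part of the package.

```python
from dataclasses import dataclass
from typing import Mapping

TRANSPORTS = {"stdio", "streamable-http", "sse"}
TOOL_MODES = {"condensed", "verbose", "both"}


def as_bool(value: str) -> bool:
    """Assumed boolean parsing: 1/true/yes/on (any case) mean True."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    transport: str = "stdio"
    host: str = "0.0.0.0"
    port: int = 8000
    tool_mode: str = "condensed"
    whisper_model: str = "base"
    enable_otel: bool = True
    eunomia_type: str = "none"


def load_settings(env: Mapping[str, str]) -> Settings:
    """Build Settings from an environment mapping, rejecting unknown option values."""
    settings = Settings(
        transport=env.get("TRANSPORT", "stdio"),
        host=env.get("HOST", "0.0.0.0"),
        port=int(env.get("PORT", "8000")),
        tool_mode=env.get("MCP_TOOL_MODE", "condensed"),
        whisper_model=env.get("WHISPER_MODEL", "base"),
        enable_otel=as_bool(env.get("ENABLE_OTEL", "True")),
        eunomia_type=env.get("EUNOMIA_TYPE", "none"),
    )
    if settings.transport not in TRANSPORTS:
        raise ValueError(f"TRANSPORT must be one of {sorted(TRANSPORTS)}")
    if settings.tool_mode not in TOOL_MODES:
        raise ValueError(f"MCP_TOOL_MODE must be one of {sorted(TOOL_MODES)}")
    return settings
```

Passing `os.environ` to `load_settings` validates the current process environment; passing a plain dict makes the function easy to test.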


Installation

Pick the extra that matches what you want to run:

| Extra | Installs | Use when |
| --- | --- | --- |
| audio-transcriber[mcp] | Slim MCP server only (agent-utilities[mcp]: FastMCP/FastAPI) | You only run the MCP server (smallest install / image) |
| audio-transcriber[agent] | Full agent runtime (agent-utilities[agent,logfire]: Pydantic AI + the epistemic-graph engine) | You run the integrated agent |
| audio-transcriber[all] | Everything (mcp + agent) | Development / both surfaces |

# MCP server only (recommended for tool hosting; slim deps)
uv pip install "audio-transcriber[mcp]"

# Full agent runtime (Pydantic AI + epistemic-graph engine)
uv pip install "audio-transcriber[agent]"

# Everything (development)
uv pip install "audio-transcriber[all]"      # or: python -m pip install "audio-transcriber[all]"

Container images (:mcp vs :agent)

One multi-stage docker/Dockerfile builds two right-sized images, selected by --target:

| Image tag | Build target | Contents | Entrypoint |
| --- | --- | --- | --- |
| knucklessg1/audio-transcriber:mcp | --target mcp | audio-transcriber[mcp]: slim, no engine/pydantic-ai/dspy/llama-index/tree-sitter | audio-transcriber-mcp |
| knucklessg1/audio-transcriber:latest | --target agent (default) | audio-transcriber[agent]: full agent runtime + epistemic-graph engine | audio-transcriber-agent |

docker build --target mcp   -t knucklessg1/audio-transcriber:mcp    docker/   # slim MCP server
docker build --target agent -t knucklessg1/audio-transcriber:latest docker/   # full agent

docker/mcp.compose.yml runs the slim :mcp server; docker/agent.compose.yml runs the agent (:latest) with a co-located :mcp sidecar.

Knowledge-graph database (epistemic-graph)

The full agent ([agent] / :latest) embeds the epistemic-graph engine (pulled in transitively via agent-utilities[agent]). For production — or to share one knowledge graph across multiple agents — run epistemic-graph as its own database container and point the agent at it instead of embedding it. Deployment recipes (single-node + Raft HA), connection config, and the full database architecture (with diagrams) are documented in the epistemic-graph deployment guide. The slim [mcp] server does not require the database.




Documentation

The complete documentation is published as the official documentation site and is the recommended reference for installation, deployment, and day-to-day operation.

| Page | Contents |
| --- | --- |
| Installation | pip, source, extras, prebuilt Docker image |
| Deployment | run the MCP server and agent, Compose, Caddy + Technitium, env config |
| Usage | the MCP tool, the AudioTranscriber API, the CLI |
| Overview | capability summary and ecosystem role |
| Concepts | concept registry (CONCEPT:AUDIO-*) |

Contribute

Contributions are welcome! Please ensure code quality by executing local checks before submitting pull requests:

  • Format code using ruff format .
  • Lint code using ruff check .
  • Validate type-safety with mypy .
  • Execute test suites using pytest

Deploy with agent-os-genesis

This package can be provisioned for you — skill-guided — by the agent-os-genesis universal skill (its single-package deploy mode): it picks your install method, seeds secrets to OpenBao/Vault (or .env), trusts your enterprise CA, registers the MCP server, and verifies it — the same machinery that stands up the whole Agent OS, narrowed to just this package. Ask your agent to "deploy audio-transcriber with agent-os-genesis".

| Install mode | Command |
| --- | --- |
| Bare-metal, prod (PyPI) | uvx audio-transcriber-mcp · or uv tool install audio-transcriber |
| Bare-metal, dev (editable) | uv pip install -e ".[all]" · or pip install -e ".[all]" |
| Container, prod | deploy knucklessg1/audio-transcriber:latest via docker-compose / swarm / podman / podman-compose / kubernetes |
| Container, dev (editable) | deploy docker/compose.dev.yml (source-mounted at /src; edits live on restart) |

Secrets are read-existing + seeded via vault_sync — you are only prompted for what's missing.


