Skip to main content

Transcribe your .wav .mp4 .mp3 .flac files to text or record your own audio!

Project description

Audio Transcriber

CLI or API | MCP | Agent

PyPI - Version MCP Server PyPI - Downloads GitHub Repo stars GitHub forks GitHub contributors PyPI - License GitHub GitHub last commit (by committer) GitHub pull requests GitHub closed pull requests GitHub issues GitHub top language GitHub language count GitHub repo size GitHub repo file count (file type) PyPI - Wheel PyPI - Implementation

Version: 0.25.0


Overview

Audio Transcriber is a production-grade Agent and Model Context Protocol (MCP) server designed to interface directly with Transcribe your .wav .mp4 .mp3 .flac files to text or record your own audio!.


Key Features

  • Consolidated Action-Routed MCP Tools: Minimizes token overhead and eliminates tool bloat in LLM contexts by grouping methods into optimized, togglable tool modules.
  • Enterprise-Grade Security: Comprehensive support for Eunomia policies, OIDC token delegation, and granular execution context tracking.
  • Integrated Graph Agent: Built-in Pydantic AI agent supporting the Agent Control Protocol (ACP) and standard Web interfaces (AG-UI).
  • Native Telemetry & Tracing: Out-of-the-box OpenTelemetry exports and native Langfuse tracing.

CLI or API

This agent wraps the Transcribe your .wav .mp4 .mp3 .flac files to text or record your own audio! API. You can interact with it programmatically or via its integrated execution entrypoints.

Detailed instructions on how to use the underlying API wrappers, extended schema bindings, and developer SDK references are maintained in docs/index.md.


MCP

This server utilizes dynamic Action-Routed tools to optimize token overhead and maximize IDE compatibility.

Available MCP Tools

Tool Module Toggle Env Var Enabled by Default Description & Nested Methods
Misc MISC_TOOL True Manage audio transcriber misc operations.
Audio Processing AUDIO_PROCESSING_TOOL True Transcribes audio from a provided file or by recording from the microphone.

Detailed tool schemas, parameter shapes, and validation constraints are preserved in docs/mcp.md.

Dynamic Tool Selection & Visibility

This MCP server supports dynamic toolset selection and visibility filtering at runtime. This allows you to restrict the set of exposed tools in order to prevent blowing up the LLM's context window.

You can configure tool filtering via multiple input channels:

  • CLI Arguments: Pass --tools or --toolsets (or their disabled counterparts --disabled-tools and --disabled-toolsets) during startup.
  • Environment Variables: Define standard environment variables:
    • MCP_ENABLED_TOOLS / MCP_DISABLED_TOOLS
    • MCP_ENABLED_TAGS / MCP_DISABLED_TAGS
  • HTTP SSE Request Headers: Pass custom headers during transport initialization:
    • x-mcp-enabled-tools / x-mcp-disabled-tools
    • x-mcp-enabled-tags / x-mcp-disabled-tags
  • HTTP SSE Request Query Parameters: Append query parameters directly to your transport connection URL:
    • ?tools=tool1,tool2
    • ?tags=tag1

When query strings or parameters are supplied, an LLM-free Knowledge Graph resolution layer (using DynamicToolOrchestrator) matches query intents against known tool tags, names, or descriptions, with safe fallback and automated 24-hour background cache refreshing.


MCP Configuration Examples

stdio Transport (Recommended for local IDEs e.g., Cursor, Claude Desktop)

Configure your IDE's mcp.json to launch the MCP server via uvx:

{
  "mcpServers": {
    "audio-transcriber": {
      "command": "uvx",
      "args": [
        "--from",
        "audio-transcriber",
        "audio-transcriber-mcp"
      ],
      "env": {
        "AUDIO_TRANSCRIPTOR_API_KEY": "your_audio_transcriptor_api_key_here",
        "LANGSMITH_DEFAULT_SYSTEM_PROMPT": "your_langsmith_default_system_prompt_here",
        "OPENROUTER_API_KEY": "your_openrouter_api_key_here"
      }
    }
  }
}

Streamable-HTTP Transport (Recommended for production deployments)

Configure your client's mcp.json to launch the Streamable-HTTP server via uvx with explicit host and port definition:

{
  "mcpServers": {
    "audio-transcriber": {
      "command": "uvx",
      "args": [
        "--from",
        "audio-transcriber",
        "audio-transcriber-mcp"
      ],
      "env": {
        "TRANSPORT": "streamable-http",
        "HOST": "0.0.0.0",
        "PORT": "8000",
        "AUDIO_TRANSCRIPTOR_API_KEY": "your_audio_transcriptor_api_key_here",
        "LANGSMITH_DEFAULT_SYSTEM_PROMPT": "your_langsmith_default_system_prompt_here",
        "OPENROUTER_API_KEY": "your_openrouter_api_key_here"
      }
    }
  }
}

Alternatively, connect to a pre-deployed remote or local Streamable-HTTP instance:

{
  "mcpServers": {
    "audio-transcriber": {
      "url": "http://localhost:8000/audio-transcriber/mcp"
    }
  }
}

Deploying the Streamable-HTTP server via Docker:

docker run -d \
  --name audio-transcriber-mcp \
  -p 8000:8000 \
  -e TRANSPORT=streamable-http \
  -e PORT=8000 \
  -e AUDIO_TRANSCRIPTOR_API_KEY="your_value" \
  -e LANGSMITH_DEFAULT_SYSTEM_PROMPT="your_value" \
  -e OPENROUTER_API_KEY="your_value" \
  knucklessg1/audio-transcriber:latest

Agent

This repository features a fully integrated Pydantic AI Graph Agent. It communicates over the Agent Control Protocol (ACP) and interacts seamlessly with the Agent Web UI (AG-UI) and Terminal interface.

Running the Agent CLI

To start the interactive command-line agent:

# Set credentials
export AUDIO_TRANSCRIPTOR_API_KEY="your_value"
export LANGSMITH_DEFAULT_SYSTEM_PROMPT="your_value"
export OPENROUTER_API_KEY="your_value"

# Run the agent server
audio-transcriber-agent --provider openai --model-id gpt-4o

Docker Compose Orchestration

The following docker/agent.compose.yml configures the Agent, Web UI, and Terminal Interface together:

version: '3.8'

services:
  audio-transcriber-mcp:
    image: knucklessg1/audio-transcriber:latest
    container_name: audio-transcriber-mcp
    hostname: audio-transcriber-mcp
    restart: always
    env_file:
      - ../.env
    environment:
      - PYTHONUNBUFFERED=1
      - HOST=0.0.0.0
      - PORT=8000
      - TRANSPORT=streamable-http
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

  audio-transcriber-agent:
    image: knucklessg1/audio-transcriber:latest
    container_name: audio-transcriber-agent
    hostname: audio-transcriber-agent
    restart: always
    depends_on:
      - audio-transcriber-mcp
    env_file:
      - ../.env
    command: [ "audio-transcriber-agent" ]
    environment:
      - PYTHONUNBUFFERED=1
      - HOST=0.0.0.0
      - PORT=9014
      - MCP_URL=http://audio-transcriber-mcp:8000/mcp
      - PROVIDER=${PROVIDER:-openai}
      - MODEL_ID=${MODEL_ID:-gpt-4o}
      - ENABLE_WEB_UI=True
      - ENABLE_OTEL=True
    ports:
      - "9014:9014"
    healthcheck:
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:9014/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

Detailed graph node architecture explanations, custom skill configurations, and agentic trace guides are available in docs/agent.md.


Security & Governance

Built directly upon the enterprise-ready agent-utilities core, standard security parameters are fully supported:

Access Control & Policy Enforcement

  • Eunomia Policies: Fine-grained, policy-driven tool authorization. Supports none, local embedded (mcp_policies.json), or centralized remote modes.
  • OIDC Token Delegation: Compliant with RFC 8693 token exchange for flowing authenticating user credentials from Web UI / ACP → Agent → MCP.
  • Scoped Credentials: Execution context runs restricted to the specific caller identity.

Runtime Security Grid

Feature Functionality Enablement
Tool Guard Sensitivity inspection with human-in-the-loop validation Enabled by default
Prompt Injection Defense Input scanning, repetition monitoring, and recursive loop blocks Enabled by default
Context Safety Guard Stuck-loop detectors and contextual overflow preemptive alerts Enabled by default

Environment Variables Reference

The following environment variables configure the runtime behavior of the agent, MCP server, and underlying dependencies:

Environment Variable Description Default / Example
AUDIO_PROCESSING_TOOL Toggle the audio processing tool module. True
AUDIO_PROCESSINGTOOL Boolean flag for enabling internal audio processing tools. True
AUTH_TYPE Security authentication type to apply (e.g., jwt, none). none
EUNOMIA_POLICY_FILE Path to the Eunomia security guardrail policies JSON file. mcp_policies.json
EUNOMIA_TYPE Eunomia guardrail deployment type (e.g., none, embedded, remote). none
OTEL_EXPORTER_OTLP_ENDPOINT OpenTelemetry collector endpoint for exporting traces. http://localhost:4317
WHISPER_MODEL Standard OpenAI Whisper model to use for local transcription (e.g., base, tiny, small). base

Installation

Install the Python package locally:

# Using uv (highly recommended)
uv pip install audio-transcriber[all]

# Using standard pip
python -m pip install audio-transcriber[all]

Repository Owners

GitHub followers GitHub User's stars


Contribute

Contributions are welcome! Please ensure code quality by executing local checks before submitting pull requests:

  • Format code using ruff format .
  • Lint code using ruff check .
  • Validate type-safety with mypy .
  • Execute test suites using pytest

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

audio_transcriber-0.25.0.tar.gz (36.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

audio_transcriber-0.25.0-py3-none-any.whl (39.4 kB view details)

Uploaded Python 3

File details

Details for the file audio_transcriber-0.25.0.tar.gz.

File metadata

  • Download URL: audio_transcriber-0.25.0.tar.gz
  • Upload date:
  • Size: 36.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for audio_transcriber-0.25.0.tar.gz
Algorithm Hash digest
SHA256 a95c5afa3d1c081b6dff6de8c71ebfd063395112011c51c61e7430e5b39d9ada
MD5 97f957111cf2772c1f78fb6c5416d250
BLAKE2b-256 4275dff411eab43fc014c968f6dab4fb5dd67d556d327e659dd2abecab510e9a

See more details on using hashes here.

File details

Details for the file audio_transcriber-0.25.0-py3-none-any.whl.

File metadata

File hashes

Hashes for audio_transcriber-0.25.0-py3-none-any.whl
Algorithm Hash digest
SHA256 41c256764b3b16c970863b95d6b1bf24dd4f5dc877444c412ddc5e09654b938e
MD5 6a72abb87fc5e2c266e99249d7817599
BLAKE2b-256 610181c50c5c8c1000b530b1af500490239658fe7ee00bb213ee4ceba6f4ac8f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page