Transcribe your .wav .mp4 .mp3 .flac files to text or record your own audio!

These details have not been verified by PyPI

Development Status
- 5 - Production/Stable
Environment
- Console
License
- Public Domain
Operating System
- POSIX :: Linux
Programming Language
- Python :: 3

Project description

Audio Transcriber

CLI or API | MCP | Agent


Version: 0.25.0

Overview

Audio Transcriber is a production-grade Agent and Model Context Protocol (MCP) server for transcribing .wav, .mp4, .mp3, and .flac files to text or recording your own audio.
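As a quick illustration of the input formats named above, here is a minimal sketch of a pre-flight check a caller might run before submitting a file. The helper name `is_supported_audio` and the case-insensitive suffix match are assumptions, not part of the package's API.

```python
from pathlib import Path

# Formats listed in the overview; the server itself may accept others.
SUPPORTED_SUFFIXES = {".wav", ".mp4", ".mp3", ".flac"}


def is_supported_audio(path: str) -> bool:
    """Hypothetical helper: True if the file's last extension is a listed format."""
    # Path.suffix is the final extension including the dot, e.g. ".mp3".
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES
```

For example, `is_supported_audio("meeting.MP3")` is `True`, while `is_supported_audio("archive.flac.zip")` is `False` because only the last extension is checked.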

Key Features

Consolidated Action-Routed MCP Tools: Minimizes token overhead and eliminates tool bloat in LLM contexts by grouping methods into optimized, togglable tool modules.
Enterprise-Grade Security: Comprehensive support for Eunomia policies, OIDC token delegation, and granular execution context tracking.
Integrated Graph Agent: Built-in Pydantic AI agent supporting the Agent Control Protocol (ACP) and standard Web interfaces (AG-UI).
Native Telemetry & Tracing: Out-of-the-box OpenTelemetry exports and native Langfuse tracing.

CLI or API

This agent wraps the Audio Transcriber API, which transcribes .wav, .mp4, .mp3, and .flac files to text or records your own audio. You can interact with it programmatically or through its integrated entrypoints.

Detailed instructions on how to use the underlying API wrappers, extended schema bindings, and developer SDK references are maintained in docs/index.md.

MCP

This server utilizes dynamic Action-Routed tools to optimize token overhead and maximize IDE compatibility.
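The idea behind an action-routed tool is that one registered tool accepts an `action` argument and dispatches to the matching method, so the LLM sees one schema instead of many. The sketch below illustrates that pattern only; the class name, action names, and return values are hypothetical and do not reflect the server's actual tool schema (see docs/mcp.md for that).

```python
from typing import Any, Callable, Dict


class ActionRoutedTool:
    """Hypothetical sketch: a single tool entry point that routes to named actions."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._actions: Dict[str, Callable[..., Any]] = {}

    def action(self, action_name: str) -> Callable[[Callable[..., Any]], Callable[..., Any]]:
        # Decorator that registers a function under an action name.
        def register(func: Callable[..., Any]) -> Callable[..., Any]:
            self._actions[action_name] = func
            return func

        return register

    def __call__(self, action: str, **kwargs: Any) -> Any:
        # Single entry point: look up the action and forward keyword arguments.
        if action not in self._actions:
            raise ValueError(f"unknown action {action!r}; expected one of {sorted(self._actions)}")
        return self._actions[action](**kwargs)


audio_tool = ActionRoutedTool("audio_processing")


@audio_tool.action("transcribe_file")
def transcribe_file(path: str) -> str:
    # Placeholder body; a real implementation would run speech-to-text here.
    return f"transcript of {path}"
```

Calling `audio_tool(action="transcribe_file", path="meeting.wav")` dispatches to `transcribe_file`; an unknown action raises `ValueError` listing the valid ones, which gives the model a clear correction signal.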

Available MCP Tools

Tool Module	Toggle Env Var	Enabled by Default	Description & Nested Methods
Misc	`MISC_TOOL`	`True`	Manage audio transcriber misc operations.
Audio Processing	`AUDIO_PROCESSING_TOOL`	`True`	Transcribes audio from a provided file or by recording from the microphone.

Detailed tool schemas, parameter shapes, and validation constraints are preserved in docs/mcp.md.
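Each module is switched on or off through its toggle variable. A minimal sketch of how a launcher might read these flags, assuming the values are strings such as `True`/`False` and that an unset variable means the default shown in the table (`True`); the accepted spellings and the helper name are assumptions, not the server's documented parsing rules.

```python
import os
from typing import Dict, Mapping, Optional

# Toggle variables and defaults taken from the tool table.
TOOL_TOGGLES = {"MISC_TOOL": True, "AUDIO_PROCESSING_TOOL": True}

TRUTHY = {"1", "true", "yes", "on"}
FALSY = {"0", "false", "no", "off"}


def enabled_modules(env: Optional[Mapping[str, str]] = None) -> Dict[str, bool]:
    """Hypothetical helper: resolve each toggle to a bool, falling back to its default."""
    env = os.environ if env is None else env
    result: Dict[str, bool] = {}
    for var, default in TOOL_TOGGLES.items():
        raw = (env.get(var) or "").strip().lower()
        if raw == "":
            result[var] = default
        elif raw in TRUTHY:
            result[var] = True
        elif raw in FALSY:
            result[var] = False
        else:
            # Reject typos instead of silently treating them as False.
            raise ValueError(f"{var} must be a boolean, got {raw!r}")
    return result
```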

Dynamic Tool Selection & Visibility

This MCP server supports dynamic toolset selection and visibility filtering at runtime, so you can restrict the set of exposed tools and avoid overloading the LLM's context window.

You can configure tool filtering via multiple input channels:

CLI Arguments: Pass --tools or --toolsets (or their disabled counterparts --disabled-tools and --disabled-toolsets) during startup.
Environment Variables: Define standard environment variables:
- MCP_ENABLED_TOOLS / MCP_DISABLED_TOOLS
- MCP_ENABLED_TAGS / MCP_DISABLED_TAGS
HTTP SSE Request Headers: Pass custom headers during transport initialization:
- x-mcp-enabled-tools / x-mcp-disabled-tools
- x-mcp-enabled-tags / x-mcp-disabled-tags
HTTP SSE Request Query Parameters: Append query parameters directly to your transport connection URL:
- ?tools=tool1,tool2
- ?tags=tag1
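For the HTTP transports, the filters can be attached to the connection either as query parameters or as headers. A minimal sketch that builds both from Python lists, using only the parameter and header names listed above; it assumes the headers take the same comma-separated format as the query parameters, and tool names such as `audio_processing` are hypothetical.

```python
from typing import Dict, List, Optional, Tuple
from urllib.parse import urlencode


def build_filtered_connection(
    base_url: str,
    tools: Optional[List[str]] = None,
    tags: Optional[List[str]] = None,
) -> Tuple[str, Dict[str, str]]:
    """Return a URL with ?tools=/?tags= query parameters plus equivalent x-mcp-* headers."""
    params: Dict[str, str] = {}
    headers: Dict[str, str] = {}
    if tools:
        params["tools"] = ",".join(tools)
        headers["x-mcp-enabled-tools"] = ",".join(tools)
    if tags:
        params["tags"] = ",".join(tags)
        headers["x-mcp-enabled-tags"] = ",".join(tags)
    # safe="," keeps the comma-separated lists readable instead of encoding them as %2C.
    query = urlencode(params, safe=",")
    url = f"{base_url}?{query}" if query else base_url
    return url, headers
```

For example, `build_filtered_connection("http://localhost:8000/audio-transcriber/mcp", tools=["audio_processing"])` yields the URL `http://localhost:8000/audio-transcriber/mcp?tools=audio_processing`. In practice you would pick one channel rather than sending both.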

When query strings or parameters are supplied, an LLM-free Knowledge Graph resolution layer (using DynamicToolOrchestrator) matches query intents against known tool tags, names, or descriptions, with safe fallback and automated 24-hour background cache refreshing.
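The resolution layer itself is internal to the project. Purely to illustrate the general shape of LLM-free matching with a time-limited cache, here is a sketch that scores tools by keyword overlap with their name, description, and tags and rebuilds its index after 24 hours. It is a simple token-overlap stand-in, not the project's Knowledge Graph or DynamicToolOrchestrator; it refreshes lazily on access rather than in a background task, and all names are hypothetical.

```python
import re
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set

CACHE_TTL_SECONDS = 24 * 60 * 60  # 24-hour refresh window


@dataclass
class ToolInfo:
    name: str
    description: str
    tags: Set[str] = field(default_factory=set)


def _tokens(text: str) -> Set[str]:
    # Lowercase alphanumeric words; underscores and punctuation split tokens.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


class ToolMatcher:
    """Hypothetical keyword matcher whose index is rebuilt once it is older than the TTL."""

    def __init__(
        self,
        load_tools: Callable[[], List[ToolInfo]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load_tools = load_tools
        self._clock = clock
        self._index: Dict[str, Set[str]] = {}
        self._built_at = float("-inf")

    def _refresh_if_stale(self) -> None:
        if self._clock() - self._built_at >= CACHE_TTL_SECONDS:
            self._index = {
                t.name: _tokens(t.name) | _tokens(t.description) | {tag.lower() for tag in t.tags}
                for t in self._load_tools()
            }
            self._built_at = self._clock()

    def match(self, query: str, fallback: List[str]) -> List[str]:
        """Return tool names sharing at least one token with the query, best match first."""
        self._refresh_if_stale()
        words = _tokens(query)
        scored = sorted(
            ((len(words & toks), name) for name, toks in self._index.items()),
            key=lambda pair: (-pair[0], pair[1]),
        )
        hits = [name for score, name in scored if score > 0]
        # Safe fallback: if nothing matches, expose a caller-supplied default set.
        return hits or fallback
```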

MCP Configuration Examples

stdio Transport (Recommended for local IDEs, e.g., Cursor or Claude Desktop)

Configure your IDE's mcp.json to launch the MCP server via uvx:

{
  "mcpServers": {
    "audio-transcriber": {
      "command": "uvx",
      "args": [
        "--from",
        "audio-transcriber",
        "audio-transcriber-mcp"
      ],
      "env": {
        "AUDIO_TRANSCRIPTOR_API_KEY": "your_audio_transcriptor_api_key_here",
        "LANGSMITH_DEFAULT_SYSTEM_PROMPT": "your_langsmith_default_system_prompt_here",
        "OPENROUTER_API_KEY": "your_openrouter_api_key_here"
      }
    }
  }
}
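Rather than pasting API keys into mcp.json by hand, you can generate the entry from your current environment. A minimal sketch, assuming the variable names in the example above are the ones the server reads; the helper names and the choice to omit unset variables are illustrative.

```python
import json
import os
from typing import Mapping, Optional

# Variable names copied from the stdio example.
FORWARDED_VARS = ["AUDIO_TRANSCRIPTOR_API_KEY", "LANGSMITH_DEFAULT_SYSTEM_PROMPT", "OPENROUTER_API_KEY"]


def stdio_server_entry(env: Optional[Mapping[str, str]] = None) -> dict:
    """Build the mcpServers block for the stdio transport, copying only variables that are set."""
    env = os.environ if env is None else env
    return {
        "mcpServers": {
            "audio-transcriber": {
                "command": "uvx",
                "args": ["--from", "audio-transcriber", "audio-transcriber-mcp"],
                "env": {name: env[name] for name in FORWARDED_VARS if name in env},
            }
        }
    }


def render(env: Optional[Mapping[str, str]] = None) -> str:
    # Pretty-printed JSON, ready to review and save as mcp.json.
    return json.dumps(stdio_server_entry(env), indent=2)
```

The rendered file contains your secrets in plain text, so keep it out of version control.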

Streamable-HTTP Transport (Recommended for production deployments)

Configure your client's mcp.json to launch the Streamable-HTTP server via uvx with an explicit host and port:

{
  "mcpServers": {
    "audio-transcriber": {
      "command": "uvx",
      "args": [
        "--from",
        "audio-transcriber",
        "audio-transcriber-mcp"
      ],
      "env": {
        "TRANSPORT": "streamable-http",
        "HOST": "0.0.0.0",
        "PORT": "8000",
        "AUDIO_TRANSCRIPTOR_API_KEY": "your_audio_transcriptor_api_key_here",
        "LANGSMITH_DEFAULT_SYSTEM_PROMPT": "your_langsmith_default_system_prompt_here",
        "OPENROUTER_API_KEY": "your_openrouter_api_key_here"
      }
    }
  }
}

Alternatively, connect to a pre-deployed remote or local Streamable-HTTP instance:

{
  "mcpServers": {
    "audio-transcriber": {
      "url": "http://localhost:8000/audio-transcriber/mcp"
    }
  }
}

Deploying the Streamable-HTTP server via Docker:

docker run -d \
  --name audio-transcriber-mcp \
  -p 8000:8000 \
  -e TRANSPORT=streamable-http \
  -e PORT=8000 \
  -e AUDIO_TRANSCRIPTOR_API_KEY="your_value" \
  -e LANGSMITH_DEFAULT_SYSTEM_PROMPT="your_value" \
  -e OPENROUTER_API_KEY="your_value" \
  knucklessg1/audio-transcriber:latest
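Once the container is running, you can wait for it to become ready before connecting a client. The sketch below polls a `/health` endpoint with the standard library; the path comes from the healthcheck in the Compose file in this README, while the default retry count, interval, and timeout are assumptions for a local script.

```python
import time
import urllib.error
import urllib.request


def wait_for_health(url: str, retries: int = 10, interval: float = 2.0, timeout: float = 5.0) -> bool:
    """Poll `url` until it answers with HTTP 200 or the retries run out."""
    for attempt in range(retries):
        try:
            with urllib.request.urlopen(url, timeout=timeout) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            # Connection refused, DNS failure, timeout, or an HTTP error status: try again.
            pass
        if attempt < retries - 1:
            time.sleep(interval)
    return False
```

For example, `wait_for_health("http://localhost:8000/health")` returns `True` once the server responds, assuming it exposes `/health` as the Compose healthcheck implies.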

Agent

This repository features a fully integrated Pydantic AI Graph Agent. It communicates over the Agent Control Protocol (ACP) and interacts seamlessly with the Agent Web UI (AG-UI) and Terminal interface.

Running the Agent CLI

To start the interactive command-line agent:

# Set credentials
export AUDIO_TRANSCRIPTOR_API_KEY="your_value"
export LANGSMITH_DEFAULT_SYSTEM_PROMPT="your_value"
export OPENROUTER_API_KEY="your_value"

# Run the agent server
audio-transcriber-agent --provider openai --model-id gpt-4o
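Because the agent reads its credentials from the environment, a small pre-flight check can fail fast with a clear message when one is missing. A minimal sketch; which variables are strictly required depends on your provider, so the list below simply reuses the three exported above and is an assumption.

```python
import os
from typing import List, Mapping, Optional, Sequence

REQUIRED_VARS = ["AUDIO_TRANSCRIPTOR_API_KEY", "LANGSMITH_DEFAULT_SYSTEM_PROMPT", "OPENROUTER_API_KEY"]


def missing_env(names: Sequence[str], env: Optional[Mapping[str, str]] = None) -> List[str]:
    """Return the variable names that are unset or contain only whitespace."""
    env = os.environ if env is None else env
    return [name for name in names if not env.get(name, "").strip()]
```

A wrapper script could call `missing_env(REQUIRED_VARS)` and exit with an error listing the result before launching `audio-transcriber-agent`.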

Docker Compose Orchestration

The following docker/agent.compose.yml configures the Agent, Web UI, and Terminal Interface together:

version: '3.8'

services:
  audio-transcriber-mcp:
    image: knucklessg1/audio-transcriber:latest
    container_name: audio-transcriber-mcp
    hostname: audio-transcriber-mcp
    restart: always
    env_file:
      - ../.env
    environment:
      - PYTHONUNBUFFERED=1
      - HOST=0.0.0.0
      - PORT=8000
      - TRANSPORT=streamable-http
    ports:
      - "8000:8000"
    healthcheck:
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:8000/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"

  audio-transcriber-agent:
    image: knucklessg1/audio-transcriber:latest
    container_name: audio-transcriber-agent
    hostname: audio-transcriber-agent
    restart: always
    depends_on:
      - audio-transcriber-mcp
    env_file:
      - ../.env
    command: [ "audio-transcriber-agent" ]
    environment:
      - PYTHONUNBUFFERED=1
      - HOST=0.0.0.0
      - PORT=9014
      - MCP_URL=http://audio-transcriber-mcp:8000/mcp
      - PROVIDER=${PROVIDER:-openai}
      - MODEL_ID=${MODEL_ID:-gpt-4o}
      - ENABLE_WEB_UI=True
      - ENABLE_OTEL=True
    ports:
      - "9014:9014"
    healthcheck:
      test: ["CMD", "python3", "-c", "import urllib.request; urllib.request.urlopen('http://localhost:9014/health')"]
      interval: 30s
      timeout: 10s
      retries: 3
      start_period: 10s
    logging:
      driver: json-file
      options:
        max-size: "10m"
        max-file: "3"
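The agent service takes `PROVIDER` and `MODEL_ID` from the host environment using Compose's `${VAR:-default}` interpolation, which falls back to the default when the variable is unset or empty (the `${VAR-default}` form only covers the unset case). A small Python model of that rule, useful for reasoning about which model the container will use; it is an illustration, not Compose's implementation, and it ignores other forms such as `${VAR:?error}`.

```python
import re
from typing import Mapping

# Matches ${NAME:-default} and ${NAME-default} only.
_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)(:?)-([^}]*)\}")


def interpolate(value: str, env: Mapping[str, str]) -> str:
    """Expand ${NAME:-default} (unset or empty -> default) and ${NAME-default} (unset -> default)."""

    def replace(match: "re.Match[str]") -> str:
        name, colon, default = match.group(1), match.group(2), match.group(3)
        current = env.get(name)
        if current is None or (colon and current == ""):
            return default
        return current

    return _PATTERN.sub(replace, value)
```

With no `PROVIDER` set (or `PROVIDER=` left empty in `.env`), `interpolate("${PROVIDER:-openai}", env)` resolves to `openai`.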

Detailed graph node architecture explanations, custom skill configurations, and agentic trace guides are available in docs/agent.md.

Security & Governance

Because it is built on the enterprise-ready agent-utilities core, it supports the following standard security controls:

Access Control & Policy Enforcement

Eunomia Policies: Fine-grained, policy-driven tool authorization. Supports none, local embedded (mcp_policies.json), or centralized remote modes.
OIDC Token Delegation: Compliant with RFC 8693 token exchange, propagating the authenticated user's credentials from Web UI / ACP → Agent → MCP.
Scoped Credentials: Execution runs in a context restricted to the calling user's identity.
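RFC 8693 defines token exchange as a form-encoded POST to the authorization server's token endpoint. As a sketch of what the Agent-to-MCP hop carries, the helper below builds that request body with the parameter names from the RFC; using an access token as the subject token and the example audience value are assumptions about this deployment, and sending the request (with client authentication) is left out.

```python
from typing import Dict, Optional
from urllib.parse import urlencode

TOKEN_EXCHANGE_GRANT = "urn:ietf:params:oauth:grant-type:token-exchange"
ACCESS_TOKEN_TYPE = "urn:ietf:params:oauth:token-type:access_token"


def token_exchange_body(subject_token: str, audience: Optional[str] = None, scope: Optional[str] = None) -> str:
    """Form-encoded body for an RFC 8693 token exchange request."""
    params: Dict[str, str] = {
        "grant_type": TOKEN_EXCHANGE_GRANT,
        "subject_token": subject_token,  # the user's token received by the agent
        "subject_token_type": ACCESS_TOKEN_TYPE,
        "requested_token_type": ACCESS_TOKEN_TYPE,
    }
    if audience:
        params["audience"] = audience  # e.g. a hypothetical MCP server client ID
    if scope:
        params["scope"] = scope
    return urlencode(params)
```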

Runtime Security Grid

Feature	Functionality	Enablement
Tool Guard	Sensitivity inspection with human-in-the-loop validation	Enabled by default
Prompt Injection Defense	Input scanning, repetition monitoring, and recursive loop blocks	Enabled by default
Context Safety Guard	Stuck-loop detectors and contextual overflow preemptive alerts	Enabled by default
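To make repetition monitoring and stuck-loop detection concrete, here is a generic sketch that flags an agent issuing the same tool call with the same arguments several times in a row. The class name and threshold are hypothetical, and the project's actual guards may use different signals.

```python
import json
from collections import deque
from typing import Any, Deque, Dict, Tuple


class RepeatedCallDetector:
    """Flag when the last `threshold` tool calls are identical (same name and arguments)."""

    def __init__(self, threshold: int = 3) -> None:
        if threshold < 2:
            raise ValueError("threshold must be at least 2")
        self.threshold = threshold
        self._recent: Deque[Tuple[str, str]] = deque(maxlen=threshold)

    def record(self, tool_name: str, arguments: Dict[str, Any]) -> bool:
        """Record a call; return True if it completes a run of identical calls."""
        # sort_keys makes {"a": 1, "b": 2} and {"b": 2, "a": 1} compare equal.
        key = (tool_name, json.dumps(arguments, sort_keys=True, default=str))
        self._recent.append(key)
        return len(self._recent) == self.threshold and len(set(self._recent)) == 1
```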

Environment Variables Reference

The following environment variables configure the runtime behavior of the agent, MCP server, and underlying dependencies:

Environment Variable	Description	Default / Example
`AUDIO_PROCESSING_TOOL`	Toggle the audio processing tool module.	`True`
`AUDIO_PROCESSINGTOOL`	Boolean flag for enabling internal audio processing tools.	`True`
`AUTH_TYPE`	Security authentication type to apply (e.g., `jwt`, `none`).	`none`
`EUNOMIA_POLICY_FILE`	Path to the Eunomia security guardrail policies JSON file.	`mcp_policies.json`
`EUNOMIA_TYPE`	Eunomia guardrail deployment type (e.g., `none`, `embedded`, `remote`).	`none`
`OTEL_EXPORTER_OTLP_ENDPOINT`	OpenTelemetry collector endpoint for exporting traces.	`http://localhost:4317`
`WHISPER_MODEL`	Standard OpenAI Whisper model to use for local transcription (e.g., `base`, `tiny`, `small`).	`base`
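Several of these variables accept only a fixed set of values. A minimal sketch of loading them into a typed settings object and failing early on typos; it treats the "Default / Example" column as defaults, the allowed sets for `AUTH_TYPE` and `EUNOMIA_TYPE` contain only the examples given in the table (extend them if the server supports others), and the `Settings` class is hypothetical.

```python
import os
from dataclasses import dataclass
from typing import AbstractSet, Mapping, Optional

# Example values from the table; the server may accept more.
AUTH_TYPES = {"none", "jwt"}
EUNOMIA_TYPES = {"none", "embedded", "remote"}


@dataclass(frozen=True)
class Settings:
    auth_type: str
    eunomia_type: str
    eunomia_policy_file: str
    otel_endpoint: str
    whisper_model: str


def _choice(env: Mapping[str, str], name: str, default: str, allowed: AbstractSet[str]) -> str:
    value = env.get(name, default).strip().lower()
    if value not in allowed:
        raise ValueError(f"{name}={value!r} is not one of {sorted(allowed)}")
    return value


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read the documented variables, applying the values shown in the table as defaults."""
    env = os.environ if env is None else env
    return Settings(
        auth_type=_choice(env, "AUTH_TYPE", "none", AUTH_TYPES),
        eunomia_type=_choice(env, "EUNOMIA_TYPE", "none", EUNOMIA_TYPES),
        eunomia_policy_file=env.get("EUNOMIA_POLICY_FILE", "mcp_policies.json"),
        otel_endpoint=env.get("OTEL_EXPORTER_OTLP_ENDPOINT", "http://localhost:4317"),
        whisper_model=env.get("WHISPER_MODEL", "base"),
    )
```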

Installation

Install the Python package locally:

# Using uv (recommended); quote the extras so shells such as zsh do not expand the brackets
uv pip install "audio-transcriber[all]"

# Using standard pip
python -m pip install "audio-transcriber[all]"


Contribute

Contributions are welcome! Please ensure code quality by executing local checks before submitting pull requests:

Format code using ruff format .
Lint code using ruff check .
Validate type-safety with mypy .
Execute test suites using pytest
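The four checks can be chained so the run stops at the first failure, which is convenient as a pre-push step. A minimal sketch using `subprocess`; it assumes `ruff`, `mypy`, and `pytest` are installed in the active environment, and the script itself is not part of the repository.

```python
import subprocess
from typing import List, Optional, Sequence

# The commands listed above, in order.
CHECKS: List[List[str]] = [
    ["ruff", "format", "."],
    ["ruff", "check", "."],
    ["mypy", "."],
    ["pytest"],
]


def run_checks(commands: Sequence[Sequence[str]]) -> Optional[List[str]]:
    """Run each command in order; return the first one that fails, or None if all pass."""
    for command in commands:
        print("$", " ".join(command))
        try:
            result = subprocess.run(list(command))
        except FileNotFoundError:
            # The executable is not installed or not on PATH.
            return list(command)
        if result.returncode != 0:
            return list(command)
    return None
```

`run_checks(CHECKS)` returns `None` when every check passes, otherwise the command that failed.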

Release history

- 0.35.0 (Jun 25, 2026)
- 0.33.2 (Jun 11, 2026)
- 0.33.1 (Jun 11, 2026)
- 0.33.0 (Jun 10, 2026)
- 0.32.0 (Jun 6, 2026)
- 0.30.0 (Jun 4, 2026)
- 0.28.0 (Jun 1, 2026)
- 0.27.2 (Jun 1, 2026)
- 0.27.0 (May 31, 2026)
- 0.25.0 (May 29, 2026) (this version)
- 0.23.0 (May 29, 2026)
- 0.18.0 (May 22, 2026)
- 0.15.1 (May 21, 2026)
- 0.15.0 (May 18, 2026)
- 0.14.0 (May 11, 2026)
- 0.13.0 (May 9, 2026)
- 0.12.0 (May 8, 2026)
- 0.11.2 (May 6, 2026)
- 0.7.0 (Apr 30, 2026)
- 0.6.55 (Apr 17, 2026)
- 0.6.54 (Apr 10, 2026)
- 0.6.48 (Mar 25, 2026)
- 0.6.46 (Mar 20, 2026)
- 0.6.45 (Mar 18, 2026)
- 0.6.42 (Mar 14, 2026)
- 0.6.41 (Mar 13, 2026)
- 0.6.40 (Mar 10, 2026)
- 0.6.33 (Mar 8, 2026)
- 0.6.32 (Mar 7, 2026)
- 0.6.29 (Mar 6, 2026)
- 0.6.28 (Mar 6, 2026)
- 0.6.26 (Mar 4, 2026)
- 0.6.25 (Mar 4, 2026)
- 0.6.24 (Mar 3, 2026)
- 0.6.23 (Mar 2, 2026)
- 0.6.22 (Feb 28, 2026)
- 0.6.21 (Feb 27, 2026)
- 0.6.20 (Feb 27, 2026)
- 0.6.19 (Feb 27, 2026)
- 0.6.18 (Feb 27, 2026)
- 0.6.17 (Feb 26, 2026)
- 0.6.16 (Feb 25, 2026)
- 0.6.15 (Feb 19, 2026)
- 0.6.14 (Feb 18, 2026)
- 0.6.13 (Feb 17, 2026)
- 0.6.12 (Feb 17, 2026)
- 0.6.11 (Feb 16, 2026)
- 0.6.10 (Feb 16, 2026)
- 0.6.9 (Feb 16, 2026)
- 0.6.8 (Feb 14, 2026)
- 0.6.7 (Feb 12, 2026)
- 0.6.6 (Feb 12, 2026)
- 0.6.5 (Feb 12, 2026)
- 0.6.4 (Feb 11, 2026)
- 0.6.2 (Feb 11, 2026)
- 0.6.1 (Feb 10, 2026)
- 0.5.78 (Feb 10, 2026)
- 0.5.77 (Feb 9, 2026)
- 0.5.76 (Feb 7, 2026)
- 0.5.75 (Feb 1, 2026)
- 0.5.74 (Jan 31, 2026)
- 0.5.73 (Jan 29, 2026)
- 0.5.72 (Jan 29, 2026)
- 0.5.71 (Jan 29, 2026)
- 0.5.70 (Jan 28, 2026)
- 0.5.69 (Jan 28, 2026)
- 0.5.68 (Jan 26, 2026)
- 0.5.67 (Jan 25, 2026)
- 0.5.66 (Jan 24, 2026)
- 0.5.65 (Jan 21, 2026)
- 0.5.64 (Jan 19, 2026)
- 0.5.63 (Jan 19, 2026)
- 0.5.62 (Oct 29, 2025)
- 0.5.61 (Oct 29, 2025)
- 0.5.58 (Oct 29, 2025)
- 0.5.57 (Oct 21, 2025)
- 0.5.56 (Oct 21, 2025)
- 0.5.55 (Oct 17, 2025)
- 0.5.54 (Oct 17, 2025)
- 0.5.53 (Oct 17, 2025)
- 0.5.52 (Oct 6, 2025)
- 0.5.51 (Oct 5, 2025)
- 0.5.50 (Oct 3, 2025)
- 0.5.49 (Oct 3, 2025)
- 0.5.48 (Oct 1, 2025)
- 0.5.47 (Oct 1, 2025)
- 0.5.46 (Sep 30, 2025)
- 0.5.45 (Sep 10, 2025)
- 0.5.44 (Sep 10, 2025)
- 0.5.43 (Sep 9, 2025)
- 0.5.42 (Sep 9, 2025)
- 0.5.41 (Sep 9, 2025)
- 0.5.40 (Sep 9, 2025)
- 0.5.37 (Feb 9, 2024)
- 0.5.36 (Feb 8, 2024)
- 0.5.35 (Feb 7, 2024)
- 0.5.34 (Feb 6, 2024)
- 0.5.33 (Feb 5, 2024)
- 0.5.32 (Feb 4, 2024)
- 0.5.31 (Feb 3, 2024)
- 0.5.30 (Feb 2, 2024)
- 0.5.29 (Feb 1, 2024)
- 0.5.28 (Jan 31, 2024)
- 0.5.27 (Jan 30, 2024)
- 0.5.26 (Jan 29, 2024)
- 0.5.25 (Jan 28, 2024)
- 0.5.24 (Jan 27, 2024)
- 0.5.23 (Jan 26, 2024)
- 0.5.22 (Jan 25, 2024)
- 0.5.21 (Jan 24, 2024)
- 0.5.20 (Jan 23, 2024)
- 0.5.19 (Jan 22, 2024)
- 0.5.18 (Jan 21, 2024)
- 0.5.17 (Jan 20, 2024)
- 0.5.16 (Jan 19, 2024)
- 0.5.15 (Jan 18, 2024)
- 0.5.14 (Jan 17, 2024)
- 0.5.13 (Jan 16, 2024)
- 0.5.12 (Jan 15, 2024)
- 0.5.11 (Jan 15, 2024)
- 0.5.10 (Jan 14, 2024)
- 0.5.9 (Jan 13, 2024)
- 0.5.8 (Jan 12, 2024)
- 0.5.7 (Jan 11, 2024)
- 0.5.6 (Jan 10, 2024)
- 0.5.5 (Jan 9, 2024)
- 0.5.4 (Jan 8, 2024)
- 0.5.3 (Jan 7, 2024)
- 0.5.2 (Jan 6, 2024)
- 0.5.1 (Jan 5, 2024)
- 0.5.0 (Dec 18, 2023)
- 0.4.0 (Oct 15, 2023)
- 0.3.0 (Dec 30, 2022)

Download files

Source Distribution

audio_transcriber-0.25.0.tar.gz (36.0 kB)

Uploaded May 29, 2026 Source

Built Distribution

audio_transcriber-0.25.0-py3-none-any.whl (39.4 kB)

Uploaded May 29, 2026 Python 3

File details

Details for the file audio_transcriber-0.25.0.tar.gz.

File metadata

Download URL: audio_transcriber-0.25.0.tar.gz
Upload date: May 29, 2026
Size: 36.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for audio_transcriber-0.25.0.tar.gz
Algorithm	Hash digest
SHA256	`a95c5afa3d1c081b6dff6de8c71ebfd063395112011c51c61e7430e5b39d9ada`
MD5	`97f957111cf2772c1f78fb6c5416d250`
BLAKE2b-256	`4275dff411eab43fc014c968f6dab4fb5dd67d556d327e659dd2abecab510e9a`
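To confirm that a downloaded archive matches the published digest, compute its SHA-256 locally and compare. A minimal sketch with `hashlib`; the expected value is the SHA256 digest listed above for the source distribution, and the file path is wherever you saved the download.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest published for audio_transcriber-0.25.0.tar.gz.
EXPECTED_SDIST_SHA256 = "a95c5afa3d1c081b6dff6de8c71ebfd063395112011c51c61e7430e5b39d9ada"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large files are not loaded into memory at once."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    # Normalize case, then compare with a constant-time check.
    return hmac.compare_digest(sha256_of(path), expected_hex.lower())
```

For example, `verify(Path("audio_transcriber-0.25.0.tar.gz"), EXPECTED_SDIST_SHA256)` should return `True` for an intact download. pip can enforce the same check with `--require-hashes` and `--hash=sha256:...` entries in a requirements file.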

File details

Details for the file audio_transcriber-0.25.0-py3-none-any.whl.

File metadata

Download URL: audio_transcriber-0.25.0-py3-none-any.whl
Upload date: May 29, 2026
Size: 39.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for audio_transcriber-0.25.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`41c256764b3b16c970863b95d6b1bf24dd4f5dc877444c412ddc5e09654b938e`
MD5	`6a72abb87fc5e2c266e99249d7817599`
BLAKE2b-256	`610181c50c5c8c1000b530b1af500490239658fe7ee00bb213ee4ceba6f4ac8f`
