Audio-Transcriber - A2A | AG-UI | MCP
Version: 0.6.8
Overview
Transcribe your .wav .mp4 .mp3 .flac files to text or record your own audio!
This repository is actively maintained - Contributions are welcome!
Contribution Opportunities:
- Support new models
This package is a wrapper around OpenAI Whisper.
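For orientation, transcription is ultimately delegated to the openai-whisper library. A minimal sketch of what that underlying call looks like (using the public whisper API directly, not this package's internal function names):

```python
import whisper

# Load one of the published Whisper checkpoints: tiny, base, small, medium, or large.
model = whisper.load_model("base")

# Transcribe a local audio file; the result dict contains the full text
# plus per-segment timestamps.
result = model.transcribe("audio.mp3")
print(result["text"])
```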
MCP
MCP Tools
| Function Name | Description | Tag(s) |
|---|---|---|
| transcribe_audio | Transcribes audio from a provided file or by recording from the microphone. | audio_processing |
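As an illustration only, here is a hedged sketch of invoking this tool from a Python MCP client (fastmcp), assuming the server is running in HTTP mode at http://localhost:8000/mcp as in the defaults shown later; the argument name `file` is an assumption and should be checked against the tool's input schema:

```python
import asyncio
from fastmcp import Client

async def main():
    # Assumes the MCP server is reachable over streamable HTTP at this URL.
    async with Client("http://localhost:8000/mcp") as client:
        tools = await client.list_tools()
        print([tool.name for tool in tools])  # should include "transcribe_audio"

        # "file" is an assumed argument name; verify it against the tool schema.
        result = await client.call_tool("transcribe_audio", {"file": "audio.mp3"})
        print(result)

asyncio.run(main())
```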
A2A Agent
Architecture Summary
---
config:
layout: dagre
---
flowchart TB
subgraph subGraph0["Agent Capabilities"]
C["Agent"]
B["A2A Server - Uvicorn/FastAPI"]
D["MCP Tools"]
F["Agent Skills"]
end
C --> D & F
A["User Query"] --> B
B --> C
D --> E["Platform API"]
C:::agent
B:::server
A:::server
classDef server fill:#f9f,stroke:#333
classDef agent fill:#bbf,stroke:#333,stroke-width:2px
style B stroke:#000000,fill:#FFD600
style D stroke:#000000,fill:#BBDEFB
style F fill:#BBDEFB
style A fill:#C8E6C9
style subGraph0 fill:#FFF9C4
Component Interaction Diagram
sequenceDiagram
participant User
participant Server as A2A Server
participant Agent as Agent
participant Skill as Agent Skills
participant MCP as MCP Tools
User->>Server: Send Query
Server->>Agent: Invoke Agent
Agent->>Skill: Analyze Skills Available
Skill->>Agent: Provide Guidance on Next Steps
Agent->>MCP: Invoke Tool
MCP-->>Agent: Tool Response Returned
Agent-->>Agent: Return Results Summarized
Agent-->>Server: Final Response
Server-->>User: Output
Usage
CLI
| Short Flag | Long Flag | Description |
|---|---|---|
| -h | --help | See Usage |
| -b | --bitrate | Bitrate to use during recording |
| -c | --channels | Number of channels to use during recording |
| -d | --directory | Directory to save recording |
| -e | --export | Export txt, srt, and vtt files |
| -f | --file | File to transcribe |
| -l | --language | Language to transcribe |
| -m | --model | Model to use: <tiny, base, small, medium, large> |
| -n | --name | Name of recording |
| -r | --record | Number of seconds to record from the microphone |
audio-transcriber --file '~/Downloads/Federal_Reserve.mp4' --model 'large'
audio-transcriber --record 60 --directory '~/Downloads/' --name 'my_recording.wav' --model 'tiny'
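A combined example using a few more of the flags above (this assumes --export is a plain switch and --language takes a language code, as the descriptions suggest; check --help to confirm):

```bash
audio-transcriber --file '~/Downloads/interview.mp3' --model 'small' --language 'en' --export
```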
MCP CLI
| Short Flag | Long Flag | Description |
|---|---|---|
| -h | --help | Display help information |
| -t | --transport | Transport method: 'stdio', 'http', or 'sse' [legacy] (default: stdio) |
| -s | --host | Host address for HTTP transport (default: 0.0.0.0) |
| -p | --port | Port number for HTTP transport (default: 8000) |
| | --auth-type | Authentication type: 'none', 'static', 'jwt', 'oauth-proxy', 'oidc-proxy', 'remote-oauth' (default: none) |
| | --token-jwks-uri | JWKS URI for JWT verification |
| | --token-issuer | Issuer for JWT verification |
| | --token-audience | Audience for JWT verification |
| | --oauth-upstream-auth-endpoint | Upstream authorization endpoint for OAuth Proxy |
| | --oauth-upstream-token-endpoint | Upstream token endpoint for OAuth Proxy |
| | --oauth-upstream-client-id | Upstream client ID for OAuth Proxy |
| | --oauth-upstream-client-secret | Upstream client secret for OAuth Proxy |
| | --oauth-base-url | Base URL for OAuth Proxy |
| | --oidc-config-url | OIDC configuration URL |
| | --oidc-client-id | OIDC client ID |
| | --oidc-client-secret | OIDC client secret |
| | --oidc-base-url | Base URL for OIDC Proxy |
| | --remote-auth-servers | Comma-separated list of authorization servers for Remote OAuth |
| | --remote-base-url | Base URL for Remote OAuth |
| | --allowed-client-redirect-uris | Comma-separated list of allowed client redirect URIs |
| | --eunomia-type | Eunomia authorization type: 'none', 'embedded', 'remote' (default: none) |
| | --eunomia-policy-file | Policy file for embedded Eunomia (default: mcp_policies.json) |
| | --eunomia-remote-url | URL for remote Eunomia server |
Using as an MCP Server
The MCP Server can be run in two modes: stdio (for local testing) or http (for networked access). To start the server, use the following commands:
Run in stdio mode (default):
audio-transcriber-mcp
Run in HTTP mode:
audio-transcriber-mcp --transport "http" --host "0.0.0.0" --port "8000"
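As an example of enabling authentication, the JWT flags from the table above can be combined with the HTTP transport roughly as follows (the JWKS URI, issuer, and audience values are placeholders for your identity provider):

```bash
audio-transcriber-mcp --transport "http" --host "0.0.0.0" --port "8000" \
  --auth-type "jwt" \
  --token-jwks-uri "https://provider.com/.well-known/jwks.json" \
  --token-issuer "https://provider.com/" \
  --token-audience "audio-transcriber-mcp"
```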
Model Information
Courtesy of and credits to OpenAI Whisper.
| Size | Parameters | English-only model | Multilingual model | Required VRAM | Relative speed |
|---|---|---|---|---|---|
| tiny | 39 M | tiny.en | tiny | ~1 GB | ~32x |
| base | 74 M | base.en | base | ~1 GB | ~16x |
| small | 244 M | small.en | small | ~2 GB | ~6x |
| medium | 769 M | medium.en | medium | ~5 GB | ~2x |
| large | 1550 M | N/A | large | ~10 GB | 1x |
Deploy MCP Server as a Service
The Audio-Transcriber MCP server can be deployed using Docker, with configurable authentication, middleware, and Eunomia authorization.
Using Docker Run
docker pull knucklessg1/audio-transcriber:latest
docker run -d \
--name audio-transcriber-mcp \
-p 8004:8004 \
-e HOST=0.0.0.0 \
-e PORT=8004 \
-e TRANSPORT=http \
-e AUTH_TYPE=none \
-e EUNOMIA_TYPE=none \
knucklessg1/audio-transcriber:latest
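To sanity-check the container, inspect its logs and probe the published port. The /mcp path below is assumed from the default MCP URL used elsewhere in this README; with streamable HTTP a plain GET typically returns an error status, but any HTTP response confirms the server is listening:

```bash
docker logs audio-transcriber-mcp
curl -i http://localhost:8004/mcp
```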
For advanced authentication (e.g., JWT, OAuth Proxy, OIDC Proxy, Remote OAuth) or Eunomia, add the relevant environment variables:
docker run -d \
--name audio-transcriber-mcp \
-p 8004:8004 \
-e HOST=0.0.0.0 \
-e PORT=8004 \
-e TRANSPORT=http \
-e AUTH_TYPE=oidc-proxy \
-e OIDC_CONFIG_URL=https://provider.com/.well-known/openid-configuration \
-e OIDC_CLIENT_ID=your-client-id \
-e OIDC_CLIENT_SECRET=your-client-secret \
-e OIDC_BASE_URL=https://your-server.com \
-e ALLOWED_CLIENT_REDIRECT_URIS=http://localhost:*,https://*.example.com/* \
-e EUNOMIA_TYPE=embedded \
-e EUNOMIA_POLICY_FILE=/app/mcp_policies.json \
knucklessg1/audio-transcriber:latest
Using Docker Compose
Create a docker-compose.yml file:
services:
audio-transcriber-mcp:
image: knucklessg1/audio-transcriber:latest
environment:
- HOST=0.0.0.0
- PORT=8004
- TRANSPORT=http
- AUTH_TYPE=none
- EUNOMIA_TYPE=none
ports:
- 8004:8004
For advanced setups with authentication and Eunomia:
services:
audio-transcriber-mcp:
image: knucklessg1/audio-transcriber:latest
environment:
- HOST=0.0.0.0
- PORT=8004
- TRANSPORT=http
- AUTH_TYPE=oidc-proxy
- OIDC_CONFIG_URL=https://provider.com/.well-known/openid-configuration
- OIDC_CLIENT_ID=your-client-id
- OIDC_CLIENT_SECRET=your-client-secret
- OIDC_BASE_URL=https://your-server.com
- ALLOWED_CLIENT_REDIRECT_URIS=http://localhost:*,https://*.example.com/*
- EUNOMIA_TYPE=embedded
- EUNOMIA_POLICY_FILE=/app/mcp_policies.json
ports:
- 8004:8004
volumes:
- ./mcp_policies.json:/app/mcp_policies.json
Run the service:
docker-compose up -d
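Then follow the service logs to confirm the server started cleanly:

```bash
docker compose logs -f audio-transcriber-mcp
```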
Configure mcp.json for AI Integration
{
"mcpServers": {
"audio_transcriber": {
"command": "uv",
"args": [
"run",
"--with",
"audio-transcriber",
"audio-transcriber-mcp"
],
"env": {
"WHISPER_MODEL": "medium", // Optional
"TRANSCRIBE_DIRECTORY": "~/Downloads" // Optional
},
"timeout": 200000
}
}
}
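If the MCP server is already running over HTTP (for example via Docker as above), many MCP clients also accept a URL-based entry instead of a command. A hedged variant (field names vary by client, so check your client's documentation):

```json
{
  "mcpServers": {
    "audio_transcriber": {
      "url": "http://localhost:8004/mcp"
    }
  }
}
```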
A2A CLI
Endpoints
- Web UI: http://localhost:8000/ (if enabled)
- A2A: http://localhost:8000/a2a (Discovery: /a2a/.well-known/agent.json)
- AG-UI: http://localhost:8000/ag-ui (POST)
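With the server running, the agent card can be fetched from the discovery endpoint listed above:

```bash
curl http://localhost:8000/a2a/.well-known/agent.json
```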
| Short Flag | Long Flag | Description |
|---|---|---|
| -h | --help | Display help information |
| | --host | Host to bind the server to (default: 0.0.0.0) |
| | --port | Port to bind the server to (default: 9000) |
| | --reload | Enable auto-reload |
| | --provider | LLM Provider: 'openai', 'anthropic', 'google', 'huggingface' |
| | --model-id | LLM Model ID (default: qwen3:4b) |
| | --base-url | LLM Base URL (for OpenAI-compatible providers) |
| | --api-key | LLM API Key |
| | --mcp-url | MCP Server URL (default: http://localhost:8000/mcp) |
| | --web | Enable Pydantic AI Web UI (default: False; Env: ENABLE_WEB_UI) |
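A hedged launch example combining these flags. The console-script name `audio-transcriber-agent` is an assumption (this README does not state the A2A entry point), and the provider, model, and URLs are placeholders:

```bash
# Command name is hypothetical; the flags come from the table above.
audio-transcriber-agent \
  --host "0.0.0.0" --port 9000 \
  --provider "openai" --model-id "qwen3:4b" \
  --base-url "http://localhost:11434/v1" \
  --api-key "your-api-key" \
  --mcp-url "http://localhost:8000/mcp"
```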
Install Python Package
python -m pip install audio-transcriber
or
uv pip install --upgrade audio-transcriber
Ubuntu Dependencies
sudo apt-get update
sudo apt-get install libasound-dev portaudio19-dev libportaudio2 libportaudiocpp0 ffmpeg gcc -y
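After installation, verify the console scripts are on PATH:

```bash
audio-transcriber --help
audio-transcriber-mcp --help
```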
Repository Owners
Download files
File details
Details for the file audio_transcriber-0.6.8.tar.gz.
File metadata
- Download URL: audio_transcriber-0.6.8.tar.gz
- Upload date:
- Size: 32.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3f5837eda9d7d287bea7c35af876171c5b2c89917520a33720b6f1cb7e27788 |
| MD5 | fb9d373399ab2c6f921af680a1654fcc |
| BLAKE2b-256 | bc13ae15548d1af8d0bc8df3aee7b5654518ec296df92b6be2b8c49de7dd1929 |
File details
Details for the file audio_transcriber-0.6.8-py3-none-any.whl.
File metadata
- Download URL: audio_transcriber-0.6.8-py3-none-any.whl
- Upload date:
- Size: 31.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2b731eecad65529affef6a7d1fab700457412f3f6d0a73428317f7ad8717d277 |
| MD5 | f9d59efea8048f639316a582ab001601 |
| BLAKE2b-256 | ad7915fdbab90436ff1373f1445ee06a293e1c6f9d36e22f23e6c0a8c42b9417 |