Audio-to-Markdown transcription optimized for AI consumption
Project description
Audium
🎧 Audio → AI‑optimized Markdown
Transcribe MP3/WAV/FLAC into clean, token‑efficient Markdown — ready for any LLM.
✨ Why Audium?
Feed audio to an LLM. Get answers. Simple.
But raw transcripts burn tokens on noise: long timestamps, filler words, silent segments, markup that adds nothing.
Audium turns speech into the minimum viable Markdown: every character counts, nothing wasted.
| 🎯 | ⚡ | 🪙 | 👁️ | 🌍 |
|---|---|---|---|---|
| 5 formats | GPU‑accelerated | Token‑aware | Watch + URL + GUI | ~97 languages |
| compact, minimal, structured, srt, vtt | 2–10× real‑time on CUDA | [MM:SS] + VAD + filler‑strip |
files, URLs, desktop, REST API | tiny to large‑v3 + turbo |
📦 Install
Requires ffmpeg: sudo apt install ffmpeg / brew install ffmpeg
Recommended: pipx (isolated, no conflicts)
pipx install audium-md
pipxcreates its own virtual environment — works on Ubuntu/Debian without PEP 668 errors. Install pipx first:sudo apt install pipxorpython3 -m pip install --user pipx
Alternative: uv tool (fastest)
uv tool install audium-md
Fallback: pip with override
pip install audium-md --break-system-packages
Local development
git clone https://github.com/Tamukj/Audium.git
cd Audium
python3 -m venv .venv
source .venv/bin/activate
pip install -e .
🚀 Quick Start
# Process a folder
audium run ./my-recordings/
# Single file
audium run lecture.mp3
# Watch folder — auto‑transcribe new files
audium watch ./incoming/
# See what you've transcribed
audium list
# Change model
audium config set model large-v3
📝 Formats
compact (default)
# lecture.mp3 (01:23:45)
[00:00] Neural networks learn hierarchical representations
[00:04] Each layer detects increasingly abstract features
[00:08] Early layers find edges and textures
[00:12] Later layers detect objects and scenes
minimal
Neural networks learn hierarchical representations
Each layer detects increasingly abstract features
Early layers find edges and textures
Later layers detect objects and scenes
structured (requires speaker diarization)
# interview.mp3 (00:45:12)
## Alice [00:00-00:30]
Neural networks are a powerful tool. It's important to understand their limitations.
## Bob [00:30-01:15]
I completely agree. Let me walk through an example to make this concrete.
srt (SubRip subtitles)
1
00:00:00,000 --> 00:00:04,000
Neural networks learn hierarchical representations
2
00:00:04,000 --> 00:00:08,000
Each layer detects increasingly abstract features
vtt (WebVTT — HTML5 & browser)
WEBVTT
00:00:00.000 --> 00:00:04.000
Neural networks learn hierarchical representations
00:00:04.000 --> 00:00:08.000
Each layer detects increasingly abstract features
⚙️ Commands
| Command | Description |
|---|---|
audium run <path> |
Transcribe audio files or folders |
audium watch <path> |
Watch folder and auto‑process new files |
audium list [dir] |
Show processed transcripts with file sizes |
audium config |
Show current configuration |
audium config set <key> <value> |
Change a setting |
audium config reset |
Reset to factory defaults |
audium config path |
Show config file location |
audium tui [dir] |
Interactive terminal UI with file browser |
audium desktop |
Launch the desktop GUI (Flet) |
audium serve |
Start REST API server (FastAPI) |
Common flags for run and watch
| Flag | Default | Description |
|---|---|---|
-o, --output-dir |
./transcripts |
Where to save .md files |
-f, --format |
compact |
compact / minimal / structured |
-r, --recursive |
off | Search subdirectories |
--model |
small |
tiny / base / small / medium / large-v3 |
--language |
auto |
Force language code: ru, en, zh, ... |
--strip-fillers |
off | Remove "um", "uh", "like", "мм", "ээ", etc. |
--no-vad |
off | Disable voice activity detection |
--no-progress |
off | Hide the progress bar |
--vocabulary |
— | Custom words to bias Whisper (e.g. "RAG,LoRA") |
--translate <lang> |
— | Translate output to target language (ru, zh, ...) |
--ask <prompt> |
— | Send transcript to GPT/Claude for summarization |
--workers <N> |
1 | Parallel transcription across N threads |
--diarize |
off | Speaker diarization via pyannote (needs HF_TOKEN) |
--chapters |
off | Auto-detect topic changes and add headings |
--api-key |
env | OpenAI-compatible API key for --ask |
🔧 Configuration
Settings are merged: CLI flags > .audium.yaml (project) > ~/.config/audium/config.yaml > defaults
# Show current config
audium config
# Set a value
audium config set model large-v3
audium config set strip_fillers true
audium config set output_dir ~/Documents/transcripts
# Also works as a shorthand:
echo "audium config model large-v3" → now supported!
# Reset to factory defaults
audium config reset
# Show config file path
audium config path
All Settings
audium config
Output showing current values with accepted options in parentheses:
beam_size: 5 (integer 1-20)
compute_type: auto (auto, float16, int8_float16, int8)
device: cuda (cuda, cpu)
format: compact (compact, minimal, structured)
language: auto (e.g. auto, ru, en, zh, ...)
min_segment_duration: 0.0 (float, seconds)
model: small (tiny, base, small, medium, large-v3, turbo)
output_dir: ./transcripts (path)
recursive: false (true / false)
strip_fillers: false (true / false)
vad_filter: true (true / false)
Setting Reference
| Key | Default | Description | Options |
|---|---|---|---|
model |
small |
Whisper model size | tiny, base, small, medium, large-v3, turbo |
device |
auto |
Computation device (auto-detect) | auto, cuda, cpu |
compute_type |
auto |
Precision for GPU inference | auto, float16, int8_float16, int8 |
format |
compact |
Output Markdown format | compact, minimal, structured |
language |
auto |
Source language | auto, or any ISO code (ru, en, zh, ...) |
beam_size |
5 |
Beam search width | integer (1-20) |
output_dir |
./transcripts |
Where .md files are saved | any path |
strip_fillers |
false |
Remove filler words | true / false |
vad_filter |
true |
Voice Activity Detection | true / false |
min_segment_duration |
0.0 |
Skip segments shorter than N seconds | float |
recursive |
false |
Scan subdirectories | true / false |
compute_type auto-detection: On GPUs with compute capability ≥ 7.0 (Volta+),
float16is used for best performance. On Pascal GPUs (GTX 10xx),int8_float16is used. On CPU,int8is used.
Local config file
Create .audium.yaml in your project root to override defaults per-project:
model: medium
language: ru
format: minimal
output_dir: ./transcripts
First run — model download
On first use, Audium downloads the Whisper model from HuggingFace Hub (~500 MB for small).
The model is cached locally in ~/.cache/huggingface/hub/ — subsequent runs are instant and fully offline.
Why HuggingFace? The models are too large (~500 MB–3 GB) to bundle in a pip package or GitHub repo. They're downloaded once, then cached forever.
Is this legal? Yes. All components are MIT licensed: Whisper (OpenAI), faster-whisper, CTranslate2. Free for personal and commercial use.
🪙 Token Optimization
Audium is built to minimize LLM token cost:
| Technique | Savings |
|---|---|
[MM:SS] instead of [HH:MM:SS.mmm] |
~30% on timestamps |
| VAD filtering (skip silence) | 15–40% on meeting recordings |
| Filler‑word stripping | 5–10% on conversational speech |
min_segment_duration threshold |
skip noise fragments |
| One line per segment, no blank lines | ~8% vs paragraph output |
📊 Model Sizes
| Model | Parameters | Speed (GPU) | Best for |
|---|---|---|---|
| tiny | 39M | ~32× real‑time | Quick drafts, low‑resource |
| base | 74M | ~16× real‑time | Dictation, clean audio |
| small | 244M | ~6× real‑time | General purpose |
| medium | 769M | ~2× real‑time | Accents, noisy audio |
| large‑v3 | 1.5B | ~1× real‑time | Maximum accuracy |
All multilingual models support the same ~97 languages. The size trades accuracy for speed.
🚀 Features
YouTube & URL support
Transcribe any YouTube video, podcast, or audio URL directly:
pip install audium-md[yt]
audium run "https://www.youtube.com/watch?v=dQw4w9WgXcQ"
Works with 1,700+ sites via yt-dlp (YouTube, Twitter/X, Vimeo, TikTok, ...).
AI Chapter detection
Auto-detect topic changes with pure math — no external API, instant:
audium run lecture.mp3 --chapters
Adds a clickable chapter index at the top of the transcript:
## 📑 Chapters
1. `[00:00]` Introduction · Neural · Networks
2. `[15:30]` Gradient · Descent · Optimization
3. `[42:00]` Transformers · Attention · Mechanism
LLM-ready output
Post-process directly with GPT, Claude, or any OpenAI-compatible API:
export OPENAI_API_KEY=sk-...
audium run podcast.mp3 --ask "summarize key takeaways in 3 bullet points"
Output includes both the transcript and the LLM response. Works with any
OpenAI-compatible endpoint (--api-base http://localhost:11434/v1 for Ollama).
Translate
Transcribe in one language, output in another:
audium run ru-lecture.mp3 --translate en
Speaker diarization
Identify who said what:
pip install audium-md[diarize]
export HF_TOKEN=hf_...
audium run meeting.mp3 --diarize -f structured
🖥️ Desktop GUI
A modern dark-themed desktop app with drag-and-drop, YouTube URL input, and all settings exposed:
pip install audium-md[desktop]
audium desktop
| Platform | Install | Run |
|---|---|---|
| Windows | pip install audium-md[desktop] |
audium desktop |
| Linux | pip install audium-md[desktop] |
audium desktop |
| macOS | pip install audium-md[desktop] |
audium desktop |
Bundle as standalone EXE (Windows users — no Python required):
pip install pyinstaller
python scripts/build_desktop.py
# → dist/Audium/Audium.exe (~150 MB, self-contained)
Terminal UI
Interactive file browser with live preview and keyboard shortcuts:
pip install audium-md[tui]
audium tui ./recordings/
REST API
Run Audium as a server and integrate with any application:
pip install audium-md[server]
audium serve --port 8080
curl -X POST http://localhost:8080/transcribe \
-F "file=@meeting.mp3" \
-F "format=compact"
# → {"content": "# meeting.mp3 (00:15:30)\n[00:00] ...", ...}
OpenAPI docs at http://localhost:8080/docs — try it in the browser.
🖥️ GPU Support
Audium automatically detects your GPU and configures itself:
| Hardware | Detection | Backend |
|---|---|---|
| NVIDIA (all) | nvidia-smi |
CUDA — best performance |
| AMD (ROCm) | /dev/kfd + rocm-smi |
ROCm / HIP |
| Intel (Arc, Iris) | xpu-smi / drm |
oneAPI / SYCL |
| CPU only | fallback | int8 quantized |
No manual configuration needed. Run audium run ./audio/ and it just works.
Updating
pip install --upgrade audium-md
# or: pipx upgrade audium-md
# or: uv tool upgrade audium-md
Check your current version:
audium --version
📄 License
MIT — do whatever you want. Attribution appreciated.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file audium_md-0.2.0.tar.gz.
File metadata
- Download URL: audium_md-0.2.0.tar.gz
- Upload date:
- Size: 44.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.21 {"installer":{"name":"uv","version":"0.11.21","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Zorin OS","version":"18","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
66addde7044ef073fea1624febd7436fcb489d22f694e0acc3f51fcabcf14bb7
|
|
| MD5 |
607d6cdabff28aba979fbcccab2d95c9
|
|
| BLAKE2b-256 |
e8f89e4ae0aaefadfd253f816f664e55968e18c40337daefd309e033e5dc35f8
|
File details
Details for the file audium_md-0.2.0-py3-none-any.whl.
File metadata
- Download URL: audium_md-0.2.0-py3-none-any.whl
- Upload date:
- Size: 38.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.21 {"installer":{"name":"uv","version":"0.11.21","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Zorin OS","version":"18","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":null}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
94ee12428e8c3f2dcaa28b14d47a2f162b0e3e19372e154f07fd2fccefe2c40c
|
|
| MD5 |
6281ef35a7c4d13349be65b8ff7535b7
|
|
| BLAKE2b-256 |
40af80f1c25c4d74ca94ae3b00475d54449affac00ec63962cf52ac49e91836e
|