Audio-to-Markdown transcription optimized for AI consumption
Audium
🎧 Audio → AI‑optimized Markdown
Transcribe MP3/WAV/FLAC into clean, token‑efficient Markdown — ready for any LLM.
✨ Why Audium?
Feed audio to an LLM. Get answers. Simple.
But raw transcripts burn tokens on noise: long timestamps, filler words, silent segments, markup that adds nothing.
Audium turns speech into the minimum viable Markdown: every character counts, nothing wasted.
| 🎯 | ⚡ | 🪙 | 👁️ | 🌍 |
|---|---|---|---|---|
| 3 formats | GPU‑accelerated | Token‑aware | Watch mode | ~97 languages |
| compact, minimal, structured | 2–10× real‑time on CUDA | [MM:SS] + VAD + filler‑strip | drop files → auto‑transcribe | tiny to large‑v3 |
📦 Install
pip install audium-md
Requires `ffmpeg` on your system: `sudo apt install ffmpeg` (Debian/Ubuntu) or `brew install ffmpeg` (macOS).
🚀 Quick Start
# Process a folder
audium run ./my-recordings/
# Single file
audium run lecture.mp3
# Watch folder — auto‑transcribe new files
audium watch ./incoming/
# See what you've transcribed
audium list
# Change model
audium config set model large-v3
📝 Formats
compact (default)
# lecture.mp3 (01:23:45)
[00:00] Neural networks learn hierarchical representations
[00:04] Each layer detects increasingly abstract features
[00:08] Early layers find edges and textures
[00:12] Later layers detect objects and scenes
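The compact layout above can be reproduced with a few lines of Python. This is a minimal sketch, not Audium's actual code: `format_timestamp`, `format_duration`, and `render_compact` are hypothetical names, segments are assumed to be `(start_seconds, text)` pairs, and letting minutes run past 59 for long recordings is an assumption.

```python
def format_timestamp(seconds: float) -> str:
    """Render a segment start as MM:SS (assumption: minutes keep counting past 59)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def format_duration(seconds: float) -> str:
    """Render the total length as HH:MM:SS for the header line."""
    total = int(seconds)
    hours, rest = divmod(total, 3600)
    return f"{hours:02d}:{rest // 60:02d}:{rest % 60:02d}"


def render_compact(filename: str, duration: float, segments) -> str:
    """Build a compact transcript: one header line, then one line per segment."""
    lines = [f"# {filename} ({format_duration(duration)})"]
    for start, text in segments:
        lines.append(f"[{format_timestamp(start)}] {text.strip()}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        (0.0, "Neural networks learn hierarchical representations"),
        (4.3, "Each layer detects increasingly abstract features"),
    ]
    print(render_compact("lecture.mp3", 5025, demo))
```

Dropping the timestamp prefix from each segment line gives the minimal format.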
minimal
Neural networks learn hierarchical representations
Each layer detects increasingly abstract features
Early layers find edges and textures
Later layers detect objects and scenes
structured (requires speaker diarization)
# interview.mp3 (00:45:12)
## Alice [00:00-00:30]
Neural networks are a powerful tool. It's important to understand their limitations.
## Bob [00:30-01:15]
I completely agree. Let me walk through an example to make this concrete.
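The structured layout groups consecutive segments from the same speaker under one heading. The sketch below shows one way to do that grouping. It assumes diarized segments arrive as `(speaker, start_seconds, end_seconds, text)` tuples, and `render_structured` and `mmss` are hypothetical names rather than Audium's API.

```python
from itertools import groupby


def mmss(seconds: float) -> str:
    """Render seconds as MM:SS."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def render_structured(filename: str, duration_label: str, segments) -> str:
    """Merge consecutive segments by the same speaker into one headed block."""
    lines = [f"# {filename} ({duration_label})"]
    # groupby only merges adjacent items, so a speaker who returns later
    # gets a new heading, which preserves the order of the conversation.
    for speaker, group in groupby(segments, key=lambda seg: seg[0]):
        turn = list(group)
        start, end = turn[0][1], turn[-1][2]
        lines.append(f"## {speaker} [{mmss(start)}-{mmss(end)}]")
        lines.append(" ".join(text.strip() for _, _, _, text in turn))
    return "\n".join(lines)


if __name__ == "__main__":
    demo = [
        ("Alice", 0, 12, "Neural networks are a powerful tool."),
        ("Alice", 12, 30, "It's important to understand their limitations."),
        ("Bob", 30, 75, "I completely agree."),
    ]
    print(render_structured("interview.mp3", "00:45:12", demo))
```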
⚙️ Commands
| Command | Description |
|---|---|
| `audium run <path>` | Transcribe audio files or folders |
| `audium watch <path>` | Watch folder and auto‑process new files |
| `audium list [dir]` | Show processed transcripts with file sizes |
| `audium config` | Show current configuration |
| `audium config set <key> <value>` | Change a setting |
| `audium config reset` | Reset to factory defaults |
| `audium config path` | Show config file location |
Common flags for run and watch
| Flag | Default | Description |
|---|---|---|
| `-o, --output-dir` | `./transcripts` | Where to save .md files |
| `-f, --format` | `compact` | compact / minimal / structured |
| `-r, --recursive` | off | Search subdirectories |
| `--model` | `small` | tiny / base / small / medium / large-v3 |
| `--language` | `auto` | Force language code: ru, en, zh, ... |
| `--strip-fillers` | off | Remove "um", "uh", "like", "мм", "ээ", etc. |
| `--no-vad` | off | Disable voice activity detection |
| `--no-progress` | off | Hide the progress bar |
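To make the effect of `-r, --recursive` concrete, here is a rough sketch of how a path argument could be expanded into a list of audio files. It is an illustration only: `collect_audio_files` is a hypothetical helper, and the suffix set is assumed from the MP3/WAV/FLAC formats named in the introduction.

```python
from pathlib import Path

# Assumed from the formats listed in the introduction.
AUDIO_SUFFIXES = {".mp3", ".wav", ".flac"}


def collect_audio_files(path, recursive: bool = False):
    """Return audio files for a single file or a folder, sorted by path."""
    root = Path(path)
    if root.is_file():
        return [root] if root.suffix.lower() in AUDIO_SUFFIXES else []
    # "*" looks only at the top level; "**/*" also descends into subfolders.
    pattern = "**/*" if recursive else "*"
    return sorted(
        p for p in root.glob(pattern)
        if p.is_file() and p.suffix.lower() in AUDIO_SUFFIXES
    )
```

Without `recursive=True`, files in subfolders are ignored, which matches the flag being off by default.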
🔧 Configuration
Settings are merged in order of precedence, highest first: CLI flags > .audium.yaml (project) > ~/.config/audium/config.yaml > defaults
# Set default model
audium config set model large-v3
# Always strip filler words
audium config set strip_fillers true
# Custom output folder
audium config set output_dir ~/Documents/transcripts
# See what you changed
audium config
# Example .audium.yaml (place in project root)
model: medium
language: ru
format: minimal
output_dir: ./transcripts
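The precedence order described in this section amounts to layering dictionaries, lowest priority first. The sketch below is illustrative rather than Audium's implementation: `merge_settings` is a hypothetical name, the default values are taken from the flags table, and treating `None` as "not set on the command line" is an assumption.

```python
# Defaults as listed in the flags table.
DEFAULTS = {
    "model": "small",
    "format": "compact",
    "language": "auto",
    "output_dir": "./transcripts",
    "strip_fillers": False,
}


def merge_settings(defaults, user, project, cli):
    """Apply layers from lowest to highest priority; later layers win."""
    merged = dict(defaults)
    for layer in (user, project, cli):
        # None means the option was not given, so it must not override.
        merged.update({k: v for k, v in layer.items() if v is not None})
    return merged


if __name__ == "__main__":
    user_cfg = {"model": "large-v3", "strip_fillers": True}  # ~/.config/audium/config.yaml
    project_cfg = {"model": "medium", "language": "ru", "format": "minimal"}  # .audium.yaml
    cli_flags = {"format": "compact", "model": None}  # only --format was passed
    print(merge_settings(DEFAULTS, user_cfg, project_cfg, cli_flags))
```

In this example the project file overrides the user-level model, while the CLI flag overrides the project-level format.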
🪙 Token Optimization
Audium is built to minimize LLM token cost:
| Technique | Savings |
|---|---|
| `[MM:SS]` instead of `[HH:MM:SS.mmm]` | ~30% on timestamps |
| VAD filtering (skip silence) | 15–40% on meeting recordings |
| Filler‑word stripping | 5–10% on conversational speech |
| `min_segment_duration` threshold | skip noise fragments |
| One line per segment, no blank lines | ~8% vs paragraph output |
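Filler-word stripping can be approximated with a word-boundary regular expression. This is a naive sketch, not Audium's filter: the word list is taken from the `--strip-fillers` description, `strip_fillers` is a hypothetical function name, and the pattern also removes legitimate uses of "like", which a real implementation would need to handle more carefully.

```python
import re

# Word list taken from the --strip-fillers flag description.
FILLERS = ("um", "uh", "like", "мм", "ээ")

# Match a whole filler word plus an optional trailing comma and whitespace.
_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b,?\s*",
    re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove filler words and collapse any leftover runs of whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


if __name__ == "__main__":
    print(strip_fillers("Um, so the model is, uh, basically done"))
    print(strip_fillers("ээ это важно"))
```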
📊 Model Sizes
| Model | Parameters | Speed (GPU) | Best for |
|---|---|---|---|
| tiny | 39M | ~32× real‑time | Quick drafts, low‑resource |
| base | 74M | ~16× real‑time | Dictation, clean audio |
| small | 244M | ~6× real‑time | General purpose |
| medium | 769M | ~2× real‑time | Accents, noisy audio |
| large‑v3 | 1.5B | ~1× real‑time | Maximum accuracy |
All multilingual models support the same ~97 languages; larger models trade speed for accuracy.
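The speed column can be read as a divisor on audio length. The following back-of-the-envelope sketch uses the approximate GPU figures from the table; `estimate_processing_seconds` is a hypothetical helper, and real throughput will vary with hardware and audio content.

```python
# Approximate GPU speed factors from the table (multiples of real time).
SPEED_FACTOR = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large-v3": 1}


def estimate_processing_seconds(audio_seconds: float, model: str = "small") -> float:
    """Estimate transcription time as audio length divided by the speed factor."""
    return audio_seconds / SPEED_FACTOR[model]


if __name__ == "__main__":
    lecture = 1 * 3600 + 23 * 60 + 45  # 01:23:45 of audio
    for name in SPEED_FACTOR:
        minutes = estimate_processing_seconds(lecture, name) / 60
        print(f"{name}: about {minutes:.1f} min")
```

Under these figures, the 01:23:45 lecture from the compact example would take about 14 minutes with `small` and roughly its full length with `large-v3`.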
📄 License
MIT — do whatever you want. Attribution appreciated.