Audio-to-Markdown transcription optimized for AI consumption

Audium

🎧 Audio → AI‑optimized Markdown
Transcribe MP3/WAV/FLAC into clean, token‑efficient Markdown — ready for any LLM.

Python 3.10+ · MIT License · faster-whisper

English · Русский · 中文


✨ Why Audium?

Feed audio to an LLM. Get answers. Simple.

But raw transcripts burn tokens on noise: long timestamps, filler words, silent segments, markup that adds nothing.

Audium turns speech into the minimum viable Markdown: every character counts, nothing wasted.

| Feature | Details |
|---|---|
| 3 formats | compact, minimal, structured |
| GPU-accelerated | 2–10× real-time on CUDA |
| Token-aware | [MM:SS] + VAD + filler-strip |
| Watch mode | drop files → auto-transcribe |
| ~97 languages | tiny to large-v3 |

📦 Install

pip install audium-md

Requires ffmpeg on your system: sudo apt install ffmpeg (Debian/Ubuntu) or brew install ffmpeg (macOS).


🚀 Quick Start

# Process a folder
audium run ./my-recordings/

# Single file
audium run lecture.mp3

# Watch folder — auto‑transcribe new files
audium watch ./incoming/

# See what you've transcribed
audium list

# Change model
audium config set model large-v3

📝 Formats

compact (default)

# lecture.mp3 (01:23:45)

[00:00] Neural networks learn hierarchical representations
[00:04] Each layer detects increasingly abstract features
[00:08] Early layers find edges and textures
[00:12] Later layers detect objects and scenes
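
The compact layout above can be sketched in a few lines of Python. This is only an illustration, assuming each segment is a (start_seconds, text) pair; `format_timestamp` and `to_compact` are hypothetical names, not Audium's API. How timestamps look past the one-hour mark is not stated here, so the sketch simply lets minutes run past 59.

```python
def format_timestamp(seconds: float) -> str:
    """Render a start time as MM:SS, dropping hours and milliseconds."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def to_compact(segments: list[tuple[float, str]]) -> str:
    """Emit one "[MM:SS] text" line per segment, with no blank lines."""
    return "\n".join(
        f"[{format_timestamp(start)}] {text.strip()}" for start, text in segments
    )


# Hypothetical input: (start time in seconds, transcribed text)
segments = [
    (0.0, "Neural networks learn hierarchical representations"),
    (4.2, "Each layer detects increasingly abstract features"),
]
print(to_compact(segments))
```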

minimal

Neural networks learn hierarchical representations
Each layer detects increasingly abstract features
Early layers find edges and textures
Later layers detect objects and scenes

structured (requires speaker diarization)

# interview.mp3 (00:45:12)

## Alice [00:00-00:30]
Neural networks are a powerful tool. It's important to understand their limitations.

## Bob [00:30-01:15]
I completely agree. Let me walk through an example to make this concrete.
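
One way to picture the structured layout: consecutive segments from the same speaker are merged into a single turn with a heading. The sketch below assumes diarized segments arrive as (speaker, start, end, text) tuples; `to_structured` and `mmss` are hypothetical helpers, not part of Audium.

```python
from itertools import groupby


def mmss(seconds: float) -> str:
    """Format seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def to_structured(segments: list[tuple[str, float, float, str]]) -> str:
    """Merge consecutive same-speaker segments into "## Speaker [start-end]" blocks."""
    blocks = []
    for speaker, group in groupby(segments, key=lambda seg: seg[0]):
        turn = list(group)
        start, end = turn[0][1], turn[-1][2]
        text = " ".join(seg[3].strip() for seg in turn)
        blocks.append(f"## {speaker} [{mmss(start)}-{mmss(end)}]\n{text}")
    return "\n\n".join(blocks)


# Hypothetical diarized input
segments = [
    ("Alice", 0, 12, "Neural networks are a powerful tool."),
    ("Alice", 12, 30, "It's important to understand their limitations."),
    ("Bob", 30, 75, "I completely agree."),
]
print(to_structured(segments))
```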

⚙️ Commands

| Command | Description |
|---|---|
| audium run <path> | Transcribe audio files or folders |
| audium watch <path> | Watch folder and auto-process new files |
| audium list [dir] | Show processed transcripts with file sizes |
| audium config | Show current configuration |
| audium config set <key> <value> | Change a setting |
| audium config reset | Reset to factory defaults |
| audium config path | Show config file location |

Common flags for run and watch

| Flag | Default | Description |
|---|---|---|
| -o, --output-dir | ./transcripts | Where to save .md files |
| -f, --format | compact | compact / minimal / structured |
| -r, --recursive | off | Search subdirectories |
| --model | small | tiny / base / small / medium / large-v3 |
| --language | auto | Force language code: ru, en, zh, ... |
| --strip-fillers | off | Remove "um", "uh", "like", "мм", "ээ", etc. |
| --no-vad | off | Disable voice activity detection |
| --no-progress | off | Hide the progress bar |
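
To illustrate what -r changes, here is a rough sketch of file discovery for a path that may be a file or a folder. It assumes the MP3/WAV/FLAC extensions mentioned at the top of this page; `find_audio` and `AUDIO_EXTENSIONS` are hypothetical names, and Audium's real discovery logic may differ.

```python
from pathlib import Path

# Assumed set of supported extensions (hypothetical constant)
AUDIO_EXTENSIONS = {".mp3", ".wav", ".flac"}


def find_audio(path: Path, recursive: bool = False) -> list[Path]:
    """Return audio files at `path`; descend into subfolders only if recursive."""
    if path.is_file():
        return [path] if path.suffix.lower() in AUDIO_EXTENSIONS else []
    pattern = "**/*" if recursive else "*"
    return sorted(
        p for p in path.glob(pattern)
        if p.is_file() and p.suffix.lower() in AUDIO_EXTENSIONS
    )
```

Without the flag only the top level of the folder is scanned; with it, nested folders are included as well.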

🔧 Configuration

Settings are merged in order of precedence, highest first: CLI flags > .audium.yaml (project) > ~/.config/audium/config.yaml > defaults
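
A minimal sketch of that precedence chain, assuming each layer has already been loaded into a plain dict and that unset CLI flags are represented as None. `resolve_settings` and `DEFAULTS` are hypothetical names; the default values are taken from the flags table, and this is not Audium's actual loader.

```python
from collections import ChainMap

# Defaults as listed in the flags table
DEFAULTS = {
    "model": "small",
    "format": "compact",
    "output_dir": "./transcripts",
    "language": "auto",
}


def resolve_settings(cli_flags: dict, project_cfg: dict, user_cfg: dict) -> dict:
    """Merge layers so that earlier (higher-priority) layers win."""
    # Flags the user did not pass must not shadow lower layers.
    cli_set = {k: v for k, v in cli_flags.items() if v is not None}
    # ChainMap looks keys up left to right and returns the first hit.
    return dict(ChainMap(cli_set, project_cfg, user_cfg, DEFAULTS))


settings = resolve_settings(
    cli_flags={"model": "large-v3", "format": None},
    project_cfg={"model": "medium", "language": "ru", "format": "minimal"},
    user_cfg={"output_dir": "~/Documents/transcripts"},
)
print(settings)
```

Here the CLI model overrides the project file, the project file supplies language and format, and the user config supplies output_dir.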

# Set default model
audium config set model large-v3

# Always strip filler words
audium config set strip_fillers true

# Custom output folder
audium config set output_dir ~/Documents/transcripts

# See what you changed
audium config

# Example .audium.yaml (place in project root)
model: medium
language: ru
format: minimal
output_dir: ./transcripts

🪙 Token Optimization

Audium is built to minimize LLM token cost:

| Technique | Savings |
|---|---|
| [MM:SS] instead of [HH:MM:SS.mmm] | ~30% on timestamps |
| VAD filtering (skip silence) | 15–40% on meeting recordings |
| Filler-word stripping | 5–10% on conversational speech |
| min_segment_duration threshold | skip noise fragments |
| One line per segment, no blank lines | ~8% vs paragraph output |
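
Two of these techniques are easy to sketch. The code below is an illustration under stated assumptions, not Audium's implementation: `strip_fillers` and `drop_short` are hypothetical names, the filler list is a subset of the one in the flags table ("like" is left out because it is also an ordinary word), and the 0.5 s threshold is an arbitrary example value.

```python
import re

# Subset of the fillers listed for --strip-fillers (assumption: "like" omitted)
FILLERS = ("um", "uh", "мм", "ээ")

_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b[,.]?\s*",
    re.IGNORECASE,
)


def strip_fillers(text: str) -> str:
    """Remove standalone filler words and collapse leftover whitespace."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def drop_short(segments, min_segment_duration: float = 0.5):
    """Keep (start, end, text) segments lasting at least the threshold."""
    return [s for s in segments if s[1] - s[0] >= min_segment_duration]


print(strip_fillers("So um the model uh learns features"))
print(drop_short([(0.0, 0.2, "h"), (0.2, 3.0, "Hello")]))
```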

📊 Model Sizes

| Model | Parameters | Speed (GPU) | Best for |
|---|---|---|---|
| tiny | 39M | ~32× real-time | Quick drafts, low-resource |
| base | 74M | ~16× real-time | Dictation, clean audio |
| small | 244M | ~6× real-time | General purpose |
| medium | 769M | ~2× real-time | Accents, noisy audio |
| large-v3 | 1.5B | ~1× real-time | Maximum accuracy |

All multilingual models support the same ~97 languages. Larger models are more accurate but slower.
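
The speed column translates directly into a rough wait time: audio duration divided by the real-time factor. A back-of-the-envelope sketch, using the approximate GPU figures from the table (`estimate_seconds` is a hypothetical helper, and actual throughput depends on hardware and audio):

```python
# Approximate GPU real-time factors from the table above
REALTIME_FACTOR = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large-v3": 1}


def estimate_seconds(audio_seconds: float, model: str) -> float:
    """Estimate transcription time as duration / real-time factor."""
    return audio_seconds / REALTIME_FACTOR[model]


# A 01:23:45 lecture is 5025 seconds of audio
lecture = 1 * 3600 + 23 * 60 + 45
for name in REALTIME_FACTOR:
    print(f"{name}: ~{estimate_seconds(lecture, name) / 60:.1f} min")
```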


📄 License

MIT — do whatever you want. Attribution appreciated.
