
Audio-to-Markdown transcription optimized for AI consumption

Audium

🎧 Audio → AI‑optimized Markdown
Transcribe MP3/WAV/FLAC into clean, token‑efficient Markdown — ready for any LLM.

Python 3.10+ · MIT License · faster-whisper

English · Русский · 中文


✨ Why Audium?

Feed audio to an LLM. Get answers. Simple.

But raw transcripts burn tokens on noise: long timestamps, filler words, silent segments, markup that adds nothing.

Audium turns speech into the minimum viable Markdown: every character counts, nothing wasted.

| Feature | Details |
| --- | --- |
| 3 formats | compact, minimal, structured |
| GPU‑accelerated | 2–10× real‑time on CUDA |
| Token‑aware | [MM:SS] + VAD + filler‑strip |
| Watch mode | drop files → auto‑transcribe |
| ~97 languages | tiny to large‑v3 |

📦 Install

Requires ffmpeg: sudo apt install ffmpeg (Debian/Ubuntu) or brew install ffmpeg (macOS)

Recommended: pipx (isolated, no conflicts)

pipx install audium-md

pipx creates its own virtual environment — works on Ubuntu/Debian without PEP 668 errors. Install pipx first: sudo apt install pipx or python3 -m pip install --user pipx

Alternative: uv tool (fastest)

uv tool install audium-md

Fallback: pip with override (bypasses the PEP 668 guard, so it may conflict with system packages)

pip install audium-md --break-system-packages

Local development

git clone https://github.com/Tamukj/Audium.git
cd Audium
python3 -m venv .venv
source .venv/bin/activate
pip install -e .

🚀 Quick Start

# Process a folder
audium run ./my-recordings/

# Single file
audium run lecture.mp3

# Watch folder — auto‑transcribe new files
audium watch ./incoming/

# See what you've transcribed
audium list

# Change model
audium config set model large-v3

📝 Formats

compact (default)

# lecture.mp3 (01:23:45)

[00:00] Neural networks learn hierarchical representations
[00:04] Each layer detects increasingly abstract features
[00:08] Early layers find edges and textures
[00:12] Later layers detect objects and scenes

minimal

Neural networks learn hierarchical representations
Each layer detects increasingly abstract features
Early layers find edges and textures
Later layers detect objects and scenes

structured (requires speaker diarization)

# interview.mp3 (00:45:12)

## Alice [00:00-00:30]
Neural networks are a powerful tool. It's important to understand their limitations.

## Bob [00:30-01:15]
I completely agree. Let me walk through an example to make this concrete.
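
The compact and minimal layouts above can be reproduced from a list of timed segments. The following is a minimal sketch, not Audium's actual renderer: the `Segment` class and the helper names are hypothetical, and it assumes that timestamps past one hour keep counting minutes (for example `[83:45]`), which the README does not specify.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float  # segment start, in seconds
    text: str


def mmss(seconds: float) -> str:
    """Format seconds as MM:SS (assumption: minutes keep counting past 59)."""
    total = int(seconds)
    return f"{total // 60:02d}:{total % 60:02d}"


def hms(seconds: float) -> str:
    """Format seconds as HH:MM:SS, as used in the file header line."""
    hours, rest = divmod(int(seconds), 3600)
    minutes, secs = divmod(rest, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def render_compact(name: str, duration: float, segments: list[Segment]) -> str:
    """One header line, a blank line, then one '[MM:SS] text' line per segment."""
    lines = [f"# {name} ({hms(duration)})", ""]
    lines += [f"[{mmss(seg.start)}] {seg.text}" for seg in segments]
    return "\n".join(lines)


def render_minimal(segments: list[Segment]) -> str:
    """Text only: one line per segment, no header and no timestamps."""
    return "\n".join(seg.text for seg in segments)
```

Calling `render_compact("lecture.mp3", 5025, segments)` with segments starting at 0 and 4 seconds yields the header `# lecture.mp3 (01:23:45)` followed by `[00:00] …` and `[00:04] …` lines.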

⚙️ Commands

| Command | Description |
| --- | --- |
| `audium run <path>` | Transcribe audio files or folders |
| `audium watch <path>` | Watch folder and auto‑process new files |
| `audium list [dir]` | Show processed transcripts with file sizes |
| `audium config` | Show current configuration |
| `audium config set <key> <value>` | Change a setting |
| `audium config reset` | Reset to factory defaults |
| `audium config path` | Show config file location |

Common flags for run and watch

| Flag | Default | Description |
| --- | --- | --- |
| `-o`, `--output-dir` | `./transcripts` | Where to save .md files |
| `-f`, `--format` | `compact` | compact / minimal / structured |
| `-r`, `--recursive` | off | Search subdirectories |
| `--model` | `small` | tiny / base / small / medium / large-v3 |
| `--language` | `auto` | Force language code: ru, en, zh, ... |
| `--strip-fillers` | off | Remove "um", "uh", "like", "мм", "ээ", etc. |
| `--no-vad` | off | Disable voice activity detection |
| `--no-progress` | off | Hide the progress bar |
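
When Audium is driven from a script rather than a shell, the flags above map naturally onto an argument list. The sketch below only builds the command; `build_run_command` and its keyword names are hypothetical helpers, and it uses nothing beyond the flags documented in the table.

```python
import shlex

FORMATS = {"compact", "minimal", "structured"}


def build_run_command(path, *, output_dir="./transcripts", fmt="compact",
                      model="small", language=None, recursive=False,
                      strip_fillers=False):
    """Build an argv list for `audium run` from the documented flags."""
    if fmt not in FORMATS:
        raise ValueError(f"unknown format: {fmt!r}")
    cmd = ["audium", "run", path,
           "--output-dir", output_dir,
           "--format", fmt,
           "--model", model]
    if language:  # omitted means auto-detection
        cmd += ["--language", language]
    if recursive:
        cmd.append("--recursive")
    if strip_fillers:
        cmd.append("--strip-fillers")
    return cmd


def preview(cmd):
    """Return the command as a single shell-quoted string, for logging."""
    return shlex.join(cmd)
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a string avoids shell quoting problems with file names that contain spaces.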

🔧 Configuration

Settings are merged in order of precedence, highest first: CLI flags > .audium.yaml (project) > ~/.config/audium/config.yaml > defaults
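
The precedence chain can be pictured as layered dictionaries, where each higher layer overwrites the one below it. This is an illustrative sketch only: `merge_settings` is a hypothetical name, the defaults shown are a subset taken from the setting reference, and treating `None` as "flag not given" is an assumption about how unset CLI flags would be represented.

```python
DEFAULTS = {
    "model": "small",
    "format": "compact",
    "language": "auto",
    "output_dir": "./transcripts",
    "strip_fillers": False,
}


def merge_settings(defaults, user_config, project_config, cli_flags):
    """Merge layers from lowest to highest precedence.

    Order applied: defaults, then user config, then project config,
    then CLI flags. Values of None are treated as "not set" and skipped.
    """
    merged = dict(defaults)
    for layer in (user_config, project_config, cli_flags):
        merged.update({key: value for key, value in layer.items()
                       if value is not None})
    return merged
```

For example, with `model: medium` in the user config, `model: large-v3` in `.audium.yaml`, and `--format minimal` on the command line, the merged result uses `large-v3` and `minimal`, and keeps every other default.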

# Show current config
audium config

# Set a value
audium config set model large-v3
audium config set strip_fillers true
audium config set output_dir ~/Documents/transcripts

# Shorthand without "set" is also supported:
audium config model large-v3

# Reset to factory defaults
audium config reset

# Show config file path
audium config path

All Settings

audium config

The output shows current values, with accepted options in parentheses:

  beam_size: 5             (integer 1-20)
  compute_type: auto       (auto, float16, int8_float16, int8)
  device: cuda             (cuda, cpu)
  format: compact          (compact, minimal, structured)
  language: auto           (e.g. auto, ru, en, zh, ...)
  min_segment_duration: 0.0  (float, seconds)
  model: small             (tiny, base, small, medium, large-v3, turbo)
  output_dir: ./transcripts  (path)
  recursive: false         (true / false)
  strip_fillers: false     (true / false)
  vad_filter: true         (true / false)

Setting Reference

| Key | Default | Description | Options |
| --- | --- | --- | --- |
| model | small | Whisper model size | tiny, base, small, medium, large-v3, turbo |
| device | auto | Computation device (auto-detect) | auto, cuda, cpu |
| compute_type | auto | Precision for GPU inference | auto, float16, int8_float16, int8 |
| format | compact | Output Markdown format | compact, minimal, structured |
| language | auto | Source language | auto, or any ISO code (ru, en, zh, ...) |
| beam_size | 5 | Beam search width | integer (1-20) |
| output_dir | ./transcripts | Where .md files are saved | any path |
| strip_fillers | false | Remove filler words | true / false |
| vad_filter | true | Voice Activity Detection | true / false |
| min_segment_duration | 0.0 | Skip segments shorter than N seconds | float |
| recursive | false | Scan subdirectories | true / false |

compute_type auto-detection: On GPUs with compute capability ≥ 7.0 (Volta+), float16 is used for best performance. On Pascal GPUs (GTX 10xx), int8_float16 is used. On CPU, int8 is used.
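
The auto-detection rule reads as a small decision function. The sketch below is an assumption-laden illustration, not Audium's code: `pick_compute_type` is a hypothetical name, the compute capability is passed in as a `(major, minor)` tuple rather than queried from the driver, and every CUDA GPU below 7.0 is treated like Pascal, which the README does not state.

```python
def pick_compute_type(device, compute_capability=None):
    """Choose a precision following the rule described above.

    device: "cuda" or "cpu".
    compute_capability: (major, minor) tuple, required for "cuda".
    """
    if device != "cuda":
        return "int8"  # CPU path
    if compute_capability is None:
        raise ValueError("compute_capability is required for CUDA devices")
    if tuple(compute_capability) >= (7, 0):
        return "float16"  # Volta and newer
    # Assumption: anything older than Volta is handled like Pascal.
    return "int8_float16"
```

Under these assumptions an RTX 30-series card (capability 8.6) resolves to `float16`, a GTX 1080 (6.1) to `int8_float16`, and a CPU-only machine to `int8`.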

Local config file

Create .audium.yaml in your project root to override defaults per-project:

model: medium
language: ru
format: minimal
output_dir: ./transcripts

First run — model download

On first use, Audium downloads the Whisper model from HuggingFace Hub (~500 MB for small).
The model is cached locally in ~/.cache/huggingface/hub/, so subsequent runs skip the download and work fully offline.
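
To check ahead of time whether a model is already cached (for example before going offline), the cache directory can be inspected directly. This is a rough sketch under two assumptions that the README does not confirm: that models come from repositories named `Systran/faster-whisper-<size>`, and that the cache uses the usual HuggingFace Hub layout of `models--<org>--<name>/snapshots`. Both helper names are hypothetical, and a custom `HF_HOME` would move the cache elsewhere.

```python
from pathlib import Path


def cached_model_dir(model, cache_root=None,
                     repo_prefix="Systran/faster-whisper-"):
    """Return the expected cache directory for a model size.

    repo_prefix is an assumption about where the models are hosted.
    """
    root = (Path(cache_root) if cache_root
            else Path.home() / ".cache" / "huggingface" / "hub")
    repo_id = f"{repo_prefix}{model}"
    return root / ("models--" + repo_id.replace("/", "--"))


def is_model_cached(model, cache_root=None):
    """True if a snapshots folder exists for the model in the cache."""
    return (cached_model_dir(model, cache_root) / "snapshots").is_dir()
```

`is_model_cached("small")` returning `False` suggests the next run will need network access to fetch the model.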

Why HuggingFace? The models are too large (~500 MB–3 GB) to bundle in a pip package or GitHub repo. They're downloaded once, then cached forever.

Is this legal? Yes. All components are MIT licensed: Whisper (OpenAI), faster-whisper, CTranslate2. Free for personal and commercial use.


🪙 Token Optimization

Audium is built to minimize LLM token cost:

| Technique | Savings |
| --- | --- |
| [MM:SS] instead of [HH:MM:SS.mmm] | ~30% on timestamps |
| VAD filtering (skip silence) | 15–40% on meeting recordings |
| Filler‑word stripping | 5–10% on conversational speech |
| min_segment_duration threshold | skip noise fragments |
| One line per segment, no blank lines | ~8% vs paragraph output |
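
Two of these techniques, filler stripping and the minimum-duration threshold, are simple enough to sketch. The code below is a naive illustration rather than Audium's implementation: the filler list is an assumed subset (it leaves out "like", which is too often a real word for a plain regex), capitalization is not repaired after a removal, and segments are represented as hypothetical `(start, end, text)` tuples.

```python
import re

# Assumed subset of fillers; the real list is not documented here.
FILLERS = ("um", "uh", "мм", "ээ")

_FILLER_RE = re.compile(
    r"\b(?:" + "|".join(map(re.escape, FILLERS)) + r")\b[,.]?\s*",
    re.IGNORECASE,
)


def strip_fillers(text):
    """Remove standalone filler words plus a trailing comma or period."""
    cleaned = _FILLER_RE.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()


def keep_long(segments, min_duration):
    """Drop segments shorter than min_duration seconds."""
    return [seg for seg in segments if seg[1] - seg[0] >= min_duration]
```

The word boundaries mean that "umbrella" is left alone while a standalone "um" is removed. With `min_duration` at its default of 0.0, `keep_long` keeps every segment.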

📊 Model Sizes

| Model | Parameters | Speed (GPU) | Best for |
| --- | --- | --- | --- |
| tiny | 39M | ~32× real‑time | Quick drafts, low‑resource |
| base | 74M | ~16× real‑time | Dictation, clean audio |
| small | 244M | ~6× real‑time | General purpose |
| medium | 769M | ~2× real‑time | Accents, noisy audio |
| large‑v3 | 1.5B | ~1× real‑time | Maximum accuracy |

All multilingual models support the same ~97 languages. Larger models trade speed for accuracy.
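
The speed column gives a quick way to estimate how long a job will take: divide the audio length by the real-time factor. The sketch below uses the approximate GPU figures from the table; `estimate_seconds` is a hypothetical helper, and actual throughput depends on hardware and settings.

```python
# Approximate GPU real-time factors, copied from the table above.
SPEED_FACTOR = {"tiny": 32, "base": 16, "small": 6, "medium": 2, "large-v3": 1}


def estimate_seconds(audio_seconds, model):
    """Rough transcription time: audio length divided by the speed factor."""
    return audio_seconds / SPEED_FACTOR[model]
```

A lecture of 01:23:45 is 5025 seconds, so `estimate_seconds(5025, "small")` gives 837.5 seconds, roughly 14 minutes, while large-v3 would need about the full 84 minutes.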


🖥️ GPU Support

Audium automatically detects your GPU and configures itself:

| Hardware | Detection | Backend |
| --- | --- | --- |
| NVIDIA (all) | nvidia-smi | CUDA (best performance) |
| AMD (ROCm) | /dev/kfd + rocm-smi | ROCm / HIP |
| Intel (Arc, Iris) | xpu-smi / drm | oneAPI / SYCL |
| CPU only | fallback | int8 quantized |

No manual configuration needed. Run audium run ./audio/ and it just works.
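
The detection column suggests a probe order along the following lines. This is a guess at the shape of the logic, not Audium's actual detection code: `detect_backend` and the returned labels (`"cuda"`, `"rocm"`, `"xpu"`, `"cpu"`) are hypothetical, and the Intel drm check is omitted. The lookup functions are injectable so the logic can be exercised without the hardware.

```python
import os
import shutil


def detect_backend(which=shutil.which, exists=os.path.exists):
    """Probe for vendor tools in priority order, falling back to CPU."""
    if which("nvidia-smi"):
        return "cuda"
    if exists("/dev/kfd") and which("rocm-smi"):
        return "rocm"
    if which("xpu-smi"):
        return "xpu"
    return "cpu"
```

Finding a vendor tool on PATH only shows that the tool is installed. A real implementation would likely also confirm that a usable device is present before committing to that backend.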


Updating

pip install --upgrade audium-md
# or: pipx upgrade audium-md
# or: uv tool upgrade audium-md

Check your current version:

audium --version

📄 License

MIT — do whatever you want. Attribution appreciated.

