mdmin
Rule-based markdown compression for LLM consumption. Reduces token usage by 20–35%.
Website: mdmin.dev • npm: npmjs.com/package/mdmin
Install
pip install mdmin
Zero dependencies. Python 3.9+.
Usage
Python API
from mdmin import compress, estimate_tokens
result = compress(text, level="medium")
print(result.output) # compressed text
print(result.stats.saved) # tokens saved
print(result.stats.pct) # % reduction
# CompressResult(output=..., stats=CompressionStats(input_tokens=2273, output_tokens=1765, saved=508, pct=22.3, ...))
CLI
# Compress a file (output to stdout)
mdmin compress README.md
# Save to file
mdmin compress README.md -o README.min.md
# Compression level
mdmin compress README.md --level aggressive
# Show token stats across all levels
mdmin stats README.md
# Pipe from stdin
cat file.md | mdmin compress -
Compression Levels
| Level | Savings | What it does |
|---|---|---|
| light | ~10% | Whitespace, comments, basic verbose patterns |
| medium | ~20-25% | + more verbose patterns, table compression, formatting cleanup |
| aggressive | ~25-35% | + article stripping, list compression, bold removal, dictionary dedup |
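The "+" entries in the table suggest that each level adds rules on top of the previous one. The following is a minimal sketch of that idea, not mdmin's actual implementation: the rule functions, the `LEVEL_RULES` grouping, and `apply_level` are all hypothetical names chosen for illustration.

```python
import re
from typing import Callable, Dict, List

Rule = Callable[[str], str]

def strip_trailing_spaces(text: str) -> str:
    # Remove spaces and tabs at the end of every line.
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)

def collapse_blank_lines(text: str) -> str:
    # Keep at most one blank line between paragraphs.
    return re.sub(r"\n{3,}", "\n\n", text)

def shorten_in_order_to(text: str) -> str:
    return re.sub(r"\bIn order to\b", "To", text)

def shorten_due_to(text: str) -> str:
    return re.sub(r"\bDue to the fact that\b", "Because", text)

def remove_bold(text: str) -> str:
    return re.sub(r"\*\*(.+?)\*\*", r"\1", text)

# Hypothetical grouping: the rules each level introduces.
LEVEL_ORDER = ["light", "medium", "aggressive"]
LEVEL_RULES: Dict[str, List[Rule]] = {
    "light": [strip_trailing_spaces, collapse_blank_lines, shorten_in_order_to],
    "medium": [shorten_due_to],
    "aggressive": [remove_bold],
}

def rules_for(level: str) -> List[Rule]:
    """Return the rules of the given level plus all lower levels."""
    if level not in LEVEL_ORDER:
        raise ValueError(f"unknown level: {level!r}")
    rules: List[Rule] = []
    for name in LEVEL_ORDER[: LEVEL_ORDER.index(level) + 1]:
        rules.extend(LEVEL_RULES[name])
    return rules

def apply_level(text: str, level: str = "medium") -> str:
    for rule in rules_for(level):
        text = rule(text)
    return text

sample = "In order to build, **run** make.   \n\n\n\nDone."
print(repr(apply_level(sample, "light")))
print(repr(apply_level(sample, "aggressive")))
```

Keeping each level as a list of small functions makes it easy to see which rule produced which change, at the cost of several passes over the text.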
What It Compresses
- Verbose phrases: 150+ patterns — "In order to" → "To", "Due to the fact that" → "Because"
- Whitespace: Blank lines, trailing spaces, decorative horizontal rules
- Tables: Markdown tables → compact CSV or key:value format
- Formatting: Redundant bold on headers, deep heading nesting, emphasis markers
- Lists: Short bullet lists → inline comma-separated (aggressive)
- Links: Empty titles, unused references, verbose alt text
- Dictionary dedup: Repeated phrases replaced with §1, §2 tokens
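To make a few of these rules concrete, here is a rough sketch of three of them: verbose phrase replacement, table to CSV, and dictionary dedup. It is an illustration under stated assumptions, not mdmin's code. The function names are hypothetical, the phrase table holds only the two pairs quoted above, and the dedup uses fixed-length word n-grams, which is one simple way to find repeated phrases.

```python
import re
from collections import Counter
from typing import Dict, Tuple

# Only the two pairs quoted in the list above; a real table would be far longer.
VERBOSE_PATTERNS = {
    "in order to": "to",
    "due to the fact that": "because",
}
_PHRASE_RE = re.compile(
    r"\b(" + "|".join(re.escape(p) for p in VERBOSE_PATTERNS) + r")\b",
    re.IGNORECASE,
)

def shorten_phrases(text: str) -> str:
    """Replace verbose phrases, keeping a leading capital letter."""
    def swap(match: "re.Match[str]") -> str:
        original = match.group(0)
        short = VERBOSE_PATTERNS[original.lower()]
        return short.capitalize() if original[0].isupper() else short
    return _PHRASE_RE.sub(swap, text)

def table_to_csv(table: str) -> str:
    """Turn a simple pipe table into comma-separated rows.

    Assumes cells contain no commas or escaped pipes.
    """
    rows = []
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        # Skip the |---|---| separator row.
        if all(re.fullmatch(r":?-+:?", c) for c in cells):
            continue
        rows.append(",".join(cells))
    return "\n".join(rows)

def dedup_phrases(text: str, n: int = 3, min_count: int = 2) -> Tuple[str, Dict[str, str]]:
    """Replace repeated n-word phrases with §1, §2, ... tokens.

    Returns the rewritten text and a token -> phrase dictionary.
    """
    words = text.split()
    grams = Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))
    dictionary: Dict[str, str] = {}
    for phrase, count in grams.most_common():
        if count < min_count:
            break
        if text.count(phrase) < min_count:
            continue  # already consumed by an earlier replacement
        token = f"§{len(dictionary) + 1}"
        dictionary[token] = phrase
        text = text.replace(phrase, token)
    return text, dictionary

print(shorten_phrases("In order to win, train. Rest due to the fact that it helps."))
print(table_to_csv("| Level | Savings |\n|---|---|\n| light | ~10% |"))
print(dedup_phrases("the quick brown fox saw the quick brown dog"))
```

A dedup step like this only pays off when the dictionary that must accompany the text costs fewer tokens than the repetitions it removes, which may be why the list reserves it for the aggressive level.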
API Reference
compress(text: str, level: str = "medium") -> CompressResult
- level: "light" | "medium" | "aggressive"
- Returns CompressResult with .output (str) and .stats (CompressionStats)
estimate_tokens(text: str) -> int
Fast BPE token count estimate (no external dependencies).
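How mdmin arrives at its estimate is not described here. As an illustration of what a dependency-free estimate can look like, the sketch below counts word and punctuation pieces and charges long words more than one token. This is not real BPE and not mdmin's algorithm; `rough_token_estimate` and its `chars_per_token` parameter are assumptions made for this example.

```python
import math
import re

# A "piece" is either a run of word characters or one punctuation mark.
_PIECE = re.compile(r"\w+|[^\w\s]")

def rough_token_estimate(text: str, chars_per_token: int = 4) -> int:
    """Crude token estimate: each piece costs at least one token,
    and longer pieces cost one token per `chars_per_token` characters."""
    total = 0
    for piece in _PIECE.findall(text):
        total += max(1, math.ceil(len(piece) / chars_per_token))
    return total

print(rough_token_estimate("Hello, world!"))
```

A heuristic like this is only useful for comparing texts against each other. For exact counts, the tokenizer of the target model is the reference.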
CompressResult
result.output # str — compressed text
result.stats # CompressionStats
CompressionStats
stats.input_tokens # int
stats.output_tokens # int
stats.saved # int (input - output)
stats.pct # float (% saved)
stats.input_chars # int
stats.output_chars # int
stats.level # str
stats.dictionary # int (dedup entries created)
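The derived fields follow from the two token counts. The sketch below reproduces the numbers from the Python API example above (2273 input tokens, 1765 output tokens). `StatsSketch` is a hypothetical stand-in rather than the library's class, and rounding `pct` to one decimal is an assumption inferred from the `pct=22.3` shown in that example.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class StatsSketch:
    input_tokens: int
    output_tokens: int

    @property
    def saved(self) -> int:
        # Tokens removed by compression (input - output).
        return self.input_tokens - self.output_tokens

    @property
    def pct(self) -> float:
        # Percentage saved relative to the input; 0.0 for empty input.
        if self.input_tokens == 0:
            return 0.0
        return round(100 * self.saved / self.input_tokens, 1)

stats = StatsSketch(input_tokens=2273, output_tokens=1765)
print(stats.saved, stats.pct)  # 508 22.3
```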
License
AGPL-3.0-only
File details
Details for the file mdmin-1.0.0.tar.gz.
File metadata
- Download URL: mdmin-1.0.0.tar.gz
- Upload date:
- Size: 27.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 847d78274eb907aa274f4122feffc4676575501fe3533eab19cf99de5b89751d |
| MD5 | ace53320972808cd7765775591ec9205 |
| BLAKE2b-256 | 6fd37af3afea755336ccfd001729694a4fefadcb98bef8f2cd8f01dfc8755c52 |
File details
Details for the file mdmin-1.0.0-py3-none-any.whl.
File metadata
- Download URL: mdmin-1.0.0-py3-none-any.whl
- Upload date:
- Size: 22.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e03a8b3c55048bf621a9a91a96c1c5cbe26fff9770e2fb239b826b2411f74722 |
| MD5 | 6d1a8199017990de55f80fcc5cf1c3bd |
| BLAKE2b-256 | a5c3b9cec0a6ebd33f2577d31e6a08aca43dc8eb28ad429b9e09ff0390b4a46d |