mdmin
Rule-based markdown compression + context extraction for LLM consumption. Reduces token usage by 13–35% with compression and 70–95% with targeted extraction.
Website: mdmin.dev • npm: npmjs.com/package/mdmin
Install
```bash
pip install mdmin
```
Zero dependencies. Python 3.9+.
Compress
Strip verbose phrases, redundant formatting, and structural waste. 13–35% token savings.
```python
from pathlib import Path

from mdmin import compress, estimate_tokens

text = Path("README.md").read_text(encoding="utf-8")  # any markdown string works
print(estimate_tokens(text))  # estimated token count of the input

result = compress(text, level="medium")
print(result.output)       # compressed text
print(result.stats.pct)    # e.g. 22.3 (%)
print(result.stats.saved)  # tokens saved
```

```bash
mdmin compress README.md                    # stdout
mdmin compress README.md -o README.min.md   # save to file
mdmin compress README.md --level aggressive
mdmin stats README.md                       # compare all levels
cat file.md | mdmin compress -              # stdin
```
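To make "rule-based" concrete, here is a minimal sketch of the general technique: a table of regex rewrites for verbose phrases, followed by whitespace cleanup. The rule table and the `compress_sketch` name are hypothetical and chosen for illustration; they are not mdmin's actual rules or internals.

```python
import re

# Hypothetical rule table: (pattern, replacement). NOT mdmin's real rule set.
VERBOSE_RULES = [
    (re.compile(r"\bin order to\b", re.IGNORECASE), "to"),
    (re.compile(r"\bdue to the fact that\b", re.IGNORECASE), "because"),
    (re.compile(r"\bit is important to note that\s+", re.IGNORECASE), ""),
]


def compress_sketch(text: str) -> str:
    """Apply each rewrite rule in order, then collapse redundant whitespace."""
    for pattern, replacement in VERBOSE_RULES:
        text = pattern.sub(replacement, text)
    text = re.sub(r"[ \t]+", " ", text)     # runs of spaces/tabs -> one space
    text = re.sub(r"\n{3,}", "\n\n", text)  # 3+ newlines -> a single blank line
    return text.strip()


sample = (
    "Call `login()` in order to  get a token.\n\n\n\n"
    "Note: it is important to note that tokens expire."
)
compressed = compress_sketch(sample)
print(compressed)
```

A real rule set would also need to skip fenced code blocks and inline code, which this sketch does not attempt.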
Extract
Given a large document and a query, extract returns only the relevant chunks that fit within a token budget. It is TF-IDF based: no external API, no vector database, and it runs in milliseconds. Expect a 70–95% reduction on targeted queries.
```python
from pathlib import Path

from mdmin import extract

large_doc = Path("bigdoc.md").read_text(encoding="utf-8")

result = extract(large_doc, "how does auth work", max_tokens=2000)
print(result.text)                    # relevant chunks only
print(result.stats.reduction)         # e.g. 91.2 (%)
print(result.stats.chunks_extracted)  # e.g. 2 of 24 chunks
```

```bash
mdmin extract bigdoc.md -q "how does auth work"
mdmin extract bigdoc.md -q "database schema" --max 1500
```
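For readers unfamiliar with TF-IDF retrieval under a token budget, the idea can be sketched in a few lines. Everything below is an assumption made for illustration: the blank-line chunking, the smoothed IDF formula, the word count used as a stand-in for token cost, and the `extract_sketch` name. mdmin's real chunking and scoring may differ.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())


def split_chunks(doc):
    """Split on blank lines; a stand-in for markdown-aware chunking."""
    return [c.strip() for c in re.split(r"\n\s*\n", doc) if c.strip()]


def extract_sketch(doc, query, max_tokens):
    chunks = split_chunks(doc)
    chunk_terms = [Counter(tokenize(c)) for c in chunks]
    n = len(chunks)
    # Document frequency: in how many chunks does each term appear?
    df = Counter(term for terms in chunk_terms for term in terms)
    query_terms = set(tokenize(query))

    scores = []
    for terms in chunk_terms:
        total = sum(terms.values()) or 1
        # Term frequency times smoothed inverse document frequency.
        scores.append(sum(
            (terms[q] / total) * (math.log((1 + n) / (1 + df[q])) + 1)
            for q in query_terms
        ))

    # Greedy selection: best-scoring chunks first, skipping any that do not fit.
    picked, used = [], 0
    for i in sorted(range(n), key=lambda i: scores[i], reverse=True):
        cost = sum(chunk_terms[i].values())  # word count as a crude token cost
        if scores[i] > 0 and used + cost <= max_tokens:
            picked.append(i)
            used += cost
    # Emit the selected chunks in their original document order.
    return "\n\n".join(chunks[i] for i in sorted(picked))


doc = """# Auth
Auth uses signed session tokens. Login issues a token.

# Database
The schema has users and orders tables.

# Deploy
Deploy with a single container."""

result = extract_sketch(doc, "how does auth work", max_tokens=50)
print(result)
```

Returning chunks in document order rather than score order keeps the extracted text readable, at the cost of hiding which chunk ranked highest.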
For advanced use:
```python
from mdmin import ContextExtractor

extractor = ContextExtractor()
extractor.index(large_doc)  # large_doc: the markdown text to search, as a str
result = extractor.extract("auth flow", max_tokens=2000)

# Multi-doc: score chunks globally across files
scored = extractor.score_chunks("auth flow")
```
Compression Levels
| Level | Savings | What it does |
|---|---|---|
| light | ~10% | Whitespace, comments, basic verbose patterns |
| medium | ~20–25% | + more verbose patterns, table compression, formatting cleanup |
| aggressive | ~25–35% | + article stripping, list compression, bold removal, dictionary dedup |
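The "+" in the table suggests that each level builds on the one before it. One way to picture that, purely as a sketch, is a list of passes per level where each list extends the previous one. The pass functions, their assignment to levels, and the `apply_level` name are illustrative assumptions, not mdmin's implementation.

```python
import re


def remove_html_comments(text):
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)


def strip_trailing_whitespace(text):
    return "\n".join(line.rstrip() for line in text.split("\n"))


def shorten_table_rules(text):
    """Shrink table delimiter cells such as |------| to |-|."""
    def shrink(match):
        return re.sub(r"-{2,}", "-", match.group(0))
    return re.sub(r"^\|[-:| ]+\|$", shrink, text, flags=re.MULTILINE)


def remove_bold(text):
    return re.sub(r"\*\*(.+?)\*\*", r"\1", text)


def strip_articles(text):
    return re.sub(r"\b(?:a|an|the)\s+", "", text, flags=re.IGNORECASE)


# Each level runs every pass of the previous level, then adds its own.
LIGHT = [remove_html_comments, strip_trailing_whitespace]
MEDIUM = LIGHT + [shorten_table_rules]
AGGRESSIVE = MEDIUM + [remove_bold, strip_articles]
PIPELINES = {"light": LIGHT, "medium": MEDIUM, "aggressive": AGGRESSIVE}


def apply_level(text, level):
    for step in PIPELINES[level]:
        text = step(text)
    return text


sample = (
    "<!-- draft -->The **API** returns a token.   \n"
    "| Name | Type |\n|------|------|\n| id | int |"
)
for level in ("light", "medium", "aggressive"):
    print(level, len(apply_level(sample, level)))
```

Because the levels are cumulative, output length can only shrink or stay the same as the level increases, which matches the ordering of the savings column.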
API Reference
compress
```python
compress(text: str, level: str = "medium") -> CompressResult
```

Returns CompressResult with .output (str) and .stats (CompressionStats):

```python
stats.input_tokens   # int
stats.output_tokens  # int
stats.saved          # int
stats.pct            # float (% saved)
stats.input_chars    # int
stats.output_chars   # int
stats.level          # str
```
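The relationship between these fields is presumably simple arithmetic. The sketch below assumes that `saved` is input minus output tokens and that `pct` is `saved` as a percentage of the input; the `StatsSketch` class is hypothetical and only mirrors the field names listed above.

```python
from dataclasses import dataclass


@dataclass
class StatsSketch:
    """Hypothetical stand-in for CompressionStats, for illustration only."""
    input_tokens: int
    output_tokens: int

    @property
    def saved(self) -> int:
        # Assumed definition: tokens removed by compression.
        return self.input_tokens - self.output_tokens

    @property
    def pct(self) -> float:
        # Assumed definition: saved tokens as a percentage of the input.
        if self.input_tokens == 0:
            return 0.0
        return round(100 * self.saved / self.input_tokens, 1)


stats = StatsSketch(input_tokens=1000, output_tokens=777)
print(stats.saved, stats.pct)  # 223 22.3
```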
extract
```python
extract(text: str, query: str, *, max_tokens: int = 2000) -> ExtractResult
```

Returns ExtractResult with .text (str) and .stats (ExtractStats):

```python
stats.total_doc_tokens   # int
stats.extracted_tokens   # int
stats.chunks_total       # int
stats.chunks_extracted   # int
stats.reduction          # float (% reduction)
stats.top_scores         # list[TopScore]
```
estimate_tokens
```python
estimate_tokens(text: str) -> int
```
Fast BPE token count estimate (no external dependencies).
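How mdmin arrives at its estimate is not described here. As a point of comparison, a common dependency-free heuristic assumes roughly four characters per token for English text under GPT-style BPE tokenizers. The sketch below implements that heuristic; the function name and the `chars_per_token` parameter are hypothetical, and the result will not match a real tokenizer exactly.

```python
import math


def estimate_tokens_sketch(text: str, chars_per_token: float = 4.0) -> int:
    """Rough token estimate from character count (heuristic, not mdmin's method)."""
    if not text:
        return 0
    # Round up so that any non-empty text counts as at least one token.
    return max(1, math.ceil(len(text) / chars_per_token))


print(estimate_tokens_sketch("Markdown compression for LLM context windows."))
```

An estimate like this is good enough for budgeting and for comparing compression levels against each other, but exact counts require the tokenizer of the target model.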
License
AGPL-3.0-only