Rule-based markdown compression + context extraction for LLM consumption. Reduces token usage by 20-95%.

mdmin

Rule-based markdown compression for LLM consumption. Reduces token usage by 20–35%.

Website: mdmin.dev · npm: npmjs.com/package/mdmin

Install

pip install mdmin

Zero dependencies. Python 3.9+.

Usage

Python API

from mdmin import compress, estimate_tokens

result = compress(text, level="medium")

print(result.output)         # compressed text
print(result.stats.saved)    # tokens saved
print(result.stats.pct)      # % reduction
# CompressResult(output=..., stats=CompressionStats(input_tokens=2273, output_tokens=1765, saved=508, pct=22.3, ...))

CLI

# Compress a file (output to stdout)
mdmin compress README.md

# Save to file
mdmin compress README.md -o README.min.md

# Compression level
mdmin compress README.md --level aggressive

# Show token stats across all levels
mdmin stats README.md

# Pipe from stdin
cat file.md | mdmin compress -

Compression Levels

Level       Savings   What it does
light       ~10%      Whitespace, comments, basic verbose patterns
medium      ~20-25%   + more verbose patterns, table compression, formatting cleanup
aggressive  ~25-35%   + article stripping, list compression, bold removal, dictionary dedup
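The "+" in the table reads as cumulative: each level applies everything from the level before it plus its own additions. The sketch below models that reading. The rule-group names are labels lifted from the table for illustration, not identifiers from mdmin, and `rules_for` is a hypothetical helper.

```python
# Illustrative labels copied from the levels table; not mdmin's internal names.
LEVEL_ORDER = ["light", "medium", "aggressive"]
RULES_ADDED = {
    "light": ["whitespace", "comments", "basic_verbose_patterns"],
    "medium": ["more_verbose_patterns", "table_compression", "formatting_cleanup"],
    "aggressive": ["article_stripping", "list_compression", "bold_removal",
                   "dictionary_dedup"],
}


def rules_for(level: str) -> list:
    """Return every rule group active at `level`, assuming levels are cumulative."""
    if level not in RULES_ADDED:
        raise ValueError(f"unknown level: {level!r}")
    active = []
    # Walk the levels in order, stopping once the requested one is included.
    for name in LEVEL_ORDER[: LEVEL_ORDER.index(level) + 1]:
        active.extend(RULES_ADDED[name])
    return active


print(rules_for("medium"))
```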

What It Compresses

  • Verbose phrases: 150+ patterns — "In order to" → "To", "Due to the fact that" → "Because"
  • Whitespace: Blank lines, trailing spaces, decorative horizontal rules
  • Tables: Markdown tables → compact CSV or key:value format
  • Formatting: Redundant bold on headers, deep heading nesting, emphasis markers
  • Lists: Short bullet lists → inline comma-separated (aggressive)
  • Links: Empty titles, unused references, verbose alt text
  • Dictionary dedup: Repeated phrases replaced with §1, §2 tokens
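Three of the rules above can be sketched in a few lines of plain Python. This is an illustration of the general technique under stated assumptions, not mdmin's implementation: the two phrase patterns are the ones quoted in the list, the function names are hypothetical, and `dedup_phrases` takes its candidate phrases as an argument because this page does not describe how mdmin finds repeated phrases or how it stores the dictionary.

```python
import re

# The two rewrites quoted in the list; the real pattern set is far larger.
VERBOSE = {
    r"\bin order to\b": "to",
    r"\bdue to the fact that\b": "because",
}


def shorten_phrases(text: str) -> str:
    """Replace verbose phrases, keeping sentence-initial capitalisation."""
    for pattern, short in VERBOSE.items():
        def repl(match, short=short):
            return short.capitalize() if match.group(0)[0].isupper() else short
        text = re.sub(pattern, repl, text, flags=re.IGNORECASE)
    return text


def tidy_whitespace(text: str) -> str:
    """Strip trailing spaces and collapse runs of blank lines to one."""
    text = re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)
    return re.sub(r"\n{3,}", "\n\n", text)


def dedup_phrases(text: str, phrases: list) -> tuple:
    """Replace each candidate phrase that occurs at least twice with a §N token.

    Returns the rewritten text and a legend mapping each token to its phrase.
    """
    legend = {}
    for phrase in phrases:
        if text.count(phrase) >= 2:
            token = f"§{len(legend) + 1}"
            legend[token] = phrase
            text = text.replace(phrase, token)
    return text, legend


print(shorten_phrases("In order to win, act in order to learn."))
print(dedup_phrases("the quick brown fox; the quick brown fox; dog",
                    ["the quick brown fox", "dog"]))
```

A dictionary token only pays off when the phrase is long and frequent enough to outweigh the cost of the legend, which is presumably why the table lists dedup under the aggressive level only.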

API Reference

compress(text: str, level: str = "medium") -> CompressResult
  • level: "light" | "medium" | "aggressive"
  • Returns CompressResult with .output (str) and .stats (CompressionStats)

estimate_tokens(text: str) -> int

Fast BPE token count estimate (no external dependencies).
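For intuition on how a token count can be approximated without a tokenizer library, here is a rough heuristic built on the commonly cited rule of thumb of about four characters per token for English text. It is an assumption-laden sketch, not the algorithm behind `estimate_tokens`, and `rough_token_estimate` is a hypothetical name.

```python
import math
import re


def rough_token_estimate(text: str) -> int:
    """Approximate a token count from word and punctuation chunks.

    Each word costs ceil(len / 4) tokens and each punctuation mark costs one.
    Whitespace is treated as free, which real BPE tokenizers do not guarantee.
    """
    chunks = re.findall(r"\w+|[^\w\s]", text)
    return sum(math.ceil(len(chunk) / 4) for chunk in chunks)


print(rough_token_estimate("Hello, world!"))
```

An estimate like this is good enough for comparing a text before and after compression; for billing or context-limit checks, the tokenizer of the target model is the safer reference.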

CompressResult

result.output          # str — compressed text
result.stats           # CompressionStats

CompressionStats

stats.input_tokens     # int
stats.output_tokens    # int
stats.saved            # int (input - output)
stats.pct              # float (% saved)
stats.input_chars      # int
stats.output_chars     # int
stats.level            # str
stats.dictionary       # int (dedup entries created)
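The relationship between the token fields can be checked against the sample output shown in the Python API section (input_tokens=2273, output_tokens=1765, saved=508, pct=22.3). The dataclass below is a hypothetical stand-in, not mdmin's class; rounding `pct` to one decimal place is an assumption inferred from that sample.

```python
from dataclasses import dataclass


@dataclass
class StatsSketch:
    """Hypothetical mirror of the two token fields of CompressionStats."""
    input_tokens: int
    output_tokens: int

    @property
    def saved(self) -> int:
        # Documented as input - output.
        return self.input_tokens - self.output_tokens

    @property
    def pct(self) -> float:
        # Percentage saved relative to the input; guard against empty input.
        if self.input_tokens == 0:
            return 0.0
        return round(100 * self.saved / self.input_tokens, 1)


stats = StatsSketch(input_tokens=2273, output_tokens=1765)
print(stats.saved, stats.pct)  # 508 22.3
```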

License

AGPL-3.0-only

Download files

Source Distribution

mdmin-1.1.0.tar.gz (31.1 kB, Source)

Built Distribution

mdmin-1.1.0-py3-none-any.whl (26.2 kB, Python 3)
