mdmin

Rule-based markdown compression for LLM consumption. Reduces token usage by 20–35%.

Website: mdmin.dev
npm: npmjs.com/package/mdmin

Install

pip install mdmin

Zero dependencies. Python 3.9+.

Usage

Python API

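from pathlib import Path

from mdmin import compress, estimate_tokens

# Load any markdown string; README.md is only an example input.
text = Path("README.md").read_text(encoding="utf-8")

result = compress(text, level="medium")

print(estimate_tokens(text)) # estimated tokens in the input
print(result.output)         # compressed text
print(result.stats.saved)    # tokens saved
print(result.stats.pct)      # % reduction
# Example repr of result:
# CompressResult(output=..., stats=CompressionStats(input_tokens=2273, output_tokens=1765, saved=508, pct=22.3, ...))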

CLI

# Compress a file (output to stdout)
mdmin compress README.md

# Save to file
mdmin compress README.md -o README.min.md

# Compression level
mdmin compress README.md --level aggressive

# Show token stats across all levels
mdmin stats README.md

# Pipe from stdin
cat file.md | mdmin compress -

Compression Levels

Level        Savings    What it does
light        ~10%       Whitespace, comments, basic verbose patterns
medium       ~20-25%    + more verbose patterns, table compression, formatting cleanup
aggressive   ~25-35%    + article stripping, list compression, bold removal, dictionary dedup

What It Compresses

  • Verbose phrases: 150+ patterns — "In order to" → "To", "Due to the fact that" → "Because"
  • Whitespace: Blank lines, trailing spaces, decorative horizontal rules
  • Tables: Markdown tables → compact CSV or key:value format
  • Formatting: Redundant bold on headers, deep heading nesting, emphasis markers
  • Lists: Short bullet lists → inline comma-separated (aggressive)
  • Links: Empty titles, unused references, verbose alt text
  • Dictionary dedup: Repeated phrases replaced with §1, §2 tokens
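
To make a few of the rules above concrete, here is a minimal sketch of three of them: verbose-phrase replacement, whitespace cleanup, and table-to-CSV conversion. It is an illustration only, not mdmin's actual implementation. The function names and the two-entry pattern table are hypothetical, and the table converter assumes simple pipe tables whose cells contain no commas or escaped pipes.

```python
import re

# Hypothetical two-entry pattern table; the real list is described as 150+ patterns.
VERBOSE_PATTERNS = {
    r"\bin order to\b": "to",
    r"\bdue to the fact that\b": "because",
}


def _match_case(original: str, replacement: str) -> str:
    """Capitalize the replacement if the matched phrase started with a capital."""
    return replacement.capitalize() if original[:1].isupper() else replacement


def shorten_phrases(text: str) -> str:
    """Replace each verbose phrase with its shorter equivalent."""
    for pattern, short in VERBOSE_PATTERNS.items():
        text = re.sub(
            pattern,
            lambda m, s=short: _match_case(m.group(0), s),
            text,
            flags=re.IGNORECASE,
        )
    return text


def clean_whitespace(text: str) -> str:
    """Strip trailing spaces, drop horizontal-rule lines, collapse blank-line runs."""
    lines = [line.rstrip() for line in text.splitlines()]
    # Simplification: treats every ---, *** or ___ line as a decorative rule.
    lines = [ln for ln in lines if not re.fullmatch(r"-{3,}|\*{3,}|_{3,}", ln)]
    return re.sub(r"\n{3,}", "\n\n", "\n".join(lines))


def table_to_csv(table: str) -> str:
    """Convert a simple pipe table to CSV, skipping the |---|---| separator row."""
    rows = []
    for line in table.strip().splitlines():
        cells = [c.strip() for c in line.strip().strip("|").split("|")]
        if all(re.fullmatch(r":?-+:?", c) for c in cells):
            continue  # separator row carries no data
        rows.append(",".join(cells))
    return "\n".join(rows)
```

Each function is independent, so a level-based pipeline could apply them in sequence. A production version would also need to skip fenced code blocks so that code is never rewritten.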

API Reference

compress(text: str, level: str = "medium") -> CompressResult

  • level: "light" | "medium" | "aggressive"
  • Returns CompressResult with .output (str) and .stats (CompressionStats)

estimate_tokens(text: str) -> int

Returns a fast estimate of the BPE token count (no external dependencies).
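
For intuition about dependency-free token estimation, the sketch below uses a common rule of thumb: roughly one token per four characters of a word, plus one token per punctuation mark. This is an assumption made for illustration. The function name `rough_token_estimate` is hypothetical, and the heuristic is not claimed to match what `estimate_tokens` does internally.

```python
import math
import re


def rough_token_estimate(text: str) -> int:
    """Heuristic count: ceil(len(word) / 4) per word, 1 per punctuation mark."""
    total = 0
    # Group 1 captures a run of word characters, group 2 a single punctuation mark.
    for word, punct in re.findall(r"(\w+)|([^\w\s])", text):
        if word:
            total += math.ceil(len(word) / 4)
        else:
            total += 1
    return total
```

Real BPE tokenizers depend on a learned vocabulary, so an estimate like this is only useful for relative comparisons such as before and after compression.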

CompressResult

result.output          # str — compressed text
result.stats           # CompressionStats

CompressionStats

stats.input_tokens     # int
stats.output_tokens    # int
stats.saved            # int (input - output)
stats.pct              # float (% saved)
stats.input_chars      # int
stats.output_chars     # int
stats.level            # str
stats.dictionary       # int (dedup entries created)
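
The relationship between the token fields can be sketched as follows. `StatsSketch` is a hypothetical stand-in, not the library's `CompressionStats` class. Rounding `pct` to one decimal place is an assumption inferred from the example repr in the Usage section (2273 input tokens, 1765 output tokens, 508 saved, 22.3%).

```python
from dataclasses import dataclass


@dataclass
class StatsSketch:
    """Hypothetical stand-in showing how saved and pct relate to the token counts."""

    input_tokens: int
    output_tokens: int

    @property
    def saved(self) -> int:
        # Documented as input - output.
        return self.input_tokens - self.output_tokens

    @property
    def pct(self) -> float:
        # Percentage saved relative to the input; guard against empty input.
        if self.input_tokens == 0:
            return 0.0
        return round(100 * self.saved / self.input_tokens, 1)


stats = StatsSketch(input_tokens=2273, output_tokens=1765)
print(stats.saved, stats.pct)
```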

License

AGPL-3.0-only

Download files

Source Distribution

mdmin-1.0.0.tar.gz (27.5 kB)

Uploaded Source

Built Distribution

mdmin-1.0.0-py3-none-any.whl (22.1 kB)

Uploaded Python 3

File details

Details for the file mdmin-1.0.0.tar.gz.

File metadata

  • Download URL: mdmin-1.0.0.tar.gz
  • Size: 27.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.8

File hashes

Hashes for mdmin-1.0.0.tar.gz
Algorithm     Hash digest
SHA256        847d78274eb907aa274f4122feffc4676575501fe3533eab19cf99de5b89751d
MD5           ace53320972808cd7765775591ec9205
BLAKE2b-256   6fd37af3afea755336ccfd001729694a4fefadcb98bef8f2cd8f01dfc8755c52
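
A downloaded file can be checked against the published SHA256 digest with Python's standard `hashlib` module. The sketch below is one way to do it. The helper name `sha256_of` is hypothetical, and the local file path is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes at EOF.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

Comparing `sha256_of("mdmin-1.0.0.tar.gz")` with the SHA256 value listed for that file confirms that the download is intact.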

File details

Details for the file mdmin-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: mdmin-1.0.0-py3-none-any.whl
  • Size: 22.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.8

File hashes

Hashes for mdmin-1.0.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        e03a8b3c55048bf621a9a91a96c1c5cbe26fff9770e2fb239b826b2411f74722
MD5           6d1a8199017990de55f80fcc5cf1c3bd
BLAKE2b-256   a5c3b9cec0a6ebd33f2577d31e6a08aca43dc8eb28ad429b9e09ff0390b4a46d
