Rule-based markdown compression + context extraction for LLM consumption. Reduces token usage by 20-95%.

mdmin

Website: mdmin.dev
npm: npmjs.com/package/mdmin

Install

pip install mdmin

Zero dependencies. Python 3.9+.

Compress

Strip verbose phrases, redundant formatting, and structural waste. 13–35% token savings.

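from pathlib import Path

from mdmin import compress, estimate_tokens

text = Path("README.md").read_text(encoding="utf-8")  # any markdown string works

result = compress(text, level="medium")
print(result.output)       # compressed text
print(result.stats.pct)    # e.g. 22.3 (%)
print(result.stats.saved)  # tokens saved
print(estimate_tokens(result.output))  # estimated token count of the compressed text

From the command line:

mdmin compress README.md                    # stdout
mdmin compress README.md -o README.min.md   # save to file
mdmin compress README.md --level aggressive
mdmin stats README.md                       # compare all levels
cat file.md | mdmin compress -              # stdin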
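
To make the idea of rule-based compression concrete, here is a minimal sketch of one kind of rule: replacing verbose phrases and trimming trailing whitespace. The phrase list and the names `squeeze` and `percent_saved` are hypothetical, invented for illustration; they are not mdmin's actual rules or API, and savings are measured in characters as a stand-in for tokens.

```python
import re

# Hypothetical phrase rules; mdmin's real rule set is not shown in this README.
VERBOSE_PHRASES = [
    (re.compile(r"\bin order to\b", re.IGNORECASE), "to"),
    (re.compile(r"\bdue to the fact that\b", re.IGNORECASE), "because"),
    (re.compile(r"\bat this point in time\b", re.IGNORECASE), "now"),
]


def squeeze(text: str) -> str:
    """Apply each phrase rule, then strip trailing spaces and tabs per line."""
    for pattern, replacement in VERBOSE_PHRASES:
        # Note: the replacement is lowercase, so sentence-initial capitals are lost.
        text = pattern.sub(replacement, text)
    return re.sub(r"[ \t]+$", "", text, flags=re.MULTILINE)


def percent_saved(before: str, after: str) -> float:
    """Character-based savings in percent (a stand-in for token savings)."""
    if not before:
        return 0.0
    return round(100 * (len(before) - len(after)) / len(before), 1)


sample = "In order to build,  \nrun make."
compact = squeeze(sample)
print(repr(compact))
print(percent_saved(sample, compact))
```

Rules of this kind are deterministic and need no model call, which is what keeps this style of compression fast.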

Extract

Given a large document and a query, returns only the relevant chunks within a token budget. TF-IDF based — no external API, no vector database, runs in milliseconds. 70–95% reduction on targeted queries.

from pathlib import Path

from mdmin import extract

large_doc = Path("bigdoc.md").read_text(encoding="utf-8")

result = extract(large_doc, "how does auth work", max_tokens=2000)
print(result.text)               # relevant chunks only
print(result.stats.reduction)    # e.g. 91.2 (%)
print(result.stats.chunks_extracted)  # chunks kept, e.g. 2 (stats.chunks_total would be 24)

From the command line:

mdmin extract bigdoc.md -q "how does auth work"
mdmin extract bigdoc.md -q "database schema" --max 1500

For advanced use:

from mdmin import ContextExtractor

extractor = ContextExtractor()
extractor.index(large_doc)
result = extractor.extract("auth flow", max_tokens=2000)

# Multi-doc: score chunks globally across files
scored = extractor.score_chunks("auth flow")
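
The Extract section describes TF-IDF ranking of chunks within a token budget. The sketch below shows one way that technique can work, using only the standard library. It is not mdmin's implementation: the paragraph-based chunking, the smoothed IDF formula, the word-count token cost, and the names `tokenize`, `split_chunks`, and `select_chunks` are all assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def split_chunks(doc: str) -> List[str]:
    """Treat blank-line-separated paragraphs as chunks (an assumption)."""
    return [c.strip() for c in re.split(r"\n\s*\n", doc) if c.strip()]


def select_chunks(doc: str, query: str, max_tokens: int) -> List[str]:
    """Return the best-scoring chunks that fit the budget, in document order."""
    chunks = split_chunks(doc)
    terms = [tokenize(c) for c in chunks]
    n = len(chunks)

    # Document frequency: in how many chunks does each term appear?
    df = Counter()
    for chunk_terms in terms:
        df.update(set(chunk_terms))

    query_terms = set(tokenize(query))
    scored = []
    for idx, chunk_terms in enumerate(terms):
        if not chunk_terms:
            continue
        tf = Counter(chunk_terms)
        # Sum of TF * smoothed IDF over the query terms present in this chunk.
        score = sum(
            (tf[q] / len(chunk_terms)) * (math.log((1 + n) / (1 + df[q])) + 1)
            for q in query_terms
            if q in tf
        )
        if score > 0:
            scored.append((score, idx))

    # Greedy fill: highest score first, skipping chunks that exceed the budget.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    picked, used = [], 0
    for _, idx in scored:
        cost = len(terms[idx])  # word count as a crude token cost
        if used + cost <= max_tokens:
            picked.append(idx)
            used += cost
    return [chunks[i] for i in sorted(picked)]
```

Returning the selected chunks in their original order, rather than by score, keeps the extracted text readable as a condensed version of the document.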

Compression Levels

| Level | Savings | What it does |
|---|---|---|
| light | ~10% | Whitespace, comments, basic verbose patterns |
| medium | ~20–25% | + more verbose patterns, table compression, formatting cleanup |
| aggressive | ~25–35% | + article stripping, list compression, bold removal, dictionary dedup |
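
The "+" entries above suggest that each level builds on the one before it. Assuming that reading is right, a cumulative rule pipeline could be organized roughly as follows. Every rule function and name here (`RULES_ADDED_AT`, `rules_for`, `apply_level`) is hypothetical and much cruder than whatever mdmin actually does.

```python
import re
from typing import Callable, Dict, List

Rule = Callable[[str], str]


def strip_html_comments(text: str) -> str:
    return re.sub(r"<!--.*?-->", "", text, flags=re.DOTALL)


def collapse_blank_lines(text: str) -> str:
    return re.sub(r"\n{3,}", "\n\n", text)


def tighten_table_cells(text: str) -> str:
    # Remove the padding spaces around pipe characters.
    return re.sub(r"[ \t]*\|[ \t]*", "|", text)


def strip_bold(text: str) -> str:
    return re.sub(r"\*\*(.+?)\*\*", r"\1", text)


def strip_articles(text: str) -> str:
    # Naive: drops "a", "an", "the" wherever whitespace follows them.
    return re.sub(r"\b(?:a|an|the)\s+", "", text, flags=re.IGNORECASE)


LEVEL_ORDER: List[str] = ["light", "medium", "aggressive"]

# Rules each level adds on top of the previous one (illustrative only).
RULES_ADDED_AT: Dict[str, List[Rule]] = {
    "light": [strip_html_comments, collapse_blank_lines],
    "medium": [tighten_table_cells],
    "aggressive": [strip_bold, strip_articles],
}


def rules_for(level: str) -> List[Rule]:
    """Collect the rules of every level up to and including `level`."""
    if level not in LEVEL_ORDER:
        raise ValueError(f"unknown level: {level!r}")
    rules: List[Rule] = []
    for name in LEVEL_ORDER:
        rules.extend(RULES_ADDED_AT[name])
        if name == level:
            break
    return rules


def apply_level(text: str, level: str) -> str:
    for rule in rules_for(level):
        text = rule(text)
    return text
```

Keeping the levels cumulative means a higher level never produces a longer result than a lower one on the same input, as long as every rule only removes text.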

API Reference

compress

compress(text: str, level: str = "medium") -> CompressResult

Returns CompressResult with .output (str) and .stats (CompressionStats):

stats.input_tokens     # int
stats.output_tokens    # int
stats.saved            # int
stats.pct              # float (% saved)
stats.input_chars      # int
stats.output_chars     # int
stats.level            # str

extract

extract(text: str, query: str, *, max_tokens: int = 2000) -> ExtractResult

Returns ExtractResult with .text (str) and .stats (ExtractStats):

stats.total_doc_tokens    # int
stats.extracted_tokens    # int
stats.chunks_total        # int
stats.chunks_extracted    # int
stats.reduction           # float (% reduction)
stats.top_scores          # list[TopScore]

estimate_tokens

estimate_tokens(text: str) -> int

Fast BPE token count estimate (no external dependencies).
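
For intuition about what a dependency-free token estimate can look like, here is a rough heuristic sketch. It is not the BPE-based estimator mdmin describes: it simply assumes about four characters per token within each word and counts each punctuation mark as one token. The name `rough_token_estimate` and the `chars_per_token` parameter are hypothetical.

```python
import math
import re

# Words (runs of letters, digits, underscore) or single punctuation marks.
_PIECE = re.compile(r"\w+|[^\w\s]")


def rough_token_estimate(text: str, chars_per_token: int = 4) -> int:
    """Estimate tokens as ceil(len(word) / chars_per_token) summed over pieces."""
    total = 0
    for piece in _PIECE.findall(text):
        total += math.ceil(len(piece) / chars_per_token)
    return total


print(rough_token_estimate("Hi, there!"))
```

A heuristic like this is only useful for comparing texts against each other; absolute counts can differ noticeably from a real tokenizer.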

License

AGPL-3.0-only
