Structural prompt compression with safety gating

prompt-compress

Structural prompt compression for production LLM apps. Where LLMLingua removes individual low-perplexity tokens, this library parses your system prompt into named components (instruction, examples, constraints, style, context), uses Bayesian optimisation to search which components to keep and how aggressively to compress each, scores candidates by semantic similarity to the original, and gates every output through a post-compression validator (persona / placeholder / similarity). Prompts that are already information-dense are detected up front and passed through unchanged.

Install

pip install prompt-compress

Quickstart — production integration

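from prompt_compress import PromptCompressor, CompressionFailedError

# Placeholder: substitute your application's real system prompt.
SYSTEM_PROMPT = "You are a helpful assistant. ..."

compressor = PromptCompressor()

try:
    result = compressor.compress(
        SYSTEM_PROMPT,
        min_similarity=0.80,
        on_failure='raise',
    )
    # Only reached when the validator gate passed.
    SYSTEM_PROMPT = result.compressed_text
    print(f"Saved {result.tokens_saved} tokens per call ({result.compression_ratio:.1%})")
except CompressionFailedError as e:
    # SYSTEM_PROMPT is left untouched, so the app keeps using the original.
    print(f"Compression unsafe, using original: {e}")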

on_failure accepts 'fallback' (default — return the original silently with gate_passed=False), 'raise' (raise CompressionFailedError), or 'warn' (log a warning and return the fallback). The library never blocks on user input.
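
The three modes can be pictured as a small dispatch around the validator result. The following is a minimal sketch of that behaviour as described above, not the library's implementation: `apply_failure_policy`, `GateError`, and the logger name are hypothetical stand-ins introduced for illustration.

```python
import logging

logger = logging.getLogger("prompt_compress_sketch")  # hypothetical logger name


class GateError(Exception):
    """Stand-in for CompressionFailedError in this sketch."""


def apply_failure_policy(original, compressed, gate_passed, on_failure="fallback"):
    """Return (text_to_use, gate_passed) following the three documented modes."""
    if gate_passed:
        return compressed, True
    if on_failure == "raise":
        raise GateError("compressed prompt failed validation")
    if on_failure == "warn":
        logger.warning("compression failed validation; using original prompt")
        return original, False
    if on_failure == "fallback":
        # Silent fallback: caller must inspect the flag to notice the failure.
        return original, False
    raise ValueError(f"unknown on_failure mode: {on_failure!r}")
```

None of the branches waits for input, which matches the claim that the library never blocks: every failure path either returns the original text or raises immediately.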

Inspecting results

result = compressor.compress(SYSTEM_PROMPT)

print(result.summary())   # one-screen terminal summary
print(result.diff())      # side-by-side original vs compressed
result.to_dict()          # JSON-serialisable, useful for caching/logging
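
Because to_dict() is JSON-serialisable, one plausible use is to compress each prompt once and reuse the stored result. The sketch below assumes only what is shown above (a compress() method and a result with to_dict()); `cached_compress` and the dict-like `cache` are hypothetical helpers, and the keys inside the returned dict are whatever the library emits.

```python
import hashlib
import json


def cached_compress(compressor, prompt, cache, **compress_kwargs):
    """Compress `prompt` once per (prompt, kwargs) pair and reuse the stored dict.

    `compressor` is any object exposing compress(); `cache` is any dict-like
    store (hypothetical: an in-memory dict here, swap in your own backend).
    """
    # Include the per-call settings in the key so that different
    # min_similarity / on_failure values do not share an entry.
    key_material = json.dumps(
        {"prompt": prompt, "kwargs": compress_kwargs}, sort_keys=True
    )
    key = hashlib.sha256(key_material.encode("utf-8")).hexdigest()
    if key not in cache:
        result = compressor.compress(prompt, **compress_kwargs)
        # Round-trip through JSON so only serialisable data is stored.
        cache[key] = json.loads(json.dumps(result.to_dict()))
    return cache[key]
```

Usage would look like `cached_compress(compressor, SYSTEM_PROMPT, {}, min_similarity=0.80)`, with the empty dict replaced by a cache that outlives the process if compression should not be repeated on restart.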

Key properties on CompressionResult:

| Property | Description |
| --- | --- |
| compressed_text | the output you should use |
| compression_ratio | tokens saved / original tokens |
| tokens_saved | absolute token count saved |
| semantic_similarity | cosine similarity of original vs compressed (MiniLM) |
| compression_efficiency | compression_ratio × semantic_similarity |
| safe_to_use | True iff all validator checks passed |
| persona_preserved | True iff the "You are…" line survived |
| placeholders_preserved | True iff every {var} from the original is in the output |
| tier / tier_label | which pipeline tier ran (1 BO, 2 TextRank, 3 Preserved) |
| density | information density score used for routing |
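
To make the validator flags concrete, here is a minimal sketch of how checks with these meanings could be written. It is an illustration under stated assumptions, not the library's code: the function names mirror the properties but are hypothetical, the placeholder pattern assumes simple `{var}` names, and the persona check assumes an exact match of the first "You are" line.

```python
import re

PLACEHOLDER_RE = re.compile(r"\{(\w+)\}")  # assumed pattern for {var} placeholders


def placeholders_preserved(original, compressed):
    """True iff every {var} in the original also appears in the compressed text."""
    wanted = set(PLACEHOLDER_RE.findall(original))
    found = set(PLACEHOLDER_RE.findall(compressed))
    return wanted <= found


def persona_preserved(original, compressed):
    """True iff the first 'You are ...' line of the original survives verbatim.

    Assumption: prompts with no persona line pass trivially.
    """
    for line in original.splitlines():
        if line.strip().lower().startswith("you are"):
            return line.strip() in compressed
    return True


def safe_to_use(original, compressed, similarity, min_similarity=0.80):
    """All three gate checks must pass (persona / placeholder / similarity)."""
    return (
        persona_preserved(original, compressed)
        and placeholders_preserved(original, compressed)
        and similarity >= min_similarity
    )
```

The `similarity` argument is taken as given here; in the library it is the MiniLM cosine similarity reported as semantic_similarity.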

Configuration

from prompt_compress import PromptCompressor, OptimisationConfig

compressor = PromptCompressor(
    # Optimiser variants:
    use_informed_prior=False,    # seed BO with P3-derived prior
    use_attention_prior=False,   # per-prompt attention prior + ISR safety gate
    # Trade-off knob:
    alpha=0.3,                   # "auto" → 0.3 (validated benchmark default)
    # Tune BO budget:
    optimisation_config=OptimisationConfig(
        n_iterations=20, n_init=5, beta=2.0, random_seed=42,
    ),
)

min_similarity and on_failure are per-call (compressor.compress(prompt, min_similarity=…, on_failure=…)) so different parts of your app can adopt different safety bars without rebuilding the compressor.
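
A sketch of what that can look like in practice: one shared compressor, with the similarity bar looked up per route. The route names, threshold values, and `compress_for_route` helper are hypothetical; only the compress() keyword arguments and the compressed_text property come from the sections above.

```python
# Hypothetical per-route safety bars; tune these for your own app.
SAFETY_BARS = {
    "customer_support": 0.90,  # strict: wording matters
    "internal_summary": 0.75,  # looser: larger savings are acceptable
}


def compress_for_route(compressor, route, prompt):
    """Compress `prompt` using the similarity bar configured for `route`.

    `compressor` is a single shared PromptCompressor-like object, so no
    rebuild is needed when the bar changes between calls.
    """
    result = compressor.compress(
        prompt,
        min_similarity=SAFETY_BARS[route],
        on_failure="fallback",  # documented default: original returned on failure
    )
    return result.compressed_text
```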

Benchmark results

Matched-subset comparison against LLMLingua on the 38 prompts both systems successfully compressed (see research/benchmark.py and research/evaluate.py to reproduce):

| Metric | Ours | LLMLingua |
| --- | --- | --- |
| Compression ratio | 24.1% | 24.2% |
| LLM judge score (0–100) | 73.3 | 70.2 |
| Persona preservation | 100% | 53% |
| Compression efficiency | 0.179 | 0.155 |

Compression efficiency = compression_ratio × output_similarity — rewards being high on both axes.
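
A short worked example of why the product rewards being high on both axes. The numbers below are illustrative only and are not benchmark data; the helper names are hypothetical.

```python
def compression_ratio(original_tokens, compressed_tokens):
    """Fraction of tokens saved: tokens saved / original tokens."""
    return (original_tokens - compressed_tokens) / original_tokens


def compression_efficiency(ratio, similarity):
    """Product of compression ratio and similarity; high only when both are high."""
    return ratio * similarity


# Illustrative numbers, not benchmark data.
# Aggressive: half the tokens removed, but similarity collapses.
aggressive = compression_efficiency(compression_ratio(1000, 500), 0.30)
# Balanced: a quarter of the tokens removed, similarity stays high.
balanced = compression_efficiency(compression_ratio(1000, 750), 0.80)

print(f"aggressive={aggressive:.3f} balanced={balanced:.3f}")
```

In this made-up case the aggressive setting removes twice as many tokens yet scores lower, because the lost similarity outweighs the extra savings.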

Citation

EMNLP manuscript in preparation.
