
An intelligent, hierarchical context compression framework for LLM memory systems.


CompactPy 🧠⚡

An intelligent, multi-evolutionary hierarchical memory and context compression framework designed to optimize LLM prompt footprints and eliminate token bloat in RAG pipelines.



🚀 The Core Problem

Large Language Models have finite, expensive context windows. Storing raw, repetitive conversational history, system clutter, and loose narrative prose directly in the prompt inflates API costs, increases latency, and dilutes key context, which can confuse the model.

CompactPy solves this. By mimicking cognitive memory tiers and combining vector similarity with directed knowledge graphs, it reduces prompt footprints by 40%+ while preserving engineering state and concept dependencies.


🛠️ Multi-Evolutionary Architecture

CompactPy processes raw runtime context streams across six specialized optimization phases:

1. Token Analytics Core (compactpy.core)

Uses tiktoken's fast BPE tokenizer to count exact token lengths and track compression metrics at the token level.
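A minimal sketch of token counting and a reduction metric, assuming "reduction" means the percentage of tokens removed (the Quickstart reads a `reduction_percentage` key, but its exact definition is an assumption here). The helper names `count_tokens` and `reduction_percentage` are hypothetical. To stay dependency-free the default tokenizer is a whitespace split; passing `tiktoken.get_encoding("cl100k_base").encode` gives real BPE counts instead.

```python
from typing import Callable, List, Optional

# Any function that turns text into a list of tokens.
Encoder = Callable[[str], List]


def count_tokens(text: str, encode: Optional[Encoder] = None) -> int:
    """Count tokens in text.

    Defaults to a whitespace split so the sketch runs without extra
    packages. For BPE counts, pass e.g.
    tiktoken.get_encoding("cl100k_base").encode.
    """
    if encode is None:
        encode = str.split
    return len(encode(text))


def reduction_percentage(original: str, compressed: str,
                         encode: Optional[Encoder] = None) -> float:
    """Percentage of tokens removed by compression (hypothetical metric)."""
    before = count_tokens(original, encode)
    after = count_tokens(compressed, encode)
    if before == 0:
        return 0.0
    return round(100.0 * (before - after) / before, 2)


original = "the build failed the build failed the build failed again"
compressed = "the build failed again"
print(count_tokens(original))                      # 10
print(reduction_percentage(original, compressed))  # 60.0
```

Passing the encoder in as a function keeps the metric independent of any one tokenizer, so the same percentage logic works for whitespace tokens or BPE tokens.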

2. Algorithmic Compression Engines (compactpy.compressors)

  • Exact Deduplication Engine: Removes repeated entries (e.g., looping context or duplicate log lines) while preserving the original order of the stream.
  • Semantic Compressor: Embeds text blocks with SentenceTransformer and compares them by cosine similarity to eliminate overlapping content (e.g., keeping only one variant of a phrase when similarity exceeds a 0.75 threshold).
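Both engines can be sketched in a few lines. Exact deduplication keeps the first occurrence of each entry in stream order. The semantic pass is modeled here as a greedy filter: an entry is kept only if its cosine similarity to every already-kept entry is at or below the threshold (0.75, as above). To keep the sketch self-contained, `bow_embed` is a hypothetical bag-of-words stand-in for `SentenceTransformer.encode`, and all function names are illustrative rather than CompactPy's API.

```python
import math
from collections import Counter
from typing import Callable, Dict, List


def exact_dedup(entries: List[str]) -> List[str]:
    """Drop exact repeats, keeping the first occurrence in stream order."""
    # dict preserves insertion order (Python 3.7+), so this is order-stable.
    return list(dict.fromkeys(entries))


def bow_embed(text: str) -> Dict[str, int]:
    """Hypothetical stand-in for a sentence embedding: word counts."""
    return Counter(text.lower().split())


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(v * b.get(k, 0) for k, v in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def semantic_dedup(entries: List[str], threshold: float = 0.75,
                   embed: Callable[[str], Dict[str, int]] = bow_embed) -> List[str]:
    """Greedily keep entries that are not too similar to any kept entry."""
    kept: List[str] = []
    kept_vecs: List[Dict[str, int]] = []
    for text in entries:
        vec = embed(text)
        if all(cosine(vec, other) <= threshold for other in kept_vecs):
            kept.append(text)
            kept_vecs.append(vec)
    return kept


logs = [
    "Mediscan AI uses FastAPI for the backend",
    "Mediscan AI uses FastAPI for the backend",
    "Mediscan AI uses FastAPI for its backend",
    "The weather in Delhi is rainy",
]
unique = exact_dedup(logs)
print(len(unique))             # 3
print(semantic_dedup(unique))  # the "its backend" variant (similarity 6/7) is dropped
```

Running the cheap exact pass first shrinks the input before the quadratic similarity comparisons, which is where real embedding models spend most of their time.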

3. Hierarchical Memory Repository (compactpy.memory)

Sorts text entries into explicit memory tiers based on their real-world utility:

  • raw_memory: The volatile, incoming execution log dump.
  • working_memory: Active short-term operational buffers available for immediate context retrieval.
  • long_term_memory: High-value project parameters and user rules that never decay.
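One way to picture the three tiers is as separate lists of entries, each entry carrying its text and scoring signals. This is a hypothetical sketch: `TieredMemory`, `add`, and `move` are illustrative names, not CompactPy's classes. The dict-per-entry layout is an assumption loosely based on the Quickstart reading `working_memory` entries through an `m["text"]` key.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class TieredMemory:
    """Hypothetical three-tier store; not CompactPy's actual class."""
    raw_memory: List[Dict] = field(default_factory=list)        # volatile incoming log
    working_memory: List[Dict] = field(default_factory=list)    # short-term buffer
    long_term_memory: List[Dict] = field(default_factory=list)  # never decays

    def add(self, text: str, importance: float = 0.5, utility: float = 0.5) -> Dict:
        """New entries always land in raw_memory first."""
        entry = {"text": text, "importance": importance,
                 "utility": utility, "frequency": 0}
        self.raw_memory.append(entry)
        return entry

    def move(self, entry: Dict, source: str, target: str) -> None:
        """Move an entry between tiers, naming tiers by attribute."""
        getattr(self, source).remove(entry)
        getattr(self, target).append(entry)


mem = TieredMemory()
e = mem.add("Mediscan AI uses FastAPI for the backend.", importance=0.85)
mem.move(e, "raw_memory", "working_memory")
print([m["text"] for m in mem.working_memory])
```

Keeping the tiers as plain lists makes it cheap to build a prompt from only the tiers you want, for example working plus long-term memory while ignoring the raw log.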

4. Memory Scoring Engine (compactpy.memory.scoring)

Memories are scored dynamically using a weighted linear formula:

Score = 0.4 × Importance + 0.3 × Utility + 0.2 × Frequency + 0.1 × Recency

High-scoring entries are promoted to long-term memory, medium-scoring entries stay in working memory, and low-scoring noise is evicted to prevent token bloat.
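The scoring rule and tier assignment can be sketched as below. Only the four weights come from the formula above; the rest is assumed for illustration. All signals are taken to lie in [0, 1], frequency is capped against a hypothetical `max_hits`, recency decays exponentially with a hypothetical half-life, and the cutoffs `PROMOTE_AT = 0.7` and `EVICT_BELOW = 0.4` are made up because the real thresholds are not documented here.

```python
# Weights from the documented formula.
W_IMPORTANCE, W_UTILITY, W_FREQUENCY, W_RECENCY = 0.4, 0.3, 0.2, 0.1

# Hypothetical cutoffs; CompactPy's actual thresholds are not documented here.
PROMOTE_AT = 0.7
EVICT_BELOW = 0.4


def memory_score(importance: float, utility: float,
                 frequency: float, recency: float) -> float:
    """Weighted linear score; all inputs assumed to lie in [0, 1]."""
    return (W_IMPORTANCE * importance + W_UTILITY * utility
            + W_FREQUENCY * frequency + W_RECENCY * recency)


def normalize_frequency(hits: int, max_hits: int = 10) -> float:
    """Map a raw hit count to [0, 1] (hypothetical cap at max_hits)."""
    return min(hits, max_hits) / max_hits


def recency_from_age(age_seconds: float, half_life: float = 3600.0) -> float:
    """Exponential decay: 1.0 when fresh, 0.5 after one half-life."""
    return 0.5 ** (age_seconds / half_life)


def assign_tier(score: float) -> str:
    """Promote, keep, or evict based on the hypothetical cutoffs."""
    if score >= PROMOTE_AT:
        return "long_term_memory"
    if score >= EVICT_BELOW:
        return "working_memory"
    return "evict"


# Fresh, project-relevant fact with 5 of 10 possible hits:
# 0.4*0.85 + 0.3*0.7 + 0.2*0.5 + 0.1*1.0 = 0.75
s = memory_score(0.85, 0.7, normalize_frequency(5), recency_from_age(0))
print(round(s, 2), assign_tier(s))  # 0.75 long_term_memory
```

Because the weights sum to 1.0, the score stays in [0, 1] whenever every input does, which is what makes fixed tier cutoffs meaningful.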

5. Relational Graph Memory System (compactpy.graph_memory)

Converts long-term memory strings into indexed, bidirectional knowledge graphs using NetworkX. Instead of raw prose, it stores knowledge as structured triplets:

Source Entity --(Relation)--> Target Entity

Example:

FastAPI --(backend_of)--> Mediscan AI

This retains the relationships between entities while using far less prompt space than the equivalent prose.
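A minimal sketch of a triplet store, assuming each fact is a directed, relation-labeled edge and that facts are rendered back in the `Source --(relation)--> Target` form shown above. It keeps a forward and a reverse index so an entity can be looked up from either end, which is one plausible reading of "bidirectional". To avoid a dependency it uses plain dicts; with NetworkX the equivalent edge would be `nx.DiGraph().add_edge(source, target, relation=relation)`. `TripletGraph` and its methods are hypothetical names, not CompactPy's `GraphMemorySystem`.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Triplet = Tuple[str, str, str]  # (source, relation, target)


class TripletGraph:
    """Hypothetical directed triplet store with forward and reverse indexes."""

    def __init__(self) -> None:
        self.outgoing: Dict[str, List[Triplet]] = defaultdict(list)
        self.incoming: Dict[str, List[Triplet]] = defaultdict(list)

    def add_relation(self, source: str, relation: str, target: str) -> None:
        triplet = (source, relation, target)
        if triplet in self.outgoing[source]:
            return  # skip duplicate facts
        self.outgoing[source].append(triplet)
        self.incoming[target].append(triplet)

    def neighbors(self, entity: str) -> List[Triplet]:
        """All facts touching an entity, whichever end it is on."""
        return self.outgoing.get(entity, []) + self.incoming.get(entity, [])

    def as_text(self) -> List[str]:
        """Render every fact in the compact 'A --(rel)--> B' form."""
        return [f"{s} --({r})--> {t}"
                for facts in self.outgoing.values()
                for (s, r, t) in facts]


g = TripletGraph()
g.add_relation("FastAPI", "backend_of", "Mediscan AI")
g.add_relation("FastAPI", "backend_of", "Mediscan AI")  # ignored as a duplicate
print(g.as_text())  # ['FastAPI --(backend_of)--> Mediscan AI']
print(g.neighbors("Mediscan AI"))
```

The rendered line costs a handful of tokens, compared with a full sentence such as "Mediscan AI uses FastAPI for the backend framework architecture."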

6. Attention-Aware Compressor (compactpy.compressors.attention)

Acts as a dynamic "importance predictor." Given a live query, it computes an attention weight for each entry in the history pool relative to that query, then fills a target prompt token budget with the highest-relevance entries.
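The query-aware budgeting step can be sketched as follows: score each entry against the query, turn the scores into attention weights with a softmax, then add entries in descending weight order, skipping any that would overflow the token budget. This is an illustrative stand-in, not CompactPy's implementation. Relevance here is word-overlap cosine rather than learned embeddings, tokens are whitespace-split, the `temperature` parameter is hypothetical, and `compress_for_query` only echoes the Quickstart's `compress_context_for_query` by name. Selected entries are returned in their original order so the context still reads chronologically.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def _bag(text: str) -> Counter:
    """Lowercased word counts; a stand-in for a learned embedding."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    dot = sum(v * b[k] for k, v in a.items())  # Counter returns 0 for missing keys
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def attention_weights(query: str, pool: List[str], temperature: float = 0.1) -> List[float]:
    """Softmax over query-to-entry relevance scores; weights sum to 1."""
    if not pool:
        return []
    q = _bag(query)
    scores = [_cosine(q, _bag(text)) / temperature for text in pool]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def compress_for_query(query: str, pool: List[str],
                       token_budget: int) -> Tuple[List[str], Dict[str, float]]:
    """Greedily fill the budget with the highest-weight entries that still fit."""
    weights = attention_weights(query, pool)
    ranked = sorted(range(len(pool)), key=lambda i: weights[i], reverse=True)
    chosen: List[int] = []
    used = 0
    for i in ranked:
        cost = len(pool[i].split())  # whitespace tokens (assumption)
        if used + cost <= token_budget:  # skip entries that would overflow
            chosen.append(i)
            used += cost
    before = sum(len(text.split()) for text in pool)
    reduction = round(100.0 * (before - used) / before, 2) if before else 0.0
    metrics = {"tokens_before": before, "tokens_after": used,
               "reduction_percentage": reduction}
    # Restore original order so the context still reads chronologically.
    return [pool[i] for i in sorted(chosen)], metrics


pool = [
    "We are designing a medicine detection module called Mediscan AI.",
    "Mediscan AI uses FastAPI for the backend framework architecture.",
    "Today the weather in Delhi is cloudy and rainy.",
]
query = "Which backend framework does Mediscan AI use?"
selected, metrics = compress_for_query(query, pool, token_budget=12)
print(selected)                          # only the FastAPI entry fits
print(metrics["reduction_percentage"])   # 67.86
```

The softmax does not change the ranking on its own; it is included because normalized weights are easier to threshold or log than raw similarity scores, and a lower temperature sharpens the gap between relevant and irrelevant entries.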


💾 Installation

Install the production framework directly from PyPI:

pip install compactpy

💻 Quickstart: End-to-End Pipeline

Here is how to run the complete automated ingestion, scoring, and query-aware compaction loop:

from compactpy.memory import HierarchicalMemory
from compactpy.memory.scoring import MemoryScoringEngine
from compactpy.graph_memory import GraphMemorySystem
from compactpy.compressors.attention import AttentionAwareCompressor

# 1. Initialize our modular cognitive layers
memory_vault = HierarchicalMemory()
scoring_engine = MemoryScoringEngine()
graph_db = GraphMemorySystem()
attention_compressor = AttentionAwareCompressor()

# 2. Ingest raw conversational logs
raw_logs = [
    "We are designing a medicine detection module called Mediscan AI.",
    "Mediscan AI uses FastAPI for the backend framework architecture.",
    "Today the weather in Delhi is cloudy and rainy."
]

for log in raw_logs:
    importance = 0.85 if "FastAPI" in log or "Mediscan" in log else 0.3
    memory_vault.add_memory(log, importance=importance, utility=0.7)

# 3. Simulate usage hits and run lifecycle scoring
memory_vault.increment_frequency(raw_logs[1])
scoring_engine.process_lifecycle_cycle(memory_vault)

# 4. Map persistent facts into the knowledge graph
graph_db.add_relation("FastAPI", "backend_of", "Mediscan AI")
graph_facts = graph_db.get_relationships_as_text()

# 5. Build a query-aware compact context
user_query = "What backend options did we settle on for Mediscan AI?"
combined_context = [m["text"] for m in memory_vault.working_memory] + graph_facts

optimized_payload, metrics = attention_compressor.compress_context_for_query(
    query=user_query,
    context_pool=combined_context,
    token_budget=45
)

print(f"Optimized Prompt Context: {optimized_payload}")
print(f"Token Reduction: {metrics['reduction_percentage']}%")

🧪 Running Validation Demos

The project repository keeps runnable verification scripts under bin/. Run them to watch the math and optimization phases execute live in your terminal:

# Test token utilities and basic compressors
python bin/demo_phase1_foundations.py
python bin/demo_step2.py

# Test hierarchical lifecycle scoring loops
python bin/demo_step3.py

# Test graph relationship mapping
python bin/demo_step5.py

# Test dynamic attention query budgeting
python bin/demo_step6.py

# Run the complete end-to-end processing pipeline
python bin/run_compactpy_pipeline.py

📊 Performance Benchmarks

CompactPy scales well as context footprints grow denser. The empirical evaluation below plots token reduction against processing latency:

[Figure: CompactPy Performance Curve]

  • Token Optimization: Reaches 95%+ token reduction on dense contexts by aggressively pruning semantic redundancies and noise.
  • Latency Footprint: After initialization, context filtering runs in under 50 ms, making it suitable for high-throughput LLM pipelines.

📄 License

Distributed under the MIT License. See LICENSE for more information.



Download files


Source Distribution

compactpy-1.0.1.tar.gz (21.7 kB)

Uploaded Source

Built Distribution


compactpy-1.0.1-py3-none-any.whl (22.2 kB)

Uploaded Python 3

File details

Details for the file compactpy-1.0.1.tar.gz.

File metadata

  • Download URL: compactpy-1.0.1.tar.gz
  • Upload date:
  • Size: 21.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for compactpy-1.0.1.tar.gz
Algorithm Hash digest
SHA256 d6ab082e067446452c46b4648a5728cadc0cf8eb8652f5e14e9ffe1915d193a6
MD5 c046adb604e7724efa700c18d34139ed
BLAKE2b-256 59e84fd5c9e23a238b2da746d8240eb8f83f90cd6b337078881ccedd15164609


File details

Details for the file compactpy-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: compactpy-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 22.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for compactpy-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 4e23548528d04b49973f75d1bc2a8b79f22ba0126ca9f1fff37c88c5964a83a3
MD5 fd68a208320303fadfba05a0b6e3956e
BLAKE2b-256 0dec083d0b2052e4578dfcecbca12504928f23a237f1960af4fb746c264892b8

