An intelligent, hierarchical context compression framework for LLM memory systems.
CompactPy 🧠⚡
An intelligent, multi-evolutionary hierarchical memory and context compression framework designed to optimize LLM prompt footprints and eliminate token bloat in RAG pipelines.
🚀 The Core Problem
Large Language Models have finite, expensive context windows. Storing raw, repetitive conversational history, system clutter, and loose narrative prose directly in the prompt window leads to massive API billing inflation, elevated system latency, and model confusion due to key context dilution.
CompactPy addresses this by combining cognitive-style memory tiers, vector similarity, and directed knowledge graphs. It reduces prompt footprints by 40% or more while preserving key engineering state and concept dependencies.
🛠️ Multi-Evolutionary Architecture
CompactPy processes raw runtime context streams across six specialized optimization phases:
1. Token Analytics Core (compactpy.core)
Uses fast BPE tokenization via tiktoken to count tokens exactly, measuring text length in tokens and tracking compression metrics down to the individual token.
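To make the bookkeeping concrete, here is a minimal sketch of how such compression metrics could be computed. The names `whitespace_tokens` and `compression_metrics` are hypothetical and not part of CompactPy's documented API, and the whitespace tokenizer is a dependency-free stand-in for BPE, so its counts will differ from tiktoken's.

```python
from typing import Callable, Sequence


def whitespace_tokens(text: str) -> Sequence[str]:
    """Stand-in tokenizer (assumption): splits on whitespace, not BPE."""
    return text.split()


def compression_metrics(
    original: str,
    compressed: str,
    tokenize: Callable[[str], Sequence] = whitespace_tokens,
) -> dict:
    """Compare token counts before and after compression."""
    before = len(tokenize(original))
    after = len(tokenize(compressed))
    saved = before - after
    # Guard against an empty original to avoid dividing by zero.
    pct = 100.0 * saved / before if before else 0.0
    return {
        "original_tokens": before,
        "compressed_tokens": after,
        "tokens_saved": saved,
        "reduction_percentage": round(pct, 2),
    }


metrics = compression_metrics(
    "the build passed the build passed the build passed",
    "the build passed",
)
print(metrics)
```

To count real BPE tokens instead, `tiktoken.get_encoding("cl100k_base").encode` can be passed as `tokenize`, assuming tiktoken is installed.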
2. Algorithmic Compression Engines (compactpy.compressors)
- Exact Deduplication Engine: Automatically strips out repetitive context loops and chronological logs while keeping structural stream order intact.
- Semantic Compressor: Embeds data blocks via SentenceTransformer and computes pairwise cosine similarity to eliminate overlapping content (e.g., keeping only one variation of a phrase when similarity crosses a 0.75 threshold).
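The two engines can be sketched roughly as follows. This is an illustration under stated assumptions, not CompactPy's implementation: `exact_dedup`, `toy_embed`, `cosine`, and `semantic_dedup` are hypothetical names, and a bag-of-words count vector stands in for a SentenceTransformer embedding so the example runs without extra dependencies.

```python
import math
from collections import Counter


def exact_dedup(blocks):
    """Drop exact repeats, keeping the first occurrence and the original order."""
    return list(dict.fromkeys(blocks))


def toy_embed(text):
    """Stand-in embedding (assumption): word counts, not a neural sentence vector."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def semantic_dedup(blocks, threshold=0.75, embed=toy_embed):
    """Greedily keep a block only if it stays below the threshold against all kept blocks."""
    kept, kept_vecs = [], []
    for block in blocks:
        vec = embed(block)
        if all(cosine(vec, other) < threshold for other in kept_vecs):
            kept.append(block)
            kept_vecs.append(vec)
    return kept


logs = [
    "the server uses fastapi for the backend",
    "the server uses fastapi for the backend",   # exact repeat
    "the server uses fastapi for its backend",   # near-duplicate
    "the weather in delhi is cloudy",
]
unique = exact_dedup(logs)
compact = semantic_dedup(unique)
print(compact)
```

The greedy pass keeps the first phrasing it sees, which matches the goal of preserving stream order; a real embedding model would catch paraphrases that share few words.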
3. Hierarchical Memory Repository (compactpy.memory)
Isolates text strings into explicit cognitive abstraction layers based on real-world utility:
- raw_memory: The volatile, incoming execution log dump.
- working_memory: Active short-term operational buffers available for immediate context retrieval.
- long_term_memory: High-value project parameters and user rules that never decay.
4. Memory Scoring Engine (compactpy.memory.scoring)
Memories are evaluated dynamically using a custom, long-horizon linear performance formula:
Score = 0.4 × Importance + 0.3 × Utility + 0.2 × Frequency + 0.1 × Recency
High-scoring nodes are promoted straight to Long-Term Memory, medium nodes stay in Working storage, and low-scoring noise is automatically evicted to prevent token bloat.
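As a worked illustration of the formula, the sketch below scores three memories and assigns each to a tier. It assumes all four inputs are normalized to the range 0 to 1. The cut-off values `promote_at=0.7` and `keep_at=0.4`, along with the names `MemoryItem`, `score`, and `assign_tier`, are hypothetical; the actual thresholds used by the scoring engine are not stated here.

```python
from dataclasses import dataclass

# Weights taken from the scoring formula above.
WEIGHTS = {"importance": 0.4, "utility": 0.3, "frequency": 0.2, "recency": 0.1}


@dataclass
class MemoryItem:
    text: str
    importance: float  # all four signals assumed normalized to [0, 1]
    utility: float
    frequency: float
    recency: float


def score(item: MemoryItem) -> float:
    """Weighted linear combination of the four signals."""
    return (
        WEIGHTS["importance"] * item.importance
        + WEIGHTS["utility"] * item.utility
        + WEIGHTS["frequency"] * item.frequency
        + WEIGHTS["recency"] * item.recency
    )


def assign_tier(value: float, promote_at: float = 0.7, keep_at: float = 0.4) -> str:
    """Map a score to a tier; both thresholds are illustrative assumptions."""
    if value >= promote_at:
        return "long_term_memory"
    if value >= keep_at:
        return "working_memory"
    return "evicted"


api_fact = MemoryItem("Mediscan AI uses FastAPI.", 0.85, 0.7, 1.0, 0.5)
new_fact = MemoryItem("Mediscan AI is a detection module.", 0.85, 0.7, 0.0, 0.5)
weather = MemoryItem("Delhi is cloudy today.", 0.3, 0.7, 0.0, 0.5)

for item in (api_fact, new_fact, weather):
    value = score(item)
    print(f"{value:.2f} -> {assign_tier(value)}: {item.text}")
```

Under these assumed thresholds, the frequently used fact scores about 0.80 and is promoted, the unused but important fact scores about 0.60 and stays in working memory, and the weather remark scores about 0.38 and is evicted.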
5. Relational Graph Memory System (compactpy.graph_memory)
Converts raw long-term strings into dense, indexed, bidirectional Knowledge Graphs using NetworkX. Instead of raw prose, it stores knowledge as structured triplets:
Source Entity --(Relation)--> Target Entity
Example:
FastAPI --(backend_of)--> Mediscan AI
This retains complex causal relationships without wasting prompt space.
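The triplet idea can be sketched with the standard library alone. Note that this example deliberately uses plain dictionaries rather than NetworkX so that it runs without dependencies, and `TripletStore` with its methods is a hypothetical stand-in, not the `GraphMemorySystem` class itself.

```python
from collections import defaultdict


class TripletStore:
    """Hypothetical minimal store for (source, relation, target) triplets."""

    def __init__(self):
        self._out = defaultdict(list)  # source -> [(relation, target)]
        self._in = defaultdict(list)   # target -> [(relation, source)]

    def add_relation(self, source, relation, target):
        """Record a directed edge once, indexed from both endpoints."""
        if (relation, target) not in self._out[source]:
            self._out[source].append((relation, target))
            self._in[target].append((relation, source))

    def as_text(self):
        """Render every triplet in the arrow notation used above."""
        return [
            f"{source} --({relation})--> {target}"
            for source, edges in self._out.items()
            for relation, target in edges
        ]

    def neighbors(self, entity):
        """Entities connected to `entity` in either direction."""
        outgoing = [target for _, target in self._out.get(entity, [])]
        incoming = [source for _, source in self._in.get(entity, [])]
        return outgoing + incoming


store = TripletStore()
store.add_relation("FastAPI", "backend_of", "Mediscan AI")
store.add_relation("FastAPI", "backend_of", "Mediscan AI")  # duplicate is ignored
print(store.as_text())
print(store.neighbors("Mediscan AI"))
```

Indexing each edge from both endpoints is one way to let a directed triplet be looked up from either entity, at the cost of storing every edge twice.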
6. Attention-Aware Compressor (compactpy.compressors.attention)
Acts as a dynamic "Importance Predictor." When a user passes a live query, it calculates the attention weight of your history pool relative to that query, dynamically filling a targeted prompt token budget with the highest-relevance vectors.
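A simplified version of query-aware budget filling might look like the sketch below. It is an approximation under stated assumptions: `relevance` uses Jaccard word overlap as a stand-in for embedding-based attention weights, tokens are counted by whitespace splitting, and the function names are hypothetical rather than CompactPy's API.

```python
def relevance(query, text):
    """Stand-in relevance (assumption): Jaccard overlap of lowercase words."""
    q = set(query.lower().split())
    t = set(text.lower().split())
    union = q | t
    return len(q & t) / len(union) if union else 0.0


def fill_budget(query, pool, token_budget, count_tokens=lambda s: len(s.split())):
    """Rank the pool by relevance, then greedily add items that still fit the budget."""
    ranked = sorted(pool, key=lambda text: relevance(query, text), reverse=True)
    selected, used = [], 0
    for text in ranked:
        cost = count_tokens(text)
        if used + cost <= token_budget:
            selected.append(text)
            used += cost
    return selected, used


pool = [
    "the weather in delhi is cloudy and rainy",
    "mediscan uses fastapi for the backend",
    "mediscan is a medicine detection module",
]
selected, used = fill_budget("which backend does mediscan use", pool, token_budget=12)
print(selected, used)
```

The greedy loop skips an item that does not fit and keeps trying smaller ones, so the budget is filled as far as possible without ever being exceeded.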
💾 Installation
Install the production framework directly from PyPI:
pip install compactpy
💻 Quickstart: End-to-End Pipeline
Here is how to run the complete automated ingestion, scoring, and query-aware compaction loop:
from compactpy.memory import HierarchicalMemory
from compactpy.memory.scoring import MemoryScoringEngine
from compactpy.graph_memory import GraphMemorySystem
from compactpy.compressors.attention import AttentionAwareCompressor
# 1. Initialize our modular cognitive layers
memory_vault = HierarchicalMemory()
scoring_engine = MemoryScoringEngine()
graph_db = GraphMemorySystem()
attention_compressor = AttentionAwareCompressor()
# 2. Ingest raw conversational logs
raw_logs = [
    "We are designing a medicine detection module called Mediscan AI.",
    "Mediscan AI uses FastAPI for the backend framework architecture.",
    "Today the weather in Delhi is cloudy and rainy.",
]
for log in raw_logs:
    importance = 0.85 if "FastAPI" in log or "Mediscan" in log else 0.3
    memory_vault.add_memory(log, importance=importance, utility=0.7)
# 3. Simulate usage hits and run lifecycle scoring
memory_vault.increment_frequency(raw_logs[1])
scoring_engine.process_lifecycle_cycle(memory_vault)
# 4. Map persistent facts into the knowledge graph
graph_db.add_relation("FastAPI", "backend_of", "Mediscan AI")
graph_facts = graph_db.get_relationships_as_text()
# 5. Build a query-aware compact context
user_query = "What backend options did we settle on for Mediscan AI?"
combined_context = [m["text"] for m in memory_vault.working_memory] + graph_facts
optimized_payload, metrics = attention_compressor.compress_context_for_query(
    query=user_query,
    context_pool=combined_context,
    token_budget=45,
)
print(f"Optimized Prompt Context: {optimized_payload}")
print(f"Token Reduction: {metrics['reduction_percentage']}%")
🧪 Running Validation Demos
The project repository keeps runnable verification scripts under bin/. Run them to watch the math and optimization phases execute live in your terminal:
# Test token utilities and basic compressors
python bin/demo_phase1_foundations.py
python bin/demo_step2.py
# Test hierarchical lifecycle scoring loops
python bin/demo_step3.py
# Test graph relationship mapping
python bin/demo_step5.py
# Test dynamic attention query budgeting
python bin/demo_step6.py
# Run the complete end-to-end processing pipeline
python bin/run_compactpy_pipeline.py
📊 Performance Benchmarks
CompactPy scales robustly with dense context footprints. Below is the empirical efficiency evaluation demonstrating token reduction scaling against processing latency:
- Token Optimization: Token reduction reaches 95% or more on dense, highly redundant contexts by aggressively pruning semantic redundancies and noise.
- Latency Footprint: After initialization, context filtering runs in under 50 ms, which makes it suitable for high-throughput LLM pipelines.
📄 License
Distributed under the MIT License. See LICENSE for more information.
File details
Details for the file compactpy-1.0.1.tar.gz.
File metadata
- Download URL: compactpy-1.0.1.tar.gz
- Upload date:
- Size: 21.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d6ab082e067446452c46b4648a5728cadc0cf8eb8652f5e14e9ffe1915d193a6 |
| MD5 | c046adb604e7724efa700c18d34139ed |
| BLAKE2b-256 | 59e84fd5c9e23a238b2da746d8240eb8f83f90cd6b337078881ccedd15164609 |
File details
Details for the file compactpy-1.0.1-py3-none-any.whl.
File metadata
- Download URL: compactpy-1.0.1-py3-none-any.whl
- Upload date:
- Size: 22.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4e23548528d04b49973f75d1bc2a8b79f22ba0126ca9f1fff37c88c5964a83a3 |
| MD5 | fd68a208320303fadfba05a0b6e3956e |
| BLAKE2b-256 | 0dec083d0b2052e4578dfcecbca12504928f23a237f1960af4fb746c264892b8 |