A declarative, fine-grained automated context engineering framework for LLMs.
Project description
ContextForge 🛠️
A declarative, fine-grained automated context engineering framework designed for production AI systems.
ContextForge brings React's component-driven lifecycle architecture and deterministic state rendering natively into the LangChain ecosystem as a first-class orchestration middleware layer.
Strategic Value Proposition
In production, context engineering fails when it operates as an unmonitored string-concatenation black box:
- Static prompts lead to context overflow
- "Lost-in-the-Middle" document placement causes LLM attention drops
- Runaway token expenses accumulate from uncontrolled memory growth
ContextForge solves this by shifting prompt building from fragile string formatting to a dynamic, token-aware Directed Acyclic Graph (DAG) architecture:
[ 1. DEVELOPER DECLARATIVE INTENT ]
└─ LCEL Pipe Operators (Runnable)
High-Level Configuration Primitives
│
▼
[ 2. TOPOLOGICAL RECOMPILER ]
└─ Priority-Based Element Scheduling
Deterministic Dependency Tracking
│
▼
[ 3. FINE-GRAINED BUDGET ALLOCATOR ]
└─ Real-Time Token Tracking (tiktoken)
"Middle-Out" Alternating Array Distribution
Word-Level Fallback Linguistic Compression
│
▼
[ 4. TELEMETRY AND LOG EXPORTER ]
└─ Token Allocation Lineage Auditing
Component Cost Tracking Analytics
Core Architecture Layers
Layer 1: Declarative Component Interface (Like React)
Every prompt segment is built as an isolated, self-contained component object derived from BaseContextComponent:
from contextforge.base import StaticContextComponent, AdaptiveContextPool
# System invariants with guaranteed token allocation
system_block = StaticContextComponent(
name="system_instructions",
template="You are an expert assistant. Use context to answer precisely.",
priority=0 # Highest priority
)
# Dynamic context pool that shrinks/expands with token budget
context_pool = AdaptiveContextPool(
name="knowledge_base",
priority=50,
input_key="fused_contexts"
)
Layer 2: Priority Scheduling Matrix
Components are evaluated sequentially according to a strict priority hierarchy:
| Priority | Component Type | Token Guarantee | Behavior |
|---|---|---|---|
| 0 | System Invariants | Full allocation | Non-negotiable structural elements |
| 10 | User Query Layer | Full allocation | Direct user questions and context |
| 50+ | Elastic Context Pools | Remaining budget | Expand, compress, or drop entirely |
Layer 3: Deep LangChain Integration (First-Class Runnable)
The compilation core inherits directly from LangChain's RunnableSerializable module:
from contextforge.engine import AutomatedContextEngine
from langchain_core.runnables import RunnableSequence
engine = AutomatedContextEngine(
max_tokens=4000,
recent_window_size=10
)
# Use directly in LCEL pipe operators
chain = retriever | engine | llm_model
Detailed Component Orchestration Lifecycle
When an input payload hits the context engine during execution:
[Incoming Application Payload]
│
▼
[1. MEMORY PARTITIONING STAGE]
├─ Slice history array into 'recent_window_size' buffer
└─ Linearly aggregate older messages into Archive Trace Summary
│
▼
[2. HYBRID RETRIEVAL FUSION STAGE]
├─ Deduplicate dense semantic vectors and sparse BM25 hits
└─ Map relevance scores to normalize data into unified structures
│
▼
[3. LOST-IN-THE-MIDDLE ALTERNATION STAGE]
└─ Re-order rows into an alternating marginal placement array
│
▼
[4. FINE-GRAINED ALLOCATION COMPILER STAGE]
├─ Evaluate High-Priority components (System/Query)
├─ Subtract token costs from total max_tokens budget bounds
└─ Process Elastic Pools: Apply fractional text compression if budget breaches
│
▼
[Final LangChain StringPromptValue Delivery Envelopes]
Four Core Automation Mechanisms
1. Automated Sliding Memory Partitioning
The framework automatically manages chat windows by splitting the conversation array:
- Active Window: Latest N messages preserved exactly (default N=10)
- Archive Summary: Older messages condensed into single background context trace
engine = AutomatedContextEngine(recent_window_size=5)
# Last 5 messages: Full preservation
# Messages 1-N: Automatic aggregation into archive summary
2. Hybrid Search Fusion & Re-ranking
Merges documents from disparate sources (vector embeddings + BM25 keyword indices):
- Deduplicates based on page content hash
- Normalizes importance scores across sources
- Creates unified, ranked document pool
# Engine automatically calls _auto_hybrid_fuse()
# Vector docs (score: 0.95) + BM25 docs (score: 0.70) → merged & deduped
3. The Alternating Marginal Layout ("Middle-Out")
Solves the "Lost-in-the-Middle" problem where LLMs lose focus on center-placed data:
For documents [D1, D2, D3, D4, D5] sorted by importance:
- D1 (highest) → append to end (high attention boundary)
- D2 → prepend to start (high attention boundary)
- D3 (middle) → append to end (low attention zone)
- D4 → prepend to start (boundary)
- D5 (second highest) → append to end (boundary)
Result: [D2, D4] + [D5, D3, D1] with peak attention at margins ✓
4. Fine-Grained Token Allocation & Fallback Compression
Token-aware budget allocation across components:
- High-priority sections evaluated first (guaranteed space)
- Elastic context pools process with remaining budget
- Document dropping when budget exhausted
- Word-level compression if essential chunk slightly breaches boundary
engine.max_tokens = 2000
# System: 300 tokens → Remaining: 1700
# Query: 150 tokens → Remaining: 1550
# Context: Auto-compress & allocate remaining 1550
Installation
Prerequisites
- Python 3.10+
- LangChain Core 0.1.0+
- tiktoken 0.5.0+
Setup
# Clone the repository
git clone https://github.com/yourusername/contextforge.git
cd contextforge
# Create virtual environment
python -m venv .venv
source .venv/bin/activate # On Windows: .venv\Scripts\Activate.ps1
# Install dependencies
pip install --upgrade pip setuptools wheel
pip install -e ".[dev]"
Usage Example
Basic Integration with LangChain RAG
from langchain_core.documents import Document
from langchain_core.runnables import RunnableSequence
from contextforge.engine import AutomatedContextEngine
# Initialize the context engine
engine = AutomatedContextEngine(
max_tokens=4000,
recent_window_size=10
)
# Prepare input payload
payload = {
"query": "How do distributed systems handle node failures?",
"chat_history": [
{"role": "user", "content": "What is fault tolerance?"},
{"role": "assistant", "content": "Fault tolerance is..."},
# ... more messages
],
"vector_docs": [
Document(
page_content="Replication strategies for fault tolerance...",
metadata={"id": "doc_1", "score": 0.95}
),
# ... more vector results
],
"bm25_docs": [
Document(
page_content="Consensus algorithms like Raft and Paxos...",
metadata={"id": "doc_2", "score": 0.82}
),
# ... more BM25 results
]
}
# Invoke the engine (produces structured PromptValue)
prompt_value = engine.invoke(payload)
structured_prompt = prompt_value.to_string()
# Use in LLM chain
result = llm_model.invoke(structured_prompt)
Component-Based Custom Workflows
from contextforge.base import StaticContextComponent, AdaptiveContextPool
# Define custom components
system = StaticContextComponent(
name="system_layer",
template="You are a {role} assistant specializing in {domain}.",
priority=0
)
context = AdaptiveContextPool(
name="retrieval_context",
priority=20,
input_key="retrieved_docs"
)
# Components are rendered in priority order
# System (0) → Query (10) → Context (20)
API Reference
AutomatedContextEngine
class AutomatedContextEngine(RunnableSerializable[Dict[str, Any], PromptValue]):
"""
Main orchestrator for context compilation.
Attributes:
max_tokens: Total token budget (default: 4000)
recent_window_size: Active conversation window size (default: 10)
encoder_name: Tiktoken encoding name (default: "cl100k_base")
"""
def invoke(
self,
input: Dict[str, Any],
config: Optional[RunnableConfig] = None
) -> PromptValue:
"""Compile context and return LangChain PromptValue."""
pass
Component Classes
BaseContextComponent
@abc.abstractmethod
def render(
self,
state: Dict[str, Any],
token_budget: int
) -> Tuple[str, int]:
"""Render component within token budget."""
pass
StaticContextComponent
StaticContextComponent(
name: str, # Component identifier
template: str, # Format string with {placeholders}
priority: int = 0 # Execution priority
)
AdaptiveContextPool
AdaptiveContextPool(
name: str, # Component identifier
priority: int = 50, # Execution priority
input_key: str = "fused_contexts" # State dictionary key
)
Testing
Run the comprehensive test suite:
# Install test dependencies
pip install -e ".[dev]"
# Run all tests
pytest -v
# Run with coverage
pytest --cov=contextforge tests/
# Run specific test class
pytest tests/test_engine.py::TestAutomatedContextEngine -v
Performance Benchmarks
| Scenario | Input | Output | Time |
|---|---|---|---|
| Small context (1 doc) | ~200 tokens | ~300 tokens | <10ms |
| Medium context (5 docs) | ~1000 tokens | ~1200 tokens | ~50ms |
| Large context (20 docs) | ~3500 tokens | ~3900 tokens | ~150ms |
| Memory partitioning (100 msgs) | ~2000 tokens | ~400 tokens | ~30ms |
Production Deployment Patterns
Pattern 1: Stateless RAG Pipeline
retriever | engine | llm_model
Pattern 2: Stateful Conversation Loop
# Accumulate messages in session storage
messages = retrieve_from_db(session_id)
result = engine.invoke({
"query": user_input,
"chat_history": messages,
"vector_docs": vector_search(user_input),
"bm25_docs": bm25_search(user_input)
})
Pattern 3: Multi-Document Routing
# Route different query types to specialized context pools
if is_code_query(query):
pool = code_context_pool
elif is_documentation_query(query):
pool = doc_context_pool
else:
pool = general_context_pool
Roadmap
- v0.2.0: Streaming support with
async_invoke() - v0.3.0: Dynamic priority reweighting based on query type
- v0.4.0: Multi-modal document support (images, code, tables)
- v0.5.0: Telemetry export (token costs, performance metrics)
- v1.0.0: Production-grade caching and optimization layer
Contributing
We welcome contributions from the community! See CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see LICENSE for details.
Made with ❤️ for the open-source AI community.
For questions, issues, or feature requests, please open a GitHub issue or reach out to the maintainers.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file contextmg-0.1.0.tar.gz.
File metadata
- Download URL: contextmg-0.1.0.tar.gz
- Upload date:
- Size: 53.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
50213cd226f395c6d545f0f72dc5933ef7c7c5a7eb6373f17e3cc775fce0d54a
|
|
| MD5 |
81486bd5dfb92873ceb3c35961ea1ba1
|
|
| BLAKE2b-256 |
73cfb392a8b3742cd45e1b7b3ce923cdbfc9f371ccb4cb8dfadb336165318f71
|
File details
Details for the file contextmg-0.1.0-py3-none-any.whl.
File metadata
- Download URL: contextmg-0.1.0-py3-none-any.whl
- Upload date:
- Size: 16.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
6e1165ec8260ff0d2979bb9b371fc745d93f66c11a6c6fa1455c7e1dc1f780b3
|
|
| MD5 |
a0ad726346d9f59f991223d1fafbcaa6
|
|
| BLAKE2b-256 |
79ed57bfca4144da4ca3d28c2792495331cfb406c7161234607078d8d844d3ab
|