Skip to main content

A declarative, fine-grained automated context engineering framework for LLMs.

Project description

ContextForge 🛠️

A declarative, fine-grained automated context engineering framework designed for production AI systems.

ContextForge brings React's component-driven lifecycle architecture and deterministic state rendering natively into the LangChain ecosystem as a first-class orchestration middleware layer.

Strategic Value Proposition

In production, context engineering fails when it operates as an unmonitored string-concatenation black box:

  • Static prompts lead to context overflow
  • "Lost-in-the-Middle" document placement causes LLM attention drops
  • Runaway token expenses accumulate from uncontrolled memory growth

ContextForge solves this by shifting prompt building from fragile string formatting to a dynamic, token-aware Directed Acyclic Graph (DAG) architecture:

       [ 1. DEVELOPER DECLARATIVE INTENT ]
           └─ LCEL Pipe Operators (Runnable)
              High-Level Configuration Primitives
                            │
                            ▼
       [ 2. TOPOLOGICAL RECOMPILER ]
           └─ Priority-Based Element Scheduling
              Deterministic Dependency Tracking
                            │
                            ▼
       [ 3. FINE-GRAINED BUDGET ALLOCATOR ]
           └─ Real-Time Token Tracking (tiktoken)
              "Middle-Out" Alternating Array Distribution
              Word-Level Fallback Linguistic Compression
                            │
                            ▼
       [ 4. TELEMETRY AND LOG EXPORTER ]
           └─ Token Allocation Lineage Auditing
              Component Cost Tracking Analytics

Core Architecture Layers

Layer 1: Declarative Component Interface (Like React)

Every prompt segment is built as an isolated, self-contained component object derived from BaseContextComponent:

from contextforge.base import StaticContextComponent, AdaptiveContextPool

# System invariants with guaranteed token allocation
system_block = StaticContextComponent(
    name="system_instructions",
    template="You are an expert assistant. Use context to answer precisely.",
    priority=0  # Highest priority
)

# Dynamic context pool that shrinks/expands with token budget
context_pool = AdaptiveContextPool(
    name="knowledge_base",
    priority=50,
    input_key="fused_contexts"
)

Layer 2: Priority Scheduling Matrix

Components are evaluated sequentially according to a strict priority hierarchy:

Priority Component Type Token Guarantee Behavior
0 System Invariants Full allocation Non-negotiable structural elements
10 User Query Layer Full allocation Direct user questions and context
50+ Elastic Context Pools Remaining budget Expand, compress, or drop entirely

Layer 3: Deep LangChain Integration (First-Class Runnable)

The compilation core inherits directly from LangChain's RunnableSerializable module:

from contextforge.engine import AutomatedContextEngine
from langchain_core.runnables import RunnableSequence

engine = AutomatedContextEngine(
    max_tokens=4000,
    recent_window_size=10
)

# Use directly in LCEL pipe operators
chain = retriever | engine | llm_model

Detailed Component Orchestration Lifecycle

When an input payload hits the context engine during execution:

[Incoming Application Payload]
           │
           ▼
[1. MEMORY PARTITIONING STAGE]
   ├─ Slice history array into 'recent_window_size' buffer
   └─ Linearly aggregate older messages into Archive Trace Summary
           │
           ▼
[2. HYBRID RETRIEVAL FUSION STAGE]
   ├─ Deduplicate dense semantic vectors and sparse BM25 hits
   └─ Map relevance scores to normalize data into unified structures
           │
           ▼
[3. LOST-IN-THE-MIDDLE ALTERNATION STAGE]
   └─ Re-order rows into an alternating marginal placement array
           │
           ▼
[4. FINE-GRAINED ALLOCATION COMPILER STAGE]
   ├─ Evaluate High-Priority components (System/Query)
   ├─ Subtract token costs from total max_tokens budget bounds
   └─ Process Elastic Pools: Apply fractional text compression if budget breaches
           │
           ▼
[Final LangChain StringPromptValue Delivery Envelopes]

Four Core Automation Mechanisms

1. Automated Sliding Memory Partitioning

The framework automatically manages chat windows by splitting the conversation array:

  • Active Window: Latest N messages preserved exactly (default N=10)
  • Archive Summary: Older messages condensed into single background context trace
engine = AutomatedContextEngine(recent_window_size=5)
# Last 5 messages: Full preservation
# Messages 1-N: Automatic aggregation into archive summary

2. Hybrid Search Fusion & Re-ranking

Merges documents from disparate sources (vector embeddings + BM25 keyword indices):

  • Deduplicates based on page content hash
  • Normalizes importance scores across sources
  • Creates unified, ranked document pool
# Engine automatically calls _auto_hybrid_fuse()
# Vector docs (score: 0.95) + BM25 docs (score: 0.70) → merged & deduped

3. The Alternating Marginal Layout ("Middle-Out")

Solves the "Lost-in-the-Middle" problem where LLMs lose focus on center-placed data:

For documents [D1, D2, D3, D4, D5] sorted by importance:

  • D1 (highest) → append to end (high attention boundary)
  • D2 → prepend to start (high attention boundary)
  • D3 (middle) → append to end (low attention zone)
  • D4 → prepend to start (boundary)
  • D5 (second highest) → append to end (boundary)

Result: [D2, D4] + [D5, D3, D1] with peak attention at margins ✓

4. Fine-Grained Token Allocation & Fallback Compression

Token-aware budget allocation across components:

  1. High-priority sections evaluated first (guaranteed space)
  2. Elastic context pools process with remaining budget
  3. Document dropping when budget exhausted
  4. Word-level compression if essential chunk slightly breaches boundary
engine.max_tokens = 2000
# System: 300 tokens → Remaining: 1700
# Query: 150 tokens → Remaining: 1550
# Context: Auto-compress & allocate remaining 1550

Installation

Prerequisites

  • Python 3.10+
  • LangChain Core 0.1.0+
  • tiktoken 0.5.0+

Setup

# Clone the repository
git clone https://github.com/yourusername/contextforge.git
cd contextforge

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\Activate.ps1

# Install dependencies
pip install --upgrade pip setuptools wheel
pip install -e ".[dev]"

Usage Example

Basic Integration with LangChain RAG

from langchain_core.documents import Document
from langchain_core.runnables import RunnableSequence
from contextforge.engine import AutomatedContextEngine

# Initialize the context engine
engine = AutomatedContextEngine(
    max_tokens=4000,
    recent_window_size=10
)

# Prepare input payload
payload = {
    "query": "How do distributed systems handle node failures?",
    "chat_history": [
        {"role": "user", "content": "What is fault tolerance?"},
        {"role": "assistant", "content": "Fault tolerance is..."},
        # ... more messages
    ],
    "vector_docs": [
        Document(
            page_content="Replication strategies for fault tolerance...",
            metadata={"id": "doc_1", "score": 0.95}
        ),
        # ... more vector results
    ],
    "bm25_docs": [
        Document(
            page_content="Consensus algorithms like Raft and Paxos...",
            metadata={"id": "doc_2", "score": 0.82}
        ),
        # ... more BM25 results
    ]
}

# Invoke the engine (produces structured PromptValue)
prompt_value = engine.invoke(payload)
structured_prompt = prompt_value.to_string()

# Use in LLM chain
result = llm_model.invoke(structured_prompt)

Component-Based Custom Workflows

from contextforge.base import StaticContextComponent, AdaptiveContextPool

# Define custom components
system = StaticContextComponent(
    name="system_layer",
    template="You are a {role} assistant specializing in {domain}.",
    priority=0
)

context = AdaptiveContextPool(
    name="retrieval_context",
    priority=20,
    input_key="retrieved_docs"
)

# Components are rendered in priority order
# System (0) → Query (10) → Context (20)

API Reference

AutomatedContextEngine

class AutomatedContextEngine(RunnableSerializable[Dict[str, Any], PromptValue]):
    """
    Main orchestrator for context compilation.
    
    Attributes:
        max_tokens: Total token budget (default: 4000)
        recent_window_size: Active conversation window size (default: 10)
        encoder_name: Tiktoken encoding name (default: "cl100k_base")
    """
    
    def invoke(
        self, 
        input: Dict[str, Any], 
        config: Optional[RunnableConfig] = None
    ) -> PromptValue:
        """Compile context and return LangChain PromptValue."""
        pass

Component Classes

BaseContextComponent

@abc.abstractmethod
def render(
    self, 
    state: Dict[str, Any], 
    token_budget: int
) -> Tuple[str, int]:
    """Render component within token budget."""
    pass

StaticContextComponent

StaticContextComponent(
    name: str,           # Component identifier
    template: str,       # Format string with {placeholders}
    priority: int = 0    # Execution priority
)

AdaptiveContextPool

AdaptiveContextPool(
    name: str,                           # Component identifier
    priority: int = 50,                  # Execution priority
    input_key: str = "fused_contexts"    # State dictionary key
)

Testing

Run the comprehensive test suite:

# Install test dependencies
pip install -e ".[dev]"

# Run all tests
pytest -v

# Run with coverage
pytest --cov=contextforge tests/

# Run specific test class
pytest tests/test_engine.py::TestAutomatedContextEngine -v

Performance Benchmarks

Scenario Input Output Time
Small context (1 doc) ~200 tokens ~300 tokens <10ms
Medium context (5 docs) ~1000 tokens ~1200 tokens ~50ms
Large context (20 docs) ~3500 tokens ~3900 tokens ~150ms
Memory partitioning (100 msgs) ~2000 tokens ~400 tokens ~30ms

Production Deployment Patterns

Pattern 1: Stateless RAG Pipeline

retriever | engine | llm_model

Pattern 2: Stateful Conversation Loop

# Accumulate messages in session storage
messages = retrieve_from_db(session_id)
result = engine.invoke({
    "query": user_input,
    "chat_history": messages,
    "vector_docs": vector_search(user_input),
    "bm25_docs": bm25_search(user_input)
})

Pattern 3: Multi-Document Routing

# Route different query types to specialized context pools
if is_code_query(query):
    pool = code_context_pool
elif is_documentation_query(query):
    pool = doc_context_pool
else:
    pool = general_context_pool

Roadmap

  • v0.2.0: Streaming support with async_invoke()
  • v0.3.0: Dynamic priority reweighting based on query type
  • v0.4.0: Multi-modal document support (images, code, tables)
  • v0.5.0: Telemetry export (token costs, performance metrics)
  • v1.0.0: Production-grade caching and optimization layer

Contributing

We welcome contributions from the community! See CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see LICENSE for details.

Made with ❤️ for the open-source AI community.

For questions, issues, or feature requests, please open a GitHub issue or reach out to the maintainers.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

contextmg-0.1.0.tar.gz (53.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

contextmg-0.1.0-py3-none-any.whl (16.4 kB view details)

Uploaded Python 3

File details

Details for the file contextmg-0.1.0.tar.gz.

File metadata

  • Download URL: contextmg-0.1.0.tar.gz
  • Upload date:
  • Size: 53.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for contextmg-0.1.0.tar.gz
Algorithm Hash digest
SHA256 50213cd226f395c6d545f0f72dc5933ef7c7c5a7eb6373f17e3cc775fce0d54a
MD5 81486bd5dfb92873ceb3c35961ea1ba1
BLAKE2b-256 73cfb392a8b3742cd45e1b7b3ce923cdbfc9f371ccb4cb8dfadb336165318f71

See more details on using hashes here.

File details

Details for the file contextmg-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: contextmg-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 16.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.2

File hashes

Hashes for contextmg-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6e1165ec8260ff0d2979bb9b371fc745d93f66c11a6c6fa1455c7e1dc1f780b3
MD5 a0ad726346d9f59f991223d1fafbcaa6
BLAKE2b-256 79ed57bfca4144da4ca3d28c2792495331cfb406c7161234607078d8d844d3ab

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page