Context Relay Protocol — unbounded context, unbounded generation, amplified reasoning for LLMs
Context Relay Protocol (CRP)™
An open protocol for structured context management across LLM invocations.
Quick Start • The Problem • Solution • Inter-LLM Sharing • Benchmarks • Specification • SDKs • Community
MCP gives agents tools. A2A lets agents talk. CRP gives every agent unbounded context, unbounded generation, and amplified reasoning — the foundation both protocols assume but neither provides.
Table of Contents
- The Problem
- What CRP Does
- Key Differentiators
- Quick Start
- How CRP Works
- Architecture Overview
- Core Capabilities
- Inter-LLM Context Sharing (HTTP Sidecar)
- End-to-End Example: Penetration Test
- CRP in the AI Stack
- Extraction Quality
- Efficiency and Cost
- Observability and Auditing
- Limitations and Trade-offs
- Why Large Context Windows Are Not Enough
- Specification Documents
- JSON Schemas
- API Surface
- SDK Status
- Comparison with Alternatives
- Hardware Requirements
- Configuration
- Use Cases
- Roadmap
- Contributing
- Governance
- Security
- Community
- Built With
- Intellectual Property & License
The Problem
Every agentic AI system forces its LLM to work inside a single, shared context window. Planning, reasoning, tool calling, analysis, memory, and output generation all compete for the same finite token budget. This creates three compounding failures:
| Failure | What Happens | Impact |
|---|---|---|
| Context Contamination | Tool output from step 3 dilutes reasoning for step 12 | The LLM "forgets" early discoveries. Later decisions degrade |
| Attention Collapse | At 30K+ tokens, attention spreads thin over irrelevant content | Critical facts in the middle are effectively invisible (Liu et al., 2023) |
| Hard Ceiling | When the context window fills, the system truncates or stops | Reports are incomplete. Analysis is shallow. Output is arbitrarily cut short |
These aren't edge cases — they happen on every non-trivial agentic task and get worse the more capable your agent becomes.
What CRP Does
CRP is a middleware layer that wraps your existing LLM calls. It does NOT replace your LLM — it amplifies it.
For every LLM call you already make, CRP:
- Builds a better prompt — adds an envelope of relevant historical facts, source passages, and the LLM's own synthesis alongside your system prompt and task input
- Calls YOUR LLM — through your existing provider and infrastructure
- Returns the raw output unchanged — exactly what the LLM generated, not a filtered version
- Observes the output (read-only) — extracts facts into the knowledge fabric so future windows benefit
- Carries the LLM's understanding forward — progressive synthesis evolves across windows
- Scaffolds reasoning — decomposes complex tasks into micro-steps for models that can't chain-of-thought natively
```text
WITHOUT CRP                          WITH CRP

One shared window,                   N dedicated windows,
everything competing:                each pristine:

+---------------------------+        +----------+ +----------+ +----------+
| System prompt             |        | System   | | System   | | System   |
| + Tool schemas (10K tok)  |        | Envelope | | Envelope | | Envelope |
| + Tool output #1-#3       |        | Task     | | Task     | | Task     |
| + Reasoning history       |        |          | |          | |          |
| + Prior conversation      |        | Full     | | Full     | | Full     |
| + Current task (buried)   |        | 128K     | | 128K     | | 128K     |
+---------------------------+        +----------+ +----------+ +----------+

Total capacity: 128K (fixed)         Total capacity: N × 128K (unbounded)
Quality: degrades with length        Quality: peak per window (tier-reported)
Input limit: context window          Input limit: unbounded (auto-ingest)
Output limit: max_output_tokens      Output limit: unbounded (continuation)
```
Key Differentiators
- Embedded library, not a server — zero deployment overhead. `pip install crprotocol` and you're running. No Docker, no infrastructure. Optional HTTP sidecar (`crp serve`) for inter-LLM context sharing — never started automatically
- Works with any LLM provider — auto-detected, 3 fields to configure. Built-in adapters for OpenAI, Anthropic, Ollama, and llama.cpp — plus `CustomProvider` to wrap any LLM in 3 lines
- Structured knowledge extraction — 6-stage graduated pipeline (regex → statistical NLP → GLiNER NER → UIE relations → RST discourse → LLM-assisted relational). Not just text chunking
- Contextual Knowledge Fabric (CKF) — graph-structured knowledge with 4-mode retrieval (graph walk + pattern query + semantic fallback + community summaries), event-sourced history, and cross-session persistence
- Unbounded input — automatically ingests documents larger than any model's context window through structure-aware chunking with protected spans
- Unbounded output — automatic continuation with voice profile preservation, document maps, degradation-triggered re-grounding, and content-type-aware stitching
- Honest quality guarantees — a degradation model, not magic claims. Quality tiers S through D, reported with every dispatch. Extraction recall percentages published per stage
- Cross-session knowledge — sessions build on each other. CKF persists facts, reasoning traces, and graph structure across sessions
- Reasoning amplification — meta-learning scaffolds (ORC + ICML + RTL) enable 2B–7B models to perform multi-step reasoning they cannot do natively
- Zero in-window overhead — CRP operates entirely outside the LLM's context window. No protocol tokens, no function call schemas, no memory management instructions inside the window
- Full observability — per-window metrics, session dashboards, window DAG traceability, telemetry export. Debug "why did it do that?" by tracing decisions through the DAG
Quick Start
Minimal Integration (3 lines)
```python
import crp

# Auto-detects your LLM from environment (OPENAI_API_KEY, ANTHROPIC_API_KEY, or Ollama)
client = crp.Client()

output, report = client.dispatch(
    system_prompt="You are a helpful assistant.",
    task_input="Summarize this document: ..."
)
# output = raw LLM output, unmodified
# report.quality_tier = "S" | "A" | "B" | "C" | "D"
```
Explicit Provider
```python
from crp import Client
from crp.providers import OpenAIAdapter

client = Client(provider=OpenAIAdapter(model="gpt-4o"))

output, report = client.dispatch(
    system_prompt="You are a helpful assistant.",
    task_input="Summarize this document: ..."
)
```
Model Name Shortcut
```python
import crp

# Pass model= for automatic provider detection
client = crp.Client(model="claude-sonnet-4-20250514")  # → AnthropicAdapter
client = crp.Client(model="gpt-4o")                    # → OpenAIAdapter
client = crp.Client(model="llama3.1")                  # → OllamaAdapter
```
Local Models (Zero-Config)
```python
from crp import Client
from crp.providers import OllamaAdapter

client = Client(provider=OllamaAdapter())  # Auto-detects localhost:11434

output, report = client.dispatch(
    system_prompt="You are a security analyst.",
    task_input="Analyze these scan results: ..."
)
```
llama.cpp / vLLM
```python
from crp import Client
from crp.providers import LlamaCppAdapter

client = Client(provider=LlamaCppAdapter(server_url="http://localhost:8080"))
output, report = client.dispatch(system_prompt=system, task_input=user_message)
```
Any Custom Setup
```python
from crp import Client
from crp.providers import CustomProvider

def my_generate(messages, **kw):
    # Your existing LLM function
    return ("response text", "stop")  # (output, finish_reason)

client = Client(provider=CustomProvider(
    generate_fn=my_generate,
    count_tokens_fn=lambda text: len(text) // 4,
    context_size=128000,
))
output, report = client.dispatch(system_prompt=system, task_input=user_message)
```
Direct Ingestion (No LLM Window)
```python
client.ingest(nmap_output)   # ~7ms, extraction only — no LLM call
client.ingest(nikto_output)  # Facts go to warm state automatically
client.ingest(api_response)  # Available in next window's envelope
```
LLM Compatibility
| API Style | Provider | Examples |
|---|---|---|
| Chat completions | `OpenAIAdapter`, `AnthropicAdapter`, `OllamaAdapter` | OpenAI, Anthropic, Ollama |
| HTTP completions | `LlamaCppAdapter` | llama.cpp, any OpenAI-compatible HTTP endpoint |
| Any custom setup | `CustomProvider` | Any function that takes messages → returns (text, reason) |
Configuration
```bash
# .env — ALL optional (CRP auto-detects LLM from API keys or local Ollama)
CRP_ENABLED=true             # Master switch (default: enabled)
CRP_LOG_ENVELOPES=false      # Debug logging (default: false)
CRP_MAX_CONTINUATIONS=50     # Safety limit on continuation windows
```
Async Support
```python
# Works with FastAPI, asyncio, any async framework
output, report = await client.async_dispatch("You are helpful.", "Explain CRP.")
facts_count = await client.async_ingest(text, label="docs")

async for event in client.async_dispatch_stream("You are helpful.", "Explain CRP."):
    if event.event_type == "token":
        print(event.data, end="")

await client.async_close()
```
More examples: see `examples/` for runnable scripts — quickstart, multi-turn, ingestion, streaming, async, and provider selection.
How CRP Works
Four Core Mechanisms
All operate outside the LLM — zero protocol tokens inside the model's window.
1. Task Isolation
Every LLM call gets its own dedicated context window containing: system prompt, context envelope, and task input. Nothing else. CRP does NOT add LLM calls — every crp.dispatch() maps 1:1 to calls your application already makes. The only "extra" windows are continuations when output hits the physical limit.
2. Context Envelopes + Knowledge Fabric
Between windows, an envelope carries forward everything the next window needs. Built by extraction (not summarization) — atomic facts and relationships are pulled from output using a graduated 6-stage pipeline, stored in the Contextual Knowledge Fabric (CKF) — a fact graph with typed edges, event-sourced history, community detection, and multi-mode retrieval:
- Graph Walk — traverse edges from seed facts (2-hop BFS) to reconstruct the subgraph around the task's focal point
- Pattern Query — content-addressable structured matching inspired by tuple spaces (Gelernter, 1985)
- Semantic Fallback — traditional ANN cosine similarity when graph structure is insufficient
- Community Summaries — Leiden community detection produces topic clusters; summaries provide high-level context
Facts are scored by multi-aspect semantic similarity with cross-encoder reranking, and packed greedily with dependency-aware graph packing until the window is full.
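The greedy, dependency-aware packing step can be sketched in a few lines. This is an illustrative model only, not CRP's implementation: the fact fields (`id`, `score`, `tokens`, `deps`) and the packing policy are assumptions.

```python
# Illustrative sketch of greedy, dependency-aware envelope packing: facts are
# taken in score order, and a fact drags its prerequisite facts in with it so
# the envelope never contains a dangling reference. Field names are assumptions.
def pack_envelope(facts, budget_tokens):
    by_id = {f["id"]: f for f in facts}
    packed, used = [], 0
    for fact in sorted(facts, key=lambda f: f["score"], reverse=True):
        group = [by_id[d] for d in fact.get("deps", [])] + [fact]
        group = [g for g in group if g not in packed]    # skip already-packed
        cost = sum(g["tokens"] for g in group)
        if used + cost <= budget_tokens:                 # greedy: take if it fits
            packed.extend(group)
            used += cost
    return packed

facts = [
    {"id": "cve",  "score": 0.9, "tokens": 30, "deps": ["svc"]},
    {"id": "svc",  "score": 0.4, "tokens": 20, "deps": []},
    {"id": "misc", "score": 0.6, "tokens": 80, "deps": []},
]
env = pack_envelope(facts, budget_tokens=60)
print([f["id"] for f in env])   # ['svc', 'cve'], dependency pulled in first
```

The high-scoring CVE fact brings its lower-scoring service fact along, while the large unrelated fact is dropped for lack of budget.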
3. Multi-Signal Completion Detection
The protocol monitors four signals across windows:
| Signal | What It Measures | Dominates For |
|---|---|---|
| Fact Flow | New facts per token | Entity-rich content |
| Structural Flow | New headings/paragraphs/list items | Structured documents |
| Vocabulary Novelty | New n-grams vs. seen n-grams | Creative/discursive content |
| Structural Completion | Conclusion detection | Summaries and conclusions |
Signals are weighted by content type — preventing premature termination of conclusions, summaries, and rhetorical passages that produce few new facts but are genuine content.
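A minimal sketch of content-type-weighted completion scoring follows. The weights and threshold here are illustrative assumptions (the real protocol self-calibrates them); only the signal names come from the table above.

```python
# Weighted multi-signal completion scoring, a sketch under assumed weights.
WEIGHTS = {
    "entity_rich": {"fact_flow": 0.5, "structural_flow": 0.2,
                    "vocab_novelty": 0.2, "structural_completion": 0.1},
    "creative":    {"fact_flow": 0.1, "structural_flow": 0.2,
                    "vocab_novelty": 0.5, "structural_completion": 0.2},
}

def is_complete(signals: dict, content_type: str, threshold: float = 0.15) -> bool:
    """Weighted activity score; below threshold means generation has converged."""
    w = WEIGHTS[content_type]
    activity = sum(w[name] * signals[name] for name in w)
    return activity < threshold

# Creative prose producing few new facts but lots of new vocabulary is NOT
# judged complete: vocabulary novelty dominates for that content type.
signals = {"fact_flow": 0.02, "structural_flow": 0.05,
           "vocab_novelty": 0.6, "structural_completion": 0.0}
print(is_complete(signals, "entity_rich"))  # True  (low fact flow dominates)
print(is_complete(signals, "creative"))     # False (high vocabulary novelty)
```

The same raw signals yield opposite decisions depending on content type, which is exactly how premature termination of discursive passages is avoided.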
4. Envelope-Based Continuation
When output hits the physical limit, CRP:
- Incrementally extracts facts from the new window's output — `O(N)` per window, not `O(N²)` accumulated
- Identifies what's missing via multi-level gap analysis
- Builds a continuation envelope with voice profile + document map + structural state for long-chain coherence
- Dispatches a fresh window — the continuation sees extracted essence, not raw overlap
What CRP Sends to Your LLM
```python
# You call:
response = client.dispatch(
    system_prompt="You are a security analyst.",
    task_input="Analyze these nmap results: ..."
)

# CRP constructs and sends to YOUR LLM:
messages = [
    {"role": "system", "content": "You are a security analyst."},     # UNCHANGED
    {"role": "user", "content": envelope_text + "\n\n" + task_input}  # envelope ADDED
]
```
Your system prompt and task input pass through unchanged. The envelope is additional context — historical facts from prior windows, scored by relevance. The LLM doesn't know CRP exists. Zero protocol overhead inside the window.
Output Guarantee
dispatch() returns the complete, unmodified LLM output. Always. Extraction is a read-only side effect — it never modifies, filters, or summarizes the returned string.
Architecture Overview
```text
+--------------------------------------------------------------------+
|                          YOUR APPLICATION                          |
|      (any code that calls an LLM — agents, pipelines, reports)     |
+--------------------------------------------------------------------+
                 |
                 | crp.dispatch(system_prompt, task_input)
                 v
+--------------------------------------------------------------------+
|                          CRP ORCHESTRATOR                          |
|                                                                    |
| +-----------------+ +-----------------+ +-----------------------+  |
| | Envelope        | | Warm State      | | Extraction Pipeline   |  |
| | Builder         | | Store + Fact    | | (Blackboard-Reactive) |  |
| | (multi-aspect   | | Graph + Event   | | regex → stat → NER →  |  |
| | scoring +       | | Log             | | UIE → discourse →     |  |
| | cross-encoder   | | (session facts, | | LLM-relational        |  |
| | reranking +     | | scored +        | | (graduated, content-  |  |
| | CKF multi-mode  | | embedded +      | | type-adaptive,        |  |
| | retrieval +     | | graph edges +   | | self-gating)          |  |
| | source          | | FactEvents)     | |                       |  |
| | grounding)      | |                 | |                       |  |
| +-----------------+ +-----------------+ +-----------------------+  |
|                                                                    |
| +-----------------+ +-----------------+ +-----------------------+  |
| | Multi-Signal    | | Continuation    | | CKF (Knowledge        |  |
| | Completion +    | | Manager         | | Fabric)               |  |
| | Degradation     | | (auto-ingest,   | | graph walk +          |  |
| | Monitor         | | gap analysis,   | | pattern query +       |  |
| | (fact flow +    | | stitch,         | | semantic fallback +   |  |
| | structural +    | | voice profile,  | | community summary +   |  |
| | vocabulary +    | | document map,   | | pub-sub events +      |  |
| | chain degr.)    | | re-grounding)   | | cross-session graph)  |  |
| +-----------------+ +-----------------+ +-----------------------+  |
|                                                                    |
| +-----------------+ +-----------------+ +-----------------------+  |
| | Source          | | LLM Context     | | Meta-Learning         |  |
| | Grounding       | | Curator         | | Engine                |  |
| | Engine          | | (periodic       | | (ORC: orchestrated    |  |
| | (original text  | | curation        | | reasoning chains,     |  |
| | passages in     | | windows,        | | ICML: in-context      |  |
| | envelopes,      | | progressive     | | meta-learning,        |  |
| | dual-layer      | | understanding,  | | RTL: reasoning        |  |
| | fact+source)    | | LLM synthesis)  | | template library)     |  |
| +-----------------+ +-----------------+ +-----------------------+  |
+--------------------------------------------------------------------+
                 |
                 | Standard LLM API call (unchanged)
                 v
      +------------------------+
      |    LLM (any model)     |
      |    Local or cloud      |
      +------------------------+
```
Core Capabilities
Unbounded Context: Input > Model's Window
Problem: Your input is 1M tokens but your model has 128K context.
CRP's auto-ingest handles this transparently:
- Detects overflow: `system_prompt + task_input + generation_reserve > context_window`
- Structure-aware chunking at natural boundaries with protected spans (code blocks, tables, JSON objects are never split). 500-token overlap with boundary reconciliation
- Extracts facts from each chunk — zero LLM calls for typical content
- Builds envelope with multi-aspect scoring, cross-encoder reranking, and dependency-aware graph packing
- Dispatches with a maximally-saturated context window
```python
# Transparent — the user doesn't manage chunking
result = crp.dispatch(
    system_prompt="You are a legal analyst.",
    task_input=million_token_contract  # CRP handles the rest
)
```
Strictly better than truncation (which loses 87% of 1M input on a 128K model). See 02_CORE_PROTOCOL.md §7.6 for the honest degradation model.
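Structure-aware chunking with a protected span can be sketched as below. The boundary rules, the `len // 4` token estimate, and the one-paragraph overlap are simplifying assumptions, not CRP's actual chunker.

```python
# Sketch of structure-aware chunking. Assumptions: paragraphs are the natural
# boundary, fenced code blocks are protected spans that are never split, and
# token counts are approximated as len(text) // 4.
FENCE = "`" * 3  # built programmatically so this example can live in a README

def atoms(text):
    """Split into paragraphs, keeping each fenced code block as one atom."""
    parts, buf, in_fence = [], [], False
    for line in text.split("\n"):
        if line.startswith(FENCE):
            in_fence = not in_fence
            buf.append(line)
            if not in_fence:          # fence closed: emit the protected span
                parts.append("\n".join(buf))
                buf = []
        elif not in_fence and line == "":
            if buf:                   # blank line ends a paragraph
                parts.append("\n".join(buf))
                buf = []
        else:
            buf.append(line)
    if buf:
        parts.append("\n".join(buf))
    return parts

def chunks(text, max_tokens=1000, overlap=1):
    est = lambda s: len(s) // 4       # crude token estimate (assumption)
    out, cur = [], []
    for a in atoms(text):
        if cur and est("\n\n".join(cur + [a])) > max_tokens:
            out.append("\n\n".join(cur))
            cur = cur[-overlap:]      # paragraph overlap for reconciliation
        cur.append(a)
    if cur:
        out.append("\n\n".join(cur))
    return out

doc = f"para one\n\npara two\n\n{FENCE}py\nx = 1\n\ny = 2\n{FENCE}\n\npara three"
parts = chunks(doc, max_tokens=10)
print(len(parts))                     # 2, and the code block is never split
```

The blank line inside the code block does not create a chunk boundary; the whole fence travels as one atom, including into the overlap region.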
Unbounded Generation: Output > Model's Limit
Problem: Your model outputs 4K tokens per call, but you need 100K.
CRP's continuation loop handles this automatically:
- LLM generates → hits output limit (`finish_reason: "length"`)
- CRP incrementally extracts facts from the output — `O(N)` not `O(N²)`
O(N)notO(N²) - Runs multi-level gap analysis
- Builds continuation envelope: facts + structural state + remaining items + voice profile + document map
- Dispatches fresh window — full context capacity, no attention degradation
- Stitches outputs with content-type-aware boundary detection, echo detection, heading hierarchy validation
- Periodically runs re-grounding windows that re-extract from accumulated output to correct warm state drift
- Repeats until multi-signal completion detection indicates genuine completion
```python
result = crp.dispatch(
    system_prompt="Write a comprehensive security report.",
    task_input="All findings here...",
    max_continuations=50  # Optional safety limit
)
# result contains the full output, stitched from multiple windows
```
Peak Quality Per Window
Every window gets the model's full context capacity. The envelope fills all remaining space with semantically-ranked facts. Fresh KV cache per window eliminates attention degradation. Self-calibrating weights and thresholds require zero configuration.
Concurrency Model
Each CRPOrchestrator instance is single-threaded by design — one dispatch at a time per session. Different sessions (separate CRPOrchestrator instances) are fully isolated and can run concurrently without interference. Each session has its own WarmStateStore, FactGraph, WindowDAG, and event log. To process multiple tasks concurrently, create one orchestrator per task.
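The one-orchestrator-per-task pattern can be sketched with a stub standing in for `CRPOrchestrator` (the stub's fields are assumptions). Because each instance owns its own state, concurrent sessions never contend.

```python
# One orchestrator per task: a fresh instance per session keeps state isolated.
from concurrent.futures import ThreadPoolExecutor

class StubOrchestrator:
    """Stands in for one CRP session: private fact store, private window log."""
    def __init__(self, session_id):
        self.session_id = session_id
        self.facts = []              # per-session warm state, never shared

    def dispatch(self, task):
        self.facts.append(f"fact from {task}")
        return f"[{self.session_id}] handled {task}"

tasks = ["scan-a", "scan-b", "scan-c"]
with ThreadPoolExecutor() as pool:
    # A fresh orchestrator per task keeps sessions fully isolated.
    results = list(pool.map(lambda t: StubOrchestrator(t).dispatch(t), tasks))
print(results)
```

`ThreadPoolExecutor.map` preserves input order, so results line up with tasks even though the sessions ran concurrently.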
Inter-LLM Context Sharing (HTTP Sidecar)
Optional. The sidecar is never started automatically. You must explicitly run `crp serve` to enable it.
CRP includes an HTTP sidecar that exposes the full protocol surface over REST, enabling multiple applications — potentially using different LLMs — to share extracted knowledge without direct LLM-to-LLM communication.
Why This Matters
Application A (Claude) extracts facts about code architecture. Application B (GPT-4) receives those facts via the /facts/share endpoint. Both benefit from the other's knowledge — without API key sharing, without prompt injection, without any LLM talking to another LLM. The knowledge flows through CRP's structured extraction layer.
This is not a chat relay. It is structured, scored, ranked knowledge transfer.
Quick Start
```bash
# Start the sidecar (loopback only, no auth — local development)
crp serve

# Start with authentication (recommended)
crp serve --auth-token "my-secret-token"

# Bind to all interfaces (REQUIRES auth token)
crp serve --bind-all --auth-token "my-secret-token" --port 9470
```
Example: Two LLMs Sharing Knowledge
```bash
# 1. Create sessions for two different applications
SESSION_A=$(curl -s -X POST http://localhost:9470/sessions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "claude-app", "context_window": 128000}' | python -c "import sys,json; print(json.load(sys.stdin)['session_id'])")

SESSION_B=$(curl -s -X POST http://localhost:9470/sessions \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"model": "gpt4-app", "context_window": 128000}' | python -c "import sys,json; print(json.load(sys.stdin)['session_id'])")

# 2. Application A ingests data and dispatches
curl -X POST http://localhost:9470/sessions/$SESSION_A/ingest \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"text": "The authentication module uses bcrypt with cost factor 12..."}'

curl -X POST http://localhost:9470/sessions/$SESSION_A/dispatch \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"system_prompt": "You are a security analyst.", "task_input": "Analyze the auth module."}'

# 3. Share Application A's knowledge → Application B
curl -X POST http://localhost:9470/sessions/$SESSION_A/facts/share \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"target_session_id": "'$SESSION_B'", "min_confidence": 0.5}'

# 4. Application B now has A's extracted facts in its warm state.
#    Its next dispatch will include those facts in the envelope.
curl -X POST http://localhost:9470/sessions/$SESSION_B/dispatch \
  -H "Authorization: Bearer $TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"system_prompt": "You are a code reviewer.", "task_input": "Review the auth module for best practices."}'

# → GPT-4 now sees Claude's extracted security facts in its context envelope
```
Full Endpoint Reference
Session Lifecycle
| Method | Endpoint | Description |
|---|---|---|
| POST | `/sessions` | Create a new CRP session |
| GET | `/sessions` | List sessions (owned by caller only) |
| GET | `/sessions/:id/status` | Session metrics and health |
| POST | `/sessions/:id/close` | Close and clean up session |
Dispatch (All 6 Variants)
| Method | Endpoint | Description |
|---|---|---|
| POST | `/sessions/:id/dispatch` | Basic dispatch |
| POST | `/sessions/:id/dispatch/tools` | Tool-mediated dispatch |
| POST | `/sessions/:id/dispatch/reflexive` | Reflexive (verify) dispatch |
| POST | `/sessions/:id/dispatch/progressive` | Progressive dispatch |
| POST | `/sessions/:id/dispatch/stream-augmented` | Stream-augmented dispatch |
| POST | `/sessions/:id/dispatch/agentic` | Agentic dispatch |
Knowledge
| Method | Endpoint | Description |
|---|---|---|
| POST | `/sessions/:id/ingest` | Ingest raw text (extraction only, no LLM call) |
| GET | `/sessions/:id/facts` | Query extracted facts (with `?limit=` and `?min_confidence=`) |
| POST | `/sessions/:id/facts/share` | Share facts to another session (core feature) |
| POST | `/sessions/:id/facts/feedback` | Boost, penalize, or reject a fact |
| GET | `/sessions/:id/envelope` | Preview envelope contents |
Admin
| Method | Endpoint | Description |
|---|---|---|
| POST | `/sessions/:id/providers` | Register a fallback provider |
| POST | `/sessions/:id/estimate` | Cost estimation |
| GET | `/health` | Health check (session count, auth status, version) |
Security Model
The sidecar is designed with defense-in-depth. Every layer is enforced on every request.
| Layer | Protection | Detail |
|---|---|---|
| Bind address | Loopback by default | Binds to 127.0.0.1 — only local processes can connect |
| Authentication | Bearer token | --auth-token enables timing-safe (secrets.compare_digest) token verification |
| Bind-all gate | --bind-all requires auth | Cannot expose to network without --auth-token (or explicit --allow-unauthenticated override) |
| Session ownership | Token-hash binding | Sessions are bound to the SHA-256 hash of the token that created them. Other tokens get 403 Forbidden |
| Rate limiting | Per-IP burst window | Default 120 req/60s per IP. Configurable via --rate-limit. Uses monotonic clock (immune to clock drift) |
| Body size limit | 10 MB cap | Requests exceeding 10 MB receive 413 Payload Too Large. Prevents memory exhaustion |
| Session cap | 64 concurrent sessions | Returns 503 Service Unavailable when exceeded. Configurable via --max-sessions |
| Security headers | On every response | X-Content-Type-Options: nosniff, Cache-Control: no-store |
| No HTTPS | By design | Deploy behind a TLS-terminating reverse proxy (nginx, Caddy) for production |
CLI Options
```text
crp serve [OPTIONS]

Options:
  --port INTEGER            Port number (default: 9470)
  --bind-all                Bind to 0.0.0.0 (requires --auth-token)
  --auth-token TEXT         Bearer token for authentication
  --allow-unauthenticated   Override auth requirement for --bind-all
  --max-sessions INTEGER    Max concurrent sessions (default: 64)
  --rate-limit INTEGER      Max requests per IP per 60s (default: 120)
```
Integration with CRP Protocol
The sidecar is a thin HTTP layer over the same CRPOrchestrator that the Python SDK uses directly. Every session created via the sidecar is a full CRP session with:
- All 6 extraction stages (regex → statistical → NER → UIE → discourse → LLM-relational)
- Contextual Knowledge Fabric (CKF) with graph walk, pattern query, semantic fallback, community summaries
- Multi-signal completion detection and automatic continuation
- Envelope building with multi-aspect scoring and cross-encoder reranking
- Event emission for all pipeline stages (`fact.shared`, `fact.received`, `dispatch.completed`, etc.)
- RBAC enforcement, budget tracking, and cost estimation
The sidecar adds no protocol modifications. A fact extracted via the sidecar is identical to one extracted via client.dispatch(). A session created via HTTP behaves identically to one created via Python.
End-to-End Example: A Penetration Test
Your pentest application already has separate LLM calls for planning, tool selection, analysis, and reporting. With CRP, each llm.generate() becomes crp.dispatch(). CRP does not add, remove, or restructure your calls.
Step 1: Planning
```python
plan = crp.dispatch(
    system_prompt="You are a penetration testing planner...",
    task_input="Create a pentest plan for target 192.168.1.50. Scope: external, web focus."
)
```
| Phase | What Happens | Time |
|---|---|---|
| Envelope | Empty (first window — cold start) | 0ms |
| LLM generates | Phase 1: Recon. Phase 2: Web vuln. Phase 3: Exploitation. Phase 4: Reporting | ~3s |
| Extraction | regex captures "192.168.1.50"; statistical: "nmap", "nikto" = 8 facts | ~6ms |
| Warm state | 8 facts with embeddings | — |
Step 2: Tool Selection
```python
tool_choice = crp.dispatch(
    system_prompt="You are a security tool selector...",
    task_input="Select and configure the first tool for recon of 192.168.1.50"
)
```
| Phase | What Happens | Time |
|---|---|---|
| Envelope | 8 facts from Step 1, scored by similarity to "tool selection for recon" | ~3ms |
| LLM generates | "Run: nmap -sV -sC -p- 192.168.1.50" | ~2s |
| Extraction | regex: full nmap command; statistical: "version detection" = 6 new facts | ~5ms |
| Warm state | Now 14 facts (8 + 6) | — |
Step 3: Tool Execution + Ingestion
```python
nmap_result = run_tool("nmap", "-sV -sC -p- 192.168.1.50")  # Your tool runner
crp.ingest(nmap_result)  # ~7ms extraction; 22 new facts (ports, services, versions)
```
No LLM call. Extraction pipeline processes raw tool output directly. Warm state: 36 facts.
Step 4: Analysis
| Phase | What Happens | Time |
|---|---|---|
| Envelope | 36 facts scored for "vulnerability analysis". ~4200 tokens of dense, relevant context | ~4ms |
| LLM generates | "Critical: Apache 2.4.52 — CVE-2024-XXXX. High: OpenSSH 8.2 — known auth bypass..." | ~5s |
| Extraction | 12 new facts: CVEs, severity ratings, affected services, attack vectors | ~8ms |
| Warm state | 48 facts | — |
Step 5: Report Generation + Continuation
| Phase | What Happens | Time |
|---|---|---|
| Envelope | 48 facts scored for "report writing". All CVEs, findings, recommendations ranked | ~5ms |
| LLM generates | "Executive Summary... Finding 1: Critical..." → hits output limit | ~8s |
| Continuation | Extract from partial report, identify missing sections, build continuation envelope | ~15ms |
| Window 2 | Fresh context, continues report. 6 more findings + recommendations | ~6s |
| Stitch | Window 1 + Window 2 joined. Echo detection removes overlap. Clean 12-page report | ~2ms |
Total CRP Overhead
| Step | CRP Time | LLM Time | Overhead |
|---|---|---|---|
| Planning | ~6ms | ~3,000ms | 0.2% |
| Tool selection | ~8ms | ~2,000ms | 0.4% |
| Ingestion | ~7ms | 0ms | N/A |
| Analysis | ~12ms | ~5,000ms | 0.2% |
| Report + continuation | ~22ms | ~14,000ms | 0.2% |
| Total | ~55ms | ~24,000ms | 0.2% |
CRP in the AI Stack
The Three-Layer Architecture
```text
+-----------------------------------------------------------+
|  Layer 3: A2A — Agent-to-Agent Communication              |
|  "How agents talk to each other"                          |
+-----------------------------------------------------------+
|  Layer 2: MCP — Model Context Protocol                    |
|  "How agents access tools"                                |
+-----------------------------------------------------------+
|  Layer 1: CRP — Context Relay Protocol                    |
|  "How each agent manages its own context"                 |
|  THE FOUNDATION LAYER                                     |
+-----------------------------------------------------------+
```
CRP is complementary to MCP and A2A. MCP defines how agents access tools. A2A defines how agents communicate. CRP defines how each agent manages its own context — the foundation that makes both work at scale.
- Without CRP, every MCP tool call competes for context space
- Without CRP, every A2A message accumulates in a degrading window
- With CRP + MCP: tool results are extracted into facts, not piled into the window
- With CRP + A2A: inter-agent messages are structured knowledge, not raw text
Extraction Quality
| Stage | Method | What It Extracts | Accuracy | When It Runs |
|---|---|---|---|---|
| 1 | Regex | IP addresses, CVEs, JSON, version strings | ~99% | Always |
| 2 | Statistical (TextRank) | Key sentences by term frequency | ~85-90% recall | Always |
| 3 | GLiNER NER | Entity spans (software, vulnerabilities) | ~80-90% F1 | When yield is low |
| 4 | UIE Relations | Entity relationships (X vulnerable to Y) | ~70-80% F1 | When yield is low |
| 5 | Discourse Structure | Logical relations (cause→effect, condition→consequence) via RST | ~65-75% F1 | Reasoning-dense content |
| 6 | LLM-Assisted Relational | Implicit logical relationships | ~85-90% F1 | Optional, high-complexity only |
Stages are graduated — 3-6 activate selectively based on content complexity and prior stage yield. Content is auto-classified as ENTITY_RICH, REASONING_DENSE, or NARRATIVE to route through appropriate strategies.
| Content Type | Typical Stages | Typical Time |
|---|---|---|
| Structured/factual | 1-2 | ~10-15ms |
| Mixed content | 1-4 | ~50-80ms |
| Reasoning-dense | 1-5 | ~160ms |
| High-complexity | 1-6 | ~500ms+ (Stage 6 uses LLM) |
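The graduated activation logic can be sketched as a routing function. The threshold value and the yield metric below are illustrative assumptions; only the stage numbers and content classes follow the tables above.

```python
# Illustrative routing sketch: stages 1-2 always run; later stages activate
# only when earlier yield is low or the content class demands them.
def select_stages(content_type: str, facts_per_kilotoken: float) -> list:
    stages = [1, 2]                      # regex + statistical: always
    if facts_per_kilotoken < 3:          # low yield from cheap stages: escalate
        stages += [3, 4]                 # GLiNER NER + UIE relations
    if content_type == "REASONING_DENSE":
        stages.append(5)                 # RST discourse structure
    return stages

print(select_stages("ENTITY_RICH", facts_per_kilotoken=12))     # [1, 2]
print(select_stages("REASONING_DENSE", facts_per_kilotoken=1))  # [1, 2, 3, 4, 5]
```

Entity-rich content that yields plenty of facts from the cheap stages never pays for the expensive ones, which is why typical extraction stays in the ~10-15ms band.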
Efficiency and Cost
Per-Window Overhead
| Operation | Time | When |
|---|---|---|
| Multi-aspect scoring + graph packing | ~5-10ms | Every window |
| Cross-encoder reranking (top-200) | ~400ms | When >50 facts (amortized) |
| Extraction Stages 1-2 | ~6ms | Every window |
| Extraction Stage 3 (GLiNER) | ~50ms | Only when yield is low |
| Extraction Stage 4 (UIE) | ~100ms | Only when yield is low |
| Extraction Stage 5 (Discourse) | ~150ms | Reasoning-dense content |
| Typical total | ~15-20ms | 0.1-1% of LLM time |
Token Efficiency: CRP vs MCP
| Cost Factor | MCP | CRP |
|---|---|---|
| Tool schemas in prompt | ALL repeated every call (10K-50K) | Zero — only for tool-selection windows |
| Accumulated context | All prior results stay, attention degrades | Only relevant extracted facts |
| Redundant content | Same schemas repeated N times | No repetition — envelope carries only what's relevant |
Example: 20-step agentic loop, 50 tools:
- MCP: 20 × 10K schema tokens = 200K tokens on tool definitions alone
- CRP: Schemas in tool-selection windows only. ~90% fewer protocol tokens
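The arithmetic behind the ~90% figure, assuming schemas are needed in only 2 of the 20 windows (an assumption; the exact count depends on the agent):

```python
# Assumed workload: 20-step loop, 10K tokens of tool schemas per injection.
STEPS, SCHEMA_TOKENS, TOOL_SELECTION_WINDOWS = 20, 10_000, 2

mcp_schema_tokens = STEPS * SCHEMA_TOKENS                   # schemas sent every call
crp_schema_tokens = TOOL_SELECTION_WINDOWS * SCHEMA_TOKENS  # only where needed
savings = 1 - crp_schema_tokens / mcp_schema_tokens

print(mcp_schema_tokens, crp_schema_tokens, f"{savings:.0%}")  # 200000 20000 90%
```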
Cloud API Cost
| Scenario | Without CRP | With CRP | Savings |
|---|---|---|---|
| 20-step agentic loop (50 tools) | ~400K tokens | ~120K tokens | ~70% |
| Long report (3 continuations) | Truncated at limit | 4 windows, complete | N/A (impossible before) |
| Simple single-turn task | ~2K tokens | ~2K tokens | 0% (no penalty) |
Real-World: 200-Page Textbook Generation
| Provider | Total Cost | Windows |
|---|---|---|
| Claude Opus | ~$17 | ~32 |
| Claude Sonnet | ~$3.30 | ~32 |
| GPT-4o | ~$2.50 | ~32 |
| DeepSeek | ~$0.27 | ~32 |
| Local model (Ollama) | $0 | ~32 |
Naive approach (paste all prior chapters into context): ~800K+ input tokens and worse quality.
Cost Controls
```python
client = Client(
    llm=adapter,
    max_windows_per_session=50,
    max_total_input_tokens=1_000_000,
    max_total_output_tokens=500_000,
)

# Pre-flight estimation
estimate = client.estimate_session(planned_dispatches=32, avg_output_tokens=4000)
print(f"Estimated cost: ${estimate.estimated_cost_usd:.2f}")

# Live tracking
status = client.session_status()
print(f"Running total: ${status.total_cost:.2f}")
```
Budget caps raise BudgetExhaustedError when hit. Rate limits are respected automatically.
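The cap semantics can be sketched with a hypothetical tracker. The class names mirror `BudgetExhaustedError` from the text, but this is an illustration of the behavior, not the library's implementation.

```python
# Hypothetical budget tracker: a charge that would cross the cap is refused
# before it is recorded, so the session never overspends.
class BudgetExhaustedError(RuntimeError):
    pass

class BudgetTracker:
    def __init__(self, max_total_input_tokens):
        self.cap = max_total_input_tokens
        self.used = 0

    def charge(self, tokens):
        if self.used + tokens > self.cap:
            raise BudgetExhaustedError(
                f"{self.used} used + {tokens} requested exceeds cap {self.cap}")
        self.used += tokens

budget = BudgetTracker(max_total_input_tokens=10_000)
budget.charge(8_000)         # within budget
try:
    budget.charge(4_000)     # would cross the cap: raises
except BudgetExhaustedError as exc:
    print("stopped:", exc)
```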
Observability and Auditing
Per-Window Metrics (Automatic)
Every crp.dispatch() records:
```json
{
  "window_id": "w-a3f2c1",
  "session_id": "pentest-192.168.1.50",
  "parent_windows": ["w-b7e4d2"],
  "envelope_tokens": 4200,
  "saturation": 0.94,
  "extraction_stages_used": ["regex", "statistical"],
  "extraction_time_ms": 7,
  "facts_extracted": 12,
  "information_flow_rate": 0.0018,
  "quality_tier": "S",
  "gap_analysis": {"required": 7, "fulfilled": 7, "missing": 0},
  "continuation_triggered": false
}
```
Session Dashboard
| Metric | Alert Threshold | What It Means |
|---|---|---|
| Total windows | >>2× your call count | Runaway continuations |
| Continuation rate | >30% | Tasks may be too large for one window |
| Average saturation | <60% | Extraction yield is low |
| Extraction yield | <2 facts/window | Content type may need different strategy |
| Stage escalation rate | >50% | Structured output would help |
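The dashboard thresholds above can be expressed as a simple check. This function is a sketch, not part of the SDK; the metric field names are assumptions chosen to mirror the table:

```python
# Illustrative dashboard check using the alert thresholds from the table above.
def session_alerts(m: dict) -> list[str]:
    alerts = []
    if m["total_windows"] > 2 * m["planned_calls"]:
        alerts.append("runaway continuations")
    if m["continuation_rate"] > 0.30:
        alerts.append("tasks may be too large for one window")
    if m["avg_saturation"] < 0.60:
        alerts.append("extraction yield is low")
    if m["facts_per_window"] < 2:
        alerts.append("content type may need a different strategy")
    if m["stage_escalation_rate"] > 0.50:
        alerts.append("structured output would help")
    return alerts

metrics = {
    "total_windows": 70, "planned_calls": 32, "continuation_rate": 0.12,
    "avg_saturation": 0.81, "facts_per_window": 9, "stage_escalation_rate": 0.20,
}
print(session_alerts(metrics))  # ['runaway continuations']
```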
Window DAG Traceability
Every session produces a directed acyclic graph:
```
W1 (plan) → W2 (tool select) → W3 (analysis) → W4 (report) → W5 (report cont.)
```
Each node shows facts produced, facts consumed, information flow, and envelope saturation. Enables "why did it do that?" debugging by tracing decisions through the DAG.
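A DAG like this can be traced back from any window via its `parent_windows` links. The sketch below uses a plain dict as the graph; the structure is illustrative, not the SDK's representation:

```python
# Sketch: trace a decision back through the window DAG shown above.
# Each window maps to the parents it consumed facts from.
dag = {
    "W5": ["W4"], "W4": ["W3"], "W3": ["W2"], "W2": ["W1"], "W1": [],
}

def lineage(window: str, dag: dict) -> list[str]:
    """All ancestor windows that fed into `window`, nearest first."""
    seen, queue, out = set(), list(dag[window]), []
    while queue:
        w = queue.pop(0)
        if w not in seen:
            seen.add(w)
            out.append(w)
            queue.extend(dag[w])
    return out

print(lineage("W5", dag))  # ['W4', 'W3', 'W2', 'W1']
```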
Limitations and Trade-offs
| Limitation | Severity | Mitigation |
|---|---|---|
| Extraction is lossy | MEDIUM | 6-stage pipeline covers the spectrum: ~85-90% recall on structured, ~70-80% on reasoning-dense, ~50-65% on implicit content. See §7.6 for the degradation model |
| Fact granularity mismatch | MEDIUM | Graduated pipeline from tight entities (regex) through relationships (UIE, discourse). Fact graph preserves inter-fact relationships |
| Hallucinations may pass fact gate | MEDIUM | Three-tier validation: structural, confidence, anomaly detection. Not perfect for structurally-valid hallucinations |
| Cold start | LOW | First window: empty envelope. First ~5 windows: calibrating. System bootstraps safely — never prematurely terminates |
| Not beneficial for single-turn | N/A | CRP adds zero value (and zero cost) for tasks that fit in one window |
Why Large Context Windows Are Not Enough
"But my model has 1M context!" — Three problems:
- Output limits are NOT 1M. Models with 1M input have output limits of 8K-32K. You still need continuation
- Attention degrades with length. "Lost in the middle" means content at position 30K is invisible at position 200K (Liu et al., 2023)
- Cost scales quadratically. Growing context = $O(N^2)$ total tokens. CRP envelopes = $O(N)$ linear scaling
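The scaling claim can be made concrete. A growing native context re-sends everything accumulated so far at each step, so total input tokens grow quadratically; a fixed-size envelope keeps the total linear. The per-step figures below are illustrative:

```python
# Cumulative input tokens over N steps.
payload = 2_000   # tokens of new content added per step (illustrative)
envelope = 4_000  # fixed envelope size per step (illustrative)

def native_total(n: int) -> int:
    # Step k re-sends all k * payload tokens accumulated so far: O(N^2) total.
    return sum(k * payload for k in range(1, n + 1))

def envelope_total(n: int) -> int:
    # Each step carries a fixed-size envelope: O(N) total.
    return n * envelope

for n in (10, 50, 100):
    print(n, native_total(n), envelope_total(n))
```

At 100 steps the growing-context total is already ~25× the envelope total, and the gap widens linearly with N.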
But more fundamentally, context size is only 1 of CRP's 9 permanent value propositions:
| # | Value Proposition | Why Native Context Cannot Provide It |
|---|---|---|
| 1 | Context Quality | CRP's scored, graph-structured envelopes put the right facts first. Raw text has no ranking |
| 2 | Task Isolation | One window per task. No cross-task attention contamination |
| 3 | Attention Optimization | Critical facts placed in the attention sink, not buried at position 500K |
| 4 | Cost Efficiency | $O(N)$ total tokens vs $O(N^2)$ for growing native context |
| 5 | Cross-Session Knowledge | CKF persists facts and reasoning across sessions |
| 6 | Structured Knowledge | Typed fact graph with edges, communities, temporal history |
| 7 | Multi-Agent Coordination | Envelope = structured state transfer between agents |
| 8 | Observability | Full provenance: every fact has source, confidence, lifecycle |
| 9 | Reasoning Amplification | Meta-learning scaffolds turn 2B models into reasoning systems |
Even a model with infinite native context needs CRP for propositions 1-4, 6-9.
Scientific backing: "Retrieval can significantly improve the performance of LLMs regardless of their extended context window sizes" — Xu et al., ICLR 2024.
Specification Documents
The complete CRP v2.0 specification:
| # | Document | Description | Lines |
|---|---|---|---|
| 1 | 01_RESEARCH_FOUNDATIONS.md | Academic research backing — 9 research areas, 40+ papers, meta-learning, retrieval augmentation | ~1,200 |
| 2 | 02_CORE_PROTOCOL.md | The core specification — 29 sections: axioms, state model, CKF, extraction, completion detection, quality tiers, hierarchical processing, meta-learning, security, concurrency, observability, deployment, publication | ~6,800 |
| 3 | 03_CONTEXT_ENVELOPE.md | Context envelope — multi-phase scoring, CKF retrieval, source grounding, continuation envelopes | ~1,200 |
| 4 | 04_TOKEN_GENERATION_PROTOCOL.md | Unbounded output — continuation, stitching, voice profiles, document maps, completion detection | ~950 |
| 5 | 05_SYSTEM_WIDE_INTEGRATION.md | Integration architecture — 87+ call sites mapped, component inventory, migration strategy | ~1,750 |
| 6 | 06_IMPLEMENTATION_PLAN.md | Implementation plan — phased rollout, 13 modules, ~3,890 lines of code planned | ~2,000 |
| 7 | 07_SECURITY.md | Security architecture — threat model, input validation, fact integrity, RBAC, encryption, OWASP, quantum resistance | ~1,300 |
| 8 | 08_MONETIZATION.md | Business model — PostgreSQL model (full capability free), 5 revenue pillars, competitive positioning | ~2,000 |
| 9 | 09_DEPLOYMENT.md | Deployment — embedded library rationale, resource footprint, Lambda/K8s/MCP comparison, containerization | ~2,000 |
Total specification: ~19,200 lines across 9 documents.
JSON Schemas
All API types are defined as JSON Schema (Draft 2020-12) for language-neutral consumption:
| Schema | Description | Source |
|---|---|---|
| `task-intent.json` | TaskIntent — declarative, all-optional dispatch input | §6.10.2 |
| `quality-report.json` | QualityReport — returned with every dispatch | §6.10.2 |
| `session-status.json` | SessionStatus — session health snapshot | §6.10.2 |
| `cost-estimate.json` | CostEstimate — pre-flight cost estimation | §6.10.2 |
| `envelope-preview.json` | EnvelopePreview — inspect without dispatching | §6.10.2 |
| `session-handle.json` | SessionHandle — returned by init() | §6.10.8 |
| `stream-event.json` | StreamEvent — streaming dispatch events | §6.10.5 |
| `crp-error.json` | CRPError — standard error format | §6.10.4 |
| `persisted-state-header.json` | PersistedStateHeader — cold state versioning | §6.10.10 |
API Surface
CRP exposes a synchronous + async + streaming API. All operations use direct function invocation (not network RPC). SDKs MAY expose JSON-RPC or gRPC transports for cross-process access.
Core Operations
| Operation | Stability | Description |
|---|---|---|
| `Client(provider=..., app_id=...)` | Stable | Create session, init subsystems, restore cold state |
| `dispatch(system_prompt, task_input, ...)` | Stable | Execute LLM window with envelope, extract facts |
| `dispatch_stream(...)` | Provisional | Streaming variant — emits token/extraction/continuation/done events |
| `ingest(raw_text, ...)` | Stable | Extract facts without LLM invocation (~7ms) |
| `session_status()` | Stable | Session health: windows, tokens, facts, budget remaining, cost |
| `estimate_session(...)` | Stable | Pre-flight cost estimation with USD pricing |
| `preview_envelope(...)` | Stable | Inspect what the envelope would contain |
| `configure(config)` | Stable | Update security/cost config (ADMIN) |
| `export_state(...)` | Provisional | Export encrypted session state |
| `close()` | Stable | Flush warm → cold, persist CKF, clean up |
Error Taxonomy
| Code | Error | Comparable To |
|---|---|---|
| 1001 | `BudgetExhaustedError` | gRPC RESOURCE_EXHAUSTED |
| 1002 | `RateLimitExceeded` | HTTP 429 |
| 1003 | `SessionExpired` | gRPC DEADLINE_EXCEEDED |
| 1005 | `SessionClosed` | gRPC FAILED_PRECONDITION |
| 1010 | `ValidationError` | gRPC INVALID_ARGUMENT |
| 1011 | `SecurityInvariantError` | gRPC ABORTED |
| 1012 | `SignatureInvalidError` | gRPC UNAUTHENTICATED |
| 1020 | `ProviderError` | gRPC INTERNAL |
| 1021 | `ProviderTimeoutError` | gRPC DEADLINE_EXCEEDED |
| 1030 | `StateCorruptedError` | gRPC DATA_LOSS |
| 1031 | `ChainVerificationFailedError` | gRPC DATA_LOSS |

The full error taxonomy with all codes is in §6.10.4.
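In practice, callers branch on these error types rather than numeric codes. The sketch below uses local stand-in exception classes to illustrate the pattern; the class hierarchy shown here is an assumption, not the SDK's definition:

```python
# Sketch: handling CRP-style errors by type, mirroring the taxonomy above.
# These classes are local stand-ins for illustration.
class CRPError(Exception):
    code = 0

class BudgetExhaustedError(CRPError):
    code = 1001  # comparable to gRPC RESOURCE_EXHAUSTED

class RateLimitExceeded(CRPError):
    code = 1002  # comparable to HTTP 429

def handle(err: CRPError) -> str:
    # Budget exhaustion is terminal for the session; rate limits are retryable.
    if isinstance(err, BudgetExhaustedError):
        return "stop: raise the session budget or split the task"
    if isinstance(err, RateLimitExceeded):
        return "retry: back off and re-dispatch"
    return f"unhandled CRP error {err.code}"

print(handle(RateLimitExceeded()))  # retry: back off and re-dispatch
```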
RBAC Roles
| Role | Permissions |
|---|---|
| OBSERVER | session_status, estimate_session |
| OPERATOR | All OBSERVER + dispatch, ingest, preview_envelope |
| ADMIN | All OPERATOR + configure, reset_session, export_state |
RBAC is fully enforced in the SDK. Every dispatch, ingest, and admin operation checks `RBACEnforcer.check_permission()` and `check_rate_limit()` before proceeding. The default role is OPERATOR (dispatch + ingest); set it via `CRPConfig(default_role="ADMIN")` or `CRPConfig(default_role="OBSERVER")`.
SDK Status
| Language | Status | Package | Repository |
|---|---|---|---|
| Python | ✅ v2.0.0 | `pip install -e ".[dev]"` | This repository |
| TypeScript | 📋 Planned | `npm install @crp/sdk` | crp-typescript |
| Rust | 📋 Planned | `cargo add crp` | crp-rust |
Python SDK — Quick Start
```bash
pip install -e ".[dev]"
```

```python
import crp

# Zero-config: auto-detects LLM from environment
client = crp.Client()

# Or explicit: pass model name or provider
client = crp.Client(model="gpt-4o")
# client = crp.Client(provider=CustomProvider(...))

# Dispatch — CRP builds envelope, calls your LLM, extracts facts, returns raw output
output, report = client.dispatch(
    system_prompt="You are a security analyst.",
    task_input="Analyze the authentication flow in auth.py.",
)

print(output)                        # Unmodified LLM output (Axiom 9)
print(report.quality_tier)           # "S" | "A" | "B" | "C" | "D"
print(report.facts_extracted)        # Facts pulled from output
print(report.continuation_windows)   # How many continuation windows were used
```
Built-in Providers:
| Provider | Import | Requirements |
|---|---|---|
| Auto-detect | `crp.Client()` | Set OPENAI_API_KEY, ANTHROPIC_API_KEY, or run Ollama |
| Custom (any LLM) | `crp.providers.CustomProvider` | None |
| OpenAI / Azure | `crp.providers.OpenAIAdapter` | openai>=1.0, tiktoken |
| Anthropic | `crp.providers.AnthropicAdapter` | anthropic>=0.25 |
| Ollama | `crp.providers.OllamaAdapter` | Running Ollama instance |
| llama.cpp | `crp.providers.LlamaCppAdapter` | llama-cpp-python or HTTP server |
Key Features:
- 351 tests passing (integration + benchmarks + unit)
- Zero-config auto-detection (`Client()` or `Client(model="...")`)
- Quality tier classification (S/A/B/C/D) on every dispatch
- Zero-LLM ingestion (`client.ingest()`)
- Streaming dispatch (`client.dispatch_stream()`)
- Continuation with key-findings context threading
- Full observability (events, audit log, metrics export)
- Budget enforcement (windows, input/output tokens)
- State export (encrypted AES-256-GCM)
The specification is language-neutral. JSON Schemas in /schemas/ enable code generation for any language.
Comparison with Alternatives
| Approach | What It Does | Limitation | In-Window Overhead |
|---|---|---|---|
| Naive Prompting | Everything in one window | Context contamination, attention collapse | None (quality degrades) |
| RAG | Retrieves relevant documents (flat vectors) | No output management, no continuation, no graph | Retrieved chunks only |
| MemGPT / Letta | Virtual memory via LLM self-management | LLM burns tokens managing its own memory | High (memory function calls) |
| GraphRAG | Knowledge graph + community summaries | Static offline indexing, no real-time extraction | Low (query overhead) |
| Sliding Window | Truncates old context | Early context permanently lost | Low (but lossy) |
| MCP | Standardized tool interface | Manages tool access, not tool output context | Very High (10K-50K schemas) |
| A2A | Inter-agent communication | Manages messages between agents, not context within | Varies |
| CRP | Task isolation + CKF + extraction envelopes + continuation | Extraction is imperfect | Zero |
Key distinction: MCP and A2A solve different problems. MCP connects LLMs to tools. A2A connects agents to each other. CRP manages context within each agent. They are complementary and can be used together.
Hardware Requirements
| Component | Size | Required? |
|---|---|---|
| Your LLM | Varies | Yes (already running) |
| all-MiniLM-L6-v2 (embeddings) | ~80MB | Yes |
| ms-marco-MiniLM-L6-v2 (reranker) | ~80MB | No (bi-encoder sufficient for <50 facts) |
| GLiNER (NER) | ~200MB | No (lazy-loaded, degrades gracefully) |
| UIE (relations) | ~400MB | No (lazy-loaded, degrades gracefully) |
Minimum: Any machine running an LLM can run CRP. 80MB required + 0-680MB optional.
Configuration
CRP follows a 5-layer configuration hierarchy (see §25):
Layer 5: Runtime API (highest priority)
Layer 4: Environment Variables
Layer 3: Session Config File
Layer 2: User Config File
Layer 1: Built-in Defaults (lowest priority)
All configuration is optional. CRP works with zero configuration if you pass your LLM adapter directly.
Key Environment Variables
| Variable | Default | Description |
|---|---|---|
| `CRP_ENABLED` | `true` | Master switch |
| `CRP_LLM_ENDPOINT` | — | Fallback LLM endpoint if no adapter passed |
| `CRP_LOG_ENVELOPES` | `false` | Debug: log envelope contents |
| `CRP_MAX_WINDOWS` | `100` | Session window limit |
| `CRP_TELEMETRY_FILE` | `crp_telemetry.jsonl` | Telemetry output path |
Use Cases
| Domain | How CRP Helps |
|---|---|
| Penetration Testing | Each tool selection, analysis, and report section gets a fresh window with full findings context |
| Report Generation | Unbounded-length reports with per-section windows, gap-aware continuation, quality-tiered output |
| Multi-Step Reasoning | Each step gets full context; prior conclusions carried as facts. ORC decomposes complex reasoning |
| Agentic Tool Use | Tool results extracted into facts immediately; next selection sees ALL discoveries, ranked |
| Code Generation | Large codebases across multiple windows; each sees full architecture via envelopes |
| Research & Analysis | Long-form analysis exceeding any single window; information flow detects genuine completion |
| Small Model Amplification | Meta-learning scaffolds enable 2B–7B models to perform reasoning they cannot do natively |
| Legal Document Analysis | Million-token contracts auto-ingested; cross-reference tracking via fact graph |
| Medical Literature Review | Cross-session knowledge accumulates across papers; community detection groups related findings |
Roadmap
Phase 1: Open Specification ← We are here
- Publish CRP v2.0 specification (9 documents, ~19,200 lines)
- JSON Schema definitions for all API types
- Reference SDK: Python (`pip install crprotocol`)
- Benchmark results: CRP on vs. off across tasks and models
- arXiv technical report with empirical evaluation
Phase 2: Ecosystem
- JSON-RPC server mode — any language can use CRP over HTTP
- TypeScript/JavaScript reference implementation
- Integration guides: LangChain, LlamaIndex, AutoGen, CrewAI
- MCP + CRP integration example
- A2A + CRP integration example
Phase 3: Meta-Learning & Advanced Features
- Source-Grounded Envelope engine
- LLM-Driven Context Curation with progressive understanding
- Reasoning Template Library (RTL)
- Orchestrated Reasoning Chains (ORC)
- Domain-specialized GLiNER models (cybersecurity, biomedical, legal, financial, regulatory)
- Benchmark: reasoning amplification on 2B/7B vs. baseline
Phase 4: Adoption & Standards
- IETF Internet-Draft submission
- W3C Community Group: "Context Management for AI"
- LF AI & Data project hosting
- Conformance test suite
- Community benchmark suite for context management quality
Contributing
We welcome contributions! See CONTRIBUTING.md for:
- How to submit issues, spec clarifications, and pull requests
- The RFC process for non-trivial specification changes
- Code of conduct
- Contributor License Agreement (CLA)
Governance
CRP follows an open governance model inspired by the Apache Software Foundation. See GOVERNANCE.md for:
- Roles: Maintainers, Committers, Contributors
- Decision-making process (consensus-seeking, lazy consensus for minor changes, formal vote for breaking changes)
- Specification versioning and deprecation policy
Security
See SECURITY.md for:
- Responsible disclosure policy
- Security contact information
- What constitutes a security vulnerability in CRP
- How security issues in reference implementations are handled
The protocol's security architecture is documented in 07_SECURITY.md — covering threat modeling, input validation, fact integrity, RBAC, encryption at rest, OWASP mapping, and quantum resistance planning.
Community
- GitHub Discussions: Join the conversation
- GitHub Issues: Bug reports, spec clarifications, feature requests
- General enquiries: info@crprotocol.io
- Enterprise & licensing: contact@crprotocol.io
Built With
| Component | Technology |
|---|---|
| Knowledge Layer | CKF — graph walk + pattern query + semantic fallback + community summaries, event-sourced history |
| Extraction | 6-stage graduated blackboard-reactive pipeline (regex → TextRank → GLiNER → UIE → RST discourse → LLM-relational) |
| Source Grounding | Dual-layer envelopes — extracted facts paired with original text passages |
| Meta-Learning | ORC + ICML + RTL — structured reasoning scaffolding for small models |
| Embeddings | sentence-transformers/all-MiniLM-L6-v2 (~80MB, CPU) |
| Reranking | cross-encoder/ms-marco-MiniLM-L6-v2 (~80MB, ~500 pairs/sec on CPU) |
| Indexing | HNSW approximate nearest neighbor — O(log N) retrieval |
| Storage | Warm state (in-memory fact graph + event log) + CKF cold storage (SQLite WAL + vector DB + graph) |
| Coherence | Voice profiles, progressive document maps, degradation-triggered re-grounding |
| Validation | Pydantic v2 / JSON Schema Draft 2020-12 |
Positioning Statement
For developers building LLM-powered applications who need reliable context management across multiple LLM invocations, CRP (Context Relay Protocol) is an open protocol that provides structured knowledge extraction, cross-session persistence, and honest quality guarantees. Unlike ad-hoc prompt chaining, proprietary context APIs, or vector-only RAG, CRP offers a formally specified, LLM-agnostic, embedded-library protocol with a graduated extraction pipeline, graph-structured knowledge fabric, and transparent degradation model — all deployable with zero infrastructure overhead.
License
Context Relay Protocol (CRP) is the original work of Constantinos Vidiniotis, created in 2026.
Specification
The protocol specification documents are licensed under the Creative Commons Attribution-ShareAlike 4.0 International License (CC BY-SA 4.0). You may read, share, and adapt the specification with attribution. Full terms: https://creativecommons.org/licenses/by-sa/4.0/
Implementation Code
SDK and implementation code is licensed under the Elastic License 2.0 (ELv2). You may use CRP freely in your own applications. You may NOT offer CRP as a hosted/managed service without a commercial license.
Commercial Licensing
For enterprise licensing, managed-service rights, or OEM inquiries:
AutoCyber AI Pty Ltd · ABN 22 697 087 166 Email: contact@crprotocol.io · General: info@crprotocol.io · Web: crprotocol.io
Trademark
"Context Relay Protocol" is a trademark of Constantinos Vidiniotis (application pending, Class 9 — IP Australia). Use of the name to refer to this project is welcomed; use implying endorsement or affiliation without authorization is not permitted.
See LICENSE.md for the full license text.
Copyright (c) 2026 Constantinos Vidiniotis. All rights reserved.
Context Relay Protocol v2.0
Zero configuration. Unbounded input. Unbounded output. Amplified reasoning.
Better context at every scale. Honest degradation. Quality-tiered.
Peak quality. Every window.
File details
Details for the file crprotocol-2.0.0.tar.gz.
File metadata
- Download URL: crprotocol-2.0.0.tar.gz
- Upload date:
- Size: 997.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `b120097f5e423ec52f50b5bb4f045dd9cc37cedc1721b5635b5ec1ed09d38b8d` |
| MD5 | `26573698bf4f0f0299645c4f7f963218` |
| BLAKE2b-256 | `d1612d051b67ab5d8ee0892eff7cbea1081c66f52b8378dfae31b4b1f50cca78` |
File details
Details for the file crprotocol-2.0.0-py3-none-any.whl.
File metadata
- Download URL: crprotocol-2.0.0-py3-none-any.whl
- Upload date:
- Size: 446.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c981e4953521818be4a2bc364786d9ea0b282f0d8dc25dfc7228262e16465ec5` |
| MD5 | `5550aa44fd56c207eb3c8316f3e9a3bc` |
| BLAKE2b-256 | `45feba5bec6039e7ffa4af4402a2b7bd00422e597d54b844f53d23ffc281e2d6` |