
Governance middleware for LangChain 1.0 agents — powered by axor-core compression engines


axor-langchain


Cut token costs 40–80% in LangChain multi-agent pipelines.

One middleware. No graph changes. Works with any create_agent() agent.


The problem

LangChain agents accumulate messages. By turn 10 you're paying for:

  • Tool outputs from 8 turns ago that nobody needs
  • Repeated context that hasn't changed
  • Intermediate reasoning that's already been acted on

A 10-node research pipeline can balloon from 5k to 80k tokens by the last node — and you're billed for all of it on every API call.


Installation

pip install axor-langchain

Quick start

from langchain.agents import create_agent
from axor_langchain import AxorMiddleware

# before: bare agent
agent = create_agent("anthropic:claude-sonnet-4-5", tools=tools)

# after: governed agent — one line change
axor = AxorMiddleware(soft_token_limit=100_000, verbose=True)
agent = create_agent(
    "anthropic:claude-sonnet-4-5",
    tools=tools,
    middleware=[axor],
)

result = await agent.ainvoke({"messages": [("user", "research transformers")]})
print(f"Tokens spent: {axor.total_tokens_spent}")

What it does

Context compression

Before each model call, AxorMiddleware compresses the message history based on session length:

| Session length | Mode | Window | Tool output cap |
|---|---|---|---|
| ≤ 6 messages | broad | all | 8,000 chars |
| 7–20 messages | moderate | last 16 | 2,000 chars |
| 21+ messages | minimal | last 6 | 800 chars |

The longer the session, the more aggressively old context is compressed. Recent messages are always kept. System messages are never dropped.

Typical savings:

Turn  1:  1,200 tokens  (no compression yet)
Turn  5:  1,800 tokens  (moderate: old tools truncated)
Turn 10:  2,100 tokens  (minimal: only recent 6 messages)
Turn 20:  2,300 tokens  (stable — doesn't keep growing)

Without axor:
Turn 20: 45,000 tokens  (full history accumulated)

Tool governance

Filter which tools each agent can call — without changing the graph:

# research agent: read + search only, no write/bash
axor = AxorMiddleware(
    allowed_tools=["search", "read", "web_search"],
)

# audit agent: read only
axor = AxorMiddleware(
    denied_tools=["write", "bash", "delete"],
)

Budget tracking

Hard stop when token limit is reached — no surprise bills:

axor = AxorMiddleware(
    soft_token_limit=80_000,   # log warning
    hard_token_limit=100_000,  # stop agent, return partial result
)

Pinned personality

Personality is always the first system message — survives compression:

axor = AxorMiddleware(
    personality="You are a security-focused code reviewer. "
                "Always check for injection risks and hardcoded secrets.",
)
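The effect is that no matter how aggressively the history is compressed, the model sees the personality first on every call. A sketch of the pinning step (hypothetical helper; messages shown as `(role, content)` tuples):

```python
def pin_personality(messages: list[tuple[str, str]],
                    personality: str) -> list[tuple[str, str]]:
    """Re-insert the personality as the first system message after a
    compression pass. Illustrative, not the library's implementation."""
    pinned = ("system", personality)
    rest = [m for m in messages if m != pinned]  # avoid duplicating the pin
    return [pinned] + rest
```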

Cross-session memory (optional)

pip install axor-langchain[memory]
from axor_memory_sqlite import SQLiteMemoryProvider

provider = SQLiteMemoryProvider("~/.axor/memory.db")
axor = AxorMiddleware(
    memory_provider=provider,
    memory_namespace="research-agent",
)
# after each session: last assistant message saved to SQLite
# next session: load with provider.load(MemoryQuery(...))

Anonymous telemetry (opt-in)

pip install axor-langchain[telemetry]
# explicit kwarg
axor = AxorMiddleware(telemetry="local")    # append to local JSONL queue
axor = AxorMiddleware(telemetry="remote")   # also ship to telemetry.useaxor.net

# or env (no code change)
# AXOR_TELEMETRY=local  or  AXOR_TELEMETRY=remote

What gets sent (only with remote, only when opted in):

  • chosen signal (e.g. focused_generative), classifier confidence, tokens spent
  • 128-int MinHash fingerprint of the task — non-reversible
  • whether policy was corrected mid-run, axor_version

Never sent: raw task text, file contents, tool arguments, secrets, user/session IDs. IP is hashed SHA-256 truncated to 16 chars, used only for rate-limit buckets.

Live community aggregates and the full data contract: telemetry.useaxor.net/stats. Suppress the one-time opt-in notice with AXOR_NO_BANNER=1.

Default is off — nothing leaves your machine without an explicit opt-in.
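The 128-int MinHash fingerprint preserves approximate task similarity while being non-reversible: similar tasks share many minimum values, but the text cannot be recovered from the integers. An illustrative sketch of the idea (not axor's exact hashing scheme):

```python
import hashlib

def minhash_fingerprint(text: str, num_perm: int = 128) -> list[int]:
    """MinHash over word 3-grams. Illustrative only; axor-langchain's
    actual shingling and hash functions may differ."""
    words = text.lower().split()
    # one shingle per 3-word window (whole text if shorter than 3 words)
    shingles = {" ".join(words[i:i + 3]) for i in range(max(1, len(words) - 2))}
    fingerprint = []
    for seed in range(num_perm):
        fingerprint.append(min(
            int.from_bytes(hashlib.sha256(f"{seed}:{s}".encode()).digest()[:8], "big")
            for s in shingles
        ))
    return fingerprint
```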

Small context bypass

By default, contexts under 4,000 tokens skip the compression pipeline entirely. This avoids overhead on small/early turns where compression can't save more than it costs:

# default: auto-bypass for small contexts (recommended)
axor = AxorMiddleware(soft_token_limit=100_000)

# disable bypass — always compress (aggressive savings, may add overhead on small turns)
axor = AxorMiddleware(bypass_token_threshold=0)

# custom threshold
axor = AxorMiddleware(bypass_token_threshold=8000)

Budget tracking and tool governance still apply even when compression is bypassed.
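The bypass itself is a simple threshold check before the compression pass. A sketch (hypothetical helper; we assume a rough chars/4 token estimate, which is a common heuristic rather than the library's documented counter):

```python
def should_bypass(messages: list[str], bypass_token_threshold: int = 4000) -> bool:
    """True when the context is small enough to skip compression.
    Illustrative; token count estimated as total characters / 4."""
    if bypass_token_threshold <= 0:
        return False  # threshold 0 disables the bypass: always compress
    estimated_tokens = sum(len(m) for m in messages) // 4
    return estimated_tokens < bypass_token_threshold
```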

Impact on savings (real data from claude-sonnet-4-6 benchmark):

| | Without bypass | With bypass (4000) |
|---|---|---|
| Total savings (4t + 8t combined) | +26.4% | +24.8% |
| Risk of negative savings on small contexts | Yes (−9% at 6t) | No (0% — passed through) |
| Large context savings (8t+) | +26–48% | +26–48% (unchanged) |

Bypass trades ~1.6 percentage points of total savings for stable, predictable behavior: you never pay more than without axor.


LangGraph integration

Works with any LangGraph StateGraph that uses LangChain agents as nodes:

from langgraph.graph import StateGraph, END
from langchain.agents import create_agent
from axor_langchain import AxorMiddleware

# each node gets its own governance config
research_axor = AxorMiddleware(
    allowed_tools=["search", "web_search"],
    soft_token_limit=50_000,
    verbose=True,
)
writer_axor = AxorMiddleware(
    allowed_tools=["read", "write"],
    soft_token_limit=30_000,
)

research_agent = create_agent(
    "anthropic:claude-sonnet-4-5",
    tools=[search_tool, web_search_tool],
    middleware=[research_axor],
)
writer_agent = create_agent(
    "anthropic:claude-sonnet-4-5",
    tools=[read_tool, write_tool],
    middleware=[writer_axor],
)

workflow = StateGraph(State)
workflow.add_node("research", research_agent)
workflow.add_node("write",    writer_agent)
workflow.add_edge("research", "write")
workflow.add_edge("write", END)

app = workflow.compile()
result = await app.ainvoke({"messages": [...]})

print(f"Research tokens: {research_axor.total_tokens_spent}")
print(f"Writer tokens:   {writer_axor.total_tokens_spent}")

Per-node governance: each agent compresses its own context independently.


Configuration reference

AxorMiddleware(
    soft_token_limit=None,           # int | None — warning threshold
    hard_token_limit=None,           # int | None — stop threshold (default: soft * 1.5)
    compression_mode="auto",         # "auto" | "minimal" | "moderate" | "broad"
    bypass_token_threshold=4000,     # int — skip compression below this token count
    allowed_tools=None,              # list[str] | None — whitelist
    denied_tools=None,               # list[str] | None — blacklist
    personality=None,                # str | None — pinned system message
    memory_provider=None,            # MemoryProvider | None
    memory_namespace="axor",         # str
    tool_error_handler=None,         # Callable[[str, Exception], str] | None
    tool_max_retries=0,              # int — extra retry attempts
    tool_retry_delay=0.0,            # float — seconds between retries
    track_tool_stats=False,          # bool — per-tool call/latency/error tracking
    verbose=False,                   # bool — log governance decisions
    telemetry=None,                  # "off" | "local" | "remote" | None (AXOR_TELEMETRY env)
)
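The tool_error_handler signature (`Callable[[str, Exception], str]`) means you receive the tool name and the raised exception, and return the string that stands in for the failed tool's output. An example handler (the message format here is our own choice, not prescribed by the library):

```python
def on_tool_error(tool_name: str, exc: Exception) -> str:
    """Passed as tool_error_handler=on_tool_error. The returned string is
    what the model sees in place of the failed tool's output."""
    return (f"[tool '{tool_name}' failed: {type(exc).__name__}: {exc}. "
            f"Try a different approach.]")
```

Pairing this with `tool_max_retries` and `tool_retry_delay` presumably means the handler only fires once retries are exhausted.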

Difference from axor-claude

| | axor-claude | axor-langchain |
|---|---|---|
| Provider | Anthropic only | any (OpenAI, Anthropic, Google…) |
| Framework | axor-core GovernedSession | LangChain create_agent() |
| Governance depth | full (context shaping, IntentLoop) | middleware (message compression, tool filter) |
| Best for | standalone coding agents | multi-agent LangGraph pipelines |

Requirements

  • Python 3.11+
  • langchain >= 1.0.0
  • langgraph >= 1.0.0

License

MIT


Benchmarks

Live results (claude-sonnet-4-6, 3-node research pipeline)

Real API calls, real usage_metadata token counts. Pipeline: planner → researcher → writer. Default bypass_token_threshold=4000 — small contexts pass through without compression.

Per-node breakdown (8 turns, auto mode):

| Node | Without axor | With axor | Saved |
|---|---|---|---|
| planner | 13,678 tok | 7,112 tok | 48.0% |
| researcher | 27,677 tok | 19,750 tok | 28.6% |
| writer | 44,963 tok | 36,811 tok | 18.1% |
| TOTAL | 86,318 tok | 63,673 tok | 26.2% |
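The "Saved" column is plain percentage reduction, which you can check against any row (simple arithmetic, not library code):

```python
def savings_pct(tokens_without: int, tokens_with: int) -> float:
    """Percentage of tokens saved, rounded to one decimal place."""
    return round(100 * (tokens_without - tokens_with) / tokens_without, 1)

print(savings_pct(13_678, 7_112))   # planner row → 48.0
print(savings_pct(86_318, 63_673))  # TOTAL row → 26.2
```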

Writer sees all accumulated context from planner + researcher — this is where token explosion happens in production.

Across configurations:

| Prior turns | Mode | Without axor | With axor | Savings | $/10K runs saved |
|---|---|---|---|---|---|
| 4 turns | auto | 28,366 tok | 20,717 tok | 27.0% | $274 |
| 8 turns | auto | 86,318 tok | 63,673 tok | 26.2% | $733 |
| 8 turns | minimal | 65,243 tok | 52,451 tok | 19.6% | $438 |

Pricing: claude-sonnet-4-6 @ $3/M input, $15/M output tokens. Results vary between runs due to LLM non-determinism. Use --runs 3 for averaged results.

Bypass impact (calculated from real data across 4t + 8t runs):

| | Without bypass | With bypass (default) |
|---|---|---|
| Total savings | +26.4% | +24.8% |
| Negative savings risk | Yes | No |
| Large context savings | +26–48% | +26–48% (same) |

~1.6 percentage points less total savings, but guaranteed no overhead on small contexts.

Simulated benchmark (no API key needed)

Tests all middleware features: compression, tool governance, budget, tool retry, bypass detection.

python benchmark/run.py                         # all 17 scenarios
python benchmark/run.py --scenario bypass       # test bypass only
python benchmark/run.py --json                  # CI-friendly output

Live benchmark

export ANTHROPIC_API_KEY=sk-ant-...
python benchmark/live_graph.py --provider anthropic --turns 8
python benchmark/live_graph.py --provider anthropic --runs 3  # averaged

# OpenAI
export OPENAI_API_KEY=sk-...
python benchmark/live_graph.py --provider openai

| Flag | Default | Description |
|---|---|---|
| --provider | anthropic | anthropic or openai |
| --model | claude-sonnet-4-6 / gpt-4.1-mini | Override model |
| --task | research topic | Task for the agent |
| --mode | auto | Compression mode |
| --turns | 6 | Prior history turns |
| --runs | 1 | Number of runs for averaging |
| --no-axor | | Baseline only |
| --axor-only | | axor run only |
