Model-agnostic, MCP-native agent harness
JeevesAgent
A model-agnostic, MCP-native, fully-async agent harness with memory done right.
from jeevesagent import Agent
agent = Agent("You are a helpful assistant.", model="claude-opus-4-7")
result = await agent.run("What's 2 + 2?")
print(result.output) # "4"
That's the whole quickstart. Set ANTHROPIC_API_KEY and you're talking
to Claude. Swap "claude-opus-4-7" for "gpt-4o" to talk to GPT, or
"echo" to use the zero-key fake (echoes the prompt — useful for
tests and local dev). Memory, runtime, telemetry, sandbox, audit are
all opt-in behind the same Agent constructor.
⚠️ model is required as of v0.2.0. Earlier 0.1.x releases silently defaulted to EchoModel, which produced confusing output; now the harness fails fast with a helpful error if you forget.
Why this exists
Every agent framework today forces a choice you shouldn't have to make:
- LangChain / LangGraph lock you into a graph editor and a specific state model. Production teams report runaway loops, opaque debugging, and brittle abstractions.
- Claude Agent SDK is excellent if you're committed to Anthropic forever. It's not model-agnostic.
- OpenAI Assistants is a black box you don't run yourself.
- CrewAI / AutoGen are abstractions over LangChain — same problems.
JeevesAgent is the harness for engineers who want to ship production agents without binding their stack to one model lab. It's:
- Model-agnostic: Anthropic, OpenAI, and ~100 more via LiteLLM (Mistral, Cohere, Bedrock, Vertex, Together, Ollama, Gemini, Groq, Replicate, Azure …) behind one Model protocol. String-based resolver: model="claude-opus-4-7", "gpt-4o", "mistral-large", "command-r-plus", … with no decision lock-in.
- Pluggable architectures: the agent loop is a strategy. Twelve shipped: ReAct (default), SelfRefine, Reflexion, TreeOfThoughts, PlanAndExecute, ReWOO (single-agent); Router, Supervisor, ActorCritic, MultiAgentDebate, Swarm, BlackboardArchitecture (multi-agent). Same Agent surface; one kwarg flips the iteration pattern.
- MCP-native: MCP isn't an integration, it's the spine. Plug Jeeves Gateway, Composio, or any MCP server into a single MCPRegistry and your tools just work.
- Memory done right: five backends (in-memory, vector, Chroma, Postgres+pgvector, Redis), pluggable embedders (HashEmbedder for zero-key, OpenAIEmbedder for production), and bi-temporal facts that track when claims were true in the world vs when you learned them. This is the Zep-style memory wedge, with native fact stores in every backend.
- Durable: SqliteRuntime gives you crash-recovery replay with zero infrastructure. DBOS / Temporal adapters land next.
- Observable: every step emits OpenTelemetry spans and metrics. Drop in your existing exporter; Honeycomb / Datadog / LangSmith just work.
- Safe: permission policies, sandbox layers, append-only HMAC-signed audit log, freshness/lineage policies for certified values.
- Async-only, structured concurrency only: anyio everywhere; zero raw asyncio.create_task/gather. Parallel tool dispatch via task groups. Backpressure-aware streaming via memory-object streams.
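The bi-temporal idea in the memory bullet can be illustrated with a small stdlib-only sketch. Everything here is hypothetical and for illustration only: `BiTemporalFact` and `as_of` are not JeevesAgent's actual classes or functions, and the real fact stores may model time differently.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional


@dataclass(frozen=True)
class BiTemporalFact:
    """Hypothetical record; not the library's Fact class."""
    claim: str
    valid_from: datetime           # when the claim became true in the world
    valid_to: Optional[datetime]   # when it stopped being true (None = still true)
    recorded_at: datetime          # when the agent learned it


def as_of(facts, world_time, known_by):
    """Facts true at world_time, using only what had been recorded by known_by."""
    return [
        f for f in facts
        if f.recorded_at <= known_by
        and f.valid_from <= world_time
        and (f.valid_to is None or world_time < f.valid_to)
    ]


facts = [
    BiTemporalFact("Alice works at Acme", datetime(2020, 1, 1),
                   datetime(2023, 6, 1), datetime(2023, 6, 10)),
    BiTemporalFact("Alice works at Globex", datetime(2023, 6, 1),
                   None, datetime(2024, 1, 15)),
]

# What was true in July 2023, given everything known by February 2024?
current = as_of(facts, datetime(2023, 7, 1), datetime(2024, 2, 1))
# What did the agent know about July 2023 back in December 2023?
earlier = as_of(facts, datetime(2023, 7, 1), datetime(2023, 12, 1))
```

The two queries differ only in `known_by`: the first finds the Globex fact, while the second finds nothing because that fact had not been recorded yet.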
Three principles govern every line of code:
- The loop is deterministic; the world isn't. Every side effect goes through runtime.step(...) so it can be cached and replayed.
- Trust boundary stays outside the sandbox. The harness runs the tools inside a sandbox; the harness doesn't run inside one.
- Validate state on write, not on read. Pydantic everywhere.
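The first principle, routing side effects through a journaled step so they can be replayed, can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's SqliteRuntime: the `JournalRuntime` class, its `step(key, fn, *args)` signature, and the key format are all hypothetical.

```python
import json
import sqlite3


class JournalRuntime:
    """Hypothetical journaled runtime; not the library's SqliteRuntime."""

    def __init__(self, path=":memory:"):
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS journal (key TEXT PRIMARY KEY, result TEXT)"
        )

    def step(self, key, fn, *args):
        """Run fn once per key; later calls return the journaled result."""
        row = self.db.execute(
            "SELECT result FROM journal WHERE key = ?", (key,)
        ).fetchone()
        if row is not None:
            return json.loads(row[0])      # replay: skip the side effect
        result = fn(*args)                 # first run: perform the side effect
        self.db.execute(
            "INSERT INTO journal (key, result) VALUES (?, ?)",
            (key, json.dumps(result)),
        )
        self.db.commit()
        return result


calls = []


def charge_card(amount):
    calls.append(amount)                   # stands in for a real side effect
    return {"charged": amount}


runtime = JournalRuntime()
first = runtime.step("run-1/step-0", charge_card, 42)
replayed = runtime.step("run-1/step-0", charge_card, 42)
```

The second call returns the stored result without invoking `charge_card` again, which is the property a crash-recovery replay relies on.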
Install
pip install jeevesagent
# Pick the extras you need:
pip install 'jeevesagent[anthropic]' # Claude
pip install 'jeevesagent[openai]' # GPT
pip install 'jeevesagent[postgres]' # PostgresMemory + facts
pip install 'jeevesagent[mcp]' # real MCP client
pip install 'jeevesagent[otel]' # OpenTelemetry exporters
# Or install everything for development:
pip install -e '.[dev,anthropic,openai,mcp,postgres,otel]'
Requires Python 3.11+.
30-second quickstart
import asyncio
from jeevesagent import Agent, tool

@tool
async def get_weather(city: str) -> str:
    """Look up the current weather."""
    return f"It's sunny and 72°F in {city}."

async def main():
    agent = Agent(
        "You are a travel assistant.",
        model="claude-opus-4-7",  # or "gpt-4o", or any Model instance
        tools=[get_weather],
    )
    result = await agent.run("What's the weather like in Tokyo?")
    print(result.output)
    print(f"Used {result.tokens_in + result.tokens_out} tokens, ${result.cost_usd:.4f}")

asyncio.run(main())
Set ANTHROPIC_API_KEY (or OPENAI_API_KEY) before running. That's
it — no LangChain, no LangGraph, no chat_engine = AgentExecutor.from_llm_and_tools(...).
Want to see what's happening as the agent runs?
async for event in agent.stream("plan a 3-day Tokyo trip"):
    print(f"[{event.kind}] {event.payload}")
You'll see STARTED → MODEL_CHUNK × N → TOOL_CALL → TOOL_RESULT → MODEL_CHUNK × N → COMPLETED flow through.
Architectures: the agent loop is a strategy
The default loop is ReAct (observe / think / act). When that doesn't fit your problem, swap it with one kwarg — everything else (model, memory, tools, budget, telemetry, runtime) stays exactly the same.
Single-agent loops: pass architecture=
from jeevesagent import Agent
agent = Agent("...", model="claude-opus-4-7") # ReAct default
agent = Agent("...", model="...", architecture="self-refine") # iterate until critic happy
agent = Agent("...", model="...", architecture="reflexion") # verbal RL with lessons
agent = Agent("...", model="...", architecture="plan-and-execute") # plan once, execute steps
agent = Agent("...", model="...", architecture="rewoo") # plan + parallel tools, 30-50% cheaper
agent = Agent("...", model="...", architecture="tree-of-thoughts") # BFS beam over candidate thoughts
Multi-agent teams: use Team builders (the ergonomic facade)
Team mirrors the builder shape every other framework uses
(create_supervisor / Crew / GroupChatManager) so migrating from
LangGraph / CrewAI / AutoGen / OpenAI Agents SDK is muscle-memory.
Each builder returns a regular Agent — same .run() / .stream()
interface, no special calling convention.
from jeevesagent import Agent, Team, RouterRoute

# Coordinator + workers; the manager calls delegate(...) or forward_message(...)
team = Team.supervisor(
    workers={"researcher": researcher, "writer": writer, "reviewer": reviewer},
    instructions="manage the pipeline",
    model="claude-opus-4-7",
)

# Classify-and-dispatch: cheaper than Supervisor when one specialist
# is enough (1 classifier call + 1 specialist run, no synthesis pass)
team = Team.router(
    routes=[
        RouterRoute(name="billing", agent=billing, description="..."),
        RouterRoute(name="tech", agent=tech, description="..."),
    ],
    instructions="customer support entry point",
    model="claude-haiku-4-5",
)

# Peer agents passing control via typed handoffs (input_type= for
# structured payloads, input_filter= for selective history pruning)
team = Team.swarm(
    agents={"triage": triage, "billing": billing, "tech": tech},
    entry_agent="triage",
    model="claude-opus-4-7",
)

# Actor + critic with different models for blind-spot diversity
team = Team.actor_critic(
    actor=Agent("...", model="claude-opus-4-7"),
    critic=Agent("...", model="gpt-4o"),  # different model
    max_rounds=3,
    approval_threshold=0.9,
    model="claude-opus-4-7",  # coordinator
)

# N debaters + optional judge with similarity-based early termination
team = Team.debate(
    debaters=[optimist, skeptic, analyst],
    judge=cio,
    rounds=2,
    convergence_similarity=0.85,
    model="claude-opus-4-7",
)

# Coordinator + agents share a workspace; decider synthesizes
team = Team.blackboard(
    agents={"hypothesis": h_agent, "evidence": e_agent, "critic": c_agent},
    coordinator=coord_agent,
    decider=decider_agent,
    model="claude-opus-4-7",
)
Recursive composition (the differentiator)
Architectures wrap each other naturally — the property no
sibling-only framework gives you. Wrap a Supervisor in Reflexion for
cross-session learning of delegation patterns; nest Supervisors for
hierarchical teams; wrap an entire pipeline in Reflexion to retry
on low scores:
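# HashEmbedder and InMemoryVectorStore are assumed to be importable from the
# top-level package, like the other names in the capability matrix.
from jeevesagent import Agent, HashEmbedder, InMemoryVectorStore, Reflexion, Supervisor

agent = Agent(
    "...",
    model="claude-opus-4-7",
    architecture=Reflexion(
        base=Supervisor(workers={"researcher": ..., "writer": ...}),
        max_attempts=3,
        threshold=0.85,
        lesson_store=InMemoryVectorStore(embedder=HashEmbedder()),  # selective recall
    ),
)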
The explicit nested form (Agent(architecture=...)) and Team
builders are interchangeable — Team.supervisor(workers={...}) is
exactly Agent(architecture=Supervisor(workers={...})) under the
hood. Use Team for single-level teams (matches what you've seen
in other frameworks); use the nested form for recursive composition.
Standalone testing of orchestrators
from jeevesagent import Supervisor, run_architecture
sup = Supervisor(workers={"a": agent_a})
result = await run_architecture(sup, "do the thing", model="claude-opus-4-7")
Architectures are pluggable via the Architecture protocol — three
methods (name, run, declared_workers) and you have a custom
strategy. See Subagent.md for the full design
rationale.
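To make the three-method shape concrete, here is a minimal sketch of what such a protocol and a toy strategy could look like. The signatures are assumptions (the source names the three members but not their parameters or return types), and `ArchitectureSketch` and `Shout` are hypothetical names, not part of JeevesAgent.

```python
import asyncio
from typing import Protocol, runtime_checkable


@runtime_checkable
class ArchitectureSketch(Protocol):
    """Assumed shape only; the real Architecture signatures may differ."""

    @property
    def name(self) -> str: ...

    def declared_workers(self) -> dict: ...

    async def run(self, prompt: str) -> str: ...


class Shout:
    """Toy strategy: 'runs' by upper-casing the prompt."""

    name = "shout"

    def declared_workers(self) -> dict:
        return {}                      # no sub-agents

    async def run(self, prompt: str) -> str:
        return prompt.upper()


# Structural typing: Shout never inherits from the protocol.
is_architecture = isinstance(Shout(), ArchitectureSketch)
result = asyncio.run(Shout().run("do the thing"))
```

Because the check is structural, any object exposing the three members qualifies, which is what makes a strategy pluggable without a base class.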
Architecture cheat sheet
Visual reference for picking the right pattern. Each diagram shows the actual data flow + LLM-call structure for that architecture.
Single-agent loops
ReAct — observe / think / act loop. The default. One model call per turn; tools dispatch in parallel.
┌────────── loop until no tool calls ──────────┐
│ │
prompt ───► Model ───► tool calls? ──yes──► run tools ──► results
│ (parallel)
└─────────► no calls ───► final output
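The loop in the diagram above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not JeevesAgent's implementation: `react_loop`, `scripted_model`, and the dict-shaped reply are hypothetical, and tools run sequentially here where the harness dispatches them in parallel.

```python
def react_loop(model, tools, prompt, max_turns=8):
    """Call the model until it stops requesting tools."""
    transcript = [("user", prompt)]
    for _ in range(max_turns):
        reply = model(transcript)
        if not reply.get("tool_calls"):
            return reply["text"]                   # no calls: final output
        for name, arg in reply["tool_calls"]:      # sequential in this sketch
            transcript.append(("tool", name, tools[name](arg)))
    raise RuntimeError("max_turns exceeded")


def scripted_model(transcript):
    """Fake model: asks for the weather once, then answers from the result."""
    tool_results = [t for t in transcript if t[0] == "tool"]
    if not tool_results:
        return {"tool_calls": [("get_weather", "Tokyo")]}
    return {"text": f"Forecast: {tool_results[-1][2]}"}


tools = {"get_weather": lambda city: f"sunny in {city}"}
answer = react_loop(scripted_model, tools, "What's the weather like in Tokyo?")
```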
SelfRefine — single-agent generate → critique → refine. Same model wears both hats.
prompt ───► generate ───► critique ──┬── score ≥ threshold ──► output
▲ │
│ └── below ──► refine ──┐
│ │
└────────────────────────────────┘
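The generate, critique, refine cycle above reduces to a short loop. The sketch below is a hedged illustration: the function names, the `(score, feedback)` return shape, and the `max_rounds` cap are assumptions, and plain functions stand in for model calls.

```python
def self_refine(generate, critique, refine, prompt, threshold=0.8, max_rounds=3):
    """Refine a draft until the critique score meets the threshold."""
    draft = generate(prompt)
    for _ in range(max_rounds):
        score, feedback = critique(draft)
        if score >= threshold:
            break                          # critic is satisfied
        draft = refine(draft, feedback)
    return draft


# Toy stand-ins for the three model roles.
def generate(prompt):
    return "tokyo is the capital of japan"


def critique(draft):
    if not draft[0].isupper():
        return 0.3, "capitalize"
    if not draft.endswith("."):
        return 0.6, "add period"
    return 1.0, ""


def refine(draft, feedback):
    if feedback == "capitalize":
        return draft[0].upper() + draft[1:]
    return draft + "."


final = self_refine(generate, critique, refine, "capital of Japan?")
```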
Reflexion — wraps any base architecture with verbal-RL retry. Failed attempts produce a "lesson" stored in memory or a vector store; next attempt sees the relevant lessons.
┌─────────── attempt loop (max_attempts) ───────────┐
│ │
│ prompt ──► [recall lessons] ──► base.run() ──► evaluator
│ │
│ score < threshold?
│ │
│ yes ──┴── no ──► output
│ │
│ reflector ──► lesson
│ │
└────────────────────────────────── persist ─────────┘
│
memory block OR vector store (selective recall)
TreeOfThoughts — BFS beam search over candidate thoughts. Proposer + evaluator at every depth; beam keeps top-k; min_score floor drops weak branches early.
proposer (×branch_factor) evaluator
prompt ──► [t1, t2, t3] ──score──► [0.9, 0.4, 0.7]
│
keep top beam_width
drop below min_score
│
▼
[t1, t3] ←── frontier for depth 2
│
(repeat to max_depth)
│
▼
best leaf wins
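A toy version of the beam search above, with plain functions standing in for the proposer and evaluator. The parameter names `beam_width`, `min_score`, and `max_depth` follow the description; the `beam_search` function and the string-building problem are hypothetical and not taken from the library.

```python
def beam_search(root, propose, score, beam_width=2, min_score=0.5, max_depth=3):
    """BFS beam search: expand, score, drop weak branches, keep the top-k."""
    frontier = [(score(root), root)]
    for _ in range(max_depth):
        candidates = [
            (score(child), child)
            for _, thought in frontier
            for child in propose(thought)
        ]
        survivors = [c for c in candidates if c[0] >= min_score]
        if not survivors:
            break                                  # every branch fell below the floor
        survivors.sort(key=lambda c: c[0], reverse=True)
        frontier = survivors[:beam_width]
    return max(frontier, key=lambda c: c[0])[1]    # best leaf wins


TARGET = "cab"


def propose(thought):
    """Proposer stand-in: extend the partial string by one letter."""
    return [thought + ch for ch in "abc"] if len(thought) < len(TARGET) else []


def score(thought):
    """Evaluator stand-in: fraction of characters matching the target so far."""
    matches = sum(a == b for a, b in zip(thought, TARGET))
    return matches / max(len(thought), 1)


best = beam_search("", propose, score)
```

At depth 1 only "c" clears the `min_score` floor, so the two weaker branches are never expanded.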
PlanAndExecute — planner emits a step list once; executor walks each step; synthesizer composes the final answer.
prompt ───► planner ───► [step1, step2, step3]
                                  │
                                  ▼
                         executor (per step) ───► [r1, r2, r3]
                                                       │
                                                       ▼
                                                  synthesizer ───► output
ReWOO — like PlanAndExecute but the planner emits structured tool calls with {{En}} placeholders, and independent steps run in parallel. Two LLM calls + N tool calls — 30-50% cheaper than ReAct on tool-heavy workloads.
prompt ───► planner ───► [search({{E1}}), fetch({{E2}}=search.url)]
│
▼
parallel tool dispatch
(independent steps run concurrently;
dependent steps wait for {{En}})
│
▼
synthesizer ───► output
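The placeholder mechanics can be sketched as follows: steps whose {{En}} references are all resolved form a wave that could run concurrently, and later steps wait. This is an assumption-laden illustration. The plan format, `run_plan`, and the toy tools are hypothetical, and each wave runs sequentially here for simplicity.

```python
import re

PLACEHOLDER = re.compile(r"\{\{(E\d+)\}\}")


def run_plan(plan, tools):
    """Execute steps in dependency waves, filling {{En}} from earlier results."""
    results, waves, pending = {}, [], dict(plan)
    while pending:
        # A step is ready once every placeholder it mentions has a result.
        ready = [
            sid for sid, (_, arg) in pending.items()
            if all(dep in results for dep in PLACEHOLDER.findall(arg))
        ]
        if not ready:
            raise ValueError("plan has a cycle or an unknown placeholder")
        wave_results = {}
        for sid in ready:                  # a real harness would run these concurrently
            tool, arg = pending.pop(sid)
            filled = PLACEHOLDER.sub(lambda m: results[m.group(1)], arg)
            wave_results[sid] = tools[tool](filled)
        results.update(wave_results)
        waves.append(ready)
    return results, waves


plan = {
    "E1": ("search", "tokyo weather"),
    "E2": ("search", "tokyo hotels"),
    "E3": ("summarize", "{{E1}} | {{E2}}"),
}
tools = {"search": lambda q: f"results<{q}>", "summarize": lambda text: text.upper()}
results, waves = run_plan(plan, tools)
```

E1 and E2 have no dependencies and land in the first wave; E3 waits for both.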
Multi-agent teams
Router — classify-and-dispatch. ONE classifier call decides which specialist runs; that one specialist owns the answer.
                          ┌── refund_agent
prompt ──► classifier ──► technical_agent   ◄── only ONE
                          └── faq_agent     ◄── chosen runs
1 classifier call + 1 specialist run. The cheapest multi-agent pattern.
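The classify-then-dispatch flow is small enough to sketch directly. Here a keyword-overlap function stands in for the single classifier LLM call; `route`, `keyword_classifier`, and the specialist lambdas are hypothetical names for illustration, not the library's Router.

```python
def keyword_classifier(prompt, descriptions):
    """Stand-in for the one classifier call: pick the best keyword overlap."""
    words = set(prompt.lower().split())
    return max(
        descriptions,
        key=lambda name: len(words & set(descriptions[name].split())),
    )


def route(prompt, specialists, descriptions, classify=keyword_classifier):
    """One classification, then exactly one specialist owns the answer."""
    chosen = classify(prompt, descriptions)
    return chosen, specialists[chosen](prompt)


descriptions = {
    "billing": "refund invoice charge payment",
    "tech": "error crash bug install",
}
specialists = {
    "billing": lambda p: "billing handled it",
    "tech": lambda p: "tech handled it",
}
chosen, answer = route("my install keeps hitting an error", specialists, descriptions)
```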
Supervisor — coordinator + workers, glued by a delegate(worker, instructions) tool. Multiple delegations in one supervisor turn run in parallel. forward_message(worker) returns a worker's output verbatim with no synthesis.
prompt ───► manager ───► delegate(...) ─┬─► worker A ─┐
│ ├─► worker B ─┤ parallel
│ └─► worker C ─┤
▼ │
[worker outputs] ◄─────────────┘
│
├─► synthesize ──► output
│
└─► forward_message(worker) ──► verbatim output
ActorCritic — actor + critic pair (use different models for blind-spot diversity). Critic returns structured JSON {score, issues, summary}; actor refines below threshold.
prompt ───► actor ───► critic ──┬── score ≥ threshold ──► output
▲ │
│ └── below ──► refine (apply rubric)
│ │
└──────────── max_rounds cap ──────┘
MultiAgentDebate — N debaters argue across rounds (in parallel each round). Jaccard convergence detects early agreement; optional judge synthesizes the final answer.
prompt ──► [debater1, debater2, debater3] ◄── round 1 (parallel)
│
converged? (Jaccard ≥ 0.85)
yes ───► output
no ───► [responses fed back]
│
[debater1, debater2, debater3] ◄── round 2 (sees prior)
│
▼
judge ──► output (or majority vote if no judge)
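Jaccard similarity is the size of the intersection of two sets divided by the size of their union. The sketch below applies it to word sets and treats a round as converged when every pair of responses meets the threshold. Word-level tokenization and the all-pairs rule are assumptions; the library may tokenize or aggregate differently.

```python
from itertools import combinations


def jaccard(a: str, b: str) -> float:
    """Jaccard similarity of two responses' lowercase word sets."""
    sa, sb = set(a.lower().split()), set(b.lower().split())
    if not sa and not sb:
        return 1.0                         # two empty responses agree trivially
    return len(sa & sb) / len(sa | sb)


def converged(responses, threshold=0.85):
    """True when every pair of responses meets the similarity threshold."""
    return all(jaccard(a, b) >= threshold for a, b in combinations(responses, 2))


agree = converged(["ship it today", "Ship it today", "today ship it"])
disagree = converged(["ship it today", "hold it for now"])
```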
Swarm — peer agents handing off control via a handoff tool (or per-target transfer_to_<name> tools when peers are wrapped in Handoff with an input_type). No central coordinator.
prompt ──► agent A
│
│ handoff(B, payload)
▼
agent B
│
│ transfer_to_C(typed_args)
▼
agent C ──► final output
▲
│ cycle detection: A→B→A→B kills the loop
│ max_handoffs caps total depth
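The two guards in the diagram, a ping-pong check and a depth cap, can be sketched as one validation step applied before each handoff. The `check_handoff` function and its exact cycle rule (the last two hops repeat the two before them) are assumptions for illustration; the library's detection logic may be broader.

```python
def check_handoff(path, target, max_handoffs=8):
    """Return the new path, or raise if the handoff ping-pongs or runs too deep."""
    new_path = path + [target]
    if len(new_path) - 1 > max_handoffs:
        raise RuntimeError("max_handoffs exceeded")
    # A→B→A→B: the last two hops repeat the two hops before them.
    if len(new_path) >= 4 and new_path[-2:] == new_path[-4:-2]:
        raise RuntimeError("handoff cycle detected: " + "→".join(new_path[-4:]))
    return new_path


healthy = check_handoff(check_handoff(["A"], "B"), "C")   # A→B→C is fine

try:
    check_handoff(["A", "B", "A"], "B")                   # A→B→A→B
    cycle_error = None
except RuntimeError as exc:
    cycle_error = str(exc)
```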
BlackboardArchitecture — agents collaborate via a shared mutable workspace. Coordinator picks who acts next; decider says when work is done.
┌───────────── shared blackboard ─────────────┐
│ facts · hypotheses · partial results │
└────▲──────▲──────▲──────▲────────────▲───────┘
│ r/w │ r/w │ r/w │ r/w │
│ │ │ │ │
prompt ──► coordinator ──► picks who acts next │
│ │ │ │ │
agent A agent B agent C │
│ │ │ │ │
▼ ▼ ▼ ▼ │
decider ◄────────────────┘
│
├─ done? ──► output
│
└─ not done ──► next round
Recursive composition
Any architecture can wrap any other. The killer combination: Reflexion of Supervisor — the team learns across attempts which worker handles which intent best.
┌────── Reflexion attempt loop ──────┐
│ │
│ prompt ──► Supervisor ──► output ─┤── score ≥ threshold ──► done
│ (manager + 3 workers) │
│ │
│ └── below ──► lesson ──► retry
│ │
└────────────────────────────────────────────────────────────────┘
agent = Agent(
    "...",
    model="claude-opus-4-7",
    architecture=Reflexion(
        base=Supervisor(workers={"researcher": ..., "writer": ..., "reviewer": ...}),
        lesson_store=InMemoryVectorStore(embedder=HashEmbedder()),  # selective recall
    ),
)
Capability matrix
| Capability | What you get | Where |
|---|---|---|
| Architecture protocol | Pluggable agent-loop strategy: 12 architectures shipped | Architecture, ReAct, SelfRefine, Reflexion, TreeOfThoughts, PlanAndExecute, ReWOO, Router, Supervisor, ActorCritic, MultiAgentDebate, Swarm, BlackboardArchitecture |
| Team facade | Sibling-style builders (Team.supervisor, Team.swarm, Team.router, Team.debate, Team.actor_critic, Team.blackboard) for the common multi-agent shapes | Team, Handoff, run_architecture |
| Vector store | add / search / delete with Mongo-style filters, MMR diversity, BM25 hybrid search, save/load | InMemoryVectorStore, ChromaVectorStore, PostgresVectorStore, FAISSVectorStore, SearchResult |
| Document loader | One-line load for PDF / DOCX / Excel / CSV / HTML / Markdown into chunks | jeevesagent.loader.load, MarkdownChunker, RecursiveChunker, SentenceChunker, TokenChunker |
| Built-in tools | read / write / edit / bash factories with sandbox-aware workdirs | read_tool, write_tool, edit_tool, bash_tool, default_workdir |
| Model adapters | Anthropic, OpenAI, LiteLLM (~100 providers), Echo (zero-key), Scripted (tests) | jeevesagent.AnthropicModel, OpenAIModel, LiteLLMModel, EchoModel, ScriptedModel |
| String model resolver | model="claude-opus-4-7", "gpt-4o", "mistral-large", "command-r", "echo", "litellm/<any>" | Agent.__init__ |
| Tools | @tool decorator with auto-schema, sync + async; agent.with_tool decorator; add_tool / remove_tool / tools_list | jeevesagent.tool, Tool |
| MCP servers | stdio + Streamable HTTP, multi-server registry, name disambiguation | MCPRegistry, MCPServerSpec |
| Jeeves Gateway | One-line: tools=JeevesGateway.from_env() | jeevesagent.jeeves |
| Memory backends | In-memory dict, vector cosine, Chroma, Postgres+pgvector, Redis | InMemoryMemory, VectorMemory, ChromaMemory, PostgresMemory, RedisMemory |
| Embedders | HashEmbedder (deterministic, zero deps), OpenAIEmbedder, VoyageEmbedder, CohereEmbedder | HashEmbedder, OpenAIEmbedder, VoyageEmbedder, CohereEmbedder |
| Bi-temporal facts | All five memory backends. LLM-driven Consolidator. Auto-consolidate, plus ConsolidationWorker for long-lived agents. | Fact, Consolidator, *FactStore |
| Durable runtime | sqlite or postgres-backed replay across process restarts | SqliteRuntime, PostgresRuntime, JournaledRuntime |
| Streaming | agent.stream() → AsyncIterator[Event] with backpressure | Agent.stream |
| Permissions | mode-based + allow/deny lists, mirrors Claude Agent SDK | StandardPermissions, Mode |
| Hooks | @agent.before_tool / @agent.after_tool decorators | HookRegistry |
| Sandbox | FilesystemSandbox blocks path-arg escapes; SubprocessSandbox for full isolation | FilesystemSandbox, SubprocessSandbox |
| Budget | Per-token / per-cost / per-wall-clock limits with soft warnings | StandardBudget, BudgetConfig |
| Telemetry | OpenTelemetry spans + metrics for every milestone | OTelTelemetry |
| Audit log | HMAC-signed JSONL or in-memory; tracks every tool call | FileAuditLog, InMemoryAuditLog |
| Certified values | Freshness + lineage policies | FreshnessPolicy, LineagePolicy |
| Declarative config | Build agents from TOML or dicts | Agent.from_config(path), Agent.from_dict(cfg) |
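As an illustration of the audit-log row, the sketch below signs each JSONL entry with HMAC-SHA256 from the standard library and verifies the log afterwards. It is a sketch under assumptions, not FileAuditLog's format: the record layout and the `sign_entry` / `verify_log` helpers are hypothetical, and chaining each signature over the previous one is a design choice made here so that reordering is detectable.

```python
import hashlib
import hmac
import json


def sign_entry(key: bytes, prev_sig: str, entry: dict) -> str:
    """Return one JSONL line whose signature also covers the previous signature."""
    body = json.dumps(entry, sort_keys=True)
    sig = hmac.new(key, (prev_sig + body).encode(), hashlib.sha256).hexdigest()
    return json.dumps({"entry": entry, "sig": sig}, sort_keys=True)


def verify_log(key: bytes, lines) -> bool:
    """Recompute every signature in order; any edit or reordering fails."""
    prev_sig = ""
    for line in lines:
        record = json.loads(line)
        body = json.dumps(record["entry"], sort_keys=True)
        expected = hmac.new(key, (prev_sig + body).encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, record["sig"]):
            return False
        prev_sig = record["sig"]
    return True


key = b"demo-key"                          # placeholder; load a real secret in practice
log, prev = [], ""
for call in ({"tool": "read", "path": "a.txt"}, {"tool": "bash", "cmd": "ls"}):
    line = sign_entry(key, prev, call)
    log.append(line)
    prev = json.loads(line)["sig"]

tampered = [log[0].replace("a.txt", "b.txt"), log[1]]
```

`verify_log(key, log)` succeeds on the untouched log and fails on `tampered`, and it also fails if the two lines are swapped.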
Documentation
| Doc | What's there |
|---|---|
| docs/quickstart.md | Step-by-step examples for each backend combo |
| docs/recipes.md | Production patterns: persistent memory, MCP, durable replay, audit |
| docs/architecture.md | Module tour, lifecycle, extension points |
| docs/migration_0.1_to_0.2.md | What changed in 0.2.0; how to migrate |
| Subagent.md | Architecture-protocol design rationale; full 14-architecture catalogue (the 5 shipped, the 9 candidates) |
| project.md | The full engineering plan (the design doc) |
| BUILD_LOG.md | Slice-by-slice changelog |
| examples/ | 26 runnable scripts: 00_hello–19_rewoo cover every architecture; 20_rag_supervisor–22_rag_with_loader are RAG patterns; 23_coding_agent, 24_support_triage, 25_document_pipeline, 26_devops_diagnostic are real-world use cases with permissions / audit / budget wired up |
Status
- 743 tests pass in ~6 seconds (5 env-gated integrations skip without JEEVES_TEST_PG_DSN / JEEVES_TEST_REDIS_URL)
- mypy --strict clean across 95 production source files
- ruff clean including flake8-async lints
- v0.5 ships the full vector-store stack (InMemoryVectorStore / Chroma / Postgres / FAISS, all with Mongo-style filters, MMR diversity, BM25 hybrid search, persistence), the document loader with chunking strategies, the Team facade for ergonomic multi-agent construction, and 12 architectures with selective lesson recall (Reflexion), typed handoffs (Swarm), forward_message (Supervisor), Jaccard convergence (Debate), and parallel proposer/evaluator with min_score floor (TreeOfThoughts).
Verify your install
git clone <repo>
cd jeevesagent
pip install -e '.[dev]'
ruff check jeevesagent
mypy --strict jeevesagent
pytest tests/ -v
You should see 743 passed. Five integration tests skip without
JEEVES_TEST_PG_DSN / JEEVES_TEST_REDIS_URL / API-key env vars set.
Contributing
The harness has a strict CI gate: ruff + mypy --strict + pytest. All
three must pass. Async-only — every public function returning anything
other than a value is async. Every fan-out uses anyio task groups.
Zero raw asyncio.create_task or asyncio.gather calls.
See project.md §2 for the non-negotiable engineering
principles.
License
Apache 2.0.