sovereign-agent
A framework for building always-on AI agents that you actually own.
The eight architectural decisions every serious agent system converges on — implemented as a library you can use, and a tutorial you can read.
Debug by cat. Crash-recover by ls. Teach by reading the same code that runs in production.
pip install sovereign-agent
from sovereign_agent import run_task, register_tool
@register_tool
def get_weather(city: str) -> dict:
"""Look up current weather for a city."""
return {"city": city, "temperature": 18, "condition": "rainy"}
result = run_task("What's the weather in Edinburgh?")
print(result.summary)
# → "Weather in Edinburgh: 18°C, rainy."
The agent ran a planner, called get_weather, wrote a trace, and saved every artifact to sessions/sess_<id>/. Inspect it with ls -R sessions/sess_<id>. That's not a metaphor — it's how you debug this system.
What sovereign-agent is, exactly
Two things in one repository:
A production library you pip install and use to build agents.
A build-from-scratch curriculum that reconstructs the library in five chapters with tests, so you learn by implementing.
Both point at the same code. A CI check (tools/verify_chapter_drift.py) ensures the chapter code and the production library stay identical. This is the fastai pattern — the library and the course teach each other.
sovereign-agent is not trying to be the next Claude Code or OpenHands. It's the thing you read to understand why they and every other production agent system converged on the same eight architectural decisions.
The eight decisions (this is the product)
Every production agent system I've read the internals of — Claude Code, OpenHands, Aider, SWE-agent, Devin, Cognition's public work — has independently arrived at the same eight decisions. sovereign-agent is those decisions, made explicit, with code you can run.
- Sessions are directories. sessions/sess_<12hex>/ contains everything — memory, IPC, state, logs, artifacts. No database. No shared tables.
- State is forward-only. A session never transitions backwards. Retries are new sessions, linked to the old one.
- Tickets for every operation. Append-only audit trail. Tickets are to your agent what commits are to a git repo.
- Manifests verify by SHA-256. Detect accidental edits, disk corruption, and tampering. Not cryptographic security — this is about catching the mistakes you actually make.
- Atomic rename for IPC. Two halves of the agent communicate by writing files. No brokers, no Kafka, no Redis. POSIX's rename() is the only IPC primitive you need until proven otherwise.
- Lock at the session level. Not finer (deadlocks), not coarser (unscalable). One serialization boundary per session gives you multi-file consistency for free.
- Parse JSON defensively. The LLM is lying. Write a parser that handles what the model actually produces, not what the prompt instructed it to produce.
- Register tools explicitly. Prompts are advisory; the registry is physics. When the model persistently reaches for a tool you don't want used, remove the tool — don't write the third negative instruction.
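As a concrete illustration of "parse JSON defensively": a minimal, stdlib-only sketch of the technique, handling the failure modes models actually produce (markdown fences, leading prose, trailing commentary). This is a generic sketch, not the library's actual parser, and it deliberately ignores edge cases like braces inside string literals.

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Best-effort extraction of a JSON object from LLM output."""
    # Strip markdown code fences like ```json ... ``` if present.
    raw = re.sub(r"^```(?:json)?\s*|\s*```$", "", raw.strip())
    # Fast path: the whole string is already valid JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Fallback: find the first balanced {...} span and parse that.
    # (Naive brace counting; braces inside strings would break it.)
    start = raw.find("{")
    if start == -1:
        raise ValueError("no JSON object found in model output")
    depth = 0
    for i, ch in enumerate(raw[start:], start):
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return json.loads(raw[start : i + 1])
    raise ValueError("unbalanced JSON object in model output")

print(parse_model_json('Sure! Here it is:\n```json\n{"tool": "get_weather"}\n```'))
# → {'tool': 'get_weather'}
```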
Each decision removes a class of bugs rather than handling them. That's why the decisions age well.
Full walk-through in docs/architecture.md — this is the text of the three-hour lecture.
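"Manifests verify by SHA-256" fits in a few stdlib lines. The manifest format below (a manifest.json mapping relative path to hex digest) is an assumption for illustration, not necessarily the library's on-disk format:

```python
import hashlib
import json
from pathlib import Path

def verify_manifest(session_dir: str) -> list[str]:
    """Recompute SHA-256 for every file listed in the manifest.

    Returns the relative paths whose digests no longer match, i.e.
    files that were edited, corrupted, or tampered with after the
    manifest was written.
    """
    root = Path(session_dir)
    manifest = json.loads((root / "manifest.json").read_text())
    mismatches = []
    for rel_path, expected in manifest.items():
        actual = hashlib.sha256((root / rel_path).read_bytes()).hexdigest()
        if actual != expected:
            mismatches.append(rel_path)
    return mismatches
```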
The ninth thing nobody converges on, but should: dataflow integrity
The framework can guarantee that tools were called, tickets were written, manifests verified, state advanced. That is necessary but not sufficient. You also need to verify the LLM used its tool outputs.
Every scenario in this repo ships with a dataflow integrity audit. The research-assistant scenario checks that every arXiv ID cited in the report was actually returned by web_lookup — not fabricated from training data. The code-reviewer scenario checks that the review references findings the analyzer actually returned.
This is not a hypothetical concern:
One morning I had a framework with 148 passing tests and three clean scenarios. I ran the code reviewer against a real LLM for the first time. It produced a perfectly formatted review of code that did not exist — named add, multiply, divide. The framework reported ✓ success, ✓ manifest, ✓ complete. Every structural guarantee held. The output was pure fiction.
"It ran" is not "it worked." The library gives you the first; your scenario has to verify the second. Every example in this repo demonstrates the pattern. See class slides §06 for the production healthcare anecdote that underscores why this is not optional.
Install
pip install sovereign-agent # core
pip install "sovereign-agent[all]" # + optional extras (voice, observability, Docker)
pip install "sovereign-agent[dev]" # + development tooling
Requires Python 3.12+.
The two halves
The decision-8 principle — prompts are advisory, registries are physics — generalizes. Not every constraint is a tool registry. Some constraints are rules. sovereign-agent ships a loop half (ReAct-style LLM reasoning) and a structured half (deterministic Python) that communicate by atomic-rename IPC.
from sovereign_agent import LoopHalf, StructuredHalf, Rule
loop = LoopHalf(planner=planner, executor=executor)
structured = StructuredHalf(rules=[
Rule(name="commit_under_cap",
condition=lambda d: d["deposit"] <= 300,
action=commit_booking),
Rule(name="escalate_over_cap",
condition=lambda d: d["deposit"] > 300,
escalate_if=lambda d: True),
])
The loop decides what to try. The structured half decides what's allowed. No amount of prompt engineering can bypass a rule — the business constraint lives in Python where it belongs.
This isn't new. It's how every financial, regulatory, or high-stakes agent system eventually ends up structured. The name varies ("policy layer," "guardrails," "rules engine") but the pattern is the same. sovereign-agent makes it the default.
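The atomic-rename handoff between the halves is small enough to show in full. This is a generic sketch of the write-then-rename technique, not the library's ipc module: POSIX rename() is atomic within a filesystem, so a reader sees either the complete file or no file, never a half-written one.

```python
import json
import os
import tempfile

def atomic_write_json(path: str, payload: dict) -> None:
    """Publish a message for the other half: write a temp file in the
    same directory, then rename it into place atomically."""
    directory = os.path.dirname(path) or "."
    # Temp file must live on the same filesystem for rename to be atomic.
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as f:
            json.dump(payload, f)
            f.flush()
            os.fsync(f.fileno())  # durability before the rename
        os.replace(tmp, path)  # the atomic publish step
    except BaseException:
        os.unlink(tmp)  # never leave a half-written temp file behind
        raise
```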
What ships in v0.2
Five capabilities that came out of running sovereign-agent against real LLMs and hitting real failures:
Parallel tool dispatch. Tools marked parallel_safe=True run concurrently. Writes and handoffs serialize automatically. Five 0.3s calls finish in 0.31s instead of 1.5s.
Worker isolation without Docker. Linux ≥5.13 gets kernel-level Landlock. macOS gets sandbox-exec. No container runtime needed. Tool compromise can't escape the session directory.
Session resume. Resume any terminal session as a new child. Parent context auto-prepends to the child's SESSION.md. Chains preserve ancestry. Forward-only state is preserved — resumes are new sessions, never edits of the old.
Verifier protocol. Rule conditions accept callables, scikit-learn classifiers, or LLM judges. Same audit trail, different backend. Decision 6 generalized.
Human-in-the-loop approval. Tools return requires_human_approval=True. The executor exits cleanly, writes the request to disk, and resumes when a human decides — seconds, hours, or days later. Nothing in memory.
Each is ~200-400 lines of code with tests. See examples/ for end-to-end scenarios that use each one.
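The parallel-dispatch arithmetic falls out of standard asyncio. The `dispatch` helper and the (fn, args, parallel_safe) triple are made up for this sketch, not the library's API; the point is that safe calls gather concurrently while the rest serialize:

```python
import asyncio
import time

async def dispatch(tool_calls):
    """Run parallel-safe calls concurrently; serialize the rest.

    `tool_calls` is a list of (coroutine_fn, args, parallel_safe) triples.
    """
    parallel = [fn(*args) for fn, args, safe in tool_calls if safe]
    results = list(await asyncio.gather(*parallel))
    for fn, args, safe in tool_calls:
        if not safe:  # writes and handoffs run one at a time, in order
            results.append(await fn(*args))
    return results

async def slow_read(i):
    await asyncio.sleep(0.3)  # stand-in for a 0.3s tool call
    return i

start = time.monotonic()
out = asyncio.run(dispatch([(slow_read, (i,), True) for i in range(5)]))
print(f"{len(out)} calls in {time.monotonic() - start:.2f}s")  # ~0.3s, not 1.5s
```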
The architecture in one picture
┌─────────────────────────────────────────────────────────────┐
│ LOOP HALF Planner → Executor → Tools │
│ (reasoning) ReAct-style, free-form LLM │
│ │
│ ▼ handoff via ipc/handoff_to_structured.json │
│ │
│ STRUCTURED HALF Deterministic rules, classifiers, │
│ (constraints) LLM judges — whatever you configure. │
│ Binds what's allowed. │
└─────────────────────────────────────────────────────────────┘
│
▼
┌─────────────────────────────────────────────────────────────┐
│ sessions/sess_a382a2149fc1/ │
│ ├── session.json # state machine, forward-only │
│ ├── SESSION.md # the system prompt │
│ ├── workspace/ # tool outputs, agent artifacts │
│ ├── memory/ # persistent facts across runs │
│ ├── ipc/ # atomic-rename message passing │
│ ├── tickets/ # every operation recorded │
│ └── logs/trace.jsonl # every event, every tool call │
└─────────────────────────────────────────────────────────────┘
Debugging by cat
This is the visceral version of "sessions are directories." Last week I debugged five separate failures across three scenarios. Every one was diagnosed from three lines of JSONL:
$ cat sessions/sess_5ab10359c72a/logs/trace.jsonl
{"event_type": "executor.tool_called", "payload":
{"tool": "read_file", "arguments": {"path": "workspace/source.py"},
"success": false, "summary": "file not found"}}
{"event_type": "executor.tool_called", "payload":
{"tool": "list_files", "arguments": {"path": "workspace"},
"success": true, "summary": "0 entries"}}
{"event_type": "executor.tool_called", "payload":
{"tool": "complete_task", "arguments":
{"result": {"status": "failed",
"reason": "No source file found in workspace"}}}}
The LLM never tried the analyzer tool. It looked for files that weren't there, gave up, marked the session complete. Three lines and I knew the fix (decision 8: remove the directory-listing tool from the registry; the model reaches for it reflexively).
No SELECT queries. No vendor viewer. No SDK. cat.
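And when cat isn't quite enough, the trace is still just JSONL. A few stdlib lines filter it; `failed_tool_calls` is an illustrative helper assuming the event shape shown above, not a library function:

```python
import json

def failed_tool_calls(trace_path: str):
    """Yield the payload of every failed executor.tool_called event."""
    with open(trace_path) as f:
        for line in f:
            event = json.loads(line)
            payload = event.get("payload", {})
            if (event.get("event_type") == "executor.tool_called"
                    and not payload.get("success", True)):
                yield payload
```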
Three surfaces, one codebase
sovereign-agent/
├── sovereign_agent/ # the library you pip install
├── chapters/ # 5 tutorial chapters (minitorch-style, fill in the TODOs)
├── examples/ # 8 reference scenarios (research, code review, HITL, etc.)
├── docs/ # architecture, API stability, class slides
└── tests/ # 267 tests — library + chapters + examples
- If you want to ship an agent today → read sovereign_agent/ and pick scenarios from examples/
- If you want to understand how it works → read the chapters. Each one rebuilds a piece of the library. Your tests pass when you're done.
- If you want the full lecture → docs/class-slides.md (3 hours, 122 slides, the full 8-decisions walkthrough with the actual traces from the debugging session)
The chapter/library drift check (tools/verify_chapter_drift.py) runs in CI. If you change the library and forget to update the chapter, or vice versa, CI fails. The tutorial can't rot.
Lineage
sovereign-agent stands on three lineages:
Teaching artifacts with real libraries — the pattern:
- fastai (Jeremy Howard) — the library + course pattern. sovereign-agent's biggest debt.
- minitorch (Sasha Rush) — rebuild-the-framework pedagogy. Chapters work like this.
- LLMs-from-scratch (Sebastian Raschka) — book + code as the same artifact. Reading order matters.
- nanoGPT (Andrej Karpathy) — small, readable, no magic.
Production agent systems that converged on the same architecture:
- NanoClaw (Gavriel Cohen) — TypeScript reference implementation. Read src/group-queue.ts; two hours of reading and you'll see where sovereign-agent's patterns come from.
- Claude Code — session-as-directory, sub-agent isolation.
- OpenHands — closest OSS cousin architecturally.
- Aider — per-repo state in .aider/, same pattern, simpler scope.
- SWE-agent, Devin — per-task sandboxes.
Papers that shaped the module design:
- ReAct — the executor loop.
- Reflexion — memory patterns.
- MemGPT — hierarchical memory.
- Voyager — procedural memory; file-based skills.
- SWE-agent: Agent-Computer Interfaces Enable Automated Software Engineering — "give the agent a typewriter, not a console."
See CREDITS.md for the full list.
Quick-start with a real LLM
# Configure your LLM endpoint (Nebius, OpenAI, or any OpenAI-compatible provider)
cp .env.example .env
# edit: set NEBIUS_KEY=sk-... and optionally swap models
# Verify everything is wired up — Python, uv, .env, models, imports, CI
make doctor
# Run a real scenario end-to-end
make example-research-real
A real-LLM run prints every step, every tool call, and finishes with a dataflow integrity audit. Expected output for example-research-real:
▶ research-assistant (real LLM)
planner: MiniMaxAI/MiniMax-M2.5
executor: Qwen/Qwen3-235B-A22B-Instruct-2507
✓ plan produced: 1 subgoal, loop half
✓ web_lookup("retrieval augmented generation") → 2 results
✓ write_file(report.md) — 287 bytes
✓ complete_task
=== Dataflow integrity audit ===
web_lookup calls: 1, successful hits: 2, unique papers returned: 2
✓ all 2 arXiv ID(s) came from web_lookup
titles from web_lookup referenced in report: 2/2
If the model fabricates a paper, the audit catches it and flags ✗ with the fabricated ID. This pattern is what you want to copy into your own scenarios.
Launch-checklist-as-Makefile
The Makefile is also the documentation for the release workflow:
make doctor # 15-check tabular status — Python, uv, .env, imports, CI
make preflight # lint + drift + pytest collection + demo importability
make test # 267 tests
make ci-real-estimate # cost preview for a full ci-real run (no API calls)
make ci-real # run every -real scenario against a live LLM
make pre-publish # audit for secrets, PII, forbidden files before public push
make ready-to-ship # preflight + pre-publish + build — ends "next: git tag..."
make help groups every target by category. make doctor output is tabular; paste it into an issue and a maintainer has full diagnostic context.
Where things live
sovereign-agent is a well-behaved Python library. It never writes to your CWD when you import sovereign_agent. Different entry points write to different places by design:
| Entry point | Artifacts go to | Why |
|---|---|---|
| sovereign-agent run <task> (production) | ./sessions/ (your CWD) | Your deployment, your call |
| python -m chapters.<n>.demo | $XDG_DATA_HOME/sovereign-agent/demos/ | Persists for inspection; outside your git tree |
| python -m examples.<n>.run (offline) | tempdir, auto-cleans | Deterministic dev runs |
| python -m examples.<n>.run --real | $XDG_DATA_HOME/sovereign-agent/examples/ | Real-LLM runs burn tokens; keep artifacts |
Override via SOVEREIGN_AGENT_DATA_DIR=<path>.
Status
v0.2.0 alpha. The 67 public APIs in sovereign_agent.__all__ are stable within the 0.2.x series — bug fixes flow through, breaking changes will bump to 0.3.0. See docs/API.md for the full semver contract.
- ✅ Framework: sessions, tickets, IPC, parallelism, isolation, resume, verifiers, HITL
- ✅ 267 tests (library + chapter drift + mocked real-path integration)
- ✅ 8 reference scenarios, all with dataflow integrity checks
- ✅ 5 tutorial chapters, CI-enforced against production code
- 🚧 Voice pipeline, observability backends (Evidently/Langfuse/OTel) — shipped as skeletons
- 🚧 Vector-DB memory backends — v0.3
What sovereign-agent is not
It's not trying to replace Claude Code for daily coding or LangGraph for orchestration-heavy workflows. It's not a framework I'm trying to grow into the next big thing. It's not abandoned; it's not vibe-coded; it's not a thin wrapper over LangChain.
What it is: a substrate for teaching the eight architectural decisions, plus a library that implements them cleanly enough that you can use it for real work. fastai for agents.
If you want an agent you can own, audit, reproduce, teach, and — crucially — understand at the bottom of the stack, this is probably the smallest codebase in the world that gives you all five.
Learn more
- 📖 docs/architecture.md — the 8 decisions in detail, with code
- 🎓 docs/class-slides.md — the 3-hour lecture, 122 slides, full debugging journey
- 🧭 chapters/ — rebuild the framework yourself in 5 runnable chapters
- 🧪 examples/ — 8 reference scenarios, each with a dataflow integrity audit
- 📋 docs/API.md — semver contract for the 67 public symbols
- 📝 CHANGELOG.md — what shipped and when
Contributing
Pull requests, issues, and architectural criticism are welcome. See CONTRIBUTING.md.
git clone https://github.com/sovereignagents/sovereign-agent
cd sovereign-agent
make first-run # install, preflight, sanity check
make test # 267 tests, ~20s
make demo-ch5 # see a working agent end-to-end
Credits
NanoClaw (Gavriel Cohen) is the TypeScript reference implementation whose patterns — session-as-directory, group-queue serialization, tickets, filesystem IPC — sovereign-agent ports and extends in Python.
sovereign-agent is also built by reading and comparing production agent systems (Claude Code, OpenHands, Aider) and the foundational papers (ReAct, Reflexion, SWE-agent, Voyager, MemGPT). The pedagogical format is modelled on nanoGPT (Karpathy), minitorch (Rush), LLMs-from-scratch (Raschka), and fastai (Howard).
See CREDITS.md for full attributions.
License
Apache 2.0. Use commercially, modify, fork — just keep the notice.