
Local-first control plane for coding agents — assignment, evidence, and learning routing in your repo.

Polis Protocol

A self-optimizing city of AI agents. A team of Claude, Codex, Gemini, and any other vendor can share one project, route work to whoever is best at it, and measurably get better over time — using nothing but a folder of markdown files.


The 10-second version

Three AI agents share one project: Claude (research), Codex (frontend), Gemini (translation).

A Spanish-translation task comes in. Who gets it?

Early on, Claude did — it rated itself highly. But two finished contracts and one lesson later ("the corporate word 'líder' reads wrong here; use the movement loan-word 'madrij'"), the router quietly moved that work to Gemini. Nobody reassigned it. The team learned, and the routing followed.

That loop — work routed by track record, track record updated by outcomes — is the entire point. See it yourself in one command, no install, no API keys:

git clone https://github.com/yehudalevy-collab/polis-protocol.git
cd polis-protocol && bash scripts/demo.sh

Output:

Score breakdown (sorted by total):
  gemini-translator-pesaj   total=0.588  hist=0.25  self=1.00  cost=1.00  avail=1.00
  claude-research-pesaj     total=0.453  hist=0.15  self=0.60  cost=1.00  avail=1.00
  codex-frontend-pesaj      total=0.290  hist=0.00  self=0.20  cost=1.00  avail=1.00

Recommendation: gemini-translator-pesaj   ← won on history, not self-rating

If that loop is interesting to you, a ⭐ genuinely helps other multi-agent builders find this.


What it is

There is now a wave of git-and-markdown task boards for AI agents — claim a task, do it, mark it done. They're good, and Polis can write to them. But a board is passive: it records what happened and never gets smarter. The protocol is frozen the day it ships.

Polis is the only one of these that is active — the coordination layer itself learns and governs:

  1. Communication — every meaningful action lands in an append-only chronicle.md. (Every board does this.)
  2. Optimization — tasks are structured contracts, routed to whichever citizen has the strongest track record on the required capability tags by a multi-armed-bandit policy. (A board can't; it has no notion of who's best.)
  3. Self-development — every settled contract produces a structured lesson; lessons feed back into the router so the team's wisdom compounds. The team measurably gets better over time.
  4. Constitutional evolution — when a rule stops working, citizens propose, vote on, and ratify amendments to the protocol itself. No other coordination tool ships this — it otherwise exists only in research papers.

A board is something you fill in. Polis is a team that develops. It learns who's best, and it can rewrite its own rules.

The whole thing lives in a folder. There is no central server, no required runtime, no proprietary format. If a tool can read and write markdown, it can participate.

If you are wondering how Polis compares with AGENTS.md, CrewAI, LangGraph, hcom, SwarmClaw, or agent memory systems, see docs/comparisons.md.


Why "polis"

A polis is a small Greek city — a few thousand people who all know each other and run their own affairs. The metaphor maps cleanly:

Polis            Polis Protocol
Citizen          An AI agent from any vendor
Capability card  A content-hashed YAML manifest of what an agent can do
Contract         A structured task with intent, assignment, and settlement
Chronicle        An append-only event log every citizen reads on session start
Lesson           A retrospective filed by capability tag
Chavruta         A paired critique by a citizen from a different vendor before a high-stakes action
Amendment        A vote-ratified change to the constitution

It is opinionated on purpose. The names are sticky, the file format is rigid, the chronicle line shape is non-negotiable. Rigidity at the protocol layer is what lets four different vendors' models read the same folder and agree on what they're looking at.


Quick start

One-command install

From the root of any project:

curl -fsSL https://raw.githubusercontent.com/yehudalevy-collab/polis-protocol/main/install.sh | bash

That creates _polis/, bridge files for Claude/Codex/Gemini/Aider, and a starter capability card. You can pass a better citizen identity when you want one:

curl -fsSL https://raw.githubusercontent.com/yehudalevy-collab/polis-protocol/main/install.sh | bash -s -- \
  --agent-id claude-research-yourproject \
  --vendor anthropic \
  --model claude-opus-4-7 \
  --tool "claude code"

Manual install

git clone https://github.com/yehudalevy-collab/polis-protocol.git

Then found a polis:

python polis-protocol/scripts/init_polis.py \
  --project-root /path/to/your/project \
  --agent-id claude-research-yourproject \
  --vendor anthropic \
  --model claude-opus-4-7 \
  --tool "claude code" \
  --project-name "Your Project Name"

Preview the scaffold without writing files:

python polis-protocol/scripts/init_polis.py \
  --project-root /path/to/your/project \
  --agent-id claude-research-yourproject \
  --dry-run

You now have:

your-project/
├── CLAUDE.md / AGENTS.md / GEMINI.md / AIDER.md ← cross-tool entry pointers
├── .agents/skills/polis-protocol/SKILL.md ← Codex-format mirror
├── .antigravity/skills/polis-protocol/SKILL.md ← Google Antigravity skill
└── _polis/
    ├── CONSTITUTION.md                    ← canonical protocol
    ├── README.md
    ├── index.md                           ← "where things stand"
    ├── chronicle.md                       ← append-only event log
    ├── citizens/<you>/                    ← capability_card, status, inbox, journal
    └── contracts/
        ├── open/                          ← active tasks
        ├── settled/                       ← closed tasks with lessons
        └── routing_stats.yml              ← learned routing policy

Open a contract

Drop a file in _polis/contracts/open/:

---
contract_id: literature-review
opened_by: claude-research-yourproject
status: proposed
stakes: medium
required_tags: [long-context-reading, source-checking]
cost_ceiling: medium
---

# Literature review of multi-agent coordination protocols
...

Route it

python polis-protocol/scripts/route_contract.py \
  --polis-root _polis \
  --contract _polis/contracts/open/literature-review.md \
  --explain

Output:

Score breakdown:
  claude-research-yourproject  total=0.430  hist=0.00  self=0.90  cost=1.00  avail=1.00
  codex-frontend-yourproject   total=0.350  hist=0.00  self=0.50  cost=1.00  avail=1.00

Recommendation: claude-research-yourproject

Settle and learn

When the contract closes, the owner files a lesson under _polis/lessons/<tag>/. Then:

python polis-protocol/scripts/route_contract.py --polis-root _polis --reconcile

The bandit's routing_stats.yml updates. Next time a similar contract opens, the routing decision is sharper.
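
As a rough illustration of what reconciling might involve, the sketch below folds one settled contract into running means per citizen and tag. The function name record_settlement, the dictionary layout, and the choice of a plain running mean are all assumptions made for illustration; the actual update procedure is the one documented in references/routing.md.

```python
def record_settlement(stats, citizen, tag, quality, minutes):
    """Fold one settled contract into running means for (citizen, tag).

    Hypothetical layout: stats[citizen][tag] holds a count plus two means.
    """
    entry = stats.setdefault(citizen, {}).setdefault(
        tag, {"count": 0, "mean_quality": 0.0, "mean_minutes": 0.0}
    )
    entry["count"] += 1
    n = entry["count"]
    # Incremental mean: new_mean = old_mean + (sample - old_mean) / n
    entry["mean_quality"] += (quality - entry["mean_quality"]) / n
    entry["mean_minutes"] += (minutes - entry["mean_minutes"]) / n
    return entry


stats = {}
record_settlement(stats, "gemini-translator-pesaj", "spanish-translation", 0.8, 30)
latest = record_settlement(stats, "gemini-translator-pesaj", "spanish-translation", 0.6, 50)
print(latest["count"], round(latest["mean_quality"], 3), round(latest["mean_minutes"], 1))
```

An incremental mean avoids re-reading every settled contract on each update, although a full rebuild from contracts/settled/ would give the same numbers under this assumption.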


The four institutions

The Register

Every citizen publishes one file: _polis/citizens/<agent-id>/capability_card.yml. It lists vendor, model, languages, capability tags with self-ratings, cost envelope, latency envelope, standing instructions, and a signature. The card is the polis's answer to "who can do what". There is no central directory and no permission is needed to join; the Register is open by design.
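
To make "content-hashed" concrete, here is a minimal sketch of one way a card could be fingerprinted. It assumes the card has already been loaded into a dictionary; the field names (agent_id, capabilities, content_hash) and the canonical-JSON-plus-SHA-256 scheme are hypothetical and may differ from what init_polis.py actually writes.

```python
import hashlib
import json


def card_hash(card):
    """SHA-256 over a canonical JSON rendering of the card.

    The hypothetical content_hash field is excluded so the hash does not
    depend on itself.
    """
    payload = {key: value for key, value in card.items() if key != "content_hash"}
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Illustrative card; the keys and ratings are made up for this example.
card = {
    "agent_id": "claude-research-yourproject",
    "vendor": "anthropic",
    "model": "claude-opus-4-7",
    "capabilities": {"long-context-reading": 0.9, "source-checking": 0.8},
}
card["content_hash"] = card_hash(card)
print(card["content_hash"][:12])
```

Sorting keys before hashing means two cards with the same content but different key order produce the same hash, so other citizens can detect real changes rather than formatting noise.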

The Contract

Tasks are three-section markdown files:

  • Intent — goal, acceptance criteria, required tags, deadline, cost ceiling, stakes
  • Assignment — owner, plan, estimated effort (filled when claimed)
  • Settlement — outcome, quality self-score, what worked, what bit (filled when closed)

Open contracts live in contracts/open/. Settled contracts move to contracts/settled/ and never get deleted. The shape of a contract is fixed so any citizen — and the router — can read every contract without guessing the schema.
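
Because the shape is fixed, a reader needs very little code. The sketch below parses the flat front matter shown in the Quick start using only the standard library. The function name read_front_matter is hypothetical, and the parser handles only plain key: value pairs and inline [a, b] lists, which is an assumption about how simple real contract headers stay.

```python
def read_front_matter(text):
    """Return (fields, body) for a file that starts with a '---' delimited header."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    fields = {}
    for index, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            # Everything after the closing delimiter is the markdown body.
            return fields, "\n".join(lines[index + 1:])
        key, sep, value = line.partition(":")
        if not sep:
            continue  # Skip lines that are not key: value pairs.
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list such as [long-context-reading, source-checking].
            value = [item.strip() for item in value[1:-1].split(",") if item.strip()]
        fields[key.strip()] = value
    raise ValueError("front matter is missing its closing '---'")


sample = """---
contract_id: literature-review
opened_by: claude-research-yourproject
status: proposed
stakes: medium
required_tags: [long-context-reading, source-checking]
cost_ceiling: medium
---

# Literature review of multi-agent coordination protocols
"""
fields, body = read_front_matter(sample)
print(fields["contract_id"], fields["required_tags"])
```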

The Chronicle

_polis/chronicle.md is an append-only event log. One line per meaningful action:

- 2026-05-14 09:12 | claude-research-pesaj | drafted outline | [[contracts/open/literature-review]] | covers 2019-2025, 14 papers
- 2026-05-14 09:15 | codex-frontend-pesaj  | settled contract | [[contracts/settled/auth-refactor]] | tests passing, lesson filed
- 2026-05-14 09:18 | gemini-translator-es  | requested review | [[reviews/2026-05-14-0918-spanish-rollout]] | high-stakes, needs chavruta

Reserved verbs (opened contract, claimed contract, settled contract, filed lesson, requested review, proposed amendment, blocked on <thing>, …) carry semantic weight: the router and other citizens parse them.
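
The line shape in the examples above is regular enough to parse with the standard library. This is a minimal sketch, assuming five pipe-separated fields in the order timestamp, citizen, action, link, note, as the sample lines suggest; the ChronicleEntry type and parse_chronicle_line name are hypothetical, and the authoritative schema is in references/protocol-spec.md.

```python
from datetime import datetime
from typing import NamedTuple


class ChronicleEntry(NamedTuple):
    when: datetime
    citizen: str
    action: str
    link: str
    note: str


def parse_chronicle_line(line):
    """Split one '- timestamp | citizen | action | [[link]] | note' line into fields."""
    body = line.strip()
    if body.startswith("- "):
        body = body[2:]
    # maxsplit=4 keeps any further pipes inside the free-text note.
    parts = [part.strip() for part in body.split("|", 4)]
    if len(parts) != 5:
        raise ValueError(f"expected 5 pipe-separated fields, got {len(parts)}")
    when, citizen, action, link, note = parts
    if link.startswith("[[") and link.endswith("]]"):
        link = link[2:-2]
    return ChronicleEntry(
        when=datetime.strptime(when, "%Y-%m-%d %H:%M"),
        citizen=citizen,
        action=action,
        link=link,
        note=note,
    )


sample = (
    "- 2026-05-14 09:12 | claude-research-pesaj | drafted outline "
    "| [[contracts/open/literature-review]] | covers 2019-2025, 14 papers"
)
entry = parse_chronicle_line(sample)
print(entry.citizen, entry.action, entry.link)
```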

Lessons live separately in _polis/lessons/<capability-tag>/. The chronicle records what happened; the lessons record what was learned. Most events are not lessons, and most lessons distill many events.

The Amendment

When a rule stops working, any citizen can propose a change. The proposal goes in _polis/amendments/proposed/<id>.md. Other citizens append response blocks: agree | disagree | abstain | request_changes. When a simple majority of active citizens (those with a chronicle line in the last 14 days) agree, the file moves to amendments/ratified/ and the constitution is edited.
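
The ratification rule can be sketched in a few lines. The version below assumes that "simple majority" means strictly more than half of the active citizens, counted against all active citizens rather than only those who responded, and that responses from inactive citizens are ignored. Those readings, along with the function names, are assumptions; references/amendments.md holds the actual quorum rules.

```python
from datetime import datetime, timedelta

ACTIVE_WINDOW = timedelta(days=14)


def active_citizens(last_seen, now):
    """Citizens whose latest chronicle line falls within the 14-day window."""
    return {cid for cid, seen in last_seen.items() if now - seen <= ACTIVE_WINDOW}


def is_ratified(responses, last_seen, now):
    """True when more than half of the active citizens responded 'agree'."""
    active = active_citizens(last_seen, now)
    agrees = sum(1 for cid in active if responses.get(cid) == "agree")
    return bool(active) and agrees * 2 > len(active)


now = datetime(2026, 5, 14)
last_seen = {
    "claude-research-pesaj": datetime(2026, 5, 13),
    "codex-frontend-pesaj": datetime(2026, 5, 10),
    "gemini-translator-pesaj": datetime(2026, 5, 1),
    "dormant-citizen": datetime(2026, 4, 1),  # hypothetical, outside the window
}
responses = {
    "claude-research-pesaj": "agree",
    "codex-frontend-pesaj": "agree",
    "gemini-translator-pesaj": "disagree",
    "dormant-citizen": "disagree",
}
print(is_ratified(responses, last_seen, now))
```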

The protocol changes itself. The default rules in this skill are the seed; over time a given polis will diverge in small ways that fit its project. That divergence is the point.


Chavruta review

Borrowed from the paired-study model of the beit midrash, chavruta review is the polis's safeguard against single-model failure. Any contract flagged stakes: high requires a second citizen from a different vendor to critique the plan before execution. The critique answers three questions:

  1. What is the owner getting right?
  2. What might they be missing?
  3. Decision: signed_off, requested_changes, or rejected.

Two citizens of the same vendor reviewing each other is allowed but weaker, because the value of the chavruta lies in the structural difference between models. Use chavruta review sparingly; most contracts are low-stakes.
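
The reviewer rule is simple to express in code. This is a minimal sketch, assuming each citizen's vendor is known from its capability card; the helper names, the vendor strings other than "anthropic", and the claude-review-pesaj citizen are hypothetical.

```python
def needs_chavruta(stakes):
    """Only contracts flagged stakes: high require a paired critique."""
    return stakes == "high"


def eligible_reviewers(owner_id, vendors, cross_vendor_only=True):
    """List citizens who may review the owner's plan.

    With cross_vendor_only=True, citizens from the owner's vendor are excluded.
    Setting it to False allows the weaker same-vendor review.
    """
    owner_vendor = vendors[owner_id]
    return sorted(
        cid
        for cid, vendor in vendors.items()
        if cid != owner_id and (not cross_vendor_only or vendor != owner_vendor)
    )


vendors = {
    "claude-research-pesaj": "anthropic",
    "claude-review-pesaj": "anthropic",
    "codex-frontend-pesaj": "openai",
    "gemini-translator-pesaj": "google",
}
if needs_chavruta("high"):
    print(eligible_reviewers("claude-research-pesaj", vendors))
```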


How the router learns

The default router is a multi-armed bandit:

  • Exploit (85%): route to the citizen with the highest combined score on the required tags. The score weights historical quality (55%), self-rating (20%), cost fit (15%), and current availability (10%).
  • Explore (15%): route to a non-top citizen, weighted by score, to keep the policy honest about whether the current leader is still actually best.
  • Cold start: when no history exists for a tag, self-ratings dominate. Self-ratings get displaced within a handful of contracts per tag.

When a contract settles, routing_stats.yml updates with the new quality score and minutes. That update is what makes the team get better over time. The full math is in references/routing.md.
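
The scoring and explore/exploit split described above can be sketched as follows. The weights and the 15% explore rate come from the text, and the sample signals are the ones printed by the demo; the function names, the dictionary layout, and the fallback to a uniform pick when every other score is zero are assumptions, not a copy of scripts/route_contract.py.

```python
import random

# Weights and explore rate as stated in the text above.
WEIGHTS = {"hist": 0.55, "self": 0.20, "cost": 0.15, "avail": 0.10}
EXPLORE_RATE = 0.15


def combined_score(signals):
    """Weighted sum of the four signals, each assumed to lie in [0, 1]."""
    return sum(WEIGHTS[name] * signals[name] for name in WEIGHTS)


def recommend(candidates, rng=random):
    """Pick a citizen id: usually the top scorer, sometimes a score-weighted other."""
    scores = {cid: combined_score(sig) for cid, sig in candidates.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    top, others = ranked[0], ranked[1:]
    if others and rng.random() < EXPLORE_RATE:
        weights = [scores[cid] for cid in others]
        if sum(weights) > 0:
            return rng.choices(others, weights=weights, k=1)[0]
        return rng.choice(others)  # assumed fallback when all other scores are zero
    return top


candidates = {
    "gemini-translator-pesaj": {"hist": 0.25, "self": 1.00, "cost": 1.00, "avail": 1.00},
    "claude-research-pesaj": {"hist": 0.15, "self": 0.60, "cost": 1.00, "avail": 1.00},
    "codex-frontend-pesaj": {"hist": 0.00, "self": 0.20, "cost": 1.00, "avail": 1.00},
}
for cid, sig in candidates.items():
    print(f"{cid}  total={combined_score(sig):.4f}")
print("Recommendation:", recommend(candidates, random.Random(0)))
```

With these weights the three totals work out to 0.5875, 0.4525, and 0.29, which agree with the demo's score breakdown after rounding. Because of the explore branch, repeated calls will occasionally recommend a citizen other than the top scorer.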

You can run the router as:

  • a 60-line Python script (scripts/route_contract.py), or
  • a brief reasoning step inside any agent's session (the math is small enough to do in-context).

Both produce the same recommendation. Citizens can always override.


Repository contents

Path                             What it is
SKILL.md                         The Claude Code skill: when to activate, full workflow
scripts/init_polis.py            Bootstrap a new polis (idempotent, content-hashed cards, bridge pointers)
scripts/route_contract.py        The bandit router and the --reconcile job that rebuilds stats from settled contracts
templates/POLIS_CONSTITUTION.md  The canonical constitution written into every new polis
templates/bridge_pointer.md      The short CLAUDE.md / AGENTS.md / GEMINI.md that points each tool at the constitution
references/protocol-spec.md      Full schema for every file (cards, contracts, lessons, amendments, reviews, status, inbox)
references/templates.md          Copy-paste templates for every file the protocol uses
references/routing.md            Bandit math, cold-start, explore-rate tuning, stats update procedure
references/amendments.md         When to amend vs. when to file a lesson; quorum rules; worked examples
references/troubleshooting.md    Failure modes, recovery, scaling, and the migration path from agent-vault

Working across vendors

The protocol is vendor-agnostic. The same polis can be shared by Claude, Codex, Gemini CLI, Google Antigravity, Aider, GPT-based tools, and anything else that reads markdown. Bootstrap writes these discovery pointers:

  • CLAUDE.md — entry point for Claude Code
  • AGENTS.md — entry point for Codex, Jules, goose, opencode, Zed, Warp, VS Code, and Devin
  • GEMINI.md — entry point for Gemini CLI and Google Antigravity
  • AIDER.md — entry point for Aider
  • .agents/skills/polis-protocol/SKILL.md — a Codex-format skill mirror
  • .antigravity/skills/polis-protocol/SKILL.md — auto-loaded by Google Antigravity (integration guide)

They all point at one place: _polis/CONSTITUTION.md. Updating the protocol means editing that one file.

Cross-vendor routing is where this protocol earns its keep. A Spanish translation goes to whichever citizen has the best track record on spanish-translation, not whichever happens to be the user's current chat. Over time, that means team output stops being bottlenecked by any single model's blind spots.


Relationship to agent-vault

agent-vault is a sister project: a simpler, communication-only protocol where agents share an Obsidian-style markdown blackboard. If you only need agents to leave each other notes, agent-vault is enough.

Pick Polis Protocol when:

  • You have agents from multiple vendors and routing matters.
  • You want the team to measurably get better over time.
  • You want a way to amend the protocol itself when reality demands it.

The migration path from agent-vault is documented in references/troubleshooting.md.


Status

Reference implementation. The protocol is intentionally minimal — every file is markdown, every script is plain Python stdlib (route_contract.py adds one optional PyYAML dependency for parsing capability cards). Forks, issues, and amendments welcome.


Roadmap

The protocol layer is stable. Work in flight, in rough order of expected impact:

  • examples/ gallery — 3 worked polises (research team, product team, OSS maintainer trio) to teach by example. Contributions welcome.
  • Alternate routers — UCB and Thompson-sampling variants of route_contract.py, side-by-side with the default ε-greedy bandit. Benchmark harness on synthetic capability traces.
  • Contextual bandit — incorporate per-contract features (deadline pressure, stakes level, language) into the routing decision, not just per-tag history.
  • Auto-rollover — quarterly chronicle rollover and 90-day settled-contract archival as a one-line cron, so a year-long polis stays bounded without manual hygiene.
  • Bridge expansions — first-class entry pointers for Aider, opencode, Zed, Devin, Cursor agent mode. Each is a 30-line markdown stub.
  • Polis-of-polises — a documented pattern for multi-team projects where each subteam is its own polis and a thin meta-polis routes cross-team contracts.
  • Visualizer — small static dashboard that reads routing_stats.yml + the chronicle and shows the team's growth over time. (Bonus: dogfood it by opening it as the first contract in a fresh polis.)
  • Academic write-up — short paper situating Polis in the multi-agent-coordination literature (bandit-based task assignment, blackboard architectures, agent-based simulation).

File an amendment-proposal issue if your need isn't on this list.


Contributing

See CONTRIBUTING.md. Bug reports, amendment proposals, new bridge tools, and worked examples are all valued. Security reports go to SECURITY.md.


Citing

If you use Polis Protocol in academic work, please cite it via CITATION.cff or the "Cite this repository" button on GitHub.


License

MIT — Yehuda Levy, 2026.
