Lightweight, hackable multi-agent orchestration lab (CLI + Python) with transcripts, checkpoints, budgets, and pluggable providers/tools.
Agentry Lab: Multi‑Agent Orchestration for Experiments
Serious tooling, delightfully unserious outcomes. 😏
A lightweight, hackable lab for building and evaluating multi‑agent workflows. Define your lab room (agents, tools, providers, schedules) in YAML, then run and iterate quickly from the CLI or Python. Stream outputs, save transcripts, stash checkpoints — because sometimes you want agents to argue… on purpose.
Highlights ✨
Because single agents are boring.
- 📦 YAML‑first presets for agents/advisors/moderator/summarizer (your config, your rules)
- 🔌 Pluggable LLM providers (OpenAI, Ollama) and tools (ddgs, Wolfram Alpha)
- 📡 Streaming CLI with resume support and transcript/DB persistence (forget nothing, replay everything)
- ⏳ Budgets for tools (per‑run/per‑iteration) with shared‑per‑tick semantics (no more runaway tool spam)
- 🧩 Small, readable runtime: nodes, scheduler, engine, state (batteries included, drama optional)
Requirements
- 🐍 Python 3.11+
- 🧰 A virtualenv (recommended; sanity‑preserving)
- 🖥️ Optional: Ollama for local models (default base URL http://localhost:11434)
- 🔑 API keys as needed (e.g., OPENAI_API_KEY, WOLFRAM_APP_ID); bring your own secrets
Install (editable)
python -m venv .venv
. .venv/bin/activate
pip install -U pip
pip install -e .
Environment (.env)
Create a .env file (loaded via python-dotenv) with any secrets:
OPENAI_API_KEY=sk-...
WOLFRAM_APP_ID=...
OLLAMA_BASE_URL=http://localhost:11434
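Before a run it can help to confirm which of these settings are actually visible to the process. The following is a minimal stdlib-only sketch; `missing_settings` and `ollama_base_url` are hypothetical helpers, not part of agentrylab, and the sketch does not load the .env file itself (python-dotenv does that in the package).

```python
import os

# Variable names taken from the .env example above; the purposes are descriptive labels only.
REQUIRED_FOR = {
    "OPENAI_API_KEY": "OpenAI provider",
    "WOLFRAM_APP_ID": "Wolfram Alpha tool",
}


def missing_settings(env=None):
    """Return {name: purpose} for variables that are unset or empty."""
    env = os.environ if env is None else env
    return {name: purpose for name, purpose in REQUIRED_FOR.items() if not env.get(name)}


def ollama_base_url(env=None):
    """Return OLLAMA_BASE_URL, falling back to the documented default."""
    env = os.environ if env is None else env
    return env.get("OLLAMA_BASE_URL") or "http://localhost:11434"


# Report against the current process environment.
for name, purpose in missing_settings().items():
    print(f"{name} is not set (needed for: {purpose})")
print("Ollama base URL:", ollama_base_url())
```

Only the keys for the providers and tools a preset actually uses need to be present, so a missing entry here is a hint, not necessarily an error.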
Quickstart (CLI) 🚀
Spin up a room and let the sparks fly.
.venv/bin/agentrylab run src/agentrylab/presets/debates.yaml \
--max-iters 4 --thread-id demo --show-last 10
Quickstart (Python API) 🐍
Orchestrate from Python with minimal fuss.
from agentrylab import init
from agentrylab.presets import path as preset_path
lab = init(preset_path("debates.yaml"), experiment_id="demo")
lab.run(rounds=2)
See the “Python API” section below for full details and streaming options (callbacks, timeouts, early‑stops).
Output streams each iteration ("=== New events ===") and prints a final tail
of the last N transcript entries. Transcripts are written to outputs/*.jsonl
and checkpoints to outputs/checkpoints.db.
CLI
- Run a preset
agentrylab run <preset.yaml> [--thread-id ID] [--max-iters N] [--stream/--no-stream] [--resume/--no-resume] [--show-last K]
- Inspect a thread’s checkpoint
agentrylab status <preset.yaml> <thread-id>
- List all known threads
agentrylab ls <preset.yaml>
See src/agentrylab/docs/CLI.md for full command docs.
Configuration ⚙️
Describe your room in YAML; everything else clicks into place.
- Presets live under src/agentrylab/presets/ (see debates.yaml).
- Providers: OpenAI (HTTP), Ollama; add your own under runtime/providers.
- Tools: ddgs search, Wolfram Alpha; add your own under runtime/tools.
- Scheduler: Round‑robin and Every‑N; build your own in runtime/scheduler.
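To make the two scheduling styles concrete, here is a small self-contained sketch of round-robin and Every‑N selection. It is not the code in runtime/scheduler: the `Cadence` class, the function names, 1-based iteration counting for Every‑N, and the reading of "run on last" as "also act on the final iteration" are all assumptions made for illustration. The node ids are borrowed from the Stand‑Up Club preset.

```python
from dataclasses import dataclass


def round_robin(order, iteration):
    """Pick the one node that acts on a 0-based iteration, cycling through `order`."""
    return order[iteration % len(order)]


@dataclass
class Cadence:
    every: int = 1             # act on iterations that are multiples of `every`
    run_on_last: bool = False  # assumed meaning: also act on the final iteration


def every_n(cadences, iteration, last_iteration):
    """Return the node ids due on a 1-based iteration, in declaration order."""
    due = []
    for node_id, cadence in cadences.items():
        on_schedule = iteration % cadence.every == 0
        forced_last = cadence.run_on_last and iteration == last_iteration
        if on_schedule or forced_last:
            due.append(node_id)
    return due


cadences = {
    "comicA": Cadence(1),
    "comicB": Cadence(1),
    "punch_up": Cadence(2),
    "mc": Cadence(2, run_on_last=True),
}
for it in range(1, 4):
    print(it, every_n(cadences, it, last_iteration=3))
```

In this sketch the `mc` node still gets a turn on iteration 3 even though 3 is not a multiple of 2, which is the behaviour the "(run on last)" notes in the presets appear to describe.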
Presets 🎭
Have fun out of the box: llama3‑friendly and non‑strict by default.
- Stand‑Up Club (standup_club.yaml): two comedians riff on a topic, a punch‑up advisor adds tweaks, and the MC closes the set.
  - Seed a topic: JOKE_TOPIC="airports"
  - Cadence (Every‑N): comicA/comicB every turn, punch_up every 2, mc every 2 (run on last)
  - Run: agentrylab run src/agentrylab/presets/standup_club.yaml --max-iters 6 --thread-id standup-1 --show-last 20
- Drifty Thoughts (drifty_thoughts.yaml): three “thinkers” drift playfully; a gentle advisor nudges; optional summarizer digests.
  - Seed a topic: TOPIC="surprising ideas"
  - Cadence (Every‑N): thinkers every turn, advisor every 2, summarizer every 3 (run on last)
  - Tip: prompts avoid asking for user input; outputs are standalone prose
- Research Collaboration (research.yaml): two scientists brainstorm, a style coach gives clarity bullets, moderator emits JSON actions, summarizer keeps things readable.
  - Seed a topic: TOPIC="curious scientific question"
  - Cadence (Every‑N): scientists every turn; style_coach/moderator every 2; summarizer every 2 (run on last)
  - Tip: moderator includes a JSON exemplar to improve compliance; style coach gives ultra‑short bullets
- Therapy Session (therapy_session.yaml): a reflective client and gentle therapist; summarizer offers a compassionate wrap‑up.
  - Seed a topic: TOPIC="something on your mind"
  - Cadence (Every‑N): client/therapist every turn; summarizer every 2 (run on last)
  - Tip: therapist responds in 3–5 sentences and ends with one open question
- DDG Quick Summary (ddg_quick_summary.yaml): one agent searches DuckDuckGo and writes a 5‑bullet web summary with URLs.
  - Seed a topic: SUMMARY_TOPIC="your topic"
  - Cadence: single agent speaks once (tool call + summary)
  - Tip: good “starter” preset to test tools with llama3
- Small Talk (small_talk.yaml): two friendly voices chat; a host recaps every few turns.
  - Seed a topic: SMALL_TALK_TOPIC="coffee rituals"
  - Cadence (Every‑N): pal/friend every turn; host every 3 (run on last)
  - Tip: configured for gpt‑4o‑mini by default; swap provider to llama3 if you prefer local
- Brainstorm Buddies (brainstorm_buddies.yaml): two idea buddies riff; a scribe pulls a shortlist.
  - Seed a topic: BRAINSTORM_TOPIC="rainy day activities"
  - Cadence (Every‑N): buddyA/buddyB every turn; scribe every 3 (run on last)
  - Tip: buddies write short lines; scribe outputs a clean shortlist (no bullets)
- Follow‑Up Q&A (follow_up.yaml): explainer → interviewer → explainer → interviewer → summarizer.
  - Seed a topic: FOLLOWUP_TOPIC="solar panels at home"
  - Cadence (Round‑Robin, exact order): explainer → interviewer → explainer → interviewer → summarizer
  - Tip: simple 5‑turn Q&A flow with a tidy wrap‑up
See more tips: src/agentrylab/docs/PRESET_TIPS.md
Budgets (Tools)
- per_run_max: total calls per tool id across the run
- per_iteration_max: calls per engine tick (counters reset each tick)
- Scope: enforced per tool id, shared across agents that act in the same tick
- Minima (per_run_min, per_iteration_min) are advisory (not enforced); useful for prompting and analysis
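The budget rules above can be sketched as a pair of counters per tool id. This is an illustration of the described semantics only, not agentrylab's implementation; the `ToolBudget` class and its method names are hypothetical.

```python
from collections import Counter


class ToolBudget:
    """Per-tool call limits, shared by every agent acting in the same tick."""

    def __init__(self, per_run_max=None, per_iteration_max=None):
        self.per_run_max = per_run_max
        self.per_iteration_max = per_iteration_max
        self.run_calls = Counter()   # never reset during a run
        self.iter_calls = Counter()  # reset at the start of each tick

    def start_tick(self):
        """Reset the per-iteration counters; run totals are kept."""
        self.iter_calls.clear()

    def try_call(self, tool_id):
        """Record a call and return True if both limits allow it, else return False."""
        if self.per_run_max is not None and self.run_calls[tool_id] >= self.per_run_max:
            return False
        if self.per_iteration_max is not None and self.iter_calls[tool_id] >= self.per_iteration_max:
            return False
        self.run_calls[tool_id] += 1
        self.iter_calls[tool_id] += 1
        return True


budget = ToolBudget(per_run_max=3, per_iteration_max=1)
budget.start_tick()
first = budget.try_call("ddgs")   # agent A, tick 1
second = budget.try_call("ddgs")  # agent B, same tick: per-iteration limit already used
budget.start_tick()
third = budget.try_call("ddgs")   # tick 2: per-iteration counter was reset
print(first, second, third, dict(budget.run_calls))
```

Because the counters are keyed by tool id rather than by agent, the second agent in a tick is the one that gets refused, which is what "shared across agents that act in the same tick" implies.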
Persistence 📜💾
Transcripts for storytelling; checkpoints for recovery.
- 📜 Transcript JSONL: default outputs/<thread-id>.jsonl
- 💾 Checkpoints (SQLite): default outputs/checkpoints.db
- ⏭️ Resume: run --resume (default) merges the saved snapshot into state before running; --no-resume starts fresh for that thread id.
- 🧠 Schemas and field definitions: see src/agentrylab/docs/PERSISTENCE.md
- ⏱️ Timekeeping: all timestamps are recorded as Unix epoch seconds (UTC)
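Since transcripts are plain JSONL, they can be read without importing agentrylab at all. The sketch below uses only the standard library; `read_transcript` and `to_utc` are hypothetical helper names, and the assumption that each row stores its timestamp under the key `t` comes from the Event key list later in this document, not from the persistence schema itself.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def read_transcript(path, limit=None):
    """Load JSONL rows, skipping blank lines; return only the last `limit` rows if given."""
    rows = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                rows.append(json.loads(line))
    if limit is None:
        return rows
    return rows[max(len(rows) - limit, 0):]


def to_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
```

For example, `read_transcript("outputs/demo.jsonl", limit=10)` would return the tail of the transcript written by the CLI quickstart (thread id demo), and `to_utc(row["t"])` would turn a row's timestamp into a readable datetime, assuming the key name holds.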
Architecture (at a glance)
- Engine: steps the scheduler, executes nodes, applies outputs/actions
- Nodes: Agent, Moderator, Summarizer, Advisor (see runtime/nodes/*)
- Providers: thin HTTP adapters (OpenAI, Ollama)
- Tools: simple callables with normalized envelopes (e.g., ddgs)
- State: history window composition, budgets, message contracts, rollback
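As a rough mental model of how these pieces fit together, the loop below steps a schedule, runs each due node, and appends the output to shared state. It is a toy sketch under stated assumptions, not the real engine: `State`, `run_engine`, and the lambda nodes are hypothetical stand-ins, and providers, tools, budgets, and rollback are left out entirely.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class State:
    iter: int = 0
    history: List[dict] = field(default_factory=list)
    stop_flag: bool = False


def run_engine(nodes: Dict[str, Callable[[State], str]],
               schedule: Callable[[int], List[str]],
               state: State,
               max_iters: int) -> State:
    """Advance one tick at a time until max_iters is reached or stop_flag is set."""
    while state.iter < max_iters and not state.stop_flag:
        state.iter += 1
        for node_id in schedule(state.iter):
            content = nodes[node_id](state)  # "execute node"
            state.history.append(            # "apply output"
                {"iter": state.iter, "agent_id": node_id, "content": content}
            )
    return state


nodes = {
    "pro": lambda s: f"pro speaks (sees {len(s.history)} entries)",
    "con": lambda s: f"con replies (sees {len(s.history)} entries)",
}
final = run_engine(nodes, lambda it: ["pro", "con"], State(), max_iters=2)
for entry in final.history:
    print(entry["iter"], entry["agent_id"], entry["content"])
```

Keeping the schedule as a plain function of the iteration number is a simplification chosen here to keep the sketch short; it makes it easy to swap in a different cadence without touching the loop.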
Development 🧑‍💻
Serious tooling for serious… tinkering.
pip install -e .[dev]
# lint and tests
ruff check . && pytest -q
# coverage (uses pytest-cov; default fail-under=40%)
make coverage # or: pytest --cov=src/agentrylab --cov-branch --cov-report=term-missing
☕️ Pro tip: keep a coffee nearby. Agents love to riff.
Python API
- Initialize a lab and run for N rounds
from agentrylab import init, Event

# When installed from PyPI, use the packaged preset path helper:
# from agentrylab.presets import path as preset_path
# lab = init(preset_path("debates.yaml"), experiment_id="unique_id_1234", prompt="What makes jokes funny?")

# When working from a checkout, you can also reference the file under src/
lab = init("src/agentrylab/presets/debates.yaml", experiment_id="unique_id_1234", prompt="What makes jokes funny?")
status = lab.run(rounds=5)
print(status.iter, status.is_active)
print(lab.history(limit=10))
- One-shot run with optional streaming callback
from agentrylab import run

def on_event(ev: dict):
    print(ev["iter"], ev["agent_id"], ev["role"])  # transcript events

lab, status = run(
    "src/agentrylab/presets/debates.yaml",
    prompt="What makes jokes funny?",
    experiment_id="unique_id_1234",
    rounds=5,
    stream=True,
    on_event=on_event,
)
Budgets (Python)
- Set global/per-tool budgets in the preset, then inspect counters via the checkpoint snapshot.
from agentrylab import init

preset = {
    "id": "budget-demo",
    "providers": [{"id": "p1", "impl": "tests.fake_impls.TestProvider", "model": "test"}],
    "tools": [{"id": "echo", "impl": "tests.fake_impls.EchoTool"}],
    "agents": [
        {
            "id": "pro",
            "role": "agent",
            "provider": "p1",
            "system_prompt": "You are the agent.",
            "tools": ["echo"],
        }
    ],
    "runtime": {
        "scheduler": {
            "impl": "agentrylab.runtime.scheduler.round_robin.RoundRobinScheduler",
            "params": {"order": ["pro"]},
        },
        "budgets": {"tools": {"per_run_max": 1}},
    },
}

lab = init(preset, experiment_id="budget-demo-1", resume=False)
lab.run(rounds=1)
snap = lab.store.load_checkpoint("budget-demo-1")
print("total tool calls:", snap.get("_tool_calls_run_total"))
Logging/Tracing from Python
- Configure runtime logging/trace in the preset and call via Python.
from agentrylab import init

preset = {
    # ... providers/tools/agents ...
    "runtime": {
        "logs": {"level": "INFO", "format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
        "trace": {"enabled": True},
        "scheduler": {
            "impl": "agentrylab.runtime.scheduler.round_robin.RoundRobinScheduler",
            "params": {"order": ["pro"]},
        },
    },
}

lab = init(preset, experiment_id="log-1")
lab.run(rounds=1)
API reference (Python)
- init(config, *, experiment_id=None, prompt=None, user_messages=None, resume=True) -> Lab
  - config: YAML path, dict, or a validated Preset object
  - experiment_id: logical run/thread id; enables resume
  - prompt: sets cfg.objective for the run (used in prompts when enabled)
  - user_messages: str or list[str]; seeds initial user message(s) into context
  - resume: attempts to load checkpoint for experiment_id
- run(config, *, prompt=None, experiment_id=None, rounds=None, resume=True, stream=False, on_event=None, timeout_s=None, stop_when=None, on_tick=None, on_round=None) -> (Lab, LabStatus)
  - One-shot helper; see Lab.run for params
- Lab.run(*, rounds=None, stream=False, on_event=None, timeout_s=None, stop_when=None, on_tick=None, on_round=None) -> LabStatus
  - rounds: number of iterations to run
  - stream: when True, calls on_event(event: Event) for newly appended transcript entries
  - timeout_s: optional wall-clock timeout for streaming runs
  - stop_when: optional predicate Event -> bool; when it returns True, the run stops
- Lab.stream(*, rounds=None, timeout_s=None, stop_when=None, on_tick=None, on_round=None) -> Iterator[Event]
  - Generator that yields transcript events as they occur
  - Optional callbacks: on_tick(info), on_round(info) where info = {"iter": int, "elapsed_s": float}
- Lab.status (property) -> LabStatus
- Lab.history(limit=50) -> list[Event]
- Lab.clean(thread_id=None, delete_transcript=True, delete_checkpoint=True) -> None: delete outputs for a thread
- list_threads(config) -> list[tuple[str, float]]: list (thread_id, updated_at) in persistence
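The callback and return shapes listed above are simple enough to prototype without a running lab. The sketch below builds a stop_when predicate, an on_tick callback, and a sorter for list_threads results, then exercises them with stand-in data. The helper names (`stop_on_keyword`, `on_tick`, `newest_first`) are hypothetical, and the sample events are invented for illustration.

```python
def stop_on_keyword(keyword):
    """Build a stop_when predicate: True once an event's content mentions `keyword`."""
    needle = keyword.lower()

    def predicate(ev):
        # content may be a str or a dict, so compare against its string form
        return needle in str(ev.get("content", "")).lower()

    return predicate


def on_tick(info):
    """Progress callback; `info` follows the documented {"iter", "elapsed_s"} shape."""
    print(f"tick {info['iter']} after {info['elapsed_s']:.1f}s")


def newest_first(threads):
    """Sort (thread_id, updated_at) pairs so the most recently updated comes first."""
    return sorted(threads, key=lambda pair: pair[1], reverse=True)


# Stand-in data shaped like the documented types; no agentrylab import is needed here.
stop = stop_on_keyword("consensus")
print(stop({"iter": 1, "agent_id": "pro", "role": "agent", "content": "No agreement yet"}))
print(stop({"iter": 2, "agent_id": "con", "role": "agent", "content": "We reached consensus."}))
on_tick({"iter": 2, "elapsed_s": 3.2})
print(newest_first([("demo", 100.0), ("standup-1", 250.0)]))
```

Going by the signatures above, these would be passed as `lab.run(rounds=5, stream=True, stop_when=stop, on_tick=on_tick)`, and `newest_first` would take the result of `list_threads(config)`.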
Releasing 📦
- We publish on tags via GitHub Actions (see .github/workflows/release.yml).
- Maintainers: bump version in pyproject.toml, update CHANGELOG.md, then:
  git tag -a vX.Y.Z -m 'vX.Y.Z' && git push --tags
- CI builds sdist/wheel and uploads to PyPI using PYPI_API_TOKEN secret.
Event schema
from agentrylab import Event
def handle(ev: Event) -> None:
    print(ev["iter"], ev["agent_id"], ev["role"], ev.get("latency_ms"))
# keys: t, iter, agent_id, role, content (str|dict), metadata (dict|None), actions (dict|None), latency_ms
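For code that handles events outside the package, the key list above can be mirrored in a local type. This is a sketch, not the package's own `Event` definition: `EventSketch` and `preview` are hypothetical names, and the value types for `t` and `latency_ms` are assumptions (the source only gives types for content, metadata, and actions).

```python
import json
from typing import Optional, TypedDict, Union


class EventSketch(TypedDict, total=False):
    t: float                      # assumed: Unix epoch seconds
    iter: int
    agent_id: str
    role: str
    content: Union[str, dict]
    metadata: Optional[dict]
    actions: Optional[dict]
    latency_ms: Optional[float]   # assumed numeric


def preview(ev, width=80):
    """One-line summary of an event; dict content is rendered as sorted JSON."""
    content = ev.get("content", "")
    text = content if isinstance(content, str) else json.dumps(content, sort_keys=True)
    return f'[{ev.get("iter")}] {ev.get("agent_id")} ({ev.get("role")}): {text[:width]}'


example: EventSketch = {"iter": 1, "agent_id": "pro", "role": "agent", "content": {"b": 1, "a": 2}}
print(preview(example))
```

Handling the dict case explicitly matters because content is documented as str|dict, and slicing a dict directly would raise a TypeError.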
Checkpoint snapshot fields
- Returned by lab.store.load_checkpoint(thread_id) as a dict of state attributes (when available):
  - thread_id: current experiment id
  - iter: iteration counter
  - stop_flag: stop signal for the engine
  - history: in‑memory context entries {agent_id, role, content} used by prompt composition
  - running_summary: summarizer running summary if set
  - _tool_calls_run_total, _tool_calls_iteration: global tool counters
  - _tool_calls_run_by_id, _tool_calls_iter_by_id: per‑tool counters
  - cfg, contracts: complex/opaque objects (implementation detail)
- If a legacy/opaque pickle was saved, you’ll get { "_pickled": ... } instead
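A snapshot with these fields can be condensed into a one-line status string. The helper below is a hypothetical sketch that works on a plain dict. It assumes `_tool_calls_run_by_id` maps tool id to a call count, and it treats an empty or missing snapshot defensively because the source does not say what load_checkpoint returns when nothing was saved.

```python
def summarize_snapshot(snap):
    """Return a short human-readable summary of a checkpoint snapshot dict."""
    if not snap:
        return "no checkpoint"
    if "_pickled" in snap:
        # Legacy/opaque form: individual fields are not available.
        return "legacy pickled checkpoint (fields not inspectable)"
    per_tool = snap.get("_tool_calls_run_by_id") or {}
    parts = [
        f"thread={snap.get('thread_id')}",
        f"iter={snap.get('iter')}",
        f"stopped={bool(snap.get('stop_flag'))}",
        f"history={len(snap.get('history') or [])}",
        f"tool_calls={snap.get('_tool_calls_run_total', 0)}",
    ]
    parts.extend(f"{tool}={count}" for tool, count in sorted(per_tool.items()))
    return " ".join(parts)


demo_snapshot = {
    "thread_id": "demo",
    "iter": 2,
    "history": [{}, {}],
    "_tool_calls_run_total": 1,
    "_tool_calls_run_by_id": {"echo": 1},
}
print(summarize_snapshot(demo_snapshot))
```

Every field is read with `.get` because the list above notes that attributes are present only "when available".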
Recipes
- Programmatic preset construction
from agentrylab import init

preset = {
    "id": "programmatic",
    "providers": [
        {"id": "p1", "impl": "agentrylab.runtime.providers.openai.OpenAIProvider", "model": "gpt-4o"}
    ],
    "tools": [],
    "agents": [{"id": "pro", "role": "agent", "provider": "p1", "system_prompt": "You are the agent."}],
    "runtime": {
        "scheduler": {
            "impl": "agentrylab.runtime.scheduler.round_robin.RoundRobinScheduler",
            "params": {"order": ["pro"]},
        }
    },
}

lab = init(preset, experiment_id="prog-1", user_messages=["Start topic: ..."])
lab.run(rounds=3)
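- Multiple runs in a loop
from agentrylab import init

topics = ["jokes", "puns", "metaphors"]
for i, topic in enumerate(topics):
    # Each topic gets its own experiment id, so transcripts and checkpoints stay separate
    lab = init("src/agentrylab/presets/debates.yaml", experiment_id=f"exp-{i}", prompt=f"Explore {topic}")
    lab.run(rounds=2)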
- Inspecting transcripts
from agentrylab import init

lab = init("src/agentrylab/presets/debates.yaml", experiment_id="inspect-1")
lab.run(rounds=1)
for ev in lab.history(limit=20):
    print(ev["iter"], ev["agent_id"], ev["role"], str(ev["content"])[:80])

# Or read directly from the store
rows = lab.store.read_transcript("inspect-1", limit=100)
- Cleaning outputs (transcript + checkpoint)
from agentrylab import init

lab = init("src/agentrylab/presets/debates.yaml", experiment_id="demo-clean")
lab.run(rounds=1)

# Remove persisted outputs for this experiment
lab.clean()
# or lab.clean(thread_id="some-other-id")
License
MIT