agentrylab

Lightweight, hackable multi-agent orchestration lab (CLI + Python) with transcripts, checkpoints, budgets, and pluggable providers/tools.


Agentry Lab: Multi‑Agent Orchestration for Experiments

Serious tooling, delightfully unserious outcomes. 😏

A lightweight, hackable lab for building and evaluating multi‑agent workflows. Define your lab room (agents, tools, providers, schedules) in YAML, then run and iterate quickly from the CLI or Python. Stream outputs, save transcripts, stash checkpoints — because sometimes you want agents to argue… on purpose.

Highlights ✨

Because single agents are boring.

📦 YAML‑first presets for agents/advisors/moderator/summarizer (your config, your rules)
🔌 Pluggable LLM providers (OpenAI, Ollama) and tools (ddgs, Wolfram Alpha)
📡 Streaming CLI with resume support and transcript/DB persistence (forget nothing, replay everything)
⏳ Budgets for tools (per‑run/per‑iteration) with shared‑per‑tick semantics (no more runaway tool spam)
🧩 Small, readable runtime: nodes, scheduler, engine, state (batteries included, drama optional)

Requirements

🐍 Python 3.11+
🧰 A virtualenv (recommended; sanity‑preserving)
🖥️ Optional: Ollama for local models (default base URL http://localhost:11434)
🔑 API keys as needed (e.g., OPENAI_API_KEY, WOLFRAM_APP_ID) — bring your own secrets

Install (editable)

python -m venv .venv
. .venv/bin/activate
pip install -U pip
pip install -e .

Environment (.env)

Create a .env file (loaded via python-dotenv) with any secrets:

OPENAI_API_KEY=sk-...
WOLFRAM_APP_ID=...
OLLAMA_BASE_URL=http://localhost:11434

Quickstart (CLI) 🚀

Spin up a room and let the sparks fly.

.venv/bin/agentrylab run src/agentrylab/presets/debates.yaml \
  --max-iters 4 --thread-id demo --show-last 10

Quickstart (Python API) 🐍

Orchestrate from Python with minimal fuss.

from agentrylab import init
from agentrylab.presets import path as preset_path

lab = init(preset_path("debates.yaml"), experiment_id="demo")
lab.run(rounds=2)

See the “Python API” section below for full details and streaming options (callbacks, timeouts, early stops).

Each iteration's output is streamed under a "=== New events ===" header, followed by a final tail of the last N transcript entries. Transcripts are written to outputs/<thread-id>.jsonl and checkpoints to outputs/checkpoints.db.

CLI

Run a preset
- agentrylab run <preset.yaml> [--thread-id ID] [--max-iters N] [--stream/--no-stream] [--resume/--no-resume] [--show-last K]
Inspect a thread’s checkpoint
- agentrylab status <preset.yaml> <thread-id>
List all known threads
- agentrylab ls <preset.yaml>

See src/agentrylab/docs/CLI.md for full command docs.

Configuration ⚙️

Describe your room in YAML; everything else clicks into place.

Presets live under src/agentrylab/presets/ (see debates.yaml).
Providers: OpenAI (HTTP), Ollama; add your own under runtime/providers.
Tools: ddgs search, Wolfram Alpha; add your own under runtime/tools.
Scheduler: Round‑robin and Every‑N; build your own in runtime/scheduler.

Presets 🎭

Have fun out of the box: llama3‑friendly and non‑strict by default.

Stand‑Up Club (standup_club.yaml): two comedians riff on a topic, a punch‑up advisor adds tweaks, and the MC closes the set.
- Seed a topic: JOKE_TOPIC="airports"
- Cadence (Every‑N): comicA/comicB every turn, punch_up every 2, mc every 2 (run on last)
- Run: agentrylab run src/agentrylab/presets/standup_club.yaml --max-iters 6 --thread-id standup-1 --show-last 20
Drifty Thoughts (drifty_thoughts.yaml): three “thinkers” drift playfully; a gentle advisor nudges; optional summarizer digests.
- Seed a topic: TOPIC="surprising ideas"
- Cadence (Every‑N): thinkers every turn, advisor every 2, summarizer every 3 (run on last)
- Tip: prompts avoid asking for user input; outputs are standalone prose
Research Collaboration (research.yaml): two scientists brainstorm, a style coach gives clarity bullets, moderator emits JSON actions, summarizer keeps things readable.
- Seed a topic: TOPIC="curious scientific question"
- Cadence (Every‑N): scientists every turn; style_coach/moderator every 2; summarizer every 2 (run on last)
- Tip: moderator includes a JSON exemplar to improve compliance; style coach gives ultra‑short bullets
Therapy Session (therapy_session.yaml): a reflective client and gentle therapist; summarizer offers a compassionate wrap‑up.
- Seed a topic: TOPIC="something on your mind"
- Cadence (Every‑N): client/therapist every turn; summarizer every 2 (run on last)
- Tip: therapist responds in 3–5 sentences and ends with one open question
DDG Quick Summary (ddg_quick_summary.yaml): one agent searches DuckDuckGo and writes a 5‑bullet web summary with URLs.
- Seed a topic: SUMMARY_TOPIC="your topic"
- Cadence: single agent speaks once (tool call + summary)
- Tip: good “starter” preset to test tools with llama3
Small Talk (small_talk.yaml): two friendly voices chat; a host recaps every few turns.
- Seed a topic: SMALL_TALK_TOPIC="coffee rituals"
- Cadence (Every‑N): pal/friend every turn; host every 3 (run on last)
- Tip: configured for gpt‑4o‑mini by default; swap provider to llama3 if you prefer local
Brainstorm Buddies (brainstorm_buddies.yaml): two idea buddies riff; a scribe pulls a shortlist.
- Seed a topic: BRAINSTORM_TOPIC="rainy day activities"
- Cadence (Every‑N): buddyA/buddyB every turn; scribe every 3 (run on last)
- Tip: buddies write short lines; scribe outputs a clean shortlist (no bullets)
Follow‑Up Q&A (follow_up.yaml): explainer → interviewer → explainer → interviewer → summarizer.
- Seed a topic: FOLLOWUP_TOPIC="solar panels at home"
- Cadence (Round‑Robin, exact order): explainer → interviewer → explainer → interviewer → summarizer
- Tip: simple 5‑turn Q&A flow with a tidy wrap‑up
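
To make the Every‑N cadences above concrete, here is a minimal sketch of how such a schedule could be computed. It is an illustration only, not the code in runtime/scheduler: the function name `due_nodes`, the 1‑based turn counter, and the reading of "run on last" as "also run on the final turn" are all assumptions.

```python
def due_nodes(
    turn: int,
    last_turn: int,
    every: dict[str, int],
    run_on_last: frozenset[str] = frozenset(),
) -> list[str]:
    """Return the node ids scheduled for `turn` (1-based), in config order.

    Assumption: a node with cadence n runs when `turn` is a multiple of n,
    and nodes listed in `run_on_last` additionally run on the final turn.
    """
    due = []
    for node_id, n in every.items():
        if n < 1:
            raise ValueError(f"cadence for {node_id!r} must be >= 1")
        if turn % n == 0 or (turn == last_turn and node_id in run_on_last):
            due.append(node_id)
    return due


# Cadence modeled on the Stand-Up Club description (node ids as written there).
cadence = {"comicA": 1, "comicB": 1, "punch_up": 2, "mc": 2}
for turn in range(1, 4):
    print(turn, due_nodes(turn, last_turn=3, every=cadence, run_on_last=frozenset({"mc"})))
```

With three turns, the comics act every turn, punch_up joins on turn 2, and mc acts on turn 2 and again on the final turn because of the run-on-last flag.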

See more tips: src/agentrylab/docs/PRESET_TIPS.md

Budgets (Tools)

per_run_max: total calls per tool id across the run
per_iteration_max: calls per engine tick (counters reset each tick)
Scope: enforced per tool id, shared across agents that act in the same tick
Minima (per_run_min, per_iteration_min) are advisory (not enforced); useful for prompting and analysis

Persistence 📜💾

Transcripts for storytelling; checkpoints for recovery.

📜 Transcript JSONL: default outputs/<thread-id>.jsonl
💾 Checkpoints (SQLite): default outputs/checkpoints.db
⏭️ Resume: run --resume (default) merges the saved snapshot into state before running; --no-resume starts fresh for that thread id.
🧠 Schemas and field definitions: see src/agentrylab/docs/PERSISTENCE.md
⏱️ Timekeeping: all timestamps are recorded as Unix epoch seconds (UTC)

Architecture (at a glance)

Engine: steps the scheduler, executes nodes, applies outputs/actions
Nodes: Agent, Moderator, Summarizer, Advisor (see runtime/nodes/*)
Providers: thin HTTP adapters (OpenAI, Ollama)
Tools: simple callables with normalized envelopes (e.g., ddgs)
State: history window composition, budgets, message contracts, rollback
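
As a rough mental model of how these pieces fit together, the toy loop below steps a round‑robin schedule, executes one node per iteration, and appends the output to a shared history. It is a deliberate simplification under stated assumptions (one node per tick, nodes as plain callables, no providers, tools, or actions); `run_engine` and `Node` are hypothetical names, not the real runtime.

```python
from collections.abc import Callable

Node = Callable[[list[dict]], str]  # hypothetical: history in, content out


def run_engine(nodes: dict[str, Node], order: list[str], max_iters: int) -> list[dict]:
    """Step a round-robin schedule, execute each node, and record its output."""
    history: list[dict] = []
    for it in range(max_iters):
        node_id = order[it % len(order)]   # scheduler: decide who acts
        content = nodes[node_id](history)  # node: produce an output from the history
        history.append({"iter": it, "agent_id": node_id, "content": content})  # state: record it
    return history


nodes = {
    "pro": lambda h: f"pro speaks after {len(h)} entries",
    "con": lambda h: f"con speaks after {len(h)} entries",
}
for entry in run_engine(nodes, order=["pro", "con"], max_iters=3):
    print(entry)
```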

Development 🧑‍💻

Serious tooling for serious… tinkering.

# quotes keep shells such as zsh from expanding the brackets
pip install -e ".[dev]"

# lint and tests
ruff check . && pytest -q

# coverage (uses pytest-cov; default fail-under=40%)
make coverage            # or: pytest --cov=src/agentrylab --cov-branch --cov-report=term-missing

☕️ Pro tip: keep a coffee nearby. Agents love to riff.

Python API

Initialize a lab and run for N rounds

from agentrylab import init, Event

# When installed from PyPI, use the packaged preset path helper:
# from agentrylab.presets import path as preset_path
# lab = init(preset_path("debates.yaml"), experiment_id="unique_id_1234", prompt="What makes jokes funny?")
# When working from a checkout, you can also reference the file under src/
lab = init("src/agentrylab/presets/debates.yaml", experiment_id="unique_id_1234", prompt="What makes jokes funny?")
status = lab.run(rounds=5)
print(status.iter, status.is_active)
print(lab.history(limit=10))

One-shot run with optional streaming callback

from agentrylab import run

def on_event(ev: dict):
    print(ev["iter"], ev["agent_id"], ev["role"])  # transcript events

lab, status = run(
    "src/agentrylab/presets/debates.yaml",
    prompt="What makes jokes funny?",
    experiment_id="unique_id_1234",
    rounds=5,
    stream=True,
    on_event=on_event,
)

Budgets (Python)

Set global/per-tool budgets in the preset, then inspect counters via the checkpoint snapshot.

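from agentrylab import init

# The provider/tool impl paths below point at test doubles (tests.fake_impls),
# so this snippet is presumably meant to run from a repository checkout.
preset = {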
    "id": "budget-demo",
    "providers": [{"id": "p1", "impl": "tests.fake_impls.TestProvider", "model": "test"}],
    "tools": [{"id": "echo", "impl": "tests.fake_impls.EchoTool"}],
    "agents": [{"id": "pro", "role": "agent", "provider": "p1", "system_prompt": "You are the agent.", "tools": ["echo"]}],
    "runtime": {
        "scheduler": {"impl": "agentrylab.runtime.scheduler.round_robin.RoundRobinScheduler", "params": {"order": ["pro"]}},
        "budgets": {"tools": {"per_run_max": 1}},
    },
}
lab = init(preset, experiment_id="budget-demo-1", resume=False)
lab.run(rounds=1)
snap = lab.store.load_checkpoint("budget-demo-1")
print("total tool calls:", snap.get("_tool_calls_run_total"))

Logging/Tracing from Python

Configure runtime logging/trace in the preset and call via Python.

from agentrylab import init

preset = {
    # ... providers/tools/agents ...
    "runtime": {
        "logs": {"level": "INFO", "format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
        "trace": {"enabled": True},
        "scheduler": {"impl": "agentrylab.runtime.scheduler.round_robin.RoundRobinScheduler", "params": {"order": ["pro"]}},
    },
}
lab = init(preset, experiment_id="log-1")
lab.run(rounds=1)

API reference (Python)

init(config, *, experiment_id=None, prompt=None, user_messages=None, resume=True) -> Lab
- config: YAML path, dict, or a validated Preset object
- experiment_id: logical run/thread id; enables resume
- prompt: sets cfg.objective for the run (used in prompts when enabled)
- user_messages: str or list[str]; seeds initial user message(s) into context
- resume: attempts to load checkpoint for experiment_id
run(config, *, prompt=None, experiment_id=None, rounds=None, resume=True, stream=False, on_event=None, timeout_s=None, stop_when=None, on_tick=None, on_round=None) -> (Lab, LabStatus)
- One-shot helper; see Lab.run for params
Lab.run(*, rounds=None, stream=False, on_event=None, timeout_s=None, stop_when=None, on_tick=None, on_round=None) -> LabStatus
- rounds: number of iterations to run
- stream: when True, calls on_event(event: Event) for newly appended transcript entries
- timeout_s: optional wall-clock timeout for streaming runs
- stop_when: optional predicate Event -> bool; the run stops when it returns True
Lab.stream(*, rounds=None, timeout_s=None, stop_when=None, on_tick=None, on_round=None) -> Iterator[Event]
- Generator that yields transcript events as they occur
- Optional callbacks: on_tick(info), on_round(info) where info = {"iter": int, "elapsed_s": float}
Lab.status (property) -> LabStatus
Lab.history(limit=50) -> list[Event]
Lab.clean(thread_id=None, delete_transcript=True, delete_checkpoint=True) -> None: delete outputs for a thread
list_threads(config) -> list[tuple[str, float]]: list (thread_id, updated_at) in persistence
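
A `stop_when` predicate is just a function from an event to a bool. The sketch below builds two such predicates over plain mappings so it runs without the library. The helper names are hypothetical, and the role string "summarizer" in the usage note is an assumption about how roles are spelled in transcripts.

```python
from collections.abc import Callable, Mapping
from typing import Any

EventLike = Mapping[str, Any]


def stop_on_role(role: str) -> Callable[[EventLike], bool]:
    """Build a predicate that is True for any event with the given role."""
    def predicate(ev: EventLike) -> bool:
        return ev.get("role") == role
    return predicate


def stop_on_text(needle: str) -> Callable[[EventLike], bool]:
    """Build a predicate that is True when the event content mentions `needle`."""
    def predicate(ev: EventLike) -> bool:
        # content may be a str or a dict, so match against its string form
        return needle.lower() in str(ev.get("content", "")).lower()
    return predicate
```

Intended use, following the signatures above: `lab.run(rounds=10, stream=True, stop_when=stop_on_role("summarizer"))`. Whether `stop_when` also applies to non‑streaming runs is not stated here, so the example passes `stream=True` to be safe.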

Releasing 📦

We publish on tags via GitHub Actions (see .github/workflows/release.yml).
Maintainers: bump version in pyproject.toml, update CHANGELOG.md, then:
- git tag -a vX.Y.Z -m 'vX.Y.Z' && git push --tags
CI builds the sdist/wheel and uploads them to PyPI using the PYPI_API_TOKEN secret.

Event schema

from agentrylab import Event

def handle(ev: Event) -> None:
    print(ev["iter"], ev["agent_id"], ev["role"], ev.get("latency_ms"))
    # keys: t, iter, agent_id, role, content (str|dict), metadata (dict|None), actions (dict|None), latency_ms

Checkpoint snapshot fields

Returned by lab.store.load_checkpoint(thread_id) as a dict of state attributes (when available):
- thread_id: current experiment id
- iter: iteration counter
- stop_flag: stop signal for the engine
- history: in‑memory context entries {agent_id, role, content} used by prompt composition
- running_summary: summarizer running summary if set
- _tool_calls_run_total, _tool_calls_iteration: global tool counters
- _tool_calls_run_by_id, _tool_calls_iter_by_id: per‑tool counters
- cfg, contracts: complex/opaque objects (implementation detail)
- If a legacy/opaque pickle was saved, you’ll get { "_pickled": ... } instead
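
A small helper can turn a snapshot into a compact summary for logging or assertions. The sketch below relies only on the field names listed above and treats every field as optional; `summarize_snapshot` is a hypothetical name, not a library function.

```python
from typing import Any


def summarize_snapshot(snap: dict[str, Any]) -> dict[str, Any]:
    """Pull the commonly inspected fields out of a checkpoint snapshot dict."""
    if "_pickled" in snap:
        # Legacy/opaque checkpoint: nothing structured to report.
        return {"legacy": True}
    return {
        "legacy": False,
        "thread_id": snap.get("thread_id"),
        "iter": snap.get("iter", 0),
        "stopped": bool(snap.get("stop_flag", False)),
        "history_len": len(snap.get("history") or []),
        "tool_calls_total": snap.get("_tool_calls_run_total", 0),
        "tool_calls_by_id": dict(snap.get("_tool_calls_run_by_id") or {}),
    }
```

Typical use would be `summarize_snapshot(lab.store.load_checkpoint("budget-demo-1"))` after a run.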

Recipes

Programmatic preset construction

from agentrylab import init

preset = {
    "id": "programmatic",
    "providers": [{"id": "p1", "impl": "agentrylab.runtime.providers.openai.OpenAIProvider", "model": "gpt-4o"}],
    "tools": [],
    "agents": [{"id": "pro", "role": "agent", "provider": "p1", "system_prompt": "You are the agent."}],
    "runtime": {
        "scheduler": {"impl": "agentrylab.runtime.scheduler.round_robin.RoundRobinScheduler", "params": {"order": ["pro"]}}
    },
}
lab = init(preset, experiment_id="prog-1", user_messages=["Start topic: ..."]) 
lab.run(rounds=3)

Multiple runs in a loop

from agentrylab import init

topics = ["jokes", "puns", "metaphors"]
for i, topic in enumerate(topics):
    lab = init("src/agentrylab/presets/debates.yaml", experiment_id=f"exp-{i}", prompt=f"Explore {topic}")
    lab.run(rounds=2)

Inspecting transcripts

from agentrylab import init

lab = init("src/agentrylab/presets/debates.yaml", experiment_id="inspect-1")
lab.run(rounds=1)
for ev in lab.history(limit=20):
    print(ev["iter"], ev["agent_id"], ev["role"], str(ev["content"])[:80])
# Or read directly from the store
rows = lab.store.read_transcript("inspect-1", limit=100)

Cleaning outputs (transcript + checkpoint)

from agentrylab import init
lab = init("src/agentrylab/presets/debates.yaml", experiment_id="demo-clean")
lab.run(rounds=1)
# Remove persisted outputs for this experiment
lab.clean()  # or lab.clean(thread_id="some-other-id")

License

MIT


Release history

0.2.0: Sep 23, 2025
0.1.6: Sep 21, 2025
0.1.5: Sep 7, 2025
0.1.4: Sep 7, 2025
0.1.3: Sep 6, 2025
0.1.2: Sep 6, 2025
0.1.1: Sep 5, 2025
0.1.0: Sep 5, 2025 (this version)
0.0.1: Sep 1, 2025

