Lightweight, hackable multi-agent orchestration lab (CLI + Python) with transcripts, checkpoints, budgets, and pluggable providers/tools.

Agentry Lab — Multi‑Agent Orchestration Laboratory

A lightweight, hackable lab for building and experimenting with multi‑agent workflows. 🧪⚡

Pick a preset lab setup or define your own lab (agents, tools, providers, schedules) in YAML, then run and iterate quickly from the CLI or Python. Stream outputs, save transcripts, stash checkpoints!

10 preset lab environments - ready to have fun out of the box! 🎭

🎤 Stand-Up Club - Two comedians riff on any topic, MC closes the set
🏛️ Debates - Pro/con arguments with evidence, moderator keeps it civil
🧠 Drifty Thoughts - Three thinkers wander playfully through ideas
🔬 Research - Scientists collaborate, style coach polishes the output
🛋️ Therapy Session - Compassionate client-therapist conversations
💡 Brainstorm Buddies - Idea generation with a scribe pulling shortlists

🚀 Get Started in 2 Minutes

pip install agentrylab

🦙 llama3-friendly lab presets:

# Simple chat (works great with local Ollama!)
agentrylab run solo_chat_user.yaml --max-iters 3

# Quick web research
agentrylab run ddg_quick_summary.yaml --objective "quantum computing"

🤖 OpenAI-friendly lab presets:

# Formal debates with evidence
agentrylab run debates.yaml --objective "Should we colonize Mars?" --max-iters 4

# Comedy club (hybrid: llama3 + GPT-4o-mini)
agentrylab run standup_club.yaml --objective "remote work" --max-iters 6

✨ Why AgentryLab?

Because single agents are boring. 🤖

  • 📦 YAML‑first presets for agents/advisors/moderator/summarizer (your config, your rules)
  • 🔌 Pluggable LLM providers (OpenAI, Ollama) and tools (DuckDuckGo, Wolfram Alpha)
  • 📡 Streaming CLI with resume support and transcript/DB persistence (forget nothing, replay everything)
  • Smart budgets for tools (per‑run/per‑iteration) with shared‑per‑tick semantics (no more runaway tool spam)
  • 🧩 Small, readable runtime: nodes, scheduler, engine, state (batteries included, drama optional)
  • 🫵 Human‑in‑the‑loop turns: schedule user nodes and poke runs from CLI/API (agentrylab say …)

📋 Requirements

  • 🐍 Python 3.11+
  • 🧰 Virtual environment (recommended; sanity‑preserving)
  • 🖥️ Optional: Ollama for local models (default: http://localhost:11434)
  • 🔑 API keys as needed (e.g., OPENAI_API_KEY, WOLFRAM_APP_ID) — bring your own secrets

💾 Installation

Option 1: From PyPI (Recommended)

pip install agentrylab

Option 2: From Source (Development)

git clone https://github.com/Alexeyisme/agentrylab.git
cd agentrylab
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate
pip install -U pip
pip install -e .

🔧 Environment Setup

Create a .env file (loaded via python-dotenv) with any secrets you need:

# For OpenAI models (optional)
OPENAI_API_KEY=sk-...

# For Wolfram Alpha (optional)
WOLFRAM_APP_ID=...

# For Ollama (optional, defaults to localhost:11434)
OLLAMA_BASE_URL=http://localhost:11434

💡 Pro tip: You can start with just Ollama (free, local) and add API keys later!
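
Before a run it can help to confirm which of these settings are actually visible to Python. The sketch below is a minimal check, not part of AgentryLab: the helper name describe_environment is made up for illustration, and it assumes the variables from the .env example are already in the process environment (for instance after python-dotenv has loaded the file).

```python
import os

# Names taken from the .env example above; which ones you need depends on
# the providers and tools your preset uses.
OPTIONAL_SECRETS = ("OPENAI_API_KEY", "WOLFRAM_APP_ID")
DEFAULT_OLLAMA_URL = "http://localhost:11434"


def describe_environment(env=None):
    """Report which optional secrets are set and which Ollama URL is in effect.

    Hypothetical helper for illustration; AgentryLab does not ship it.
    """
    env = os.environ if env is None else env
    report = {name: bool(env.get(name)) for name in OPTIONAL_SECRETS}
    # Fall back to the documented default when OLLAMA_BASE_URL is unset or empty.
    report["OLLAMA_BASE_URL"] = env.get("OLLAMA_BASE_URL") or DEFAULT_OLLAMA_URL
    return report


if __name__ == "__main__":
    for key, value in describe_environment().items():
        print(f"{key}: {value}")
```

Passing a plain dict instead of relying on os.environ makes the check easy to try without touching real secrets.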

🚀 Quick Start

CLI Quickstart

Spin up a room and let the sparks fly:

# Simple chat (works with Ollama/llama3)
agentrylab run solo_chat_user.yaml --max-iters 3

# Or with a custom topic
agentrylab run standup_club.yaml --objective "remote work" --max-iters 4

# Or a debate (needs OpenAI API key)
agentrylab run debates.yaml --max-iters 4 --thread-id demo

Set a custom objective/topic at runtime:

agentrylab run debates.yaml --thread-id debate1 --objective "Proposition: apples — good or scam?" --max-iters 4

Interactive mode (prompts for a user message each round when a user node exists):

# Solo chat with a scheduled user turn; prompt on each iteration
agentrylab run solo_chat_user.yaml --thread-id demo --resume --max-iters 3 --interactive --user-id user

Check version:

agentrylab --version

User Messages (User-in-the-Loop)

Let a human chime in via the CLI or Python API, and optionally schedule a user turn at a regular cadence.

# 1) Post a user message into a thread
agentrylab say solo_chat_user.yaml demo 'Hello from Alice!'

# 2) Run one iteration to consume it (user turn then assistant)
agentrylab run solo_chat_user.yaml --thread-id demo --resume --max-iters 1

Python API:

from agentrylab import init

lab = init("src/agentrylab/presets/solo_chat_user.yaml", experiment_id="demo")
lab.post_user_message("Hello from Alice!", user_id="user:alice")
lab.run(rounds=1)

Python API Quickstart

Orchestrate from Python with minimal fuss:

from agentrylab import init, list_threads

# 1. Create lab (using solo_chat_user preset - perfect for llama3!)
lab = init("src/agentrylab/presets/solo_chat_user.yaml", 
           experiment_id="my-chat",
           prompt="Tell me about your favorite hobby!")

# 2. Run with callback
def callback(event):
    if event.get("event") == "provider_result":
        print(f"Agent responded: {event.get('content_len', 0)} chars")

status = lab.run(rounds=3, stream=True, on_event=callback)

# 3. Show conversation
for msg in lab.state.history:
    print(f"[{msg['role']}]: {msg['content']}")

# 4. Resume with new topic
lab.state.objective = "Now tell me about your dream vacation!"
lab.run(rounds=2)

# 5. List threads
threads = list_threads("src/agentrylab/presets/solo_chat_user.yaml")

Python examples:

  • user_in_the_loop_quick.py — post once and run N rounds
  • user_in_the_loop_interactive.py — type a line, run a round, repeat

📝 Note: Output streams each iteration ("=== New events ===") and prints a final tail of the last N transcript entries. Transcripts are written to outputs/*.jsonl and checkpoints to outputs/checkpoints.db.

🖥️ CLI Commands

Basic Commands

# Run a preset
agentrylab run <preset.yaml> [--thread-id ID] [--max-iters N] [--show-last K] [--objective TEXT]

# Inspect a thread's checkpoint
agentrylab status <preset.yaml> <thread-id>

# List all known threads
agentrylab ls <preset.yaml>

Common Options

  • --max-iters N: Run for N iterations (default: varies by preset)
  • --thread-id ID: Use specific thread ID (enables resume)
  • --show-last K: Show last K messages at the end
  • --stream/--no-stream: Enable/disable real-time streaming (default: enabled)
  • --resume/--no-resume: Resume from checkpoint or start fresh (default: resume)
  • --objective TEXT: Override the preset objective (topic) just for this run
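
If you drive the CLI from a script, the options above map naturally onto an argument list. The following is a small sketch under that assumption: build_run_command is a hypothetical helper (not part of the package) and it only uses the flags listed in this section.

```python
import shlex


def build_run_command(preset, *, thread_id=None, max_iters=None, show_last=None,
                      objective=None, resume=None, stream=None):
    """Assemble an `agentrylab run` argument list from the documented options.

    Hypothetical helper; options left as None are omitted so the CLI defaults apply.
    """
    cmd = ["agentrylab", "run", preset]
    if thread_id is not None:
        cmd += ["--thread-id", thread_id]
    if max_iters is not None:
        cmd += ["--max-iters", str(max_iters)]
    if show_last is not None:
        cmd += ["--show-last", str(show_last)]
    if objective is not None:
        cmd += ["--objective", objective]
    if resume is not None:
        cmd.append("--resume" if resume else "--no-resume")
    if stream is not None:
        cmd.append("--stream" if stream else "--no-stream")
    return cmd


cmd = build_run_command("debates.yaml", thread_id="demo", max_iters=4,
                        objective="Should we colonize Mars?", resume=False)
# shlex.join quotes arguments so the printed line can be pasted into a shell.
print(shlex.join(cmd))
```

The list form can be handed to subprocess.run(cmd) directly, which avoids shell quoting issues with objectives that contain spaces or punctuation.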

📚 Full docs: See src/agentrylab/docs/CLI.md for complete command reference.

User-in-the-loop:

  • agentrylab say <preset.yaml> <thread-id> 'message' [--user-id USER] appends a user message into a thread.
  • Works with scheduled user nodes (role user) so messages are consumed on their turns.

⚙️ Configuration

Describe your room in YAML; everything else clicks into place.

  • Presets: shipped with the package; the CLI accepts packaged names like solo_chat_user.yaml (file paths work too)
  • Providers: OpenAI (HTTP), Ollama; add your own under runtime/providers
  • Tools: DuckDuckGo search, Wolfram Alpha; add your own under runtime/tools
  • Scheduler: Round‑robin and Every‑N; build your own in runtime/scheduler
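
To make the two scheduling styles concrete, here is a toy sketch of what "round-robin" and "every-N" turn-taking could look like. It is an illustration only: the generator names and the periods mapping are assumptions, and the real classes under runtime/scheduler may be structured quite differently.

```python
from itertools import islice


def round_robin(order):
    """Yield (tick, [node_id]) forever, cycling through a non-empty `order`."""
    tick = 0
    while True:
        yield tick, [order[tick % len(order)]]
        tick += 1


def every_n(periods):
    """Yield (tick, node_ids); a node is due when tick is a multiple of its period.

    `periods` maps node_id -> N (hypothetical shape); N=1 means every tick.
    """
    tick = 0
    while True:
        due = [node for node, n in periods.items() if tick % n == 0]
        yield tick, due
        tick += 1


# Take the first four ticks of each schedule.
rr = list(islice(round_robin(["pro", "con", "moderator"]), 4))
en = list(islice(every_n({"pro": 1, "con": 1, "summarizer": 3}), 4))
print(rr)
print(en)
```

In this sketch the summarizer joins on ticks 0 and 3 while the debaters speak every tick, which is the kind of cadence a recap or wrap-up node typically wants.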

🎭 Built-in Presets

Have fun out of the box — llama3‑friendly and non‑strict by default.

🎤 Solo Chat (User Turn) (solo_chat_user.yaml) - Perfect for beginners!

  • What: Single friendly agent with scheduled user turns
  • Best for: Testing, simple conversations, llama3 users, human-in-the-loop
  • Run: agentrylab run solo_chat_user.yaml --max-iters 3
  • Topic: --objective "your topic"

🎭 Stand‑Up Club (standup_club.yaml) - Comedy gold!

  • What: Two comedians riff on a topic, punch‑up advisor adds tweaks, MC closes the set
  • Best for: Entertainment, creative writing, humor
  • Run: agentrylab run standup_club.yaml --objective "airports" --max-iters 6
  • Topic: --objective "your topic"

🧠 Drifty Thoughts (drifty_thoughts.yaml) - Free-form thinking

  • What: Three "thinkers" drift playfully; gentle advisor nudges; optional summarizer
  • Best for: Creative brainstorming, philosophical discussions
  • Run: agentrylab run drifty_thoughts.yaml --objective "surprising ideas"
  • Topic: --objective "your topic"

🔬 Research Collaboration (research.yaml) - Academic vibes

  • What: Two scientists brainstorm, style coach gives clarity, summarizer wraps up
  • Best for: Research, academic discussions, structured thinking
  • Run: agentrylab run research.yaml --objective "curious scientific question"
  • Topic: --objective "your topic"

🛋️ Therapy Session (therapy_session.yaml) - Compassionate chat

  • What: Reflective client and gentle therapist; summarizer offers compassionate wrap‑up
  • Best for: Emotional discussions, self-reflection, supportive conversations
  • Run: agentrylab run therapy_session.yaml --objective "something on your mind"
  • Topic: --objective "your topic"

🔍 DDG Quick Summary (ddg_quick_summary.yaml) - Web research

  • What: One agent searches DuckDuckGo and writes a 5‑bullet web summary with URLs
  • Best for: Quick research, web summaries, fact-finding
  • Run: agentrylab run ddg_quick_summary.yaml --objective "your topic"
  • Topic: --objective "your topic"

Small Talk (small_talk.yaml) - Casual chat

  • What: Two friendly voices chat; host recaps every few turns
  • Best for: Casual conversations, social interactions
  • Run: agentrylab run small_talk.yaml --objective "coffee rituals"
  • Topic: --objective "your topic"

💡 Brainstorm Buddies (brainstorm_buddies.yaml) - Idea generation

  • What: Two idea buddies riff; scribe pulls a shortlist
  • Best for: Brainstorming, creative ideation, problem-solving
  • Run: agentrylab run brainstorm_buddies.yaml --objective "rainy day activities"
  • Topic: --objective "your topic"

🏛️ Debates (debates.yaml) - Formal arguments

  • What: Pro/con debaters with moderator and evidence-based arguments
  • Best for: Formal debates, argument analysis, structured discussions
  • Run: agentrylab run debates.yaml --max-iters 4
  • Note: Requires OpenAI API key for best results

🗣️ Simple Argument (argue.yaml) - Casual debates

  • What: Two agents having a natural debate without strict rules
  • Best for: Casual arguments, opinion discussions
  • Run: agentrylab run argue.yaml --objective "Should remote work become standard?"
  • Topic: --objective "your topic"

💡 Pro tip: Start with Solo Chat (User Turn) for testing, then try Stand‑Up Club for fun!
📚 More tips: See src/agentrylab/docs/PRESET_TIPS.md for advanced configuration.

💰 Tool Budgets

Control how many times tools can be called to prevent runaway costs:

  • per_run_max: Total calls per tool across the entire run
  • per_iteration_max: Calls per engine tick (resets each tick)
  • Scope: Enforced per tool ID, shared across agents in the same tick
  • Minima (per_run_min, per_iteration_min) are advisory (not enforced)
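
The interaction between the two caps is easiest to see in a toy model. The class below is a sketch of the semantics described above (per tool ID, shared across agents within a tick, per-iteration counters reset each tick). ToolBudget and its method names are hypothetical and are not AgentryLab's implementation.

```python
from collections import Counter


class ToolBudget:
    """Toy per-tool call budget with per-run and per-iteration caps."""

    def __init__(self, per_run_max=None, per_iteration_max=None):
        self.per_run_max = per_run_max
        self.per_iteration_max = per_iteration_max
        self.run_calls = Counter()   # tool_id -> calls across the whole run
        self.iter_calls = Counter()  # tool_id -> calls in the current tick

    def start_tick(self):
        """Reset the per-iteration counters at the start of each engine tick."""
        self.iter_calls.clear()

    def try_call(self, tool_id):
        """Record the call and return True if it fits both caps, else False."""
        if self.per_run_max is not None and self.run_calls[tool_id] >= self.per_run_max:
            return False
        if self.per_iteration_max is not None and self.iter_calls[tool_id] >= self.per_iteration_max:
            return False
        self.run_calls[tool_id] += 1
        self.iter_calls[tool_id] += 1
        return True


budget = ToolBudget(per_run_max=3, per_iteration_max=2)
results = []
for tick in range(2):
    budget.start_tick()
    # Both agents draw from the same counters for the "ddg" tool within a tick.
    for agent in ("pro", "con"):
        for _ in range(2):
            results.append((tick, agent, budget.try_call("ddg")))

for row in results:
    print(row)
```

In the first tick the "pro" agent uses up the per-iteration allowance, so "con" is refused; in the second tick only one more call fits before the per-run cap is reached.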

📜💾 Persistence

Transcripts for storytelling; checkpoints for recovery.

  • 📜 Transcript JSONL: outputs/<thread-id>.jsonl (human-readable conversation logs)
  • 💾 Checkpoints (SQLite): outputs/checkpoints.db (resume from any point)
  • ⏭️ Resume: --resume (default) continues from last checkpoint; --no-resume starts fresh
  • 🧠 Schemas: See src/agentrylab/docs/PERSISTENCE.md for detailed field definitions
  • ⏱️ Timestamps: All recorded as Unix epoch seconds (UTC)
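
Because transcripts are plain JSONL with epoch timestamps, they are easy to post-process with the standard library. The sketch below assumes each line is a JSON object carrying the keys listed in the Event Schema section (t, iter, agent_id, content); the sample entries are made up, and PERSISTENCE.md remains the authority on the actual fields.

```python
import json
from datetime import datetime, timezone


def read_transcript(lines):
    """Parse JSONL lines into dicts, skipping blank lines."""
    return [json.loads(line) for line in lines if line.strip()]


def to_utc(epoch_seconds):
    """Convert Unix epoch seconds to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)


# Two made-up entries standing in for a file such as outputs/<thread-id>.jsonl.
sample = [
    '{"t": 1700000000.0, "iter": 0, "agent_id": "pro", "role": "agent", "content": "Hello"}',
    "",
    '{"t": 1700000060.5, "iter": 1, "agent_id": "con", "role": "agent", "content": "Hi"}',
]
entries = read_transcript(sample)
for entry in entries:
    print(to_utc(entry["t"]).isoformat(), entry["agent_id"], entry["content"])
```

For a real file, pass an open file handle instead of the sample list, for example `with open(path, encoding="utf-8") as fh: read_transcript(fh)`.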

Cleaning outputs

  • Remove everything (default paths): rm -rf outputs/
  • Or per-thread: agentrylab ls <preset.yaml> then agentrylab reset <preset.yaml> <thread-id> --delete-transcript

🏗️ Architecture (at a glance)

Simple, readable runtime components:

  • Engine: Steps the scheduler, executes nodes, applies outputs/actions
  • Nodes: Agent, Moderator, Summarizer, Advisor (see runtime/nodes/*)
  • Providers: Thin HTTP adapters (OpenAI, Ollama)
  • Tools: Simple callables with normalized envelopes (e.g., DuckDuckGo)
  • State: History window composition, budgets, message contracts, rollback
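
As a rough mental model of how these pieces fit together, the toy loop below steps a round-robin order, executes one node per tick, and applies its output to shared state. Everything in it (the State dataclass, run_engine, the "stop" key) is a simplification invented for illustration, not the actual runtime code.

```python
from dataclasses import dataclass, field


@dataclass
class State:
    """Toy run state: iteration counter, stop flag, and message history."""
    iter: int = 0
    stop_flag: bool = False
    history: list = field(default_factory=list)


def run_engine(state, order, nodes, max_iters):
    """Run up to `max_iters` ticks, one node per tick in round-robin order."""
    for _ in range(max_iters):
        if state.stop_flag:
            break
        node_id = order[state.iter % len(order)]
        output = nodes[node_id](state)                          # execute the node
        state.history.append({"agent_id": node_id, **output})   # apply its output
        if output.get("stop"):                                  # apply its action
            state.stop_flag = True
        state.iter += 1
    return state


def agent(state):
    return {"role": "agent", "content": f"point #{state.iter}"}


def moderator(state):
    # Hypothetical stop rule: end the run once three messages exist.
    return {"role": "moderator", "content": "noted", "stop": len(state.history) >= 3}


final = run_engine(State(), ["agent", "moderator"],
                   {"agent": agent, "moderator": moderator}, max_iters=10)
print(final.iter, final.stop_flag, [m["agent_id"] for m in final.history])
```

The run ends well before max_iters because the moderator's second turn sets the stop flag, which mirrors how a moderator node can end a run early.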

🧑‍💻 Development

Serious tooling for serious… tinkering.

# Install development dependencies
pip install -e .[dev]

# Lint and test
ruff check . && pytest -q

# Coverage (uses pytest-cov; default fail-under=40%)
make coverage
# or: pytest --cov=src/agentrylab --cov-branch --cov-report=term-missing

☕️ Pro tip: Keep a coffee nearby. Agents love to riff.

🐍 Python API

Basic Usage

from agentrylab import init

# Initialize a lab and run for N rounds
lab = init("src/agentrylab/presets/solo_chat_user.yaml", 
           experiment_id="my-experiment", 
           prompt="Tell me about your favorite hobby!")
status = lab.run(rounds=5)
print(f"Iterations: {status.iter}, Active: {status.is_active}")

# View conversation history
for msg in lab.state.history:
    print(f"[{msg['role']}]: {msg['content']}")

Posting User Messages

from agentrylab import init

lab = init("src/agentrylab/presets/solo_chat_user.yaml", experiment_id="chat-1")
# Append a user line into history and transcript; also enqueue for scheduled user nodes
lab.post_user_message("Please keep it concise.", user_id="user:alice")
lab.run(rounds=1)

One-shot Run with Streaming

from agentrylab import run

def on_event(ev: dict):
    print(f"Iteration {ev['iter']}: {ev['agent_id']} ({ev['role']})")

lab, status = run(
    "src/agentrylab/presets/solo_chat_user.yaml",
    prompt="What makes jokes funny?",
    experiment_id="streaming-demo",
    rounds=5,
    stream=True,
    on_event=on_event,
)

Budget Management

from agentrylab import init

# Set budgets in preset, then inspect counters
preset = {
    "id": "budget-demo",
    "providers": [{"id": "p1", "impl": "tests.fake_impls.TestProvider", "model": "test"}],
    "tools": [{"id": "echo", "impl": "tests.fake_impls.EchoTool"}],
    "agents": [{"id": "pro", "role": "agent", "provider": "p1", "system_prompt": "You are the agent.", "tools": ["echo"]}],
    "runtime": {
        "scheduler": {"impl": "agentrylab.runtime.scheduler.round_robin.RoundRobinScheduler", "params": {"order": ["pro"]}},
        "budgets": {"tools": {"per_run_max": 1}},
    },
}
lab = init(preset, experiment_id="budget-demo-1", resume=False)
lab.run(rounds=1)
snap = lab.store.load_checkpoint("budget-demo-1")
print("Total tool calls:", snap.get("_tool_calls_run_total"))

Logging & Tracing

from agentrylab import init

# Configure runtime logging/trace in the preset
preset = {
    # ... providers/tools/agents ...
    "runtime": {
        "logs": {"level": "INFO", "format": "%(asctime)s %(levelname)s %(name)s: %(message)s"},
        "trace": {"enabled": True},
        "scheduler": {"impl": "agentrylab.runtime.scheduler.round_robin.RoundRobinScheduler", "params": {"order": ["pro"]}},
    },
}
lab = init(preset, experiment_id="log-1")
lab.run(rounds=1)

📚 API Reference

Core Functions

init(config, *, experiment_id=None, prompt=None, user_messages=None, resume=True) -> Lab

  • config: YAML path, dict, or validated Preset object
  • experiment_id: Logical run/thread ID; enables resume
  • prompt: Sets cfg.objective for the run (used in prompts when enabled)
  • user_messages: String or list of strings; seeds initial user message(s) into context
  • resume: Attempts to load checkpoint for experiment_id

run(config, *, prompt=None, experiment_id=None, rounds=None, resume=True, stream=False, on_event=None, timeout_s=None, stop_when=None, on_tick=None, on_round=None) -> (Lab, LabStatus)

  • One-shot helper; see Lab.run for parameters

Lab Methods

Lab.run(*, rounds=None, stream=False, on_event=None, timeout_s=None, stop_when=None, on_tick=None, on_round=None) -> LabStatus

  • rounds: Number of iterations to run
  • stream: When True, calls on_event(event: Event) for newly appended transcript entries
  • timeout_s: Optional wall-clock timeout for streaming runs
  • stop_when: Optional predicate Event -> bool; the run stops as soon as it returns True

Lab.stream(*, rounds=None, timeout_s=None, stop_when=None, on_tick=None, on_round=None) -> Iterator[Event]

  • Generator that yields transcript events as they occur
  • Optional callbacks: on_tick(info), on_round(info) where info = {"iter": int, "elapsed_s": float}
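
A stop_when predicate and an on_tick callback are ordinary callables, so they can be written and tried out without a provider. The sketch below builds a stateful predicate and a tick formatter and exercises them against simulated events; stop_after_role and format_tick are hypothetical names, and the event dicts only use keys from the Event Schema section.

```python
def stop_after_role(role, count):
    """Build a predicate that returns True once `count` events with `role` were seen."""
    seen = 0

    def predicate(event):
        nonlocal seen
        if event.get("role") == role:
            seen += 1
        return seen >= count

    return predicate


def format_tick(info):
    """Format the documented info dict {"iter": int, "elapsed_s": float}."""
    return f"iter={info['iter']} elapsed={info['elapsed_s']:.1f}s"


# Simulated events; with a real lab the callables would be passed in, e.g.
# lab.run(rounds=10, stream=True,
#         stop_when=stop_after_role("moderator", 2),
#         on_tick=lambda info: print(format_tick(info)))
events = [
    {"iter": 0, "agent_id": "pro", "role": "agent"},
    {"iter": 1, "agent_id": "mod", "role": "moderator"},
    {"iter": 2, "agent_id": "con", "role": "agent"},
    {"iter": 3, "agent_id": "mod", "role": "moderator"},
]
stop = stop_after_role("moderator", 2)
decisions = [stop(ev) for ev in events]
print(decisions)
```

Building the predicate with a factory keeps its counter private, so each run gets a fresh predicate rather than sharing state through a global.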

Other Lab Methods:

  • Lab.status (property) -> LabStatus
  • Lab.history(limit=50) -> list[Event]
  • Lab.clean(thread_id=None, delete_transcript=True, delete_checkpoint=True) -> None: Delete outputs for a thread
  • list_threads(config) -> list[tuple[str, float]]: List (thread_id, updated_at) in persistence

📦 Releasing

We publish on tags via GitHub Actions (see .github/workflows/release.yml).

For maintainers:

  1. Bump version in pyproject.toml
  2. Update CHANGELOG.md
  3. git tag -a vX.Y.Z -m 'vX.Y.Z' && git push --tags
  4. CI builds sdist/wheel and uploads to PyPI using PYPI_API_TOKEN secret

📋 Event Schema

from agentrylab import Event

def handle(ev: Event) -> None:
    print(ev["iter"], ev["agent_id"], ev["role"], ev.get("latency_ms"))
    # Keys: t, iter, agent_id, role, content (str|dict), metadata (dict|None), actions (dict|None), latency_ms
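
For editors and type checkers it can be convenient to spell those keys out. The TypedDict below is a hypothetical mirror of the key list above, not the library's own definition: the value types for t and latency_ms are assumptions, and agentrylab.Event remains the source of truth. The preview helper shows one way to handle content being either a string or a dict.

```python
from typing import Optional, TypedDict, Union


class EventSketch(TypedDict, total=False):
    """Hypothetical mirror of the documented keys; the real type is agentrylab.Event."""
    t: float                      # assumed: Unix epoch seconds
    iter: int
    agent_id: str
    role: str
    content: Union[str, dict]
    metadata: Optional[dict]
    actions: Optional[dict]
    latency_ms: Optional[float]   # assumed numeric; may be missing


def preview(ev: EventSketch, width: int = 60) -> str:
    """One-line summary that copes with both str and dict content."""
    content = ev.get("content", "")
    text = content if isinstance(content, str) else repr(content)
    return f"[{ev.get('iter')}] {ev.get('agent_id')} ({ev.get('role')}): {text[:width]}"


print(preview({"iter": 1, "agent_id": "pro", "role": "agent", "content": "hi"}))
print(preview({"iter": 2, "agent_id": "mod", "role": "moderator", "content": {"a": 1}}))
```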

💾 Checkpoint Snapshot Fields

Returned by lab.store.load_checkpoint(thread_id) as a dict of state attributes:

  • thread_id: Current experiment ID
  • iter: Iteration counter
  • stop_flag: Stop signal for the engine
  • history: In‑memory context entries {agent_id, role, content} used by prompt composition
  • running_summary: Summarizer running summary if set
  • _tool_calls_run_total, _tool_calls_iteration: Global tool counters
  • _tool_calls_run_by_id, _tool_calls_iter_by_id: Per‑tool counters
  • cfg, contracts: Complex/opaque objects (implementation detail)

Note: If a legacy/opaque pickle was saved, you'll get { "_pickled": ... } instead.
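
When inspecting checkpoints in bulk, a small summarizer that tolerates both shapes avoids KeyError surprises. This is a sketch only: summarize_snapshot is a made-up helper, the sample dict is fabricated for the demonstration, and it relies solely on the field names listed above.

```python
def summarize_snapshot(snap):
    """Return a compact summary of a checkpoint dict, or flag an opaque pickle."""
    if "_pickled" in snap:
        # Legacy/opaque snapshot: nothing structured to report.
        return {"opaque": True}
    return {
        "opaque": False,
        "thread_id": snap.get("thread_id"),
        "iter": snap.get("iter", 0),
        "stopped": bool(snap.get("stop_flag", False)),
        "messages": len(snap.get("history") or []),
        "tool_calls_total": snap.get("_tool_calls_run_total", 0),
        "tool_calls_by_id": dict(snap.get("_tool_calls_run_by_id") or {}),
    }


# Fabricated snapshot for illustration; a real one would come from
# lab.store.load_checkpoint(thread_id).
sample = {
    "thread_id": "budget-demo-1",
    "iter": 1,
    "stop_flag": False,
    "history": [{"agent_id": "pro", "role": "agent", "content": "hi"}],
    "_tool_calls_run_total": 1,
    "_tool_calls_run_by_id": {"echo": 1},
}
summary = summarize_snapshot(sample)
print(summary)
print(summarize_snapshot({"_pickled": b"..."}))
```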

🍳 Recipes

Programmatic Preset Construction

from agentrylab import init

preset = {
    "id": "programmatic",
    "providers": [{"id": "p1", "impl": "agentrylab.runtime.providers.openai.OpenAIProvider", "model": "gpt-4o"}],
    "tools": [],
    "agents": [{"id": "pro", "role": "agent", "provider": "p1", "system_prompt": "You are the agent."}],
    "runtime": {
        "scheduler": {"impl": "agentrylab.runtime.scheduler.round_robin.RoundRobinScheduler", "params": {"order": ["pro"]}}
    },
}
lab = init(preset, experiment_id="prog-1", user_messages=["Start topic: ..."]) 
lab.run(rounds=3)

Multiple Runs in a Loop

from agentrylab import init

topics = ["jokes", "puns", "metaphors"]
for i, topic in enumerate(topics):
    lab = init("src/agentrylab/presets/debates.yaml", experiment_id=f"exp-{i}", prompt=f"Explore {topic}")
    lab.run(rounds=2)

Inspecting Transcripts

from agentrylab import init

lab = init("src/agentrylab/presets/debates.yaml", experiment_id="inspect-1")
lab.run(rounds=1)
for ev in lab.history(limit=20):
    print(ev["iter"], ev["agent_id"], ev["role"], str(ev["content"])[:80])

# Or read directly from the store
rows = lab.store.read_transcript("inspect-1", limit=100)

Cleaning Outputs (Transcript + Checkpoint)

from agentrylab import init
lab = init("src/agentrylab/presets/debates.yaml", experiment_id="demo-clean")
lab.run(rounds=1)
# Remove persisted outputs for this experiment
lab.clean()  # or lab.clean(thread_id="some-other-id")

📄 License

MIT
