pytest for AI agents — trace, debug and catch regressions in LLM swarms

SwarmTrace

Observability for AI agents — trace, debug, and monitor with 2 lines of code


Install

pip install swarmtrace

Quick Start

from swarmtrace import observe

@observe
def my_agent(question):
    return llm.chat(question)

my_agent("What is machine learning?")

Then view the traces from your terminal:

swarmtrace

Every call is recorded — latency, tokens, cost, errors. Nothing else to configure.


Single Agent

Wrap your agent with @observe. Tag any LLM or tool calls inside it with kind="llm" or kind="tool" so they roll up into the agent's stats and never appear as phantom agents on the dashboard.

from swarmtrace import observe, init

init(api_key="your-key", endpoint="https://swarmtrace.vercel.app")

@observe
def my_agent(query):
    plan = call_llm(query)
    return search_web(plan)

@observe(kind="llm")
def call_llm(prompt):
    return client.chat(model="gpt-4o-mini", messages=[...])

@observe(kind="tool")
def search_web(q):
    ...

One agent card on the dashboard. call_llm and search_web fold their tokens, cost, and errors into my_agent — they never get their own card.


Quickstart — inject into any agent in 2 lines

import swarmtrace
swarmtrace.init()              # auto-detects OpenAI, Anthropic, Gemini, LiteLLM

That's all. Now decorate your top-level function:

@swarmtrace.observe
def my_agent(prompt):
    return openai_client.chat.completions.create(...)  # traced automatically

swarmtrace.init() patches installed LLM clients so every raw LLM call is recorded as kind="llm" — with latency, model, tokens, and cost — and attributed to whatever agent is currently running. You don't decorate the LLM call. You don't pick a kind. You don't configure anything else.
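
The patching idea can be sketched in plain Python. This is an illustration only, not SwarmTrace's implementation: FakeClient, patch_client, and RECORDED are hypothetical names, and a real provider SDK reports model and token usage in its own response format.

```python
import functools
import time

RECORDED = []  # hypothetical in-memory store for recorded calls


class FakeClient:
    """Stand-in for an LLM client; not a real provider SDK."""

    def chat(self, model, prompt):
        return {"model": model, "text": prompt.upper(), "tokens": len(prompt.split())}


def patch_client(cls, method_name):
    """Replace cls.method_name with a wrapper that records each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        response = original(self, *args, **kwargs)
        RECORDED.append({
            "kind": "llm",
            "model": response["model"],
            "tokens": response["tokens"],
            "latency_s": time.perf_counter() - start,
        })
        return response

    setattr(cls, method_name, wrapper)


patch_client(FakeClient, "chat")
reply = FakeClient().chat(model="demo-model", prompt="what is agi")
print(reply["text"], RECORDED[0]["kind"], RECORDED[0]["tokens"])
```

The caller's code is unchanged after patching; only the class attribute is swapped, which is why no per-call decorator is needed in this sketch.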

Single agent

import swarmtrace
swarmtrace.init()

from openai import OpenAI
client = OpenAI()

@swarmtrace.observe                    # one decorator. that's it.
def my_agent(prompt):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )

my_agent("What is AGI?")

Dashboard: one "my_agent" card with tokens, cost, latency, error rate.

Multi-agent swarm

import swarmtrace
swarmtrace.init()

@swarmtrace.observe                    # own card on the dashboard
def researcher(q):
    return client.chat.completions.create(model="gpt-4o-mini", messages=[...])

@swarmtrace.observe                    # own card
def summarizer(text):
    return client.chat.completions.create(model="gpt-4o-mini", messages=[...])

@swarmtrace.observe                    # own card — orchestrator
def orchestrator(q):
    research = researcher(q)
    return summarizer(research)

orchestrator("Explain transformers")

Dashboard: three cards. researcher's and summarizer's LLM costs roll into their own cards automatically.

Auto-detection: @observe figures out the role at call time:

  • Nothing running yet → this call is the agent (gets its own card).
  • Already inside an agent → rolls up into it (tokens + errors fold in, no extra card).

So @observe everywhere is safe — helpers and inner calls just disappear into the agent that ran them, instead of cluttering the Agents page.

Need separate cards for named sub-agents? Add kind="agent" explicitly:

@swarmtrace.observe(kind="agent")
def researcher(q): ...

That's the only knob. kind="llm" / kind="tool" exist for labeling, but the dashboard works correctly without them.


Every bare @observe is its own agent card. Nesting is handled automatically via contextvars — no IDs, no config.

from swarmtrace import observe

@observe
def researcher(q):
    return call_llm(f"Research: {q}")

@observe
def summarizer(text):
    return call_llm(f"Summarize: {text}")

@observe
def orchestrator(q):
    research = researcher(q)
    return summarizer(research)

orchestrator("What is AGI?")

Example trace:

▶ orchestrator    4.2s  |  7 in / 78 out   |  $0.0003
  ▶ researcher    3.4s  |  7 in / 330 out  |  $0.0013
  ▶ summarizer    0.8s  |  338 in / 78 out |  $0.0005

Three agent cards on the dashboard — one per named agent. Sub-calls (call_llm) fold into whichever agent invoked them.


Span Kinds

| Kind     | Decorator                 | Dashboard                              |
|----------|---------------------------|----------------------------------------|
| agent    | @observe (default)        | Own card: tasks, tokens, cost, status  |
| llm      | @observe(kind="llm")      | Rolls up into calling agent            |
| tool     | @observe(kind="tool")     | Rolls up into calling agent            |
| function | @observe(kind="function") | Rolls up into calling agent            |

The rule: only functions you want as separate dashboard cards get bare @observe. Everything else gets a kind=.


Span Kinds — agents vs. tool/LLM calls

By default, @observe marks a call as kind="agent" — it gets its own entry on the dashboard's Agents page, with its own task count, tokens, cost, and status. That's the right default for named agents like orchestrator, researcher, and summarizer above.

If you also wrap raw LLM or tool calls with @observe for visibility, tag them so they roll up into the calling agent's stats instead of showing up as their own (fake) agents:

from swarmtrace import observe

@observe(kind="llm")
def call_llm(prompt):
    return client.chat(model="gpt-4o-mini", messages=[...])

@observe(kind="tool")
def search_web(query):
    ...

@observe(kind="function")
def helper(x):
    ...

@observe                      # kind="agent" (default)
def researcher(q):
    return call_llm(f"Research: {q}")

call_llm and search_web are attributed to whichever kind="agent" call is currently running (researcher, here) — their tokens, cost, and any errors are folded into researcher's stats. They never appear as separate entries on the Agents page, no matter how deeply nested.
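
One way to picture the roll-up is as a walk up the span tree to the nearest kind="agent" ancestor. The sketch below assumes a simple parent-pointer structure; Span, owning_agent, and agent_totals are hypothetical names, and the token counts are made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    name: str
    kind: str                 # "agent", "llm", "tool", or "function"
    parent: Optional["Span"] = None
    tokens: int = 0


def owning_agent(span):
    """Walk up the parent chain to the nearest kind="agent" span."""
    node = span
    while node is not None and node.kind != "agent":
        node = node.parent
    return node


def agent_totals(spans):
    """Sum tokens per agent; non-agent spans count toward their owner."""
    totals = {}
    for span in spans:
        owner = owning_agent(span)
        if owner is not None:
            totals[owner.name] = totals.get(owner.name, 0) + span.tokens
    return totals


researcher = Span("researcher", "agent")
helper = Span("helper", "function", parent=researcher)
call_llm = Span("call_llm", "llm", parent=helper, tokens=337)
search_web = Span("search_web", "tool", parent=researcher, tokens=12)

totals = agent_totals([researcher, helper, call_llm, search_web])
print(totals)  # {'researcher': 349}
```

Note that call_llm sits two levels below researcher here and is still attributed to it, which matches the "no matter how deeply nested" behavior described above.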


Async Support

import asyncio
from swarmtrace import observe

@observe
async def async_agent(q):
    return await llm.achat(q)

@observe
async def orchestrator(q):
    results = await asyncio.gather(
        async_agent(q),
        async_agent(q + " — deep dive"),
    )
    return " | ".join(results)

asyncio.run(orchestrator("Explain transformers"))
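
Context variables are what make parent tracking work across concurrent tasks. The sketch below uses only the standard library to show the mechanism; traced, worker, and EDGES are hypothetical names, and nothing here is taken from SwarmTrace's source.

```python
import asyncio
import contextvars

_current = contextvars.ContextVar("current_span", default=None)
EDGES = []  # (parent, child) pairs, recorded as spans start


async def traced(name, coro_fn, *args):
    """Run coro_fn as a span named `name`, recording its parent."""
    EDGES.append((_current.get(), name))
    token = _current.set(name)
    try:
        return await coro_fn(*args)
    finally:
        _current.reset(token)


async def worker(q):
    await asyncio.sleep(0)  # yield to the event loop, as a real LLM call would
    return q.upper()


async def orchestrator(q):
    # gather() wraps each coroutine in a Task, and each Task runs in a
    # copy of the current context, so both children see "orchestrator".
    return await asyncio.gather(
        traced("worker-1", worker, q),
        traced("worker-2", worker, q + "!"),
    )


results = asyncio.run(traced("orchestrator", orchestrator, "hi"))
print(results)
print(EDGES)
```

Each task sets its own span name without disturbing its sibling, because the two tasks hold separate context copies.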

Live Cost Tracking

Automatic cost calculation for any model from any provider — powered by LiteLLM's live pricing registry.

from swarmtrace import observe

@observe
def agent(q):
    # OpenAI, Anthropic, Google, Mistral, DeepSeek,
    # Groq, Cohere, xAI — cost tracked automatically
    return client.chat(model="gpt-4o-mini", messages=[...])

Custom or fine-tuned models:

from swarmtrace import set_model_pricing

set_model_pricing("my-finetune", input_per_million=5.00, output_per_million=15.00)
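
As a worked example of per-million pricing, the arithmetic for a single call looks like this. The call_cost helper is hypothetical and only illustrates the formula; the token counts are arbitrary.

```python
def call_cost(tokens_in, tokens_out, input_per_million, output_per_million):
    """Cost in dollars for one call, given per-million-token prices."""
    return (tokens_in * input_per_million + tokens_out * output_per_million) / 1_000_000


# 338 input tokens and 78 output tokens at $5.00 / $15.00 per million:
# 338 * 5.00 = 1690, 78 * 15.00 = 1170, (1690 + 1170) / 1e6 = 0.00286
cost = call_cost(338, 78, input_per_million=5.00, output_per_million=15.00)
print(f"${cost:.5f}")  # $0.00286
```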

Token Budget

Stop runaway agents before they burn your budget.

from swarmtrace import observe, budget

@observe
@budget(max_tokens=10_000, on_exceed="warn")   # or "stop"
def agent(q):
    return llm.chat(q)
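
The difference between "warn" and "stop" can be sketched as follows. This is a guess at the general shape, not SwarmTrace's implementation: budget_sketch and BudgetExceeded are hypothetical, the budget is assumed to be cumulative across calls, and a whitespace word count stands in for a real tokenizer.

```python
import functools
import warnings


class BudgetExceeded(RuntimeError):
    """Hypothetical error raised when on_exceed="stop"."""


def budget_sketch(max_tokens, on_exceed="warn",
                  count_tokens=lambda text: len(text.split())):
    def decorator(func):
        used = 0  # running total across calls to this function

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal used
            result = func(*args, **kwargs)
            used += count_tokens(result)
            if used > max_tokens:
                message = f"{func.__name__} used {used} tokens (budget {max_tokens})"
                if on_exceed == "stop":
                    raise BudgetExceeded(message)
                warnings.warn(message)
            return result
        return wrapper
    return decorator


@budget_sketch(max_tokens=5, on_exceed="stop")
def agent(q):
    return f"echo: {q}"


agent("one two")          # 3 tokens used so far
try:
    agent("three four")   # 6 tokens total, over the budget of 5
except BudgetExceeded as exc:
    stop_message = str(exc)
    print(stop_message)
```

With on_exceed="warn" the same call would return normally and emit a warning instead of raising.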

Regression Detection

Catch when a prompt change breaks your agent's behavior.

pip install "swarmtrace[regression]"

from swarmtrace.regression import compare

compare(
    my_agent,
    inputs=["What is ML?", "How does Python work?", "What is an API?"],
    version_a_prompt="You are a helpful assistant.",
    version_b_prompt="Reply only in emojis.",
    threshold=0.6,
)

Example output:

INPUT                    SIMILARITY   REGRESSION?
What is ML?              0.10         🔴 YES
How does Python work?    0.15         🔴 YES
What is an API?          0.12         🔴 YES

Result: 3/3 regressions detected
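
The threshold logic can be sketched with the standard library. This is an illustration under loud assumptions: the similarity metric used by swarmtrace.regression is not stated here, so difflib.SequenceMatcher stands in for it, and compare_sketch and toy_agent (which takes the system prompt as its first argument) are hypothetical.

```python
from difflib import SequenceMatcher


def compare_sketch(agent, inputs, prompt_a, prompt_b, threshold):
    """Return (input, similarity, is_regression) for each input."""
    rows = []
    for text in inputs:
        out_a = agent(prompt_a, text)
        out_b = agent(prompt_b, text)
        similarity = SequenceMatcher(None, out_a, out_b).ratio()
        rows.append((text, round(similarity, 2), similarity < threshold))
    return rows


def toy_agent(system_prompt, question):
    # Stand-in for an LLM call: the "answer" depends on the system prompt.
    if "emojis" in system_prompt:
        return "🤖📈"
    return f"Here is an explanation of: {question}"


rows = compare_sketch(
    toy_agent,
    inputs=["What is ML?", "What is an API?"],
    prompt_a="You are a helpful assistant.",
    prompt_b="Reply only in emojis.",
    threshold=0.6,
)
for text, similarity, regressed in rows:
    print(f"{text:<20} {similarity:<6} {'YES' if regressed else 'no'}")
```

An input counts as a regression when the two versions' outputs fall below the similarity threshold, so a higher threshold makes the check stricter.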

Tool Attention

Reduce token overhead by up to 95% — only pass relevant tools to each agent call, scored via ISO Scoring (arXiv:2604.21816).

pip install "swarmtrace[tools]"

from swarmtrace import ToolAttention, observe

ta = ToolAttention(tools=all_my_tools)

@observe
def agent(query):
    relevant_tools = ta.select(query, top_k=3)
    return llm.chat(query, tools=relevant_tools)
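
The general idea of top-k tool selection can be sketched without any dependencies. This is not ISO Scoring: a plain Jaccard word overlap is swapped in as the relevance score purely for illustration, and relevance, select_tools, and TOOLS are hypothetical names.

```python
def relevance(query, description):
    """Jaccard overlap between the word sets of query and description."""
    q = set(query.lower().split())
    d = set(description.lower().split())
    return len(q & d) / len(q | d) if q | d else 0.0


def select_tools(query, tools, top_k=3):
    """Return the names of the top_k tools ranked by relevance to the query."""
    ranked = sorted(tools, key=lambda name: relevance(query, tools[name]), reverse=True)
    return ranked[:top_k]


TOOLS = {  # hypothetical tool registry: name -> description
    "search_web": "search the web for pages",
    "get_weather": "get the weather forecast for a city",
    "send_email": "send an email message",
    "run_sql": "run a sql query on the database",
}

selected = select_tools("what is the weather forecast in Paris", TOOLS, top_k=2)
print(selected)  # ['get_weather', 'search_web']
```

Only the selected tool definitions would then be sent with the request, which is where the token savings come from.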

Remote Dashboard

Send traces to the SwarmTrace dashboard for live monitoring.

from swarmtrace import init, observe

init(
    api_key="your-swarmtrace-api-key",
    endpoint="https://swarmtrace.vercel.app",
)

@observe
def my_agent(q):
    ...

Or via environment variables:

export SWARMTRACE_API_KEY=your-key
export SWARMTRACE_ENDPOINT=https://swarmtrace.vercel.app
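
A common pattern for combining explicit arguments with environment variables is shown below. The precedence (explicit arguments first, then the environment) is an assumption for illustration, not documented SwarmTrace behavior, and resolve_config is a hypothetical helper.

```python
import os


def resolve_config(api_key=None, endpoint=None, env=os.environ):
    """Prefer explicit arguments, then fall back to environment variables."""
    return {
        "api_key": api_key or env.get("SWARMTRACE_API_KEY"),
        "endpoint": endpoint or env.get("SWARMTRACE_ENDPOINT"),
    }


fake_env = {
    "SWARMTRACE_API_KEY": "your-key",
    "SWARMTRACE_ENDPOINT": "https://swarmtrace.vercel.app",
}
from_env = resolve_config(env=fake_env)
overridden = resolve_config(api_key="explicit-key", env=fake_env)
print(from_env)
print(overridden)
```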

CLI

swarmtrace                       # last 100 traces
swarmtrace --limit 50            # last 50
swarmtrace-replay <id>           # replay any trace
swarmtrace-export --format json
swarmtrace-export --format csv

vs LangSmith

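| Feature                  | SwarmTrace    | LangSmith         |
|--------------------------|---------------|-------------------|
| Open source              |               |                   |
| Works offline            |               |                   |
| Any LLM / any framework  |               | ❌ LangChain only |
| Live cost tracking       | ✅ all models |                   |
| Regression detection     |               |                   |
| Token budget enforcement |               |                   |
| Tool attention (ISO)     |               |                   |
| Setup                    | 2 lines       | SDK + account     |
| Price                    | Free          | $20/month         |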

Optional Extras

pip install "swarmtrace[regression]"   # AI regression detection
pip install "swarmtrace[tools]"        # Tool attention + FAISS
pip install "swarmtrace[budget]"       # Token budget with tiktoken
pip install "swarmtrace[scraper]"      # Web scraping traces
pip install "swarmtrace[all]"          # Everything

AMD MI300X Benchmarks

Tested on AMD Instinct MI300X 192GB via AMD Developer Cloud.

| Metric                   | Value  |
|--------------------------|--------|
| Swarms tested            | 5      |
| Total agent calls        | 20     |
| Avg orchestrator latency | 6.1s   |
| Avg researcher latency   | 1.8s   |
| Trace overhead           | < 1ms  |
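
To get a feel for what per-call tracing overhead means, you can time a function with and without a wrapper. The sketch below measures a hypothetical minimal wrapper (noop_trace), not SwarmTrace itself, so its numbers say nothing about the benchmark above; results vary by machine.

```python
import functools
import time


def noop_trace(func):
    """Minimal tracing wrapper: times the call and stores the duration."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            wrapper.last_duration = time.perf_counter() - start
    return wrapper


def per_call_seconds(func, calls=10_000):
    """Average wall-clock seconds per call over `calls` invocations."""
    start = time.perf_counter()
    for _ in range(calls):
        func()
    return (time.perf_counter() - start) / calls


def work():
    return sum(range(100))


baseline = per_call_seconds(work)
wrapped = per_call_seconds(noop_trace(work))
print(f"overhead per call: {(wrapped - baseline) * 1e3:.4f} ms")
```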

Built with ❤️ at AMD Hackathon 2026 by Ravi Kumar
