pytest for AI agents — trace, debug and catch regressions in LLM swarms

SwarmTrace

Observability for AI agents — trace, debug, and monitor with 2 lines of code

Install

pip install swarmtrace

Quick Start

from swarmtrace import observe

@observe
def my_agent(question):
    # `llm` stands for whatever LLM client you already use
    return llm.chat(question)

my_agent("What is machine learning?")

Then, from your shell:

swarmtrace    # view traces in terminal

Every call is recorded — latency, tokens, cost, errors. Nothing else to configure.
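
To make "every call is recorded" concrete, here is a minimal sketch of what a tracing decorator of this kind can do. It is not SwarmTrace's implementation: `observe_sketch` and the in-memory `TRACES` list are hypothetical names, and only latency and errors are captured (tokens and cost would need provider-specific response parsing).

```python
import functools
import time

TRACES = []  # hypothetical in-memory store, used only for this sketch


def observe_sketch(func):
    """Record name, latency, and error status for every call to func."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        record = {"name": func.__name__, "error": None}
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            record["error"] = repr(exc)
            raise  # the caller still sees the original exception
        finally:
            # Runs on success and on failure, so every call leaves a trace.
            record["latency_s"] = time.perf_counter() - start
            TRACES.append(record)
    return wrapper


@observe_sketch
def my_agent(question):
    return f"answer to: {question}"


@observe_sketch
def failing_agent(question):
    raise ValueError("model unavailable")


my_agent("What is machine learning?")
try:
    failing_agent("What is machine learning?")
except ValueError:
    pass

for trace in TRACES:
    print(trace["name"], trace["error"])
```

The `finally` clause is the design point: a failed call is traced with its error instead of vanishing.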

Single Agent

Wrap your agent with @observe. Any LLM or tool calls inside it get tagged with kind="llm" or kind="tool" so they roll up into the agent's stats — they never appear as phantom agents on the dashboard.

from swarmtrace import observe, init

init(api_key="your-key", endpoint="https://swarmtrace.vercel.app")

@observe
def my_agent(query):
    plan = call_llm(query)
    return search_web(plan)

@observe(kind="llm")
def call_llm(prompt):
    return client.chat(model="gpt-4o-mini", messages=[...])

@observe(kind="tool")
def search_web(q):
    ...

One agent card on the dashboard. call_llm and search_web fold their tokens, cost, and errors into my_agent — they never get their own card.

Quickstart — inject into any agent in 2 lines

import swarmtrace
swarmtrace.init()              # auto-detects OpenAI, Anthropic, Gemini, LiteLLM

That's the whole setup. Now decorate your top-level function:

@swarmtrace.observe
def my_agent(prompt):
    return openai_client.chat.completions.create(...)  # traced automatically

swarmtrace.init() patches installed LLM clients so every raw LLM call is recorded as kind="llm" — with latency, model, tokens, and cost — and attributed to whatever agent is currently running. You don't decorate the LLM call. You don't pick a kind. You don't configure anything else.
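
Patching a client generally means swapping a method for a wrapper that records the call and then delegates to the original. The sketch below shows that idea on a stand-in class; `FakeClient`, `patch_method`, and `CALLS` are hypothetical, and how swarmtrace.init() actually hooks into each SDK may differ.

```python
import functools
import time

CALLS = []  # hypothetical in-memory log of recorded LLM calls


class FakeClient:
    """Stand-in for an LLM SDK client; not a real provider class."""

    def complete(self, prompt):
        return {"text": prompt.upper(), "tokens": len(prompt.split())}


def patch_method(cls, method_name):
    """Replace cls.method_name with a wrapper that logs each call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def traced(self, *args, **kwargs):
        start = time.perf_counter()
        result = original(self, *args, **kwargs)
        CALLS.append({
            "kind": "llm",
            "method": method_name,
            "tokens": result["tokens"],
            "latency_s": time.perf_counter() - start,
        })
        return result

    setattr(cls, method_name, traced)


patch_method(FakeClient, "complete")

client = FakeClient()  # user code is unchanged after patching
reply = client.complete("what is agi")
print(reply["text"], CALLS[0]["tokens"])
```

Because the patch is applied to the class, instances created before or after the patch are both traced, which is why the calling code needs no decorator of its own.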

Single agent

import swarmtrace
swarmtrace.init()

from openai import OpenAI
client = OpenAI()

@swarmtrace.observe                    # one decorator. that's it.
def my_agent(prompt):
    return client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )

my_agent("What is AGI?")

Dashboard: one "my_agent" card with tokens, cost, latency, error rate.

Multi-agent swarm

import swarmtrace
swarmtrace.init()

@swarmtrace.observe                    # own card on the dashboard
def researcher(q):
    return client.chat.completions.create(model="gpt-4o-mini", messages=[...])

@swarmtrace.observe                    # own card
def summarizer(text):
    return client.chat.completions.create(model="gpt-4o-mini", messages=[...])

@swarmtrace.observe                    # own card — orchestrator
def orchestrator(q):
    research = researcher(q)
    return summarizer(research)

orchestrator("Explain transformers")

Dashboard: three cards. researcher's and summarizer's LLM costs roll into their own cards automatically.

Auto-detection — @observe figures out the role at call time:

Nothing running yet → this call is the agent (gets its own card).
Already inside an agent → rolls up into it (tokens + errors fold in, no extra card).

So @observe everywhere is safe — helpers and inner calls just disappear into the agent that ran them, instead of cluttering the Agents page.

Need separate cards for named sub-agents? Add kind="agent" explicitly:

@swarmtrace.observe(kind="agent")
def researcher(q): ...

That's the only knob. kind="llm" / kind="tool" exist for labeling, but the dashboard works correctly without them.

Every bare @observe is its own agent card. Nesting is handled automatically via contextvars — no IDs, no config.

from swarmtrace import observe

@observe
def researcher(q):
    return call_llm(f"Research: {q}")

@observe
def summarizer(text):
    return call_llm(f"Summarize: {text}")

@observe
def orchestrator(q):
    research = researcher(q)
    return summarizer(research)

orchestrator("What is AGI?")

▶ orchestrator    4.2s  |  7 in / 78 out   |  $0.0003
  ▶ researcher    3.4s  |  7 in / 330 out  |  $0.0013
  ▶ summarizer    0.8s  |  338 in / 78 out |  $0.0005

Three agent cards on the dashboard — one per named agent. Sub-calls (call_llm) fold into whichever agent invoked them.

Span Kinds

Kind	Decorator	Dashboard
`agent`	`@observe` (default)	Own card — tasks, tokens, cost, status
`llm`	`@observe(kind="llm")`	Rolls up into calling agent
`tool`	`@observe(kind="tool")`	Rolls up into calling agent
`function`	`@observe(kind="function")`	Rolls up into calling agent

The rule: only functions you want as separate dashboard cards get bare @observe. Everything else gets a kind=.

Span Kinds — agents vs. tool/LLM calls

By default, @observe marks a call as kind="agent" — it gets its own entry on the dashboard's Agents page, with its own task count, tokens, cost, and status. That's the right default for named agents like orchestrator, researcher, and summarizer above.

If you also wrap raw LLM or tool calls with @observe for visibility, tag them so they roll up into the calling agent's stats instead of showing up as their own (fake) agents:

from swarmtrace import observe

@observe(kind="llm")
def call_llm(prompt):
    return client.chat(model="gpt-4o-mini", messages=[...])

@observe(kind="tool")
def search_web(query):
    ...

@observe(kind="function")
def helper(x):
    ...

@observe                      # kind="agent" (default)
def researcher(q):
    return call_llm(f"Research: {q}")

call_llm and search_web are attributed to whichever kind="agent" call is currently running (researcher, here) — their tokens, cost, and any errors are folded into researcher's stats. They never appear as separate entries on the Agents page, no matter how deeply nested.

Async Support

import asyncio
from swarmtrace import observe

@observe
async def async_agent(q):
    return await llm.achat(q)

@observe
async def orchestrator(q):
    results = await asyncio.gather(
        async_agent(q),
        async_agent(q + " — deep dive"),
    )
    return " | ".join(results)

asyncio.run(orchestrator("Explain transformers"))
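
Concurrent agents stay separate because asyncio runs each task in its own copy of the current context. The sketch below uses only the standard library to show that behavior; `_current`, `child`, and `SEEN` are hypothetical names, and this is not a claim about how swarmtrace stores its state beyond the contextvars mention elsewhere in this README.

```python
import asyncio
import contextvars

_current = contextvars.ContextVar("current", default=None)
SEEN = []  # (task name, parent it saw, value it saw after setting its own)


async def child(name):
    parent = _current.get()   # inherited from the task that spawned us
    _current.set(name)        # visible only inside this task's context copy
    await asyncio.sleep(0)    # let the sibling task run in between
    SEEN.append((name, parent, _current.get()))


async def orchestrator():
    _current.set("orchestrator")
    await asyncio.gather(child("a"), child("b"))
    return _current.get()     # unchanged by what the children set


final = asyncio.run(orchestrator())
print(final, sorted(SEEN))
```

Each child sees "orchestrator" as its parent, keeps its own value across the await, and leaves the orchestrator's value untouched.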

Live Cost Tracking

Automatic cost calculation for any model covered by LiteLLM's live pricing registry, across providers.

@observe
def agent(q):
    # OpenAI, Anthropic, Google, Mistral, DeepSeek,
    # Groq, Cohere, xAI — cost tracked automatically
    return client.chat(model="gpt-4o-mini", messages=[...])

Custom or fine-tuned models:

from swarmtrace import set_model_pricing

set_model_pricing("my-finetune", input_per_million=5.00, output_per_million=15.00)
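
The parameter names suggest per-million-token rates applied linearly to input and output tokens. Assuming that reading, the arithmetic looks like the sketch below; `set_pricing_sketch`, `call_cost`, and `PRICING` are hypothetical stand-ins, not swarmtrace functions.

```python
PRICING = {}  # model name -> (USD per 1M input tokens, USD per 1M output tokens)


def set_pricing_sketch(model, input_per_million, output_per_million):
    PRICING[model] = (input_per_million, output_per_million)


def call_cost(model, input_tokens, output_tokens):
    """Cost in USD for one call, assuming linear per-token pricing."""
    in_rate, out_rate = PRICING[model]
    return (input_tokens * in_rate + output_tokens * out_rate) / 1_000_000


set_pricing_sketch("my-finetune", input_per_million=5.00, output_per_million=15.00)

# 1,200 input tokens and 300 output tokens:
# (1200 * 5.00 + 300 * 15.00) / 1,000,000 = 0.0105
cost = call_cost("my-finetune", input_tokens=1200, output_tokens=300)
print(f"${cost:.4f}")  # prints $0.0105
```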

Token Budget

Stop runaway agents before they burn your budget.

from swarmtrace import observe, budget

@observe
@budget(max_tokens=10_000, on_exceed="warn")   # or "stop"
def agent(q):
    return llm.chat(q)
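
One way to picture the difference between "warn" and "stop" is the sketch below. It rests on assumptions this README does not confirm: the budget is cumulative across calls, usage is read from a "tokens" field on the return value, and "stop" raises an exception. `budget_sketch` and `BudgetExceeded` are hypothetical names.

```python
import functools
import warnings


class BudgetExceeded(RuntimeError):
    """Raised in "stop" mode once the token budget is spent."""


def budget_sketch(max_tokens, on_exceed="warn"):
    if on_exceed not in ("warn", "stop"):
        raise ValueError('on_exceed must be "warn" or "stop"')

    def decorator(func):
        used = 0  # cumulative tokens across calls (an assumption)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            nonlocal used
            if on_exceed == "stop" and used >= max_tokens:
                # Refuse to start another call once the budget is gone.
                raise BudgetExceeded(f"{func.__name__} used {used}/{max_tokens} tokens")
            result = func(*args, **kwargs)
            used += result["tokens"]
            if on_exceed == "warn" and used > max_tokens:
                warnings.warn(f"{func.__name__} used {used}/{max_tokens} tokens")
            return result

        wrapper.tokens_used = lambda: used
        return wrapper

    return decorator


@budget_sketch(max_tokens=5, on_exceed="stop")
def agent(q):
    # Hypothetical agent that reports its own token usage.
    return {"text": q, "tokens": len(q.split())}


agent("one two three")  # 3 tokens used so far
agent("one two three")  # 6 tokens used, budget now spent
print(agent.tokens_used())
```

A further call to `agent` would raise `BudgetExceeded`; with on_exceed="warn" it would run and emit a warning instead.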

Regression Detection

Catch when a prompt change breaks your agent's behavior.

pip install "swarmtrace[regression]"   # quotes keep zsh from globbing the brackets

from swarmtrace.regression import compare

compare(
    my_agent,
    inputs=["What is ML?", "How does Python work?", "What is an API?"],
    version_a_prompt="You are a helpful assistant.",
    version_b_prompt="Reply only in emojis.",
    threshold=0.6,
)

INPUT                    SIMILARITY   REGRESSION?
What is ML?              0.10         🔴 YES
How does Python work?    0.15         🔴 YES
What is an API?          0.12         🔴 YES

Result: 3/3 regressions detected
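
The table reads as: compute a similarity score between the two versions' outputs for each input, and flag a regression when the score falls below the threshold. The sketch below reproduces that decision rule with difflib's character-level ratio, which is a swapped-in metric chosen because it ships with Python; the README does not say which similarity measure swarmtrace uses. `compare_sketch` and the sample outputs are hypothetical.

```python
from difflib import SequenceMatcher


def compare_sketch(outputs_a, outputs_b, threshold=0.6):
    """Return (input, similarity, is_regression) for each shared input."""
    rows = []
    for prompt, text_a in outputs_a.items():
        similarity = SequenceMatcher(None, text_a, outputs_b[prompt]).ratio()
        rows.append((prompt, similarity, similarity < threshold))
    return rows


# Hypothetical outputs from two prompt versions of the same agent.
version_a = {
    "What is ML?": "Machine learning fits models to data.",
    "What is an API?": "An API is a contract between programs.",
}
version_b = {
    "What is ML?": "Machine learning fits models to data.",
    "What is an API?": "🤝📦",
}

rows = compare_sketch(version_a, version_b, threshold=0.6)
for prompt, similarity, regressed in rows:
    print(f"{prompt:<20} {similarity:.2f}  {'YES' if regressed else 'no'}")
```

An unchanged answer scores 1.0 and passes; an emoji-only answer shares no characters with the original and is flagged.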

Tool Attention

Reduce token overhead by up to 95% — only pass relevant tools to each agent call, scored via ISO Scoring (arXiv:2604.21816).

pip install "swarmtrace[tools]"

from swarmtrace import ToolAttention

ta = ToolAttention(tools=all_my_tools)

@observe
def agent(query):
    relevant_tools = ta.select(query, top_k=3)
    return llm.chat(query, tools=relevant_tools)
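
The selection step amounts to scoring every tool against the query and keeping the top_k. The sketch below shows that shape with a simple word-overlap (Jaccard) score, which is a stand-in and not the ISO Scoring method cited above; `select_tools` and the tool list are hypothetical.

```python
def select_tools(query, tools, top_k=3):
    """Keep the top_k tools whose descriptions overlap most with the query."""
    query_words = set(query.lower().split())

    def score(tool):
        desc_words = set(tool["description"].lower().split())
        union = query_words | desc_words
        return len(query_words & desc_words) / len(union) if union else 0.0

    return sorted(tools, key=score, reverse=True)[:top_k]


all_my_tools = [
    {"name": "search_web", "description": "search the web for pages"},
    {"name": "get_weather", "description": "get the weather forecast for a city"},
    {"name": "send_email", "description": "send an email message"},
    {"name": "run_sql", "description": "run a sql query on the database"},
]

relevant = select_tools("weather forecast for paris", all_my_tools, top_k=2)
print([tool["name"] for tool in relevant])
```

Only the selected tool schemas would then be sent with the LLM call, which is where the token savings come from.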

Remote Dashboard

Send traces to the SwarmTrace dashboard for live monitoring.

from swarmtrace import init, observe

init(
    api_key="your-swarmtrace-api-key",
    endpoint="https://swarmtrace.vercel.app",
)

@observe
def my_agent(q):
    ...

Or via environment variables:

export SWARMTRACE_API_KEY=your-key
export SWARMTRACE_ENDPOINT=https://swarmtrace.vercel.app
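
For readers wondering how the two configuration routes interact, a common pattern is for explicit arguments to win and environment variables to fill in the rest. The sketch below shows that pattern only; whether swarmtrace uses this precedence is an assumption, and `resolve_config` is a hypothetical helper.

```python
import os


def resolve_config(api_key=None, endpoint=None, env=None):
    """Explicit arguments first, then environment variables, else None."""
    env = os.environ if env is None else env
    return {
        "api_key": api_key or env.get("SWARMTRACE_API_KEY"),
        "endpoint": endpoint or env.get("SWARMTRACE_ENDPOINT"),
    }


# A plain dict stands in for the process environment in this example.
fake_env = {
    "SWARMTRACE_API_KEY": "your-key",
    "SWARMTRACE_ENDPOINT": "https://swarmtrace.vercel.app",
}

from_env = resolve_config(env=fake_env)
overridden = resolve_config(api_key="explicit-key", env=fake_env)
print(from_env["endpoint"], overridden["api_key"])
```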

CLI

swarmtrace                       # last 100 traces
swarmtrace --limit 50            # last 50
swarmtrace-replay <id>           # replay any trace
swarmtrace-export --format json
swarmtrace-export --format csv

vs LangSmith

Feature	SwarmTrace	LangSmith
Open source	✅	❌
Works offline	✅	❌
Any LLM / any framework	✅	⚠️ Works without LangChain, tightest integration with it
Live cost tracking	✅ all models	✅
Regression detection	✅	❌
Token budget enforcement	✅	❌
Tool attention (ISO)	✅	❌
Setup	2 lines	SDK + account
Price	Free	$20/month

Optional Extras

pip install "swarmtrace[regression]"   # AI regression detection
pip install "swarmtrace[tools]"        # Tool attention + FAISS
pip install "swarmtrace[budget]"       # Token budget with tiktoken
pip install "swarmtrace[scraper]"      # Web scraping traces
pip install "swarmtrace[all]"          # Everything

AMD MI300X Benchmarks

Tested on AMD Instinct MI300X 192GB via AMD Developer Cloud.

Metric	Value
Swarms tested	5
Total agent calls	20
Avg orchestrator latency	6.1s
Avg researcher latency	1.8s
Trace overhead	< 1ms

Built with ❤️ at AMD Hackathon 2026 by Ravi Kumar

Release history

Version	Released
0.4.1 (this version)	Jun 17, 2026
0.4.0	Jun 17, 2026
0.3.1	Jun 15, 2026
0.3.0	Jun 15, 2026
0.2.1	Jun 14, 2026
0.2.0	Jun 12, 2026
0.1.9	Jun 12, 2026
0.1.8	Jun 9, 2026
0.1.7	May 7, 2026
0.1.6	May 7, 2026
0.1.5	May 7, 2026
0.1.4	May 6, 2026
0.1.3	May 4, 2026
0.1.2	May 1, 2026
0.1.0	Apr 30, 2026

Download files

Source Distribution

swarmtrace-0.4.1.tar.gz (35.8 kB)

Uploaded Jun 17, 2026 Source

Built Distribution

swarmtrace-0.4.1-py3-none-any.whl (33.3 kB)

Uploaded Jun 17, 2026 Python 3

File details

Details for the file swarmtrace-0.4.1.tar.gz.

File metadata

Download URL: swarmtrace-0.4.1.tar.gz
Upload date: Jun 17, 2026
Size: 35.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for swarmtrace-0.4.1.tar.gz
Algorithm	Hash digest
SHA256	`87a4a5bd4a311216667c92990f998278a7f753772fcd5ff7e38fef12d70f365b`
MD5	`b8c8bebf2e16c272732fb38dae3a6d57`
BLAKE2b-256	`0075b7b1f63ea621c2c200c6c82e58e507486de5bc43e80dc53c4e385950717f`

File details

Details for the file swarmtrace-0.4.1-py3-none-any.whl.

File metadata

Download URL: swarmtrace-0.4.1-py3-none-any.whl
Upload date: Jun 17, 2026
Size: 33.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.11

File hashes

Hashes for swarmtrace-0.4.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8d484b9ae626518e78ee72c912c6708c1140c2cc1418225615ebafe796fd4507`
MD5	`61e19ececf638c79c46e25030cabcb69`
BLAKE2b-256	`e0c0e7f719f0f8fd6360736f0189cbb5d1c9d083c45bf6a254e1879f027c7c58`
