AgentLens

agentlens.run · GitHub

When your AI agent breaks, AgentLens tells you exactly which decision caused it — and what to change.

Not logs. Not traces. The answer.

ROOT CAUSE:   tool_selection
FAILED AT:    Step 2 (search_web)
WHY:          Both tools had identical descriptions — the agent treated them as
              interchangeable and picked the wrong one.
FIX:          Rewrite tool descriptions so search_web is clearly for external
              lookup and query_db is clearly for local records.
CONFIDENCE:   0.90
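
As an illustration of the FIX line above, here is a minimal sketch of what clearly separated tool definitions could look like. It uses the name / description / input_schema shape of Anthropic tool definitions. The description wording, the parameter names, and the duplicate_descriptions helper are assumptions made for this example and are not part of AgentLens.

```python
# Hypothetical tool definitions: wording and parameters are illustrative only.
TOOLS = [
    {
        "name": "search_web",
        "description": (
            "Look up information on the public internet. Use this for anything "
            "that is not stored in our own records."
        ),
        "input_schema": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
    {
        "name": "query_db",
        "description": (
            "Read rows from the local application database. Use this for "
            "customer, order, and account records we already hold."
        ),
        "input_schema": {
            "type": "object",
            "properties": {"sql": {"type": "string"}},
            "required": ["sql"],
        },
    },
]


def duplicate_descriptions(tools):
    """Return (first_name, second_name) pairs whose descriptions match
    after lowercasing and collapsing whitespace."""
    seen = {}
    dupes = []
    for tool in tools:
        key = " ".join(tool["description"].lower().split())
        if key in seen:
            dupes.append((seen[key], tool["name"]))
        else:
            seen[key] = tool["name"]
    return dupes
```

Running duplicate_descriptions(TOOLS) before deploying an agent is a cheap guard against the identical-description case; it only catches exact matches, not descriptions that are merely similar.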

Works with Anthropic · OpenAI · LangGraph · CrewAI · AutoGen · PydanticAI · raw API


Install

pip install agentlens-ai

Or from source:

git clone https://github.com/abishekgiri/agentlens.git
cd agentlens
pip install -e ".[anthropic,openai]"

Quickstart — 2 lines

Add AgentLens before your existing provider client. No other changes.

import agentlens
agentlens.init()                        # patches Anthropic + OpenAI automatically

import anthropic
client = anthropic.Anthropic()          # captured from here on

@agentlens.run(name="my_agent")         # groups everything into one run
def run_agent(query):
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=512,
        messages=[{"role": "user", "content": query}]
    )
    return response

Async agents work exactly the same — agentlens.init() patches AsyncAnthropic and AsyncOpenAI too.

Run your agent, then:

agentlens runs list
agentlens diagnose <run_id>

Framework examples

LangGraph

import agentlens
agentlens.init()
agentlens.patch_langgraph()             # call before graph.compile()

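from langgraph.graph import StateGraph, START, END

# MyState, planner_fn and executor_fn are your own definitions
graph = StateGraph(MyState)
graph.add_node("planner", planner_fn)
graph.add_node("executor", executor_fn)
graph.add_edge(START, "planner")        # the graph needs an entry point to compile
graph.add_edge("planner", "executor")
graph.add_edge("executor", END)
app = graph.compile()                   # automatically wrapped, all nodes traced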

@agentlens.run(name="langgraph_agent")
async def run(input):
    return await app.ainvoke({"messages": input})

OpenAI async

import agentlens
agentlens.init()

from openai import AsyncOpenAI
client = AsyncOpenAI()                  # captured automatically

@agentlens.run(name="openai_agent")
async def run_agent(query):
    response = await client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": query}],
        # tools=[...],  # add your tool schemas here; omit the argument when there are none
    )
    return response

Multi-agent tracing

import agentlens

# Parent agent
ctx = agentlens.get_trace_context()

# Child agent (different process / service); ctx has to be handed over by the parent
agentlens.init(parent_context=ctx)      # stitches child trace into parent

What AgentLens catches

Six failure categories, detected automatically:

Category           What it means
tool_selection     Agent picked the wrong tool, usually because descriptions were too similar
loop               Agent repeated the same tool call with the same inputs without exit
cascade            A tool returned bad/stale data and a downstream step used it and failed
context_pollution  Contradictory instructions in the prompt diluted the agent's goal
state_drift        Agent abandoned its original goal mid-run
overflow           Critical context was pushed out of the context window before the key decision
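
To make the loop category concrete, here is a minimal sketch of one way to flag repeated tool calls. It is not AgentLens's detector. The span fields ("type", "tool", "input") and the threshold of 3 are assumptions for this example, and it counts repeats anywhere in the run rather than only consecutive ones.

```python
import json
from collections import Counter


def find_repeated_calls(spans, threshold=3):
    """Return (tool, serialized_input) signatures seen at least `threshold` times.

    `spans` is a list of dicts in a hypothetical format; only entries with
    type == "tool_call" are counted.
    """
    counts = Counter()
    for span in spans:
        if span.get("type") != "tool_call":
            continue
        # Serialize inputs with sorted keys so equal dicts compare equal.
        signature = (span["tool"], json.dumps(span.get("input", {}), sort_keys=True))
        counts[signature] += 1
    return [sig for sig, n in counts.items() if n >= threshold]
```

A real detector would also want to check that nothing changed between the repeats (for example, that the tool returned the same result each time), which this sketch does not do.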

Plus hallucination detection — invented tool parameters, missing required fields, LLM output that contradicts what the tool actually returned.
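
The first two checks (invented parameters and missing required fields) amount to comparing a call's arguments against the tool's schema. A minimal sketch, assuming a JSON-Schema-style tool schema with "properties" and "required" keys; the function name and the send_email schema are made up for this example and this is not AgentLens's own code:

```python
def check_tool_call(args, schema):
    """Compare call arguments with a JSON-Schema-style tool schema.

    Returns the argument names that are not declared in the schema
    ("invented") and the required names that were not supplied ("missing").
    """
    allowed = set(schema.get("properties", {}))
    required = set(schema.get("required", []))
    supplied = set(args)
    return {
        "invented": sorted(supplied - allowed),
        "missing": sorted(required - supplied),
    }


# Hypothetical schema for a send_email tool that accepts only 'to' and 'body'.
SEND_EMAIL_SCHEMA = {
    "type": "object",
    "properties": {"to": {"type": "string"}, "body": {"type": "string"}},
    "required": ["to", "body"],
}
```

Calling check_tool_call({"to": "a@example.com", "priority": "high"}, SEND_EMAIL_SCHEMA) reports "priority" as invented and "body" as missing.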


CLI reference

# Runs
agentlens runs list                     # all recent runs with status + span count
agentlens runs show <run_id>            # full span detail for one run
agentlens runs view <run_id>            # open visual timeline in browser
agentlens runs prompt <run_id>          # print exact LLM prompts sent at each step
agentlens runs prompt <run_id> --step 2 # prompt for a specific LLM call only
agentlens runs replay <run_id>          # interactive step-by-step playback (ENTER to advance)
agentlens runs stitch <run_id>          # show multi-agent trace tree rooted at this run

# Diagnosis
agentlens diagnose <run_id>             # root cause analysis + hallucination report
agentlens similar <run_id>              # find historically similar failures
agentlens similar <run_id> --top 10     # top-N matches
agentlens clusters                      # failure clusters across all runs + top fix

# Stats & cost
agentlens stats                         # token usage, latency, cost across all runs
agentlens stats <run_id>                # per-run breakdown

# Utilities
agentlens anonymize <run_id>            # redact secrets before sharing
agentlens feedback-template <run_id>    # structured feedback form
agentlens evaluate                      # accuracy check against fixtures + real cases
agentlens doctor                        # system health check

Real example output

AgentLens Diagnosis
===================

ROOT CAUSE:
  cascade

FAILED AT:
  Step 3 (get_user_profile)

WHY:
  Step 3 produced bad or corrupted output that caused a failure at step 6.
  get_user_profile returned {"email": null, "warning": "stale cache entry"}
  and send_email downstream tried to use the null email field.

FIX:
  Validate the output from 'get_user_profile' before using it downstream;
  if it is stale, empty, or malformed, stop and recover instead of feeding
  it into the next step.

SECONDARY:
  None

CONFIDENCE: 0.90

HALLUCINATIONS DETECTED:
  [HIGH] step 5 — invented param: 'send_email' was called with 'priority'
  which is not in its schema. Valid params: ['to', 'body'].
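
The FIX in this report boils down to a guard between get_user_profile and send_email. A minimal sketch of such a guard, assuming the profile is a dict shaped like the one quoted in the WHY section; the exception class and function name are hypothetical:

```python
class StaleToolOutput(ValueError):
    """Raised when a tool result should not be passed to the next step."""


def validate_profile(profile):
    """Return the profile unchanged if it is usable, otherwise raise."""
    if not isinstance(profile, dict):
        raise StaleToolOutput("profile is not a dict")
    if profile.get("warning"):
        # The tool itself flagged the data, e.g. "stale cache entry".
        raise StaleToolOutput(f"tool reported a warning: {profile['warning']}")
    if not profile.get("email"):
        raise StaleToolOutput("profile has no email")
    return profile
```

Wrapping the tool result in validate_profile(...) before handing it to the email step turns a silent cascade into an explicit error that the agent, or the code around it, can recover from.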

How it works

agentlens.init() monkeypatches your provider clients at import time — no changes to existing code. Every LLM call, tool call, error, and memory snapshot is captured as a span and saved locally to .agentlens/runs/<run_id>.json.
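
For readers unfamiliar with monkeypatching, the general technique can be sketched as follows. This is a generic illustration, not AgentLens's implementation: FakeClient stands in for a provider client, SPANS stands in for the on-disk trace, and the span fields are assumptions.

```python
import functools
import time

SPANS = []  # stand-in for a trace store


def patch_method(cls, method_name):
    """Replace cls.method_name with a wrapper that records every call."""
    original = getattr(cls, method_name)

    @functools.wraps(original)
    def wrapper(self, *args, **kwargs):
        start = time.perf_counter()
        error = None
        try:
            return original(self, *args, **kwargs)
        except Exception as exc:
            error = repr(exc)
            raise  # the caller still sees the original exception
        finally:
            # Runs on success and on failure, so every call yields one span.
            SPANS.append({
                "name": f"{cls.__name__}.{method_name}",
                "kwargs": kwargs,
                "error": error,
                "duration_s": time.perf_counter() - start,
            })

    setattr(cls, method_name, wrapper)


class FakeClient:
    """Stand-in for a provider client class."""

    def create(self, **kwargs):
        return {"echo": kwargs}


patch_method(FakeClient, "create")
```

Because the class attribute is replaced, existing code that calls FakeClient().create(...) keeps working unchanged while each call is recorded.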

agentlens diagnose runs the trace through a preprocessing pipeline, then either an LLM-powered classifier (if ANTHROPIC_API_KEY or OPENAI_API_KEY is set) or a fast local heuristic fallback. The local fallback works offline with no API key required.
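
The choice between the two classifiers can be pictured as a simple environment check. This is a sketch of the described behaviour, not the actual code: the function name and return labels are hypothetical, and the precedence between the two keys when both are set is an assumption.

```python
import os


def pick_classifier(env=None):
    """Choose a diagnosis backend from the available API keys."""
    env = os.environ if env is None else env
    if env.get("ANTHROPIC_API_KEY"):
        return "llm:anthropic"
    if env.get("OPENAI_API_KEY"):
        return "llm:openai"
    # No key set: fall back to the offline heuristic.
    return "local_heuristic"
```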

All data stays on your machine. No cloud. No signup. No account.


Add a real-world test case

real_world_cases/my-broken-agent/
├── trace.json              # anonymized run from .agentlens/runs/
├── expected_diagnosis.json # {"root_cause_category": "loop", "failed_at_step": 4}
└── notes.md                # what the agent was doing and what actually broke

Run agentlens evaluate to score the diagnosis engine against your case.
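
The directory layout above can be created by hand, or with a small helper like the sketch below. The helper is hypothetical and not shipped with AgentLens; it assumes run_file points at a run you have already anonymized, and it writes only the two keys shown in the layout.

```python
import json
import shutil
from pathlib import Path


def scaffold_case(run_file, case_dir, category, failed_step, notes=""):
    """Create trace.json, expected_diagnosis.json and notes.md in case_dir."""
    case_dir = Path(case_dir)
    case_dir.mkdir(parents=True, exist_ok=True)
    # Copy the (already anonymized) run file as the case's trace.
    shutil.copyfile(run_file, case_dir / "trace.json")
    expected = {"root_cause_category": category, "failed_at_step": failed_step}
    (case_dir / "expected_diagnosis.json").write_text(json.dumps(expected, indent=2))
    (case_dir / "notes.md").write_text(notes)
    return case_dir
```

For example, scaffold_case(".agentlens/runs/<run_id>.json", "real_world_cases/my-broken-agent", "loop", 4) produces the three files listed above, with <run_id> replaced by a real run id.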


What this is not

No dashboard. No hosted API. No database. No billing. No auth.

This is a local developer tool. The goal: when your agent breaks, run one command and get the answer in under 30 seconds.


Feedback

If AgentLens finds (or misses) a real bug in your agent, we want to know.

agentlens anonymize <run_id>            # redact secrets
agentlens feedback-template <run_id>    # fill this in and send it
