Skip to main content

LLM observability, quality scoring, hallucination detection, cost tracking, agent run debugging, and a drop-in LangChain callback — in 2 lines of code.

Project description

llm-evaltrack

Drop-in observability for LLM applications. Automatic quality scoring, hallucination detection, cost tracking, and agent run debugging — with 2 lines of code.

Install

pip install agentlens-monitor

Quick Start

import agentlens

agentlens.init(api_url="https://www.agentlens.one/ingest")
agentlens.patch_openai()     # auto-track all OpenAI calls
agentlens.patch_anthropic()  # auto-track all Anthropic calls

That's it. Your existing code is unchanged. Every chat.completions.create() and messages.create() is now automatically tracked.

LangChain (v0.3.0)

Drop-in callback handler — attach it once and every chain, LLM call, tool, and retriever shows up as a span in the AgentLens trace view.

import llm_observe
from llm_observe.integrations.langchain import AgentLensCallbackHandler

llm_observe.init(api_url="https://www.agentlens.one/ingest")
handler = AgentLensCallbackHandler(trace_name="my_qa_chain")

# Works with any LangChain runnable, chain, or agent
chain.invoke({"input": "..."}, config={"callbacks": [handler]})

What gets captured per run:

  • Top-level chain → one AgentLens trace
  • Nested chains, LLM calls, tool calls, retriever calls → spans (with parent/child)
  • Models, prompts, outputs, token counts, errors — all wired through automatically

Requires langchain-core (or the full langchain package) installed.

Agent Debugging (v0.2.0)

Trace multi-step agent runs to see every step, find where things break, and measure cost per span:

from agentlens import trace_agent

with trace_agent("research_agent", input="Research renewable energy trends") as trace:

    with trace.span("web_search", span_type="retrieval") as s:
        results = search("renewable energy 2024")
        s.set_output(results)

    with trace.span("llm_summarize", span_type="llm", model="gpt-4o") as s:
        summary = llm.summarize(results)
        s.set_output(summary)
        s.set_tokens(1200)
        s.set_cost(0.009)

    with trace.span("fact_check", span_type="tool") as s:
        verified = fact_check(summary)
        s.set_output(verified)

    trace.set_output("Report complete")

Span types: llm · tool · retrieval · decision · custom

Each trace captures: total duration, tokens, cost, per-step timing, inputs/outputs, and errors. Errors inside a with block are automatically caught and marked as failed.

What Gets Tracked (auto)

Field Source
Input / Output Message content
Model response.model
Tokens response.usage
Cost (USD) Calculated from token counts
Quality Score Heuristic evaluation or LLM judge
Hallucination flags Automatic detection

Manual Tracking

agentlens.track_llm_call(
    input="What is the capital of France?",
    output="Paris.",
    prompt="You are a helpful assistant.",
    model="gpt-4o",
    metadata={"feature": "qa", "user_id": "u_123", "cost_usd": 0.0003},
)

Configuration

agentlens.init(
    api_url="https://www.agentlens.one/ingest",
    api_key="your-secret",   # optional bearer token
    max_retries=3,
    timeout=5.0,
    enabled=True,            # set False in tests
)

Dashboard

Pair with the self-hosted server for:

  • Real-time quality trend charts
  • Bad response categorization
  • Root-cause analysis by prompt
  • Cost vs quality per model
  • Regression alerts
  • Agent Debugger — waterfall timeline of every span in a trace

Live demo: www.agentlens.one

Self-host: github.com/Soufianeazz/llm-evaltrack

Changelog

v0.3.0

  • Added AgentLensCallbackHandler for LangChain — one callback captures every chain, LLM call, tool call, and retriever call as a waterfall span tree
  • Supports nested chains via LangChain run_id / parent_run_id
  • Graceful import fallback if langchain-core isn't installed

v0.2.0

  • Added trace_agent() context manager for agent run tracing
  • Added span() and trace.span() for individual steps
  • Spans support: set_output(), set_tokens(), set_cost(), set_error()
  • Automatic error capture on exceptions inside with blocks
  • Nested span support via parent_span_id

v0.1.0

  • Initial release: patch_openai(), patch_anthropic(), track_llm_call()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentlens_monitor-0.3.0.tar.gz (1.2 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agentlens_monitor-0.3.0-py3-none-any.whl (11.6 kB view details)

Uploaded Python 3

File details

Details for the file agentlens_monitor-0.3.0.tar.gz.

File metadata

  • Download URL: agentlens_monitor-0.3.0.tar.gz
  • Upload date:
  • Size: 1.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for agentlens_monitor-0.3.0.tar.gz
Algorithm Hash digest
SHA256 9283e8c5a6b8781a17a3dad80348e06c47c6dfde512414148fd454a78e0864c2
MD5 31d7f275681b7c402993720a92266dea
BLAKE2b-256 49c4b13031534f82d81309a66369a0558fa55f634b6838014eed900c290c469b

See more details on using hashes here.

File details

Details for the file agentlens_monitor-0.3.0-py3-none-any.whl.

File metadata

File hashes

Hashes for agentlens_monitor-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e50ce92835dc7a1ae1845475f0185d5b2e36551bc8695f7f1d17db65ab8959c5
MD5 6c1a0304a013f5ad42b9f8aa3cf80973
BLAKE2b-256 fa81b3ca62fcc612b8ef326702da38d251cafdbb575cb65588f6db3fb16de745

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page