Skip to main content

LLM observability, quality scoring, hallucination detection, cost tracking, and agent run debugging — in 2 lines of code.

Project description

llm-evaltrack

Drop-in observability for LLM applications. Automatic quality scoring, hallucination detection, cost tracking, and agent run debugging — with 2 lines of code.

Install

pip install agentlens-monitor

Quick Start

import llm_observe

llm_observe.init(api_url="https://your-server.com/ingest")
llm_observe.patch_openai()     # auto-track all OpenAI calls
llm_observe.patch_anthropic()  # auto-track all Anthropic calls

That's it. Your existing code is unchanged. Every chat.completions.create() and messages.create() is now automatically tracked.

Agent Debugging (v0.2.0)

Trace multi-step agent runs to see every step, find where things break, and measure cost per span:

from llm_observe import trace_agent

with trace_agent("research_agent", input="Research renewable energy trends") as trace:

    with trace.span("web_search", span_type="retrieval") as s:
        results = search("renewable energy 2024")
        s.set_output(results)

    with trace.span("llm_summarize", span_type="llm", model="gpt-4o") as s:
        summary = llm.summarize(results)
        s.set_output(summary)
        s.set_tokens(1200)
        s.set_cost(0.009)

    with trace.span("fact_check", span_type="tool") as s:
        verified = fact_check(summary)
        s.set_output(verified)

    trace.set_output("Report complete")

Span types: llm · tool · retrieval · decision · custom

Each trace captures: total duration, tokens, cost, per-step timing, inputs/outputs, and errors. Errors inside a with block are automatically caught and marked as failed.

What Gets Tracked (auto)

Field Source
Input / Output Message content
Model response.model
Tokens response.usage
Cost (USD) Calculated from token counts
Quality Score Heuristic evaluation or LLM judge
Hallucination flags Automatic detection

Manual Tracking

llm_observe.track_llm_call(
    input="What is the capital of France?",
    output="Paris.",
    prompt="You are a helpful assistant.",
    model="gpt-4o",
    metadata={"feature": "qa", "user_id": "u_123", "cost_usd": 0.0003},
)

Configuration

llm_observe.init(
    api_url="https://your-server.com/ingest",
    api_key="your-secret",   # optional bearer token
    max_retries=3,
    timeout=5.0,
    enabled=True,            # set False in tests
)

Dashboard

Pair with the self-hosted server for:

  • Real-time quality trend charts
  • Bad response categorization
  • Root-cause analysis by prompt
  • Cost vs quality per model
  • Regression alerts
  • Agent Debugger — waterfall timeline of every span in a trace

Live demo: llm-evaltrack-production.up.railway.app

Self-host: github.com/Soufianeazz/llm-evaltrack

Changelog

v0.2.0

  • Added trace_agent() context manager for agent run tracing
  • Added span() and trace.span() for individual steps
  • Spans support: set_output(), set_tokens(), set_cost(), set_error()
  • Automatic error capture on exceptions inside with blocks
  • Nested span support via parent_span_id

v0.1.0

  • Initial release: patch_openai(), patch_anthropic(), track_llm_call()

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agentlens_monitor-0.2.1.tar.gz (96.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agentlens_monitor-0.2.1-py3-none-any.whl (10.0 kB view details)

Uploaded Python 3

File details

Details for the file agentlens_monitor-0.2.1.tar.gz.

File metadata

  • Download URL: agentlens_monitor-0.2.1.tar.gz
  • Upload date:
  • Size: 96.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.10

File hashes

Hashes for agentlens_monitor-0.2.1.tar.gz
Algorithm Hash digest
SHA256 79f241512086c4ff128304c5ede2d645c3f7298da8c09124b99795ffb8b2a89d
MD5 86a8140d03801627829c652d3f31c3fa
BLAKE2b-256 d10c0c512e29e01cf8244f8ca9725940d7e0751b2095987a678696fffd6b5cd8

See more details on using hashes here.

File details

Details for the file agentlens_monitor-0.2.1-py3-none-any.whl.

File metadata

File hashes

Hashes for agentlens_monitor-0.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e9c5a47966d8961af1b27c6001a2ca77b05a501b1deafdc5242a8ead67119230
MD5 50fdc752bb2c235bf1387c4abfb3097c
BLAKE2b-256 b5f1f6f385eb61c0f7d8e7d3fdd87237beb7474f50e03183edb6c8ac50cc613b

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page