LLM observability, quality scoring, hallucination detection, cost tracking, agent run debugging, and drop-in LangChain + LlamaIndex callbacks — in 2 lines of code.
llm-evaltrack
Drop-in observability for LLM applications. Automatic quality scoring, hallucination detection, cost tracking, and agent run debugging — with 2 lines of code.
Install
pip install agentlens-monitor
Quick Start
import agentlens
agentlens.init(api_url="https://www.agentlens.one/ingest")
agentlens.patch_openai() # auto-track all OpenAI calls
agentlens.patch_anthropic() # auto-track all Anthropic calls
That's it. Your existing code is unchanged. Every chat.completions.create() and messages.create() is now automatically tracked.
LlamaIndex (v0.4.0)
Drop-in callback handler for RAG pipelines built with LlamaIndex — every query, retrieval, LLM call, and agent step shows up as a span.
import llm_observe
from llm_observe.integrations.llama_index import AgentLensLlamaIndexHandler
from llama_index.core import Settings
from llama_index.core.callbacks import CallbackManager
llm_observe.init(api_url="https://www.agentlens.one/ingest")
Settings.callback_manager = CallbackManager([AgentLensLlamaIndexHandler()])
# Any LlamaIndex query is now traced automatically
Requires llama-index-core to be installed.
LangChain (v0.3.0)
Drop-in callback handler — attach it once and every chain, LLM call, tool, and retriever shows up as a span in the AgentLens trace view.
import llm_observe
from llm_observe.integrations.langchain import AgentLensCallbackHandler
llm_observe.init(api_url="https://www.agentlens.one/ingest")
handler = AgentLensCallbackHandler(trace_name="my_qa_chain")
# Works with any LangChain runnable, chain, or agent
chain.invoke({"input": "..."}, config={"callbacks": [handler]})
What gets captured per run:
- Top-level chain → one AgentLens trace
- Nested chains, LLM calls, tool calls, retriever calls → spans (with parent/child)
- Models, prompts, outputs, token counts, errors — all wired through automatically
Requires langchain-core (or the full langchain package) to be installed.
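To illustrate how a parent/child span tree can be derived from the run_id and parent_run_id values that LangChain passes to callbacks, here is a minimal sketch. It is not the handler's actual implementation: build_span_tree and the event dictionaries are hypothetical names used only for this example.

```python
from collections import defaultdict


def build_span_tree(events):
    """Nest flat span records by following parent_run_id links (illustrative only)."""
    children = defaultdict(list)
    for event in events:
        # Records without a parent are grouped under the key None.
        children[event.get("parent_run_id")].append(event)

    def attach(event):
        return {
            "name": event["name"],
            "children": [attach(child) for child in children[event["run_id"]]],
        }

    # Each parentless record is a top-level chain, i.e. one trace.
    return [attach(root) for root in children[None]]


# Hypothetical flat records, in the order the callbacks might fire.
events = [
    {"run_id": "a", "parent_run_id": None, "name": "my_qa_chain"},
    {"run_id": "b", "parent_run_id": "a", "name": "retriever"},
    {"run_id": "c", "parent_run_id": "a", "name": "llm"},
    {"run_id": "d", "parent_run_id": "c", "name": "tool"},
]
tree = build_span_tree(events)
```

In this sketch the single top-level chain becomes the trace, and every other record hangs off whichever run started it.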
Agent Debugging (v0.2.0)
Trace multi-step agent runs to see every step, find where things break, and measure cost per span:
from agentlens import trace_agent

# search(), llm.summarize(), and fact_check() stand in for your own application code.
with trace_agent("research_agent", input="Research renewable energy trends") as trace:
    with trace.span("web_search", span_type="retrieval") as s:
        results = search("renewable energy 2024")
        s.set_output(results)
    with trace.span("llm_summarize", span_type="llm", model="gpt-4o") as s:
        summary = llm.summarize(results)
        s.set_output(summary)
        s.set_tokens(1200)
        s.set_cost(0.009)
    with trace.span("fact_check", span_type="tool") as s:
        verified = fact_check(summary)
        s.set_output(verified)
    trace.set_output("Report complete")
Span types: llm · tool · retrieval · decision · custom
Each trace captures: total duration, tokens, cost, per-step timing, inputs/outputs, and errors.
An exception raised inside a with block is captured automatically, and the corresponding span is marked as failed.
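The general pattern behind this kind of error capture can be sketched with a plain context manager. This is an illustration, not the library's code: the span function, the record fields, and the decision to re-raise the exception after recording it are all assumptions made for the example.

```python
import time
from contextlib import contextmanager


@contextmanager
def span(name, records):
    """Record timing and failure status for one step (hypothetical helper)."""
    record = {"name": name, "status": "ok", "error": None}
    start = time.perf_counter()
    try:
        yield record
    except Exception as exc:
        record["status"] = "failed"
        record["error"] = repr(exc)
        raise  # assumption: the error still propagates to the caller
    finally:
        record["duration_s"] = time.perf_counter() - start
        records.append(record)


records = []
try:
    with span("fact_check", records):
        raise ValueError("source unavailable")
except ValueError:
    pass  # the failure has already been recorded on the span
```

After this runs, records holds one entry whose status is "failed" and whose error field contains the exception text.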
What Gets Tracked (auto)
| Field | Source |
|---|---|
| Input / Output | Message content |
| Model | response.model |
| Tokens | response.usage |
| Cost (USD) | Calculated from token counts |
| Quality Score | Heuristic evaluation or LLM judge |
| Hallucination flags | Automatic detection |
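As a rough illustration of how a cost figure can be calculated from token counts, the sketch below multiplies prompt and completion tokens by per-million-token prices. The price table, the model name, and estimate_cost_usd are hypothetical; the prices the library actually uses are not stated here.

```python
# Hypothetical per-million-token prices in USD; real prices vary by model and provider.
PRICES_PER_MILLION = {
    "example-model": {"input": 2.50, "output": 10.00},
}


def estimate_cost_usd(model, prompt_tokens, completion_tokens):
    """Estimate the USD cost of one call from its token usage (illustrative only)."""
    price = PRICES_PER_MILLION[model]
    total = prompt_tokens * price["input"] + completion_tokens * price["output"]
    return total / 1_000_000


cost = estimate_cost_usd("example-model", prompt_tokens=1000, completion_tokens=200)
```

With these made-up prices, 1,000 prompt tokens and 200 completion tokens come to 0.0045 USD.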
Manual Tracking
agentlens.track_llm_call(
    input="What is the capital of France?",
    output="Paris.",
    prompt="You are a helpful assistant.",
    model="gpt-4o",
    metadata={"feature": "qa", "user_id": "u_123", "cost_usd": 0.0003},
)
Configuration
agentlens.init(
    api_url="https://www.agentlens.one/ingest",
    api_key="your-secret",  # optional bearer token
    max_retries=3,
    timeout=5.0,
    enabled=True,  # set False in tests
)
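One way to switch tracking off in tests without editing code is to build these arguments from environment variables. The sketch below is only a suggestion: settings_from_env and the AGENTLENS_* variable names are hypothetical and are not read by the library itself.

```python
import os


def settings_from_env(env=os.environ):
    """Build keyword arguments for agentlens.init() from the environment (hypothetical helper)."""
    return {
        "api_url": env.get("AGENTLENS_API_URL", "https://www.agentlens.one/ingest"),
        "api_key": env.get("AGENTLENS_API_KEY"),  # None when no token is configured
        "max_retries": 3,
        "timeout": 5.0,
        # Any value other than "0" leaves tracking enabled.
        "enabled": env.get("AGENTLENS_ENABLED", "1") != "0",
    }


# A test run could export AGENTLENS_ENABLED=0 to disable tracking.
settings = settings_from_env({"AGENTLENS_ENABLED": "0"})
# agentlens.init(**settings)  # left commented so the sketch runs without the package
```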
Dashboard
Pair with the self-hosted server for:
- Real-time quality trend charts
- Bad response categorization
- Root-cause analysis by prompt
- Cost vs quality per model
- Regression alerts
- Agent Debugger — waterfall timeline of every span in a trace
Live demo: www.agentlens.one
Self-host: github.com/Soufianeazz/llm-evaltrack
Changelog
v0.4.0
- Added AgentLensLlamaIndexHandler for LlamaIndex: one callback captures queries, retrievals, LLM calls, agent steps, and function calls as spans
- Parent/child span tree built automatically from the LlamaIndex event tree
v0.3.0
- Added AgentLensCallbackHandler for LangChain: one callback captures every chain, LLM call, tool call, and retriever call as a waterfall span tree
- Supports nested chains via LangChain run_id/parent_run_id
- Graceful import fallback if langchain-core isn't installed
v0.2.0
- Added trace_agent() context manager for agent run tracing
- Added span() and trace.span() for individual steps
- Spans support: set_output(), set_tokens(), set_cost(), set_error()
- Automatic error capture on exceptions inside with blocks
- Nested span support via parent_span_id
v0.1.0
- Initial release: patch_openai(), patch_anthropic(), track_llm_call()