LLM & Agent Observability — structured tracing, Prometheus metrics, and OpenTelemetry export via Python decorators

Rastir

LLM & Agent Observability for Python
Structured tracing and Prometheus metrics via decorators — no monkey-patching, no vendor lock-in.


Why Rastir?

Most LLM observability tools require SDK wrappers, monkey-patching, or vendor-specific clients. Rastir takes a different approach:

  • Decorators, not wrappers — add @llm, @agent, @tool to your existing functions. No code rewrites.
  • Adapters, not monkey-patches — Rastir inspects return values to extract model, tokens, and provider metadata. Works with any SDK version.
  • Two-phase enrichment — model/provider metadata is captured from function arguments before the call and refined from the response after. If the API call fails, metadata still survives.
  • Self-hosted collector — a lightweight FastAPI server you own. Prometheus metrics out of the box, OTLP export to Tempo/Jaeger if you want it.
  • Zero external infrastructure — no database, no Redis, no Kafka. The collector is stateless and runs in a single container.

Your Python App                          Rastir Collector
┌──────────────────────────────┐         ┌──────────────────────────────┐
│  @agent                      │  HTTP   │  FastAPI ingestion            │
│    @llm (OpenAI)             │ ──────▸ │  ├─ Prometheus /metrics       │
│    @tool (search)            │  spans  │  ├─ Trace store /v1/traces    │
│    @retrieval (RAG)          │         │  ├─ Sampling & backpressure   │
│                              │         │  └─ OTLP → Tempo/Jaeger      │
│  Two-phase enrichment:       │         │                                │
│    request args → response   │         │  Defence-in-depth:             │
│                              │         │    cardinality guards          │
│  wrap(obj, name="cache")     │         │    error normalisation         │
└──────────────────────────────┘         │    bounded enum validation     │
        decorators + wrap()              └──────────────────────────────┘

Supported Providers

Provider Auto-detection Tokens Model Streaming Request-phase
OpenAI
Azure OpenAI
Anthropic
AWS Bedrock
Google Gemini
Cohere
Mistral
Groq
LangChain
LangGraph
LlamaIndex
CrewAI

15 adapters are priority-ordered and composable: LangGraph → LangChain → OpenAI resolution happens automatically.

Request-phase enrichment: For provider adapters, model/provider metadata is extracted from function kwargs (e.g., model="gpt-4o") before the API call. If the call fails, the span still contains the model and provider.
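
The layered resolution described above can be pictured with a small standalone sketch. It does not use Rastir's real adapter classes: `Adapter`, `resolve`, the priority numbers, and the toy dictionary shapes are all assumptions made for illustration. Each adapter either extracts metadata, hands an inner object to the next round, or both.

```python
from typing import Any, Callable, NamedTuple, Optional


class Adapter(NamedTuple):
    """Hypothetical adapter shape, not Rastir's actual class."""
    name: str
    priority: int                        # higher priority is tried first
    can_handle: Callable[[Any], bool]
    extract: Callable[[Any], tuple]      # returns (metadata dict, inner object or None)


def resolve(result: Any, adapters: list, max_depth: int = 5) -> dict:
    """Walk the result through matching adapters, merging metadata while unwrapping."""
    ordered = sorted(adapters, key=lambda a: a.priority, reverse=True)
    metadata: dict = {}
    current: Optional[Any] = result
    for _ in range(max_depth):
        match = next((a for a in ordered if a.can_handle(current)), None)
        if match is None:
            break
        fields, current = match.extract(current)
        metadata.update(fields)
        if current is None:
            break
    return metadata


# Toy stand-ins for graph state, a framework message, and a provider response.
ADAPTERS = [
    Adapter("provider", 10, lambda o: isinstance(o, dict) and "usage" in o,
            lambda o: ({"model": o["model"], "provider": "openai",
                        "tokens_input": o["usage"]["prompt_tokens"],
                        "tokens_output": o["usage"]["completion_tokens"]}, None)),
    Adapter("graph_state", 30, lambda o: isinstance(o, dict) and "messages" in o,
            lambda o: ({"message_count": len(o["messages"])}, o["messages"][-1])),
    Adapter("message", 20, lambda o: isinstance(o, dict) and "raw" in o,
            lambda o: ({}, o["raw"])),
]

state = {"messages": [{"raw": {"model": "gpt-4o-mini",
                               "usage": {"prompt_tokens": 12, "completion_tokens": 5}}}]}
metadata = resolve(state, ADAPTERS)
```

In this sketch the framework-level adapter only contributes what it knows (the message count) and passes the last message down, so the provider-level adapter is the one that fills in model and token fields.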

Installation

pip install rastir                # Client library (decorators + HTTP push)
pip install "rastir[server]"      # + Collector server (FastAPI, Prometheus, OTLP)
pip install "rastir[all]"         # Everything including dev tools

Quick Start

1. Instrument your code (3 lines to add)

import openai
from rastir import configure, agent, llm, tool, retrieval

configure(
    service="my-app",
    push_url="http://localhost:8080/v1/telemetry",
)

@agent(agent_name="research_agent")
def run_research(query: str):
    context = fetch_docs(query)
    return ask_llm(query, context)

@retrieval
def fetch_docs(query: str) -> list[str]:
    return vector_db.search(query)           # vector_db is your own vector store client; auto-tracked

@llm(model="gpt-4o", provider="openai")
def ask_llm(query: str, context: list[str]):
    # Return the raw response so tokens & model can be extracted automatically
    return openai.chat.completions.create(model="gpt-4o", messages=[...])

2. Start the collector

rastir-server                              # default: 0.0.0.0:8080
# or
docker run -p 8080:8080 rastir-server

3. Query metrics

curl http://localhost:8080/metrics          # Prometheus format
curl http://localhost:8080/v1/traces        # JSON trace store

That's it. Prometheus scrapes /metrics, you build Grafana dashboards, and optionally forward spans to Tempo or Jaeger via OTLP.

What you get in Prometheus

# Token usage by model
rastir_tokens_input_total{model="gpt-4o",provider="openai",agent="research_agent"} 1250
rastir_tokens_output_total{model="gpt-4o",provider="openai",agent="research_agent"} 380

# Latency histogram buckets (derive percentiles with histogram_quantile)
rastir_duration_seconds_bucket{span_type="llm",le="0.5"} 12
rastir_duration_seconds_bucket{span_type="llm",le="1.0"} 45

# Tool & retrieval call rates
rastir_tool_calls_total{tool_name="web_search",agent="research_agent"} 89
rastir_retrieval_calls_total{agent="research_agent"} 156

# Error tracking with normalised categories
rastir_errors_total{span_type="llm",error_type="rate_limit"} 7
rastir_errors_total{span_type="llm",error_type="timeout"} 3

Two-Phase Enrichment

Rastir captures metadata in two phases to ensure observability even when API calls fail:

Phase 1 (request): Scan function kwargs for model/provider
  └─ e.g., model="gpt-4o" extracted before the call

Phase 2 (response): Adapter pipeline extracts from return value
  └─ Concrete response values override request-phase guesses
  └─ If call raises, request-phase metadata survives

Example — failed API call still produces useful metrics:

@llm
def ask_model(query: str):
    return openai.chat.completions.create(
        model="gpt-4o",          # ← captured in Phase 1
        messages=[...],
    )
    # If this raises RateLimitError, the span still records:
    #   model="gpt-4o", provider="openai", status="ERROR"
    #   error_type="rate_limit"
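
To make the two phases concrete, here is a minimal standalone sketch of the idea. It is not Rastir's implementation: `llm_sketch`, `SPANS`, and `call_model` are hypothetical names, and the "response" is a plain dict instead of a provider object. The point is only that metadata read from keyword arguments before the call is still on the span when the call raises.

```python
import functools

SPANS = []  # hypothetical in-memory sink standing in for the export queue


def llm_sketch(func):
    """Toy decorator: request-phase capture, then response-phase refinement."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        span = {"name": func.__name__, "status": "OK"}
        # Phase 1 (request): copy model/provider from explicit keyword arguments.
        for key in ("model", "provider"):
            if key in kwargs:
                span[key] = kwargs[key]
        try:
            result = func(*args, **kwargs)
        except Exception as exc:
            # The call failed, but the Phase 1 fields are already on the span.
            span["status"] = "ERROR"
            span["error"] = type(exc).__name__
            SPANS.append(span)
            raise
        # Phase 2 (response): concrete values override request-phase guesses.
        if isinstance(result, dict):
            for key in ("model", "provider"):
                if key in result:
                    span[key] = result[key]
        SPANS.append(span)
        return result
    return wrapper


@llm_sketch
def call_model(prompt, *, model, fail=False):
    if fail:
        raise TimeoutError("simulated provider timeout")
    return {"model": model + "-2024-08-06", "text": "hi"}


call_model("hello", model="gpt-4o")
try:
    call_model("hello", model="gpt-4o", fail=True)
except TimeoutError:
    pass
```

After running this, the first span carries the refined model name from the response, while the second span keeps the request-phase value together with the error status.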

Nested Spans

Rastir automatically links parent–child relationships for agent call trees:

@agent(agent_name="supervisor")
def supervisor(task):
    plan = planner(task)            # nested agent
    return executor(plan)

@agent(agent_name="planner")
def planner(task):
    return ask_llm(task)            # nested LLM call

@llm(model="gpt-4o")
def ask_llm(prompt):
    return openai.chat.completions.create(model="gpt-4o", messages=[...])

supervisor (agent, 3200ms)
├── planner (agent, 1100ms)
│   └── ask_llm (llm, 980ms) → model=gpt-4o, tokens_in=150, tokens_out=85
└── executor (agent, 2000ms)
    ├── web_search (tool, 450ms)
    └── ask_llm (llm, 1200ms) → model=gpt-4o, tokens_in=320, tokens_out=200

Works with LangGraph

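from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from rastir import agent

# search and calc are your own LangChain tools
app = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools=[search, calc])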

@agent(agent_name="react_agent")
def run(query: str):
    return app.invoke({"messages": [HumanMessage(query)]})
    # Rastir auto-detects LangGraph state → LangChain messages → OpenAI response
    # Extracts: model, tokens, tool calls, message counts — zero config

Generic Object Wrapper

Instrument any object without decorator access using rastir.wrap():

import rastir

# Wrap a Redis client, vector store, or any infrastructure component
wrapped_cache = rastir.wrap(redis_client, name="redis")
wrapped_cache.get("key")       # creates INFRA span: "redis.get"
wrapped_cache.set("key", val)  # creates INFRA span: "redis.set"

# Wrap with filtering
wrapped_db = rastir.wrap(db_client, name="postgres",
                         include=["query", "execute"],
                         span_type="tool")

  • Supports sync + async methods
  • Preserves isinstance() behaviour
  • Prevents double-wrapping
  • Configurable span_type: infra, tool, llm, trace, agent, retrieval
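
One way such a wrapper could work is a thin proxy object. The sketch below is an assumption about the general technique, not Rastir's `wrap()` implementation: `SpanProxy`, `wrap_sketch`, `RECORDED`, and `FakeCache` are hypothetical, and only synchronous methods are handled. It shows method interception, `isinstance()` preservation through a `__class__` property, and a double-wrap check.

```python
import functools

RECORDED = []  # hypothetical sink for the spans the proxy would emit


class SpanProxy:
    """Toy proxy that records a span name for each public method call."""

    def __init__(self, target, name, include=None, span_type="infra"):
        self._target = target
        self._name = name
        self._include = set(include) if include else None
        self._span_type = span_type

    @property
    def __class__(self):
        # Report the wrapped object's class so isinstance() checks keep passing.
        return type(self._target)

    def __getattr__(self, attr):
        # Only reached for names not found on the proxy itself.
        if attr.startswith("_"):
            raise AttributeError(attr)
        value = getattr(self._target, attr)
        skipped = self._include is not None and attr not in self._include
        if not callable(value) or skipped:
            return value

        @functools.wraps(value)
        def traced(*args, **kwargs):
            RECORDED.append({"name": f"{self._name}.{attr}", "type": self._span_type})
            return value(*args, **kwargs)
        return traced


def wrap_sketch(obj, name, include=None, span_type="infra"):
    if type(obj) is SpanProxy:       # already wrapped: return it unchanged
        return obj
    return SpanProxy(obj, name, include, span_type)


class FakeCache:
    def __init__(self):
        self.data = {}

    def get(self, key):
        return self.data.get(key)

    def set(self, key, value):
        self.data[key] = value


cache = wrap_sketch(FakeCache(), name="redis")
cache.set("key", 1)
value = cache.get("key")
```

The double-wrap check uses `type(obj)` instead of `isinstance()` because the proxy deliberately reports the wrapped object's class through `__class__`.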

Bedrock Guardrail Observability

Rastir automatically detects and tracks AWS Bedrock guardrails:

@llm
def call_bedrock(prompt: str):
    return bedrock.converse(
        modelId="anthropic.claude-3-sonnet",
        messages=[...],
        guardrailConfig={                        # auto-detected
            "guardrailIdentifier": "my-guardrail",
            "guardrailVersion": "1",
        },
    )

Produces metrics:

rastir_guardrail_requests_total{guardrail_id="my-guardrail",provider="bedrock"} 42
rastir_guardrail_violations_total{guardrail_action="GUARDRAIL_INTERVENED",model="claude-3"} 3

Guardrail labels are cardinality-guarded on both client and server side:

  • guardrail_category is validated against a bounded enum (CONTENT_POLICY, TOPIC_POLICY, etc.)
  • guardrail_action is validated against a bounded enum (GUARDRAIL_INTERVENED, NONE)
  • Unknown values are replaced with __cardinality_overflow__
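
The guard logic can be sketched in a few lines. This is an illustration under assumptions, not Rastir's code: `guard_enum` and `CardinalityCap` are hypothetical names, the allowed set holds only the two actions named above, and reusing the same overflow sentinel for per-dimension caps is a guess.

```python
OVERFLOW = "__cardinality_overflow__"

# Only the two actions named in this README; the real enum may differ.
ALLOWED_ACTIONS = frozenset({"GUARDRAIL_INTERVENED", "NONE"})


def guard_enum(value, allowed):
    """Pass known values through and collapse everything else into one label."""
    return value if value in allowed else OVERFLOW


class CardinalityCap:
    """Admit the first `limit` distinct values of a label, then overflow."""

    def __init__(self, limit):
        self.limit = limit
        self.seen = set()

    def guard(self, value):
        if value in self.seen:
            return value
        if len(self.seen) < self.limit:
            self.seen.add(value)
            return value
        return OVERFLOW


model_cap = CardinalityCap(limit=2)
labels = [model_cap.guard(m) for m in ["gpt-4o", "claude-3", "gpt-4o", "mistral-large"]]
```

Either guard keeps the number of distinct label values bounded, which is what protects Prometheus from unbounded time series growth.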

Error Normalisation

Raw exception types are normalised into six fixed categories to prevent label explosion:

Category          Example exceptions
timeout           TimeoutError, httpx.ReadTimeout, openai.APITimeoutError
rate_limit        RateLimitError, openai.RateLimitError, anthropic.RateLimitError
validation_error  ValueError, TypeError, pydantic.ValidationError
provider_error    openai.APIError, anthropic.APIStatusError, botocore.ClientError
internal_error    RuntimeError, Exception
unknown           Anything else
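
A normaliser along these lines could map exceptions to the categories in the table. The sketch below is a guess at the approach, not Rastir's implementation: `normalise_error` is a hypothetical name, matching is done on class names along the MRO so that no provider SDK has to be imported, and a bare `Exception` is treated as an exact match only so that "anything else" can still fall through to `unknown`.

```python
_CATEGORY_BY_CLASS_NAME = {
    "TimeoutError": "timeout",
    "ReadTimeout": "timeout",
    "APITimeoutError": "timeout",
    "RateLimitError": "rate_limit",
    "ValueError": "validation_error",
    "TypeError": "validation_error",
    "ValidationError": "validation_error",
    "APIError": "provider_error",
    "APIStatusError": "provider_error",
    "ClientError": "provider_error",
    "RuntimeError": "internal_error",
}


def normalise_error(exc):
    """Return one of six fixed categories for an exception instance."""
    if type(exc) is Exception:
        return "internal_error"
    # Walk from the most specific class to the most general one; first hit wins.
    for cls in type(exc).__mro__:
        category = _CATEGORY_BY_CLASS_NAME.get(cls.__name__)
        if category is not None:
            return category
    return "unknown"


class RateLimitError(Exception):
    """Local stand-in for a provider SDK's rate limit exception."""


categories = [
    normalise_error(TimeoutError()),
    normalise_error(RateLimitError()),
    normalise_error(ValueError("bad input")),
    normalise_error(KeyError("missing")),
]
```

Walking the MRO from most specific to most general means a subclass such as a provider's rate limit error is classified before its more generic API error base class is considered.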

Key Metrics at a Glance

Metric                                  Type       What it tracks
rastir_llm_calls_total                  Counter    LLM invocations by model, provider, agent
rastir_tokens_input_total               Counter    Input token consumption
rastir_tokens_output_total              Counter    Output token consumption
rastir_duration_seconds                 Histogram  Latency with P50/P95/P99 + exemplars
rastir_tokens_per_call                  Histogram  Token distribution per LLM call
rastir_tool_calls_total                 Counter    Tool invocations by name, agent, model, provider
rastir_retrieval_calls_total            Counter    Retrieval operations by agent
rastir_errors_total                     Counter    Failures by span type and normalised error type
rastir_guardrail_requests_total         Counter    LLM calls with guardrail config
rastir_guardrail_violations_total       Counter    Guardrail interventions by action/category
rastir_spans_sampled_total              Counter    Spans retained after sampling
rastir_spans_dropped_by_sampling_total  Counter    Spans dropped by sampling
rastir_backpressure_warnings_total      Counter    Queue soft-limit warnings
rastir_ingestion_rate                   Gauge      Spans per second throughput
rastir_queue_utilization_percent        Gauge      Collector backpressure indicator

Full metrics reference → Server Documentation

Server Endpoints

Method  Path                   Description
POST    /v1/telemetry          Ingest span batches
GET     /metrics               Prometheus exposition
GET     /v1/traces             Query trace store
GET     /v1/traces/{trace_id}  Get spans for a specific trace
GET     /health                Liveness probe
GET     /ready                 Readiness probe (queue pressure)

Server Features

  • Sampling — probabilistic + error-always-retain + latency threshold (metrics always recorded regardless)
  • Backpressure — soft/hard queue limits with reject or drop-oldest mode
  • Rate limiting — per-IP and per-service RPM limits
  • Multi-tenant — inject tenant label from HTTP header
  • Exemplars — trace_id linked to histogram observations for Grafana → Jaeger drill-down
  • OTLP export — forward spans to Tempo, Jaeger, or any OTLP backend
  • Cardinality guards — per-dimension caps (model: 50, provider: 10, tool: 200, agent: 200, etc.)
  • Graceful shutdown — drains queue and flushes exporter before exit
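
The sampling rule in the first bullet can be sketched as a single decision function. This is an illustration, not the collector's code: `should_retain`, `CollectorSketch`, the span fields, and the default rate and threshold are all assumptions. Metrics are counted before the sampling decision, which only controls whether the span is kept for the trace store or export.

```python
import random


def should_retain(span, sample_rate=0.1, latency_threshold_s=2.0, rng=random.random):
    """Keep every error and every slow span; sample the rest probabilistically."""
    if span.get("status") == "ERROR":
        return True
    if span.get("duration_s", 0.0) >= latency_threshold_s:
        return True
    return rng() < sample_rate


class CollectorSketch:
    def __init__(self, **sampling):
        self.span_count = 0     # stands in for the Prometheus counters
        self.stored = []        # stands in for the trace store / OTLP export
        self.sampling = sampling

    def ingest(self, span):
        self.span_count += 1    # metrics are recorded regardless of sampling
        if should_retain(span, **self.sampling):
            self.stored.append(span)


collector = CollectorSketch(sample_rate=0.0)
collector.ingest({"name": "fast_ok", "status": "OK", "duration_s": 0.2})
collector.ingest({"name": "failed", "status": "ERROR", "duration_s": 0.1})
collector.ingest({"name": "slow_ok", "status": "OK", "duration_s": 3.5})
```

With the sample rate set to zero, only the error span and the slow span are stored, while all three spans are still counted.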

Configuration

Configure via configure() call or environment variables:

configure(
    service="my-app",
    env="production",
    push_url="http://collector:8080/v1/telemetry",
    api_key="secret",
    batch_size=100,
    flush_interval=5,
)

Or equivalently:

export RASTIR_SERVICE=my-app
export RASTIR_ENV=production
export RASTIR_PUSH_URL=http://collector:8080/v1/telemetry
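
How the two sources might be combined is sketched below. The precedence shown (explicit arguments first, then environment variables, then defaults) is an assumption, as are the default values and the names `ClientSettings` and `resolve_settings`; check the configuration reference for the actual behaviour.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ClientSettings:
    service: str
    env: str
    push_url: Optional[str]


_ENV_NAMES = {
    "service": "RASTIR_SERVICE",
    "env": "RASTIR_ENV",
    "push_url": "RASTIR_PUSH_URL",
}
# Hypothetical defaults, used only by this sketch.
_DEFAULTS = {"service": "unknown", "env": "development", "push_url": None}


def resolve_settings(environ: Optional[Mapping[str, str]] = None, **explicit) -> ClientSettings:
    """Resolve each field from explicit arguments, then the environment, then defaults."""
    environ = os.environ if environ is None else environ
    values = {}
    for field_name, env_name in _ENV_NAMES.items():
        if explicit.get(field_name) is not None:
            values[field_name] = explicit[field_name]
        elif env_name in environ:
            values[field_name] = environ[env_name]
        else:
            values[field_name] = _DEFAULTS[field_name]
    return ClientSettings(**values)


fake_env = {
    "RASTIR_SERVICE": "my-app",
    "RASTIR_PUSH_URL": "http://collector:8080/v1/telemetry",
}
settings = resolve_settings(fake_env, env="production")
```

Passing the environment as a mapping keeps the function easy to test without mutating `os.environ`.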

Full configuration reference → Configuration Documentation

Project Structure

src/rastir/
├── __init__.py          # Public API: configure, trace, agent, llm, tool, retrieval, wrap
├── config.py            # GlobalConfig, configure()
├── context.py           # Span & agent context (ContextVar-based)
├── decorators.py        # All decorator implementations + two-phase enrichment
├── wrapper.py           # rastir.wrap() generic object wrapper
├── spans.py             # SpanRecord data model
├── queue.py             # Bounded in-memory span queue
├── transport.py         # TelemetryClient + BackgroundExporter
├── adapters/            # 15 adapters: OpenAI, Azure, Anthropic, Bedrock, Gemini,
│                        #   Cohere, Mistral, Groq, LangChain, LangGraph, LlamaIndex, CrewAI
│   └── registry.py      # Adapter resolution pipeline + request-phase scanning
└── server/              # FastAPI collector
    ├── app.py           # Server factory, routes, lifespan
    ├── config.py        # Server configuration (YAML + env vars)
    ├── metrics.py       # MetricsRegistry — Prometheus counters/histograms/gauges
    ├── ingestion.py     # IngestionWorker — queue → record_span() → store/export
    └── trace_store.py   # In-memory trace store with LRU eviction

Development

pip install -e ".[all]"           # editable install with all extras
pytest                            # 232+ unit/mock tests, 36+ integration tests
ruff check src/ tests/            # linting

Grafana Dashboards

Rastir ships five pre-built Grafana dashboards in grafana/dashboards/:

Dashboard        Description
LLM Performance  Token usage, latency percentiles, throughput by model, error tracking
Agent & Tool     Agent execution patterns, tool calls with model/provider context
Evaluation       Eval runs/success/failures, scores by type and model, queue health
Guardrail        Guardrail violations by category and model, request volumes
System Health    Ingestion rate, queue pressure, memory, backpressure, OTLP export health

All dashboards include template variables for filtering by service, environment, model, provider, and agent. Import via Grafana UI or API.

Full dashboard documentation → Dashboards

Documentation

Full documentation is available at skamalj.github.io/rastir.

License

MIT — see LICENSE for details.
