LLM & Agent Observability — structured tracing, Prometheus metrics, and OpenTelemetry export via Python decorators

Rastir

LLM & Agent Observability for Python
Structured tracing and Prometheus metrics via decorators — no monkey-patching, no vendor lock-in.


Why Rastir?

Most LLM observability tools require SDK wrappers, monkey-patching, or vendor-specific clients. Rastir takes a different approach:

  • Decorators, not wrappers — add @llm, @agent, @tool to your existing functions. No code rewrites.
  • Adapters, not monkey-patches — Rastir inspects return values to extract model, tokens, and provider metadata. Works with any SDK version.
  • Self-hosted collector — a lightweight FastAPI server you own. Prometheus metrics out of the box, OTLP export to Tempo/Jaeger if you want it.
  • Zero external infrastructure — no database, no Redis, no Kafka. The collector is stateless and runs in a single container.

Your Python App                          Rastir Collector
┌──────────────────────┐     HTTP POST    ┌──────────────────────────┐
│  @agent              │ ───────────────▸ │  FastAPI ingestion       │
│    @llm (OpenAI)     │   span batches   │  ├─ Prometheus /metrics  │
│    @tool (search)    │                  │  ├─ Trace store /traces  │
│    @retrieval (RAG)  │                  │  └─ OTLP → Tempo/Jaeger  │
└──────────────────────┘                  └──────────────────────────┘
        decorators                              collector server
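
To make the decorator-plus-adapter idea concrete, here is a minimal sketch of how a decorator can record a span and read metadata from the return value without touching the SDK. It is not Rastir's implementation: `traced_llm`, `extract_usage`, and the `SPANS` list are hypothetical names, and the response is assumed to be an OpenAI-style dict.

```python
import functools
import time
from typing import Any, Callable

SPANS: list[dict[str, Any]] = []  # hypothetical stand-in for a span queue


def extract_usage(result: Any) -> dict[str, Any]:
    """Read model and token counts from an OpenAI-style dict, if present."""
    if not isinstance(result, dict):
        return {}
    usage = result.get("usage") or {}
    return {
        "model": result.get("model"),
        "tokens_in": usage.get("prompt_tokens"),
        "tokens_out": usage.get("completion_tokens"),
    }


def traced_llm(func: Callable[..., Any]) -> Callable[..., Any]:
    """Record one span per call; the wrapped function body is unchanged."""

    @functools.wraps(func)
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        span: dict[str, Any] = {"name": func.__name__, "span_type": "llm"}
        try:
            result = func(*args, **kwargs)
            span.update(extract_usage(result))  # inspect the return value only
            return result
        except Exception as exc:
            span["error"] = type(exc).__name__
            raise
        finally:
            span["duration_s"] = time.perf_counter() - start
            SPANS.append(span)

    return wrapper


@traced_llm
def fake_llm(prompt: str) -> dict[str, Any]:
    # Stand-in for a real SDK call; returns an assumed OpenAI-style payload.
    return {
        "model": "gpt-4o",
        "usage": {"prompt_tokens": 12, "completion_tokens": 5},
        "text": prompt.upper(),
    }


fake_llm("hello")
print(SPANS[-1]["model"], SPANS[-1]["tokens_in"], SPANS[-1]["tokens_out"])
```

Because the decorator only looks at what the function returns, the caller keeps using the SDK client directly, which is the property the bullets above describe.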

Supported Providers

Provider Auto-detection Tokens Model Streaming
OpenAI
Anthropic
AWS Bedrock
LangChain
LangGraph

Adapters are priority-ordered and composable: a LangGraph result is unwrapped to its LangChain messages, and those to the underlying OpenAI response, automatically.
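
The resolution order can be pictured as a small priority-sorted chain. The sketch below is illustrative only: the `Adapter` tuple, the priorities, and the plain-dict shapes standing in for LangGraph state, LangChain messages, and OpenAI responses are all assumptions, not Rastir's actual adapter classes or the real SDK objects.

```python
from typing import Any, Callable, NamedTuple, Optional


class Adapter(NamedTuple):
    name: str
    priority: int                    # higher priority is tried first
    matches: Callable[[Any], bool]
    unwrap: Callable[[Any], Any]     # inner object, or metadata if terminal
    terminal: bool = False


ADAPTERS = sorted(
    [
        Adapter(
            "langgraph", 30,
            lambda o: isinstance(o, dict) and bool(o.get("messages")),
            lambda o: o["messages"][-1],
        ),
        Adapter(
            "langchain", 20,
            lambda o: isinstance(o, dict) and "response_metadata" in o,
            lambda o: o["response_metadata"],
        ),
        Adapter(
            "openai", 10,
            lambda o: isinstance(o, dict) and "usage" in o,
            lambda o: {"model": o.get("model"), **o["usage"]},
            terminal=True,
        ),
    ],
    key=lambda a: a.priority,
    reverse=True,
)


def resolve(obj: Any, max_depth: int = 5) -> Optional[dict]:
    """Unwrap obj adapter by adapter until a terminal adapter yields metadata."""
    chain = []
    for _ in range(max_depth):
        adapter = next((a for a in ADAPTERS if a.matches(obj)), None)
        if adapter is None:
            return None  # nothing recognises this object
        chain.append(adapter.name)
        obj = adapter.unwrap(obj)
        if adapter.terminal:
            return {"chain": chain, **obj}
    return None


state = {
    "messages": [
        {
            "response_metadata": {
                "model": "gpt-4o-mini",
                "usage": {"prompt_tokens": 150, "completion_tokens": 85},
            }
        }
    ]
}
print(resolve(state))
```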

Installation

pip install rastir              # Client library (decorators + HTTP push)
pip install rastir[server]      # + Collector server (FastAPI, Prometheus, OTLP)
pip install rastir[all]         # Everything including dev tools

Quick Start

1. Instrument your code (3 lines to add)

from rastir import configure, agent, llm, tool, retrieval

configure(
    service="my-app",
    push_url="http://localhost:8080/v1/telemetry",
)

@agent(agent_name="research_agent")
def run_research(query: str) -> str:
    context = fetch_docs(query)
    return ask_llm(query, context)

@retrieval
def fetch_docs(query: str) -> list[str]:
    return vector_db.search(query)           # auto-tracked

@llm(model="gpt-4o", provider="openai")
def ask_llm(query: str, context: list[str]) -> str:
    return openai.chat(messages=[...])        # tokens & model extracted automatically

2. Start the collector

rastir-server                              # default: 0.0.0.0:8080
# or
docker run -p 8080:8080 rastir-server

3. Query metrics

curl http://localhost:8080/metrics          # Prometheus format
curl http://localhost:8080/v1/traces        # JSON trace store

That's it. Prometheus scrapes /metrics, you build Grafana dashboards on top, and you can optionally forward spans to Tempo or Jaeger via OTLP.

What you get in Prometheus

# Token usage by model
rastir_tokens_input_total{model="gpt-4o",provider="openai",agent="research_agent"} 1250
rastir_tokens_output_total{model="gpt-4o",provider="openai",agent="research_agent"} 380

# Latency percentiles
rastir_duration_seconds_bucket{span_type="llm",le="0.5"} 12
rastir_duration_seconds_bucket{span_type="llm",le="1.0"} 45

# Tool & retrieval call rates
rastir_tool_calls_total{tool_name="web_search",agent="research_agent"} 89
rastir_retrieval_calls_total{agent="research_agent"} 156
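
For a quick check without a Prometheus server, the exposition text can be parsed directly. The following is a minimal sketch using only the standard library; `parse_samples` and `tokens_by_model` are hypothetical helpers, and the parser is deliberately simplified (no timestamps, no unescaping of label values). The `prometheus_client` package ships a fuller parser if you need one.

```python
import re
from collections import defaultdict

SAMPLE_RE = re.compile(
    r"^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>.*)\})?\s+(?P<value>\S+)$"
)
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_samples(text: str) -> list[tuple[str, dict[str, str], float]]:
    """Return (metric name, labels, value) for each sample line."""
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blanks, comments, HELP and TYPE lines
        match = SAMPLE_RE.match(line)
        if match is None:
            continue
        labels = dict(LABEL_RE.findall(match.group("labels") or ""))
        samples.append((match.group("name"), labels, float(match.group("value"))))
    return samples


def tokens_by_model(text: str) -> dict[str, float]:
    """Sum input and output token counters per model label."""
    totals: dict[str, float] = defaultdict(float)
    for name, labels, value in parse_samples(text):
        if name in ("rastir_tokens_input_total", "rastir_tokens_output_total"):
            totals[labels.get("model", "unknown")] += value
    return dict(totals)


EXPOSITION = """\
# Token usage by model
rastir_tokens_input_total{model="gpt-4o",provider="openai",agent="research_agent"} 1250
rastir_tokens_output_total{model="gpt-4o",provider="openai",agent="research_agent"} 380
rastir_tool_calls_total{tool_name="web_search",agent="research_agent"} 89
"""

print(tokens_by_model(EXPOSITION))  # {'gpt-4o': 1630.0}
```

In practice the text would come from the collector's /metrics endpoint rather than a string literal.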

Nested Spans

Rastir automatically records parent–child relationships between spans, so nested agent calls form a tree:

@agent(agent_name="supervisor")
def supervisor(task):
    plan = planner(task)            # nested agent
    return executor(plan)

@agent(agent_name="planner")
def planner(task):
    return ask_llm(task)            # nested LLM call

@llm(model="gpt-4o")
def ask_llm(prompt):
    return openai.chat(messages=[...])

Resulting span tree:

supervisor (agent, 3200ms)
├── planner (agent, 1100ms)
│   └── ask_llm (llm, 980ms) → model=gpt-4o, tokens_in=150, tokens_out=85
└── executor (agent, 2000ms)
    ├── web_search (tool, 450ms)
    └── ask_llm (llm, 1200ms) → model=gpt-4o, tokens_in=320, tokens_out=200

Works with LangGraph

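from langchain_core.messages import HumanMessage
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent
from rastir import agent

# search and calc are your own LangChain tools, defined elsewhere
app = create_react_agent(ChatOpenAI(model="gpt-4o-mini"), tools=[search, calc])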

@agent(agent_name="react_agent")
def run(query: str):
    return app.invoke({"messages": [HumanMessage(query)]})
    # Rastir auto-detects LangGraph state → LangChain messages → OpenAI response
    # Extracts: model, tokens, tool calls, message counts — zero config

Key Metrics at a Glance

Metric                       Type        What it tracks
rastir_llm_calls_total       Counter     LLM invocations by model, provider, agent
rastir_tokens_input_total    Counter     Input token consumption
rastir_tokens_output_total   Counter     Output token consumption
rastir_duration_seconds      Histogram   Latency with P50/P95/P99 + exemplars
rastir_tool_calls_total      Counter     Tool invocations by name and agent
rastir_errors_total          Counter     Failures by span type and error type
rastir_queue_size            Gauge       Collector backpressure indicator

Full metrics reference → Server Documentation
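
Percentiles such as P50 are not stored in a histogram; they are estimated from the cumulative `_bucket` counters. The sketch below mirrors the linear interpolation that Prometheus's `histogram_quantile` applies. The two finite buckets come from the sample output earlier in this README; the `+Inf` bucket count of 50 is an assumption added so the histogram is complete.

```python
import math


def histogram_quantile(q: float, buckets: list[tuple[float, float]]) -> float:
    """Estimate quantile q from (upper_bound, cumulative_count) pairs.

    Buckets must be sorted by upper bound and end with the +Inf bucket.
    """
    if not buckets or not 0.0 <= q <= 1.0:
        raise ValueError("need at least one bucket and 0 <= q <= 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0.0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound) or count == prev_count:
                # Cannot interpolate into +Inf or an empty bucket:
                # fall back to the previous bound.
                return prev_bound
            fraction = (rank - prev_count) / (count - prev_count)
            return prev_bound + (bound - prev_bound) * fraction
        prev_bound, prev_count = bound, count
    return prev_bound


llm_buckets = [(0.5, 12), (1.0, 45), (math.inf, 50)]  # +Inf count is assumed
print("p50:", histogram_quantile(0.50, llm_buckets))  # roughly 0.697 seconds
print("p95:", histogram_quantile(0.95, llm_buckets))  # 1.0, highest finite bound
```

The estimate is only as fine-grained as the bucket boundaries, which is why a P95 that lands in the `+Inf` bucket is reported as the highest finite bound.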

Server Endpoints

Method   Path            Description
POST     /v1/telemetry   Ingest span batches
GET      /metrics        Prometheus exposition
GET      /v1/traces      Query trace store
GET      /health         Liveness probe
GET      /ready          Readiness probe (queue pressure)

Configuration

Configure Rastir through a configure() call or through environment variables:

configure(
    service="my-app",
    env="production",
    push_url="http://collector:8080/v1/telemetry",
    api_key="secret",
    batch_size=100,
    flush_interval=5,
)

Or set the service, environment, and push URL through environment variables:

export RASTIR_SERVICE=my-app
export RASTIR_ENV=production
export RASTIR_PUSH_URL=http://collector:8080/v1/telemetry

Full configuration reference → Configuration Documentation
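
One common way to combine the two sources is to let explicit arguments win and fall back to the environment. The sketch below assumes that precedence; it is not taken from Rastir's `config.py`. `Settings`, `load_settings`, and the default values `"unknown"` and `"development"` are hypothetical; only the three `RASTIR_*` variable names come from this README.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    service: str
    env: str
    push_url: Optional[str]


def load_settings(
    service: Optional[str] = None,
    env: Optional[str] = None,
    push_url: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
) -> Settings:
    """Explicit arguments take precedence; otherwise read RASTIR_* variables."""
    return Settings(
        service=service or environ.get("RASTIR_SERVICE", "unknown"),
        env=env or environ.get("RASTIR_ENV", "development"),
        push_url=push_url or environ.get("RASTIR_PUSH_URL"),
    )


fake_env = {
    "RASTIR_SERVICE": "my-app",
    "RASTIR_PUSH_URL": "http://collector:8080/v1/telemetry",
}
print(load_settings(environ=fake_env))
print(load_settings(service="override", env="production", environ=fake_env))
```

Passing `environ` as a parameter keeps the function testable without mutating the real process environment.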

Project Structure

src/rastir/
├── __init__.py          # Public API: configure, trace, agent, llm, tool, retrieval
├── config.py            # GlobalConfig, configure()
├── context.py           # Span & agent context (ContextVar-based)
├── decorators.py        # All decorator implementations
├── spans.py             # SpanRecord data model
├── queue.py             # Bounded in-memory span queue
├── transport.py         # TelemetryClient + BackgroundExporter
├── adapters/            # Auto-detection for OpenAI, Anthropic, Bedrock, LangChain, LangGraph
└── server/              # FastAPI collector with Prometheus, trace store, OTLP export
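
The pairing of a bounded queue with a background exporter is a standard pattern, sketched here with the standard library. This is not the code in `queue.py` or `transport.py`: `BatchExporter`, its drop-when-full policy, and the `send` callback are assumptions chosen for illustration. The `batch_size` and `flush_interval` parameters echo the `configure()` options shown above.

```python
import queue
import threading
from typing import Any, Callable


class BatchExporter:
    """Buffer spans in a bounded queue and send them in batches."""

    def __init__(
        self,
        send: Callable[[list[Any]], None],
        max_queue: int = 1000,
        batch_size: int = 100,
        flush_interval: float = 5.0,
    ) -> None:
        self._queue: queue.Queue = queue.Queue(maxsize=max_queue)
        self._send = send
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._stop = threading.Event()
        self.dropped = 0  # approximate counter; not synchronised
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def enqueue(self, span: Any) -> bool:
        """Never block the caller: drop the span if the queue is full."""
        try:
            self._queue.put_nowait(span)
            return True
        except queue.Full:
            self.dropped += 1
            return False

    def _drain(self) -> list[Any]:
        batch: list[Any] = []
        while len(batch) < self._batch_size:
            try:
                batch.append(self._queue.get_nowait())
            except queue.Empty:
                break
        return batch

    def flush(self) -> None:
        """Send everything currently queued, one batch at a time."""
        while True:
            batch = self._drain()
            if not batch:
                break
            self._send(batch)

    def _run(self) -> None:
        # Wake every flush_interval seconds until close() sets the stop event.
        while not self._stop.wait(self._flush_interval):
            self.flush()

    def close(self) -> None:
        self._stop.set()
        self._thread.join()
        self.flush()  # final drain so no queued span is lost on shutdown


sent: list[list[int]] = []
exporter = BatchExporter(sent.append, max_queue=3, batch_size=2, flush_interval=60)
accepted = [exporter.enqueue(i) for i in range(4)]
exporter.close()
print(accepted, sent, exporter.dropped)
```

Dropping on overflow keeps instrumented code from stalling when the collector is slow or unreachable, at the cost of losing spans under sustained pressure.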

Development

pip install -e ".[all]"           # editable install with all extras
pytest                            # 337 tests (unit + integration)
ruff check src/ tests/            # linting

Documentation

Full documentation is available at skamalj.github.io/rastir.

License

MIT — see LICENSE for details.
