Skip to main content

Behavioral drift monitoring for AI agents.

Project description

Driftbase

Behavioral drift monitoring for AI agents: track runs locally, compare versions, get a drift score.

When you ship a new prompt, model, or tool set, agent behavior can change in subtle ways. Driftbase records each run (tool names, latency, outcome) in a local SQLite DB and lets you diff any two versions so you see a numeric drift score and per-dimension breakdown before or after deploy.


Quickstart

1. Install

pip install driftbase[local]

2. Add the decorator

# my_agent.py
from driftbase import track

@track(version="v1.0", environment="production")
def my_agent(user_input: str) -> str:
    # your agent logic
    return "done"

3. Run your agent

Generate some runs for at least two versions (e.g. change code and use version="v2.0" for new runs).

python -c "from my_agent import my_agent; my_agent('hello')"
# Run a few more times, then switch to v2.0 and run again.

4. Run diff

driftbase diff v1.0 v2.0

5. See output

You get a threshold panel, a metrics table, tool frequency diff, optional sequence shifts, and a root-cause hypothesis. Example (with comments indicating where rich applies color):

# Panel: red border if above threshold, green if within
┌─ ▲ ABOVE THRESHOLD ─────────────────────────────────────────────────┐
│ Drift score 0.34 is above threshold 0.20. Consider investigating...   │
└──────────────────────────────────────────────────────────────────────┘

# Table: Drift — v1.0 → v2.0
┌─────────────────┬──────────┬─────────┬────────┐
│ Metric          │ Baseline │ Current │ Delta  │
├─────────────────┼──────────┼─────────┼────────┤
│ Overall drift   │     0.00 │    0.34 │  +0.34 │  # red if ≥ threshold
│ Decision drift  │     0.00 │    0.22 │  +0.22 │
│ Latency drift   │     0.00 │    0.18 │  +0.18 │
│ Error drift     │     0.00 │    0.00 │  +0.00 │
└─────────────────┴──────────┴─────────┴────────┘

# Tool call frequency diff (top 20 tools, Δ % in green/red/dim)
# Optional: Top 3 sequence shifts, Root cause hypothesis panel
# Footer: Runs: v1.0 (n=50) → v2.0 (n=50) · No data left your machine

How it works

Runs are written to SQLite in a background thread so your app is not blocked. When you run driftbase diff, the CLI loads runs for the two versions, builds a behavioral fingerprint for each (tool distributions, latency percentiles, error rate), and computes a divergence score between them. The score and per-dimension deltas tell you how much behavior changed.


Privacy

  • Captured and stored locally: Tool call names and order, latency, token counts, error/retry counts, outcome label (e.g. resolved/error). No raw user or model content.
  • Hashed then discarded: A hash of the task input and a hash of the output structure are stored; the original text is not.
  • Never stored or read: Raw user messages, raw agent output, system prompts, API keys, user identifiers.

Use driftbase inspect --run last to see the exact breakdown for any run.


CLI reference

Command Description Example
versions List deployment versions and run counts driftbase versions
diff Compare two versions; optional last N vs baseline driftbase diff v1.0 v2.0 or driftbase diff v1.0 local --last 20
inspect Show what was captured/dropped for a run driftbase inspect --run last
report Generate markdown/JSON/HTML drift report driftbase report v1.0 v2.0 -o report.md
watch Live drift monitor against a baseline driftbase watch --against v1.0
push Send local runs to Driftbase platform API driftbase push (uses DRIFTBASE_API_URL, DRIFTBASE_API_KEY)

Frameworks supported

The @track() decorator auto-detects and captures from:

  • LangChain — tool calls via callbacks
  • LangGraph — same as LangChain
  • LlamaIndex — function_call and callback events
  • OpenAIchat.completions.create tool_calls and usage
  • Generic — any callable; times the call and optionally parses tool_calls from the return value

Docs and product

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

driftbase-0.1.1.tar.gz (52.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

driftbase-0.1.1-py3-none-any.whl (59.2 kB view details)

Uploaded Python 3

File details

Details for the file driftbase-0.1.1.tar.gz.

File metadata

  • Download URL: driftbase-0.1.1.tar.gz
  • Upload date:
  • Size: 52.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for driftbase-0.1.1.tar.gz
Algorithm Hash digest
SHA256 70bec3a9e40609803048a267fb9f1677c2fd45146b3e4ef8321bd67ae1738102
MD5 515e2887c90298c791899a58a9e3687b
BLAKE2b-256 a78e704fae91c5df57fe4f6703c9256f81a4d4dd6bc0db50beb6d8a1a8e9b465

See more details on using hashes here.

File details

Details for the file driftbase-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: driftbase-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 59.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for driftbase-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 db4027ac232144d64f5e08af8f0476af2de3ddc6e8d27cf38bbaa91a3c66fb7b
MD5 4d0c1966e5c3883569dae0564521a2dc
BLAKE2b-256 0986328fa9032126bf4ca4674c2fa020992951ddc73db0b5536678c40c412633

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page