
Driftbase

Behavioral drift monitoring for AI agents: track runs locally, compare versions, get a drift score.

When you ship a new prompt, model, or tool set, agent behavior can change in subtle ways. Driftbase records each run (tool names, latency, outcome) in a local SQLite database and lets you diff any two versions, giving you a numeric drift score and a per-dimension breakdown before or after deploy.


Quickstart

1. Install

pip install "driftbase[local]"

2. Add the decorator

# my_agent.py
from driftbase import track

@track(version="v1.0", environment="production")
def my_agent(user_input: str) -> str:
    # your agent logic
    return "done"
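Conceptually, a decorator like @track wraps your callable, times it, and records run metadata. A minimal sketch of that idea (not driftbase's actual implementation, which persists to SQLite in a background thread; here runs go into a plain list):

```python
import functools
import time

def track_sketch(version: str, environment: str):
    """Illustrative stand-in for a tracking decorator."""
    runs = []  # driftbase writes to SQLite; a list keeps the sketch self-contained

    def decorator(fn):
        @functools.wraps(fn)
        def wrapper(*args, **kwargs):
            start = time.perf_counter()
            try:
                result = fn(*args, **kwargs)
                outcome = "resolved"
            except Exception:
                outcome = "error"
                raise
            finally:
                # Record metadata only: version, environment, latency, outcome.
                runs.append({
                    "version": version,
                    "environment": environment,
                    "latency_ms": (time.perf_counter() - start) * 1000,
                    "outcome": outcome,
                })
            return result

        wrapper.runs = runs  # exposed for inspection in this sketch only
        return wrapper

    return decorator

@track_sketch(version="v1.0", environment="production")
def my_agent(user_input: str) -> str:
    return "done"
```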

3. Run your agent

Generate runs for at least two versions (e.g. change your code, set version="v2.0" in the decorator, and run again).

python -c "from my_agent import my_agent; my_agent('hello')"
# Run a few more times, then switch to v2.0 and run again.

4. Run diff

driftbase diff v1.0 v2.0

5. See output

You get a threshold panel, a metrics table, a tool-frequency diff, optional sequence shifts, and a root-cause hypothesis. Example (comments indicate where Rich applies color):

# Panel: red border if above threshold, green if within
┌─ ▲ ABOVE THRESHOLD ─────────────────────────────────────────────────┐
│ Drift score 0.34 is above threshold 0.20. Consider investigating... │
└─────────────────────────────────────────────────────────────────────┘

# Table: Drift — v1.0 → v2.0
┌─────────────────┬──────────┬─────────┬────────┐
│ Metric          │ Baseline │ Current │ Delta  │
├─────────────────┼──────────┼─────────┼────────┤
│ Overall drift   │     0.00 │    0.34 │  +0.34 │  # red if ≥ threshold
│ Decision drift  │     0.00 │    0.22 │  +0.22 │
│ Latency drift   │     0.00 │    0.18 │  +0.18 │
│ Error drift     │     0.00 │    0.00 │  +0.00 │
└─────────────────┴──────────┴─────────┴────────┘

# Tool call frequency diff (top 20 tools, Δ % in green/red/dim)
# Optional: Top 3 sequence shifts, Root cause hypothesis panel
# Footer: Runs: v1.0 (n=50) → v2.0 (n=50) · No data left your machine

How it works

Runs are written to SQLite in a background thread so your app is not blocked. When you run driftbase diff, the CLI loads runs for the two versions, builds a behavioral fingerprint for each (tool distributions, latency percentiles, error rate), and computes a divergence score between them. The score and per-dimension deltas tell you how much behavior changed.
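The fingerprint-and-diverge step can be illustrated with a small sketch. The exact scoring driftbase uses is internal; this example uses total variation distance between the tool-call frequency distributions of two versions, which has the same shape (0 = identical behavior, 1 = completely disjoint):

```python
from collections import Counter

def tool_distribution(runs):
    """Normalize tool-call counts across runs into a frequency distribution."""
    counts = Counter(tool for run in runs for tool in run["tools"])
    total = sum(counts.values())
    return {tool: n / total for tool, n in counts.items()}

def divergence(baseline, current):
    """Total variation distance between two tool distributions."""
    tools = set(baseline) | set(current)
    return 0.5 * sum(abs(baseline.get(t, 0.0) - current.get(t, 0.0)) for t in tools)

# Hypothetical runs: v2 swapped "summarize" for "browse" and calls "search" less.
v1 = tool_distribution([{"tools": ["search", "summarize"]}, {"tools": ["search"]}])
v2 = tool_distribution([{"tools": ["search", "browse"]}, {"tools": ["browse"]}])
score = divergence(v1, v2)
```

The same idea extends to the other dimensions: latency percentiles and error rates are compared as deltas, and the per-dimension scores roll up into the overall drift number.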


Privacy

  • Captured and stored locally: Tool call names and order, latency, token counts, error/retry counts, outcome label (e.g. resolved/error). No raw user or model content.
  • Hashed then discarded: A hash of the task input and a hash of the output structure are stored; the original text is not.
  • Never stored or read: Raw user messages, raw agent output, system prompts, API keys, user identifiers.
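The hash-then-discard step is a one-way digest: only a fixed-length fingerprint of the input survives, so identical inputs can be correlated across runs without the text itself being stored. A sketch of the idea (driftbase's exact hashing scheme is internal):

```python
import hashlib

def hash_and_discard(text: str) -> str:
    """Return a SHA-256 digest of the input; the raw text is never stored."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()

# Only the digest goes into the run record; the original message is gone.
record = {"input_hash": hash_and_discard("what is my account balance?")}
```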

Use driftbase inspect --run last to see the exact breakdown for any run.


CLI reference

  • versions — List deployment versions and run counts. Example: driftbase versions
  • diff — Compare two versions, or the last N local runs against a baseline. Example: driftbase diff v1.0 v2.0 or driftbase diff v1.0 local --last 20
  • inspect — Show what was captured/dropped for a run. Example: driftbase inspect --run last
  • report — Generate a markdown/JSON/HTML drift report. Example: driftbase report v1.0 v2.0 -o report.md
  • watch — Live drift monitor against a baseline. Example: driftbase watch --against v1.0
  • push — Send local runs to the Driftbase platform API. Example: driftbase push (uses DRIFTBASE_API_URL, DRIFTBASE_API_KEY)

Frameworks supported

The @track() decorator auto-detects and captures from:

  • LangChain — tool calls via callbacks
  • LangGraph — same as LangChain
  • LlamaIndex — function_call and callback events
  • OpenAI — chat.completions.create tool_calls and usage
  • Generic — any callable; times the call and optionally parses tool_calls from the return value
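For the generic path, a return value that happens to carry a tool_calls list can be mined for tool names. A sketch of what such parsing might look like (the real detection logic in driftbase may differ; the field names here are assumptions):

```python
def extract_tool_calls(result):
    """If a callable returns a dict with a 'tool_calls' list of
    {'name': ...} entries, pull out the tool names; otherwise record none."""
    if isinstance(result, dict):
        calls = result.get("tool_calls", [])
        return [c["name"] for c in calls if isinstance(c, dict) and "name" in c]
    return []

# Hypothetical agent return value carrying structured tool-call info.
result = {"answer": "done", "tool_calls": [{"name": "search"}, {"name": "summarize"}]}
names = extract_tool_calls(result)
```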

