
Driftbase

Behavioral drift monitoring for AI agents: track runs locally, compare versions, get a drift score.

When you ship a new prompt, model, or tool set, agent behavior can change in subtle ways. Driftbase records each run (tool names, latency, outcome) in a local SQLite DB and lets you diff any two versions, so you can see a numeric drift score and a per-dimension breakdown before or after deploy.


Quickstart

1. Install

pip install driftbase

2. Add the decorator

# my_agent.py
from driftbase import track

@track(version="v1.0", environment="production")
def my_agent(user_input: str) -> str:
    # your agent logic
    return "done"

3. Run your agent

Generate some runs for at least two versions (e.g. change code and use version="v2.0" for new runs).

python -c "from my_agent import my_agent; my_agent('hello')"
# Run a few more times, then switch to v2.0 and run again.

4. Run diff

driftbase diff v1.0 v2.0

5. See output

You get a threshold panel, a metrics table, a tool frequency diff, optional sequence shifts, and a root-cause hypothesis. Example (with comments indicating where Rich applies color):

# Panel: red border if above threshold, green if within
┌─ ▲ ABOVE THRESHOLD ──────────────────────────────────────────────────┐
│ Drift score 0.34 is above threshold 0.20. Consider investigating...  │
└──────────────────────────────────────────────────────────────────────┘

# Table: Drift — v1.0 → v2.0
┌─────────────────┬──────────┬─────────┬────────┐
│ Metric          │ Baseline │ Current │ Delta  │
├─────────────────┼──────────┼─────────┼────────┤
│ Overall drift   │     0.00 │    0.34 │  +0.34 │  # red if ≥ threshold
│ Decision drift  │     0.00 │    0.22 │  +0.22 │
│ Latency drift   │     0.00 │    0.18 │  +0.18 │
│ Error drift     │     0.00 │    0.00 │  +0.00 │
└─────────────────┴──────────┴─────────┴────────┘

# Tool call frequency diff (top 20 tools, Δ % in green/red/dim)
# Optional: Top 3 sequence shifts, Root cause hypothesis panel
# Footer: Runs: v1.0 (n=50) → v2.0 (n=50) · No data left your machine

How it works

Runs are written to SQLite in a background thread so your app is not blocked. When you run driftbase diff, the CLI loads runs for the two versions, builds a behavioral fingerprint for each (tool distributions, latency percentiles, error rate), and computes a divergence score between them. The score and per-dimension deltas tell you how much behavior changed.
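Driftbase's exact scoring formula isn't documented on this page, but the core idea of a divergence score between two behavioral fingerprints can be sketched with the standard library alone. The Jensen-Shannon divergence below is illustrative, not necessarily what the CLI uses:

```python
import math
from collections import Counter

def tool_distribution(runs):
    """Normalize tool-call counts across a list of runs into a probability dict."""
    counts = Counter(tool for run in runs for tool in run)
    total = sum(counts.values())
    return {tool: n / total for tool, n in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence between two tool distributions.

    0 means identical behavior; higher means the distributions diverge.
    """
    tools = set(p) | set(q)
    m = {t: (p.get(t, 0) + q.get(t, 0)) / 2 for t in tools}

    def kl(a, b):
        # Kullback-Leibler divergence; terms with zero probability contribute 0.
        return sum(a.get(t, 0) * math.log2(a.get(t, 0) / b[t])
                   for t in tools if a.get(t, 0) > 0)

    return (kl(p, m) + kl(q, m)) / 2

# v1.0 always called search -> answer; v2.0 added a calculator step.
baseline = tool_distribution([["search", "answer"], ["search", "answer"]])
current = tool_distribution([["search", "calculator", "answer"]])
score = js_divergence(baseline, current)
print(round(score, 3))  # → 0.191
```

Per-dimension deltas (decision, latency, error) would come from comparing the other fingerprint components the same way, e.g. latency percentiles and error rates per version.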


Privacy

  • Captured and stored locally: Tool call names and order, latency, token counts, error/retry counts, outcome label (e.g. resolved/error). No raw user or model content.
  • Hashed then discarded: A hash of the task input and a hash of the output structure are stored; the original text is not.
  • Never stored or read: Raw user messages, raw agent output, system prompts, API keys, user identifiers.
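The hash-then-discard step can be sketched as follows. The function and field names here are illustrative, not Driftbase's actual schema:

```python
import hashlib
import json

def privacy_record(task_input: str, output: dict) -> dict:
    """Store only digests: the raw text never reaches the database."""
    input_hash = hashlib.sha256(task_input.encode("utf-8")).hexdigest()
    # Hash the *structure* of the output (keys and value types), not its content.
    structure = json.dumps(sorted((k, type(v).__name__) for k, v in output.items()))
    structure_hash = hashlib.sha256(structure.encode("utf-8")).hexdigest()
    return {"input_hash": input_hash, "output_structure_hash": structure_hash}

record = privacy_record("book a flight to Oslo", {"status": "resolved", "steps": 3})
print(record["input_hash"][:12])  # digest only; the original text is discarded
```

Because only the structure is hashed, two runs with the same output shape but different content produce the same structure hash, which is what makes runs comparable without retaining text.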

Use driftbase inspect --run last to see the exact breakdown for any run.


CLI reference

  • versions — List deployment versions and run counts. Example: driftbase versions
  • diff — Compare two versions, optionally the last N runs vs a baseline. Example: driftbase diff v1.0 v2.0, or driftbase diff v1.0 local --last 20
  • inspect — Show what was captured and dropped for a run. Example: driftbase inspect --run last
  • report — Generate a drift report as Markdown, JSON, or HTML. Example: driftbase report v1.0 v2.0 -o report.md
  • watch — Live drift monitor against a baseline. Example: driftbase watch --against v1.0
  • push — Send local runs to the Driftbase platform API. Example: driftbase push (reads DRIFTBASE_API_URL and DRIFTBASE_API_KEY)

Frameworks supported

The @track() decorator auto-detects and captures from:

  • LangChain — tool calls via callbacks
  • LangGraph — same as LangChain
  • LlamaIndex — function_call and callback events
  • OpenAI — chat.completions.create tool_calls and usage
  • Generic — any callable; times the call and optionally parses tool_calls from the return value
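For the generic path, the timing-plus-parse behavior might look roughly like this. This is a sketch of the concept; the real decorator's internals are not shown on this page, and track_generic is a hypothetical name:

```python
import time
from functools import wraps

def track_generic(fn):
    """Time the call and opportunistically pull tool_calls from a dict return value."""
    @wraps(fn)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        result = fn(*args, **kwargs)
        latency_ms = (time.perf_counter() - start) * 1000
        # Only parse tool calls if the callable happens to return a dict with them.
        tool_calls = result.get("tool_calls", []) if isinstance(result, dict) else []
        print(f"latency={latency_ms:.1f}ms tools={[t['name'] for t in tool_calls]}")
        return result
    return wrapper

@track_generic
def toy_agent(question):
    return {"answer": "done", "tool_calls": [{"name": "search"}]}

toy_agent("hi")
```

Callables that return plain strings still get latency and outcome tracking; only the tool-call parsing is best-effort.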
