Behavioral drift detection for AI agents. Prevent regressions with automated drift checks in CI/CD.

Driftbase

Behavioral drift detection for AI agents using your Langfuse traces.

AI agents drift. A prompt update, a model swap, a RAG reindex — any of these can shift how your agent makes decisions, without triggering a single test failure.

Driftbase tells you when your agent changed, what caused it, and whether it got better or worse — by analyzing the traces you're already collecting in Langfuse.

pip install driftbase

Connect your Langfuse instance:

export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...

driftbase connect

Then, when something feels off:

driftbase diagnose
DRIFTBASE DIAGNOSTIC

  Behavioral shift detected 11 days ago (2026-03-20)
  Most likely cause: prompt change in release v2.1
  Affected:          escalation rate 4% → 19%, latency +1.2s

  Recommendation:    REVIEW before production deploy

No agent code changes. No instrumentation. Just instant answers from your existing Langfuse data.

License: Apache 2.0 · Python 3.10+


How it works

Driftbase is a drift detection layer on top of Langfuse: it reads the traces your agent already sends to Langfuse and analyzes them for behavioral drift.

1. You're already tracing with Langfuse

Your AI agent is instrumented with Langfuse (via LangChain, LangGraph, OpenAI, or any other framework). Traces flow into Langfuse automatically.

2. Connect Driftbase to Langfuse

Driftbase pulls historical traces from Langfuse and stores them locally for analysis:

driftbase connect

This imports your traces into a local SQLite database (~/.driftbase/runs.db). All analysis runs on your machine. No data leaves your environment.
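To confirm that the import landed where expected, the local database can be inspected with Python's standard `sqlite3` module. This is a minimal sketch: it only lists table names and assumes nothing about Driftbase's schema, which this README does not document. The helper names `default_db_path` and `list_tables` are hypothetical; the path and the `DRIFTBASE_DB_PATH` override come from this README.

```python
import os
import sqlite3
from pathlib import Path


def default_db_path() -> Path:
    """Return DRIFTBASE_DB_PATH if set, else the documented default location."""
    override = os.environ.get("DRIFTBASE_DB_PATH")
    return Path(override) if override else Path.home() / ".driftbase" / "runs.db"


def list_tables(db_path: Path) -> list[str]:
    """List table names in a SQLite file without assuming any schema."""
    # Open read-only so inspection can never modify the imported traces.
    uri = db_path.resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    finally:
        conn.close()
    return [name for (name,) in rows]


if __name__ == "__main__":
    path = default_db_path()
    if path.exists():
        print(list_tables(path))
    else:
        print(f"No database at {path}; run `driftbase connect` first.")
```

Opening the file in read-only mode keeps the check safe to run while other Driftbase commands are in use.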

3. Detect drift

When something feels wrong:

driftbase diagnose

This scans your full trace history, detects behavioral shifts, and correlates them with version changes.

Compare explicit versions:

driftbase diff v1.0 v2.0

This produces a statistical drift score and a deployment verdict (SHIP / MONITOR / REVIEW / BLOCK).

View behavioral history:

driftbase history

Shows how your agent's behavior evolved over time — which epochs were stable, which shifted, and what changed at each breakpoint.


Core Value Proposition

What You Get | Why It Matters
60-second wow moment | Run driftbase demo --offline to see drift detection on synthetic data with zero dependencies
Zero cold start | Start detecting drift from day 1 using your existing Langfuse traces: no SDK to add, no baseline to collect
GitHub Action integration | Automatic drift checks on every PR with rich, color-coded reports posted as comments
Self-calibrating drift scores | Weights and thresholds learn from your labeled deployments; the more you use it, the better it gets
Root cause pinpointing | Correlates drift with version changes and surfaces the most likely cause with confidence level
100% local-first | All data stays on your machine in SQLite; no cloud required, GDPR-compliant by design
Framework-agnostic | Works with any framework already traced in Langfuse or LangSmith, including LangChain, OpenAI, CrewAI, and custom agents
Progressive confidence | Starts working with just 15 runs, full statistical power at 50+ runs per version
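
The run-count thresholds in the last row can be read as a simple classification. The sketch below is illustrative only: the thresholds 15 and 50 come from the table, but the label names and the choice to use the smaller of the two versions' sample sizes are assumptions, not Driftbase's documented behavior.

```python
MIN_RUNS = 15         # from the table: "starts working with just 15 runs"
FULL_POWER_RUNS = 50  # from the table: "full statistical power at 50+ runs per version"


def confidence_label(baseline_runs: int, candidate_runs: int) -> str:
    """Classify a comparison by its smaller sample (labels are hypothetical)."""
    # Assumption: the weaker of the two samples limits the comparison.
    n = min(baseline_runs, candidate_runs)
    if n < MIN_RUNS:
        return "insufficient"
    if n < FULL_POWER_RUNS:
        return "preliminary"
    return "full"


print(confidence_label(baseline_runs=120, candidate_runs=22))
```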

60-Second Demo (No Dependencies)

Want to see drift detection in action before connecting your own traces?

pip install driftbase
driftbase demo --offline

This generates synthetic agent runs showing realistic behavioral drift scenarios and walks you through the core commands. 100% offline, zero external dependencies.


The 5-Minute Quickstart

1. Install

pip install driftbase

2. Set Langfuse credentials

export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_HOST=https://cloud.langfuse.com  # optional

Get your keys from Langfuse Settings → API Keys.

3. Import traces

# Auto-detect and import
driftbase connect

# Or specify project explicitly
driftbase connect langfuse --project my-agent --limit 1000

4. Detect drift

# Automatic drift detection
driftbase diagnose

# Compare specific versions
driftbase diff v1.0 v2.0

# View behavioral history
driftbase history

That's it. You're detecting drift in 5 minutes using traces you already have.

See examples/langfuse-quickstart for a complete walkthrough.


What Driftbase analyzes

Driftbase computes drift across 12 behavioral dimensions:

  1. Decision drift — Changes in outcome distribution (resolved/escalated/error)
  2. Tool sequence — Pattern changes in tool usage order
  3. Tool distribution — Frequency changes in which tools are called
  4. Latency — p95 latency shifts
  5. Error rate — Proportion of failed runs
  6. Retry rate — How often the agent retries operations
  7. Loop depth — Changes in iterative reasoning patterns
  8. Verbosity ratio — Output length relative to input
  9. Output length — Total token count in responses
  10. Time to first tool — How quickly the agent starts using tools
  11. Semantic drift — Heuristic clustering of output semantics
  12. Tool transitions — Changes in tool-to-tool call patterns
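
To make the first dimension concrete, here is one common way to quantify a change in outcome distribution: the Jensen-Shannon divergence between the baseline and candidate outcome frequencies. This README does not state which statistic Driftbase uses, so treat this as a stand-in technique. The run counts are invented; only the 4% to 19% escalation shift echoes the sample output earlier in this README.

```python
import math
from collections import Counter


def distribution(outcomes: list[str], labels: list[str]) -> list[float]:
    """Relative frequency of each label, in the order given by `labels`."""
    if not outcomes:
        raise ValueError("need at least one outcome")
    counts = Counter(outcomes)
    return [counts[label] / len(outcomes) for label in labels]


def js_divergence(p: list[float], q: list[float]) -> float:
    """Jensen-Shannon divergence in bits: 0 for identical distributions, at most 1."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x: list[float], y: list[float]) -> float:
        # Terms with a == 0 contribute nothing; y is the mixture, so y > 0 when a > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


LABELS = ["resolved", "escalated", "error"]
# Illustrative counts only: escalation moves from 4% to 19% of runs.
baseline = ["resolved"] * 94 + ["escalated"] * 4 + ["error"] * 2
candidate = ["resolved"] * 79 + ["escalated"] * 19 + ["error"] * 2

print(js_divergence(distribution(baseline, LABELS), distribution(candidate, LABELS)))
```

The divergence is symmetric and bounded, which makes it convenient to compare across versions with different run counts.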

Each dimension is weighted based on your agent's inferred use case (e.g., customer support vs. code generation).
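
As a rough illustration of use-case weighting, the sketch below combines per-dimension scores into one number with a weighted mean. The function `composite_drift`, the weight profiles, and the scores are all hypothetical; Driftbase's actual formula and weights are not described in this README.

```python
def composite_drift(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-dimension drift scores, each assumed to lie in [0, 1]."""
    # Only dimensions present in both mappings take part in the average.
    used = {dim: w for dim, w in weights.items() if dim in scores}
    total = sum(used.values())
    if total <= 0:
        raise ValueError("no positive weights for the scored dimensions")
    return sum(scores[dim] * w for dim, w in used.items()) / total


# Hypothetical weight profiles, invented for illustration only.
SUPPORT_WEIGHTS = {"decision_drift": 3.0, "latency": 1.0, "error_rate": 2.0}
CODEGEN_WEIGHTS = {"decision_drift": 1.0, "latency": 1.0, "error_rate": 3.0}

scores = {"decision_drift": 0.6, "latency": 0.1, "error_rate": 0.0}

# The same per-dimension scores rank differently under each profile.
print(composite_drift(scores, SUPPORT_WEIGHTS))
print(composite_drift(scores, CODEGEN_WEIGHTS))
```

With these made-up numbers, a shift in decisions counts for more under the customer-support profile than under the code-generation profile, which is the effect the weighting is meant to capture.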


CLI Commands

Core Commands

# Connect to Langfuse and import traces
driftbase connect

# Detect drift automatically across all versions
driftbase diagnose

# Compare two specific versions
driftbase diff v1.0 v2.0

# View behavioral history over time
driftbase history

# Interactive setup guide
driftbase init

Advanced Commands

# Inspect individual runs
driftbase inspect <run-id>

# Export drift report as JSON
driftbase export --format json --output report.json

# Set up behavioral budgets
driftbase budgets set --dimension error_rate --threshold 0.05

# Prune old runs to save space
driftbase prune --before 2026-01-01

# Health check
driftbase doctor

CI/CD Integration

Driftbase can gate deployment pipelines: driftbase diff returns a non-zero exit code for REVIEW and BLOCK verdicts, so behavioral regressions can fail the build before they reach production.

Output Formats

# Rich terminal output (default)
driftbase diff v1.0 v2.0

# JSON for programmatic consumption
driftbase diff v1.0 v2.0 --format=json

# Markdown for PR comments
driftbase diff v1.0 v2.0 --format=markdown

Exit Codes

  • Exit 0: SHIP or MONITOR verdicts (safe to deploy)
  • Exit 1: REVIEW or BLOCK verdicts (manual review required)
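
For pipelines driven from Python rather than shell, the exit-code contract above can be checked through `subprocess`. This is a sketch under stated assumptions: the helper names `is_safe_to_deploy` and `gate` are hypothetical, and the code treats any non-zero exit code as "not safe", including failures unrelated to drift.

```python
import subprocess

# Mapping taken from the exit-code list above.
EXIT_CODE_BY_VERDICT = {"SHIP": 0, "MONITOR": 0, "REVIEW": 1, "BLOCK": 1}


def is_safe_to_deploy(returncode: int) -> bool:
    """Only exit code 0 (SHIP or MONITOR) counts as safe."""
    return returncode == 0


def gate(baseline: str, candidate: str) -> bool:
    """Run `driftbase diff` and report whether the deploy may proceed."""
    # check=False: a non-zero exit is an expected outcome here, not an exception.
    result = subprocess.run(["driftbase", "diff", baseline, candidate], check=False)
    return is_safe_to_deploy(result.returncode)
```

A deploy script could call `gate("v1.0", "v2.0")` and stop when it returns `False`.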

Quick Start: GitHub Actions

# .github/workflows/drift-check.yml
name: Drift Check

on: [pull_request]

jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install driftbase
      - run: driftbase diff v1.2.3 v1.3.0 --ci
        env:
          DRIFTBASE_DB_PATH: ./runs.db

The --ci flag enables:

  • JSON output
  • Non-zero exit on drift
  • Compact formatting

Detailed Verdict Analysis

After a diff completes, use driftbase explain to see the full breakdown:

# Explain most recent verdict
driftbase explain

# Explain specific verdict by ID
driftbase explain abc-123-def

Shows:

  • Top 3 contributing dimensions with evidence
  • Confidence intervals and significance markers
  • Minimum Detectable Effects (MDEs)
  • Rollback target (for REVIEW/BLOCK verdicts)

PR Comment Integration

Post drift reports directly to pull requests:

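- name: Generate drift report
  id: drift  # referenced below as steps.drift
  run: |
    OUTPUT=$(driftbase diff v1 v2 --format=markdown)
    echo "report<<EOF" >> "$GITHUB_OUTPUT"
    echo "$OUTPUT" >> "$GITHUB_OUTPUT"
    echo "EOF" >> "$GITHUB_OUTPUT"

- uses: actions/github-script@v7
  env:
    REPORT: ${{ steps.drift.outputs.report }}
  with:
    script: |
      // Read the report from the environment so backticks in the
      // Markdown cannot break out of a JavaScript template literal.
      await github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: process.env.REPORT
      })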

Result: GitHub-flavored markdown table with top contributors, MDEs, and rollback targets.

Rollback on Regression

# Run the diff once and reuse the JSON for both fields
REPORT=$(driftbase diff v1 v2 --format=json)
VERDICT=$(echo "$REPORT" | jq -r .verdict)
ROLLBACK=$(echo "$REPORT" | jq -r .rollback_target)

if [ "$VERDICT" = "BLOCK" ]; then
  echo "Behavioral regression detected. Rolling back to $ROLLBACK"
  kubectl set image deployment/agent "agent=$ROLLBACK"
  exit 1
fi

See docs/ci-integration.md for GitLab CI, CircleCI, and advanced patterns.


Use Cases

1. Pre-Deploy Drift Gate (GitHub Action)

Add .github/workflows/drift-check.yml:

name: Drift Check

on:
  pull_request:
    branches: [main]

permissions:
  pull-requests: write
  contents: read

jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Driftbase drift check
        uses: driftbase-labs/driftbase-python/github-action@v1
        with:
          baseline-version: main
          current-version: ${{ github.head_ref }}
          fail-on-review: true
          github-token: ${{ secrets.GITHUB_TOKEN }}

Posts a color-coded drift report as a PR comment with verdict (SHIP/MONITOR/REVIEW/BLOCK) and dimension breakdown.

See github-action/README.md for full documentation.

2. Post-Deploy Monitoring

#!/bin/bash
# Daily drift check (cron: 0 9 * * *)

export LANGFUSE_PUBLIC_KEY=...
export LANGFUSE_SECRET_KEY=...

# GNU date syntax; on macOS/BSD use: date -v-1d +%Y-%m-%d
driftbase connect --since "$(date -d '1 day ago' +%Y-%m-%d)"
driftbase diagnose --alert-on-drift
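
Because date arithmetic flags differ between GNU and BSD `date`, a scheduler written in Python can compute the `--since` value portably. This is a small sketch; `since_arg` and `connect_command` are hypothetical helpers, and the command they build uses only the `connect --since` form shown above.

```python
from datetime import date, timedelta


def since_arg(today: date, days_back: int = 1) -> str:
    """ISO date (YYYY-MM-DD) `days_back` days before `today`."""
    return (today - timedelta(days=days_back)).isoformat()


def connect_command(today: date) -> list[str]:
    """Argument list for syncing traces since yesterday."""
    return ["driftbase", "connect", "--since", since_arg(today)]


print(connect_command(date.today()))
```

The resulting list can be passed directly to `subprocess.run`.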

3. Incident Response

When users report unexpected agent behavior:

# Pull latest traces and diagnose
driftbase connect --since 2026-03-01
driftbase diagnose

# Inspect specific problematic run
driftbase inspect <run-id>

# Compare current vs. last known good
driftbase diff v2.0-stable v2.1-current

Configuration

Driftbase works out of the box with zero configuration. Optional settings:

# Set custom DB path
export DRIFTBASE_DB_PATH=/path/to/runs.db

# Set default Langfuse host
export LANGFUSE_HOST=https://your-instance.com

# Configure cost tracking
export DRIFTBASE_RATE_PROMPT_1M=2.50
export DRIFTBASE_RATE_COMPLETION_1M=10.00

# Reproducibility and sampling (Phase 1 correctness features)
export DRIFTBASE_SEED=42                    # Random seed for reproducible drift reports (default: 42)
export DRIFTBASE_FINGERPRINT_LIMIT=5000     # Max runs per fingerprint (default: 5000)
export DRIFTBASE_BOOTSTRAP_ITERS=500        # Bootstrap iterations for confidence intervals (default: 500)
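
To show why a fixed seed and an iteration count matter for confidence intervals, here is a generic percentile bootstrap for the difference in means between two samples. It reads the two environment variables documented above, but the resampling scheme itself is an assumption: this README does not describe how Driftbase computes its intervals, and the latency values are invented.

```python
import os
import random
import statistics

SEED = int(os.environ.get("DRIFTBASE_SEED", "42"))
ITERS = int(os.environ.get("DRIFTBASE_BOOTSTRAP_ITERS", "500"))


def bootstrap_ci(baseline, candidate, iters=ITERS, seed=SEED, alpha=0.05):
    """Percentile bootstrap CI for mean(candidate) - mean(baseline)."""
    rng = random.Random(seed)  # private RNG, so the same seed gives the same interval
    diffs = []
    for _ in range(iters):
        # Resample each version with replacement, keeping its original size.
        b = rng.choices(baseline, k=len(baseline))
        c = rng.choices(candidate, k=len(candidate))
        diffs.append(statistics.fmean(c) - statistics.fmean(b))
    diffs.sort()
    lo = diffs[int((alpha / 2) * iters)]
    hi = diffs[min(iters - 1, int((1 - alpha / 2) * iters))]
    return lo, hi


# Invented latency samples in seconds, for illustration only.
v1 = [1.0, 1.1, 0.9, 1.2, 1.0, 0.95]
v2 = [2.0, 2.2, 2.1, 2.3, 1.9, 2.05]
print(bootstrap_ci(v1, v2))
```

More iterations narrow the run-to-run noise of the interval at the cost of compute time, while the seed makes two runs on the same data produce the same report.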

See docs/configuration.md for advanced settings.


Architecture

┌──────────────────────────────────────────────────────────────┐
│  YOUR AI AGENT                                               │
│  (instrumented with Langfuse via any framework)              │
└────────────────┬─────────────────────────────────────────────┘
                 │
                 │ traces
                 ▼
┌──────────────────────────────────────────────────────────────┐
│  LANGFUSE                                                    │
│  (observability platform)                                    │
└────────────────┬─────────────────────────────────────────────┘
                 │
                 │ driftbase connect
                 ▼
┌──────────────────────────────────────────────────────────────┐
│  DRIFTBASE                                                   │
│  ├─ Local SQLite DB (runs, fingerprints, epochs)            │
│  ├─ Drift analysis engine (12 dimensions)                   │
│  ├─ Baseline calibrator (auto-weights + thresholds)         │
│  ├─ Anomaly detector (multivariate outliers)                │
│  └─ Verdict engine (SHIP/MONITOR/REVIEW/BLOCK)              │
└──────────────────────────────────────────────────────────────┘

Key principle: Driftbase is NOT a tracing tool. It's a drift detection layer that reads existing traces from Langfuse.


Roadmap

Completed:

  • Langfuse connector with incremental sync
  • LangSmith connector
  • 12-dimension drift analysis
  • Progressive weight learning from labeled deployments
  • Statistical confidence tiers (TIER1/TIER2/TIER3)
  • GitHub Action with standalone + cloud modes
  • MCP server for Claude Desktop integration
  • 60-second offline demo

Deferred (requires Cloud API):

  • Privacy-first telemetry
  • Opt-in data contribution for moat building

Future:

  • Arize connector
  • Generic OTEL ingestion
  • Slack/PagerDuty alerting
  • Web dashboard (Cloud tier)

Development

# Clone repo
git clone https://github.com/driftbase-labs/driftbase-python
cd driftbase-python

# Install in editable mode with dev dependencies
pip install -e '.[dev]'

# Run tests
pytest tests/

# Run linter
ruff check .
ruff format .

FAQ

Do I need to change my agent code?

No. Driftbase reads existing Langfuse traces. Your agent continues using Langfuse exactly as before.

Where is my data stored?

All analysis runs locally. Traces are stored in ~/.driftbase/runs.db (SQLite). Nothing leaves your machine unless you explicitly push to a remote backend (Pro tier feature).

What if I don't have Langfuse yet?

Set up Langfuse first: langfuse.com/docs/get-started. It takes ~10 minutes to instrument your agent with Langfuse, then you can use Driftbase.

What if I don't have historical traces?

Use driftbase testset generate to create synthetic baseline data, or start collecting traces now and compare future versions.

How often should I sync?

  • Development: After every agent change
  • Production: Daily or on-deploy via CI/CD

Does this work with LangSmith?

Yes! Driftbase supports both Langfuse and LangSmith. Use:

driftbase connect langsmith --project my-agent

Arize and generic OTEL support are planned for future releases.

Is this free?

Yes. The OSS SDK is free forever. We'll offer a Pro tier (hosted web dashboard, real-time alerting, team features) in the future, but the local CLI will always be free.


License

Apache 2.0. See LICENSE.


Built with ❤️ for AI engineers who want to ship with confidence.
