
Behavioral drift detection for AI agents. Prevent regressions with automated drift checks in CI/CD.

Driftbase

Behavioral drift detection for AI agents using your Langfuse traces.

AI agents drift. A prompt update, a model swap, a RAG reindex — any of these can shift how your agent makes decisions, without triggering a single test failure.

Driftbase tells you when your agent changed, what caused it, and whether it got better or worse — by analyzing the traces you're already collecting in Langfuse.

pip install driftbase

Connect your Langfuse instance:

export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...

driftbase connect

Then, when something feels off:

driftbase diagnose
DRIFTBASE DIAGNOSTIC

  Behavioral shift detected 11 days ago (2026-03-20)
  Most likely cause: prompt change in release v2.1
  Affected:          escalation rate 4% → 19%, latency +1.2s

  Recommendation:    REVIEW before production deploy

No agent code changes. No instrumentation. Just instant answers from your existing Langfuse data.

License: Apache 2.0 · Python 3.10+


How it works

Driftbase is a drift detection layer on top of Langfuse. You already trace your agent with Langfuse — Driftbase reads those traces and detects behavioral drift.

1. You're already tracing with Langfuse

Your AI agent is instrumented with Langfuse (via LangChain, LangGraph, OpenAI, or any other framework). Traces flow into Langfuse automatically.

2. Connect Driftbase to Langfuse

Driftbase pulls historical traces from Langfuse and stores them locally for analysis:

driftbase connect

This imports your traces into a local SQLite database (~/.driftbase/runs.db). All analysis runs on your machine. No data leaves your environment.
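
Because the store is an ordinary SQLite file, it can be inspected with standard tooling. The sketch below is a minimal example that opens the database read-only and lists its tables. The schema is not documented in this README, so the code assumes nothing about table or column names; `list_tables` is a hypothetical helper, not part of Driftbase.

```python
import sqlite3
from contextlib import closing
from pathlib import Path

# Default location stated above. The schema is not documented here,
# so this sketch only lists table names and assumes no columns.
DEFAULT_DB = Path.home() / ".driftbase" / "runs.db"


def list_tables(db_path: Path = DEFAULT_DB) -> list[str]:
    """Return the table names in a SQLite file, opened read-only."""
    # mode=ro prevents accidental writes and fails if the file is missing.
    uri = f"{db_path.resolve().as_uri()}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]


if __name__ == "__main__":
    if DEFAULT_DB.exists():
        print(list_tables())
    else:
        print(f"No database at {DEFAULT_DB}; run `driftbase connect` first.")
```

Opening the file read-only is a deliberate choice here: it lets you look at the data without risking the store that `driftbase connect` maintains.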

3. Detect drift

When something feels wrong:

driftbase diagnose

Scans your full trace history, detects behavioral shifts, and correlates them with version changes.

Compare explicit versions:

driftbase diff v1.0 v2.0

Produces a statistical drift score and a deployment verdict (SHIP / MONITOR / REVIEW / BLOCK).

View behavioral history:

driftbase history

Shows how your agent's behavior evolved over time — which epochs were stable, which shifted, and what changed at each breakpoint.


Core Value Proposition

| What You Get | Why It Matters |
| --- | --- |
| 60-second wow moment | Run driftbase demo --offline to see drift detection on synthetic data with zero dependencies |
| Zero cold start | Start detecting drift from day 1 using your existing Langfuse traces: no SDK to add, no baseline to collect |
| GitHub Action integration | Automatic drift checks on every PR with rich, color-coded reports posted as comments |
| Self-calibrating drift scores | Weights and thresholds learn from your labeled deployments, so the more you use it, the better it gets |
| Root cause pinpointing | Correlates drift with version changes and surfaces the most likely cause with confidence level |
| 100% local-first | All data stays on your machine in SQLite, with no cloud required; GDPR-compliant by design |
| Framework-agnostic | Works with any framework already traced in Langfuse or LangSmith: LangChain, OpenAI, CrewAI, custom agents |
| Progressive confidence | Starts working with just 15 runs, full statistical power at 50+ runs per version |

60-Second Demo (No Dependencies)

Want to see drift detection in action before connecting your own traces?

pip install driftbase
driftbase demo --offline

This generates synthetic agent runs showing realistic behavioral drift scenarios and walks you through the core commands. 100% offline, zero external dependencies.


The 5-Minute Quickstart

1. Install

pip install driftbase

2. Set Langfuse credentials

export LANGFUSE_PUBLIC_KEY=pk-lf-...
export LANGFUSE_SECRET_KEY=sk-lf-...
export LANGFUSE_HOST=https://cloud.langfuse.com  # optional

Get your keys from Langfuse Settings → API Keys.

3. Import traces

# Auto-detect and import
driftbase connect

# Or specify project explicitly
driftbase connect langfuse --project my-agent --limit 1000

4. Detect drift

# Automatic drift detection
driftbase diagnose

# Compare specific versions
driftbase diff v1.0 v2.0

# View behavioral history
driftbase history

That's it. You're detecting drift in 5 minutes using traces you already have.

See examples/langfuse-quickstart for a complete walkthrough.


What Driftbase analyzes

Driftbase computes drift across 12 behavioral dimensions:

  1. Decision drift — Changes in outcome distribution (resolved/escalated/error)
  2. Tool sequence — Pattern changes in tool usage order
  3. Tool distribution — Frequency changes in which tools are called
  4. Latency — p95 latency shifts
  5. Error rate — Proportion of failed runs
  6. Retry rate — How often the agent retries operations
  7. Loop depth — Changes in iterative reasoning patterns
  8. Verbosity ratio — Output length relative to input
  9. Output length — Total token count in responses
  10. Time to first tool — How quickly the agent starts using tools
  11. Semantic drift — Heuristic clustering of output semantics
  12. Tool transitions — Changes in tool-to-tool call patterns
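
To make the first dimension concrete, here is a minimal sketch of one way to quantify a change in outcome distribution. It uses total variation distance, which is an assumption chosen for illustration; this README does not say which statistic Driftbase uses. The sample data mirrors the escalation shift from 4% to 19% in the diagnostic output above, and all function names are hypothetical.

```python
from collections import Counter


def outcome_distribution(outcomes: list[str]) -> dict[str, float]:
    """Turn a list of outcome labels into relative frequencies."""
    if not outcomes:
        raise ValueError("need at least one run")
    counts = Counter(outcomes)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()}


def total_variation(p: dict[str, float], q: dict[str, float]) -> float:
    """Total variation distance: 0 for identical distributions, 1 for disjoint ones."""
    labels = set(p) | set(q)
    return 0.5 * sum(abs(p.get(label, 0.0) - q.get(label, 0.0)) for label in labels)


# Illustrative data only: escalation rate moves from 4% to 19%.
baseline = ["resolved"] * 96 + ["escalated"] * 4
current = ["resolved"] * 81 + ["escalated"] * 19

shift = total_variation(outcome_distribution(baseline), outcome_distribution(current))
print(f"decision drift (total variation): {shift:.2f}")
```

A distance like this is easy to read (it is the share of runs whose outcome would have to change to make the two versions match), but it says nothing about statistical significance on small samples.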

Each dimension is weighted based on your agent's inferred use case (e.g., customer support vs. code generation).
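
As an illustration of how per-dimension weights could combine into a single number, the sketch below computes a weighted mean of dimension scores. This is an assumption made for the example: the README does not state the actual aggregation formula, and the weights and dimension keys shown are hypothetical, not Driftbase defaults.

```python
def composite_score(scores: dict[str, float], weights: dict[str, float]) -> float:
    """Weighted mean of per-dimension drift scores.

    Weights do not need to sum to 1; dimensions without a weight count as 0.
    """
    total_weight = sum(weights.get(dim, 0.0) for dim in scores)
    if total_weight <= 0:
        raise ValueError("at least one scored dimension needs a positive weight")
    weighted = sum(score * weights.get(dim, 0.0) for dim, score in scores.items())
    return weighted / total_weight


# Hypothetical weights for a customer-support style agent, where outcome
# changes matter more than latency changes.
support_weights = {"decision_drift": 3.0, "latency": 1.0}
scores = {"decision_drift": 0.4, "latency": 0.1}

print(round(composite_score(scores, support_weights), 3))
```

Normalizing by the total weight keeps the composite on the same scale as the individual scores, so changing a weight shifts emphasis without inflating the result.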


Feedback Loop

Driftbase learns from your drift verdicts. When you dismiss a drift alert as expected behavior, the system automatically downweights that dimension for future comparisons on that agent.

# Dismiss a verdict (downweight dimensions)
driftbase feedback <verdict_id> --dismiss \
  --reason "Tool sequence change is intentional" \
  --dimensions "decision_drift,tool_sequence"

# View weight adjustments for an agent
driftbase feedback <agent_id> --impact

Weight decay formula: Each dismissal reduces a dimension's weight by 30%:

  • 1 dismiss → 70% of original weight
  • 2 dismisses → 49%
  • 3 dismisses → 34%
  • Floor: never below 10% (prevents complete suppression)
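
The decay rule above can be sketched as a short function. This is a minimal illustration that assumes the 10% floor is measured relative to the original weight; the names `weight_multiplier`, `DECAY_PER_DISMISSAL`, and `WEIGHT_FLOOR` are hypothetical and not taken from the Driftbase codebase.

```python
DECAY_PER_DISMISSAL = 0.70  # each dismissal keeps 70% of the previous weight
WEIGHT_FLOOR = 0.10         # assumed to be a fraction of the original weight


def weight_multiplier(dismissals: int) -> float:
    """Fraction of the original weight left after a number of dismissals."""
    if dismissals < 0:
        raise ValueError("dismissals must be non-negative")
    return max(WEIGHT_FLOOR, DECAY_PER_DISMISSAL ** dismissals)


for n in range(0, 9):
    print(f"{n} dismissals -> {weight_multiplier(n):.0%} of original weight")
```

Under these assumptions the floor takes effect at the seventh dismissal, since 0.7^6 is about 0.118 and 0.7^7 is about 0.082.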

Per-agent isolation: Feedback for Agent A doesn't affect Agent B. Each agent learns independently.

See docs/feedback.md for full guide.


Observability

Driftbase emits drift scores as OTLP-compatible metrics for integration with Prometheus, Grafana, Datadog, etc.

# Metrics are emitted automatically on every diff
driftbase diff v1 v2

# Metrics written to ~/.driftbase/metrics.json
cat ~/.driftbase/metrics.json

Metrics emitted:

  • driftbase.drift.composite - overall drift score
  • driftbase.drift.{dimension} - per-dimension scores (12 dimensions)
  • driftbase.verdict - numeric (0=SHIP, 1=MONITOR, 2=REVIEW, 3=BLOCK)
  • driftbase.confidence_tier - statistical confidence (1/2/3)

Prometheus integration: Use Node Exporter textfile collector to scrape metrics.
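
The Node Exporter textfile collector reads `*.prom` files in the Prometheus text format, and classic Prometheus metric names cannot contain dots. The sketch below shows one way to convert dotted metric names into that format. It assumes the metrics are available as a flat mapping of name to number; the actual layout of metrics.json is not documented in this README, so treat the input shape and both helper names as hypothetical.

```python
import re


def prometheus_name(name: str) -> str:
    """Map a dotted metric name onto the classic Prometheus name alphabet."""
    if not name:
        raise ValueError("metric name must not be empty")
    cleaned = re.sub(r"[^a-zA-Z0-9_:]", "_", name)
    # Names may not start with a digit.
    return "_" + cleaned if cleaned[0].isdigit() else cleaned


def to_prometheus_text(metrics: dict[str, float]) -> str:
    """Render name/value pairs as Prometheus text exposition lines."""
    lines = [
        f"{prometheus_name(name)} {float(value)}"
        for name, value in sorted(metrics.items())
    ]
    return "\n".join(lines) + "\n"


# Hypothetical flat input using metric names from the list above.
example = {"driftbase.drift.composite": 0.42, "driftbase.verdict": 2}
print(to_prometheus_text(example), end="")
```

When writing the result into the collector directory, write to a temporary file and rename it, so the collector never reads a half-written file.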

Grafana dashboard: PromQL queries for drift over time, verdict distribution, high-drift dimensions.

See docs/otlp-metrics.md for Prometheus, Grafana, and Datadog integration guides.


CLI Commands

Core Commands

# Connect to Langfuse and import traces
driftbase connect

# Detect drift automatically across all versions
driftbase diagnose

# Compare two specific versions
driftbase diff v1.0 v2.0

# View behavioral history over time
driftbase history

# Interactive setup guide
driftbase init

Advanced Commands

# Inspect individual runs
driftbase inspect <run-id>

# Export drift report as JSON
driftbase export --format json --output report.json

# Set up behavioral budgets
driftbase budgets set --dimension error_rate --threshold 0.05

# Prune old runs to save space
driftbase prune --before 2026-01-01

# Health check
driftbase doctor

CI/CD Integration

Driftbase runs inside deployment pipelines to catch behavioral regressions before they reach production.

Output Formats

# Rich terminal output (default)
driftbase diff v1.0 v2.0

# JSON for programmatic consumption
driftbase diff v1.0 v2.0 --format=json

# Markdown for PR comments
driftbase diff v1.0 v2.0 --format=markdown

Exit Codes

  • Exit 0: SHIP or MONITOR verdicts (safe to deploy)
  • Exit 1: REVIEW or BLOCK verdicts (manual review required)
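
The mapping from verdict to exit code can be written down in a few lines, which is handy when wrapping Driftbase in your own scripts. The numeric values follow the driftbase.verdict metric encoding listed under Observability; the `Verdict` enum and `exit_code` function are hypothetical names for this sketch, not Driftbase APIs.

```python
from enum import IntEnum


class Verdict(IntEnum):
    """Verdicts ordered by severity, using the documented numeric encoding."""
    SHIP = 0
    MONITOR = 1
    REVIEW = 2
    BLOCK = 3


def exit_code(verdict: Verdict) -> int:
    """Return 0 for SHIP or MONITOR, and 1 for REVIEW or BLOCK."""
    return 0 if verdict <= Verdict.MONITOR else 1


for v in Verdict:
    print(f"{v.name}: exit {exit_code(v)}")
```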

Quick Start: GitHub Actions

# .github/workflows/drift-check.yml
name: Drift Check

on: [pull_request]

jobs:
  drift:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install driftbase
      - run: driftbase diff v1.2.3 v1.3.0 --ci
        env:
          DRIFTBASE_DB_PATH: ./runs.db

The --ci flag enables:

  • JSON output
  • Non-zero exit on drift
  • Compact formatting

Detailed Verdict Analysis

After a diff completes, use driftbase explain to see the full breakdown:

# Explain most recent verdict
driftbase explain

# Explain specific verdict by ID
driftbase explain abc-123-def

Shows:

  • Top 3 contributing dimensions with evidence
  • Confidence intervals and significance markers
  • Minimum Detectable Effects (MDEs)
  • Rollback target (for REVIEW/BLOCK verdicts)

PR Comment Integration

Post drift reports directly to pull requests:

- name: Generate drift report
  id: drift
  run: |
    # diff may exit 1 on REVIEW/BLOCK; keep going so the report is still posted
    OUTPUT=$(driftbase diff v1 v2 --format=markdown) || true
    {
      echo "report<<EOF"
      echo "$OUTPUT"
      echo "EOF"
    } >> "$GITHUB_OUTPUT"

- uses: actions/github-script@v7
  env:
    REPORT: ${{ steps.drift.outputs.report }}
  with:
    script: |
      // Read the report from the environment so Markdown backticks cannot break the script
      await github.rest.issues.createComment({
        issue_number: context.issue.number,
        owner: context.repo.owner,
        repo: context.repo.repo,
        body: process.env.REPORT
      })

Result: GitHub-flavored markdown table with top contributors, MDEs, and rollback targets.

Rollback on Regression

# Run the diff once and reuse the JSON; `|| true` keeps the script alive if diff exits 1
RESULT=$(driftbase diff v1 v2 --format=json) || true
VERDICT=$(printf '%s' "$RESULT" | jq -r '.verdict')
ROLLBACK=$(printf '%s' "$RESULT" | jq -r '.rollback_target')

if [ "$VERDICT" = "BLOCK" ]; then
  echo "Behavioral regression detected. Rolling back to $ROLLBACK"
  kubectl set image deployment/agent agent="$ROLLBACK"
  exit 1
fi
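
The same gate can be expressed in Python when a pipeline already has the JSON report in hand. This sketch relies only on the `verdict` and `rollback_target` fields used in the shell script above; the rest of the report layout is unknown, and `GateDecision` and `decide` are hypothetical names. It also guards against a missing rollback target, which `jq -r` would print as the literal string "null".

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GateDecision:
    verdict: str
    rollback_target: Optional[str]
    should_roll_back: bool


def decide(report_json: str) -> GateDecision:
    """Parse a diff report once and decide whether a rollback is warranted."""
    report = json.loads(report_json)
    verdict = report["verdict"]
    target = report.get("rollback_target")
    # Only roll back on BLOCK, and only when there is a target to roll back to.
    return GateDecision(verdict, target, verdict == "BLOCK" and target is not None)


decision = decide('{"verdict": "BLOCK", "rollback_target": "v1"}')
print(decision)
```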

See docs/ci-integration.md for GitLab CI, CircleCI, and advanced patterns.


Use Cases

1. Pre-Deploy Drift Gate (GitHub Action)

Add .github/workflows/drift-check.yml:

name: Drift Check

on:
  pull_request:
    branches: [main]

permissions:
  pull-requests: write
  contents: read

jobs:
  drift-check:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4

      - name: Run Driftbase drift check
        uses: driftbase-labs/driftbase-python/github-action@v1
        with:
          baseline-version: main
          current-version: ${{ github.head_ref }}
          fail-on-review: true
          github-token: ${{ secrets.GITHUB_TOKEN }}

Posts a color-coded drift report as a PR comment with verdict (SHIP/MONITOR/REVIEW/BLOCK) and dimension breakdown.

See github-action/README.md for full documentation.

2. Post-Deploy Monitoring

#!/bin/bash
# Daily drift check (cron: 0 9 * * *)

export LANGFUSE_PUBLIC_KEY=...
export LANGFUSE_SECRET_KEY=...

# `date -d` is GNU date syntax; on macOS/BSD use `date -v-1d +%Y-%m-%d`
driftbase connect --since "$(date -d '1 day ago' +%Y-%m-%d)"
driftbase diagnose --alert-on-drift

3. Incident Response

When users report unexpected agent behavior:

# Pull latest traces and diagnose
driftbase connect --since 2026-03-01
driftbase diagnose

# Inspect specific problematic run
driftbase inspect <run-id>

# Compare current vs. last known good
driftbase diff v2.0-stable v2.1-current

Configuration

Driftbase works out of the box with zero configuration. Optional settings:

# Set custom DB path
export DRIFTBASE_DB_PATH=/path/to/runs.db

# Set default Langfuse host
export LANGFUSE_HOST=https://your-instance.com

# Configure cost tracking
export DRIFTBASE_RATE_PROMPT_1M=2.50
export DRIFTBASE_RATE_COMPLETION_1M=10.00

# Reproducibility and sampling (Phase 1 correctness features)
export DRIFTBASE_SEED=42                    # Random seed for reproducible drift reports (default: 42)
export DRIFTBASE_FINGERPRINT_LIMIT=5000     # Max runs per fingerprint (default: 5000)
export DRIFTBASE_BOOTSTRAP_ITERS=500        # Bootstrap iterations for confidence intervals (default: 500)
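
To show why a fixed seed and an iteration count matter, here is a minimal sketch of a seeded percentile bootstrap for the difference in means between two versions. It reads the two environment variables with the defaults listed above, but the resampling scheme itself is an assumption for illustration; this README does not describe how Driftbase computes its confidence intervals, and `bootstrap_mean_diff_ci` is a hypothetical function.

```python
import os
import random
import statistics

# Same variable names and defaults as listed above.
SEED = int(os.environ.get("DRIFTBASE_SEED", "42"))
ITERS = int(os.environ.get("DRIFTBASE_BOOTSTRAP_ITERS", "500"))


def bootstrap_mean_diff_ci(baseline, current, iters=ITERS, seed=SEED, level=0.95):
    """Percentile bootstrap interval for mean(current) - mean(baseline)."""
    if not baseline or not current:
        raise ValueError("both samples must be non-empty")
    rng = random.Random(seed)  # a fixed seed makes the interval reproducible
    diffs = []
    for _ in range(iters):
        # Resample each version with replacement, keeping its sample size.
        b = rng.choices(baseline, k=len(baseline))
        c = rng.choices(current, k=len(current))
        diffs.append(statistics.fmean(c) - statistics.fmean(b))
    diffs.sort()
    alpha = (1 - level) / 2
    lower = diffs[int(alpha * (iters - 1))]
    upper = diffs[int((1 - alpha) * (iters - 1))]
    return lower, upper


# Illustrative latencies in seconds for two versions.
v1 = [1.0, 1.1, 0.9, 1.2, 1.0, 0.8, 1.1, 1.0]
v2 = [2.0, 2.1, 1.9, 2.2, 2.0, 1.8, 2.1, 2.0]
print(bootstrap_mean_diff_ci(v1, v2))
```

With the same seed and inputs the function returns the same interval on every run, which is the property a reproducible drift report needs. More iterations narrow the Monte Carlo noise in the interval endpoints at the cost of runtime.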

See docs/configuration.md for advanced settings.


Architecture

┌──────────────────────────────────────────────────────────────┐
│  YOUR AI AGENT                                               │
│  (instrumented with Langfuse via any framework)              │
└────────────────┬─────────────────────────────────────────────┘
                 │
                 │ traces
                 ▼
┌──────────────────────────────────────────────────────────────┐
│  LANGFUSE                                                    │
│  (observability platform)                                    │
└────────────────┬─────────────────────────────────────────────┘
                 │
                 │ driftbase connect
                 ▼
┌──────────────────────────────────────────────────────────────┐
│  DRIFTBASE                                                   │
│  ├─ Local SQLite DB (runs, fingerprints, epochs)            │
│  ├─ Drift analysis engine (12 dimensions)                   │
│  ├─ Baseline calibrator (auto-weights + thresholds)         │
│  ├─ Anomaly detector (multivariate outliers)                │
│  └─ Verdict engine (SHIP/MONITOR/REVIEW/BLOCK)              │
└──────────────────────────────────────────────────────────────┘

Key principle: Driftbase is NOT a tracing tool. It's a drift detection layer that reads existing traces from Langfuse.


Roadmap

Completed:

  • Langfuse connector with incremental sync
  • LangSmith connector
  • 12-dimension drift analysis
  • Progressive weight learning from labeled deployments
  • Statistical confidence tiers (TIER1/TIER2/TIER3)
  • GitHub Action with standalone + cloud modes
  • MCP server for Claude Desktop integration
  • 60-second offline demo

Deferred (requires Cloud API):

  • Privacy-first telemetry
  • Opt-in data contribution for moat building

Future:

  • Arize connector
  • Generic OTEL ingestion
  • Slack/PagerDuty alerting
  • Web dashboard (Cloud tier)

Development

# Clone repo
git clone https://github.com/driftbase-labs/driftbase-python
cd driftbase-python

# Install in editable mode with dev dependencies
pip install -e '.[dev]'

# Run tests
pytest tests/

# Run linter
ruff check .
ruff format .

FAQ

Do I need to change my agent code?

No. Driftbase reads existing Langfuse traces. Your agent continues using Langfuse exactly as before.

Where is my data stored?

All analysis runs locally. Traces are stored in ~/.driftbase/runs.db (SQLite). Nothing leaves your machine unless you explicitly push to a remote backend (a planned Pro tier feature).

What if I don't have Langfuse yet?

Set up Langfuse first: langfuse.com/docs/get-started. It takes ~10 minutes to instrument your agent with Langfuse, then you can use Driftbase.

What if I don't have historical traces?

Use driftbase testset generate to create synthetic baseline data, or start collecting traces now and compare future versions.

How often should I sync?

  • Development: After every agent change
  • Production: Daily or on-deploy via CI/CD

Does this work with LangSmith?

Yes! Driftbase supports both Langfuse and LangSmith. Use:

driftbase connect langsmith --project my-agent

Arize and generic OTEL support are planned for future releases.

Is this free?

Yes. The OSS SDK is free forever. We'll offer a Pro tier (hosted web dashboard, real-time alerting, team features) in the future, but the local CLI will always be free.



License

Apache 2.0. See LICENSE.


Built with ❤️ for AI engineers who want to ship with confidence.
