CI for AI agents - behavioral fingerprinting and drift detection

Spooled — Behavioral CI for AI Agents

One prompt edit quietly turned this customer-support agent into a refund machine. Spooled caught it on the PR.

A PM asks for "a more helpful tone for frustrated customers." An engineer adds one sentence to the system prompt: "Resolve their issue when possible." Unit tests pass. The reviewer approves. The PR is ready to merge.

But the LLM now interprets "resolve" liberally. On complaint tickets, the agent stops escalating refund requests to humans and starts issuing refunds itself. The agent's tool-call structure changed even though the prompt edit looked harmless.

Spooled diffs the agent's behavior against the committed baseline and posts this on the PR:

🚨 Merge blocked: agent now calls `issue_refund`

This tool was never observed in the baseline. It appears in
2 of 5 traces in this PR (~40%).

Triggered by a one-sentence change to the system prompt.

Caught content-blind — Spooled compared tool graphs, not language. It never saw a customer message or an LLM response.

Run it yourself in 60 seconds

pip install spooled-ai
spooled demo

Runs the entire scenario in your terminal — no API key, no setup, no files left behind. The variant agent differs from the baseline by exactly one line in the system prompt. The code is otherwise identical.

What It Does

Capture — wraps your LLM client and records the structural fingerprint of every agent run: which tools were called, in what order, how many times. Content-blind by architecture — prompts, customer data, and AI responses never leave your infrastructure.

Compare — diffs the current run against a committed baseline. Shows exactly what changed: tools added, tools removed, sequence reordered, token usage shifted.

Gate — posts a PR comment with the human-readable consequence as the headline. Blocks the merge if the policy says so. Resolution instructions included.
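The capture and compare steps above can be illustrated with a minimal sketch. It assumes a run is reduced to an ordered list of tool names; the `fingerprint` and `diff_runs` helpers and the `lookup_order` and `check_policy` tool names are hypothetical and are not Spooled's actual API or trace format.

```python
import hashlib
import json
from collections import Counter


def fingerprint(tool_calls):
    """Hash the ordered tool-name sequence; no arguments or message content."""
    payload = json.dumps(list(tool_calls)).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def diff_runs(baseline, current):
    """Compare two tool-name sequences structurally."""
    base_counts, cur_counts = Counter(baseline), Counter(current)
    added = sorted(set(cur_counts) - set(base_counts))
    removed = sorted(set(base_counts) - set(cur_counts))
    # Tools present in both runs but called a different number of times.
    count_changes = {
        name: (base_counts[name], cur_counts[name])
        for name in sorted(set(base_counts) & set(cur_counts))
        if base_counts[name] != cur_counts[name]
    }
    # Reordered: the same calls the same number of times, in a different order.
    reordered = base_counts == cur_counts and list(baseline) != list(current)
    return {
        "changed": fingerprint(baseline) != fingerprint(current),
        "added": added,
        "removed": removed,
        "count_changes": count_changes,
        "reordered": reordered,
    }


# Hypothetical runs modeled on the refund scenario.
baseline_run = ["lookup_order", "check_policy", "escalate_to_human"]
current_run = ["lookup_order", "check_policy", "issue_refund"]
report = diff_runs(baseline_run, current_run)
print(report["added"], report["removed"])
# → ['issue_refund'] ['escalate_to_human']
```

Because only tool names enter the hash and the diff, a comparison like this never needs to read prompts or responses.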

Install

pip install spooled-ai

Quick Start

import spooled
from spooled.wrappers import wrap_openai
from openai import OpenAI

spooled.init(agent_id="my_agent")
client = wrap_openai(OpenAI())

# Your existing agent code — unchanged. Every LLM call and tool call is captured automatically.
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze this deal"}],
)

spooled.shutdown()

That's it. The trace is saved to .spooled/traces/, and a hash chain signs every interaction at capture time.
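The README does not describe how the hash chain is built, so the following is only a generic sketch of the technique: each record's hash covers the previous record's hash, so altering any earlier event invalidates everything after it. The record layout, the `GENESIS` value, and the `chain_events` and `verify_chain` helpers are assumptions for illustration. A plain hash chain like this is tamper-evident rather than a cryptographic signature, which would also require a key.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed starting value for the first link


def _link(prev_hash, event):
    """Hash the previous link together with a canonical form of the event."""
    body = json.dumps(event, sort_keys=True)
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def chain_events(events):
    """Wrap each structural event in a record linked to its predecessor."""
    prev = GENESIS
    chained = []
    for event in events:
        digest = _link(prev, event)
        chained.append({"event": event, "prev": prev, "hash": digest})
        prev = digest
    return chained


def verify_chain(chained):
    """Recompute every link; any edited or reordered record fails the check."""
    prev = GENESIS
    for record in chained:
        if record["prev"] != prev or record["hash"] != _link(prev, record["event"]):
            return False
        prev = record["hash"]
    return True


# Hypothetical structural events (tool names only, no content).
records = chain_events([{"tool": "lookup_order"}, {"tool": "issue_refund"}])
print(verify_chain(records))
```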

CI Integration

# .github/workflows/spooled.yml
- name: Generate traces
  run: python ci_runner.py
  env:
    OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}

- name: Spooled behavioral check
  run: |
    pip install spooled-ai
    spooled ci compare .spooled/traces/*.jsonl \
      --baseline .github/baselines \
      --policy spooled-policy.yml \
      --enable-blocking

Example PR comment:

## ❌ Spooled Behavioral CI: FAIL
> Spooled Score: 59/100 (D) 🔴

> [!CAUTION]
> ## 🚨 Merge blocked: agent now calls `issue_refund`
>
> This tool was **never observed in the baseline**. It appears in
> **2 of 5** traces in this PR (~40%).

**5** traces analyzed  |  ✅ **3** passed  |  ❌ **2** policy failures

### Trace Results
| Agent          | Fingerprint     | Status        | Score |
|----------------|-----------------|---------------|-------|
| support_agent  | `4d893b5cef...` | ⚠️ Behavior change | 59 |

<details>
  <summary>🔧 Tool Changes (2 traces)</summary>

  - `issue_refund` added
  - `escalate_to_human` removed
</details>
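A headline such as "appears in 2 of 5 traces" could be derived by counting, for each tool absent from the baseline, how many PR traces call it. The sketch below assumes traces are lists of tool names; `new_tool_report`, `should_block`, and the `blocked_tools` set are hypothetical stand-ins, not the real policy format read from spooled-policy.yml.

```python
def new_tool_report(baseline_tools, pr_traces):
    """Count, per tool unseen in the baseline, how many PR traces call it."""
    baseline = set(baseline_tools)
    hits = {}
    for trace in pr_traces:
        # set(trace) counts a tool once per trace, however often it is called.
        for tool in set(trace) - baseline:
            hits[tool] = hits.get(tool, 0) + 1
    return hits


def should_block(hits, blocked_tools):
    """Block the merge if any newly observed tool is on the blocked list."""
    return any(tool in blocked_tools for tool in hits)


# Hypothetical data shaped like the example comment: 5 traces, 2 with refunds.
baseline_tools = {"lookup_order", "escalate_to_human"}
pr_traces = [
    ["lookup_order", "escalate_to_human"],
    ["lookup_order", "issue_refund"],
    ["lookup_order", "escalate_to_human"],
    ["lookup_order", "issue_refund"],
    ["lookup_order"],
]
hits = new_tool_report(baseline_tools, pr_traces)
for tool, count in sorted(hits.items()):
    print(f"{tool}: {count} of {len(pr_traces)} traces ({count / len(pr_traces):.0%})")
print("block merge:", should_block(hits, blocked_tools={"issue_refund"}))
# → issue_refund: 2 of 5 traces (40%)
# → block merge: True
```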

What Spooled Catches

| Change type | Example | Unit tests | Spooled |
|---|---|---|---|
| Prompt tweak | "Be concise" drops compliance tools | ✅ Pass | Behavior change |
| Model swap | Model drops sanctions screening | ✅ Pass | Behavior change |
| Tool deprecation | Agent proceeds without critical data | ✅ Pass | Behavior change |
| KB refresh | Ticket response path changes | ✅ Pass | Behavior change |
| Schema migration | Field rename breaks detection | ✅ Pass | Behavior change |
| Upstream degradation | Retry paths appear in fingerprint | ✅ Pass | Behavior change |

Content-Blind Architecture

Spooled never captures prompts, customer data, or AI responses. Only structural metadata: tool names, call sequence, token counts, timing, plus installation metadata (Spooled version, OS, active hook names, detected framework module names). This is enforced in code — content is stripped before the trace reaches disk. See docs/threat-model.md.
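One common way to enforce this kind of guarantee is an allowlist applied before anything is written: only named structural fields survive, and everything else is dropped by default. The sketch below is an assumption about the general approach, not Spooled's implementation; the field names in `ALLOWED_FIELDS` and the `strip_content` helper are hypothetical.

```python
# Hypothetical structural fields; anything not listed here is discarded.
ALLOWED_FIELDS = {
    "tool_name",
    "sequence_index",
    "prompt_tokens",
    "completion_tokens",
    "duration_ms",
}


def strip_content(raw_event):
    """Keep only allowlisted structural metadata from a captured event."""
    return {key: value for key, value in raw_event.items() if key in ALLOWED_FIELDS}


raw_event = {
    "tool_name": "issue_refund",
    "sequence_index": 2,
    "prompt_tokens": 412,
    "duration_ms": 830,
    "arguments": {"order_id": "A-1001", "amount": 59.0},  # content: dropped
    "messages": [{"role": "user", "content": "I want my money back"}],  # dropped
}
safe_event = strip_content(raw_event)
print(sorted(safe_event))
# → ['duration_ms', 'prompt_tokens', 'sequence_index', 'tool_name']
```

An allowlist fails closed: a new field added upstream stays out of the trace until someone explicitly permits it, whereas a denylist would let it through.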

Supported Libraries

LLM Providers (explicit wrappers):

  • OpenAI (sync/async, streaming)
  • Anthropic (sync/async, streaming)

HTTP & Cloud (auto-instrumented via hooks):

  • AWS Bedrock
  • requests, httpx, aiohttp

Frameworks (callback handlers):

  • LangChain, LlamaIndex, AutoGen, CrewAI, LangGraph

Documentation

License

Proprietary.
