
CI for AI agents - behavioral fingerprinting and drift detection


Spooled — Behavioral CI for AI Agents

One prompt edit quietly turned this customer-support agent into a refund machine. Spooled caught it on the PR.

A PM asks for "a more helpful tone for frustrated customers." An engineer adds one sentence to the system prompt: "Resolve their issue when possible." Unit tests pass. The reviewer approves. The PR is ready to merge.

But the LLM now interprets "resolve" liberally. On complaint tickets, the agent stops escalating refund requests to humans and starts issuing refunds itself. Its tool-call behavior changed even though the prompt edit looked harmless.

Spooled diffs the agent's behavior against the committed baseline and posts this on the PR:

🚨 Merge blocked: agent now calls `issue_refund`

This tool was never observed in the baseline. It appears in
2 of 5 traces in this PR (~40%).

Triggered by a one-sentence change to the system prompt.

Caught content-blind — Spooled compared tool graphs, not language. It never saw a customer message or an LLM response.

Run it yourself in 60 seconds

```bash
pip install spooled-ai
spooled demo
```

Runs the entire scenario in your terminal — no API key, no setup, no files left behind. The variant agent differs from the baseline by exactly one line in the system prompt. The code is otherwise identical.

What It Does

Capture — wraps your LLM client and records the structural fingerprint of every agent run: which tools were called, in what order, how many times. Content-blind by architecture — prompts, customer data, and AI responses never leave your infrastructure.
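As a rough illustration of what a "structural fingerprint" can mean, the sketch below reduces a run to an ordered list of tool names and hashes only which tools ran, in what order, and how often. It is plain Python under an assumed input shape, not Spooled's implementation or trace format; the tool name `lookup_order` is hypothetical.

```python
import hashlib
import json
from collections import Counter


def structural_fingerprint(tool_calls: list[str]) -> dict:
    """Summarize a run by structure only: which tools, in what order, how often.

    `tool_calls` is the ordered list of tool names the agent invoked
    (an assumed input shape). No prompts, arguments, or responses are used.
    """
    counts = dict(sorted(Counter(tool_calls).items()))  # how many times
    summary = {
        "tools": sorted(counts),       # which tools were called
        "sequence": list(tool_calls),  # in what order
        "counts": counts,
    }
    # Canonical JSON, so the same structure always yields the same hash
    canonical = json.dumps(summary, sort_keys=True, separators=(",", ":"))
    summary["fingerprint"] = hashlib.sha256(canonical.encode()).hexdigest()
    return summary


baseline = structural_fingerprint(["lookup_order", "escalate_to_human"])
variant = structural_fingerprint(["lookup_order", "issue_refund"])
print(baseline["fingerprint"] != variant["fingerprint"])  # True
```

Because only tool names and their order feed the hash, two runs with completely different customer messages but the same tool path produce the same fingerprint.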

Compare — diffs the current run against a committed baseline. Shows exactly what changed: tools added, tools removed, sequence reordered, token usage shifted.
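A comparison along the same lines can be sketched as a diff of two tool sequences plus their token totals. The helper `diff_runs`, its "reordered" rule (first-occurrence order of tools present in both runs), and the token figures in the usage call are assumptions for illustration, not how Spooled computes its diff.

```python
from collections import Counter


def diff_runs(baseline: list[str], current: list[str],
              baseline_tokens: int, current_tokens: int) -> dict:
    """Compare two runs by structure only (hypothetical helper, not Spooled's API)."""
    base_counts, cur_counts = Counter(baseline), Counter(current)

    # First-occurrence order of the tools that appear in both runs
    shared = set(baseline) & set(current)
    base_order = [t for t in dict.fromkeys(baseline) if t in shared]
    cur_order = [t for t in dict.fromkeys(current) if t in shared]

    return {
        "tools_added": sorted(set(cur_counts) - set(base_counts)),
        "tools_removed": sorted(set(base_counts) - set(cur_counts)),
        "sequence_reordered": base_order != cur_order,
        "token_delta": current_tokens - baseline_tokens,
    }


result = diff_runs(
    ["lookup_order", "escalate_to_human"],
    ["lookup_order", "issue_refund"],
    baseline_tokens=1200,
    current_tokens=1350,
)
print(result)
```

For the refund scenario this reports `issue_refund` added, `escalate_to_human` removed, no reordering of the shared `lookup_order` step, and a token delta of 150.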

Gate — posts a PR comment with the human-readable consequence as the headline. Blocks the merge if the policy says so. Resolution instructions included.
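The gating step can be approximated with the single rule from the example on this page: fail when PR traces call a tool that no baseline trace ever called, and report how many traces were affected. The `gate` function and its `block_on_new_tools` switch are hypothetical; Spooled's real rules come from the policy file passed via `--policy`, whose format is not shown here.

```python
def gate(baseline_traces: list[list[str]], pr_traces: list[list[str]],
         block_on_new_tools: bool = True) -> tuple[bool, list[str]]:
    """Toy merge gate: fail when PR traces call tools the baseline never used.

    Each trace is an ordered list of tool names (an assumed shape).
    Returns (passed, messages).
    """
    seen = {tool for trace in baseline_traces for tool in trace}
    pr_tools = {tool for trace in pr_traces for tool in trace}
    new_tools = sorted(pr_tools - seen)

    messages = []
    for tool in new_tools:
        # How many PR traces exercised the never-seen tool
        hits = sum(1 for trace in pr_traces if tool in trace)
        pct = round(100 * hits / len(pr_traces))
        messages.append(
            f"agent now calls `{tool}`: {hits} of {len(pr_traces)} traces ({pct}%)"
        )

    passed = not (block_on_new_tools and bool(new_tools))
    return passed, messages


baseline = [["lookup_order", "escalate_to_human"]] * 5
pr = [["lookup_order", "escalate_to_human"]] * 3 + [["lookup_order", "issue_refund"]] * 2
passed, messages = gate(baseline, pr)
print(passed)       # False
print(messages[0])  # agent now calls `issue_refund`: 2 of 5 traces (40%)
```

Messages are produced even with `block_on_new_tools=False`, so the same check can run in a warn-only mode before blocking is switched on.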

Install

```bash
pip install spooled-ai
```

Quick Start

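```python
import spooled
from spooled.wrappers import wrap_openai
from openai import OpenAI

# Start capturing runs for this agent
spooled.init(agent_id="my_agent")

# Wrap the client so tool calls made through it are fingerprinted
client = wrap_openai(OpenAI())

# MY_TOOLS stands for your agent's list of OpenAI tool definitions
response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "Analyze this deal"}],
    tools=MY_TOOLS,
)

# Shut down capture before the process exits
spooled.shutdown()
```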

That's it. Every tool call is captured, and the trace is saved to .spooled/traces/. Every interaction is hash-chained at capture time, which makes after-the-fact edits to a trace detectable.
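A hash chain in the general sense links each record to the digest of the one before it, so an edited or reordered record no longer verifies. The sketch below shows that generic technique with SHA-256 over canonical JSON; Spooled's actual record layout, hashing scheme, and any signing step are not documented here, and `GENESIS`, `chain_records`, and `verify_chain` are illustrative names.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first record


def _digest(prev: str, record: dict) -> str:
    body = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev + body).encode()).hexdigest()


def chain_records(records: list[dict]) -> list[dict]:
    """Attach prev_hash/hash to each record, linking it to its predecessor."""
    prev = GENESIS
    chained = []
    for record in records:
        digest = _digest(prev, record)
        chained.append({**record, "prev_hash": prev, "hash": digest})
        prev = digest
    return chained


def verify_chain(chained: list[dict]) -> bool:
    """Recompute every link; an edited or reordered record, or a gap mid-chain, fails."""
    prev = GENESIS
    for entry in chained:
        record = {k: v for k, v in entry.items() if k not in ("prev_hash", "hash")}
        if entry["prev_hash"] != prev or _digest(prev, record) != entry["hash"]:
            return False
        prev = entry["hash"]
    return True


chain = chain_records([{"tool": "lookup_order"}, {"tool": "issue_refund"}])
print(verify_chain(chain))  # True
```

Dropping the final record cannot be detected from the chain alone; storing the last hash somewhere separate closes that gap.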

CI Integration

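```yaml
# .github/workflows/spooled.yml
on: pull_request
jobs:
  spooled:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.12"
      # Install Spooled before generating traces, since ci_runner.py captures with it
      - name: Install Spooled
        run: pip install spooled-ai  # plus your agent's own dependencies
      - name: Generate traces
        run: python ci_runner.py
        env:
          OPENAI_API_KEY: ${{ secrets.OPENAI_API_KEY }}
      - name: Spooled behavioral check
        run: |
          spooled ci compare .spooled/traces/*.jsonl \
            --baseline .github/baselines \
            --policy spooled-policy.yml \
            --enable-blocking
```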

Example PR comment:

```markdown
## ❌ Spooled Behavioral CI: FAIL
> Spooled Score: 59/100 (D) 🔴

> [!CAUTION]
> ## 🚨 Merge blocked: agent now calls `issue_refund`
>
> This tool was **never observed in the baseline**. It appears in
> **2 of 5** traces in this PR (~40%).

**5** traces analyzed  |  ✅ **3** passed  |  ❌ **2** policy failures

### Trace Results
| Agent          | Fingerprint     | Status        | Score |
|----------------|-----------------|---------------|-------|
| support_agent  | `4d893b5cef...` | ⚠️ Behavior change | 59 |

<details>
  <summary>🔧 Tool Changes (2 traces)</summary>

  - `issue_refund` added
  - `escalate_to_human` removed
</details>
```

What Spooled Catches

| Change type | Example | Unit tests | Spooled |
|---|---|---|---|
| Prompt tweak | "Be concise" drops compliance tools | ✅ Pass | Behavior change |
| Model swap | Model drops sanctions screening | ✅ Pass | Behavior change |
| Tool deprecation | Agent proceeds without critical data | ✅ Pass | Behavior change |
| KB refresh | Ticket response path changes | ✅ Pass | Behavior change |
| Schema migration | Field rename breaks detection | ✅ Pass | Behavior change |
| Upstream degradation | Retry paths appear in fingerprint | ✅ Pass | Behavior change |

Content-Blind Architecture

Spooled never captures prompts, customer data, or AI responses. Only structural metadata: tool names, call sequence, token counts, timing. This is enforced in code — content is stripped before the trace reaches disk.
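One way to enforce "content is stripped before the trace reaches disk" is an allowlist: build the trace record from scratch out of structural fields instead of deleting sensitive ones. The sketch assumes a Chat Completions response already converted to a dict (for example with the OpenAI SDK's `model_dump()`); the record's field names and the sample response are made up, not Spooled's schema.

```python
def to_structural_record(response: dict, started: float, finished: float) -> dict:
    """Build a trace record from structural fields only (an allowlist).

    `response` is assumed to be a Chat Completions response as a dict.
    Message text and tool-call arguments are never copied into the record.
    """
    message = response["choices"][0]["message"]
    usage = response.get("usage") or {}
    tool_calls = message.get("tool_calls") or []
    return {
        "tool_names": [call["function"]["name"] for call in tool_calls],
        "prompt_tokens": usage.get("prompt_tokens", 0),
        "completion_tokens": usage.get("completion_tokens", 0),
        "latency_ms": round((finished - started) * 1000),
    }


# Made-up response containing customer-facing text and tool arguments
response = {
    "choices": [{"message": {
        "role": "assistant",
        "content": "I've refunded your order.",
        "tool_calls": [{"id": "call_1", "type": "function",
                        "function": {"name": "issue_refund",
                                     "arguments": "{\"order_id\": \"A1\"}"}}],
    }}],
    "usage": {"prompt_tokens": 50, "completion_tokens": 10},
}
record = to_structural_record(response, started=1.0, finished=1.25)
print(record)
```

An allowlist fails closed: a field added to the response format later is dropped by default rather than written to disk.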

Supported Libraries

LLM Providers (explicit wrappers):

  • OpenAI (sync/async, streaming)
  • Anthropic (sync/async, streaming)

HTTP & Cloud (auto-instrumented via hooks):

  • AWS Bedrock
  • requests, httpx, aiohttp

Frameworks (callback handlers):

  • LangChain, LlamaIndex, AutoGen, CrewAI, LangGraph

Documentation

License

Proprietary.



Download files


Source Distribution

spooled_ai-0.5.1.tar.gz (243.1 kB)

Uploaded Source

Built Distribution


spooled_ai-0.5.1-py3-none-any.whl (288.5 kB)

Uploaded Python 3

File details

Details for the file spooled_ai-0.5.1.tar.gz.

File metadata

  • Download URL: spooled_ai-0.5.1.tar.gz
  • Upload date:
  • Size: 243.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for spooled_ai-0.5.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab91577f1177b301024234c65c97615ffcce9a378e0f579cc1b91bb87ca52dca |
| MD5 | 68c0b0d7114620c10ac4cd17daf85c1d |
| BLAKE2b-256 | d00daceb49604449dd1cd7d1d6e7c546a1d6da92422708e7d09b556deb55be09 |


File details

Details for the file spooled_ai-0.5.1-py3-none-any.whl.

File metadata

  • Download URL: spooled_ai-0.5.1-py3-none-any.whl
  • Upload date:
  • Size: 288.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for spooled_ai-0.5.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 44160568c86435f18931b9a1e2871f1931ff93d7cc49e0c2370440866d4556ff |
| MD5 | 9666f89e865e3ce0349eb4cae81bd40e |
| BLAKE2b-256 | 5b979595ce4272c5a6426cdc01b009525792c34cfc3b232d3e0c39db0baa6fdd |

