Skip to main content

Prevents untrusted data from triggering consequential actions in your agent.

Project description

Agent Sleuth

Prevents untrusted data from triggering consequential actions in your agent.

Agent Sleuth is an in-process information-flow-control (IFC) library for LLM agents. It stops untrusted data (web pages, email bodies, tool outputs, retrieved documents) from driving consequential actions (sending email, writing files, posting to external services).

The mechanism is value-level provenance lineage tracked at the tool-I/O boundarynot taint-tracking through the model's forward pass. When an untrusted tool returns data, we fingerprint the specific values in it. When a later consequential ("sink") call's arguments carry those fingerprinted values — verbatim or via structured-field tracking — that is a deterministic, classifier-free provenance edge. A small policy fires: untrusted-origin value reaching a non-allowlisted external sink → block or confirm.

  • Deterministic, not a classifier. The guarantee is a value-lineage match, never an LLM judging intent.
  • Zero extra LLM calls on the common path.
  • Drop-in. Three lines, zero changes to your agent.
  • Audit-mode first. Observe for a week, then switch to enforce.

Install

pip install agent_sleuth                 # core, zero agent-framework deps
pip install 'agent_sleuth[langchain]'    # + LangChain callback handler
pip install 'agent_sleuth[config]'       # + YAML config loading
pip install 'agent_sleuth[dev]'          # + pytest/ruff

Three-line integration (raw / custom agent)

from agent_sleuth import Sleuth

sleuth = Sleuth(
    untrusted=["read_email", "fetch_url", "search_web"],
    consequential=["send_email", "write_file", "post_slack"],
    destinations=["me@myco.com"],   # your own channels = trusted egress
    mode="audit",                   # → "enforce" once you trust it
)
sleuth.reset(query="summarize my emails and send a report to my boss")

# wrap your tools (or pass sleuth.handler to a LangChain agent — see below)
fetch_url  = sleuth.track(fetch_url)
send_email = sleuth.track(send_email)

# ... run your agent ...
print(sleuth.report())

You can also skip the explicit lists entirely — Sleuth() uses name-based defaults (tools containing read/fetch/search/get/... are untrusted; send/write/post/delete/... are consequential).

LangChain (zero changes to your agent)

from agent_sleuth import Sleuth

sleuth = Sleuth(agent=your_langchain_agent, mode="audit")
result = sleuth.run("summarize my emails and send a report to my boss")
print(sleuth.report())

Sleuth.run() resets taint state, stashes the trusted query, and attaches the IFCCallbackHandler to your agent — no edits to your chain.

What a caught attack looks like

BLOCKED: send_email() called with tainted inputs
  Taint source: fetch_url (step 2, untrusted)
  Injected value detected in argument: to="attacker@evil.com"
  Lineage: fetch_url (step 2) → value "attacker@evil.com" → send_email.to
  Destination: attacker@evil.com (not allowlisted)
  Reason: untrusted-origin value reached a consequential sink
  Action: blocked, call halted

Modes

  • audit (default): detect + log + render the trace; never block.
  • enforce: raise TaintViolationError and halt the offending sink call.
  • confirm: surface the violation to a callback for an allow/deny decision before dispatch.

Honest coverage envelope (v0)

Sound on the verbatim/structured-exfil class. Zero extra LLM calls on the common path. Drop-in. Laundering (base64/paraphrase of a secret) and pure control-flow hijack (a sink call whose arguments carry no untrusted bytes) are explicitly out of scope for v0 — documented non-goals, not bugs. Control-flow integrity (the plan-allowlist) and a configurable allow/denylist with deny-over-allow precedence land in v1.

Attack class v0
Verbatim exfiltration (untrusted value appears literally in sink arg) ✅ deterministic
Structured exfiltration (untrusted field → sink field) ✅ deterministic
Legit egress to your own channel (destination allowlist) ✅ allowed (no false positive)
Control-flow hijack (out-of-plan sink, no untrusted bytes) ❌ v1 (plan-allowlist)
Laundering (base64 / paraphrase / transform) ❌ v2+ (opt-in quarantine)

Benchmark

PYTHONPATH=. python benchmarks/agentdojo/run.py

A self-contained reproduction of AgentDojo-style indirect-injection tasks (real AgentDojo needs a live LLM + API keys; see the harness docstring for the thin real-AgentDojo wiring). Reports ASR (attack success rate) and utility per mode.

Develop

pip install -e '.[dev,langchain,config]'
pytest

See AGENT_SLEUTH_ARCHITECTURE.MD for the full design.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

agent_sleuth-0.0.1.tar.gz (36.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

agent_sleuth-0.0.1-py3-none-any.whl (22.6 kB view details)

Uploaded Python 3

File details

Details for the file agent_sleuth-0.0.1.tar.gz.

File metadata

  • Download URL: agent_sleuth-0.0.1.tar.gz
  • Upload date:
  • Size: 36.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for agent_sleuth-0.0.1.tar.gz
Algorithm Hash digest
SHA256 a8a2fe4362ccdfd7d6e14fd869572a81353ee3347666d783ed2ea4b7dcc5235b
MD5 43c450787c622e235d5338bdb1acecf6
BLAKE2b-256 21ae1049e17d090ed515629579c45233bab0c91a80a78a6711525fd2fc9c4e7e

See more details on using hashes here.

File details

Details for the file agent_sleuth-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: agent_sleuth-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 22.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.0

File hashes

Hashes for agent_sleuth-0.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 18d84cd31b0f75940180e99c4537ad2e8a137dc1134e8d7dbef88c0876c5083a
MD5 96a87f609e82c8fbb43263a077c10d2c
BLAKE2b-256 afdd0b2efa5335b727d309f7dc13ec1fd31ce7801f4641a7964404e59fa599d7

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page