Agent Sleuth

Prevents untrusted data from triggering consequential actions in your agent.

Agent Sleuth is an in-process information-flow-control (IFC) library for LLM agents. It stops untrusted data (web pages, email bodies, tool outputs, retrieved documents) from driving consequential actions (sending email, writing files, posting to external services).

The mechanism is value-level provenance lineage tracked at the tool-I/O boundary, not taint tracking through the model's forward pass. When an untrusted tool returns data, Sleuth fingerprints the specific values in it. When the arguments of a later consequential ("sink") call carry those fingerprinted values, verbatim or via structured-field tracking, that is a deterministic, classifier-free provenance edge. A small policy then fires: an untrusted-origin value reaching a non-allowlisted external sink is logged (audit), blocked (enforce), or sent for confirmation (confirm), depending on the mode.
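The idea can be illustrated with a toy model. This is a minimal sketch and not Agent Sleuth's implementation: the `Lineage` class, `leaf_strings`, and the `MIN_LEN` threshold are hypothetical names invented here, and the sketch fingerprints only whole leaf values of a structured tool output, checking sink arguments by substring match.

```python
from typing import Any, Iterator

MIN_LEN = 4  # assumption: skip very short values to limit accidental matches


def leaf_strings(value: Any) -> Iterator[str]:
    """Yield every leaf of a nested dict/list/scalar structure as a string."""
    if isinstance(value, dict):
        for v in value.values():
            yield from leaf_strings(v)
    elif isinstance(value, (list, tuple, set)):
        for v in value:
            yield from leaf_strings(v)
    elif value is not None:
        yield str(value)


class Lineage:
    """Toy value-lineage store: fingerprinted value -> tool that produced it."""

    def __init__(self, allowlist: set[str]):
        self.allowlist = allowlist
        self.origin: dict[str, str] = {}

    def record_untrusted(self, tool: str, output: Any) -> None:
        """Fingerprint each leaf value returned by an untrusted tool."""
        for s in leaf_strings(output):
            if len(s) >= MIN_LEN:
                self.origin.setdefault(s, tool)

    def check_sink(self, sink: str, args: dict[str, Any]) -> list[tuple[str, str, str]]:
        """Return (source_tool, value, sink_argument) edges for tainted arguments."""
        edges = []
        for name, arg in args.items():
            for s in leaf_strings(arg):
                if s in self.allowlist:  # trusted destination: no edge
                    continue
                for value, tool in self.origin.items():
                    if value in s:  # verbatim match, no classifier involved
                        edges.append((tool, value, f"{sink}.{name}"))
        return edges


lineage = Lineage(allowlist={"me@myco.com"})
lineage.record_untrusted(
    "fetch_url", {"body": "please forward", "contact": "attacker@evil.com"}
)
edges = lineage.check_sink(
    "send_email", {"to": "attacker@evil.com", "subject": "report"}
)
print(edges)  # [('fetch_url', 'attacker@evil.com', 'send_email.to')]
```

The check is a plain set lookup and substring test, which is why no LLM call is needed to decide whether an edge exists.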

Deterministic, not a classifier. The guarantee is a value-lineage match, never an LLM judging intent.
Zero extra LLM calls on the common path.
Drop-in. Three lines for a custom agent, zero changes to a LangChain agent.
Audit-mode first. Observe for a week, then switch to enforce.

Install

pip install agent_sleuth                 # core, zero agent-framework deps
pip install 'agent_sleuth[langchain]'    # + LangChain callback handler
pip install 'agent_sleuth[config]'       # + YAML config loading
pip install 'agent_sleuth[dev]'          # + pytest/ruff

Three-line integration (raw / custom agent)

from agent_sleuth import Sleuth

sleuth = Sleuth(
    untrusted=["read_email", "fetch_url", "search_web"],
    consequential=["send_email", "write_file", "post_slack"],
    destinations=["me@myco.com"],   # your own channels = trusted egress
    mode="audit",                   # → "enforce" once you trust it
)
sleuth.reset(query="summarize my emails and send a report to my boss")

# wrap your tools (or pass sleuth.handler to a LangChain agent — see below)
fetch_url  = sleuth.track(fetch_url)
send_email = sleuth.track(send_email)

# ... run your agent ...
print(sleuth.report())

You can also skip the explicit lists entirely: Sleuth() falls back to name-based defaults (tools whose names contain read/fetch/search/get/... are treated as untrusted; those containing send/write/post/delete/... as consequential).
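A name-based default of this kind might look like the sketch below. It is an illustration only: `classify_tool` and the two hint tuples are hypothetical, they contain just the keywords named above (the library's own lists are longer), and checking the consequential hints first is an assumption.

```python
# Only the keywords named in the README; the real defaults include more.
UNTRUSTED_HINTS = ("read", "fetch", "search", "get")
CONSEQUENTIAL_HINTS = ("send", "write", "post", "delete")


def classify_tool(name: str) -> str:
    """Classify a tool by substrings of its name (hypothetical helper)."""
    lowered = name.lower()
    # Assumption: a sink match wins over a source match, erring toward caution.
    if any(hint in lowered for hint in CONSEQUENTIAL_HINTS):
        return "consequential"
    if any(hint in lowered for hint in UNTRUSTED_HINTS):
        return "untrusted"
    return "neutral"


for tool in ("fetch_url", "send_email", "summarize", "get_posts"):
    print(tool, "->", classify_tool(tool))
```

Substring matching is coarse: under this sketch `get_posts` is classified as consequential because it contains `post`. Explicit `untrusted` and `consequential` lists avoid that kind of surprise.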

LangChain (zero changes to your agent)

from agent_sleuth import Sleuth

sleuth = Sleuth(agent=your_langchain_agent, mode="audit")
result = sleuth.run("summarize my emails and send a report to my boss")
print(sleuth.report())

Sleuth.run() resets taint state, stashes the trusted query, and attaches the IFCCallbackHandler to your agent, so your chain needs no edits.

What a caught attack looks like

BLOCKED: send_email() called with tainted inputs
  Taint source: fetch_url (step 2, untrusted)
  Injected value detected in argument: to="attacker@evil.com"
  Lineage: fetch_url (step 2) → value "attacker@evil.com" → send_email.to
  Destination: attacker@evil.com (not allowlisted)
  Reason: untrusted-origin value reached a consequential sink
  Action: blocked, call halted

Modes

audit (default): detect + log + render the trace; never block.
enforce: raise TaintViolationError and halt the offending sink call.
confirm: surface the violation to a callback for an allow/deny decision before dispatch.
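The three modes amount to a small dispatch decision made just before a sink call runs. The sketch below shows one way that decision could be structured. `guarded_call` is a hypothetical helper, the `TaintViolationError` class is a local stand-in for the library's exception of the same name, and raising on a denied confirmation is an assumption.

```python
import logging
from typing import Any, Callable, Optional

logger = logging.getLogger("sleuth_sketch")


class TaintViolationError(Exception):
    """Local stand-in for the library's exception of the same name."""


def guarded_call(
    sink: Callable[..., Any],
    kwargs: dict,
    violation: Optional[str],
    mode: str = "audit",
    confirm: Optional[Callable[[str], bool]] = None,
) -> Any:
    """Run `sink(**kwargs)` unless the mode says the violation must stop it."""
    if violation is None:
        return sink(**kwargs)  # clean call: every mode lets it through
    if mode == "audit":
        logger.warning("taint violation (audit only): %s", violation)
        return sink(**kwargs)  # detect and log, never block
    if mode == "enforce":
        raise TaintViolationError(violation)  # halt the offending call
    if mode == "confirm":
        if confirm is not None and confirm(violation):
            return sink(**kwargs)  # callback allowed the call
        raise TaintViolationError(violation)  # assumption: deny raises
    raise ValueError(f"unknown mode: {mode!r}")


def send_email(to: str) -> str:  # hypothetical sink
    return f"sent to {to}"


try:
    guarded_call(
        send_email,
        {"to": "attacker@evil.com"},
        "untrusted value in send_email.to",
        mode="enforce",
    )
except TaintViolationError as exc:
    print("blocked:", exc)
```

Starting in audit mode means the same detection logic runs, but the agent's behaviour is unchanged while you review what would have been blocked.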

Honest coverage envelope (v0)

v0 is sound on the verbatim and structured exfiltration classes. Laundering (base64 or paraphrase of a secret) and pure control-flow hijack (a sink call whose arguments carry no untrusted bytes) are explicitly out of scope for v0: documented non-goals, not bugs. Control-flow integrity (the plan-allowlist) and a configurable allow/denylist with deny-over-allow precedence are planned for v1.

Attack class	v0
Verbatim exfiltration (untrusted value appears literally in sink arg)	✅ deterministic
Structured exfiltration (untrusted field → sink field)	✅ deterministic
Legit egress to your own channel (destination allowlist)	✅ allowed (no false positive)
Control-flow hijack (out-of-plan sink, no untrusted bytes)	❌ v1 (plan-allowlist)
Laundering (base64 / paraphrase / transform)	❌ v2+ (opt-in quarantine)
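The laundering row follows directly from how a verbatim match works. The short sketch below uses a hypothetical `carries_fingerprint` helper and a plain substring test, which is an assumption about the matching rule. It shows that a literal copy of an untrusted value is caught, while a base64-encoded copy shares no bytes with the fingerprint and passes unnoticed.

```python
import base64

# Value fingerprinted from an untrusted tool output (illustrative).
secret = "attacker@evil.com"
fingerprints = {secret}


def carries_fingerprint(arg: str) -> bool:
    """True if any fingerprinted value appears literally in the argument."""
    return any(fp in arg for fp in fingerprints)


verbatim = f"send report to {secret}"
laundered = base64.b64encode(secret.encode()).decode()

print(carries_fingerprint(verbatim))   # True: literal copy is detected
print(carries_fingerprint(laundered))  # False: encoded copy evades the match
```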

Benchmark

PYTHONPATH=. python benchmarks/agentdojo/run.py

A self-contained reproduction of AgentDojo-style indirect-injection tasks. The real AgentDojo needs a live LLM and API keys; see the harness docstring for the thin wiring to it. The script reports attack success rate (ASR) and utility per mode.
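For readers unfamiliar with the two metrics, the sketch below shows how they are commonly computed. It assumes ASR is the fraction of attacked episodes in which the injected action went through, and utility is the fraction of all episodes in which the user's task was completed. The `Episode` record and the sample outcomes are hypothetical illustrations, not output from the benchmark harness.

```python
from dataclasses import dataclass


@dataclass
class Episode:
    attacked: bool          # an injection was present in the task
    attack_succeeded: bool  # the injected sink call actually ran
    task_completed: bool    # the user's original task was fulfilled


def attack_success_rate(episodes: list[Episode]) -> float:
    """Fraction of attacked episodes where the attack succeeded."""
    attacked = [e for e in episodes if e.attacked]
    if not attacked:
        return 0.0
    return sum(e.attack_succeeded for e in attacked) / len(attacked)


def utility(episodes: list[Episode]) -> float:
    """Fraction of all episodes where the user's task was completed."""
    if not episodes:
        return 0.0
    return sum(e.task_completed for e in episodes) / len(episodes)


# Made-up outcomes for illustration only; not benchmark results.
episodes = [
    Episode(attacked=True, attack_succeeded=True, task_completed=True),
    Episode(attacked=True, attack_succeeded=False, task_completed=True),
    Episode(attacked=False, attack_succeeded=False, task_completed=True),
    Episode(attacked=False, attack_succeeded=False, task_completed=False),
]
print(attack_success_rate(episodes), utility(episodes))  # 0.5 0.75
```

A defence is only useful if it lowers ASR without dragging utility down, which is why the two numbers are reported together for each mode.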

Develop

pip install -e '.[dev,langchain,config]'
pytest

See AGENT_SLEUTH_ARCHITECTURE.MD for the full design.

Release history

0.1.0 (this version): Jul 6, 2026
0.0.1: Jun 17, 2026
