llm-taint
Lightweight taint tracking for LLM pipelines — label secrets at entry, block them at unsafe sinks.
Label secrets (API keys, tokens, passwords) at the point they enter your system. Any attempt to send a tainted value to an unsafe sink — logs, HTTP responses, tool outputs — raises an exception immediately, before the data ever leaves.
import os
from llm_taint import taint, check_sink, scrub
api_key = taint(os.environ["OPENAI_API_KEY"], label="openai_api_key")
# TaintedStr is a transparent str subclass — works everywhere str does
assert isinstance(api_key, str)
assert api_key == os.environ["OPENAI_API_KEY"]
# This raises TaintViolationError: secret 'openai_api_key' reached sink 'log'
check_sink(api_key, sink="log")
# Safe representation for logging
print(scrub(api_key)) # "[REDACTED:openai_api_key]"
Zero required dependencies. Pure Python stdlib.
Why this matters for LLM applications
LLM applications are uniquely exposed to secret leakage:
- Tool outputs are injected directly into the model context — a tainted value in a tool result means the key is in the model's input window.
- Error messages from failed API calls often contain the request headers, including auth tokens.
- Logging in async agent loops is verbose by necessity; you are one f-string away from leaking a key.
- Prompt injection attacks may try to exfiltrate secrets by causing them to appear in generated text.
Classical taint tracking from program-analysis security research, applied to the LLM stack.
Installation
pip install llm-taint
Usage
Labeling secrets
import os
from llm_taint import taint
# At startup / config load — before any processing
openai_key = taint(os.environ["OPENAI_API_KEY"], label="openai_api_key")
db_password = taint(config["db_password"], label="db_password")
Checking sinks
from llm_taint import check_sink
# Before logging any value that might be tainted
user_input = request.json["message"]
check_sink(user_input, sink="log") # safe if untainted
# Before including values in tool results
check_sink(tool_output, sink="tool_result") # raises if tainted
# Unsafe sinks (raise on tainted input):
# "log", "http_response", "tool_result", "error_message", "websocket"
# Safe sinks (always allowed):
# "llm_prompt", "vault", "encrypted"
Scrubbing for safe output
from llm_taint import scrub, scrub_dict
# Single value
logger.info("Using key: %s", scrub(api_key)) # "Using key: [REDACTED:openai_api_key]"
# Whole config dict — safe to log
safe_config = scrub_dict({"api_key": api_key, "model": "gpt-4"})
logger.debug("Config: %s", safe_config)
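Conceptually, scrubbing just swaps the taint label in for the value. A hedged sketch (the recursive dict handling here is an assumption for illustration, not necessarily scrub_dict's exact behavior):

```python
def scrub(value):
    # Tainted values carry a _taint_label attribute; everything else passes through.
    label = getattr(value, "_taint_label", None)
    return f"[REDACTED:{label}]" if label is not None else value

def scrub_dict(d):
    # Assumed behavior: scrub each value, recursing into nested dicts.
    return {k: scrub_dict(v) if isinstance(v, dict) else scrub(v)
            for k, v in d.items()}
```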
Automatic log scrubbing
Install the filter once at startup — all log records are scrubbed automatically from that point on:
from llm_taint import taint
from llm_taint.logger import install_taint_filter
install_taint_filter() # call before any logging
import logging
logger = logging.getLogger("myapp")
api_key = taint("sk-abc123", label="openai_key")
logger.info("Using key: %s", api_key)
# Output: "Using key: [REDACTED:openai_key]"
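One way to implement such a filter is to rewrite tainted %-style arguments before the record is formatted. A sketch under the same _taint_label assumption (not llm_taint's actual filter, which may also handle dict-style args and pre-formatted messages):

```python
import logging

class TaintScrubFilter(logging.Filter):
    # Redact tainted args on each record before the handler formats it.
    def filter(self, record):
        if isinstance(record.args, tuple):
            record.args = tuple(
                f"[REDACTED:{a._taint_label}]" if hasattr(a, "_taint_label") else a
                for a in record.args
            )
        return True
```

Note that logger-level filters only see records logged through that logger; attaching the filter to handlers instead also scrubs records propagated from child loggers.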
Environment variable tainting
The POSIX problem: on Linux/macOS, os.environ encodes values to bytes internally, so every read decodes a fresh plain str and the TaintedStr subclass is lost. Use taint_env_secrets + get_tainted_env to work around this:
import os
from llm_taint import taint_env_secrets, get_tainted_env, TaintedStr
# Call once at startup
taint_env_secrets(dict(os.environ))
# Later — use get_tainted_env instead of os.environ for sensitive vars
key = get_tainted_env("OPENAI_API_KEY")
assert isinstance(key, TaintedStr) # True, even on Linux/macOS
taint_env_secrets automatically taints 25+ common secret env var names (OpenAI, Anthropic, AWS, Stripe, database URLs, etc.). Add your own:
from llm_taint import add_secret_env_key
add_secret_env_key("MY_COMPANY_API_KEY")
Registering custom sinks
from llm_taint import add_safe_sink, add_unsafe_sink
add_unsafe_sink("kafka_topic") # treat as unsafe
add_safe_sink("hsm_module") # treat as safe
How it works
TaintedStr is a str subclass that carries a _taint_label attribute. It is transparent to all normal string operations — isinstance, equality, concatenation, formatting — but the label travels with it.
os.environ["API_KEY"] ──taint()──▶ TaintedStr("sk-...", label="api_key")
│
┌───────────────────┼───────────────────┐
▼ ▼ ▼
safe sink unsafe sink scrub()
(vault/encrypted) (log/http) "[REDACTED:...]"
✓ allowed ✗ TaintViolation
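The core of the trick fits in a few lines. A minimal sketch of a TaintedStr-style subclass (illustration only; the real class presumably also overrides operations such as concatenation and formatting so the label propagates through derived strings, which this sketch omits):

```python
class TaintedStr(str):
    # A str subclass carrying a taint label; behaves as a plain str everywhere.
    def __new__(cls, value, label):
        obj = super().__new__(cls, value)
        obj._taint_label = label
        return obj

key = TaintedStr("sk-abc123", label="openai_api_key")
```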
The POSIX env registry (_env_taint_registry) is an in-process dict that survives the os.environ bytes round-trip — it's the authoritative source for tainted env vars on Linux/macOS.
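The registry pattern can be sketched as follows (the names and shapes here are assumptions for illustration, including the minimal stand-in for TaintedStr; the library's _env_taint_registry may differ):

```python
import os

class _Tainted(str):
    # Minimal stand-in for TaintedStr, just for this sketch.
    def __new__(cls, value, label):
        obj = super().__new__(cls, value)
        obj._taint_label = label
        return obj

# Hypothetical registry shape: env var name -> taint label.
_env_taint_registry = {}

def register_env_secret(name, label):
    _env_taint_registry[name] = label

def get_tainted_env(name):
    # On POSIX, os.environ decodes a fresh plain str on every read, so
    # re-attach the label from the in-process registry here.
    value = os.environ.get(name)
    label = _env_taint_registry.get(name)
    if value is None or label is None:
        return value
    return _Tainted(value, label)
```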
Built-in unsafe sinks
| Sink | Rationale |
|---|---|
| log | Secrets must never appear in log files |
| http_response | Secrets must never be returned to callers |
| tool_result | Tool outputs are injected into model context |
| error_message | Error strings often end up in logs or responses |
| websocket | Streaming output goes directly to clients |
Running tests
pip install "llm-taint[dev]"
pytest
License
MIT
File details

llm_taint-0.1.0.tar.gz (source distribution)

- Size: 15.5 kB
- Uploaded via: Hatch/1.16.5 cpython/3.12.3 HTTPX/0.28.1
- Uploaded using Trusted Publishing: No

| Algorithm | Hash digest |
|---|---|
| SHA256 | bfd2aa99bd6b1e441c0b1e10d79ae94d78c03c84855a70bc6f51e9bb6c40a1aa |
| MD5 | 17070399c02c1ea953fcc1577cf8eebc |
| BLAKE2b-256 | f057d4306f098ce5bb935acd8e8b76ecf0cd27562d5393384ab0287122e8b3cd |
File details

llm_taint-0.1.0-py3-none-any.whl (built distribution, Python 3)

- Size: 10.1 kB
- Uploaded via: Hatch/1.16.5 cpython/3.12.3 HTTPX/0.28.1
- Uploaded using Trusted Publishing: No

| Algorithm | Hash digest |
|---|---|
| SHA256 | 922705480d7365cb9c046289bd2e775c597d326fba16df16ed4742e2ad6fffdf |
| MD5 | 19c579606ed31299dcb40be3a9c7a782 |
| BLAKE2b-256 | 8046cd5f6e391850de46f0d01ce85995b394f14692d673f2071b44caca19f797 |