What your LLM sees, stays safe.
Project description
envault
What your LLM sees, stays safe.
Every time you paste code into an LLM, send a document to an API, or build an agent that reads files — there is a non-zero chance that API keys, database passwords, and personal data travel with the content. This happens constantly and mostly by accident.
envault sits between your application and any LLM API. Before text leaves your system, it detects and redacts sensitive content — replacing secrets with typed, numbered placeholders stored in an ephemeral in-memory vault. The process is fully reversible: if you need original values back, the vault restores them.
"postgres://admin:hunter2@prod.db.internal:5432/users"
↓ envault
"postgres://[DB_USER_1]:[DB_PASSWORD_1]@[HOSTNAME_1]:5432/[DB_NAME_1]"
Why envault?
- Zero infrastructure — pure Python, no external services, no database, no cloud dependency
- Reversible — redaction is stateful; original values can be restored from the vault after the LLM responds
- Model-agnostic — works with Anthropic, OpenAI, or any LLM SDK
- Layered detection — fast regex patterns first, Shannon entropy heuristics second, optional spaCy NLP third
- Non-blocking — envault informs and redacts; the caller decides what to do next
- Three surfaces — use it as a Python library, a shell pipe, or an MCP server for agent pipelines
Installation
# Core library + CLI (no optional deps)
pip install envaultx
# With the Anthropic wrapper
pip install "envault[anthropic]"
# With the OpenAI wrapper
pip install "envault[openai]"
# With NLP-based PII detection (spaCy)
pip install "envault[nlp]"
python -m spacy download en_core_web_sm
# With encrypted vault serialization
pip install "envault[crypto]"
# Everything
pip install "envault[all]"
Requires Python 3.9+.
Quickstart
Python library
from envault import Envault
ev = Envault()
# Inspect what would be redacted — without modifying anything
result = ev.scan("My key is sk-proj-abc123XYZdef456GHI789jkl012MN")
print(result.risk_level) # "high"
print(result.has_secrets) # True
print(result.summary) # {"OPENAI_KEY": 1}
# Redact — replace secrets with typed placeholders
safe, vault = ev.redact(
"Connect to postgres://admin:hunter2@prod.db.internal:5432/users"
)
# safe → "Connect to postgres://[DB_USER_1]:[DB_PASSWORD_1]@[HOSTNAME_1]:5432/[DB_NAME_1]"
# vault → Vault(4 entries)
# Send `safe` to the LLM...
# Restore — put original values back from an LLM response
restored = ev.restore("The password [DB_PASSWORD_1] is weak.", vault)
# → "The password hunter2 is weak."
# Sanitize — one-way, no vault, no restoration possible
clean = ev.sanitize("Contact me at alice@example.com or +1-555-867-5309")
# → "Contact me at [REDACTED] or [REDACTED]"
Drop-in SDK wrappers
Replace your existing Anthropic or OpenAI client with AnthropicVault / OpenAIVault. The API is identical — envault intercepts every message before it leaves your process.
import anthropic
from envault.wrappers import AnthropicVault
client = AnthropicVault(
anthropic_client=anthropic.Anthropic(api_key="..."),
on_detection=lambda result: print(f"Redacted: {result.summary}"),
)
response = client.messages.create(
model="claude-sonnet-4-6",
max_tokens=1024,
messages=[{
"role": "user",
"content": "Review this config: DB_PASSWORD=hunter2 HOST=prod.internal"
}],
)
# Secrets were automatically redacted before the API call.
# Access the vault from the response if you need to restore values:
vault = response._vault
import openai
from envault.wrappers import OpenAIVault
client = OpenAIVault(openai_client=openai.OpenAI(api_key="..."))
# Usage is identical to openai.OpenAI
CLI
Pipe any text through envault in a shell script or CI/CD pipeline.
# Scan a file and report detections (exits 1 if secrets found)
envault scan --file config.py
envault scan --file .env --format json
# Redact via pipe — safe to feed directly into another tool
cat config.py | envault redact --stdin | llm-tool --prompt "explain this"
# Save the vault so you can restore later
envault redact --file document.txt --output safe.txt --vault-out vault.enc --vault-password "$SESSION_KEY"
# Restore placeholders in an LLM response
envault restore --vault vault.enc --vault-password "$SESSION_KEY" --file response.txt
# One-way sanitization (no vault written)
cat user_data.csv | envault redact --stdin --sanitize > safe_data.csv
# List all built-in detection patterns
envault patterns
envault patterns --format json
Exit codes: 0 = clean / success · 1 = secrets detected · 2 = I/O error · 3 = config error · 4 = invalid arguments
MCP server
Run envault as an MCP server so any agent or tool that speaks the Model Context Protocol can call it directly.
envault mcp # stdio transport (Claude Desktop, default)
envault mcp --transport http --port 8080 # HTTP transport
Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):
{
"mcpServers": {
"envault": {
"command": "envault",
"args": ["mcp"]
}
}
}
Available MCP tools:
| Tool | Description |
|---|---|
redact_text |
Redact secrets from text, returns a session_id for later restoration |
scan_text |
Inspect detections without modifying the text |
scan_external_content |
Stricter scan for web/file content — also detects prompt injection |
restore_text |
Restore placeholders in an LLM response using a session_id |
list_vault_sessions |
List active in-memory vault sessions |
clear_vault_session |
Permanently clear a session from memory |
Sessions are in-memory only and expire after 1 hour of inactivity.
What envault detects
Layer 1 — Pattern matching (always on)
Regex detection of well-known secret formats. Every match includes a confidence score.
| Category | Example | Confidence |
|---|---|---|
OPENAI_KEY |
sk-proj-... |
1.0 |
ANTHROPIC_KEY |
sk-ant-api03-... |
1.0 |
AWS_ACCESS_KEY |
AKIA... |
1.0 |
AWS_SECRET_KEY |
40-char base64 adjacent to AWS context | 0.95 |
GITHUB_TOKEN |
ghp_..., ghs_... |
1.0 |
STRIPE_KEY |
sk_live_..., sk_test_... |
1.0 |
STRIPE_WEBHOOK |
whsec_... |
1.0 |
GOOGLE_API_KEY |
AIza... |
1.0 |
SLACK_TOKEN |
xoxb-..., xoxp-... |
1.0 |
SLACK_WEBHOOK |
https://hooks.slack.com/services/... |
1.0 |
SENDGRID_KEY |
SG.... |
1.0 |
TWILIO_SID |
AC + 32 hex chars |
1.0 |
JWT_TOKEN |
Three base64url segments | 0.95 |
PRIVATE_KEY_PEM |
-----BEGIN * PRIVATE KEY----- |
1.0 |
DB_CONNECTION_STRING |
postgres://user:pass@host/db |
0.95 |
DB_USER / DB_PASSWORD |
Components of a connection string | 0.9 / 0.95 |
BEARER_TOKEN |
Authorization: Bearer ... |
0.9 |
BASIC_AUTH |
Authorization: Basic ... |
0.95 |
EMAIL |
RFC 5322 email addresses | 0.85 |
PHONE_NUMBER |
E.164 and common national formats | 0.8 |
CREDIT_CARD |
Luhn-validated 13–19 digit sequences | 0.95 |
SSN |
US Social Security Numbers | 0.95 |
IBAN |
International Bank Account Numbers | 0.9 |
IP_ADDRESS_PRIVATE |
10.x, 192.168.x, 172.16-31.x |
0.7 |
HOSTNAME_INTERNAL |
.internal, .local, .corp hostnames |
0.75 |
Layer 2 — Entropy heuristics (always on)
Detects secrets that don't match known formats by computing Shannon entropy. Confidence is adjusted by context:
- Variable name contains
key,secret,token,password→ +0.2 - Right-hand side of an assignment → +0.1
- Looks like a UUID → −0.3 (likely not a secret)
Layer 3 — NLP / spaCy (opt-in)
Enable with nlp=True or --nlp flag. Requires pip install "envault[nlp]".
Detects PII in natural language prose: names in personal context, physical addresses, phone numbers and emails that Layer 1 misses due to non-standard formatting.
Prompt injection detection
scan_external() and the scan_external_content MCP tool apply stricter rules for content fetched from external sources (web pages, uploaded files, API responses). Any content containing imperative AI-directed instructions ("ignore previous instructions", "you are now", "forget everything") is flagged as PROMPT_INJECTION.
How the vault works
Original text: "api_key = sk-proj-abc123..."
↓ redact()
Redacted text: "api_key = [OPENAI_KEY_1]"
Vault (in-memory): { "[OPENAI_KEY_1]": "sk-proj-abc123..." }
LLM response: "The key [OPENAI_KEY_1] has been rotated."
↓ restore()
Restored: "The key sk-proj-abc123... has been rotated."
- The vault lives only in memory — nothing is ever written to disk by envault itself
- The same secret value always maps to the same placeholder within a session (deduplication)
- Placeholders are typed and numbered:
[OPENAI_KEY_1],[EMAIL_3],[DB_PASSWORD_1] - For cross-process use, serialize the vault to encrypted bytes (requires
envault[crypto]):
# Encrypt
encrypted = vault.to_encrypted_bytes(password="session-secret")
# Decrypt in another process
from envault import Vault
restored_vault = Vault.from_encrypted_bytes(encrypted, password="session-secret")
Configuration
envault looks for config in this order:
ENVAULT_CONFIGenvironment variable (path to a TOML file).envault.tomlin the current directory~/.config/envault/config.toml
Copy .envault.toml.example to .envault.toml to get started:
[detection]
threshold = 0.5 # redact anything with confidence >= this
nlp = false # enable spaCy NLP layer
exclude_categories = [] # e.g. ["IP_ADDRESS_PRIVATE", "PHONE_NUMBER"]
[entropy]
min_string_length = 20
base64_threshold = 4.5
hex_threshold = 3.5
[mcp]
transport = "stdio"
port = 8080
session_timeout_minutes = 60
All values can be overridden with environment variables:
| Variable | Effect |
|---|---|
ENVAULT_THRESHOLD |
Confidence threshold (e.g. 0.7) |
ENVAULT_NLP |
Enable NLP layer (true / false) |
ENVAULT_EXCLUDE |
Comma-separated categories to skip |
ENVAULT_MCP_PORT |
MCP HTTP port |
What envault is not
- Not a security gateway — envault redacts and reports; it does not block requests
- Not persistent storage — the vault is session-scoped and in-memory only
- Not a secrets manager — use Vault, AWS Secrets Manager, etc. for production secret storage
- Not a git scanner — use TruffleHog or gitleaks for history scanning
- Not a network proxy — envault operates on text in your process, not at the network layer
Development
git clone https://github.com/pratiksonigra/envault
cd envault
pip install -e ".[dev,all]"
pytest
pytest --cov=envault --cov-report=term-missing
Tests are organized per module in tests/. Fixtures in tests/fixtures/ cover clean samples, known-secret samples (with secrets at documented positions), and edge cases (Unicode, empty input, long lines).
License
MIT
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file envaultx-0.1.0.tar.gz.
File metadata
- Download URL: envaultx-0.1.0.tar.gz
- Upload date:
- Size: 33.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fdcf83ddb9ff35b8bef9e1f6fb79732149f8f44cc6db9d5d4a5a462e36676904
|
|
| MD5 |
1f5ae987d1a6fee598f90d95edbe95cf
|
|
| BLAKE2b-256 |
30815861fbda3e83a5e4b0f449d08e719c0055b0b4490026527c3b671444aa31
|
File details
Details for the file envaultx-0.1.0-py3-none-any.whl.
File metadata
- Download URL: envaultx-0.1.0-py3-none-any.whl
- Upload date:
- Size: 20.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
20259c270011abcd44789c7f236eaff3eafeb3a97975e1283f3247c6dc2f075f
|
|
| MD5 |
66c425dccccd12e6b4a5df57110e4f48
|
|
| BLAKE2b-256 |
cb28f8f1bafefc292931a2f50835350a82866b37af04252f00aefb3557f7e57c
|