envault

What your LLM sees, stays safe.

Every time you paste code into an LLM, send a document to an API, or build an agent that reads files, API keys, database passwords, and personal data can travel with the content. It happens often, and mostly by accident.

envault sits between your application and any LLM API. Before text leaves your system, it detects and redacts sensitive content — replacing secrets with typed, numbered placeholders stored in an ephemeral in-memory vault. The process is fully reversible: if you need original values back, the vault restores them.

"postgres://admin:hunter2@prod.db.internal:5432/users"
                           ↓ envault
"postgres://[DB_USER_1]:[DB_PASSWORD_1]@[HOSTNAME_1]:5432/[DB_NAME_1]"

Why envault?

  • Zero infrastructure — pure Python, no external services, no database, no cloud dependency
  • Reversible — redaction is stateful; original values can be restored from the vault after the LLM responds
  • Model-agnostic — works with Anthropic, OpenAI, or any LLM SDK
  • Layered detection — fast regex patterns first, Shannon entropy heuristics second, optional spaCy NLP third
  • Non-blocking — envault informs and redacts; the caller decides what to do next
  • Three surfaces — use it as a Python library, a shell pipe, or an MCP server for agent pipelines

Installation

# Core library + CLI (no optional deps)
pip install envaultx

# With the Anthropic wrapper
pip install "envaultx[anthropic]"

# With the OpenAI wrapper
pip install "envaultx[openai]"

# With NLP-based PII detection (spaCy)
pip install "envaultx[nlp]"
python -m spacy download en_core_web_sm

# With encrypted vault serialization
pip install "envaultx[crypto]"

# Everything
pip install "envaultx[all]"

Requires Python 3.9+.


Quickstart

Python library

from envault import Envault

ev = Envault()

# Inspect what would be redacted, without modifying anything
result = ev.scan("My key is sk-proj-abc123XYZdef456GHI789jkl012MN")
print(result.risk_level)    # "high"
print(result.has_secrets)   # True
print(result.summary)       # {"OPENAI_KEY": 1}

# Redact: replace secrets with typed placeholders
safe, vault = ev.redact(
    "Connect to postgres://admin:hunter2@prod.db.internal:5432/users"
)
# safe  → "Connect to postgres://[DB_USER_1]:[DB_PASSWORD_1]@[HOSTNAME_1]:5432/[DB_NAME_1]"
# vault → Vault(4 entries)

# Send `safe` to the LLM...

# Restore: put original values back into an LLM response
restored = ev.restore("The password [DB_PASSWORD_1] is weak.", vault)
# → "The password hunter2 is weak."

# Sanitize: one-way, no vault, no restoration possible
clean = ev.sanitize("Contact me at alice@example.com or +1-555-867-5309")
# → "Contact me at [REDACTED] or [REDACTED]"

Drop-in SDK wrappers

Replace your existing Anthropic or OpenAI client with AnthropicVault / OpenAIVault. The API is identical — envault intercepts every message before it leaves your process.

import anthropic
from envault.wrappers import AnthropicVault

client = AnthropicVault(
    anthropic_client=anthropic.Anthropic(api_key="..."),
    on_detection=lambda result: print(f"Redacted: {result.summary}"),
)

response = client.messages.create(
    model="claude-sonnet-4-6",
    max_tokens=1024,
    messages=[{
        "role": "user",
        "content": "Review this config: DB_PASSWORD=hunter2 HOST=prod.internal"
    }],
)
# Secrets were automatically redacted before the API call.
# Access the vault from the response if you need to restore values:
vault = response._vault

import openai
from envault.wrappers import OpenAIVault

client = OpenAIVault(openai_client=openai.OpenAI(api_key="..."))
# Usage is identical to openai.OpenAI
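
The wrappers follow a redact, call, restore flow. The sketch below illustrates that flow only; it is not envault's implementation. `guarded_call`, `toy_redact`, `toy_restore`, and `fake_llm` are hypothetical stand-ins, and the toy redactor knows a single hard-coded secret so the example can run without envault or network access.

```python
from typing import Callable, Dict, Tuple

# Hypothetical type aliases; none of these names come from envault.
Redactor = Callable[[str], Tuple[str, Dict[str, str]]]
Restorer = Callable[[str, Dict[str, str]], str]


def guarded_call(prompt: str, redact: Redactor, restore: Restorer,
                 llm: Callable[[str], str]) -> str:
    """Redact the prompt, call the model, then restore placeholders in the reply."""
    safe_prompt, vault = redact(prompt)
    reply = llm(safe_prompt)  # the model only ever receives placeholders
    return restore(reply, vault)


def toy_redact(text: str) -> Tuple[str, Dict[str, str]]:
    # Toy redactor that knows exactly one secret; a real one runs detection.
    secret = "hunter2"
    if secret not in text:
        return text, {}
    return text.replace(secret, "[DB_PASSWORD_1]"), {"[DB_PASSWORD_1]": secret}


def toy_restore(text: str, vault: Dict[str, str]) -> str:
    # Substitute every placeholder in the vault with its original value.
    for placeholder, original in vault.items():
        text = text.replace(placeholder, original)
    return text


def fake_llm(prompt: str) -> str:
    # Stand-in for a network call; echoes what it received.
    return f"I received: {prompt}"
```

Passing the model call in as a function keeps the flow testable without a real API key; the same shape applies whichever SDK sits behind `llm`.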

CLI

Pipe any text through envault in a shell script or CI/CD pipeline.

# Scan a file and report detections (exits 1 if secrets found)
envault scan --file config.py
envault scan --file .env --format json

# Redact via pipe — safe to feed directly into another tool
cat config.py | envault redact --stdin | llm-tool --prompt "explain this"

# Save the vault so you can restore later
envault redact --file document.txt --output safe.txt --vault-out vault.enc --vault-password "$SESSION_KEY"

# Restore placeholders in an LLM response
envault restore --vault vault.enc --vault-password "$SESSION_KEY" --file response.txt

# One-way sanitization (no vault written)
cat user_data.csv | envault redact --stdin --sanitize > safe_data.csv

# List all built-in detection patterns
envault patterns
envault patterns --format json

Exit codes: 0 = clean / success · 1 = secrets detected · 2 = I/O error · 3 = config error · 4 = invalid arguments
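
When the CLI is driven from a Python build script, the exit codes above can be translated into readable outcomes. This is a minimal sketch that assumes the `envault` executable is on PATH; `EXIT_MEANINGS`, `describe_exit`, and `run_scan` are illustrative names, not part of envault.

```python
import subprocess
from typing import List

# Exit codes as documented above.
EXIT_MEANINGS = {
    0: "clean / success",
    1: "secrets detected",
    2: "I/O error",
    3: "config error",
    4: "invalid arguments",
}


def describe_exit(code: int) -> str:
    """Translate an envault exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unknown exit code {code}")


def run_scan(path: str, executable: str = "envault") -> str:
    """Run `envault scan --file PATH` and describe the resulting exit code.

    Raises FileNotFoundError if the executable is not installed.
    """
    argv: List[str] = [executable, "scan", "--file", path]
    completed = subprocess.run(argv, capture_output=True, text=True)
    return describe_exit(completed.returncode)
```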

MCP server

Run envault as an MCP server so any agent or tool that speaks the Model Context Protocol can call it directly.

envault mcp                              # stdio transport (Claude Desktop, default)
envault mcp --transport http --port 8080 # HTTP transport

Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "envault": {
      "command": "envault",
      "args": ["mcp"]
    }
  }
}

Available MCP tools:

| Tool | Description |
| --- | --- |
| redact_text | Redact secrets from text; returns a session_id for later restoration |
| scan_text | Inspect detections without modifying the text |
| scan_external_content | Stricter scan for web/file content; also detects prompt injection |
| restore_text | Restore placeholders in an LLM response using a session_id |
| list_vault_sessions | List active in-memory vault sessions |
| clear_vault_session | Permanently clear a session from memory |

Sessions are in-memory only and expire after 1 hour of inactivity.
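
Inactivity-based expiry can be pictured as a store that tracks when each session was last touched. The sketch below is an assumption about how such a store might look, not envault's code; `SessionStore` and its methods are hypothetical, and the clock is injectable so the behaviour can be checked without waiting an hour.

```python
import time
from typing import Callable, Dict, Optional


class SessionStore:
    """In-memory sessions that expire after a period of inactivity (illustrative only)."""

    def __init__(self, timeout_seconds: float = 3600.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._timeout = timeout_seconds
        self._clock = clock
        self._vaults: Dict[str, Dict[str, str]] = {}
        self._last_used: Dict[str, float] = {}

    def put(self, session_id: str, vault: Dict[str, str]) -> None:
        self._vaults[session_id] = vault
        self._last_used[session_id] = self._clock()

    def get(self, session_id: str) -> Optional[Dict[str, str]]:
        """Return the vault and refresh its timer, or None if missing or expired."""
        last = self._last_used.get(session_id)
        if last is None:
            return None
        if self._clock() - last > self._timeout:
            self.clear(session_id)  # expired: drop the secrets from memory
            return None
        self._last_used[session_id] = self._clock()
        return self._vaults[session_id]

    def clear(self, session_id: str) -> None:
        self._vaults.pop(session_id, None)
        self._last_used.pop(session_id, None)
```

Because every successful `get` refreshes the timer, a session in regular use stays alive, while one left idle for longer than the timeout is cleared on the next access.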


What envault detects

Layer 1 — Pattern matching (always on)

Regex detection of well-known secret formats. Every match includes a confidence score.

| Category | Example | Confidence |
| --- | --- | --- |
| OPENAI_KEY | sk-proj-... | 1.0 |
| ANTHROPIC_KEY | sk-ant-api03-... | 1.0 |
| AWS_ACCESS_KEY | AKIA... | 1.0 |
| AWS_SECRET_KEY | 40-char base64 adjacent to AWS context | 0.95 |
| GITHUB_TOKEN | ghp_..., ghs_... | 1.0 |
| STRIPE_KEY | sk_live_..., sk_test_... | 1.0 |
| STRIPE_WEBHOOK | whsec_... | 1.0 |
| GOOGLE_API_KEY | AIza... | 1.0 |
| SLACK_TOKEN | xoxb-..., xoxp-... | 1.0 |
| SLACK_WEBHOOK | https://hooks.slack.com/services/... | 1.0 |
| SENDGRID_KEY | SG.... | 1.0 |
| TWILIO_SID | AC + 32 hex chars | 1.0 |
| JWT_TOKEN | Three base64url segments | 0.95 |
| PRIVATE_KEY_PEM | -----BEGIN * PRIVATE KEY----- | 1.0 |
| DB_CONNECTION_STRING | postgres://user:pass@host/db | 0.95 |
| DB_USER / DB_PASSWORD | Components of a connection string | 0.9 / 0.95 |
| BEARER_TOKEN | Authorization: Bearer ... | 0.9 |
| BASIC_AUTH | Authorization: Basic ... | 0.95 |
| EMAIL | RFC 5322 email addresses | 0.85 |
| PHONE_NUMBER | E.164 and common national formats | 0.8 |
| CREDIT_CARD | Luhn-validated 13–19 digit sequences | 0.95 |
| SSN | US Social Security Numbers | 0.95 |
| IBAN | International Bank Account Numbers | 0.9 |
| IP_ADDRESS_PRIVATE | 10.x, 192.168.x, 172.16-31.x | 0.7 |
| HOSTNAME_INTERNAL | .internal, .local, .corp hostnames | 0.75 |
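
The CREDIT_CARD row mentions Luhn validation, which filters out most random digit runs. The following is a standalone sketch of the standard Luhn checksum, not envault's code; `luhn_valid` is an illustrative name, and the 13 to 19 digit bound mirrors the table.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Keep ASCII digits only, so spaces and dashes in card numbers are ignored.
    digits = [int(ch) for ch in number if "0" <= ch <= "9"]
    if not 13 <= len(digits) <= 19:
        return False
    total = 0
    # Walk from the rightmost digit and double every second one.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

For instance, the widely used test number `4111 1111 1111 1111` passes this check, while changing its last digit to 2 makes it fail.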

Layer 2 — Entropy heuristics (always on)

Detects secrets that don't match known formats by computing Shannon entropy. Confidence is adjusted by context:

  • Variable name contains key, secret, token, password → +0.2
  • Right-hand side of an assignment → +0.1
  • Looks like a UUID → −0.3 (likely not a secret)
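
Shannon entropy over a string's characters is H = -Σ p(c) · log2 p(c), measured in bits per character. The sketch below computes it and applies the three context adjustments listed above. How envault turns entropy into a base confidence is not described here, so `base` is simply an input; the function names and the clamping to the range 0 to 1 are assumptions made for illustration.

```python
import math
import re
from collections import Counter

UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
SENSITIVE_NAMES = ("key", "secret", "token", "password")


def shannon_entropy(text: str) -> float:
    """Entropy in bits per character, from the character frequencies of `text`."""
    if not text:
        return 0.0
    counts = Counter(text)
    n = len(text)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def adjust_confidence(base: float, value: str, variable_name: str = "",
                      is_assignment_rhs: bool = False) -> float:
    """Apply the documented context adjustments; clamping is an assumption."""
    score = base
    if any(word in variable_name.lower() for word in SENSITIVE_NAMES):
        score += 0.2  # variable name hints at a secret
    if is_assignment_rhs:
        score += 0.1  # value sits on the right-hand side of an assignment
    if UUID_RE.match(value):
        score -= 0.3  # UUIDs are high-entropy but usually not secret
    return max(0.0, min(1.0, score))
```

A string of one repeated character has zero entropy, while four distinct characters used once each give exactly 2 bits per character, which is why long random tokens stand out against ordinary identifiers.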

Layer 3 — NLP / spaCy (opt-in)

Enable with nlp=True or the --nlp flag. Requires pip install "envaultx[nlp]".

Detects PII in natural language prose: names in personal context, physical addresses, phone numbers and emails that Layer 1 misses due to non-standard formatting.

Prompt injection detection

scan_external() and the scan_external_content MCP tool apply stricter rules for content fetched from external sources (web pages, uploaded files, API responses). Any content containing imperative AI-directed instructions ("ignore previous instructions", "you are now", "forget everything") is flagged as PROMPT_INJECTION.
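
A phrase-based check of this kind can be sketched with a case-insensitive regex. The example uses only the three phrases quoted above; envault's actual rule set is not documented in this README, and `INJECTION_PHRASES` and `find_injection_phrases` are hypothetical names.

```python
import re
from typing import List

# The three phrases quoted above; a real rule set would be broader.
INJECTION_PHRASES = (
    "ignore previous instructions",
    "you are now",
    "forget everything",
)
_PATTERN = re.compile(
    "|".join(re.escape(phrase) for phrase in INJECTION_PHRASES),
    re.IGNORECASE,
)


def find_injection_phrases(text: str) -> List[str]:
    """Return the matched phrases, lowercased, in order of appearance."""
    return [match.group(0).lower() for match in _PATTERN.finditer(text)]
```

Exact phrase matching is easy to evade with rewording, so a check like this is best treated as one signal among several rather than a complete defence.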


How the vault works

Original text:    "api_key = sk-proj-abc123..."
                              ↓ redact()
Redacted text:    "api_key = [OPENAI_KEY_1]"

Vault (in-memory): { "[OPENAI_KEY_1]": "sk-proj-abc123..." }

LLM response:     "The key [OPENAI_KEY_1] has been rotated."
                              ↓ restore()
Restored:         "The key sk-proj-abc123... has been rotated."

  • The vault lives only in memory; envault writes nothing to disk unless you explicitly serialize the vault (for example with --vault-out)
  • The same secret value always maps to the same placeholder within a session (deduplication)
  • Placeholders are typed and numbered: [OPENAI_KEY_1], [EMAIL_3], [DB_PASSWORD_1]
  • For cross-process use, serialize the vault to encrypted bytes (requires envaultx[crypto]):

# Encrypt
encrypted = vault.to_encrypted_bytes(password="session-secret")

# Decrypt in another process
from envault import Vault
restored_vault = Vault.from_encrypted_bytes(encrypted, password="session-secret")
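
The deduplication and numbering rules described in this section can be sketched with two dictionaries and a per-category counter. This is a toy model for illustration, not envault's `Vault` class; `ToyVault`, `redact_emails`, and the simplified email regex are all assumptions.

```python
import re
from typing import Dict


class ToyVault:
    """Maps typed, numbered placeholders to original values (illustrative only)."""

    def __init__(self) -> None:
        self._by_placeholder: Dict[str, str] = {}
        self._by_value: Dict[str, str] = {}
        self._counters: Dict[str, int] = {}

    def placeholder_for(self, category: str, value: str) -> str:
        # Deduplication: a value seen before reuses its placeholder.
        if value in self._by_value:
            return self._by_value[value]
        self._counters[category] = self._counters.get(category, 0) + 1
        placeholder = f"[{category}_{self._counters[category]}]"
        self._by_value[value] = placeholder
        self._by_placeholder[placeholder] = value
        return placeholder

    def restore(self, text: str) -> str:
        # Replace each known placeholder; unknown bracketed text is left alone.
        return re.sub(
            r"\[[A-Z_]+_\d+\]",
            lambda m: self._by_placeholder.get(m.group(0), m.group(0)),
            text,
        )


# Deliberately simplified email pattern, not RFC 5322.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def redact_emails(text: str, vault: ToyVault) -> str:
    """Replace every email address with its placeholder from the vault."""
    return EMAIL_RE.sub(lambda m: vault.placeholder_for("EMAIL", m.group(0)), text)
```

Keeping a reverse index from value to placeholder is what makes repeated occurrences of one secret collapse to a single placeholder, so the model can still tell that two mentions refer to the same thing.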

Configuration

envault looks for config in this order:

  1. ENVAULT_CONFIG environment variable (path to a TOML file)
  2. .envault.toml in the current directory
  3. ~/.config/envault/config.toml
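
The lookup order above amounts to a first-match search over three candidate paths. The sketch below shows one way to express it; `find_config` is a hypothetical helper, and falling through to the next location when ENVAULT_CONFIG points at a missing file is an assumption, since the README does not say how envault handles that case.

```python
import os
from pathlib import Path
from typing import List, Mapping, Optional


def find_config(env: Mapping[str, str], cwd: Path, home: Path) -> Optional[Path]:
    """Return the first config file that exists, following the documented order."""
    candidates: List[Path] = []
    if env.get("ENVAULT_CONFIG"):
        candidates.append(Path(env["ENVAULT_CONFIG"]))           # 1. explicit path
    candidates.append(cwd / ".envault.toml")                      # 2. project-local
    candidates.append(home / ".config" / "envault" / "config.toml")  # 3. per-user
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None


# Typical call: find_config(os.environ, Path.cwd(), Path.home())
```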

Copy .envault.toml.example to .envault.toml to get started:

[detection]
threshold = 0.5                        # redact anything with confidence >= this
nlp = false                            # enable spaCy NLP layer
exclude_categories = []                # e.g. ["IP_ADDRESS_PRIVATE", "PHONE_NUMBER"]

[entropy]
min_string_length = 20
base64_threshold = 4.5
hex_threshold = 3.5

[mcp]
transport = "stdio"
port = 8080
session_timeout_minutes = 60

All values can be overridden with environment variables:

| Variable | Effect |
| --- | --- |
| ENVAULT_THRESHOLD | Confidence threshold (e.g. 0.7) |
| ENVAULT_NLP | Enable NLP layer (true / false) |
| ENVAULT_EXCLUDE | Comma-separated categories to skip |
| ENVAULT_MCP_PORT | MCP HTTP port |
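
To make the override behaviour concrete, the sketch below layers the four variables over a config dictionary shaped like the TOML example. The mapping from each variable to a config key is inferred from the names and is an assumption, as is the helper name `apply_env_overrides`.

```python
from typing import Any, Dict, Mapping


def apply_env_overrides(config: Dict[str, Any], env: Mapping[str, str]) -> Dict[str, Any]:
    """Return a copy of `config` with the documented ENVAULT_* variables applied."""
    # Copy one level deep so the caller's sections are not mutated.
    merged = {section: dict(values) for section, values in config.items()}
    detection = merged.setdefault("detection", {})
    mcp = merged.setdefault("mcp", {})
    if "ENVAULT_THRESHOLD" in env:
        detection["threshold"] = float(env["ENVAULT_THRESHOLD"])
    if "ENVAULT_NLP" in env:
        detection["nlp"] = env["ENVAULT_NLP"].strip().lower() == "true"
    if "ENVAULT_EXCLUDE" in env:
        detection["exclude_categories"] = [
            name.strip() for name in env["ENVAULT_EXCLUDE"].split(",") if name.strip()
        ]
    if "ENVAULT_MCP_PORT" in env:
        mcp["port"] = int(env["ENVAULT_MCP_PORT"])
    return merged
```

Environment values always arrive as strings, so each one is parsed into the type the config file would have used: a float, a boolean, a list, or an integer.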

What envault is not

  • Not a security gateway: envault redacts and reports; it does not block requests
  • Not persistent storage: the vault is session-scoped and lives in memory unless you explicitly serialize it
  • Not a secrets manager: use HashiCorp Vault, AWS Secrets Manager, etc. for production secret storage
  • Not a git scanner: use TruffleHog or gitleaks for history scanning
  • Not a network proxy: envault operates on text in your process, not at the network layer

Development

git clone https://github.com/pratiksonigra/envault
cd envault
pip install -e ".[dev,all]"
pytest
pytest --cov=envault --cov-report=term-missing

Tests are organized per module in tests/. Fixtures in tests/fixtures/ cover clean samples, known-secret samples (with secrets at documented positions), and edge cases (Unicode, empty input, long lines).


License

MIT
