Secure, LGPD-compliant middleware for protecting sensitive data in LLM prompts
Aegis Vault 🛡️
Secure, LGPD-compliant middleware for protecting sensitive data in LLM prompts. Aegis Vault automatically detects, redacts, and encrypts sensitive information before it reaches LLM APIs, ensuring compliance with data protection regulations.
✨ Features
- 🔍 Automatic detection of sensitive data (CPF, CNPJ, emails, etc.)
- 🔒 Secure encryption of sensitive information
- 🛡️ Protection against prompt injection and data leaks
- 🔄 Easy restoration of original content in LLM responses
- 🚀 Simple integration with any LLM workflow
- 🇧🇷 Optimized for Brazilian data protection (LGPD)
🔐 What It Does
Aegis Vault provides a secure middleware layer between your application and LLMs:
- Detects sensitive data using regex patterns and NER (Named Entity Recognition)
- Redacts and encrypts PII before sending to LLMs
- Securely stores encrypted data in a local vault
- Restores redacted content in LLM responses
- Blocks malicious inputs including prompt injection and DoS patterns
- Supports LGPD compliance, with a special focus on Brazilian Portuguese data
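The detect, redact, and restore steps listed above can be pictured with a minimal standard-library sketch. This is not Aegis Vault's implementation: the `TinyVault` class, its two regex patterns, and the in-memory dictionary are assumptions made for illustration, and the encryption and NER steps are left out.

```python
import re

# Illustrative patterns only; the library's own patterns may differ.
PATTERNS = {
    "email": r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+",
    "cpf": r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b",
}


class TinyVault:
    """Hypothetical stand-in that swaps sensitive values for <<VAULT_n>> markers."""

    def __init__(self):
        self.store = {}  # marker -> original value (plain text in this sketch)

    def redact(self, text):
        combined = re.compile("|".join(f"(?:{p})" for p in PATTERNS.values()))

        def swap(match):
            # Number markers in the order the values are found.
            marker = f"<<VAULT_{len(self.store)}>>"
            self.store[marker] = match.group(0)
            return marker

        return combined.sub(swap, text)

    def restore(self, text):
        for marker, value in self.store.items():
            text = text.replace(marker, value)
        return text


original = "Email usuario@exemplo.com, CPF 123.456.789-00"
tiny = TinyVault()
redacted = tiny.redact(original)
print(redacted)                # Email <<VAULT_0>>, CPF <<VAULT_1>>
print(tiny.restore(redacted))  # Email usuario@exemplo.com, CPF 123.456.789-00
```

Only the marker text ever leaves the process in this sketch; the mapping back to the original values stays local, which is the property the middleware relies on.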
📦 Installation
Install using pip:
pip install aegis-vault
For development with additional tools:
pip install 'aegis-vault[dev]'
🚀 Quick Start
Basic Usage with System Prompts
When integrating with LLMs, it's crucial to include a system prompt that instructs the model to preserve vault markers. Here's how to do it:
from aegis_vault import VaultGPT

# Initialize with a custom system prompt
vault = VaultGPT(
    encryption_key="your-secure-key",
    system_prompt="""
You are processing text with sensitive information.
IMPORTANT: Preserve all <<VAULT_X>> markers exactly as they appear.
Never modify, remove, or reorder these markers in your responses.
""".strip()
)

def query_llm(prompt, system_prompt=None):
    """Example function to call an LLM API"""
    # In a real implementation, you would call your LLM API here.
    # For example, with the legacy OpenAI SDK (openai<1.0):
    # response = openai.ChatCompletion.create(
    #     model="gpt-3.5-turbo",
    #     messages=[
    #         {"role": "system", "content": system_prompt or ""},
    #         {"role": "user", "content": prompt}
    #     ]
    # )
    # return response.choices[0].message['content']

    # For demonstration, just return a mock response
    return f"Processed your request. Detected sensitive data: {prompt}"

# The secure_chat method will automatically handle redaction and restoration
response = vault.secure_chat(
    "My email is user@example.com and my SSN is 123-45-6789",
    query_llm
)
print(response)
Basic Usage
from aegis_vault import VaultGPT

# Initialize the vault with default settings
vault = VaultGPT()

# Process a prompt securely
def my_llm_function(prompt):
    # This is where you would call your actual LLM
    return f"Processed: {prompt}"

# Sensitive data will be automatically detected and protected
response = vault.secure_chat(
    "Meu CPF é 123.456.789-00 e meu email é usuario@exemplo.com.br",
    my_llm_function
)
print(response)
# Output: Processed: Meu CPF é 123.456.789-00 e meu email é usuario@exemplo.com.br
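A CPF carries two check digits, so a string can match the `000.000.000-00` format without being a real number. The sample CPF in the example above is such a placeholder. The sketch below shows the standard check-digit calculation; `cpf_is_valid` is a hypothetical helper, and whether Aegis Vault performs this validation internally is not stated here.

```python
def cpf_is_valid(cpf):
    """Check the two verification digits of a CPF, with or without punctuation."""
    digits = [int(ch) for ch in cpf if ch.isdigit()]
    if len(digits) != 11 or len(set(digits)) == 1:
        # Wrong length, or a repeated-digit sequence such as 111.111.111-11.
        return False
    for size in (9, 10):
        # Weights run from size + 1 down to 2 over the first `size` digits.
        weights = range(size + 1, 1, -1)
        total = sum(d * w for d, w in zip(digits[:size], weights))
        check = (total * 10) % 11 % 10  # a remainder of 10 maps to 0
        if check != digits[size]:
            return False
    return True


print(cpf_is_valid("123.456.789-00"))  # False: second check digit should be 9
print(cpf_is_valid("123.456.789-09"))  # True
```

A check like this could be used to cut false positives after a regex match, at the cost of letting mistyped CPFs through unredacted, so it may be safer to redact on format alone.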
Advanced Usage
from aegis_vault import VaultGPT

# Initialize with custom encryption key
vault = VaultGPT(encryption_key="my-secret-key-123")

# Redact sensitive information from text
redacted = vault.redact_prompt(
    "Por favor, envie um email para usuario@exemplo.com informando sobre o CPF 123.456.789-00"
)
print(f"Redacted: {redacted}")
# Output: Redacted: Por favor, envie um email para <<VAULT_0>> informando sobre o CPF <<VAULT_1>>

# Restore original content
restored = vault.restore_content(redacted)
print(f"Restored: {restored}")
# Output: Restored: Por favor, envie um email para usuario@exemplo.com informando sobre o CPF 123.456.789-00
📚 Usage Guide
System Prompt Best Practices
When working with LLMs, it's important to include clear instructions about handling vault markers. Here's a recommended approach:
- Be Explicit: Clearly state that the markers (<<VAULT_X>>) are special and must be preserved
- Provide Clear Rules: Give specific instructions about not modifying, removing, or reordering the markers
- Include Examples: Show examples of correct and incorrect behavior
- Make it Stand Out: Use formatting (like ALL CAPS or emojis) to draw attention to these instructions
Example system prompt:
system_prompt = """
You are a helpful assistant that processes text containing sensitive information.
IMPORTANT: The user's message may contain special markers like <<VAULT_0>>, <<VAULT_1>>, etc.
These markers represent redacted sensitive information.
RULES:
1. NEVER modify, remove, or reorder these markers in your response
2. Return all markers exactly as they appear in the input
3. If you need to refer to the redacted content, use the marker itself
4. Do not try to guess what the markers represent
5. If unsure, respond with the markers unchanged
""".strip()
vault = VaultGPT(system_prompt=system_prompt)
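Even with a careful system prompt, a model can still drop or rewrite a marker. One way to notice this is to compare the markers sent with the markers returned before restoring content. The sketch below uses only the standard library; `missing_markers` is a hypothetical helper, not part of the Aegis Vault API.

```python
import re

MARKER = re.compile(r"<<VAULT_\d+>>")


def missing_markers(redacted_prompt, llm_response):
    """Return markers present in the prompt but absent from the response, sorted."""
    sent = set(MARKER.findall(redacted_prompt))
    returned = set(MARKER.findall(llm_response))
    return sorted(sent - returned)


prompt = "Send the report to <<VAULT_0>> about CPF <<VAULT_1>>"
good = "I will email <<VAULT_0>> regarding <<VAULT_1>>."
bad = "I will email <<VAULT_0>> regarding the CPF."

print(missing_markers(prompt, good))  # []
print(missing_markers(prompt, bad))   # ['<<VAULT_1>>']
```

A missing marker is not always an error, since a response may legitimately omit a value, so this works better as a warning signal than as a hard failure.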
Initialization Options
from aegis_vault import VaultGPT
# Basic initialization (auto-generates encryption key)
vault = VaultGPT()
# With custom encryption key
vault = VaultGPT(encryption_key="your-32-char-secret-key")
# Disable NER for better performance if not needed
vault = VaultGPT(use_ner=False)
# Lazy load spaCy model (load only when needed)
vault = VaultGPT(load_spacy=False)
# Later, when needed:
# vault._load_spacy_model("pt_core_news_sm")
Secure Chat Integration
def query_llm(prompt):
    """Example function to simulate LLM API call"""
    # In a real scenario, this would call your LLM API
    return f"LLM Response to: {prompt}"

# Process sensitive prompts securely
response = vault.secure_chat(
    "Meus dados são: CPF 123.456.789-00, email: usuario@exemplo.com",
    query_llm
)
print(response)
Advanced Features
Custom Patterns
# Add custom patterns for sensitive data
vault.PATTERNS['credit_card'] = r'\b(?:\d[ -]*?){13,16}\b'
# Add custom malicious patterns to block
vault.MALICIOUS_PATTERNS.append(r'shutdown\s+computer')
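Before registering a custom pattern, it can help to try it against sample text with the standard `re` module. The sketch below exercises the two patterns shown above. The credit-card pattern matches any run of 13 to 16 digits, so the example also applies a Luhn checksum as an optional filter; `luhn_ok` is a hypothetical helper and not something the library is known to provide.

```python
import re

CREDIT_CARD = re.compile(r"\b(?:\d[ -]*?){13,16}\b")
MALICIOUS = re.compile(r"shutdown\s+computer")


def luhn_ok(candidate):
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    total = 0
    for position, digit in enumerate(reversed(digits)):
        if position % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


text = "Card 4111 1111 1111 1111, order number 1234567890123"
matches = [m.group(0) for m in CREDIT_CARD.finditer(text)]
valid = [m for m in matches if luhn_ok(m)]

print(matches)  # ['4111 1111 1111 1111', '1234567890123']
print(valid)    # ['4111 1111 1111 1111']
print(bool(MALICIOUS.search("please shutdown   computer now")))  # True
```

The order number is matched by the regex but rejected by the checksum, which shows how broad the pattern is on its own.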
Vault Management
# Export vault data (can include encryption key if needed)
json_data = vault.export_vault(include_key=False) # Don't include key in exports by default
# Save vault to file (encrypted)
vault.save_vault_to_file("secure_vault.json", include_key=False)
# Load vault from file
new_vault = VaultGPT()
new_vault.load_vault_from_file("secure_vault.json")
# Note: You'll need to set the encryption key separately if it wasn't included
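When the key is left out of an export, it has to come from somewhere else at load time. A common approach is to read it from an environment variable instead of hard-coding it. The sketch below assumes a variable named `AEGIS_VAULT_KEY` and a helper named `read_key`; both are hypothetical and not defined by the library.

```python
import os


def read_key(env_var="AEGIS_VAULT_KEY"):
    """Return the key stored in `env_var`, or raise if it is unset or empty."""
    key = os.environ.get(env_var, "")
    if not key:
        raise RuntimeError(f"Set {env_var} before loading the vault")
    return key


# Demo only: a real deployment would set this outside the program.
os.environ["AEGIS_VAULT_KEY"] = "example-key-not-for-production"
key = read_key()

# The key could then be passed to the constructor shown earlier:
# vault = VaultGPT(encryption_key=key)
```

Failing loudly when the variable is missing avoids silently creating a vault with a fresh auto-generated key that cannot decrypt the saved data.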
💡 Use Cases
- Healthcare: Protect patient data in medical AI applications
- Finance: Secure financial information in banking chatbots
- Legal: Ensure client confidentiality in legal document processing
- Customer Support: Protect customer information in support chatbots
- Enterprise: Maintain LGPD compliance in corporate AI systems
📄 License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.