Secure, LGPD-compliant middleware for protecting sensitive data in LLM prompts

Aegis Vault 🛡️

Secure, LGPD-compliant middleware for protecting sensitive data in LLM prompts. Aegis Vault automatically detects, redacts, and encrypts sensitive information before it reaches LLM APIs, ensuring compliance with data protection regulations.

✨ Features

  • 🔍 Automatic detection of sensitive data (CPF, CNPJ, emails, etc.)
  • 🔒 Secure encryption of sensitive information
  • 🛡️ Protection against prompt injection and data leaks
  • 🔄 Easy restoration of original content in LLM responses
  • 🚀 Simple integration with any LLM workflow
  • 🇧🇷 Optimized for Brazilian data protection (LGPD)

🔐 What It Does

Aegis Vault provides a secure middleware layer between your application and LLMs:

  • Detects sensitive data using regex patterns and NER (Named Entity Recognition)
  • Redacts and encrypts PII before sending to LLMs
  • Securely stores encrypted data in a local vault
  • Restores redacted content in LLM responses
  • Blocks malicious inputs including prompt injection and DoS patterns
  • Supports LGPD compliance, with a special focus on Brazilian Portuguese data
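
To make the detect, redact, and restore steps concrete, here is a minimal sketch that uses only the standard library. It is not Aegis Vault's implementation: the two patterns, the plain dictionary used as a "vault", and the helper names `redact` and `restore` are assumptions made for illustration, and the encryption step is left out entirely.

```python
import re

# Hypothetical, simplified patterns; the library's real rules may differ.
PATTERNS = {
    "cpf": re.compile(r"\b\d{3}\.\d{3}\.\d{3}-\d{2}\b"),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}

def redact(text, vault):
    """Replace each match with a <<VAULT_n>> marker and record the original."""
    def store(match):
        marker = f"<<VAULT_{len(vault)}>>"
        vault[marker] = match.group(0)  # a real vault would encrypt this value
        return marker

    for pattern in PATTERNS.values():
        text = pattern.sub(store, text)
    return text

def restore(text, vault):
    """Put the original values back wherever their markers appear."""
    for marker, original in vault.items():
        text = text.replace(marker, original)
    return text

vault = {}
prompt = "Meu CPF é 123.456.789-00 e meu email é usuario@exemplo.com.br"
redacted = redact(prompt, vault)
restored = restore(redacted, vault)
print(redacted)
print(restored)
```

In this sketch markers are numbered in the order the patterns run, so the numbering may not follow the order in which values appear in the text.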

📦 Installation

Install using pip:

pip install aegis-vault

For development with additional tools:

pip install 'aegis-vault[dev]'

🚀 Quick Start

Basic Usage with System Prompts

When integrating with LLMs, it's crucial to include a system prompt that instructs the model to preserve vault markers. Here's how to do it:

from aegis_vault import VaultGPT

# Initialize with a custom system prompt
vault = VaultGPT(
    encryption_key="your-secure-key",
    system_prompt="""
    You are processing text with sensitive information.
    
    IMPORTANT: Preserve all <<VAULT_X>> markers exactly as they appear.
    Never modify, remove, or reorder these markers in your responses.
    """.strip()
)

def query_llm(prompt, system_prompt=None):
    """Example function to call an LLM API."""
    # In a real implementation, you would call your LLM API here.
    # For example, with the OpenAI Python client (openai>=1.0):
    # client = openai.OpenAI()
    # response = client.chat.completions.create(
    #     model="gpt-3.5-turbo",
    #     messages=[
    #         {"role": "system", "content": system_prompt or ""},
    #         {"role": "user", "content": prompt}
    #     ]
    # )
    # return response.choices[0].message.content

    # For demonstration, just return a mock response
    return f"Processed your request. Detected sensitive data: {prompt}"

# The secure_chat method will automatically handle redaction and restoration
response = vault.secure_chat(
    "My email is user@example.com and my SSN is 123-45-6789",
    query_llm
)
print(response)

Basic Usage

from aegis_vault import VaultGPT

# Initialize the vault with default settings
vault = VaultGPT()

# Process a prompt securely
def my_llm_function(prompt):
    # This is where you would call your actual LLM
    return f"Processed: {prompt}"

# Sensitive data will be automatically detected and protected
response = vault.secure_chat(
    "Meu CPF é 123.456.789-00 e meu email é usuario@exemplo.com.br",
    my_llm_function
)

print(response)
# Output (after the vault markers are restored):
# Processed: Meu CPF é 123.456.789-00 e meu email é usuario@exemplo.com.br
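
The CPF used in these examples, 123.456.789-00, is a placeholder whose check digits do not validate. This README does not say whether Aegis Vault verifies check digits or matches on format alone. If you want to tell well-formed CPFs from arbitrary digit groups in your own code, the standard modulo-11 check can be sketched as follows; `is_valid_cpf` is a hypothetical helper, not part of the library.

```python
import re

def is_valid_cpf(cpf: str) -> bool:
    """Validate a CPF using its two modulo-11 check digits."""
    digits = [int(c) for c in re.sub(r"\D", "", cpf)]
    # A CPF has 11 digits; a single repeated digit is rejected by convention.
    if len(digits) != 11 or len(set(digits)) == 1:
        return False
    for n in (9, 10):
        # Weights run from n + 1 down to 2 over the first n digits.
        total = sum(d * w for d, w in zip(digits[:n], range(n + 1, 1, -1)))
        # A remainder of 10 maps to check digit 0.
        if (total * 10) % 11 % 10 != digits[n]:
            return False
    return True

print(is_valid_cpf("123.456.789-09"))  # True
print(is_valid_cpf("123.456.789-00"))  # False
```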

Advanced Usage

from aegis_vault import VaultGPT

# Initialize with custom encryption key
vault = VaultGPT(encryption_key="my-secret-key-123")

# Redact sensitive information from text
redacted = vault.redact_prompt(
    "Por favor, envie um email para usuario@exemplo.com informando sobre o CPF 123.456.789-00"
)
print(f"Redacted: {redacted}")
# Output: Redacted: Por favor, envie um email para <<VAULT_0>> informando sobre o CPF <<VAULT_1>>

# Restore original content
restored = vault.restore_content(redacted)
print(f"Restored: {restored}")
# Output: Restored: Por favor, envie um email para usuario@exemplo.com informando sobre o CPF 123.456.789-00

📚 Usage Guide

System Prompt Best Practices

When working with LLMs, it's important to include clear instructions about handling vault markers. Here's a recommended approach:

  1. Be Explicit: Clearly state that the markers (<<VAULT_X>>) are special and must be preserved
  2. Provide Clear Rules: Give specific instructions about not modifying, removing, or reordering the markers
  3. Include Examples: Show examples of correct and incorrect behavior
  4. Make it Stand Out: Use formatting (like ALL CAPS or emojis) to draw attention to these instructions

Example system prompt:

system_prompt = """
You are a helpful assistant that processes text containing sensitive information.

IMPORTANT: The user's message may contain special markers like <<VAULT_0>>, <<VAULT_1>>, etc.
These markers represent redacted sensitive information.

RULES:
1. NEVER modify, remove, or reorder these markers in your response
2. Return all markers exactly as they appear in the input
3. If you need to refer to the redacted content, use the marker itself
4. Do not try to guess what the markers represent
5. If unsure, respond with the markers unchanged
""".strip()

vault = VaultGPT(system_prompt=system_prompt)
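
Even with a careful system prompt, a model can drop or mangle a marker. A small check before restoration helps catch this. The sketch below is an assumption about how you might verify responses yourself: `check_markers` is a hypothetical helper, and the README does not say whether the library performs a similar check internally.

```python
import re

MARKER_RE = re.compile(r"<<VAULT_\d+>>")

def check_markers(redacted_prompt, llm_response):
    """Compare the markers sent to the model with the markers it returned."""
    sent = set(MARKER_RE.findall(redacted_prompt))
    returned = set(MARKER_RE.findall(llm_response))
    return {
        "missing": sorted(sent - returned),      # sent but absent or mangled
        "unexpected": sorted(returned - sent),   # invented by the model
    }

report = check_markers(
    "Send mail to <<VAULT_0>> about <<VAULT_1>>",
    "I will email <<VAULT_0>> about <VAULT_1>.",
)
print(report)
```

A missing marker is not always an error, since the model may have had no reason to repeat it, but an unexpected marker is worth treating as a failure.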

Initialization Options

from aegis_vault import VaultGPT

# Basic initialization (auto-generates encryption key)
vault = VaultGPT()

# With custom encryption key
vault = VaultGPT(encryption_key="your-32-char-secret-key")

# Disable NER for better performance if not needed
vault = VaultGPT(use_ner=False)

# Lazy load spaCy model (load only when needed)
vault = VaultGPT(load_spacy=False)
# Later, when needed:
# vault._load_spacy_model("pt_core_news_sm")

Secure Chat Integration

def query_llm(prompt):
    """Example function to simulate LLM API call"""
    # In a real scenario, this would call your LLM API
    return f"LLM Response to: {prompt}"

# Process sensitive prompts securely
response = vault.secure_chat(
    "Meus dados são: CPF 123.456.789-00, email: usuario@exemplo.com",
    query_llm
)
print(response)
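
The examples suggest that secure_chat calls the function you pass with the redacted prompt as its only argument; that is an assumption drawn from this README, not a documented contract. If your client function needs more arguments, such as a system prompt, one option is to bind them in advance with functools.partial. Here `call_model` and the model name are hypothetical stand-ins for a real API client.

```python
from functools import partial

SYSTEM_PROMPT = "Preserve all <<VAULT_X>> markers exactly as they appear."

def call_model(prompt, system_prompt, model="example-model"):
    """Stand-in for a real API client; builds the messages it would send."""
    messages = [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": prompt},
    ]
    # A real implementation would send `messages` to the provider here.
    return f"[{model}] echo: {messages[1]['content']}"

# Bind the system prompt so the result takes only the prompt.
query_llm = partial(call_model, system_prompt=SYSTEM_PROMPT)

reply = query_llm("Contact <<VAULT_0>> about the invoice")
print(reply)
```

The resulting `query_llm` can then be passed wherever a single-argument callback is expected.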

Advanced Features

Custom Patterns

# Add custom patterns for sensitive data
vault.PATTERNS['credit_card'] = r'\b(?:\d[ -]*?){13,16}\b'

# Add custom malicious patterns to block
vault.MALICIOUS_PATTERNS.append(r'shutdown\s+computer')
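
Before registering a pattern, it helps to try it against sample text with the standard `re` module. The sketch below exercises the two patterns shown above in isolation. How the library compiles them (for example, whether it applies re.IGNORECASE) is not stated here, so the results only describe plain `re.search` behavior.

```python
import re

# Patterns copied from the snippet above.
CREDIT_CARD = r'\b(?:\d[ -]*?){13,16}\b'
SHUTDOWN = r'shutdown\s+computer'

# A spaced 16-digit card number is matched as one span.
card_match = re.search(CREDIT_CARD, "Card: 4111 1111 1111 1111, exp 12/30")
print(card_match.group(0))

# Any run of 13 to 16 digits also matches, so expect false positives.
order_match = re.search(CREDIT_CARD, "Order number 1234567890123")
print(order_match.group(0))

# The malicious pattern tolerates extra whitespace but is case-sensitive
# when used without flags.
blocked = re.search(SHUTDOWN, "please shutdown   computer now") is not None
case_miss = re.search(SHUTDOWN, "Shutdown computer") is None
print(blocked, case_miss)
```

If false positives on long numeric IDs are a concern, a stricter pattern or a checksum test on the matched digits would narrow the results.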

Vault Management

# Export vault data; keep include_key=False unless the key must travel with the export
json_data = vault.export_vault(include_key=False)

# Save vault to file (encrypted)
vault.save_vault_to_file("secure_vault.json", include_key=False)

# Load vault from file
new_vault = VaultGPT()
new_vault.load_vault_from_file("secure_vault.json")
# Note: You'll need to set the encryption key separately if it wasn't included

💡 Use Cases

  • Healthcare: Protect patient data in medical AI applications
  • Finance: Secure financial information in banking chatbots
  • Legal: Ensure client confidentiality in legal document processing
  • Customer Support: Protect customer information in support chatbots
  • Enterprise: Maintain LGPD compliance in corporate AI systems

📄 License

This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
