On-premise PII detection and masking for AI applications
Aegis SDK
Aegis lets enterprise customers protect sensitive data before it reaches AI tools, while ensuring the data never leaves customer infrastructure. The platform is split into two planes:
- Control Plane (Aegis Cloud): License management, policy configuration, metrics dashboard
- Data Plane (Customer Infrastructure): Detection, masking, decision-making
Installation
```bash
pip install aegis-sdk
```
With LLM integrations (quoting the extras keeps shells such as zsh from treating the brackets as a glob pattern):
```bash
pip install "aegis-sdk[openai]"     # OpenAI support
pip install "aegis-sdk[anthropic]"  # Anthropic/Claude support
pip install "aegis-sdk[langchain]"  # LangChain support
pip install "aegis-sdk[all]"        # All integrations
```
Quick Start
Simple Text Processing
```python
from aegis_sdk import Aegis

aegis = Aegis()

# Process text before sending to AI tools
result = aegis.process(
    text="Contact john@example.com at 555-123-4567",
    destination="AI_TOOL"
)
print(result.decision)        # ALLOWED_WITH_MASKING
print(result.masked_content)  # Contact j***@example.com at XXX-XXX-4567
```
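The masked output above keeps the shape of each value. A minimal sketch of that format-preserving idea, assuming simple regexes and the rules visible in the example (keep the first character of an email's local part and the domain; keep the last four digits of a dashed phone number). `mask_email`, `mask_phone` and `mask` are hypothetical helpers, not the SDK's implementation.

```python
import re

# Hypothetical, simplified patterns; the SDK's real detectors are not documented here.
EMAIL_RE = re.compile(r"\b([A-Za-z0-9._%+-])([A-Za-z0-9._%+-]*)@([A-Za-z0-9.-]+\.[A-Za-z]{2,})\b")
PHONE_RE = re.compile(r"\b\d{3}-\d{3}-(\d{4})\b")


def mask_email(match: re.Match) -> str:
    # Keep the first character of the local part and the full domain.
    first, _rest, domain = match.groups()
    return f"{first}***@{domain}"


def mask_phone(match: re.Match) -> str:
    # Keep only the last four digits, preserving the dashed layout.
    return f"XXX-XXX-{match.group(1)}"


def mask(text: str) -> str:
    text = EMAIL_RE.sub(mask_email, text)
    return PHONE_RE.sub(mask_phone, text)


print(mask("Contact john@example.com at 555-123-4567"))
# Contact j***@example.com at XXX-XXX-4567
```

Because each replacement keeps the separators and the visible suffix or domain, downstream tools still see "an email" or "a phone number" without seeing the value.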
OpenAI Integration (Drop-in Replacement)
```python
from aegis_sdk import AegisOpenAI

client = AegisOpenAI(
    api_key="sk-...",
    aegis_license_key="aegis_lic_..."
)

user_input = "Summarize the ticket from john@example.com"

# Use exactly like the OpenAI client - PII is masked automatically
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": user_input}]
)
```
Large File Processing (200MB+)
```python
from aegis_sdk import StreamingProcessor

processor = StreamingProcessor(chunk_size_mb=10)
result = processor.process_file(
    input_path="customer_data.csv",
    output_path="customer_data_masked.csv",
    destination="VENDOR",
    on_progress=lambda b, t, c: print(f"Processed {b / 1e6:.1f} MB")
)
print(f"Decision: {result.decision}")
print(f"Processed: {result.mb_processed:.1f} MB")
# Memory usage: constant ~30MB regardless of file size
```
Features
| Feature | Description |
|---|---|
| Pattern Detection | Email, phone, SSN, credit card (Luhn validated), API secrets, IBAN, PHI keywords |
| Format-Preserving Masking | john@example.com → j***@example.com |
| Policy-Based Decisions | ALLOWED, ALLOWED_WITH_MASKING, BLOCKED |
| Streaming Processing | 200MB+ files with constant ~30MB memory |
| LLM Integrations | OpenAI, Anthropic, LangChain drop-in wrappers |
| GDPR Compliance | Metadata-only audit mode, no PII stored |
| Offline Support | 7-day grace period, air-gapped deployment |
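The Features table says credit card numbers are Luhn validated, which filters out random 16-digit strings. The Luhn checksum is a standard algorithm; the sketch below is a plain implementation of it, and the 13-digit minimum length is an assumption added here, not a documented SDK rule.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(ch) for ch in number if ch.isdigit()]  # ignore dashes and spaces
    if len(digits) < 13:  # assumption: shorter runs are not treated as card numbers
        return False
    total = 0
    # Walk from the rightmost digit and double every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


print(luhn_valid("4111-1111-1111-1111"))  # True
print(luhn_valid("4111-1111-1111-1112"))  # False
```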
Decision Logic
| Destination | PII Only | PHI Only | PHI + PII | API Secrets |
|---|---|---|---|---|
| AI_TOOL | Mask | Allow | Block | Mask |
| VENDOR | Mask | Block | Block | Mask |
| CUSTOMER | Allow | Allow | Allow | Block |
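The Decision Logic table can be read as a lookup keyed by destination and data category, where Allow, Mask and Block correspond to ALLOWED, ALLOWED_WITH_MASKING and BLOCKED. A minimal sketch of that lookup follows. The table does not say what happens when several columns apply (for example PII plus an API secret), so this sketch assumes the strictest action wins; `Action`, `TABLE` and `decide` are hypothetical names.

```python
from enum import IntEnum


class Action(IntEnum):
    # Ordered so that max() picks the most restrictive action.
    ALLOW = 0
    MASK = 1
    BLOCK = 2


# The Decision Logic table, transcribed as a lookup.
TABLE = {
    "AI_TOOL": {"PII": Action.MASK, "PHI": Action.ALLOW, "PHI+PII": Action.BLOCK, "API_SECRET": Action.MASK},
    "VENDOR": {"PII": Action.MASK, "PHI": Action.BLOCK, "PHI+PII": Action.BLOCK, "API_SECRET": Action.MASK},
    "CUSTOMER": {"PII": Action.ALLOW, "PHI": Action.ALLOW, "PHI+PII": Action.ALLOW, "API_SECRET": Action.BLOCK},
}


def decide(destination: str, has_pii: bool, has_phi: bool, has_secret: bool) -> Action:
    row = TABLE[destination]
    actions = [Action.ALLOW]  # nothing detected means the content is allowed
    if has_pii and has_phi:
        actions.append(row["PHI+PII"])
    elif has_pii:
        actions.append(row["PII"])
    elif has_phi:
        actions.append(row["PHI"])
    if has_secret:
        actions.append(row["API_SECRET"])
    # Assumption: when several columns apply, the strictest action wins.
    return max(actions)


print(decide("AI_TOOL", has_pii=True, has_phi=True, has_secret=False).name)  # BLOCK
```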
Core API
Aegis Class
```python
from aegis_sdk import Aegis

aegis = Aegis(
    license_key=None,      # Optional license key for cloud features
    include_samples=True,  # Set False for GDPR compliance
    policy_config=None     # Custom policy configuration
)

text = "Contact john@example.com at 555-123-4567"
destination = "AI_TOOL"

# Full processing
result = aegis.process(text, destination=destination)
# Detection only
detected = aegis.detect(text)  # Returns list[DetectedItem]
# Masking only
masked = aegis.mask(text)  # Returns masked string
# Policy evaluation
decision = aegis.evaluate(destination, detected)
```
ProcessingResult
```python
result.decision         # Decision enum: ALLOWED, ALLOWED_WITH_MASKING, BLOCKED
result.summary          # Human-readable explanation
result.detected         # List of DetectedItem
result.masked_content   # Masked text (None if blocked)
result.suggested_fix    # Suggested action (for blocked content)
result.bytes_processed  # Number of bytes processed
result.is_blocked       # True if decision is BLOCKED
```
Detection Types
| Type | Description | Example |
|---|---|---|
| EMAIL | Email addresses | john@example.com |
| PHONE | Phone numbers | 555-123-4567 |
| SSN | Social Security Numbers | 123-45-6789 |
| CREDIT_CARD | Credit cards (Luhn validated) | 4111-1111-1111-1111 |
| API_SECRET | API keys and secrets | sk-abc123... |
| IBAN | International Bank Account Numbers | DE89370400440532013000 |
| PHI_KEYWORD | Protected Health Information | patient, diagnosis |
Batch Processing
StreamingProcessor (Large Files)
Process files of any size with constant memory usage (~30MB).
```python
from aegis_sdk import StreamingProcessor

processor = StreamingProcessor(
    chunk_size_mb=10,     # Process in 10MB chunks
    policy_config=None,   # Custom policy
    include_samples=True  # Include detection samples
)

result = processor.process_file(
    input_path="large_file.txt",
    output_path="large_file_masked.txt",  # Optional
    destination="AI_TOOL",
    on_progress=lambda b, t, c: print(f"{b/1e6:.1f} MB"),
    stop_on_block=False  # Continue even if content would be blocked
)

# Result contains aggregated detections across all chunks
print(f"Chunks: {result.chunks_processed}")
print(f"Bytes: {result.bytes_processed}")
print(f"Decision: {result.decision}")
```
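Constant memory comes from never holding more than one chunk at a time, but a naive fixed-size split can cut a value such as an email address in half. A minimal sketch of one common workaround, assuming text input and that detected values never span a line break: carry the partial last line of each read into the next chunk. `iter_line_aligned_chunks` and `mask_stream` are hypothetical; the SDK's own boundary handling is not documented here.

```python
import io
import re
from typing import Callable, Iterator, TextIO

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def iter_line_aligned_chunks(src: TextIO, chunk_chars: int) -> Iterator[str]:
    """Yield chunks of roughly `chunk_chars` characters that end on a newline.

    The partial last line of each read is carried into the next chunk, so
    memory is bounded by about chunk_chars plus the longest line.
    """
    carry = ""
    while True:
        block = src.read(chunk_chars)
        if not block:
            break
        block = carry + block
        cut = block.rfind("\n") + 1  # 0 when the block has no newline yet
        if cut == 0:
            carry = block
            continue
        carry = block[cut:]
        yield block[:cut]
    if carry:
        yield carry  # final line without a trailing newline


def mask_stream(src: TextIO, dst: TextIO, chunk_chars: int,
                mask: Callable[[str], str]) -> int:
    chunks = 0
    for chunk in iter_line_aligned_chunks(src, chunk_chars):
        dst.write(mask(chunk))  # each chunk is masked and written, then dropped
        chunks += 1
    return chunks


src = io.StringIO("a john@example.com\nb jane@example.org\n")
dst = io.StringIO()
n = mask_stream(src, dst, chunk_chars=8, mask=lambda s: EMAIL_RE.sub("[EMAIL]", s))
print(n, repr(dst.getvalue()))
```

With real files, `src` and `dst` would be `open(...)` handles and `chunk_chars` would be in the megabyte range.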
CSVStreamProcessor
Specialized processor for CSV files with column-level detection.
```python
from aegis_sdk import CSVStreamProcessor

processor = CSVStreamProcessor()

# Auto-detect which columns contain PII
column_types = processor.detect_columns("data.csv", sample_rows=100)
# {0: ["EMAIL"], 2: ["PHONE", "SSN"]}

# Process specific columns
result = processor.process(
    input_path="data.csv",
    output_path="data_masked.csv",
    destination="VENDOR",
    has_header=True,
    columns_to_mask=[0, 2, 5]  # Only mask these columns
)
```
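Column-level detection only needs a sample of rows to decide which columns to mask. A minimal sketch that reproduces the shape of the `detect_columns` result shown above, assuming whole-cell regex matches over the first `sample_rows` data rows; the patterns and the `detect_columns` function here are hypothetical stand-ins for the SDK method.

```python
import csv
import io
import re
from itertools import islice
from typing import Iterable

# Hypothetical, simplified whole-cell detectors.
PATTERNS = {
    "EMAIL": re.compile(r"^[\w.+-]+@[\w-]+(?:\.[\w-]+)+$"),
    "PHONE": re.compile(r"^\d{3}-\d{3}-\d{4}$"),
    "SSN": re.compile(r"^\d{3}-\d{2}-\d{4}$"),
}


def detect_columns(f: Iterable[str], sample_rows: int = 100,
                   has_header: bool = True) -> dict[int, list[str]]:
    """Map column index to the sorted PII types seen in the sampled rows."""
    reader = csv.reader(f)
    if has_header:
        next(reader, None)  # skip the header row
    found: dict[int, set[str]] = {}
    for row in islice(reader, sample_rows):  # only the sample is read
        for idx, cell in enumerate(row):
            for kind, pattern in PATTERNS.items():
                if pattern.match(cell.strip()):
                    found.setdefault(idx, set()).add(kind)
    return {idx: sorted(kinds) for idx, kinds in sorted(found.items())}


data = io.StringIO(
    "email,name,contact\n"
    "john@example.com,John,555-123-4567\n"
    "jane@example.org,Jane,123-45-6789\n"
)
column_types = detect_columns(data, sample_rows=100)
print(column_types)  # {0: ['EMAIL'], 2: ['PHONE', 'SSN']}
```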
LLM Integrations
LLM Gateway (Base)
```python
from aegis_sdk import AegisLLMGateway

gateway = AegisLLMGateway(
    policy_config=None,  # Custom policy
    enable_audit=True,   # Enable local audit
    audit_path="audit.log"
)

# Mask a single prompt
result = gateway.mask_prompt(
    prompt="My email is john@example.com",
    destination="AI_TOOL",
    reversible=True  # Enable unmask_response later
)
if result.is_blocked:
    raise RuntimeError(result.block_reason)
print(result.masked_text)  # My email is j***@example.com

# Mask chat messages
masked_messages, detected = gateway.mask_messages(
    messages=[
        {"role": "system", "content": "You are helpful"},
        {"role": "user", "content": "Contact john@example.com"}
    ],
    roles_to_mask=["user"]  # Only mask user messages
)

# Unmask LLM response (if reversible=True was used);
# response_text is the reply text returned by your LLM call
original = gateway.unmask_response(response_text)
```
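Reversible masking has to remember what it replaced, because a value like `j***@example.com` cannot be turned back into the original on its own. A minimal sketch of one approach, assuming numbered placeholder tokens and an in-memory mapping; `ReversibleMasker` and the `<EMAIL_n>` token format are hypothetical and are not the format `AegisLLMGateway` uses.

```python
import re
from itertools import count

EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


class ReversibleMasker:
    """Swap detected values for placeholder tokens and restore them later."""

    def __init__(self) -> None:
        self._token_to_value: dict[str, str] = {}
        self._value_to_token: dict[str, str] = {}
        self._ids = count(1)

    def _token_for(self, value: str) -> str:
        # Reuse one token per distinct value so the LLM sees consistent references.
        if value not in self._value_to_token:
            token = f"<EMAIL_{next(self._ids)}>"
            self._value_to_token[value] = token
            self._token_to_value[token] = value
        return self._value_to_token[value]

    def mask(self, text: str) -> str:
        return EMAIL_RE.sub(lambda m: self._token_for(m.group()), text)

    def unmask(self, text: str) -> str:
        # Replace every token the model echoed back with the original value.
        for token, value in self._token_to_value.items():
            text = text.replace(token, value)
        return text


masker = ReversibleMasker()
prompt = masker.mask("My email is john@example.com")
print(prompt)  # My email is <EMAIL_1>
reply = "I will write to <EMAIL_1> today."
print(masker.unmask(reply))  # I will write to john@example.com today.
```

The mapping lives only in local memory, so the original values never leave the process that did the masking.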
OpenAI Wrapper
```python
from aegis_sdk import AegisOpenAI

client = AegisOpenAI(
    api_key="sk-...",
    aegis_config={"custom": "policy"},  # Placeholder: pass your policy dict here
    destination="AI_TOOL",
    block_on_pii=True  # Raise error if PII detected
)

user_input = "My email is john@example.com"

# Drop-in replacement for OpenAI client
response = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": user_input}]
)
# PII is automatically masked before sending to OpenAI
```
Anthropic Wrapper
```python
from aegis_sdk import AegisAnthropic

client = AegisAnthropic(
    api_key="sk-ant-...",
    aegis_config=None
)

user_input = "My email is john@example.com"

response = client.messages.create(
    model="claude-3-opus-20240229",
    max_tokens=1024,
    messages=[{"role": "user", "content": user_input}]
)
```
LangChain Integration
```python
from aegis_sdk.llm.langchain_wrapper import (
    AegisLangChainCallback,
    create_aegis_chain
)
from langchain_openai import ChatOpenAI

user_message = "Contact john@example.com about the renewal"

# Option 1: Use callback handler
callback = AegisLangChainCallback(destination="AI_TOOL")
llm = ChatOpenAI(callbacks=[callback])

# Option 2: Create pre-wrapped chain
chain = create_aegis_chain(
    llm=ChatOpenAI(),
    destination="AI_TOOL"
)
result = chain.invoke({"input": user_message})
```
License Management
Online License Validation
```python
from aegis_sdk import LicenseManager

manager = LicenseManager(
    license_key="aegis_lic_...",
    cache_dir=None,        # Default: ~/.aegis
    cache_ttl=86400,       # 24 hour cache
    grace_period_days=7,   # Offline grace period
    api_endpoint="https://api.aegispreflight.com/v1"
)

# Validate license (uses cache if available)
info = manager.validate()
print(f"Org: {info.org_id}")
print(f"Expires: {info.expires}")

# Get policy configuration
policy = manager.get_policy()

# Check validity
if manager.is_valid():
    print("License is valid")
```
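`cache_ttl` and `grace_period_days` together decide when a cached validation can be reused. A minimal sketch of one plausible rule set, assuming the cache is trusted until `cache_ttl` seconds have passed, the API is called again once it is stale, and an offline client keeps working until `grace_period_days` after the last successful check. `license_state` and its return labels are hypothetical; `LicenseManager`'s actual rules are not documented here.

```python
from datetime import datetime, timedelta, timezone


def license_state(
    now: datetime,
    last_validated: datetime,
    expires: datetime,
    cache_ttl: int = 86400,
    grace_period_days: int = 7,
    server_reachable: bool = True,
) -> str:
    """Classify a cached license check under the assumed rules above."""
    if now >= expires:
        return "EXPIRED"     # past the license expiry date
    age = now - last_validated
    if age <= timedelta(seconds=cache_ttl):
        return "CACHED"      # cache is fresh: no network call needed
    if server_reachable:
        return "REVALIDATE"  # cache is stale: call the license API again
    if age <= timedelta(days=grace_period_days):
        return "GRACE"       # offline, but still inside the grace period
    return "LOCKED"          # offline for longer than the grace period


now = datetime(2024, 6, 10, tzinfo=timezone.utc)
expires = datetime(2025, 1, 1, tzinfo=timezone.utc)
print(license_state(now, now - timedelta(hours=2), expires))
print(license_state(now, now - timedelta(days=3), expires, server_reachable=False))
```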
Offline/Air-Gapped Mode
```python
from aegis_sdk import OfflineLicenseManager

manager = OfflineLicenseManager(
    license_file="/path/to/aegis_license.json"
)
info = manager.validate()
# Works without any network access
```
Metrics Reporting
Cloud Metrics (Async, Non-Blocking)
```python
from aegis_sdk import MetricsReporter

reporter = MetricsReporter(
    license_key="aegis_lic_...",
    batch_size=100,      # Flush after 100 events
    flush_interval=300,  # Or every 5 minutes
    enabled=True
)

# Record processing results (non-blocking)
reporter.record(
    decision="ALLOWED_WITH_MASKING",
    bytes_processed=1024,
    detected_types=["EMAIL", "PHONE"],
    detected_counts={"EMAIL": 2, "PHONE": 1},
    destination="AI_TOOL",
    duration_ms=15.5
)

# Or from a ProcessingResult (e.g. the result of aegis.process(...))
reporter.record_from_result(result, destination="AI_TOOL")

# Manual flush before shutdown
reporter.flush()
reporter.stop()
```
Important: No customer data is ever sent to the cloud; only aggregated counts and detection types are reported.
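"Non-blocking" with `batch_size` and `flush_interval` usually means the caller only enqueues an event while a background thread sends batches. A minimal sketch of that pattern using the standard `queue` and `threading` modules; `BatchingReporter` is hypothetical and `send` stands in for the upload call, which is not shown in this document.

```python
import queue
import threading
import time
from typing import Callable


class BatchingReporter:
    """Buffer metric events and hand them to `send` in batches."""

    def __init__(self, send: Callable[[list], None], batch_size: int = 100,
                 flush_interval: float = 300.0) -> None:
        self._send = send
        self._batch_size = batch_size
        self._flush_interval = flush_interval
        self._queue: "queue.Queue[dict]" = queue.Queue()
        self._stopped = threading.Event()
        self._worker = threading.Thread(target=self._run, daemon=True)
        self._worker.start()

    def record(self, **event) -> None:
        # Only counts and types belong here, never the scanned text itself.
        self._queue.put(event)  # returns immediately (unbounded queue)

    def _run(self) -> None:
        batch: list = []
        deadline = time.monotonic() + self._flush_interval
        # Keep draining after stop() until the queue is empty.
        while not (self._stopped.is_set() and self._queue.empty()):
            timeout = max(0.0, deadline - time.monotonic())
            try:
                batch.append(self._queue.get(timeout=min(timeout, 0.1)))
            except queue.Empty:
                pass
            if len(batch) >= self._batch_size or time.monotonic() >= deadline:
                if batch:
                    self._send(batch)
                batch = []
                deadline = time.monotonic() + self._flush_interval
        if batch:
            self._send(batch)  # final partial batch

    def stop(self) -> None:
        """Drain pending events, send the last batch, and stop the worker."""
        self._stopped.set()
        self._worker.join()


sent: list = []
reporter = BatchingReporter(sent.append, batch_size=2, flush_interval=60)
for _ in range(5):
    reporter.record(decision="ALLOWED_WITH_MASKING", bytes_processed=1024)
reporter.stop()
print([len(b) for b in sent])
```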
Local Metrics (Air-Gapped)
```python
from aegis_sdk import LocalMetricsCollector

collector = LocalMetricsCollector(
    output_path="/var/log/aegis/metrics.jsonl",
    rotation_size_mb=100
)

collector.record(
    decision="ALLOWED_WITH_MASKING",
    bytes_processed=1024,
    detected_types=["EMAIL"],
    detected_counts={"EMAIL": 1}
)

# Get summary
summary = collector.get_summary()
print(f"Total checks: {summary['total_checks']}")
print(f"Bytes scanned: {summary['bytes_scanned']}")
```
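A local JSONL metrics file is easy to aggregate offline. A minimal sketch of the record/summary cycle, assuming one JSON object per line, a single rotated file named `<path>.1`, and the `total_checks` / `bytes_scanned` keys used above; `JsonlMetrics` is hypothetical and its field names are assumptions, not `LocalMetricsCollector`'s file format.

```python
import json
import os
import tempfile
from collections import Counter


class JsonlMetrics:
    """Append-only JSONL metrics file with size-based rotation."""

    def __init__(self, output_path: str, rotation_size_mb: float = 100) -> None:
        self.output_path = output_path
        self.max_bytes = int(rotation_size_mb * 1024 * 1024)

    def record(self, decision: str, bytes_processed: int, detected_counts: dict) -> None:
        if os.path.exists(self.output_path) and os.path.getsize(self.output_path) >= self.max_bytes:
            os.replace(self.output_path, self.output_path + ".1")  # keep one rotated file
        with open(self.output_path, "a", encoding="utf-8") as f:
            f.write(json.dumps({"decision": decision, "bytes_processed": bytes_processed,
                                "detected_counts": detected_counts}) + "\n")

    def get_summary(self) -> dict:
        # Streams the current file line by line; the rotated file is not included.
        totals = {"total_checks": 0, "bytes_scanned": 0}
        decisions, types = Counter(), Counter()
        with open(self.output_path, encoding="utf-8") as f:
            for line in f:
                event = json.loads(line)
                totals["total_checks"] += 1
                totals["bytes_scanned"] += event["bytes_processed"]
                decisions[event["decision"]] += 1
                types.update(event["detected_counts"])  # adds counts per type
        return {**totals, "decisions": dict(decisions), "detected_counts": dict(types)}


path = os.path.join(tempfile.mkdtemp(), "metrics.jsonl")
collector = JsonlMetrics(path)
collector.record("ALLOWED_WITH_MASKING", 1024, {"EMAIL": 1})
collector.record("BLOCKED", 512, {"SSN": 2, "EMAIL": 1})
summary = collector.get_summary()
print(summary["total_checks"], summary["bytes_scanned"])  # 2 1536
```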
Audit Logging
Standard Audit Log
```python
from datetime import datetime

from aegis_sdk import AuditLog

audit = AuditLog(
    log_path="/var/log/aegis/audit.jsonl",
    metadata_only=False,    # Store samples (set True for GDPR)
    rotation_days=30,       # Rotate after 30 days
    rotation_size_mb=100,   # Or after 100MB
    compress_rotated=True,  # Gzip old logs
    retention_days=365,     # Delete logs older than 1 year
    verify_chain=True       # Enable hash chain verification
)

# Log a processing event
audit.log_processing(
    decision="ALLOWED_WITH_MASKING",
    destination="AI_TOOL",
    detected=[{"type": "EMAIL", "count": 2}],
    bytes_processed=1024,
    source="batch_job",
    user_id="user123",
    session_id="sess_abc",
    custom_fields={"job_id": "job_456"}
)

# Query audit log (query() returns an iterator for memory efficiency;
# list() materializes it here)
entries = list(audit.query(
    start_time=datetime(2024, 1, 1),
    end_time=datetime(2024, 12, 31),
    decisions=["BLOCKED"],
    limit=100
))

# Verify log integrity
is_valid, error_msg = audit.verify_integrity()
if not is_valid:
    print(f"Chain integrity error: {error_msg}")
```
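Hash chain verification works by having each entry store the hash of the entry before it, so editing or deleting any entry breaks every link after it. A minimal sketch of the idea, assuming SHA-256 over canonical JSON and an in-memory list instead of a JSONL file; `append_entry`, `verify_integrity` and the `prev_hash`/`hash` field names are hypothetical, not the `AuditLog` on-disk format.

```python
import hashlib
import json


def _digest(prev_hash: str, payload: dict) -> str:
    # Canonical JSON (sorted keys) so the same entry always hashes the same way.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256((prev_hash + body).encode("utf-8")).hexdigest()


def append_entry(chain: list[dict], payload: dict) -> None:
    prev_hash = chain[-1]["hash"] if chain else "0" * 64  # genesis value
    chain.append({"payload": payload, "prev_hash": prev_hash,
                  "hash": _digest(prev_hash, payload)})


def verify_integrity(chain: list[dict]) -> tuple[bool, str]:
    prev_hash = "0" * 64
    for i, entry in enumerate(chain):
        if entry["prev_hash"] != prev_hash:
            return False, f"entry {i}: broken link to previous entry"
        if entry["hash"] != _digest(prev_hash, entry["payload"]):
            return False, f"entry {i}: content was modified"
        prev_hash = entry["hash"]
    return True, ""


log: list[dict] = []
append_entry(log, {"decision": "ALLOWED_WITH_MASKING", "detected": [{"type": "EMAIL", "count": 2}]})
append_entry(log, {"decision": "BLOCKED", "detected": [{"type": "SSN", "count": 1}]})
print(verify_integrity(log))  # (True, '')
log[0]["payload"]["decision"] = "ALLOWED"  # tamper with the first entry
print(verify_integrity(log))  # (False, 'entry 0: content was modified')
```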
GDPR Audit Log
Specialized audit log that never stores PII samples.
```python
from datetime import datetime

from aegis_sdk import GDPRAuditLog

audit = GDPRAuditLog(
    log_path="/var/log/aegis/gdpr_audit.jsonl"
)
# metadata_only is forced True - no samples ever stored
# Log stores: {"type": "EMAIL", "count": 5}
# NOT: {"type": "EMAIL", "sample": "j***@example.com"}

# Generate GDPR data subject report
report = audit.get_data_subject_report(
    user_id="user123",
    start_time=datetime(2024, 1, 1)
)
print(f"Processing events: {report['total_events']}")
print(f"Data types processed: {report['detection_types']}")
```
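The comments above contrast a metadata-only entry with one that carries a sample. A minimal sketch of that reduction, assuming detections arrive as dicts with a `"type"` key and either a `"count"` or a `"sample"`; `to_metadata_only` is a hypothetical helper, not part of `GDPRAuditLog`.

```python
from collections import Counter


def to_metadata_only(detected: list[dict]) -> list[dict]:
    """Reduce detections to {"type", "count"} pairs, dropping any samples."""
    counts: Counter = Counter()
    for item in detected:
        counts[item["type"]] += item.get("count", 1)  # an item without a count is one hit
    return [{"type": t, "count": c} for t, c in sorted(counts.items())]


raw = [
    {"type": "EMAIL", "sample": "j***@example.com"},
    {"type": "EMAIL", "sample": "a***@example.org"},
    {"type": "PHONE", "sample": "XXX-XXX-4567"},
]
print(to_metadata_only(raw))
# [{'type': 'EMAIL', 'count': 2}, {'type': 'PHONE', 'count': 1}]
```

Because the output is built from counts alone, no masked or unmasked value can leak into the log even by accident.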
Configuration
Custom Policy
```python
from aegis_sdk import Aegis, PolicyEngine

custom_policy = {
    "destinations": {
        "AI_TOOL": {
            "allowed": [],
            "masked": ["EMAIL", "PHONE", "CREDIT_CARD"],
            "blocked": ["SSN", "PHI_KEYWORD"]
        },
        "INTERNAL": {
            "allowed": ["EMAIL", "PHONE"],
            "masked": [],
            "blocked": ["CREDIT_CARD", "SSN"]
        }
    }
}

aegis = Aegis(policy_config=custom_policy)
# Or construct a standalone policy engine
policy = PolicyEngine(custom_policy)
```
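The custom policy lists each detection type under `allowed`, `masked` or `blocked` per destination. A minimal sketch of how such a dict could be evaluated, assuming that blocked beats masked beats allowed and that a type missing from every list is masked by default. Both rules are assumptions, and `evaluate` here is a hypothetical function, not `PolicyEngine`'s implementation.

```python
from typing import Iterable

custom_policy = {
    "destinations": {
        "AI_TOOL": {"allowed": [], "masked": ["EMAIL", "PHONE", "CREDIT_CARD"],
                    "blocked": ["SSN", "PHI_KEYWORD"]},
        "INTERNAL": {"allowed": ["EMAIL", "PHONE"], "masked": [],
                     "blocked": ["CREDIT_CARD", "SSN"]},
    }
}


def evaluate(policy: dict, destination: str, detected_types: Iterable[str],
             unlisted: str = "masked") -> str:
    """Return ALLOWED, ALLOWED_WITH_MASKING or BLOCKED for the detected types."""
    rules = policy["destinations"][destination]
    actions = set()
    for t in detected_types:
        if t in rules["blocked"]:
            actions.add("blocked")
        elif t in rules["masked"]:
            actions.add("masked")
        elif t in rules["allowed"]:
            actions.add("allowed")
        else:
            actions.add(unlisted)  # assumption: unlisted types are masked by default
    # Assumption: the strictest action across all detected types wins.
    if "blocked" in actions:
        return "BLOCKED"
    if "masked" in actions:
        return "ALLOWED_WITH_MASKING"
    return "ALLOWED"


print(evaluate(custom_policy, "AI_TOOL", ["EMAIL", "SSN"]))  # BLOCKED
```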
Environment Variables
```bash
AEGIS_LICENSE_KEY=aegis_lic_...    # License key
AEGIS_CACHE_DIR=~/.aegis           # Cache directory
AEGIS_API_ENDPOINT=https://api.aegispreflight.com/v1
AEGIS_OFFLINE_MODE=false           # Enable offline mode
AEGIS_METRICS_ENABLED=true         # Enable metrics reporting
```
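A sketch of reading these variables into a typed config object. The defaults mirror the values listed above (and the `~/.aegis` cache default from the `LicenseManager` example); which strings count as "true" is an assumption, since the SDK's own parsing is not documented. `AegisEnv` and `_flag` are hypothetical names.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


def _flag(value: str) -> bool:
    # Assumption: these spellings (any case) mean "enabled".
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class AegisEnv:
    license_key: Optional[str]
    cache_dir: str
    api_endpoint: str
    offline_mode: bool
    metrics_enabled: bool

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "AegisEnv":
        return cls(
            license_key=env.get("AEGIS_LICENSE_KEY"),
            # expanduser() handles a literal "~" that the shell did not expand.
            cache_dir=os.path.expanduser(env.get("AEGIS_CACHE_DIR", "~/.aegis")),
            api_endpoint=env.get("AEGIS_API_ENDPOINT", "https://api.aegispreflight.com/v1"),
            offline_mode=_flag(env.get("AEGIS_OFFLINE_MODE", "false")),
            metrics_enabled=_flag(env.get("AEGIS_METRICS_ENABLED", "true")),
        )


cfg = AegisEnv.from_env({"AEGIS_OFFLINE_MODE": "TRUE"})
print(cfg.offline_mode, cfg.metrics_enabled)  # True True
```

Passing a plain dict instead of `os.environ` keeps the parsing testable without touching the process environment.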
Performance
| Operation | Latency | Memory |
|---|---|---|
| Text processing (1KB) | 5-15ms | ~10MB |
| File processing (200MB) | 20-30s | 30MB constant |
| LLM prompt masking | 5-15ms | ~10MB |
| Metrics recording | <1ms (async) | ~5MB buffer |
Error Handling
```python
from aegis_sdk import (
    Aegis,
    AegisError,
    AegisBlockedError,
    LicenseValidationError,
    PolicyError
)

aegis = Aegis()
text = "Contact john@example.com at 555-123-4567"

try:
    result = aegis.process(text, destination="AI_TOOL")
except AegisBlockedError as e:
    print(f"Content blocked: {e.message}")
    print(f"Detected: {e.detected}")
except LicenseValidationError as e:
    print(f"License error: {e}")
except PolicyError as e:
    print(f"Policy error: {e}")
except AegisError as e:  # Most general handler last
    print(f"General error: {e}")
```
Complete Example
```python
import time

from aegis_sdk import (
    AegisBlockedError,
    AegisOpenAI,
    MetricsReporter,
    GDPRAuditLog,
    LicenseManager
)

# Initialize components
license_mgr = LicenseManager("aegis_lic_...")
if not license_mgr.is_valid():
    raise RuntimeError("Invalid license")

metrics = MetricsReporter("aegis_lic_...")
audit = GDPRAuditLog("/var/log/aegis/audit.jsonl")

# Create OpenAI client with Aegis protection
client = AegisOpenAI(
    api_key="sk-...",
    aegis_config=license_mgr.get_policy()
)

# Process user request
user_input = "My email is john@example.com, help me draft a response"
input_bytes = len(user_input.encode("utf-8"))  # size in bytes, not characters
start = time.perf_counter()

try:
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[{"role": "user", "content": user_input}]
    )
    duration_ms = (time.perf_counter() - start) * 1000

    # Record metrics (async, non-blocking)
    metrics.record(
        decision="ALLOWED_WITH_MASKING",
        bytes_processed=input_bytes,
        detected_types=["EMAIL"],
        detected_counts={"EMAIL": 1},
        duration_ms=duration_ms
    )

    # Log to audit (for compliance)
    audit.log_processing(
        decision="ALLOWED_WITH_MASKING",
        destination="AI_TOOL",
        detected=[{"type": "EMAIL", "count": 1}],
        bytes_processed=input_bytes,
        user_id="user123"
    )
    print(response.choices[0].message.content)
except AegisBlockedError as e:
    print(f"Request blocked: {e.message}")
    audit.log_processing(
        decision="BLOCKED",
        destination="AI_TOOL",
        detected=e.detected,
        bytes_processed=input_bytes,
        user_id="user123"
    )
finally:
    # Cleanup: flush pending metrics and stop the reporter, even on errors
    metrics.flush()
    metrics.stop()
```
License
Proprietary - Aegis Preflight
For licensing inquiries, contact sales@aegispreflight.com
Download files
Source Distribution: aegis_sdk-0.1.1.tar.gz
Built Distribution: aegis_sdk-0.1.1-py3-none-any.whl
File details
Details for the file aegis_sdk-0.1.1.tar.gz.
File metadata
- Download URL: aegis_sdk-0.1.1.tar.gz
- Size: 199.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.9 (macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8d321433d1417946c64034abba963133828ed34b02d1eabf8d3e397d4012c2e3 |
| MD5 | 44dc82b61d1cb37e054ea773bf72c38e |
| BLAKE2b-256 | fc8729c601d6d3c9d4c24e26334ae2d3a4693d20b87c394f85d49ffada6d18a0 |
File details
Details for the file aegis_sdk-0.1.1-py3-none-any.whl.
File metadata
- Download URL: aegis_sdk-0.1.1-py3-none-any.whl
- Size: 50.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.9.9 (macOS)
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e49bb317016973ea46747e384a017f9fc3eab4da836668d5cc595c6a0365521 |
| MD5 | 72cdb5e63564006d71382e1607901ebe |
| BLAKE2b-256 | 1051db3e93c004ab7324fdbc7076ad761a6400a02bb1a8af4ba79cf33101c7a2 |