Privacy-first text redaction using local LLM models with rule generation capabilities
LLM Redact
Privacy-first text redaction using local LLM models. Automatically detect and redact sensitive information like names, emails, phone numbers, and more.
Features
- 🔒 Privacy-first - Uses local LLM models, no data sent to external services
- 🚀 Simple API - One-liner redaction: llm_redact.mask(text)
- 💾 Smart Caching - SQLite database for caching and history
- 🔧 Configurable - Custom rules, models, and database connections
- 📊 Tracking - Full history and analytics of redaction operations
Installation
pip install llm-redact
Quick Start
import llm_redact
# Simple redaction
result = llm_redact.mask("Hi, I'm John Doe from john@example.com")
print(result.redacted_text)
# Output: "Hi, I'm |_NAME_A1B2C3D4_| from |_EMAIL_E5F6G7H8_|"
print(result.replacements)
# Output: [
# Replacement(original_text="John Doe", replacement_text="|_NAME_A1B2C3D4_|"),
# Replacement(original_text="john@example.com", replacement_text="|_EMAIL_E5F6G7H8_|")
# ]
# Note: Placeholders contain unique IDs and can be stored in the database for restoration.
# Each placeholder, such as |_NAME_A1B2C3D4_|, maps to its original text via a database lookup.
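The replacements list already pairs each placeholder with its original text, so a result can also be restored in memory without a database lookup. The following is a minimal sketch, not part of the library: `restore` is a hypothetical helper, and the `Replacement` dataclass is a stand-in that mirrors only the two fields shown in the output above.

```python
from dataclasses import dataclass


@dataclass
class Replacement:
    """Stand-in mirroring the two fields shown in the Quick Start output."""
    original_text: str
    replacement_text: str


def restore(redacted_text, replacements):
    """Hypothetical helper: swap each placeholder back for its original text."""
    restored = redacted_text
    for item in replacements:
        restored = restored.replace(item.replacement_text, item.original_text)
    return restored


replacements = [
    Replacement("John Doe", "|_NAME_A1B2C3D4_|"),
    Replacement("john@example.com", "|_EMAIL_E5F6G7H8_|"),
]
redacted = "Hi, I'm |_NAME_A1B2C3D4_| from |_EMAIL_E5F6G7H8_|"

print(restore(redacted, replacements))
# Output: Hi, I'm John Doe from john@example.com
```

Because every placeholder carries a unique ID, plain string replacement is enough here; no placeholder can be confused with another.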
Configuration
Environment Variables
# LLM Host (default: http://localhost:8000)
export LLM_REDACT_LLM_HOST_URL=http://localhost:8000
# Database (default: sqlite:///llm_redact.db)
export LLM_REDACT_DATABASE_URL=sqlite:///my_redact.db
# Model (default: gemma3:1b)
export LLM_REDACT_DEFAULT_MODEL=gemma3:1b
# Caching (default: True)
export LLM_REDACT_ENABLE_CACHING=true
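To make the defaults above concrete, here is a small sketch of how an application could read the same four variables itself. It is an illustration only and may not match how llm_redact parses them internally; `Settings` and `load_settings` are hypothetical names, and the set of strings treated as "true" is an assumption.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_host_url: str
    database_url: str
    default_model: str
    enable_caching: bool


def load_settings(env=None):
    """Read the documented variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    caching = env.get("LLM_REDACT_ENABLE_CACHING", "true")
    return Settings(
        llm_host_url=env.get("LLM_REDACT_LLM_HOST_URL", "http://localhost:8000"),
        database_url=env.get("LLM_REDACT_DATABASE_URL", "sqlite:///llm_redact.db"),
        default_model=env.get("LLM_REDACT_DEFAULT_MODEL", "gemma3:1b"),
        # Assumption: "true", "1" and "yes" (any case) count as enabled.
        enable_caching=caching.strip().lower() in {"true", "1", "yes"},
    )


# An empty mapping stands in for an environment with nothing set.
print(load_settings({}))
```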
Custom Database
import llm_redact
# Use PostgreSQL
llm_redact.configure_client(
    database_url="postgresql://user:pass@localhost/redact_db"
)
# Use custom LLM host
llm_redact.configure_client(
    llm_host_url="http://my-llm-server:8000"
)
Advanced Usage
Custom Rules
import llm_redact
from llm_redact import RedactionRule

# Define custom redaction rules
custom_rules = [
    RedactionRule(
        name="Replace SSN with [SSN]",
        description="Social Security Numbers",
        data_type="SSN"
    ),
    RedactionRule(
        name="Replace addresses with [ADDRESS]",
        description="Physical addresses",
        data_type="ADDRESS"
    )
]
# Apply the custom rules to this call
result = llm_redact.mask(
    "My SSN is 123-45-6789 and I live at 123 Main St",
    rules=custom_rules
)
Using the Client Directly
from llm_redact import LLMRedactClient
client = LLMRedactClient(
    llm_host_url="http://localhost:8000",
    database_url="sqlite:///custom.db"
)
result = client.mask("Sensitive text here")
# Get history
history = client.get_history(limit=10)
# Create custom rules
rule = client.create_rule(
    name="Replace API keys with [API_KEY]",
    description="API keys and tokens"
)
Prerequisites
- LLM Host Server: Run the llm-redact host server locally:
# Install and run the LLM host
ollama serve
ollama pull gemma3:1b
# Run llm-redact host server
python -m llm_redact_host
- Database: SQLite (default) or any SQLAlchemy-supported database
Supported Redaction Types
- Personal names → |_NAME_XXXX_|
- Email addresses → |_EMAIL_XXXX_|
- Phone numbers → |_PHONE_XXXX_|
- Countries → |_COUNTRY_XXXX_|
- Universities → |_UNIVERSITY_XXXX_|
- Job titles → |_JOB_TITLE_XXXX_|
- Addresses → |_ADDRESS_XXXX_|
- Social Security Numbers → |_SSN_XXXX_|
- Credit card numbers → |_CREDIT_CARD_XXXX_|
Where XXXX is a unique 8-character hash ID for each piece of data.
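Since every placeholder follows the same |_TYPE_ID_| shape, downstream code can locate placeholders with a regular expression. The sketch below is an illustration rather than part of the library: `PLACEHOLDER` and `find_placeholders` are hypothetical names, and the ID character set (uppercase letters and digits) is an assumption based on the Quick Start examples.

```python
import re

# Assumption: the type is uppercase letters and underscores, and the ID is
# exactly 8 uppercase letters or digits, as in the Quick Start examples.
PLACEHOLDER = re.compile(r"\|_([A-Z_]+?)_([A-Z0-9]{8})_\|")


def find_placeholders(text):
    """Return (data_type, id) pairs for every placeholder found in text."""
    return [(m.group(1), m.group(2)) for m in PLACEHOLDER.finditer(text)]


redacted = "Hi, I'm |_NAME_A1B2C3D4_| from |_EMAIL_E5F6G7H8_|"
print(find_placeholders(redacted))
# Output: [('NAME', 'A1B2C3D4'), ('EMAIL', 'E5F6G7H8')]
```

The type group is matched lazily so that multi-word types such as JOB_TITLE and CREDIT_CARD are captured whole, with the final 8 characters treated as the ID.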
API Reference
llm_redact.mask(text, rules=None, model=None)
Redact sensitive information from text.
Parameters:
- text (str): Text to redact
- rules (list, optional): Custom redaction rules
- model (str, optional): LLM model to use
Returns: RedactionResult object
RedactionResult
- original_text: Original input text
- redacted_text: Text with sensitive data redacted
- replacements: List of replacements made
- is_redacted: Whether any redactions were made
- processing_time_ms: Processing time in milliseconds
- cached: Whether result was from cache
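As a rough illustration of how these fields fit together, the sketch below builds a one-line summary of a result for logging. `summarize` is a hypothetical helper, and a `SimpleNamespace` stands in for a real RedactionResult so the example runs without an LLM host; only the field names come from the list above.

```python
from types import SimpleNamespace


def summarize(result):
    """Hypothetical helper: describe a result using only the documented fields."""
    source = "cache" if result.cached else "model"
    if not result.is_redacted:
        return f"no redactions ({source}, {result.processing_time_ms} ms)"
    count = len(result.replacements)
    return f"{count} redaction(s) ({source}, {result.processing_time_ms} ms)"


# Stand-in object carrying the same attribute names as RedactionResult.
example = SimpleNamespace(
    original_text="Hi, I'm John Doe",
    redacted_text="Hi, I'm |_NAME_A1B2C3D4_|",
    replacements=["placeholder entry"],
    is_redacted=True,
    processing_time_ms=120,
    cached=False,
)
print(summarize(example))
# Output: 1 redaction(s) (model, 120 ms)
```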
License
MIT License
File details
Details for the file llm_redact-0.1.1.tar.gz.
File metadata
- Download URL: llm_redact-0.1.1.tar.gz
- Upload date:
- Size: 19.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.13.5 Darwin/22.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 61db1eed1179007a0ad25cda31c6b086088d49632b1902ec9d142906570937a6 |
| MD5 | ac6c14c4877a4d042539fb9a26eae010 |
| BLAKE2b-256 | 3963c44473eded2d91d75e640c57293ddc8fb821a88f880c3cdf4c8f0d42ac65 |
File details
Details for the file llm_redact-0.1.1-py3-none-any.whl.
File metadata
- Download URL: llm_redact-0.1.1-py3-none-any.whl
- Upload date:
- Size: 27.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.1.3 CPython/3.13.5 Darwin/22.4.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0c08d26a8cdeb469c09addfb1731ae1f8bdd8ada4e2e732a7f187b9c8fcc9a6b |
| MD5 | 2ac6d5f387128241d4b3098aae6710ad |
| BLAKE2b-256 | b94690776b1b60c99e65eb5db90e8b28fd157d94698b416c1fef60248b496c0b |