Privacy-first text redaction using local LLM models with rule generation capabilities

LLM Redact

Privacy-first text redaction using local LLM models. Automatically detect and redact sensitive information like names, emails, phone numbers, and more.

Features

  • 🔒 Privacy-first - Uses local LLM models, no data sent to external services
  • 🚀 Simple API - One-liner redaction: llm_redact.mask(text)
  • 💾 Smart Caching - SQLite database for caching and history
  • 🔧 Configurable - Custom rules, models, and database connections
  • 📊 Tracking - Full history and analytics of redaction operations

Installation

pip install llm-redact

Quick Start

import llm_redact

# Simple redaction
result = llm_redact.mask("Hi, I'm John Doe from john@example.com")
print(result.redacted_text)
# Output: "Hi, I'm |_NAME_A1B2C3D4_| from |_EMAIL_E5F6G7H8_|"

print(result.replacements)
# Output: [
#   Replacement(original_text="John Doe", replacement_text="|_NAME_A1B2C3D4_|"),
#   Replacement(original_text="john@example.com", replacement_text="|_EMAIL_E5F6G7H8_|")
# ]

# Note: Each placeholder, such as |_NAME_A1B2C3D4_|, contains a unique ID.
# The ID maps back to the original text via a database lookup, so redactions can be restored later.
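
Because each placeholder maps back to exactly one original value, restoration can be pictured as a plain substitution over the replacements list. The sketch below is only an illustration of that idea: the Replacement class is a stand-in with the two fields shown above, and restore is a hypothetical helper, not a function the library is known to provide.

```python
from dataclasses import dataclass


@dataclass
class Replacement:
    """Stand-in for the library's Replacement (assumed: only these two fields)."""
    original_text: str
    replacement_text: str


def restore(redacted_text: str, replacements: list[Replacement]) -> str:
    """Hypothetical helper: swap every placeholder back to its original value."""
    restored = redacted_text
    for r in replacements:
        restored = restored.replace(r.replacement_text, r.original_text)
    return restored


replacements = [
    Replacement("John Doe", "|_NAME_A1B2C3D4_|"),
    Replacement("john@example.com", "|_EMAIL_E5F6G7H8_|"),
]
redacted = "Hi, I'm |_NAME_A1B2C3D4_| from |_EMAIL_E5F6G7H8_|"

print(restore(redacted, replacements))
# Hi, I'm John Doe from john@example.com
```

In a real setup the replacements would come from result.replacements or from the database rather than being written by hand.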

Configuration

Environment Variables

# LLM Host (default: http://localhost:8000)
export LLM_REDACT_LLM_HOST_URL=http://localhost:8000

# Database (default: sqlite:///llm_redact.db)
export LLM_REDACT_DATABASE_URL=sqlite:///my_redact.db

# Model (default: gemma3:1b)
export LLM_REDACT_DEFAULT_MODEL=gemma3:1b

# Caching (default: true)
export LLM_REDACT_ENABLE_CACHING=true
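
To make the defaults above concrete, here is a minimal sketch of how these four variables could be read from Python. Settings and load_settings are hypothetical names, and the set of strings treated as "true" is an assumption; how llm-redact itself parses these variables is not documented here.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    llm_host_url: str
    database_url: str
    default_model: str
    enable_caching: bool


def load_settings(env=None) -> Settings:
    """Hypothetical loader: read the LLM_REDACT_* variables, falling back to the documented defaults."""
    env = os.environ if env is None else env
    # Assumption: these spellings count as "enabled"; anything else disables caching.
    caching = env.get("LLM_REDACT_ENABLE_CACHING", "true").strip().lower()
    return Settings(
        llm_host_url=env.get("LLM_REDACT_LLM_HOST_URL", "http://localhost:8000"),
        database_url=env.get("LLM_REDACT_DATABASE_URL", "sqlite:///llm_redact.db"),
        default_model=env.get("LLM_REDACT_DEFAULT_MODEL", "gemma3:1b"),
        enable_caching=caching in {"1", "true", "yes", "on"},
    )


# With an empty environment, every field takes its documented default.
print(load_settings({}))
```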

Custom Database and LLM Host

import llm_redact

# Use PostgreSQL
llm_redact.configure_client(
    database_url="postgresql://user:pass@localhost/redact_db"
)

# Use custom LLM host
llm_redact.configure_client(
    llm_host_url="http://my-llm-server:8000"
)

Advanced Usage

Custom Rules

import llm_redact
from llm_redact import RedactionRule

custom_rules = [
    RedactionRule(
        name="Replace SSN with [SSN]", 
        description="Social Security Numbers",
        data_type="SSN"
    ),
    RedactionRule(
        name="Replace addresses with [ADDRESS]", 
        description="Physical addresses",
        data_type="ADDRESS"
    )
]

result = llm_redact.mask(
    "My SSN is 123-45-6789 and I live at 123 Main St",
    rules=custom_rules
)

Using the Client Directly

from llm_redact import LLMRedactClient

client = LLMRedactClient(
    llm_host_url="http://localhost:8000",
    database_url="sqlite:///custom.db"
)

result = client.mask("Sensitive text here")

# Get history
history = client.get_history(limit=10)

# Create custom rules
rule = client.create_rule(
    name="Replace API keys with [API_KEY]",
    description="API keys and tokens"
)

Prerequisites

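  1. LLM Host Server: Run Ollama and the llm-redact host server locally:

    # Start Ollama (runs in the foreground; skip if Ollama is already running as a background service)
    ollama serve

    # In a second terminal, pull the default model
    ollama pull gemma3:1b
    
    # Run llm-redact host server
    python -m llm_redact_host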
    
  2. Database: SQLite (default) or any SQLAlchemy-supported database
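
Before calling mask, it can help to confirm that something is actually listening at the configured host URL. The following is a rough sketch using only the standard library; host_port and is_reachable are hypothetical helpers, and a successful TCP connection only shows that a server is listening, not that it is the llm-redact host.

```python
import socket
from urllib.parse import urlsplit


def host_port(url: str) -> tuple[str, int]:
    """Extract (host, port) from a URL, defaulting the port by scheme."""
    parts = urlsplit(url)
    default_port = 443 if parts.scheme == "https" else 80
    return parts.hostname or "localhost", parts.port or default_port


def is_reachable(url: str, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to the URL's host and port succeeds."""
    try:
        with socket.create_connection(host_port(url), timeout=timeout):
            return True
    except OSError:
        return False


# Default host URL from the Configuration section.
print(is_reachable("http://localhost:8000"))
```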

Supported Redaction Types

  • Personal names → |_NAME_XXXX_|
  • Email addresses → |_EMAIL_XXXX_|
  • Phone numbers → |_PHONE_XXXX_|
  • Countries → |_COUNTRY_XXXX_|
  • Universities → |_UNIVERSITY_XXXX_|
  • Job titles → |_JOB_TITLE_XXXX_|
  • Addresses → |_ADDRESS_XXXX_|
  • Social Security Numbers → |_SSN_XXXX_|
  • Credit card numbers → |_CREDIT_CARD_XXXX_|

Here XXXX stands for a unique 8-character hash ID assigned to each piece of data.
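
Since every placeholder follows the same |_TYPE_ID_| shape, redacted text can be scanned for placeholders with a regular expression. This is a sketch under an assumption: the IDs are taken to be 8 uppercase letters or digits, based only on the examples in the Quick Start, and find_placeholders is a hypothetical helper, not part of the library.

```python
import re

# TYPE is one or more uppercase words joined by underscores (e.g. JOB_TITLE);
# ID is assumed to be exactly 8 uppercase letters or digits.
PLACEHOLDER = re.compile(r"\|_([A-Z]+(?:_[A-Z]+)*)_([A-Z0-9]{8})_\|")


def find_placeholders(text: str) -> list[tuple[str, str]]:
    """Return (type, id) pairs for every placeholder found in the text."""
    return [(m.group(1), m.group(2)) for m in PLACEHOLDER.finditer(text)]


text = "Contact |_NAME_A1B2C3D4_|, |_JOB_TITLE_Z9Y8X7W6_|"
print(find_placeholders(text))
# [('NAME', 'A1B2C3D4'), ('JOB_TITLE', 'Z9Y8X7W6')]
```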

API Reference

llm_redact.mask(text, rules=None, model=None)

Redact sensitive information from text.

Parameters:

  • text (str): Text to redact
  • rules (list, optional): Custom redaction rules
  • model (str, optional): LLM model to use

Returns: RedactionResult object

RedactionResult

  • original_text: Original input text
  • redacted_text: Text with sensitive data redacted
  • replacements: List of replacements made
  • is_redacted: Whether any redactions were made
  • processing_time_ms: Processing time in milliseconds
  • cached: Whether result was from cache
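
As an illustration of how these fields might be consumed, the sketch below builds a one-line log summary from a result. summarize is a hypothetical helper, and the example object is a SimpleNamespace stand-in with made-up values rather than a real RedactionResult.

```python
from types import SimpleNamespace


def summarize(result) -> str:
    """Hypothetical helper: one-line summary of a result with the fields listed above."""
    source = "cache" if result.cached else "model"
    if not result.is_redacted:
        return f"no redactions ({source}, {result.processing_time_ms} ms)"
    count = len(result.replacements)
    return f"{count} replacement(s) ({source}, {result.processing_time_ms} ms)"


# Stand-in object; the timing and cache values are invented for the example.
example = SimpleNamespace(
    original_text="Hi, I'm John Doe from john@example.com",
    redacted_text="Hi, I'm |_NAME_A1B2C3D4_| from |_EMAIL_E5F6G7H8_|",
    replacements=["|_NAME_A1B2C3D4_|", "|_EMAIL_E5F6G7H8_|"],
    is_redacted=True,
    processing_time_ms=120,
    cached=False,
)

print(summarize(example))
# 2 replacement(s) (model, 120 ms)
```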

License

MIT License
