Privacy-first text anonymization tool with enterprise-grade accuracy for removing PII from documents

These details have not been verified by PyPI

Project description

🕵️ Anon - Privacy-First Text Anonymizer

A powerful, offline-first text anonymization tool that removes personally identifiable information (PII) from text while keeping all data on your machine. Built with enterprise-grade accuracy using spaCy NER models and Microsoft Presidio.

✨ Features

🔒 100% Offline - All processing happens on your machine
🎯 High Accuracy - Advanced NER using spaCy large models + Presidio
🔐 Secure Always-Redact - Custom sensitive terms stored securely in ~/.anonymizer
🖥️ Multiple Interfaces - Modern GUI, Web API, and CLI
🚀 Background Processing - CLIs run detached with proper logging
📦 Easy Installation - One-command install with automatic model setup
🏢 Cross-Platform - Windows, macOS, and Linux support

🚀 Quick Start

Installation

pip install simple-anonymizer

The installation will automatically download the required spaCy model (en_core_web_lg) for optimal accuracy.

GUI Application

Launch the modern GUI interface:

anon-gui

✅ The GUI runs in background - you can close the terminal after launch

📝 Logs available at ~/.anonymizer/gui_YYYYMMDD_HHMMSS.log

Web Interface

Start the web server:

anon-web start

✅ Server runs in background - accessible at http://127.0.0.1:8080

📝 Comprehensive logging and process management

Web Server Management

# Start server on a custom host/port (0.0.0.0 listens on all network interfaces)
anon-web start --host 0.0.0.0 --port 5000

# Check server status
anon-web status

# View recent logs
anon-web logs

# Stop server
anon-web stop

# Clean old log files (preserves always-redact settings)
anon-web clean

Always-Redact Management

Securely manage custom sensitive terms that should always be anonymized:

# Add terms to always-redact list
anon-web add-redact "CompanyName"
anon-web add-redact "ProjectCodename"

# Remove terms from always-redact list
anon-web remove-redact "ProjectCodename"

# List all always-redacted terms
anon-web list-redact

🔐 Security Features:

Terms stored securely in ~/.anonymizer/always_redact.txt
Not visible in GUI or web interfaces (add/remove only)
Persists across all anonymization operations
Case-insensitive matching with duplicate prevention
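
The storage behaviour described above (a plain text file, case-insensitive matching, duplicate prevention) can be sketched roughly as follows. This is an illustration only, not the package's actual code: the helper names load_terms and add_term are hypothetical, and the one-term-per-line file layout is an assumption.

```python
from pathlib import Path


def load_terms(path: Path) -> list[str]:
    """Return the stored terms, one per line, skipping blank lines."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def add_term(path: Path, term: str) -> bool:
    """Append a term unless a case-insensitive duplicate already exists.

    Returns True if the term was added, False if it was empty or a duplicate.
    """
    term = term.strip()
    if not term:
        return False
    # casefold() gives a caseless form suitable for comparison.
    existing = {t.casefold() for t in load_terms(path)}
    if term.casefold() in existing:
        return False
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("a", encoding="utf-8") as f:
        f.write(term + "\n")
    return True
```

With this sketch, adding "CompanyName" and then "companyname" stores only the first spelling, because the second call is treated as a duplicate.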

Python API

from anonymizer_core import redact

# Basic anonymization
result = redact("John Doe works at Microsoft in Seattle.")
print(result.text)
# Output: "<REDACTED> works at <REDACTED> in <REDACTED>."

# Always-redact terms are automatically applied
# (managed via CLI commands shown above)
result = redact("Contact john@acme.com about AcmeProject details.")
print(result.text)
# Output: "Contact <REDACTED> about <REDACTED> details."
# (if "AcmeProject" was added to always-redact list)

🔐 Data Security & Privacy

Always-Redact Terms

Secure Storage: Custom sensitive terms are stored in ~/.anonymizer/always_redact.txt
No Shipping: The file is created locally on first use, never shipped with the package
Privacy-First: Terms are not exposed through GUI or web interfaces
CLI-Only Access: Terms can only be viewed via command line for security
Persistent: Settings survive application updates and log cleanups

File Locations

# User data directory
~/.anonymizer/
├── always_redact.txt         # Your custom sensitive terms
├── gui_YYYYMMDD_HHMMSS.log  # GUI application logs
└── web_server_*.log         # Web server logs
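
Because GUI log names embed a timestamp, the most recent log can be located from the file name alone. The sketch below assumes the gui_YYYYMMDD_HHMMSS.log pattern shown above; the helper names gui_log_time and newest_gui_log are hypothetical and not part of the package.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def gui_log_time(path: Path) -> Optional[datetime]:
    """Parse the timestamp from a gui_YYYYMMDD_HHMMSS.log file name."""
    try:
        return datetime.strptime(path.name, "gui_%Y%m%d_%H%M%S.log")
    except ValueError:
        return None  # name does not follow the expected pattern


def newest_gui_log(data_dir: Path) -> Optional[Path]:
    """Return the most recent GUI log in data_dir, or None if there is none."""
    dated = [(gui_log_time(p), p) for p in data_dir.glob("gui_*.log")]
    dated = [(t, p) for t, p in dated if t is not None]
    if not dated:
        return None
    return max(dated, key=lambda item: item[0])[1]
```

A typical call would be newest_gui_log(Path.home() / ".anonymizer").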

Data Flow

Input Text → Standard PII Detection (emails, phones, etc.)
Input Text → Always-Redact Terms (your custom words)
Combined Results → Final Anonymized Output

🔧 Advanced Usage

GUI Features

Modern Interface: Clean, intuitive design with real-time processing
Secure Term Management: Add/remove always-redact terms without exposure
File Processing: Load and save text files directly
Background Processing: Non-blocking anonymization with progress indicators

Web API Features

RESTful Endpoints: Standard HTTP API for integration
File Upload: Process text files via web interface
JSON Response: Structured output with metadata
Health Checks: Monitor service status programmatically

CLI Management

Process Control: Start/stop/status for web server
Log Management: View and clean application logs
Term Management: Secure always-redact term administration
Background Operation: All services run detached from terminal

🛠️ Technical Details

Anonymization Engine

Multi-Tier Processing: Pattern-based → Always-redact → NER fallback
Position Tracking: Prevents overlapping redactions for accuracy
Case Insensitive: Always-redact terms match regardless of case
Word Boundaries: Only complete words are redacted (not partial matches)
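
To make the word-boundary and position-tracking ideas concrete, here is a minimal sketch using only the standard re module. It is not the package's implementation: term_spans, drop_overlaps, apply_redactions and the PLACEHOLDER constant are hypothetical names, and the rule "earlier spans win" is an assumption about how tiers might be prioritised.

```python
import re

PLACEHOLDER = "<REDACTED>"


def term_spans(text, terms):
    """Find whole-word, case-insensitive occurrences of each term."""
    spans = []
    for term in terms:
        # Lookarounds instead of \b so terms that start or end with
        # punctuation still match only as complete tokens.
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            spans.append(m.span())
    return spans


def drop_overlaps(spans):
    """Keep spans in input order; discard any span overlapping a kept one."""
    kept = []
    for start, end in spans:
        if all(end <= s or start >= e for s, e in kept):
            kept.append((start, end))
    return sorted(kept)


def apply_redactions(text, spans):
    """Replace each span with the placeholder, working right to left."""
    for start, end in sorted(spans, reverse=True):
        text = text[:start] + PLACEHOLDER + text[end:]
    return text
```

Passing pattern-based spans first, then always-redact spans, then NER spans to drop_overlaps would give earlier tiers priority. Replacing from right to left keeps the remaining offsets valid while the text changes length.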

Supported Entity Types

Emails: john@example.com
URLs: https://example.com
IP Addresses: 192.168.1.1
Phone Numbers: +1-555-123-4567
Custom Terms: Your always-redact list
Names: Via NER when available
Organizations: Via NER when available
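
For the pattern-based entity types, detection can be approximated with regular expressions. The patterns below are deliberately simple and are assumptions made for illustration; they are not the expressions used by the package or by Presidio, and find_entities is a hypothetical helper.

```python
import re

# Simplified patterns for illustration; real-world formats vary widely.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "URL": re.compile(r"https?://\S+"),
    "IP": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),
    "PHONE": re.compile(r"\+?\d{1,3}[-\s]\d{3}[-\s]\d{3,4}(?:[-\s]\d{4})?"),
}


def find_entities(text):
    """Return (label, matched_text) pairs, grouped in PATTERNS order."""
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append((label, m.group()))
    return found
```

Names and organizations are not covered here because they have no fixed shape, which is why the tool relies on NER for them.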

📋 Examples & Use Cases

Basic Anonymization

from anonymizer_core import redact

text = "Please contact John Smith at john.smith@acme.com or call +1-555-0123."
result = redact(text)
print(result.text)
# Output: "Please contact <REDACTED> at <REDACTED> or call <REDACTED>."

Company-Specific Anonymization

# Set up company-specific terms
anon-web add-redact "AcmeCorp"
anon-web add-redact "ProjectTitan"
anon-web add-redact "confidential"

# Now these terms are always redacted
python -c "
from anonymizer_core import redact
text = 'AcmeCorp confidential: ProjectTitan budget is 500K'
print(redact(text).text)
"
# Output: "<REDACTED> <REDACTED>: <REDACTED> budget is 500K"

Enterprise Integration

# Configure once via CLI
# anon-web add-redact "YourCompanyName"
# anon-web add-redact "YourProduct"

# Use in your application
from anonymizer_core import redact

def process_support_ticket(ticket_text):
    """Anonymize support tickets before logging."""
    result = redact(ticket_text)
    return result.text

# All company-specific terms are automatically redacted
anonymized = process_support_ticket(
    "Customer john@email.com reported YourProduct crashed on YourCompanyName servers."
)
print(anonymized)
# Output: "Customer <REDACTED> reported <REDACTED> crashed on <REDACTED> servers."
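
If tickets reach the log through Python's logging module, one possible way to apply redaction consistently is a logging filter. The sketch below is an assumption about how such an integration could look, not something the package ships: RedactingFilter is a hypothetical class, and it takes any text-to-text callable so it does not depend on a particular redaction backend.

```python
import logging
from typing import Callable


class RedactingFilter(logging.Filter):
    """Logging filter that rewrites each message through a redaction callable."""

    def __init__(self, redact_text: Callable[[str], str]):
        super().__init__()
        self._redact_text = redact_text

    def filter(self, record: logging.LogRecord) -> bool:
        # Format the message first so values passed as args are covered too.
        record.msg = self._redact_text(record.getMessage())
        record.args = ()
        return True  # keep the record, now with redacted text
```

With the redact function shown above, it could be attached as handler.addFilter(RedactingFilter(lambda s: redact(s).text)).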

Batch Processing

# Set up your terms once
anon-web add-redact "SensitiveTerm1"
anon-web add-redact "SensitiveTerm2"

# Process multiple files - terms persist across all operations
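for file in *.txt; do
    # Skip outputs from a previous run so they are not anonymized again
    case "$file" in anonymized_*) continue ;; esac
    # Pass the file name as an argument so quotes in names cannot break the code
    python -c '
import sys
from anonymizer_core import redact
src = sys.argv[1]
with open(src, "r", encoding="utf-8") as f:
    content = f.read()
with open("anonymized_" + src, "w", encoding="utf-8") as f:
    f.write(redact(content).text)
' "$file"
done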

Security Audit

# List all configured terms (CLI only for security)
anon-web list-redact

# Remove terms that are no longer sensitive
anon-web remove-redact "OldProjectName"

# Clean logs while preserving term configuration
anon-web clean

🚨 Security Best Practices

Always-Redact Configuration

Review Regularly: Audit your always-redact terms periodically
Principle of Least Privilege: Only add terms that truly need redaction
Team Coordination: Ensure team members know which terms are configured
Backup: Consider backing up ~/.anonymizer/always_redact.txt securely

Production Deployment

Isolated Environment: Deploy in secure, isolated environments
Log Management: Regularly clean logs with anon-web clean
Access Control: Restrict CLI access to authorized personnel only
Monitor Usage: Review anonymization logs for compliance

📊 CLI Command Reference

Server Management

anon-web start [--host HOST] [--port PORT]  # Start web server
anon-web stop                                # Stop web server
anon-web status                              # Check server status
anon-web logs                                # View recent logs
anon-web clean                               # Clean old logs (preserve settings)

Always-Redact Management

anon-web add-redact "TERM"                   # Add term to always-redact list
anon-web remove-redact "TERM"                # Remove term from list
anon-web list-redact                         # List all terms (CLI only)

GUI Launch

anon-gui                                     # Launch GUI application

🔍 Troubleshooting

Common Issues

Terms not being redacted?

Verify term was added: anon-web list-redact
Check exact spelling (matching is case-insensitive, so case differences are not the cause)
Ensure word boundaries (partial matches won't work)

GUI/Web not reflecting new terms?

This is by design for security
Terms are automatically applied during anonymization
Use CLI list-redact to verify configuration

Server won't start?

Check whether a server is already running on the port: anon-web status
Try different port: anon-web start --port 8081
Check logs: anon-web logs
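
If it is unclear whether another program already holds the port, a quick check from Python is possible. This is a generic standard-library sketch, not part of the anon-web CLI; port_in_use is a hypothetical helper.

```python
import socket


def port_in_use(host: str, port: int, timeout: float = 0.5) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

For the default address mentioned earlier, the call would be port_in_use("127.0.0.1", 8080).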

Performance issues?

Clean old logs: anon-web clean
For large texts, consider batch processing
Restart services if needed: anon-web stop && anon-web start

Need help? Check the logs in ~/.anonymizer/ for detailed error information.

Project details

Release history

0.1.18

Jul 23, 2025

0.1.17

Jul 21, 2025

0.1.16

Jul 21, 2025

0.1.15

Jul 21, 2025

0.1.14

Jul 21, 2025

This version

0.1.13

Jul 21, 2025

0.1.12

Jul 21, 2025

0.1.11

Jul 21, 2025

0.1.10

Jul 21, 2025

0.1.9

Jul 20, 2025

Download files

Source Distribution

simple_anonymizer-0.1.13.tar.gz (334.9 kB)

Uploaded Jul 21, 2025 Source

Built Distribution

simple_anonymizer-0.1.13-py3-none-any.whl (56.9 kB)

Uploaded Jul 21, 2025 Python 3

File details

Details for the file simple_anonymizer-0.1.13.tar.gz.

File metadata

Download URL: simple_anonymizer-0.1.13.tar.gz
Upload date: Jul 21, 2025
Size: 334.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.13.5 Darwin/24.5.0

File hashes

Hashes for simple_anonymizer-0.1.13.tar.gz
Algorithm	Hash digest
SHA256	`812a89a602bc526e183b561c87ea576a742e5496800e021ea83fff222d7b15a7`
MD5	`2d15b10e639e2ce7c7c813968d059083`
BLAKE2b-256	`d1be62be8c25978e1cb21b2cf8173cabe969b2b3187cf2abd63c7eac41f759ad`

File details

Details for the file simple_anonymizer-0.1.13-py3-none-any.whl.

File metadata

Download URL: simple_anonymizer-0.1.13-py3-none-any.whl
Upload date: Jul 21, 2025
Size: 56.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.3 CPython/3.13.5 Darwin/24.5.0

File hashes

Hashes for simple_anonymizer-0.1.13-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f77ac3fc72fd4afb31c9203a0ee00ab45f156336e23b62b6201eed2491af0930`
MD5	`01c2cf9485a02857cee67a3369bb2416`
BLAKE2b-256	`59a8ab74e0a1b5e96aa00367062fe2ad60a32935760b812fce399ea383b7539d`

simple-anonymizer 0.1.13
