Cloud-powered PII detection and masking with local fallback support.
AnotiAI PII Masker - Cloud-Powered Privacy Protection
A lightweight Python package for detecting and masking personally identifiable information (PII) in text using cloud-based AI models with optional local fallback.
Features
- Cloud-Powered: Uses state-of-the-art AI models hosted on RunPod for maximum accuracy
- Lightning Fast: ~2-3 seconds inference time (after model warm-up)
- Intelligent: Combines multiple detection approaches (rule-based, ML, transformers)
- Reversible: Mask and unmask PII while preserving data structure
- Privacy-First: No data storage - all processing is ephemeral
- Lightweight: Minimal dependencies for cloud mode (~10MB vs ~5GB local)
- Flexible: Support for both cloud and local inference modes
- Simple API: Just provide your user API key - no complex setup required
Installation
Cloud Mode (Recommended)
pip install anotiai-pii-masker
Local Mode (Full Dependencies)
pip install "anotiai-pii-masker[local]"
Development
pip install "anotiai-pii-masker[dev]"
Quick Start
Cloud Inference (Default)
from anotiai_pii_masker import WhosePIIGuardian
# Simple setup - just provide your user API key
guardian = WhosePIIGuardian(
    user_api_key="your_jwt_api_key"
)
# Mask PII in text
text = "Hi, I'm John Doe and my email is john.doe@company.com"
result = guardian.mask_text(text)
print(f"Original: {text}")
print(f"Masked: {result['masked_text']}")
# Output: "Hi, I'm [REDACTED_NAME_1] and my email is [REDACTED_EMAIL_1]"
# Unmask when needed
unmask_result = guardian.unmask_text(result['masked_text'], result['pii_map'])
print(f"Unmasked: {unmask_result['unmasked_text']}")
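To make the mask/unmask round trip concrete, here is a minimal local sketch of reversible masking. It is not the package's implementation: `mask_emails`, `unmask`, and the email regex are hypothetical stand-ins that only handle email addresses, and the shape of the real `pii_map` may differ.

```python
import re

# Hypothetical stand-in pattern; the hosted models detect far more than emails.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def mask_emails(text: str) -> dict:
    """Replace each email with a numbered placeholder and remember the original."""
    pii_map = {}

    def _replace(match: re.Match) -> str:
        placeholder = f"[REDACTED_EMAIL_{len(pii_map) + 1}]"
        pii_map[placeholder] = match.group(0)
        return placeholder

    return {"masked_text": EMAIL_RE.sub(_replace, text), "pii_map": pii_map}


def unmask(masked_text: str, pii_map: dict) -> str:
    """Restore the original values by substituting each placeholder back."""
    for placeholder, original in pii_map.items():
        masked_text = masked_text.replace(placeholder, original)
    return masked_text


sample = "Hi, I'm John Doe and my email is john.doe@company.com"
masked = mask_emails(sample)
restored = unmask(masked["masked_text"], masked["pii_map"])
print(masked["masked_text"])
print(restored == sample)
```

The key idea is that the map returned alongside the masked text is the only thing needed to reverse the operation, so it should be stored as carefully as the original data.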
Local Inference (Fallback)
# Requires: pip install "anotiai-pii-masker[local]"
guardian = WhosePIIGuardian(local_mode=True)
# Same API as cloud mode
result = guardian.mask_text(text)
print(f"Masked: {result['masked_text']}")
Cloud with Local Fallback
# Automatically falls back to local if cloud fails
guardian = WhosePIIGuardian(
    user_api_key="your_jwt_api_key",
    local_fallback=True
)
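The fallback behavior can be pictured as a try-cloud-then-local wrapper. The sketch below is an assumption about the general pattern, not the package's internals: `mask_with_fallback`, the two stand-in backends, and the extra `backend` key are all hypothetical.

```python
from typing import Callable


def mask_with_fallback(
    text: str,
    cloud_mask: Callable[[str], dict],
    local_mask: Callable[[str], dict],
) -> dict:
    """Try the cloud backend first; use the local backend if it raises."""
    try:
        result = cloud_mask(text)
        backend = "cloud"
    except Exception:
        result = local_mask(text)
        backend = "local"
    # "backend" is a hypothetical extra key so callers can see which path ran.
    return {**result, "backend": backend}


# Hypothetical stand-ins for the two backends.
def failing_cloud(text: str) -> dict:
    raise ConnectionError("cloud endpoint unreachable")


def simple_local(text: str) -> dict:
    return {"masked_text": text.replace("John Doe", "[REDACTED_NAME_1]")}


outcome = mask_with_fallback("Hi, I'm John Doe", failing_cloud, simple_local)
print(outcome)
```

Recording which backend produced a result is useful in practice, because local and cloud inference differ widely in latency and memory use.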
Getting API Credentials
- Get your JWT API key from AnotiAI
- No RunPod setup required - credentials are handled automatically
- Simple usage - just provide your user API key
# That's it! No complex setup needed
guardian = WhosePIIGuardian(user_api_key="your_jwt_api_key")
Advanced Usage
Detection Only
# Get detected entities without masking
result = guardian.detect_pii(text)
print(f"Found {result['entities_found']} PII entities")
for entity in result['pii_results']:
    print(f"- {entity['type']}: {entity['value']} (confidence: {entity['confidence']})")
Confidence Thresholds
# Adjust sensitivity (0.0 = very sensitive, 1.0 = very strict)
result = guardian.mask_text(text, confidence_threshold=0.8)
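As a rough illustration of what a threshold does, the sketch below filters a list of detected entities by confidence. It assumes entities are kept when their confidence is greater than or equal to the threshold, which is a guess about the service's behavior. `filter_by_confidence` is a hypothetical helper, the default of 0.5 mirrors the `confidence_threshold` value shown in the response format, and the entity values are made up for the example.

```python
def filter_by_confidence(entities: list, threshold: float = 0.5) -> list:
    """Keep only entities whose confidence meets the threshold (assumed >=)."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    return [entity for entity in entities if entity["confidence"] >= threshold]


# Made-up detections in the shape used by the detect_pii example.
detected = [
    {"type": "NAME", "value": "John Doe", "confidence": 0.97},
    {"type": "EMAIL", "value": "john.doe@company.com", "confidence": 0.99},
    {"type": "NAME", "value": "May", "confidence": 0.42},
]

strict = filter_by_confidence(detected, threshold=0.8)
lenient = filter_by_confidence(detected, threshold=0.3)
print(len(strict), len(lenient))
```

A higher threshold masks fewer borderline matches (fewer false positives), while a lower one masks more aggressively at the cost of occasionally redacting ordinary words.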
Token Usage Tracking
# All methods return detailed token usage information
result = guardian.mask_text(text)
print(f"Input tokens: {result['usage']['input_tokens']}")
print(f"Output tokens: {result['usage']['output_tokens']}")
print(f"Total tokens: {result['usage']['total_tokens']}")
# Usage tracking for unmasking
unmask_result = guardian.unmask_text(result['masked_text'], result['pii_map'])
print(f"Unmasked tokens: {unmask_result['usage']['output_tokens']}")
Error Handling
from anotiai_pii_masker import WhosePIIGuardian, APIError
try:
    guardian = WhosePIIGuardian(user_api_key="your_jwt_api_key")
    result = guardian.mask_text(text)
except APIError as e:
    print(f"API Error: {e}")
except Exception as e:
    print(f"Error: {e}")
Health Check
# Check if the service is healthy
health = guardian.health_check()
print(f"Service status: {health['status']}")
Architecture
+-----------------+    +------------------+    +-----------------+
| Your App        |    | anotiai-pii-     |    | RunPod Cloud    |
|                 |--->| masker           |--->|                 |
| guardian.mask() |    | (lightweight)    |    | GPU Models      |
+-----------------+    +------------------+    | - DeBERTa       |
                                               | - RoBERTa       |
                                               | - Presidio      |
                                               | - spaCy         |
                                               +-----------------+
Technical Implementation
Token Counting Algorithm
The package counts "tokens" by character length rather than with a model tokenizer:
from typing import Any

def count_tokens(data: Any) -> int:
    """
    Calculates token count based on character length:
    - Strings: Character count
    - Dicts/Lists: JSON serialization length
    - Other types: String representation length
    """
Usage Tracking Flow
- Input Processing: Counts original text characters
- Output Processing: Counts masked text + PII map JSON
- Total Calculation: Sums input and output tokens
- Billing Integration: Returns structured usage data
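The counting rules and the usage flow above can be sketched together as follows. This is an illustration under stated assumptions, not the package's code: `count_tokens_sketch` and `build_usage` are hypothetical names, `json.dumps` is called with its default separators, and output tokens are assumed to be the masked-text length plus the length of the JSON-serialized PII map. The shape of the example `pii_map` is also made up.

```python
import json
from typing import Any


def count_tokens_sketch(data: Any) -> int:
    """Character-based count following the rules in the docstring above."""
    if isinstance(data, str):
        return len(data)
    if isinstance(data, (dict, list)):
        # Assumes default json.dumps formatting (", " and ": " separators).
        return len(json.dumps(data))
    return len(str(data))


def build_usage(original_text: str, masked_text: str, pii_map: dict) -> dict:
    """Assemble a usage block: input, output (text + map), and their sum."""
    input_tokens = count_tokens_sketch(original_text)
    output_tokens = count_tokens_sketch(masked_text) + count_tokens_sketch(pii_map)
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }


usage = build_usage(
    "My name is John Doe",
    "My name is [REDACTED_NAME_1]",
    {"[REDACTED_NAME_1]": "John Doe"},  # hypothetical map shape
)
print(usage)
```

Because the count is character-based, output tokens grow with both the number of detected entities and the size of the map entries, so the reported usage will not match a model tokenizer's count.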
API Response Format
{
    "masked_text": "My name is [REDACTED_NAME_1]",
    "pii_map": {"__TOKEN_1__": {...}},
    "entities_found": 1,
    "confidence_threshold": 0.5,
    "usage": {
        "input_tokens": 15,   # Original text length
        "output_tokens": 25,  # Masked text + PII map JSON
        "total_tokens": 40    # Total for billing
    }
}
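If you want to guard against unexpected responses, a small shape check along these lines may help. It only relies on the keys shown in the format above. `check_mask_response` is a hypothetical helper, and the empty `pii_map` in the sample is a placeholder, not real output.

```python
REQUIRED_KEYS = {"masked_text", "pii_map", "entities_found", "confidence_threshold", "usage"}
USAGE_KEYS = {"input_tokens", "output_tokens", "total_tokens"}


def check_mask_response(response: dict) -> list:
    """Return a list of problems; an empty list means the shape looks right."""
    problems = [f"missing key: {key}" for key in sorted(REQUIRED_KEYS - set(response))]
    usage = response.get("usage")
    if isinstance(usage, dict):
        if not USAGE_KEYS <= set(usage):
            problems.append("usage block is incomplete")
        elif usage["total_tokens"] != usage["input_tokens"] + usage["output_tokens"]:
            problems.append("total_tokens != input_tokens + output_tokens")
    return problems


sample_response = {
    "masked_text": "My name is [REDACTED_NAME_1]",
    "pii_map": {},  # placeholder; real maps carry the masked values
    "entities_found": 1,
    "confidence_threshold": 0.5,
    "usage": {"input_tokens": 15, "output_tokens": 25, "total_tokens": 40},
}
print(check_mask_response(sample_response))
```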
Supported PII Types
- Personal: Names, dates of birth, addresses
- Contact: Email addresses, phone numbers, URLs
- Financial: Credit card numbers, bank accounts
- Government: SSNs, passport numbers, license numbers
- Healthcare: Medical license numbers
- Technical: IP addresses, crypto addresses
Security & Privacy
- No Data Storage: All processing is ephemeral
- Encrypted Transit: HTTPS/TLS for all API communications
- Reversible Masking: Original data can be restored when needed
- Configurable Thresholds: Adjust sensitivity based on your needs
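Migration from v1.x
Version 2.0 introduces a cloud-first architecture with a simplified API. The main change for callers is that mask_text now returns a dict instead of a (masked_text, pii_map) tuple. To migrate: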
# v1.x (local only)
from anotiai_pii_masker import WhosePIIGuardian
guardian = WhosePIIGuardian()
masked_text, pii_map = guardian.mask_text(text)
# v2.x (cloud-first with simplified API)
guardian = WhosePIIGuardian(user_api_key="your_jwt_api_key")
result = guardian.mask_text(text)
masked_text = result['masked_text']
pii_map = result['pii_map']
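During a gradual migration, a small adapter can let calling code accept either return style. The helper below is a hypothetical sketch based only on the two shapes shown above (a v1.x tuple and a v2.x dict); `normalize_mask_result` is not part of the package.

```python
def normalize_mask_result(result) -> tuple:
    """Accept a v1.x (masked_text, pii_map) tuple or a v2.x dict; return a tuple."""
    if isinstance(result, dict):
        return result["masked_text"], result["pii_map"]
    masked_text, pii_map = result
    return masked_text, pii_map


# Example values in both styles; the pii_map contents are placeholders.
v1_style = ("My name is [REDACTED_NAME_1]", {"[REDACTED_NAME_1]": "John Doe"})
v2_style = {
    "masked_text": "My name is [REDACTED_NAME_1]",
    "pii_map": {"[REDACTED_NAME_1]": "John Doe"},
    "entities_found": 1,
}

print(normalize_mask_result(v1_style) == normalize_mask_result(v2_style))
```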
Performance
| Mode | Setup Time | Inference Time | Memory Usage | Accuracy |
|---|---|---|---|---|
| Cloud | ~1s | ~2-3s | ~50MB | 99.5% |
| Local | ~30s | ~5-10s | ~8GB | 99.5% |
Token Usage & Billing
The package provides comprehensive token usage tracking for accurate billing and monitoring:
Automatic Token Counting
- Input tokens: Counted from original text
- Output tokens: Counted from masked text + PII map
- Total tokens: Sum of input and output tokens
- JSON serialization: PII maps are counted as JSON character length
Usage Examples
# Masking with token tracking
result = guardian.mask_text("My name is John Doe")
print(f"Input: {result['usage']['input_tokens']} tokens")
print(f"Output: {result['usage']['output_tokens']} tokens")
print(f"Total: {result['usage']['total_tokens']} tokens")
# Unmasking with token tracking
unmask_result = guardian.unmask_text(result['masked_text'], result['pii_map'])
print(f"Restored: {unmask_result['usage']['output_tokens']} tokens")
Billing Integration
# Track usage across multiple operations
texts = ["My name is John Doe", "Email me at john.doe@company.com"]  # example inputs
total_input_tokens = 0
total_output_tokens = 0
for text in texts:
    result = guardian.mask_text(text)
    total_input_tokens += result['usage']['input_tokens']
    total_output_tokens += result['usage']['output_tokens']
print(f"Total processed: {total_input_tokens + total_output_tokens} tokens")
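For longer-running jobs, the running totals can be wrapped in a small accumulator. `UsageTracker` below is a hypothetical helper, not part of the package. It only assumes the `usage` dict shape documented above, and the numbers fed to it are example values.

```python
from dataclasses import dataclass


@dataclass
class UsageTracker:
    """Accumulates the usage blocks returned by mask/unmask calls."""

    input_tokens: int = 0
    output_tokens: int = 0
    calls: int = 0

    def record(self, usage: dict) -> None:
        self.input_tokens += usage["input_tokens"]
        self.output_tokens += usage["output_tokens"]
        self.calls += 1

    @property
    def total_tokens(self) -> int:
        return self.input_tokens + self.output_tokens


tracker = UsageTracker()
example_usages = [
    {"input_tokens": 15, "output_tokens": 25, "total_tokens": 40},
    {"input_tokens": 19, "output_tokens": 61, "total_tokens": 80},
]
for usage_block in example_usages:
    tracker.record(usage_block)
print(tracker.calls, tracker.total_tokens)
```

In real use you would call `tracker.record(result['usage'])` after each `guardian.mask_text(...)` call.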
Key Benefits
- Simplified Setup: Just provide your JWT API key - no complex RunPod configuration
- Automatic Fallback: Seamlessly switches to local mode if cloud is unavailable
- Production Ready: Battle-tested on RunPod Serverless infrastructure
- Cost Effective: Pay-per-use pricing with no idle costs
- Enterprise Grade: Built for scale with proper error handling and monitoring
- Usage Tracking: Comprehensive token counting for accurate billing
License
This project is licensed under the MIT License - see the LICENSE file for details.
Documentation
- API Reference: Complete API documentation with examples
- Quick Reference: Developer quick reference guide
- GitHub README: Main documentation
Support
- Issues: GitHub Issues
- Email: ask@anotiai.com
Protect your users' privacy with AnotiAI PII Masker
File details
Details for the file anotiai_pii_masker-0.0.2.tar.gz.
File metadata
- Download URL: anotiai_pii_masker-0.0.2.tar.gz
- Upload date:
- Size: 45.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4ebac134025aef228eb822b202db32de77f84216ca4fc60202a0844231bde3f6 |
| MD5 | b87c69691150ef16fc5cbcbe2a6f750f |
| BLAKE2b-256 | 3f009bc1eeb5737dd40b9cf0b73eda4172509a8624821b2894fc35239daf83ce |
File details
Details for the file anotiai_pii_masker-0.0.2-py3-none-any.whl.
File metadata
- Download URL: anotiai_pii_masker-0.0.2-py3-none-any.whl
- Upload date:
- Size: 57.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 577f44282233c3d341a1472134b1f57f9c64ed6d88c5ae142c0629da558f848e |
| MD5 | 2e0e254d41b1e1c3ec294052be4693aa |
| BLAKE2b-256 | 8ad285bcaff32da0eecac18df25c5fdf25b12b622de9cc6397d60dfe8a4d7fa2 |