Cloud-powered PII detection and masking with local fallback support.

AnotiAI PII Masker - Cloud-Powered Privacy Protection

A lightweight Python package for detecting and masking personally identifiable information (PII) in text using cloud-based AI models with optional local fallback.

PyPI version Python 3.8+ License: MIT

🚀 Features

  • โ˜๏ธ Cloud-Powered: Uses state-of-the-art AI models hosted on RunPod for maximum accuracy
  • โšก Lightning Fast: ~2-3 seconds inference time (after model warm-up)
  • ๐Ÿ’ก Intelligent: Combines multiple detection approaches (rule-based, ML, transformers)
  • ๐Ÿ”„ Reversible: Mask and unmask PII while preserving data structure
  • ๐Ÿ›ก๏ธ Privacy-First: No data storage - all processing is ephemeral
  • ๐Ÿ“ฆ Lightweight: Minimal dependencies for cloud mode (~10MB vs ~5GB local)
  • ๐Ÿ”ง Flexible: Support for both cloud and local inference modes
  • ๐ŸŽฏ Simple API: Just provide your user API key - no complex setup required

🔧 Installation

Cloud Mode (Recommended)

pip install anotiai-pii-masker

Local Mode (Full Dependencies)

pip install "anotiai-pii-masker[local]"

Development

pip install "anotiai-pii-masker[dev]"

🚀 Quick Start

Cloud Inference (Default)

from anotiai_pii_masker import WhosePIIGuardian

# Simple setup - just provide your user API key
guardian = WhosePIIGuardian(
    user_api_key="your_jwt_api_key"
)

# Mask PII in text
text = "Hi, I'm John Doe and my email is john.doe@company.com"
result = guardian.mask_text(text)

print(f"Original: {text}")
print(f"Masked: {result['masked_text']}")
# Output: "Hi, I'm [REDACTED_NAME_1] and my email is [REDACTED_EMAIL_1]"

# Unmask when needed
unmask_result = guardian.unmask_text(result['masked_text'], result['pii_map'])
print(f"Unmasked: {unmask_result['unmasked_text']}")
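To make the idea of reversible masking concrete, here is a toy sketch that handles email addresses only. It is not the package's implementation: the regular expression, the helper names `toy_mask` and `toy_unmask`, and the flat placeholder-to-value layout of the map are all assumptions chosen for illustration.

```python
import re

# Deliberately simple email pattern, for illustration only.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def toy_mask(text):
    """Replace each email with a numbered placeholder and record the original."""
    pii_map = {}

    def _swap(match):
        placeholder = f"[REDACTED_EMAIL_{len(pii_map) + 1}]"
        pii_map[placeholder] = match.group(0)
        return placeholder

    return EMAIL_RE.sub(_swap, text), pii_map


def toy_unmask(masked_text, pii_map):
    """Restore the originals by substituting each placeholder back."""
    for placeholder, original in pii_map.items():
        masked_text = masked_text.replace(placeholder, original)
    return masked_text


masked, mapping = toy_mask("Contact john.doe@company.com or jane@example.org")
restored = toy_unmask(masked, mapping)
```

Because every placeholder is unique and stored alongside its original value, unmasking is a plain substitution and the surrounding text is left untouched.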

Local Inference (Fallback)

# Requires pip install anotiai-pii-masker[local]
guardian = WhosePIIGuardian(local_mode=True)

# Same API as cloud mode
result = guardian.mask_text(text)
print(f"Masked: {result['masked_text']}")

Cloud with Local Fallback

# Automatically falls back to local if cloud fails
guardian = WhosePIIGuardian(
    user_api_key="your_jwt_api_key",
    local_fallback=True
)
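The fallback behaviour can be pictured as a try/except around the cloud call. The sketch below uses stand-in functions rather than the real client; `mask_with_fallback`, `flaky_cloud`, and `local_stub` are hypothetical names, and the package may decide when to fall back differently.

```python
def mask_with_fallback(text, primary, fallback, recoverable=(Exception,)):
    """Try the primary masker; on a recoverable error, use the fallback."""
    try:
        return primary(text), "primary"
    except recoverable:
        return fallback(text), "fallback"


# Stand-ins for a cloud call that fails and a local call that succeeds.
def flaky_cloud(text):
    raise ConnectionError("cloud endpoint unreachable")


def local_stub(text):
    return {"masked_text": text.replace("John", "[REDACTED_NAME_1]")}


result, source = mask_with_fallback("Hi John", flaky_cloud, local_stub)
```

Returning the source alongside the result lets the caller log how often the fallback path is taken.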

🔑 Getting API Credentials

  1. Get your JWT API key from AnotiAI
  2. No RunPod setup required - credentials are handled automatically
  3. Simple usage - just provide your user API key
# That's it! No complex setup needed
guardian = WhosePIIGuardian(user_api_key="your_jwt_api_key")

📖 Advanced Usage

Detection Only

# Get detected entities without masking
result = guardian.detect_pii(text)
print(f"Found {result['entities_found']} PII entities")

for entity in result['pii_results']:
    print(f"- {entity['type']}: {entity['value']} (confidence: {entity['confidence']})")

Confidence Thresholds

# Adjust sensitivity (0.0 = very sensitive, 1.0 = very strict)
result = guardian.mask_text(text, confidence_threshold=0.8)
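As a rough picture of what the threshold does, the sketch below filters a list of detections shaped like the `pii_results` entries shown earlier (`type`, `value`, `confidence`). The sample data, the type labels, and the `filter_by_confidence` helper are assumptions; the service may apply its threshold differently.

```python
def filter_by_confidence(entities, threshold=0.5):
    """Keep only the entities whose confidence meets the threshold."""
    return [e for e in entities if e["confidence"] >= threshold]


# Hypothetical detections; the last one is a low-confidence name match.
detections = [
    {"type": "NAME", "value": "John Doe", "confidence": 0.97},
    {"type": "EMAIL", "value": "john.doe@company.com", "confidence": 0.99},
    {"type": "NAME", "value": "May", "confidence": 0.55},
]

strict = filter_by_confidence(detections, threshold=0.8)
lenient = filter_by_confidence(detections, threshold=0.5)
```

A higher threshold masks fewer, more certain entities; a lower one masks more text at the cost of more false positives.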

Token Usage Tracking

# All methods return detailed token usage information
result = guardian.mask_text(text)
print(f"Input tokens: {result['usage']['input_tokens']}")
print(f"Output tokens: {result['usage']['output_tokens']}")
print(f"Total tokens: {result['usage']['total_tokens']}")

# Usage tracking for unmasking
unmask_result = guardian.unmask_text(result['masked_text'], result['pii_map'])
print(f"Unmasked tokens: {unmask_result['usage']['output_tokens']}")

Error Handling

from anotiai_pii_masker import WhosePIIGuardian, APIError

try:
    guardian = WhosePIIGuardian(user_api_key="your_jwt_api_key")
    result = guardian.mask_text(text)
except APIError as e:
    print(f"API Error: {e}")
except Exception as e:
    print(f"Error: {e}")

Health Check

# Check if the service is healthy
health = guardian.health_check()
print(f"Service status: {health['status']}")

๐Ÿ—๏ธ Architecture

┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│   Your App      │    │  anotiai-pii-    │    │   RunPod Cloud  │
│                 │───▶│     masker       │───▶│                 │
│ guardian.mask() │    │   (lightweight)  │    │ GPU Models      │
└─────────────────┘    └──────────────────┘    │ • DeBERTa       │
                                               │ • RoBERTa       │
                                               │ • Presidio      │
                                               │ • spaCy         │
                                               └─────────────────┘

🔧 Technical Implementation

Token Counting Algorithm

Token counts are based on character length, not on a model tokenizer:

from typing import Any

def count_tokens(data: Any) -> int:
    """
    Calculates token count based on character length:
    - Strings: Character count
    - Dicts/Lists: JSON serialization length
    - Other types: String representation length
    """

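A minimal sketch of the behaviour that docstring describes is shown below. The name `count_tokens_sketch` is hypothetical and the real function may differ in details, for example in the JSON separators it uses.

```python
import json
from typing import Any


def count_tokens_sketch(data: Any) -> int:
    """Approximate a token count as a character count."""
    if isinstance(data, str):
        return len(data)  # strings: character count
    if isinstance(data, (dict, list)):
        return len(json.dumps(data))  # containers: JSON serialization length
    return len(str(data))  # anything else: length of its string form


text_count = count_tokens_sketch("My name is John")
dict_count = count_tokens_sketch({"a": 1})
int_count = count_tokens_sketch(12345)
```

Under this scheme a "token" is one character, so counts will be noticeably larger than those reported by a subword tokenizer.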
Usage Tracking Flow

  1. Input Processing: Counts original text characters
  2. Output Processing: Counts masked text + PII map JSON
  3. Total Calculation: Sums input and output tokens
  4. Billing Integration: Returns structured usage data
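The four steps above can be sketched as a single function. This assumes "masked text + PII map JSON" means the sum of the two character lengths; `build_usage` and the sample map layout are hypothetical.

```python
import json


def build_usage(original_text, masked_text, pii_map):
    """Assemble a usage record from character counts (assumed scheme)."""
    input_tokens = len(original_text)
    # Output is the masked text plus the JSON-serialized PII map.
    output_tokens = len(masked_text) + len(json.dumps(pii_map))
    return {
        "input_tokens": input_tokens,
        "output_tokens": output_tokens,
        "total_tokens": input_tokens + output_tokens,
    }


usage = build_usage(
    "My name is John",
    "My name is [REDACTED_NAME_1]",
    {"[REDACTED_NAME_1]": "John"},  # hypothetical map layout
)
```

Note that the PII map contributes to the output count, so texts with many detected entities cost more than their masked length alone suggests.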

API Response Format

{
    "masked_text": "My name is [REDACTED_NAME_1]",
    "pii_map": {"__TOKEN_1__": {...}},
    "entities_found": 1,
    "confidence_threshold": 0.5,
    "usage": {
        "input_tokens": 15,      # Original text length
        "output_tokens": 25,     # Masked text + PII map JSON
        "total_tokens": 40       # Total for billing
    }
}

📊 Supported PII Types

  • Personal: Names, dates of birth, addresses
  • Contact: Email addresses, phone numbers, URLs
  • Financial: Credit card numbers, bank accounts
  • Government: SSNs, passport numbers, license numbers
  • Healthcare: Medical license numbers
  • Technical: IP addresses, crypto addresses
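The package does not document how each type is detected. As an illustration of the rule-based side mentioned under Features, credit card candidates are commonly validated with the Luhn checksum. The sketch below is that standard algorithm, not code taken from this package, and the 12-digit minimum is an arbitrary cutoff chosen here to skip short digit runs.

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    digits = [int(c) for c in number if c.isdigit()]
    if len(digits) < 12:  # assumed lower bound, see the note above
        return False
    total = 0
    # Walk from the rightmost digit, doubling every second one.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


ok = luhn_valid("4111 1111 1111 1111")   # widely used test card number
bad = luhn_valid("4111 1111 1111 1112")  # last digit altered
```

A checksum like this cuts false positives from arbitrary 16-digit strings such as order numbers.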

🔒 Security & Privacy

  • No Data Storage: All processing is ephemeral
  • Encrypted Transit: HTTPS/TLS for all API communications
  • Reversible Masking: Original data can be restored when needed
  • Configurable Thresholds: Adjust sensitivity based on your needs

🚨 Migration from v1.x

Version 2.0 introduces a cloud-first architecture with a simplified API: mask_text now returns a dictionary instead of a (masked_text, pii_map) tuple. To migrate:

# v1.x (local only)
from anotiai_pii_masker import WhosePIIGuardian
guardian = WhosePIIGuardian()
masked_text, pii_map = guardian.mask_text(text)

# v2.x (cloud-first with simplified API)
guardian = WhosePIIGuardian(user_api_key="your_jwt_api_key")
result = guardian.mask_text(text)
masked_text = result['masked_text']
pii_map = result['pii_map']
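If a codebase has many v1-style call sites, a small adapter can keep the tuple-returning shape while the migration is in progress. This is a sketch: `mask_text_v1_style` is a hypothetical helper, and `_FakeGuardian` is a stand-in with an assumed map layout so the example runs without an API key.

```python
def mask_text_v1_style(guardian, text, **kwargs):
    """Call the v2 API and return the v1-style (masked_text, pii_map) tuple."""
    result = guardian.mask_text(text, **kwargs)
    return result["masked_text"], result["pii_map"]


class _FakeGuardian:
    """Stand-in client so the sketch runs offline."""

    def mask_text(self, text, **kwargs):
        return {
            "masked_text": "Hi [REDACTED_NAME_1]",
            "pii_map": {"[REDACTED_NAME_1]": "John"},
        }


masked_text, pii_map = mask_text_v1_style(_FakeGuardian(), "Hi John")
```

The adapter discards the `usage` field, so call sites that need billing data should use the v2 dictionary directly.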

📈 Performance

Mode     Setup Time    Inference Time    Memory Usage    Accuracy
Cloud    ~1s           ~2-3s             ~50MB           99.5%
Local    ~30s          ~5-10s            ~8GB            99.5%

📊 Token Usage & Billing

The package provides comprehensive token usage tracking for accurate billing and monitoring:

Automatic Token Counting

  • Input tokens: Counted from original text
  • Output tokens: Counted from masked text + PII map
  • Total tokens: Sum of input and output tokens
  • JSON serialization: PII maps are counted as JSON character length

Usage Examples

# Masking with token tracking
result = guardian.mask_text("My name is John Doe")
print(f"Input: {result['usage']['input_tokens']} tokens")
print(f"Output: {result['usage']['output_tokens']} tokens") 
print(f"Total: {result['usage']['total_tokens']} tokens")

# Unmasking with token tracking
unmask_result = guardian.unmask_text(result['masked_text'], result['pii_map'])
print(f"Restored: {unmask_result['usage']['output_tokens']} tokens")

Billing Integration

# Track usage across multiple operations
texts = ["My name is John Doe", "My email is john.doe@company.com"]  # example inputs
total_input_tokens = 0
total_output_tokens = 0

for text in texts:
    result = guardian.mask_text(text)
    total_input_tokens += result['usage']['input_tokens']
    total_output_tokens += result['usage']['output_tokens']

print(f"Total processed: {total_input_tokens + total_output_tokens} tokens")

🎯 Key Benefits

  • Simplified Setup: Just provide your JWT API key - no complex RunPod configuration
  • Automatic Fallback: Seamlessly switches to local mode if cloud is unavailable
  • Production Ready: Battle-tested on RunPod Serverless infrastructure
  • Cost Effective: Pay-per-use pricing with no idle costs
  • Enterprise Grade: Built for scale with proper error handling and monitoring
  • Usage Tracking: Comprehensive token counting for accurate billing

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

Protect your users' privacy with AnotiAI PII Masker 🛡️
