Privacy firewall layer for agent systems (Option C MVP)

Privacy Firewall V2 🛡️

Best-in-class multi-language, domain-aware anonymization library for AI applications

🌟 Why Privacy Firewall V2?

Privacy Firewall V2 combines the following capabilities:

Domain Awareness: Keep relevant data (medical diagnoses in healthcare, transaction amounts in finance)
Auto Language Detection: Detects 55+ languages automatically with thread-level caching
Locale-Specific Patterns: Country-specific ID formats (Spanish DNI, US SSN, French INSEE, etc.)
Multiple Detection Backends: Pattern matching, Presidio (spaCy), OPF, GLiNER-PII, Nemotron privacy filter, and Transformers
7 Disposition Actions: Keep, Redact, Pseudonymize, Generalize, Mask, Hash, Suppress
Reversible Anonymization: Pseudonymization with secure vault storage
Easy to Extend: Add new locale = add one file
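To make the seven disposition actions concrete, here is a minimal standard-library sketch of what each one could do to a detected value. It does not use the library: `Action`, `apply_action`, the placeholder formats, and the reading of Redact versus Suppress are all assumptions for illustration, and the real `DispositionAction` behaviour may differ.

```python
import hashlib
from enum import Enum


class Action(Enum):
    """Hypothetical stand-in for the library's DispositionAction."""
    KEEP = "keep"
    REDACT = "redact"
    PSEUDONYMIZE = "pseudonymize"
    GENERALIZE = "generalize"
    MASK = "mask"
    HASH = "hash"
    SUPPRESS = "suppress"


def apply_action(action, entity_type, value, seen):
    """Return the replacement text for one detected entity.

    `seen` maps (entity_type, value) to a placeholder so that the same
    value always receives the same pseudonym.
    """
    if action is Action.KEEP:
        return value
    if action is Action.REDACT:
        return f"[{entity_type}]"
    if action is Action.PSEUDONYMIZE:
        key = (entity_type, value)
        if key not in seen:
            index = sum(1 for k in seen if k[0] == entity_type) + 1
            seen[key] = f"{entity_type}_{index}"
        return seen[key]
    if action is Action.GENERALIZE:
        # Illustrative only: bucket a numeric value into a decade range.
        low = int(value) // 10 * 10
        return f"[{entity_type}_{low}-{low + 9}]"
    if action is Action.MASK:
        return value[:4] + "..." + value[-4:]
    if action is Action.HASH:
        return hashlib.sha256(value.encode("utf-8")).hexdigest()[:12]
    return ""  # SUPPRESS: assumed here to drop the value entirely


seen = {}
print(apply_action(Action.PSEUDONYMIZE, "PERSON", "Ana García", seen))
print(apply_action(Action.GENERALIZE, "AGE", "43", seen))
```

Only Pseudonymize needs state in this sketch, which is why it is the action that can later be reversed from a stored mapping.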

📦 Quick Start

Installation

# From PyPI (basic, pattern-based)
pip install pii-firewall

# Recommended: With Presidio and language detection
pip install "pii-firewall[presidio,langdetect]"

# Full features (includes transformers, OPF, GLiNER)
pip install "pii-firewall[all]"

# Local development install
pip install -e .

# Focused installs
pip install "pii-firewall[opf]"       # OPF runtime (or install from source if your environment requires it)
pip install "pii-firewall[gliner]"    # GLiNER PII models

Basic Usage

from privacy_firewall import create_firewall

# Create healthcare firewall (auto-detects language)
firewall = create_firewall("healthcare")

# Process text
result = firewall.process(
    text="Ana García, 43 años, hipertensión. Prescripción: enalapril 10mg.",
    context={
        "tenant_id": "hospital-001",
        "case_id": "patient-123",
        "thread_id": "consultation-1",
        "actor_id": "doctor-456",
    },
)

print(result.sanitized_text)
# Output: "PERSON_1, [AGE_40-49], hipertensión. enalapril 10mg."
# Notice: Medical terms (hipertensión, enalapril) are KEPT!

🎯 Domain Profiles

Healthcare

Keeps medical data relevant for diagnosis while protecting patient identity:

firewall = create_firewall("healthcare")

# Keeps: diagnoses, medications, procedures, lab values
# Redacts: names, IDs, addresses
# Generalizes: ages (43 → 40-49), dates (specific → month/year)

Finance

Preserves transaction details while protecting PII:

firewall = create_firewall("finance")

# Keeps: transaction amounts, account types, credit scores
# Redacts: medical information, customer PII
# Masks: credit cards (4111...1111)
# Pseudonymizes: account numbers (reversible)
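The card masking shown above (4111...1111) can be sketched with the standard library alone. This is an illustration, not the library's detector: the `CARD` regex, `luhn_ok`, and `mask_cards` are hypothetical names, and the Luhn checksum is added here as one common way to avoid masking arbitrary long numbers.

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Return True if the digit string passes the Luhn checksum."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:      # double every second digit from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def mask_cards(text: str) -> str:
    """Replace Luhn-valid card numbers with first4...last4."""
    def repl(match):
        digits = re.sub(r"\D", "", match.group(0))
        if not luhn_ok(digits):
            return match.group(0)   # leave non-card numbers untouched
        return f"{digits[:4]}...{digits[-4:]}"
    return CARD.sub(repl, text)


print(mask_cards("Card 4111 1111 1111 1111 charged 25.00"))
```

The transaction amount is left alone because it is too short to match the pattern, which mirrors the profile's intent of keeping amounts while masking card numbers.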

Legal

High anonymity for legal documents:

firewall = create_firewall("legal")

# Keeps: case numbers, statutes, legal references
# Pseudonymizes: party names (reversible for case management)
# Generalizes: all dates to year only
# Redacts: all strong identifiers
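The profiles differ in how far they coarsen dates: month/year for healthcare, year only for legal. A minimal sketch of that idea follows. It handles only ISO `YYYY-MM-DD` dates, and both the function name `generalize_dates` and the output formats are assumptions rather than the library's behaviour.

```python
import re
from datetime import date

ISO_DATE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def generalize_dates(text: str, granularity: str) -> str:
    """Coarsen ISO dates in text to 'month' (YYYY-MM) or 'year' (YYYY)."""
    if granularity not in ("month", "year"):
        raise ValueError("granularity must be 'month' or 'year'")

    def repl(match):
        year, month, day = (int(g) for g in match.groups())
        try:
            date(year, month, day)      # reject impossible dates
        except ValueError:
            return match.group(0)       # not a real date, leave as is
        if granularity == "month":
            return f"{year:04d}-{month:02d}"
        return f"{year:04d}"

    return ISO_DATE.sub(repl, text)


print(generalize_dates("Admitted 2024-03-15.", "month"))
print(generalize_dates("Filed 2024-03-15.", "year"))
```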

🌍 Multi-Language Support

Auto-detects 55+ languages. After the first detection in a thread, the cached result is reused, so later messages add negligible detection overhead:

firewall = create_firewall("healthcare")

# Spanish - detected automatically
result_es = firewall.process(
    text="Paciente con diabetes tipo 2, DNI 12345678A",
    context={...}
)

# English - detected automatically  
result_en = firewall.process(
    text="Patient with type 2 diabetes, SSN 123-45-6789",
    context={...}
)

# French - detected automatically
result_fr = firewall.process(
    text="Patient avec diabète, INSEE 1234567890123",
    context={...}
)

Supported locales: ES, US, FR, DE, IT, PT, + global patterns
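Thread-level caching can be pictured as a dictionary keyed by `thread_id`: the detector runs once per thread and later messages reuse the result. The sketch below is an assumption about the idea, not the library's implementation. `ThreadLanguageCache` and `toy_detect` are hypothetical, and `toy_detect` is a deliberately naive stand-in for a real detector such as langdetect.

```python
from typing import Callable, Dict


class ThreadLanguageCache:
    """Run the detector once per thread, then reuse the cached language."""

    def __init__(self, detect: Callable[[str], str]):
        self._detect = detect
        self._by_thread: Dict[str, str] = {}

    def language_for(self, thread_id: str, text: str) -> str:
        lang = self._by_thread.get(thread_id)
        if lang is None:
            lang = self._detect(text)       # the only expensive call
            self._by_thread[thread_id] = lang
        return lang


def toy_detect(text: str) -> str:
    """Naive keyword-based detector, for illustration only."""
    words = set(text.lower().split())
    if words & {"con", "paciente", "años"}:
        return "es"
    if words & {"avec", "diabète"}:
        return "fr"
    return "en"


cache = ThreadLanguageCache(toy_detect)
print(cache.language_for("consultation-1", "Paciente con diabetes tipo 2"))
print(cache.language_for("consultation-1", "Patient with type 2 diabetes"))
```

Note the trade-off this design implies: the second call still reports the language cached for the thread, so a conversation that switches language mid-thread would need the cache to be invalidated.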

🔧 Advanced Usage

Custom Profiles

from privacy_firewall import (
    PrivacyFirewallV2,
    create_custom_profile,
    EntityDisposition,
    DispositionAction,
)

# Create custom profile
profile = create_custom_profile("legal_discovery")

# Add entity dispositions
profile.add_disposition(EntityDisposition(
    entity_type="PERSON",
    action=DispositionAction.PSEUDONYMIZE,
    confidence_threshold=0.8,
))

profile.add_disposition(EntityDisposition(
    entity_type="CASE_NUMBER",
    action=DispositionAction.KEEP,
    confidence_threshold=0.9,
))

firewall = PrivacyFirewallV2(profile=profile)

Custom Patterns

import re
from privacy_firewall.patterns import EntityPattern

# Add custom pattern at runtime
firewall.add_custom_pattern(EntityPattern(
    entity_type="EMPLOYEE_ID",
    locale="US",
    pattern=re.compile(r"\bEMP-\d{6}\b"),
    confidence=0.95,
    context_words=("employee", "staff", "worker"),
    description="Company employee IDs",
))
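How `context_words` influences detection is not spelled out above. One plausible reading, sketched here purely as an assumption, is that a match scores higher when a context word appears nearby and lower when none does. `SketchPattern`, `find_matches`, and the `window`, `boost`, and `penalty` values are hypothetical and are not part of the library.

```python
import re
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class SketchPattern:
    """Hypothetical, reduced version of an entity pattern."""
    entity_type: str
    pattern: re.Pattern
    confidence: float
    context_words: Tuple[str, ...] = ()


def find_matches(text: str, spec: SketchPattern, window: int = 40,
                 boost: float = 0.04, penalty: float = 0.3) -> List[Tuple[str, float]]:
    """Return (matched_text, score) pairs, adjusted by nearby context words."""
    lowered = text.lower()
    results = []
    for match in spec.pattern.finditer(text):
        nearby = lowered[max(0, match.start() - window): match.end() + window]
        has_context = any(word in nearby for word in spec.context_words)
        score = spec.confidence + boost if has_context else spec.confidence - penalty
        results.append((match.group(0), round(min(1.0, max(0.0, score)), 2)))
    return results


spec = SketchPattern(
    entity_type="EMPLOYEE_ID",
    pattern=re.compile(r"\bEMP-\d{6}\b"),
    confidence=0.95,
    context_words=("employee", "staff", "worker"),
)
print(find_matches("Employee EMP-123456 joined in May", spec))
print(find_matches("Ref EMP-654321 shipped", spec))
```

Scores like these would then be compared against a profile's `confidence_threshold` to decide whether the disposition applies.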

Reversible Pseudonymization

# Anonymize
result = firewall.process(text="Contact John Doe at john@example.com", context={...})
print(result.sanitized_text)
# "Contact PERSON_1 at EMAIL_1"

# LLM processes anonymized text
llm_response = "PERSON_1 should verify EMAIL_1 is correct"

# Rehydrate (restore original values)
from privacy_firewall.anonymization_engine import rehydrate_text
mapping = firewall.vault.get_case_mapping(
    tenant_id="...",
    case_id="...",
    thread_id="...",
)
final = rehydrate_text(llm_response, mapping)
print(final)
# "John Doe should verify john@example.com is correct"
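Conceptually, rehydration is a placeholder-to-original substitution. The sketch below shows that step with the standard library. It assumes the mapping is a plain dict from placeholder to original value and that placeholders look like `PERSON_1`. The function is named `rehydrate` to make clear it is not the library's `rehydrate_text`.

```python
import re
from typing import Dict

# Placeholders such as PERSON_1, EMAIL_12 or CASE_NUMBER_3.
PLACEHOLDER = re.compile(r"\b[A-Z][A-Z_]*_\d+\b")


def rehydrate(text: str, mapping: Dict[str, str]) -> str:
    """Swap known placeholders back to their original values.

    Matching whole tokens avoids corrupting PERSON_10 when replacing
    PERSON_1; unknown placeholders are left untouched.
    """
    return PLACEHOLDER.sub(lambda m: mapping.get(m.group(0), m.group(0)), text)


mapping = {"PERSON_1": "John Doe", "EMAIL_1": "john@example.com"}
print(rehydrate("PERSON_1 should verify EMAIL_1 is correct", mapping))
```

A naive chain of `str.replace` calls would be order-dependent, which is why the sketch matches complete placeholder tokens instead.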

Provider-Agnostic SDK Flow

from privacy_firewall import PrivacyFirewallSDK

sdk = PrivacyFirewallSDK.create(domain="healthcare", detector_backend="presidio")

context = {
    "tenant_id": "hospital-001",
    "case_id": "patient-123",
    "thread_id": "consultation-1",
    "actor_id": "doctor-456",
}

# 1) Anonymize input
anon = sdk.anonymize_text(text="Contact John Doe at john@example.com", context=context)

# 2) Call any model client (callable or object with .generate)
def my_llm(prompt: str) -> str:
    return f"Please verify PERSON_1 at EMAIL_1. Input was: {prompt}"

# 3) Or let secure_call run the whole flow: anonymize, call the model, rehydrate
result = sdk.secure_call(
    text="Contact John Doe at john@example.com",
    context=context,
    llm_client=my_llm,
)
print(result.final_text)
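The comment above says the model client may be a callable or an object with `.generate`. A small dispatch helper along these lines would support both shapes. `call_model` is a hypothetical name, and how the SDK actually resolves the client is not documented here.

```python
def call_model(client, prompt: str) -> str:
    """Invoke a model client that is either a callable or has .generate()."""
    if callable(client):
        return client(prompt)
    generate = getattr(client, "generate", None)
    if callable(generate):
        return generate(prompt)
    raise TypeError("llm_client must be callable or expose a generate() method")


class EchoClient:
    """Toy client exposing generate(), for illustration only."""

    def generate(self, prompt: str) -> str:
        return f"echo: {prompt}"


print(call_model(lambda p: p.upper(), "Contact PERSON_1"))
print(call_model(EchoClient(), "Contact PERSON_1"))
```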

GDPR Compliance (Right to be Forgotten)

# Forget all data for a case
deleted = firewall.forget(
    tenant_id="hospital-001",
    case_id="patient-123",
    thread_id="consultation-1",
)
print(f"Deleted {deleted} mappings")
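The following sketch shows how a SQLite-backed vault could scope mappings by tenant, case, and thread, and how `forget` can return the number of deleted rows. It is an assumption about the design, not the library's vault. `SketchVault` and its schema are hypothetical, and it stores originals in plain text, which a production vault should not do.

```python
import sqlite3


class SketchVault:
    """Hypothetical vault keyed by (tenant_id, case_id, thread_id)."""

    def __init__(self, path: str = ":memory:"):
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS mappings ("
            " tenant_id TEXT, case_id TEXT, thread_id TEXT,"
            " placeholder TEXT, original TEXT,"
            " PRIMARY KEY (tenant_id, case_id, thread_id, placeholder))"
        )

    def put(self, tenant_id, case_id, thread_id, placeholder, original):
        with self._db:  # commits on success
            self._db.execute(
                "INSERT OR REPLACE INTO mappings VALUES (?, ?, ?, ?, ?)",
                (tenant_id, case_id, thread_id, placeholder, original),
            )

    def get_case_mapping(self, tenant_id, case_id, thread_id):
        rows = self._db.execute(
            "SELECT placeholder, original FROM mappings"
            " WHERE tenant_id = ? AND case_id = ? AND thread_id = ?",
            (tenant_id, case_id, thread_id),
        )
        return dict(rows.fetchall())

    def forget(self, tenant_id, case_id, thread_id) -> int:
        """Delete every mapping in the scope and return how many were removed."""
        with self._db:
            cursor = self._db.execute(
                "DELETE FROM mappings"
                " WHERE tenant_id = ? AND case_id = ? AND thread_id = ?",
                (tenant_id, case_id, thread_id),
            )
        return cursor.rowcount


vault = SketchVault()
vault.put("hospital-001", "patient-123", "consultation-1", "PERSON_1", "Ana García")
print(vault.forget("hospital-001", "patient-123", "consultation-1"))
```

Once the mapping is deleted, previously issued placeholders can no longer be rehydrated, which is the property the right-to-be-forgotten flow relies on.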

🚀 Web API

Run the FastAPI web server:

cd artifacts/pii-firewall
uvicorn privacy_firewall.web.app:create_app --factory --reload

Access the API at http://localhost:8000/docs

API Example

curl -X POST "http://localhost:8000/api/run" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Ana García, 43 años, hipertensión",
    "tenant_id": "hospital-001",
    "case_id": "patient-123",
    "thread_id": "thread-1",
    "actor_id": "doctor-456",
    "profile": "healthcare",
    "detector_backend": "gliner"
  }'
  }'
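The same request can be built from Python with the standard library. The endpoint and payload fields below are copied from the curl example. The helper name `build_run_request` is hypothetical, and the response format is not documented here, so the sketch stops at constructing the request.

```python
import json
import urllib.request


def build_run_request(base_url: str, payload: dict) -> urllib.request.Request:
    """Build a POST request for the /api/run endpoint shown above."""
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/run",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


payload = {
    "text": "Ana García, 43 años, hipertensión",
    "tenant_id": "hospital-001",
    "case_id": "patient-123",
    "thread_id": "thread-1",
    "actor_id": "doctor-456",
    "profile": "healthcare",
    "detector_backend": "gliner",
}
request = build_run_request("http://localhost:8000", payload)
print(request.get_method(), request.full_url)

# Sending requires the server to be running locally:
# with urllib.request.urlopen(request, timeout=30) as response:
#     print(json.load(response))
```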

Web UI

The project includes a Next.js web interface:

cd ../../pii-web-next
npm install
npm run dev

Access at http://localhost:3000

📤 Publish To PyPI

cd artifacts/pii-firewall
python -m pip install --upgrade build twine
python -m build
python -m twine check dist/*

# TestPyPI (recommended first)
python -m twine upload --repository testpypi dist/*

# Production PyPI
python -m twine upload dist/*

Detailed publishing guide: PUBLISHING.md

Suggested release order:

Bump version in pyproject.toml
Build + twine check
Upload to TestPyPI and verify install
Upload to PyPI
Create git tag/release matching the published version

📊 Performance

Language detection: 1-2ms (first message), 0ms (cached)
Pattern matching: <1ms
Presidio NER: 50-200ms (depends on text length)
Transformer NER: 100-500ms (use for accuracy, not speed)
Overall latency: ~50-250ms per request (Presidio mode)

Optimization tips:

Use thread-level language caching (enabled by default)
Preload models on startup: firewall.preload_languages(["es", "en", "fr"])
Use detector_backend="presidio" for best speed/accuracy balance
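The figures above will vary with hardware, models, and text length, so it is worth measuring on your own setup. The harness below is a generic sketch that times any callable. `measure_latency_ms` is a hypothetical helper, and in practice you would pass something like a function wrapping `firewall.process` with a fixed context.

```python
import statistics
import time


def measure_latency_ms(fn, text: str, runs: int = 20, warmup: int = 2) -> dict:
    """Time fn(text) over several runs and report latency in milliseconds."""
    for _ in range(warmup):         # warm caches and lazy model loads
        fn(text)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn(text)
        samples.append((time.perf_counter() - start) * 1000.0)
    samples.sort()
    p95_index = min(len(samples) - 1, int(0.95 * len(samples)))
    return {
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }


# Stand-in workload; replace with a call into the firewall.
stats = measure_latency_ms(str.upper, "Ana García, 43 años, hipertensión")
print(stats)
```

The warmup runs matter here because the first call typically includes language detection and model loading, which the cached path then skips.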

🏗️ Architecture

src/privacy_firewall/
├── language/              # Auto-detection & routing
│   ├── detector.py       # LanguageDetector (langdetect/fasttext)
│   └── router.py         # LanguageRouter (spaCy model selection)
├── patterns/             # Locale-aware patterns
│   ├── catalog.py        # PatternCatalog
│   └── locales/          # ONE FILE PER LANGUAGE ✨
│       ├── global_patterns.py
│       ├── es_patterns.py
│       ├── us_patterns.py
│       ├── fr_patterns.py
│       ├── de_patterns.py
│       ├── it_patterns.py
│       └── pt_patterns.py
├── profiles/             # Domain profiles
│   ├── profiles.py       # DomainProfile, EntityDisposition
│   └── presets.py        # HEALTHCARE, FINANCE, LEGAL
├── presidio_integration/ # Full Presidio capabilities
│   ├── engine.py         # Analyzer + Anonymizer
│   └── recognizers.py    # Custom recognizers
├── transformers_ner/     # Domain-specific models
│   ├── engine.py         # TransformerNEREngine
│   └── models.py         # BioBERT, FinBERT, etc.
├── unified_detector.py   # Multi-backend orchestration
├── anonymization_engine.py  # Disposition-based anonymization
├── firewall.py           # Next-gen PrivacyFirewall
└── web/                  # FastAPI web interface
    └── app.py            # REST API

🆚 Comparison

Feature	Privacy Firewall V2	Presidio	scrubadub	AWS Comprehend
Domain awareness	✅ Keep relevant data	❌	❌	⚠️ Healthcare only
Multi-language	✅ 55+ auto-detect	✅ Manual	❌ English only	✅ Some
Locale patterns	✅ Per-country	❌	❌	❌
Multiple dispositions	✅	❌ Basic	❌	❌
Transformers	✅ BioBERT, FinBERT	❌	❌	✅ Proprietary
Reversibility	✅ Vault	❌	❌	❌
Custom patterns	✅ Runtime	⚠️ Code	⚠️ Code	❌
Thread caching	✅ 0ms after first	❌	❌	N/A
Open source	✅	✅	✅	❌

🔌 Extending with New Locales

Add support for a new country in 3 steps:

Create pattern file (patterns/locales/nl_patterns.py):

import re
from ..catalog import EntityPattern

NL_BSN = EntityPattern(
    entity_type="NATIONAL_ID",
    locale="NL",
    pattern=re.compile(r"\b\d{9}\b"),
    confidence=0.9,
    context_words=("bsn", "burgerservicenummer"),
    description="Dutch BSN",
)

NL_PATTERNS = [NL_BSN]

Import in patterns/locales/__init__.py:

from .nl_patterns import NL_PATTERNS
LOCALE_PATTERNS = [...] + NL_PATTERNS

Add language config (optional, for spaCy models):

# In language/router.py
"nl": LanguageConfig(
    language_code="nl",
    spacy_model="nl_core_news_sm",
    patterns_locale="NL",
),

Done! Dutch patterns are now available automatically.
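A bare nine-digit pattern will also match phone fragments and order numbers, so a checksum can help when adding a locale. Dutch BSNs satisfy the "11-proof" (elfproef), sketched below with the standard library. `is_valid_bsn` is a hypothetical helper, and whether `EntityPattern` accepts a validator hook is not stated here, so treat this as a post-filter you would apply yourself.

```python
def is_valid_bsn(candidate: str) -> bool:
    """Check a 9-digit string against the Dutch BSN 11-proof.

    The digits are weighted 9, 8, 7, 6, 5, 4, 3, 2 and -1, and the
    weighted sum must be divisible by 11.
    """
    if len(candidate) != 9 or not (candidate.isascii() and candidate.isdigit()):
        return False
    digits = [int(ch) for ch in candidate]
    weights = [9, 8, 7, 6, 5, 4, 3, 2, -1]
    total = sum(w * d for w, d in zip(weights, digits))
    return total % 11 == 0 and any(digits)  # reject the all-zero string


print(is_valid_bsn("111222333"))
print(is_valid_bsn("123456789"))
```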

📚 Documentation

Developer Guide (HTML) - Complete implementation and usage guide
PUBLISHING.md - Package release checklist and PyPI flow
tests_integration/README.md - Integration test notes

To show the guide in a panel in VS Code:

Open docs/guide.html
Select Open Preview (or use Ctrl+Shift+V)

🧪 Testing

# Unit tests
pytest tests/

# Integration tests
pytest tests_integration/

# Quick package smoke test
python -c "import privacy_firewall; print('ok')"

🔐 Security & Privacy

✅ Simple end-to-end anonymize→LLM→rehydrate flow
✅ Reversible pseudonymization with vault storage
✅ Pluggable vault storage (in-memory and SQLite)
✅ GDPR "right to be forgotten"
✅ Audit trails in result.trace
✅ No data leaves your infrastructure

📝 License

MIT License - see LICENSE file for details.

Commercial licensing options are also available. Contact: info@botlance.ai

🤝 Contributing

Contributions welcome! Areas to contribute:

New locale patterns (add your country!)
Domain profiles (education, government, etc.)
Custom recognizers
Performance optimizations
Documentation improvements

🙏 Acknowledgments

Built with:

Presidio - Microsoft's PII detection library
spaCy - Industrial-strength NLP
langdetect - Fast language detection
transformers - State-of-the-art NLP models

Built with ❤️ for privacy-first AI applications

Release history

0.3.5: Jun 14, 2026
0.3.4: Jun 14, 2026
0.3.3: Jun 14, 2026
0.3.0: Jun 13, 2026
0.2.9: Jun 13, 2026
0.2.8: Jun 12, 2026 (this version)
0.2.7: Jun 11, 2026
0.2.6: Jun 11, 2026
0.2.5: Jun 11, 2026
0.2.4: Jun 7, 2026
0.2.3: May 21, 2026
0.2.2: May 16, 2026
0.2.1: May 13, 2026
0.2.0: May 11, 2026
0.1.0: May 10, 2026

Download files

Source Distribution

pii_firewall-0.2.8.tar.gz (74.2 kB)

Uploaded Jun 12, 2026 Source

Built Distribution

pii_firewall-0.2.8-py3-none-any.whl (89.7 kB)

Uploaded Jun 12, 2026 Python 3

File details

Details for the file pii_firewall-0.2.8.tar.gz.

File metadata

Download URL: pii_firewall-0.2.8.tar.gz
Upload date: Jun 12, 2026
Size: 74.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for pii_firewall-0.2.8.tar.gz
Algorithm	Hash digest
SHA256	`5836d1c1caf1cfc816e2043b626a5ef488741af24dac43ff5ec94b22296be941`
MD5	`f629d0deec5b34e3266522cef2c6d33e`
BLAKE2b-256	`ff30d400d8864821db06dbfaa47114767a334b7eab1d422a62968fdc9ee13d11`

File details

Details for the file pii_firewall-0.2.8-py3-none-any.whl.

File metadata

Download URL: pii_firewall-0.2.8-py3-none-any.whl
Upload date: Jun 12, 2026
Size: 89.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for pii_firewall-0.2.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`90bbba6ca3c1c4ebbbf8586677f9539e2303ce22eb8b57e674ff4acdb166ff8c`
MD5	`6583016e2d3dc03e711b6bb4c442c1b4`
BLAKE2b-256	`c80093da3dbc66761fa533f17c8892bfe70ced8dc03d91b2b0e59dbd7fbb037d`
