Content anonymizer/pseudonymizer — redact sensitive data before sharing with AI

privatiser-engine

Open-source anonymization engine powering Privatiser. It redacts IPs, API keys, secrets, PII, and cloud identifiers from any text, replacing them with structurally valid pseudonyms so that context is preserved. Redaction is fully reversible using the mapping returned alongside the anonymized text.

Everything runs locally. Nothing leaves the machine.

Available as a Python library/CLI and a browser-native JavaScript port.


What it detects

| Category | Examples | Pseudonym format |
|---|---|---|
| IP addresses | 192.168.1.100, 10.0.0.0/16 | 10.x.x.x (preserves CIDR) |
| Email addresses | admin@company.com | user-1@redacted.example.net |
| Domain names | prod-db.mycompany.com | redacted-host-1.example.net |
| MAC addresses | AA:BB:CC:DD:EE:FF | AA:BB:CC:00:00:01 |
| AWS Account IDs | 123456789012 | 100000000001 |
| AWS ARNs | arn:aws:iam::123...:role/admin | Structure preserved, values redacted |
| S3 buckets | s3://my-prod-bucket | s3://redacted-bucket-1 |
| API keys | AWS, OpenAI, Anthropic, Google, Groq, GitHub, Slack, Azure | REDACTED_SECRET_n |
| Connection strings | postgresql://user:pass@host/db | REDACTED_CONNSTR_n |
| JWT tokens | eyJhbG... | REDACTED_JWT_n |
| PEM private keys | -----BEGIN RSA PRIVATE KEY----- | REDACTED_PEM_KEY_n |
| Bearer tokens | Authorization: Bearer sk-... | REDACTED_BEARER_n |
| Generic secrets | password = "value" | Keyword preserved, value redacted |
| US phone numbers | (555) 123-4567, +1-555-123-4567 | (555) 000-0001 |
| UK phone numbers | +44 7911 123456 | +44 7700 900001 |
| Credit cards | 4111 1111 1111 1111 (Luhn validated) | 4000-0000-0000-0001 |
| US SSN | 123-45-6789 | 078-05-0001 |
| Passports | C12345678 | X00000001 |
| IBAN | DE89370400440532013000 | GB00XXXX000000000001 |
| UUIDs | 550e8400-e29b-41d4-... | 00000000-0000-4000-a000-... |
| Azure / GCP IDs | Subscription IDs, project IDs | Redacted with counter |
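
The credit card row mentions Luhn validation, which filters out 16-digit numbers that merely look like cards. As a rough illustration of that check (a standalone sketch of the standard Luhn algorithm, not the library's own code; the function name `luhn_valid` is hypothetical):

```python
def luhn_valid(number: str) -> bool:
    """Return True if the digits in `number` pass the Luhn checksum."""
    # Ignore spaces, dashes, and any other separators.
    digits = [int(ch) for ch in number if ch.isdigit()]
    if len(digits) < 2:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for i, d in enumerate(reversed(digits)):
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9  # same as summing the two digits of the product
        total += d
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```

A detector that applies this check after the regex match would leave order numbers and other long digit runs alone unless they happen to satisfy the checksum.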

Skips well-known safe values: 127.0.0.1, 0.0.0.0, localhost, amazonaws.com, github.com, etc.
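
To make the interplay between safe values and structure-preserving pseudonyms concrete, here is a minimal sketch using only the standard library. It is an assumption-laden illustration: the `SAFE_IPS` set, the `IPPseudonymizer` class, and the "count upward inside 10.0.0.0/8" numbering are all hypothetical, and the real engine assigns addresses differently (the usage example further down shows 10.0.1.8).

```python
import ipaddress

# Hypothetical safe list, seeded with the IP examples named above.
SAFE_IPS = {"127.0.0.1", "0.0.0.0"}


class IPPseudonymizer:
    """Map IPv4 addresses to stable 10.x.x.x stand-ins, keeping any CIDR suffix."""

    def __init__(self) -> None:
        self._seen: dict[str, str] = {}  # original address -> pseudonym

    def pseudonym(self, value: str) -> str:
        addr, sep, prefix = value.partition("/")
        ipaddress.IPv4Address(addr)  # raises ValueError on malformed input
        if addr in SAFE_IPS:
            return value  # well-known safe values pass through untouched
        if addr not in self._seen:
            n = len(self._seen) + 1
            # Assumed numbering scheme: 10.0.0.1, 10.0.0.2, ...
            self._seen[addr] = str(ipaddress.IPv4Address("10.0.0.0") + n)
        return self._seen[addr] + sep + prefix


p = IPPseudonymizer()
print(p.pseudonym("192.168.1.100"))   # 10.0.0.1
print(p.pseudonym("172.16.0.0/16"))   # 10.0.0.2/16
print(p.pseudonym("192.168.1.100"))   # 10.0.0.1 (same input, same pseudonym)
print(p.pseudonym("127.0.0.1"))       # 127.0.0.1
```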


Python

Install

pip install privatiser

Usage

from privatiser import Privatiser

p = Privatiser()

text = 'server = "192.168.1.100"\npassword = "secret123"'
anonymized, mapping = p.anonymize(text)
# server = "10.0.1.8"
# password = "REDACTED_SECRET_1"

restored = p.deanonymize(anonymized, mapping)
assert restored == text  # perfect round-trip

Category toggles

p = Privatiser(enabled_categories={"pii": False})  # skip phone, card, SSN, passport, IBAN

Allowlist

p = Privatiser(allowlist=["localhost", "example.com"])  # never redact these

Custom patterns

from privatiser import Privatiser, register_custom

register_custom("ticket_id", r"TICKET-\d{4,6}", "REDACTED_TICKET_{n}")

p = Privatiser()
result, mapping = p.anonymize("Fix TICKET-12345")
# result: "Fix REDACTED_TICKET_1"
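
Before registering a custom pattern, it can help to preview the regex with plain `re`. The sketch below is not part of the privatiser API: `preview` is a hypothetical helper, and it assumes that `{n}` is a counter that increases per distinct matched value.

```python
import re


def preview(pattern: str, template: str, text: str) -> str:
    """Apply `pattern` to `text`, numbering each distinct match (assumed semantics)."""
    numbers: dict[str, int] = {}  # matched value -> counter

    def repl(match: re.Match) -> str:
        n = numbers.setdefault(match.group(0), len(numbers) + 1)
        return template.format(n=n)

    return re.sub(pattern, repl, text)


template = "REDACTED_TICKET_{n}"
print(preview(r"TICKET-\d{4,6}", template, "Fix TICKET-12345"))
# Fix REDACTED_TICKET_1

# Without a trailing boundary, a 7-digit ID is matched only partially
# and its last digit leaks into the output.
print(preview(r"TICKET-\d{4,6}", template, "TICKET-1234567"))
# REDACTED_TICKET_17

# Adding \b makes the pattern reject over-long IDs instead.
print(preview(r"TICKET-\d{4,6}\b", template, "TICKET-1234567"))
# TICKET-1234567
```

Whether to anchor with `\b` depends on the format being matched; the point is to try edge cases before trusting a pattern with real data.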

CLI

# From stdin
cat config.tf | privatiser anonymize

# From file, save mapping
privatiser anonymize config.tf -o clean.tf -m mapping.json

# Restore
privatiser deanonymize clean.tf -m mapping.json

# Disable categories
privatiser anonymize config.tf -d pii -d aws

JavaScript (browser / Node)

The privatiser.js file is a self-contained browser port with no dependencies. Drop it into any web project or use it in Node.

<script src="privatiser.js"></script>
<script>
  const text = 'server = "192.168.1.100"';
  const p = new Privatiser();
  const { result, mapping } = p.anonymize(text);

  // Restore
  const restored = p.deanonymize(result, mapping);
</script>

Options

const p = new Privatiser({
  enabledCategories: { pii: false },          // disable a category
  allowlist: ["localhost", "example.com"],    // never redact these
  customWords: ["mycompany", "prod-server"],  // always redact these
});

How it works

  1. Placeholder pass - as each value is detected, it is swapped for a null-byte marker (\x00PRIV_0\x00) before lower-priority patterns run. This prevents later patterns from matching inside already-redacted values.
  2. Pattern priority - patterns run highest-priority first (connection strings before passwords, JWTs before base64, etc.).
  3. Deterministic pseudonyms - the same value always gets the same pseudonym within a session, so repeated occurrences stay consistent.
  4. Structural preservation - pseudonyms match the format of the original (IPs look like IPs, emails look like emails) so downstream tools and AI models aren't confused.
  5. Restore pass - deanonymize() does a simple string replacement of pseudonyms back to originals using the mapping.
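
The five steps above can be sketched in a few dozen lines of standard-library Python. This is a simplified illustration under stated assumptions, not the code in core.py: the two regexes in `RULES`, the function names, and the longest-first ordering in the restore step are all choices made for this example.

```python
import re

# Hypothetical rules, highest priority first: (name, regex, pseudonym template).
RULES = [
    ("connstr", re.compile(r"[a-z]+://\w+:\w+@[\w.-]+/\w+"), "REDACTED_CONNSTR_{n}"),
    ("email", re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "user-{n}@redacted.example.net"),
]


def anonymize(text: str):
    originals = []   # marker index -> original value
    pseudonyms = {}  # original value -> pseudonym (same value, same pseudonym)
    counters = {}    # rule name -> last number handed out

    for name, regex, template in RULES:
        def stash(match, name=name, template=template):
            value = match.group(0)
            if value not in pseudonyms:
                counters[name] = counters.get(name, 0) + 1
                pseudonyms[value] = template.format(n=counters[name])
            originals.append(value)
            # Null-byte marker: later rules cannot match inside it.
            return f"\x00PRIV_{len(originals) - 1}\x00"

        text = regex.sub(stash, text)

    # Swap every marker for the pseudonym of the value it stands for.
    text = re.sub(
        r"\x00PRIV_(\d+)\x00",
        lambda m: pseudonyms[originals[int(m.group(1))]],
        text,
    )
    mapping = {pseudo: orig for orig, pseudo in pseudonyms.items()}
    return text, mapping


def deanonymize(text: str, mapping: dict) -> str:
    # Longest pseudonyms first, so X_1 never rewrites part of X_10.
    for pseudo in sorted(mapping, key=len, reverse=True):
        text = text.replace(pseudo, mapping[pseudo])
    return text


text = 'db = "postgresql://user:pass@db.internal/app"\nowner: admin@company.com, cc: admin@company.com'
anonymized, mapping = anonymize(text)
print(anonymized)
# db = "REDACTED_CONNSTR_1"
# owner: user-1@redacted.example.net, cc: user-1@redacted.example.net
assert deanonymize(anonymized, mapping) == text
```

Note how priority matters here: if the email rule ran first, it would match `pass@db.internal` inside the connection string and leave the username and scheme exposed. Running the connection-string rule first and hiding its match behind a marker avoids that.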

Project structure

src/privatiser/
  core.py          - Privatiser class, anonymize/deanonymize logic
  patterns/
    secrets.py     - API keys, JWTs, connection strings, PEM keys
    network.py     - IPs, domains, emails, MACs, URLs
    pii.py         - phone, credit card, SSN, passport, IBAN
    aws.py         - AWS account IDs, ARNs, S3 buckets
    cloud.py       - Azure, GCP identifiers
    identifiers.py - UUIDs, generic identifiers
  cli.py           - Click CLI entrypoint
  web/             - Flask web UI (optional)

privatiser.js      - Self-contained browser/Node JS port
tests/             - pytest test suite

Contributing

See CONTRIBUTING.md. Pattern contributions are especially welcome - if you work with a format that Privatiser doesn't detect yet, opening a PR with a new pattern + tests is the fastest way to get it added.


Attribution

MIT licensed - use it freely in personal and commercial projects. If you build something with it, a "Powered by Privatiser" credit is appreciated but not required.

License

MIT - see LICENSE.

Built and maintained by @XionDot. Web tool at privatiser.net.

Download files

Source distribution: privatiser-0.5.0.tar.gz (29.2 kB)
Built distribution: privatiser-0.5.0-py3-none-any.whl (21.2 kB, Python 3)
