Content anonymizer/pseudonymizer — redact sensitive data before sharing with AI

privatiser-engine

Open source anonymization engine powering Privatiser. It redacts IPs, API keys, secrets, PII, and cloud identifiers from any text, replacing them with structurally valid pseudonyms so context is preserved. The redaction is fully reversible using the mapping returned alongside the anonymized text.

Everything runs locally. Nothing leaves the machine.

Available as a Python library/CLI and a browser-native JavaScript port.


What it detects

| Category | Examples | Pseudonym format |
|---|---|---|
| IP addresses | 192.168.1.100, 10.0.0.0/16 | 10.x.x.x (preserves CIDR) |
| Email addresses | admin@company.com | user-1@redacted.example.net |
| Domain names | prod-db.mycompany.com | redacted-host-1.example.net |
| MAC addresses | AA:BB:CC:DD:EE:FF | AA:BB:CC:00:00:01 |
| AWS Account IDs | 123456789012 | 100000000001 |
| AWS ARNs | arn:aws:iam::123...:role/admin | Structure preserved, values redacted |
| S3 buckets | s3://my-prod-bucket | s3://redacted-bucket-1 |
| API keys | AWS, OpenAI, Anthropic, Google, Groq, GitHub, Slack, Azure | REDACTED_SECRET_n |
| Connection strings | postgresql://user:pass@host/db | REDACTED_CONNSTR_n |
| JWT tokens | eyJhbG... | REDACTED_JWT_n |
| PEM private keys | -----BEGIN RSA PRIVATE KEY----- | REDACTED_PEM_KEY_n |
| Bearer tokens | Authorization: Bearer sk-... | REDACTED_BEARER_n |
| Generic secrets | password = "value" | Keyword preserved, value redacted |
| US phone numbers | (555) 123-4567, +1-555-123-4567 | (555) 000-0001 |
| UK phone numbers | +44 7911 123456 | +44 7700 900001 |
| Credit cards | 4111 1111 1111 1111 (Luhn validated) | 4000-0000-0000-0001 |
| US SSN | 123-45-6789 | 078-05-0001 |
| Passports | C12345678 | X00000001 |
| IBAN | DE89370400440532013000 | GB00XXXX000000000001 |
| UUIDs | 550e8400-e29b-41d4-... | 00000000-0000-4000-a000-... |
| Azure / GCP IDs | Subscription IDs, project IDs | Redacted with counter |

Skips well-known safe values: 127.0.0.1, 0.0.0.0, localhost, amazonaws.com, github.com, etc.
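
The table above notes that credit card candidates are Luhn validated before being redacted. For readers unfamiliar with that check, here is a minimal sketch of the Luhn checksum. It is not taken from the privatiser source, and `luhn_valid` is a hypothetical helper name used only for illustration.

```python
def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum.

    Hypothetical helper for illustration; not part of the privatiser API.
    """
    # Ignore spaces, dashes, and any other non-digit separators.
    digits = [int(ch) for ch in candidate if ch.isdigit()]
    if len(digits) < 2:
        return False

    total = 0
    # Walk right to left, doubling every second digit.
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0


print(luhn_valid("4111 1111 1111 1111"))  # True
print(luhn_valid("4111 1111 1111 1112"))  # False
```

A check like this presumably helps avoid redacting arbitrary 16-digit numbers that only look like card numbers.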


Python

Install

pip install privatiser

Usage

from privatiser import Privatiser

p = Privatiser()

text = 'server = "192.168.1.100"\npassword = "secret123"'
anonymized, mapping = p.anonymize(text)
# server = "10.0.1.8"
# password = "REDACTED_SECRET_1"

restored = p.deanonymize(anonymized, mapping)
assert restored == text  # perfect round-trip

Category toggles

p = Privatiser(enabled_categories={"pii": False})  # skip PII patterns (phone, card, SSN, etc.)

Allowlist

p = Privatiser(allowlist=["localhost", "example.com"])  # never redact these

Custom patterns

from privatiser import Privatiser, register_custom

register_custom("ticket_id", r"TICKET-\d{4,6}", "REDACTED_TICKET_{n}")

p = Privatiser()
result, mapping = p.anonymize("Fix TICKET-12345")
# result: "Fix REDACTED_TICKET_1"
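
To make the `{n}` counter in the template easier to picture, the following standalone sketch shows one way such a template could be expanded. It does not use or reproduce privatiser internals: `apply_custom` is a hypothetical function, and the direction of the mapping (pseudonym to original) is an assumption.

```python
import re


def apply_custom(text, pattern, template):
    """Replace each match of `pattern` with `template`, numbering unique values.

    Hypothetical illustration only; not the privatiser implementation.
    """
    seen = {}     # original value -> pseudonym
    mapping = {}  # pseudonym -> original value (assumed direction)

    def substitute(match):
        original = match.group(0)
        if original not in seen:
            # First time this value appears: allocate the next counter.
            pseudonym = template.format(n=len(seen) + 1)
            seen[original] = pseudonym
            mapping[pseudonym] = original
        return seen[original]

    return re.sub(pattern, substitute, text), mapping


result, mapping = apply_custom(
    "Fix TICKET-12345 and TICKET-777777, then recheck TICKET-12345",
    r"TICKET-\d{4,6}",
    "REDACTED_TICKET_{n}",
)
print(result)
# Fix REDACTED_TICKET_1 and REDACTED_TICKET_2, then recheck REDACTED_TICKET_1
```

Repeated occurrences of the same ticket reuse the same pseudonym, so references stay consistent within the text.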

CLI

# From stdin
cat config.tf | privatiser anonymize

# From file, save mapping
privatiser anonymize config.tf -o clean.tf -m mapping.json

# Restore
privatiser deanonymize clean.tf -m mapping.json

# Disable categories
privatiser anonymize config.tf -d pii -d aws

JavaScript (browser / Node)

The privatiser.js file is a self-contained browser port with no dependencies. Drop it into any web project or use it in Node.

<script src="privatiser.js"></script>
<script>
  // Any text to be redacted
  const text = 'server = "192.168.1.100"';

  const p = new Privatiser();
  const { result, mapping } = p.anonymize(text);

  // Restore
  const restored = p.deanonymize(result, mapping);
</script>

Options

const p = new Privatiser({
  enabledCategories: { pii: false },          // disable a category
  allowlist: ["localhost", "example.com"],    // never redact these
  customWords: ["mycompany", "prod-server"],  // always redact these
});

How it works

  1. Placeholder pass - each detected value is first replaced with a null-byte marker (\x00PRIV_0\x00) rather than its final pseudonym. This prevents later patterns from matching inside already-redacted values.
  2. Pattern priority - patterns run highest-priority first (connection strings before passwords, JWTs before base64, etc.).
  3. Deterministic pseudonyms - the same value always gets the same pseudonym within a session, so repeated occurrences stay consistent.
  4. Structural preservation - pseudonyms match the format of the original (IPs look like IPs, emails look like emails) so downstream tools and AI models aren't confused.
  5. Restore pass - deanonymize() does a simple string replacement of pseudonyms back to originals using the mapping.
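
The five steps above can be sketched in a few lines of standalone Python. This is a simplified illustration under stated assumptions, not the privatiser source: the two rules, their regexes and priorities, and the function names `anonymize_sketch` and `deanonymize_sketch` are all hypothetical.

```python
import re

# Hypothetical rules: (priority, regex, pseudonym template). Higher runs first.
RULES = [
    (20, re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"), "user-{n}@redacted.example.net"),
    (10, re.compile(r"\b[\w-]+(?:\.[\w-]+)*\.(?:com|net|org)\b"), "redacted-host-{n}.example.net"),
]


def anonymize_sketch(text):
    placeholders = {}  # null-byte marker -> pseudonym
    mapping = {}       # pseudonym -> original (assumed direction)

    for _, regex, template in sorted(RULES, key=lambda rule: -rule[0]):
        seen = {}  # original -> pseudonym, so repeats stay consistent

        def swap(match):
            original = match.group(0)
            if original not in seen:
                seen[original] = template.format(n=len(seen) + 1)
                mapping[seen[original]] = original
            # Emit a marker instead of the pseudonym, so lower-priority
            # patterns cannot match inside the redacted value.
            marker = f"\x00PRIV_{len(placeholders)}\x00"
            placeholders[marker] = seen[original]
            return marker

        text = regex.sub(swap, text)

    # Only after every pattern has run are markers turned into pseudonyms.
    for marker, pseudonym in placeholders.items():
        text = text.replace(marker, pseudonym)
    return text, mapping


def deanonymize_sketch(text, mapping):
    # Plain string replacement; longest pseudonyms first is a precaution
    # of this sketch so that ..._10 is not clobbered by ..._1.
    for pseudonym in sorted(mapping, key=len, reverse=True):
        text = text.replace(pseudonym, mapping[pseudonym])
    return text


source = "mail admin@company.com about prod-db.mycompany.com, cc admin@company.com"
redacted, mapping = anonymize_sketch(source)
print(redacted)
# mail user-1@redacted.example.net about redacted-host-1.example.net, cc user-1@redacted.example.net
assert deanonymize_sketch(redacted, mapping) == source
```

Without the markers, the domain rule in this sketch would match `redacted.example.net` inside the email pseudonym and redact it a second time. Without the priority ordering, it would match `company.com` inside the original email address.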

Project structure

src/privatiser/
  core.py          - Privatiser class, anonymize/deanonymize logic
  patterns/
    secrets.py     - API keys, JWTs, connection strings, PEM keys
    network.py     - IPs, domains, emails, MACs, URLs
    pii.py         - phone, credit card, SSN, passport, IBAN
    aws.py         - AWS account IDs, ARNs, S3 buckets
    cloud.py       - Azure, GCP identifiers
    identifiers.py - UUIDs, generic identifiers
  cli.py           - Click CLI entrypoint
  web/             - Flask web UI (optional)

privatiser.js      - Self-contained browser/Node JS port
tests/             - pytest test suite

Contributing

See CONTRIBUTING.md. Pattern contributions are especially welcome - if you work with a format that Privatiser doesn't detect yet, opening a PR with a new pattern + tests is the fastest way to get it added.


Attribution

MIT licensed - use it freely in personal and commercial projects. If you build something with it, a "Powered by Privatiser" credit is appreciated but not required.

License

MIT - see LICENSE.

Built and maintained by @XionDot. Web tool at privatiser.net.

Download files

Source Distribution

privatiser-0.6.0.tar.gz (48.5 kB)

Uploaded Source

Built Distribution

privatiser-0.6.0-py3-none-any.whl (31.0 kB)

Uploaded Python 3

File details

Details for the file privatiser-0.6.0.tar.gz.

File metadata

  • Download URL: privatiser-0.6.0.tar.gz
  • Size: 48.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for privatiser-0.6.0.tar.gz
Algorithm Hash digest
SHA256 91fc4ae4446452709f8a45ca0599b4b693e579149cf691ee4b6e158e76de2259
MD5 dcc9c88af0bda4e1791f75fdbbfbef90
BLAKE2b-256 e39d66bbbadc9ecc44df648ba5e3d7779193b4694e2a252c58d20b9216647402

File details

Details for the file privatiser-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: privatiser-0.6.0-py3-none-any.whl
  • Size: 31.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for privatiser-0.6.0-py3-none-any.whl
Algorithm Hash digest
SHA256 562e6a7d7509d4f82a6ed08c9cc1c88a82153346710959c4d1fb038d2c287719
MD5 0731d9362a2382efadc40c5ab6247896
BLAKE2b-256 ca2c16b12755a2c0843dbb252e8f30b5b46671e9c16499a7e4dd61945e60c23d
