Generate pseudonymous tokens with built-in plausible deniability

Stochastic Pseudonymizer

Generate pseudonymous tokens from patron IDs with built-in plausible deniability.

What This Does

Given a patron ID and a secret key, this library produces a short token (like a7f2b3) that:

  1. Is deterministic — the same patron ID always produces the same token
  2. Cannot be reversed — you can't derive the patron ID from the token without the secret
  3. Has intentional collisions — multiple patron IDs may produce the same token

That third property is the key feature. It means that even if someone has your secret key and algorithm, they cannot prove with certainty that a token belongs to a specific patron.

Understanding Plausible Deniability

When you generate tokens, there's a calculable probability that any given patron shares their token with at least one other patron in your population. This is your plausible deniability.

For example, with 350,000 patrons and 6-character tokens:

"There's a 1 in 48 chance that at least one other patron shares this token."

This matters in legal contexts. If someone demands you identify a patron from a token, you can truthfully say that the token is ambiguous by design.

Choosing Your Token Size

Larger tokens = fewer collisions = weaker deniability but cleaner analytics. Smaller tokens = more collisions = stronger deniability but noisier analytics.

Use this table to choose. The "1 in X" number is the chance that any given patron shares their token with someone else:

Population   5 chars    6 chars      7 chars
10,000       1 in 105   1 in 1,678   1 in 26,844
50,000       1 in 21    1 in 336     1 in 5,369
100,000      1 in 11    1 in 168     1 in 2,685
350,000      1 in 4     1 in 48      1 in 767
500,000      1 in 3     1 in 34      1 in 537
1,000,000    1 in 2     1 in 17      1 in 269
2,000,000    1 in 1     1 in 9       1 in 135
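
The table values can be reproduced with a few lines of Python. This is a sketch under an assumption the text does not state outright: tokens are spread uniformly over the 16^length possible values, so the chance that a given token is also hit by at least one of n patrons is about 1 - exp(-n / 16^length). The function name one_in_x is made up for illustration and is not part of the library.

```python
import math


def one_in_x(population: int, token_length: int) -> int:
    """Return X such that a patron has roughly a 1-in-X chance of sharing a token.

    Assumes tokens are uniformly distributed over 16**token_length values.
    """
    possible_tokens = 16 ** token_length
    # Chance that at least one of `population` patrons lands on a given token.
    # -expm1(-x) equals 1 - exp(-x) but stays accurate when x is tiny.
    p_shared = -math.expm1(-population / possible_tokens)
    return round(1 / p_shared)


for population in (10_000, 350_000, 2_000_000):
    row = [one_in_x(population, length) for length in (5, 6, 7)]
    print(f"{population:>9,}", row)
```

Under that assumption the three printed rows agree with the corresponding table rows. Note that "1 in 1" for 2,000,000 patrons at 5 characters is a rounded figure: the underlying probability is high but still below certainty.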

Important: Plan for Lifetime Population, Not Just Current

Your population isn't static: patrons leave and new patrons join. Over 10-20 years, you may tokenize 2-3x as many patron IDs as your current active count.

Consider periodically purging or aggregating old transaction data — this is good library policy regardless, and it helps keep your effective population size manageable. Think about your churn rate and growth trajectory when estimating lifetime population.

Plan for lifetime population, not current population.
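
A back-of-the-envelope estimate might look like the sketch below. The helper name, the steady-intake model, and the example figures are all hypothetical; substitute your own churn and growth numbers.

```python
def estimate_lifetime_population(current_patrons: int,
                                 annual_new_patron_rate: float,
                                 years: int) -> int:
    """Rough count of distinct patron IDs tokenized over `years`.

    Assumes a steady intake: each year, new patrons equal to
    `annual_new_patron_rate` times the current active count join.
    Patrons who leave still count, because their tokens were already issued.
    """
    new_patrons_per_year = current_patrons * annual_new_patron_rate
    return round(current_patrons + new_patrons_per_year * years)


# Hypothetical library: 100,000 active patrons, 10% new patrons a year, 15 years.
print(estimate_lifetime_population(100_000, 0.10, 15))  # 250000
```

In this made-up case the lifetime figure is 2.5x the current count, so the 350,000 or 500,000 row of the table is a better guide than the 100,000 row.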

Quick Recommendations

Library Type       Current Patrons   Lifetime Estimate   Recommended
Small branch       5k - 30k          15k - 100k          5 chars
Medium library     30k - 500k        100k - 1.5M         6 chars
Large consortium   500k+             1.5M+               7 chars

Usage

from stochastic_pseudonymizer import StochasticPseudonymizer

# Initialize with your secret and token size
pseudonymizer = StochasticPseudonymizer(
    app_secret="your-secret-key-keep-this-safe",
    token_length=6  # hex characters: 5, 6, or 7
)

# Generate a token from a patron ID
token = pseudonymizer.generate_token(patron_id="P-12345")

print(token)  # e.g., "a7f2b3"

What You Need to Keep Secret

  • app_secret: Anyone with this can generate tokens and potentially match them to patron IDs. Guard it carefully.
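
One common way to keep the secret out of source code is to read it from an environment variable at startup. The sketch below is only an illustration: the variable name PSEUDONYMIZER_APP_SECRET and the helper load_app_secret are assumptions, not something the library defines.

```python
import os


def load_app_secret(environ=os.environ, name="PSEUDONYMIZER_APP_SECRET") -> str:
    """Read the secret from the environment, refusing empty or missing values.

    Both the variable name and this helper are hypothetical.
    """
    secret = environ.get(name, "")
    if not secret:
        raise RuntimeError(f"{name} is not set; refusing to generate tokens")
    return secret
```

The returned value can then be passed as app_secret when constructing StochasticPseudonymizer, so the secret never appears in the repository or in logs.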

What You Can Publish

  • token_length: This is just configuration. Publishing it doesn't compromise privacy.
  • The algorithm: This library is open source. Security comes from the secret, not obscurity.

The "Forever" Decision

Once you start generating tokens with a particular configuration, you cannot change it without invalidating all existing tokens.

Before you begin:

  1. Estimate your lifetime patron population (be generous)
  2. Pick your token length from the table above
  3. Generate a strong, random app_secret
  4. Store the secret securely
  5. Document your configuration
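
For step 3, Python's standard secrets module is one reasonable way to produce a strong random value. The 32-byte length and the helper name new_app_secret are choices made for this sketch, not requirements stated by the library.

```python
import secrets


def new_app_secret(num_bytes: int = 32) -> str:
    """Return a fresh random secret as a hex string (2 hex characters per byte)."""
    return secrets.token_hex(num_bytes)


app_secret = new_app_secret()
print(len(app_secret))  # 64
```

Run this once, store the result somewhere safe (step 4), and reuse that same value from then on. Generating a new secret later would change every token.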

How It Works

Under the hood, this uses HMAC-SHA256 to hash the patron ID with your secret, then truncates to the desired length. The math for collision probability comes from the birthday problem.
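
The idea can be sketched with the standard library alone. This is an illustrative stand-in rather than the library's actual code: the UTF-8 encoding, the choice to keep the leading hex characters, and the name sketch_token are all assumptions.

```python
import hashlib
import hmac


def sketch_token(patron_id: str, app_secret: str, token_length: int = 6) -> str:
    """Illustrative stand-in, not the library's code: HMAC-SHA256, truncated."""
    digest = hmac.new(
        app_secret.encode("utf-8"),   # key (UTF-8 encoding is an assumption)
        patron_id.encode("utf-8"),    # message
        hashlib.sha256,
    ).hexdigest()
    # Keeping only the first few hex characters is what creates collisions.
    return digest[:token_length]


secret = "example-secret-not-for-production"

# Deterministic: the same patron ID and secret always give the same token.
print(sketch_token("P-12345", secret) == sketch_token("P-12345", secret))  # True

# With 2 hex characters there are only 256 tokens, so 257 IDs must collide.
tokens = {sketch_token(f"P-{i}", secret, token_length=2) for i in range(257)}
print(len(tokens) < 257)  # True
```

The second check uses an artificially short token so that a collision is guaranteed by counting alone. At 5 to 7 characters collisions are a matter of probability rather than certainty.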

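The collision probability for any individual is approximately:

P(collision) ≈ population_size / possible_tokens

Where possible_tokens = 16^token_length (since we use hexadecimal output).

This linear estimate holds when population_size is much smaller than possible_tokens. As the two approach each other it overshoots (at 2,000,000 patrons and 5 characters it would exceed 1). The bounded form 1 - e^(-population_size / possible_tokens) is the better estimate there, and it reproduces the table above.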
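
For readers who want the reasoning behind the estimate, here is a short derivation. It assumes each patron's token is an independent, uniform draw from the possible tokens, which is the usual idealization for a truncated HMAC and is not something the source proves.

```latex
% n = population_size, N = possible_tokens = 16^{token_length}
\begin{align*}
P(\text{collision})
  &= 1 - \left(1 - \frac{1}{N}\right)^{n-1}
     && \text{one minus the chance that all other } n-1 \text{ patrons miss your token} \\
  &\approx 1 - e^{-(n-1)/N}
     && \text{since } \left(1 - \tfrac{1}{N}\right)^{k} \approx e^{-k/N} \text{ for large } N \\
  &\approx \frac{n}{N}
     && \text{when } n \ll N
\end{align*}
% Example: n = 350{,}000,\ N = 16^{6} = 16{,}777{,}216
% gives n/N \approx 0.0209, about 1 in 48.
```

The last step drops the exponential, which is why the simple ratio is only trustworthy while the population is small relative to the token space.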

License

MIT
