Audit your own social-media export for re-identification (mosaic) risk. Local-first, no-dossier, bring-your-own-LLM.

ExposureCheck

Audit your own social-media history for re-identification risk — before someone else does it to you.

Modern language models can read a few hundred of your ordinary public posts and infer where you live, where you work, your routine, and your family, then link a pseudonymous account back to your real name. They do this not from one careless post but from the mosaic of many individually innocuous ones. Researchers have shown this works at scale and with unsettling accuracy. exposurecheck runs that same adversarial reading on your own export, on your own terms, and shows you which of your posts to generalise or edit.

It is local-first, produces no dossier, and never writes a profile of you to disk.

📖 Plain-language explainer of the threat: https://cypherpunkguide.com/privacy/social-media-self-audit/ (the companion article — read it first if "mosaic re-identification" is new to you).

Demo

[Asciicast: ExposureCheck running an offline audit on the bundled sample data]

A ~12-second run on the bundled sample export using the offline --backend heuristic stub. It is reproducible: clone the repo and run the exact command shown in the recording; nothing leaves your machine (--offline hard-blocks egress). A real --backend local or cloud model surfaces far more; the full run also scores EMPLOYER, FINANCES, FAMILY and SCHEDULE. (Recorded as an asciicast.)
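For readers curious what "hard-blocking egress" can look like inside a Python process, here is a minimal sketch. It is not ExposureCheck's actual implementation: the names `no_egress` and `EgressBlocked` are hypothetical, and the technique shown (patching `socket.socket.connect`) is only one possible approach. It guards Python-level connections, including `socket.create_connection`, but it is not a sandbox and does not cover `connect_ex` or subprocesses.

```python
import contextlib
import socket
from unittest import mock


class EgressBlocked(RuntimeError):
    """Raised when code tries to open a network connection in offline mode."""


@contextlib.contextmanager
def no_egress():
    """Make every socket.connect() call fail while the block is active."""

    def refuse(self, *args, **kwargs):
        raise EgressBlocked("network access is disabled in offline mode")

    # patch.object restores the original method when the block exits.
    with mock.patch.object(socket.socket, "connect", refuse):
        yield


# Demo: the connection attempt is refused before any packet is sent.
sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
try:
    with no_egress():
        try:
            sock.connect(("127.0.0.1", 9))
        except EgressBlocked as exc:
            print("blocked:", exc)
finally:
    sock.close()
```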

What it does

Parses your Reddit GDPR export and your X / Twitter export (directory or .zip).
Runs a recall-preserving cascade: a cheap pass ranks every post, an expensive pass reads the high-priority ones, and weak signals are kept — the mosaic is built from weak signals, so throwing them away would be false comfort.
Extracts the metadata layer deterministically (this is where X leaks most): the self-set location field, outbound links, image EXIF/GPS, device model, and posting-time concentration that betrays your timezone.
Reports category risk cards (Location, Employer, Family, Schedule, Finances, Account-linkage, …) ranked by risk contribution, each with masked examples and concrete, generalise-first remediation.
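To make the posting-time point concrete, here is a minimal sketch of how a histogram of posting hours can hint at a timezone. It is an illustration under stated assumptions, not the tool's own code: the function name `quiet_window_start` is hypothetical, and it assumes the quietest six-hour stretch roughly corresponds to sleep.

```python
from collections import Counter
from datetime import datetime, timezone


def quiet_window_start(timestamps, window=6):
    """Return the UTC hour at which the quietest `window`-hour stretch begins.

    Illustrative assumption: the quietest stretch roughly marks when the
    author sleeps, which in turn narrows down plausible timezones.
    """
    per_hour = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    # Sum posts over every circular window of `window` consecutive hours.
    totals = {
        start: sum(per_hour[(start + k) % 24] for k in range(window))
        for start in range(24)
    }
    # min() keeps the earliest start hour when several windows tie.
    return min(totals, key=totals.get)


# Synthetic data: one post per hour from 13:00 to 03:00 UTC, silence from 04:00 to 12:00.
posts = [
    datetime(2024, 1, day, hour, 0, tzinfo=timezone.utc)
    for day in range(1, 8)
    for hour in list(range(13, 24)) + list(range(0, 4))
]
print(quiet_window_start(posts))  # prints 4
```

A quiet window starting at 04:00 UTC is consistent with someone whose night begins several hours behind UTC. No single post reveals that; the pattern across many timestamps does.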

What it deliberately does not do

❌ No dossier. It never prints "you live in X / work at Y / your name is Z". Cards show masked snippets; the resolved value only ever appears when you click through to your own original post, in-session, never saved.
❌ No export of findings. ❌ No scraping (export input only). ❌ No posting/deletion on your behalf. ❌ No analysing anyone else's history.
❌ It does not make you anonymous. It reduces risk. "Low" is not "safe".
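As a rough illustration of the masked-snippet idea, the sketch below hides the revealing span and keeps only a little surrounding context. The function name `masked_snippet`, the bracketed label format, and the sample post are assumptions made for this example; the real card format may differ.

```python
def masked_snippet(text, start, end, label, context=24):
    """Return the text around text[start:end] with the span replaced by a label.

    The resolved value never appears in the result; only `context` characters
    on each side are kept so the reader can recognise the post.
    """
    left = text[max(0, start - context):start]
    right = text[end:end + context]
    return f"{left}[{label}]{right}"


# Hypothetical post and span, for illustration only.
post = "Grabbed coffee near Springfield station before work"
start = post.index("Springfield")
end = start + len("Springfield")
print(masked_snippet(post, start, end, "LOCATION"))
# prints: Grabbed coffee near [LOCATION] station before work
```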

Bring your own model — cloud or local

The inference runs on a backend you choose:

backend	what it is	data leaves your machine?
`local`	a local Ollama (or llama.cpp/LM Studio) model	no
`cloud`	any OpenAI-compatible endpoint, your own key	yes
`heuristic`	offline regex stub, near-zero recall	no — dev/CI only, not an audit

⚠️ The one cloud caveat that actually matters

If the account you are auditing is a pseudonymous one you keep separate from your real identity, and your AI/cloud account is registered under your real name or paid with a real-name method, then sending your history to the cloud lets the provider link real identity ↔ anonymous account on their side (subpoena, breach, insider). That is the exact deanonymization this tool exists to prevent.

So: auditing a strictly-anonymous account → use --backend local (or a cloud account opened and paid for anonymously). Auditing your real-name / public account → cloud is fine. The CLI states this and requires acknowledgement when it applies. We never force local (that would shrink the audience to nobody); we make the trade-off explicit.

Install

Core (parsing, EXIF, cascade, and the cloud/local HTTP backends) is Python standard library only — no third-party code touches your export.

# pipx: isolated, recommended (installs straight from the Git repository):
pipx install git+https://github.com/coraaegis/exposurecheck

# or from source:
git clone https://github.com/coraaegis/exposurecheck && cd exposurecheck && pip install -e .

# or run without installing:
python -m exposurecheck --help

ExposureCheck is a command-line tool today, aimed at people comfortable with a terminal — which is also where its first reviewers live (GitHub, Hacker News, privacy forums). A one-click app for non-technical users — a packaged build with a local, in-browser UI and no Python to install — is the next milestone (see Status). The CLI stays for power users.

Verifying a release

Releases are not signed with an identity code-signing certificate — the author is pseudonymous, and such a certificate would tie the project to a legal identity, the opposite of the point. Authenticity is cryptographic and verifiable instead:

Each release is PGP-signed by Cora Aegis. Fetch the key via WKD (gpg --locate-keys cora@cypherpunkguide.com), then gpg --verify exposurecheck-<version>.tar.gz.asc.
SHA-256 checksums are published with every release.
Builds are reproducible — rebuild from the tagged source and confirm the artifact matches.
Prefer a package manager (pip / Scoop / Homebrew) over a downloaded .exe. An unsigned Windows binary may show a SmartScreen "unknown publisher" prompt; that is expected — verify the PGP signature or run from source.
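If you would rather check a published SHA-256 checksum from Python than from a shell tool, a minimal sketch follows. The helper names `sha256_of` and `matches_published` are hypothetical, and the demo hashes a throwaway file; substitute the path of the artifact you downloaded and the digest published with the release. A matching checksum only shows the file is intact, so it complements the PGP signature rather than replacing it.

```python
import hashlib
import hmac
import os
import tempfile


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large artifacts never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare the local digest with the published one, ignoring case and whitespace."""
    return hmac.compare_digest(sha256_of(path), published_hex.strip().lower())


# Self-contained demo on a throwaway file; point `path` at a real download instead.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"abc")
    path = tmp.name
try:
    print(matches_published(path, hashlib.sha256(b"abc").hexdigest()))  # prints True
finally:
    os.remove(path)
```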

Usage

Get your data first:

Reddit → Settings → Privacy → Request a copy of your data (the .zip).
X → Settings → Your account → Download an archive of your data.

# Local model — nothing leaves your machine (recommended for anonymous accounts)
exposurecheck audit \
  --reddit ./reddit_export.zip \
  --twitter ./twitter_export \
  --backend local --expensive-model llama3.1 \
  --i-own-this-data

# Cloud (bring your own key; set it in the ENV, never on the command line)
export OPENAI_API_KEY=sk-...
exposurecheck audit --twitter ./twitter_export --backend cloud --i-own-this-data

# See your own posts behind a category (in-session, nothing is saved)
exposurecheck audit --reddit ./reddit_export --backend local -i --i-own-this-data

The API key is read from an environment variable on purpose — command-line args leak into shell history and process listings.
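The same pattern is easy to follow in your own scripts. The sketch below reads a key from the environment and refuses to continue when it is missing; the function name `read_api_key` is hypothetical and this is not the tool's internal code.

```python
import os


def read_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or exit with a clear message.

    Reading from the environment keeps the secret out of shell history and
    out of process listings, where command-line arguments are visible.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise SystemExit(f"{name} is not set; export it in your shell first")
    return key


# Demo with an explicit mapping so the example runs without a real key.
print(len(read_api_key({"OPENAI_API_KEY": "sk-example"})))  # prints 10
```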

Cost (cloud)

Roughly $0.59 per profile for ~125 posts on a GPT-4-class model; a real 1–3k-post history lands around $4–15, trimmed by the recall-preserving pre-filter. Local models are free (lower accuracy — the tool warns you).
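The range above follows from simple proportional scaling of the quoted figure. The sketch below reproduces that arithmetic, assuming cost grows linearly with post count and that every post reaches the expensive model; the pre-filter should bring real costs in under these numbers.

```python
COST_PER_PROFILE_USD = 0.59  # figure quoted above
POSTS_PER_PROFILE = 125      # figure quoted above


def estimate_cost_usd(posts):
    """Scale the quoted per-profile cost linearly to a given number of posts."""
    return posts * COST_PER_PROFILE_USD / POSTS_PER_PROFILE


for n in (1000, 3000):
    print(f"{n} posts: ${estimate_cost_usd(n):.2f}")
# prints:
# 1000 posts: $4.72
# 3000 posts: $14.16
```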

How it works

export ─▶ parse ─▶ prefilter (drop only TRUE-empty) ─┬─▶ deterministic: profile + EXIF + timing ─┐
                                                      └─▶ cascade: cheap route ─▶ expensive read ─┤
                                                                                                  ▼
                                          risk-contribution scoring ─▶ category cards ─▶ no-dossier report

See docs/THREAT-MODEL.md for what is and isn't in scope, and docs/ABUSE-EVAL.md for the dual-use safeguards and the pre-release abuse evaluation.
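The cascade branch of the diagram can be sketched in a few lines. This is an illustration of the general idea only: the names `Post`, `cascade`, `cheap_score` and `expensive_read`, the threshold, and the keyword weights are all assumptions, not the project's API. The point it shows is that low-scoring posts are kept as weak signals instead of being discarded, and only posts with no signal at all are dropped.

```python
from dataclasses import dataclass


@dataclass
class Post:
    post_id: str
    text: str


def cascade(posts, cheap_score, expensive_read, threshold=0.5):
    """Rank every post cheaply, read the high scorers expensively, keep weak ones."""
    scored = sorted(
        ((cheap_score(p), p) for p in posts),
        key=lambda pair: pair[0],
        reverse=True,
    )
    findings = []
    for score, post in scored:
        if score >= threshold:
            findings.append((post.post_id, "strong", expensive_read(post)))
        elif score > 0:
            # Weak signals stay in the result; the mosaic is built from them.
            findings.append((post.post_id, "weak", None))
    return findings


# Toy scorers standing in for the cheap and expensive model passes.
HINTS = {"office": 0.6, "commute": 0.2}


def toy_cheap_score(post):
    return max((w for k, w in HINTS.items() if k in post.text.lower()), default=0.0)


def toy_expensive_read(post):
    return "needs review"


sample = [
    Post("a", "My office is two blocks from the harbour"),
    Post("b", "Monday commute again"),
    Post("c", "Nice weather"),
]
print(cascade(sample, toy_cheap_score, toy_expensive_read))
# prints: [('a', 'strong', 'needs review'), ('b', 'weak', None)]
```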

Status

v0.1 alpha — the CLI runs end-to-end (Reddit + X, EXIF/GPS, the recall-preserving cascade, the no-dossier report), aimed at terminal-comfortable users for now.

Roadmap:

A one-click app (packaged binary + a hardened, local-only in-browser UI, no Python required) so non-technical people can use it too — the CLI stays for power users.
Image content analysis is deliberately not in v1. Sending images to a cloud model is a serious privacy regression, so v1 extracts EXIF/metadata only and says so plainly; a local-first multimodal option may come later.
Real-corpus recall / false-positive evaluation (SynthPAI); more platforms (Mastodon); single-post / pre-post checks.

Security & contact

Found a privacy or safety flaw? See SECURITY.md. Reach the author at cora@cypherpunkguide.com (PGP via WKD: gpg --locate-keys cora@cypherpunkguide.com).

License & official source

MIT — see LICENSE. Built by Cora Aegis (cypherpunkguide.com).

This repository (and the name ExposureCheck) is the canonical, official source. You are free to fork and reuse under the MIT terms — but please don't present a fork as the official project. ExposureCheck is for auditing your own data; see docs/ABUSE-EVAL.md for the dual-use safeguards that carry that intent (the licence deliberately does not — it can't).

This version

0.1.0

Jun 27, 2026

Download files

Source Distribution

exposurecheck-0.1.0.tar.gz (49.9 kB view details)

Uploaded Jun 27, 2026 Source

Built Distribution

exposurecheck-0.1.0-py3-none-any.whl (50.5 kB view details)

Uploaded Jun 27, 2026 Python 3

File details

Details for the file exposurecheck-0.1.0.tar.gz.

File metadata

Download URL: exposurecheck-0.1.0.tar.gz
Upload date: Jun 27, 2026
Size: 49.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for exposurecheck-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`5ddc621adee0837d60cf4fe07079fb0ca8dc6cd8b558ad72943ba8dc8688852c`
MD5	`d9d35bc8c80ebbc9bb4cc21d3f53e759`
BLAKE2b-256	`1fd1f41c5c73b5278e37bf3ea8a4523c893f6b0e3fe5618b43ffafef66ce096e`

File details

Details for the file exposurecheck-0.1.0-py3-none-any.whl.

File metadata

Download URL: exposurecheck-0.1.0-py3-none-any.whl
Upload date: Jun 27, 2026
Size: 50.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for exposurecheck-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`5ff122c89e9196677be42f253c61d40f13af87f0f1b3749d6f822bceb8d738c9`
MD5	`54afead7614b3f47dfde763fcb15e592`
BLAKE2b-256	`b2ba8e3dafa0e5838448970a5600af5bd80ba9943837ce641639b0f8048f1c8b`
