Audit your own social-media export for re-identification (mosaic) risk. Local-first, no-dossier, bring-your-own-LLM.
Project description
ExposureCheck
Audit your own social-media history for re-identification risk — before someone else does it to you.
Modern language models can read a few hundred of your ordinary public posts and
infer where you live, where you work, your routine, your family, and link a
pseudonymous account back to your real name — not from one careless post, but
from the mosaic of many individually-innocuous ones. Researchers have shown
this works at scale and with unsettling accuracy. exposurecheck runs that same
adversarial reading on your own export, on your own terms, and shows you
which of your posts to generalise or edit.
It is local-first, produces no dossier, and never writes a profile of you to disk.
📖 Plain-language explainer of the threat: https://cypherpunkguide.com/privacy/social-media-self-audit/ (the companion article — read it first if "mosaic re-identification" is new to you).
Demo
A ~12-second run on the bundled sample export using the offline --backend
heuristic stub — reproducible: clone the repo and run the exact command shown,
and nothing leaves your machine (--offline hard-blocks egress). A real
--backend local or cloud model surfaces far more; the full run also scores
EMPLOYER, FINANCES, FAMILY and SCHEDULE. (Recorded as an
asciicast.)
What it does
- Parses your Reddit GDPR export and your X / Twitter export (directory or
.zip). - Runs a recall-preserving cascade: a cheap pass ranks every post, an expensive pass reads the high-priority ones, and weak signals are kept — the mosaic is built from weak signals, so throwing them away would be false comfort.
- Extracts the metadata layer deterministically (this is where X leaks most): the self-set location field, outbound links, image EXIF/GPS, device model, and posting-time concentration that betrays your timezone.
- Reports category risk cards (Location, Employer, Family, Schedule, Finances, Account-linkage, …) ranked by risk contribution, each with masked examples and concrete, generalise-first remediation.
What it deliberately does not do
- ❌ No dossier. It never prints "you live in X / work at Y / your name is Z". Cards show masked snippets; the resolved value only ever appears when you click through to your own original post, in-session, never saved.
- ❌ No export of findings. ❌ No scraping (export input only). ❌ No posting/ deletion on your behalf. ❌ No analysing anyone else's history.
- ❌ It does not make you anonymous. It reduces risk. "Low" is not "safe".
Bring your own model — cloud or local
The inference runs on a backend you choose:
| backend | what it is | data leaves your machine? |
|---|---|---|
local |
a local Ollama (or llama.cpp/LM Studio) model | no |
cloud |
any OpenAI-compatible endpoint, your own key | yes |
heuristic |
offline regex stub, near-zero recall | no — dev/CI only, not an audit |
⚠️ The one cloud caveat that actually matters
If the account you are auditing is a pseudonymous one you keep separate from your real identity, and your AI/cloud account is registered under your real name or paid with a real-name method, then sending your history to the cloud lets the provider link real identity ↔ anonymous account on their side (subpoena, breach, insider). That is the exact deanonymization this tool exists to prevent.
So: auditing a strictly-anonymous account → use --backend local (or a cloud
account opened and paid for anonymously). Auditing your real-name / public
account → cloud is fine. The CLI states this and requires acknowledgement when it
applies. We never force local (that would shrink the audience to nobody); we make
the trade-off explicit.
Install
Core (parsing, EXIF, cascade, and the cloud/local HTTP backends) is Python standard library only — no third-party code touches your export.
# pipx — isolated, recommended (works today; a PyPI release is coming):
pipx install git+https://github.com/coraaegis/exposurecheck
# or from source:
git clone https://github.com/coraaegis/exposurecheck && cd exposurecheck && pip install -e .
# or run without installing:
python -m exposurecheck --help
ExposureCheck is a command-line tool today, aimed at people comfortable with a terminal — which is also where its first reviewers live (GitHub, Hacker News, privacy forums). A one-click app for non-technical users — a packaged build with a local, in-browser UI and no Python to install — is the next milestone (see Status). The CLI stays for power users.
Verifying a release
Releases are not signed with an identity code-signing certificate — the author is pseudonymous, and such a certificate would tie the project to a legal identity, the opposite of the point. Authenticity is cryptographic and verifiable instead:
- Each release is PGP-signed by Cora Aegis. Fetch the key via WKD
(
gpg --locate-keys cora@cypherpunkguide.com), thengpg --verify exposurecheck-<version>.tar.gz.asc. - SHA-256 checksums are published with every release.
- Builds are reproducible — rebuild from the tagged source and confirm the artifact matches.
- Prefer a package manager (pip / Scoop / Homebrew) over a downloaded
.exe. An unsigned Windows binary may show a SmartScreen "unknown publisher" prompt; that is expected — verify the PGP signature or run from source.
Usage
Get your data first:
- Reddit → Settings → Privacy → Request a copy of your data (the
.zip). - X → Settings → Your account → Download an archive of your data.
# Local model — nothing leaves your machine (recommended for anonymous accounts)
exposurecheck audit \
--reddit ./reddit_export.zip \
--twitter ./twitter_export \
--backend local --expensive-model llama3.1 \
--i-own-this-data
# Cloud (bring your own key; set it in the ENV, never on the command line)
export OPENAI_API_KEY=sk-...
exposurecheck audit --twitter ./twitter_export --backend cloud --i-own-this-data
# See your own posts behind a category (in-session, nothing is saved)
exposurecheck audit --reddit ./reddit_export --backend local -i --i-own-this-data
The API key is read from an environment variable on purpose — command-line args leak into shell history and process listings.
Cost (cloud)
Roughly $0.59 per profile for ~125 posts on a GPT-4-class model; a real 1–3k-post history lands around $4–15, trimmed by the recall-preserving pre-filter. Local models are free (lower accuracy — the tool warns you).
How it works
export ─▶ parse ─▶ prefilter (drop only TRUE-empty) ─┬─▶ deterministic: profile + EXIF + timing ─┐
└─▶ cascade: cheap route ─▶ expensive read ─┤
▼
risk-contribution scoring ─▶ category cards ─▶ no-dossier report
See docs/THREAT-MODEL.md for what is and isn't in scope,
and docs/ABUSE-EVAL.md for the dual-use safeguards and the
pre-release abuse evaluation.
Status
v0.1 alpha — the CLI runs end-to-end (Reddit + X, EXIF/GPS, the
recall-preserving cascade, the no-dossier report), aimed at terminal-comfortable
users for now.
Roadmap:
- A one-click app (packaged binary + a hardened, local-only in-browser UI, no Python required) so non-technical people can use it too — the CLI stays for power users.
- Image content analysis is deliberately not in v1. Sending images to a cloud model is a serious privacy regression, so v1 extracts EXIF/metadata only and says so plainly; a local-first multimodal option may come later.
- Real-corpus recall / false-positive evaluation (SynthPAI); more platforms (Mastodon); single-post / pre-post checks.
Security & contact
Found a privacy or safety flaw? See SECURITY.md. Reach the author
at cora@cypherpunkguide.com (PGP via WKD: gpg --locate-keys cora@cypherpunkguide.com).
License & official source
MIT — see LICENSE. Built by Cora Aegis
(cypherpunkguide.com).
This repository (and the name ExposureCheck) is the canonical, official
source. You are free to fork and reuse under the MIT terms — but please don't
present a fork as the official project. ExposureCheck is for auditing your
own data; see docs/ABUSE-EVAL.md for the dual-use
safeguards that carry that intent (the licence deliberately does not — it can't).
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file exposurecheck-0.1.0.tar.gz.
File metadata
- Download URL: exposurecheck-0.1.0.tar.gz
- Upload date:
- Size: 49.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5ddc621adee0837d60cf4fe07079fb0ca8dc6cd8b558ad72943ba8dc8688852c
|
|
| MD5 |
d9d35bc8c80ebbc9bb4cc21d3f53e759
|
|
| BLAKE2b-256 |
1fd1f41c5c73b5278e37bf3ea8a4523c893f6b0e3fe5618b43ffafef66ce096e
|
File details
Details for the file exposurecheck-0.1.0-py3-none-any.whl.
File metadata
- Download URL: exposurecheck-0.1.0-py3-none-any.whl
- Upload date:
- Size: 50.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5ff122c89e9196677be42f253c61d40f13af87f0f1b3749d6f822bceb8d738c9
|
|
| MD5 |
54afead7614b3f47dfde763fcb15e592
|
|
| BLAKE2b-256 |
b2ba8e3dafa0e5838448970a5600af5bd80ba9943837ce641639b0f8048f1c8b
|