
Audit your own social-media export for re-identification (mosaic) risk. Local-first, no-dossier, bring-your-own-LLM.

ExposureCheck

Audit your own social-media history for re-identification risk — before someone else does it to you.

Modern language models can read a few hundred of your ordinary public posts and infer where you live, where you work, your routine, and your family, then link a pseudonymous account back to your real name. They do this not from one careless post but from the mosaic of many individually innocuous ones. Researchers have shown this works at scale and with unsettling accuracy. ExposureCheck runs that same adversarial reading on your own export, on your own terms, and shows you which of your posts to generalise or edit.

It is local-first, produces no dossier, and never writes a profile of you to disk.

📖 Plain-language explainer of the threat: https://cypherpunkguide.com/privacy/social-media-self-audit/ (the companion article — read it first if "mosaic re-identification" is new to you).


Demo

ExposureCheck running an offline audit on the bundled sample data

A ~12-second run on the bundled sample export using the offline --backend heuristic stub — reproducible: clone the repo and run the exact command shown, and nothing leaves your machine (--offline hard-blocks egress). A real --backend local or cloud model surfaces far more; the full run also scores EMPLOYER, FINANCES, FAMILY and SCHEDULE. (Recorded as an asciicast.)

What it does

  • Parses your Reddit GDPR export and your X / Twitter export (directory or .zip).
  • Runs a recall-preserving cascade: a cheap pass ranks every post, an expensive pass reads the high-priority ones, and weak signals are kept — the mosaic is built from weak signals, so throwing them away would be false comfort.
  • Extracts the metadata layer deterministically (this is where X leaks most): the self-set location field, outbound links, image EXIF/GPS, device model, and posting-time concentration that betrays your timezone.
  • Reports category risk cards (Location, Employer, Family, Schedule, Finances, Account-linkage, …) ranked by risk contribution, each with masked examples and concrete, generalise-first remediation.
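
The posting-time signal mentioned in the list above can be illustrated with a minimal sketch. This is not ExposureCheck's actual code: the function name `quiet_window`, the 8-hour window, and the synthetic timestamps are all assumptions made for illustration. The idea is that the hours in which an account almost never posts usually correspond to local night, which narrows down the timezone.

```python
from collections import Counter
from datetime import datetime, timezone


def quiet_window(timestamps, width=8):
    """Return (start_hour_utc, share_of_posts) for the quietest window.

    `timestamps` is an iterable of timezone-aware datetimes.
    Hypothetical helper for illustration, not part of the tool's API.
    """
    # Count posts per UTC hour of day.
    hours = Counter(ts.astimezone(timezone.utc).hour for ts in timestamps)
    total = sum(hours.values())
    if total == 0:
        raise ValueError("no timestamps given")

    best_start, best_count = 0, None
    for start in range(24):
        # Sum the posts in a window that may wrap around midnight.
        count = sum(hours[(start + offset) % 24] for offset in range(width))
        if best_count is None or count < best_count:
            best_start, best_count = start, count
    return best_start, best_count / total


# Synthetic data: one week of posts, all between 13:00 and 23:00 UTC.
posts = [
    datetime(2024, 1, day, hour, tzinfo=timezone.utc)
    for day in range(1, 8)
    for hour in range(13, 24)
]
start, share = quiet_window(posts)
print(f"quietest 8h window starts at {start:02d}:00 UTC ({share:.0%} of posts)")
```

A window that holds almost none of the posts is a strong hint about when the author sleeps, which is why spreading or scheduling posts is a plausible remediation for the Schedule category.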

What it deliberately does not do

  • ❌ No dossier. It never prints "you live in X / work at Y / your name is Z". Cards show masked snippets; the resolved value only ever appears when you click through to your own original post, in-session, never saved.
  • ❌ No export of findings. ❌ No scraping (export input only). ❌ No posting or deletion on your behalf. ❌ No analysing anyone else's history.
  • ❌ It does not make you anonymous. It reduces risk. "Low" is not "safe".
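
To make the "masked snippets" idea above concrete, here is a minimal sketch of how a revealing span could be hidden while keeping enough context to recognise the post. The helper `mask_span`, its parameters, and the example post are hypothetical and are not taken from the project's source.

```python
def mask_span(text, start, end, context=20, mask_char="█"):
    """Return a short snippet of `text` with text[start:end] masked out.

    Hypothetical helper: only `context` characters either side are kept,
    and the sensitive span itself never appears in the result.
    """
    if not 0 <= start < end <= len(text):
        raise ValueError("span must lie inside the text")
    left = text[max(0, start - context):start]
    right = text[end:end + context]
    # Mark truncation so the reader knows the snippet is partial.
    prefix = "…" if start - context > 0 else ""
    suffix = "…" if end + context < len(text) else ""
    return f"{prefix}{left}{mask_char * (end - start)}{right}{suffix}"


# Synthetic example post; the place name is invented.
post = "Grabbed coffee near the Riverside depot before my shift"
secret = "Riverside depot"
begin = post.index(secret)
print(mask_span(post, begin, begin + len(secret), context=12))
```

Masking by position rather than by string replacement keeps the resolved value out of the report text entirely, which matches the stated goal of never printing a dossier.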

Bring your own model — cloud or local

The inference runs on a backend you choose:

backend | what it is | data leaves your machine?
local | a local Ollama (or llama.cpp/LM Studio) model | no
cloud | any OpenAI-compatible endpoint, your own key | yes
heuristic | offline regex stub, near-zero recall | no (dev/CI only, not an audit)

⚠️ The one cloud caveat that actually matters

If the account you are auditing is a pseudonymous one you keep separate from your real identity, and your AI/cloud account is registered under your real name or paid with a real-name method, then sending your history to the cloud lets the provider link real identity ↔ anonymous account on their side (subpoena, breach, insider). That is the exact deanonymization this tool exists to prevent.

So: auditing a strictly-anonymous account → use --backend local (or a cloud account opened and paid for anonymously). Auditing your real-name / public account → cloud is fine. The CLI states this and requires acknowledgement when it applies. We never force local (that would shrink the audience to nobody); we make the trade-off explicit.
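
The rule above can be written down as a small decision function. This is only a sketch of the stated policy, not the CLI's real logic: the function name and its three parameters are assumptions.

```python
def needs_cloud_acknowledgement(backend, account_is_pseudonymous,
                                cloud_account_is_anonymous):
    """Return True when a cloud run could link a pseudonym to a real identity.

    Hypothetical helper. `backend` is one of "local", "cloud", "heuristic".
    """
    if backend != "cloud":
        # Local and heuristic backends keep the export on the machine.
        return False
    # The risky combination: pseudonymous history sent through a
    # provider account that is tied to a real name or payment method.
    return account_is_pseudonymous and not cloud_account_is_anonymous


print(needs_cloud_acknowledgement("cloud", True, False))
print(needs_cloud_acknowledgement("local", True, False))
```

Only one combination triggers the prompt, so users auditing a real-name account on a cloud backend are not interrupted.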


Install

Core (parsing, EXIF, cascade, and the cloud/local HTTP backends) is Python standard library only — no third-party code touches your export.

# pipx — isolated, recommended (works today; a PyPI release is coming):
pipx install git+https://github.com/coraaegis/exposurecheck

# or from source:
git clone https://github.com/coraaegis/exposurecheck && cd exposurecheck && pip install -e .

# or run without installing:
python -m exposurecheck --help

ExposureCheck is a command-line tool today, aimed at people comfortable with a terminal — which is also where its first reviewers live (GitHub, Hacker News, privacy forums). A one-click app for non-technical users — a packaged build with a local, in-browser UI and no Python to install — is the next milestone (see Status). The CLI stays for power users.

Verifying a release

Releases are not signed with an identity code-signing certificate — the author is pseudonymous, and such a certificate would tie the project to a legal identity, the opposite of the point. Authenticity is cryptographic and verifiable instead:

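  • Each release is PGP-signed by Cora Aegis. Fetch the key via WKD (gpg --locate-keys cora@cypherpunkguide.com), then run gpg --verify exposurecheck-<version>.tar.gz.asc exposurecheck-<version>.tar.gz so the signature is checked against the file you actually downloaded.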
  • SHA-256 checksums are published with every release.
  • Builds are reproducible — rebuild from the tagged source and confirm the artifact matches.
  • Prefer a package manager (pip / Scoop / Homebrew) over a downloaded .exe. An unsigned Windows binary may show a SmartScreen "unknown publisher" prompt; that is expected — verify the PGP signature or run from source.
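
For the checksum step in the list above, a standard-library sketch is enough. The function names are illustrative and not part of ExposureCheck. The published digest should come from a source you trust, ideally one covered by the PGP signature.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large artifacts do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, published_hex):
    """Compare the file's digest with a published one (case-insensitive)."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of(path), expected)
```

Calling `matches_published("exposurecheck-<version>.tar.gz", "<published digest>")` returns `True` only when the download is byte-for-byte the published artifact. A checksum alone proves integrity, not authorship, so it complements the signature check rather than replacing it.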

Usage

Get your data first:

  • Reddit → Settings → Privacy → Request a copy of your data (the .zip).
  • X → Settings → Your account → Download an archive of your data.

# Local model: nothing leaves your machine (recommended for anonymous accounts)
exposurecheck audit \
  --reddit ./reddit_export.zip \
  --twitter ./twitter_export \
  --backend local --expensive-model llama3.1 \
  --i-own-this-data

# Cloud (bring your own key; set it in the ENV, never on the command line)
export OPENAI_API_KEY=sk-...
exposurecheck audit --twitter ./twitter_export --backend cloud --i-own-this-data

# See your own posts behind a category (in-session, nothing is saved)
exposurecheck audit --reddit ./reddit_export --backend local -i --i-own-this-data

The API key is read from an environment variable on purpose — command-line args leak into shell history and process listings.
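
A minimal sketch of that pattern follows. It is not the project's code: `load_api_key` is a hypothetical helper, and the only name taken from the usage example above is the `OPENAI_API_KEY` variable.

```python
import os


def load_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Read the API key from the environment and fail clearly if it is missing.

    Hypothetical helper. Unlike a command-line argument, an environment
    variable does not show up in shell history or in the argument list
    that process listings display.
    """
    key = env.get(name, "").strip()
    if not key:
        raise SystemExit(
            f"{name} is not set; export it in your shell before a cloud audit"
        )
    return key
```

Passing the mapping as a parameter keeps the function easy to test with a plain dictionary instead of the real environment.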

Cost (cloud)

Roughly $0.59 per profile for ~125 posts on a GPT-4-class model; a real 1–3k-post history lands around $4–15, trimmed by the recall-preserving pre-filter. Local models are free (lower accuracy — the tool warns you).
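
The two figures above are consistent with simple linear scaling, which the sketch below makes explicit. Linear scaling is an assumption made here for illustration; the pre-filter should bring real costs below it, and actual pricing depends on the provider and on post length.

```python
COST_PER_PROFILE_USD = 0.59  # quoted above for ~125 posts
POSTS_PER_PROFILE = 125      # quoted above


def estimate_cost_usd(post_count):
    """Linear estimate of cloud cost; ignores pre-filter savings."""
    return post_count * COST_PER_PROFILE_USD / POSTS_PER_PROFILE


for n in (125, 1000, 3000):
    print(f"{n:>5} posts: about ${estimate_cost_usd(n):.2f}")
```

At roughly half a cent per post, 1,000 posts come to about $4.72 and 3,000 posts to about $14.16, which is where the $4–15 range comes from.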

How it works

export ─▶ parse ─▶ prefilter (drop only TRUE-empty) ─┬─▶ deterministic: profile + EXIF + timing ─┐
                                                      └─▶ cascade: cheap route ─▶ expensive read ─┤
                                                                                                  ▼
                                          risk-contribution scoring ─▶ category cards ─▶ no-dossier report
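
The cascade branch of the diagram can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: `cascade`, the two callables, and the 0.5 threshold are placeholders, and the toy scorer stands in for a real model.

```python
def cascade(posts, cheap_score, expensive_read, threshold=0.5):
    """Two-stage cascade that keeps weak signals instead of dropping them.

    cheap_score(post) -> float in [0, 1]
    expensive_read(post) -> list of findings
    Both callables and the threshold are hypothetical placeholders.
    """
    # Cheap pass: score every post once, highest priority first.
    scored = sorted(
        ((cheap_score(post), post) for post in posts),
        key=lambda pair: pair[0],
        reverse=True,
    )
    findings, weak_signals = [], []
    for score, post in scored:
        if score >= threshold:
            # Expensive pass only for high-priority posts.
            findings.extend(expensive_read(post))
        elif post.strip():
            # Low score is not "safe": keep it for the mosaic.
            weak_signals.append((score, post))
        # Only truly empty posts fall through and are dropped.
    return findings, weak_signals


# Toy stand-ins for the cheap and expensive models.
toy_score = lambda post: 1.0 if "work" in post else 0.2
toy_read = lambda post: [f"possible EMPLOYER signal in: {post!r}"]

found, weak = cascade(["I work nights", "nice weather", "   "], toy_score, toy_read)
print(found)
print(weak)
```

The design point is the `elif` branch: a cascade that discarded everything under the threshold would be cheaper, but it would lose exactly the weak signals the mosaic is built from.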

See docs/THREAT-MODEL.md for what is and isn't in scope, and docs/ABUSE-EVAL.md for the dual-use safeguards and the pre-release abuse evaluation.

Status

v0.1 alpha — the CLI runs end-to-end (Reddit + X, EXIF/GPS, the recall-preserving cascade, the no-dossier report), aimed at terminal-comfortable users for now.

Roadmap:

  • A one-click app (packaged binary + a hardened, local-only in-browser UI, no Python required) so non-technical people can use it too — the CLI stays for power users.
  • Image content analysis is deliberately not in v1. Sending images to a cloud model is a serious privacy regression, so v1 extracts EXIF/metadata only and says so plainly; a local-first multimodal option may come later.
  • Real-corpus recall / false-positive evaluation (SynthPAI); more platforms (Mastodon); single-post / pre-post checks.

Security & contact

Found a privacy or safety flaw? See SECURITY.md. Reach the author at cora@cypherpunkguide.com (PGP via WKD: gpg --locate-keys cora@cypherpunkguide.com).

License & official source

MIT — see LICENSE. Built by Cora Aegis (cypherpunkguide.com).

This repository (and the name ExposureCheck) is the canonical, official source. You are free to fork and reuse under the MIT terms, but please don't present a fork as the official project. ExposureCheck is for auditing your own data. That intent is carried by the dual-use safeguards described in docs/ABUSE-EVAL.md, not by the licence, which deliberately does not (and cannot) enforce it.


File details

Details for the file exposurecheck-0.1.0.tar.gz.

File metadata

  • Download URL: exposurecheck-0.1.0.tar.gz
  • Upload date:
  • Size: 49.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for exposurecheck-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5ddc621adee0837d60cf4fe07079fb0ca8dc6cd8b558ad72943ba8dc8688852c
MD5 d9d35bc8c80ebbc9bb4cc21d3f53e759
BLAKE2b-256 1fd1f41c5c73b5278e37bf3ea8a4523c893f6b0e3fe5618b43ffafef66ce096e


File details

Details for the file exposurecheck-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: exposurecheck-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 50.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for exposurecheck-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 5ff122c89e9196677be42f253c61d40f13af87f0f1b3749d6f822bceb8d738c9
MD5 54afead7614b3f47dfde763fcb15e592
BLAKE2b-256 b2ba8e3dafa0e5838448970a5600af5bd80ba9943837ce641639b0f8048f1c8b
