leakwatch

Scan any website and see — live, in your terminal — every tracker, data broker, fingerprinting trick, and session recorder watching you, which companies they report to, and how the site's own security headers hold up. One brutally simple verdict on top, full forensics underneath.

leakwatch nytimes.com

🔴 59 trackers · leaks to 8 data brokers · fingerprints you · 12 fired before consent — 100/100 (F)

Why I built it

Privacy tools tell you a site is "bad" without showing the receipts, and security recon usually means digging through browser devtools by hand. leakwatch does both in one pass and makes the result legible: it loads the page in a real browser, defeats the consent wall, and turns the chaos of fifty tracker domains into a plain answer — who is watching you here, how badly, and what data leaves the page.

It's built for people who want the truth quickly: engineers auditing their own sites, security folks profiling a target's third-party surface, privacy researchers ranking a whole category, and anyone who just wants to point a tool at a URL and get a verdict.

Dumb surface, advanced engine

  • Dumb to use. Run it with a URL. A live dashboard fills in. The top line is the whole story for most people. No config, no manual.
  • Advanced underneath. Every request, cookie, storage write, and fingerprinting call is captured, classified, scored, and attributed to a parent company.
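The attribution step can be pictured as a suffix lookup from a request's hostname to a parent company. This is a minimal sketch, not leakwatch's actual code: the `TRACKER_OWNERS` table, `owner_for`, and `rollup` are hypothetical names, and the four entries are a hand-written sample rather than the curated dataset.

```python
from collections import defaultdict

# Hypothetical sample table; leakwatch's real dataset is larger and curated.
TRACKER_OWNERS = {
    "doubleclick.net": "Google",
    "google-analytics.com": "Google",
    "facebook.net": "Meta",
    "clarity.ms": "Microsoft",
}


def owner_for(host):
    """Return the parent company for a host, or None if it is not a known tracker.

    Walks from the full hostname towards the registrable domain, so
    "stats.g.doubleclick.net" matches the "doubleclick.net" entry.
    """
    labels = host.lower().rstrip(".").split(".")
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in TRACKER_OWNERS:
            return TRACKER_OWNERS[candidate]
    return None


def rollup(hosts):
    """Group the known tracker hosts by parent company, ignoring unknown hosts."""
    grouped = defaultdict(set)
    for host in hosts:
        owner = owner_for(host)
        if owner is not None:
            grouped[owner].add(host)
    return {owner: sorted(found) for owner, found in grouped.items()}
```

Calling `rollup` on the hosts seen during a page load yields one entry per company, which is the shape a "who is watching you here" summary needs.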

What it does

  • Two-phase consent scan. Loads as a fresh visitor, then defeats the consent wall and records the trackers that only fire after acceptance. Consent managers, including those in cross-origin iframes (OneTrust, Sourcepoint/TCF, Cookiebot, Quantcast, Didomi, TrustArc, Usercentrics, Osano, CookieYes, Google, Complianz), are matched by their language-independent IDs, with a multilingual text fallback.
  • Before/after-consent headline. "18 trackers fired before you consented" — the legally interesting part, and only claimed when a banner truly existed.
  • CMP detection. Even when a button can't be clicked, the IAB __tcfapi/__gpp APIs and known globals reveal the gate, so a consent-walled site is never falsely reported as clean.
  • Company rollup. ~200 curated trackers attributed to parent entities and jurisdictions — Google, Meta, Oracle, LiveRamp, The Trade Desk, and the rest.
  • Data brokers & session recorders called out explicitly ("records your screen" — Hotjar, FullStory, Clarity, …).
  • Fingerprinting via canvas, WebGL, AudioContext, font enumeration, and navigator probes.
  • Security-headers audit. Grades the page A–F on HSTS, CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, and Permissions-Policy.
  • Block detection. Flags Cloudflare-style challenge walls, CAPTCHAs, and 4xx/5xx instead of pretending a site is clean.
  • Anti-bot context. Realistic user agent, viewport, locale, and no webdriver tell, so sites serve their real page.
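To make the security-headers audit above concrete, here is a rough sketch that checks the six headers named in the list and turns the number of missing ones into a letter. The grading rule (one letter lost per missing header, F at four or more) is an assumption made for illustration; the README does not say how leakwatch weighs individual headers, and `audit_headers` is a hypothetical name.

```python
# The six response headers named in the feature list, lower-cased for comparison.
AUDITED_HEADERS = (
    "strict-transport-security",   # HSTS
    "content-security-policy",     # CSP
    "x-frame-options",
    "x-content-type-options",
    "referrer-policy",
    "permissions-policy",
)


def audit_headers(headers):
    """Return (grade, missing) for a mapping of response header names to values.

    Header names are compared case-insensitively; empty values count as missing.
    Assumed rule: 0 missing = A, 1 = B, 2 = C, 3 = D, 4 or more = F.
    """
    present = {name.lower() for name, value in headers.items() if value.strip()}
    missing = [name for name in AUDITED_HEADERS if name not in present]
    grade = "ABCDF"[min(len(missing), 4)]
    return grade, missing
```

A real audit would also inspect header values (for example, a CSP that allows `unsafe-inline`); this sketch only checks presence.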

Install

pip install leakwatch
# or, from source:
pip install -e .
playwright install chromium      # one-time: leakwatch drives a real browser

leakwatch ships a browser engine (Playwright/Chromium), so it's a heavier install than a pure-Python linter — expected for this class of tool. If the leakwatch command isn't on your PATH, python -m leakwatch ... always works.

Usage

leakwatch example.com              # live dashboard
leakwatch example.com --show       # run with a visible browser (watch the scan)
leakwatch example.com --no-tui     # plain-text report
leakwatch example.com --json       # machine-readable output

leakwatch batch sites.txt                  # ranked leaderboard scorecard
leakwatch batch sites.txt --format markdown --out report.md

leakwatch diff example.com -b baseline.json    # CI gate: exit non-zero on new trackers
leakwatch diff example.com --save-baseline baseline.json

leakwatch login example.com -o auth.json       # sign in by hand, save the session
leakwatch example.com --storage-state auth.json

In the dashboard: type a domain in the top bar and press Enter (or n to focus it) to scan a new site without quitting · r re-runs the current site · q quits.
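For scripting, the `--json` mode listed above can be wrapped from Python. The sketch below assumes the report is a single JSON document on stdout and that a successful scan exits with status 0; neither is stated in this README, and nothing is assumed about the report's keys. `scan_json` is a hypothetical helper name.

```python
import json
import subprocess


def scan_json(target, run=subprocess.run):
    """Run `leakwatch <target> --json` and return the parsed report.

    `run` is injectable so the wrapper can be exercised without a browser.
    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    result = run(
        ["leakwatch", target, "--json"],
        capture_output=True,  # collect stdout instead of printing it
        text=True,            # decode stdout to str
        check=True,           # treat a non-zero exit as an error
    )
    return json.loads(result.stdout)
```

Calling `scan_json("example.com")` launches a real scan, so it needs leakwatch and its Chromium engine installed.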

Leaderboard mode — the shareable artifact

Give it a list of sites (one per line); it ranks them by leakage into a scorecard in text, Markdown, or JSON:

leakwatch batch examples/news-sites.txt --format markdown --out leaderboard.md

Example output:

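| # | Site | Leakage | Trackers | Brokers | Records screen | Fingerprinting |
| --- | --- | --- | --- | --- | --- | --- |
| 1 | example-news.com | 🔴 96 (F) | 41 | 5 | yes | yes |
| 2 | example-shop.com | 🟠 64 (C) | 22 | 1 | no | yes |
| 3 | example-wiki.org | 🟢 0 (A) | 0 | 0 | no | no |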
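The ranking itself is a sort by leakage score followed by table rendering. Below is a minimal sketch of that step for the Markdown format. The per-site dictionary keys (`site`, `score`, `grade`, and so on) are hypothetical and not leakwatch's report schema, and the colored status emoji are left out for brevity.

```python
HEADER = ["#", "Site", "Leakage", "Trackers", "Brokers", "Records screen", "Fingerprinting"]


def yes_no(flag):
    return "yes" if flag else "no"


def leaderboard_markdown(results):
    """Render per-site results as a Markdown table, worst leakage first."""
    lines = [
        "| " + " | ".join(HEADER) + " |",
        "|" + " --- |" * len(HEADER),
    ]
    ranked = sorted(results, key=lambda r: r["score"], reverse=True)
    for rank, r in enumerate(ranked, start=1):
        row = [
            str(rank),
            r["site"],
            f'{r["score"]} ({r["grade"]})',
            str(r["trackers"]),
            str(r["brokers"]),
            yes_no(r["records_screen"]),
            yes_no(r["fingerprinting"]),
        ]
        lines.append("| " + " | ".join(row) + " |")
    return "\n".join(lines)
```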

CI mode — a tracker linter for your own site

Save a baseline, then fail the build when a new third party appears between commits:

leakwatch diff https://your-site.com -b baseline.json   # exit 1 on new trackers
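Conceptually the gate is a set difference between the third parties in the baseline and those seen in the current scan. The sketch below illustrates the idea only. The baseline layout (`{"domains": [...]}`) and the function names are assumptions, not the format `--save-baseline` actually writes.

```python
import json


def load_domains(path):
    """Read a baseline file, assumed here to look like {"domains": ["a.com", ...]}."""
    with open(path, encoding="utf-8") as fh:
        return set(json.load(fh)["domains"])


def new_third_parties(baseline, current):
    """Domains present in the current scan but absent from the baseline."""
    return sorted(set(current) - set(baseline))


def exit_code(baseline, current):
    """1 if any new third party appeared (fail the build), otherwise 0."""
    return 1 if new_third_parties(baseline, current) else 0
```

Note that removed trackers do not fail the gate in this sketch; only additions do, matching the "new third party appears" wording above.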

Auditing pages behind a login

By default leakwatch scans as a fresh anonymous visitor — exactly what you want, because that's what a first-time visitor leaks. To audit your own authenticated pages, leakwatch login opens a visible browser, you sign in by hand, and only the resulting session blob is saved locally. leakwatch never sees or stores a password. Batch/leaderboard mode is anonymous-only by design.

How the score works

The leakage score runs 0 (clean) to 100 (worst): a small weight per tracker and company, with heavier penalties for data brokers, session recorders, fingerprinting, and trackers that slip through a consent gate. It maps to a grade from A to F.
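As a rough illustration of that shape, the sketch below adds a small per-tracker and per-company weight, heavier penalties for the other categories, caps the total at 100, and maps it to a letter. Every number here is an assumption: the weights are invented, and the grade bands were chosen only so that the scorecard example in this README (0 is A, 64 is C, 96 is F) comes out the same. `Findings`, `leakage_score`, and `grade` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    trackers: int = 0
    companies: int = 0
    brokers: int = 0
    session_recorders: int = 0
    fingerprinting: bool = False
    pre_consent_trackers: int = 0  # trackers that fired before consent was given


def leakage_score(f):
    """Weighted sum capped at 100; all weights are illustrative assumptions."""
    raw = (
        1 * f.trackers
        + 1 * f.companies
        + 8 * f.brokers
        + 10 * f.session_recorders
        + (15 if f.fingerprinting else 0)
        + 2 * f.pre_consent_trackers
    )
    return min(100, raw)


def grade(score):
    """Map a 0-100 score to a letter using assumed band edges."""
    for upper, letter in ((24, "A"), (49, "B"), (74, "C"), (89, "D")):
        if score <= upper:
            return letter
    return "F"
```

Capping at 100 is what lets a very heavy page (dozens of trackers plus several brokers) land on exactly 100/100 rather than an unbounded number.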

Privacy & scope

leakwatch records only the tracking surface — network metadata, cookies, storage keys, fingerprinting call counts, and response headers. It never downloads, stores, renders, or serves page content, images, or media. It only loads pages a normal visitor would, and batch scans are public-only.

Development

git clone https://github.com/gazzycodes/leakwatch
cd leakwatch
pip install -e ".[dev]"
playwright install chromium

PYTHONPATH=src python -m unittest discover -s tests -v
ruff check .

The classification, scoring, security-header, and reporting layers have no browser dependency, so the test suite runs fast and offline. The dataset under src/leakwatch/data is a curated, license-safe set; update-data can fetch fuller external lists later at your discretion.

License

MIT — see LICENSE.
