Scan any website and see every tracker, data broker, fingerprinting trick, and session recorder watching you — live, in your terminal.
leakwatch
Scan any website and see — live, in your terminal — every tracker, data broker, fingerprinting trick, and session recorder watching you, which companies they report to, and how the site's own security headers hold up. One brutally simple verdict on top, full forensics underneath.
leakwatch nytimes.com
🔴 59 trackers · leaks to 8 data brokers · fingerprints you · 12 fired before consent — 100/100 (F)
Why I built it
Privacy tools tell you a site is "bad" without showing the receipts, and security recon usually means digging through browser devtools by hand. leakwatch does both in one pass and makes the result legible: it loads the page in a real browser, defeats the consent wall, and turns the chaos of fifty tracker domains into a plain answer — who is watching you here, how badly, and what data leaves the page.
It's built for people who want the truth quickly: engineers auditing their own sites, security folks profiling a target's third-party surface, privacy researchers ranking a whole category, and anyone who just wants to point a tool at a URL and get a verdict.
Dumb surface, advanced engine
- Dumb to use. Run it with a URL. A live dashboard fills in. The top line is the whole story for most people. No config, no manual.
- Advanced underneath. Every request, cookie, storage write, and fingerprinting call is captured, classified, scored, and attributed to a parent company.
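To make the "classified and attributed to a parent company" step concrete, here is a minimal sketch of one way such a lookup could work: match the request hostname against a table, trying the full host first and then each parent domain. The `TrackerInfo` type, the `attribute` function, and the three-entry table are hypothetical illustrations, not leakwatch's actual dataset or API.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urlsplit


@dataclass(frozen=True)
class TrackerInfo:
    company: str   # parent entity the tracker reports to
    category: str  # e.g. "analytics", "session_recorder"


# Hypothetical three-entry table; a real dataset would be far larger.
TRACKERS = {
    "google-analytics.com": TrackerInfo("Google", "analytics"),
    "doubleclick.net": TrackerInfo("Google", "advertising"),
    "clarity.ms": TrackerInfo("Microsoft", "session_recorder"),
}


def attribute(url: str) -> Optional[TrackerInfo]:
    """Return the tracker entry for a request URL, or None if unknown."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Try the full host first, then each parent domain:
    # stats.g.doubleclick.net -> g.doubleclick.net -> doubleclick.net
    for i in range(len(labels) - 1):
        candidate = ".".join(labels[i:])
        if candidate in TRACKERS:
            return TRACKERS[candidate]
    return None
```

Walking up the suffixes means a new subdomain of a known tracker is still attributed without a table update, at the cost of treating every subdomain of a listed domain as the same tracker.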
What it does
- Two-phase consent scan. Loads as a fresh visitor, then defeats the consent wall and records the trackers that only fire after acceptance. Consent managers, including those in cross-origin iframes (OneTrust, Sourcepoint/TCF, Cookiebot, Quantcast, Didomi, TrustArc, Usercentrics, Osano, CookieYes, Google, Complianz), are matched by their language-independent IDs, with a multilingual text fallback.
- Before/after-consent headline. "18 trackers fired before you consented" — the legally interesting part, and only claimed when a banner truly existed.
- CMP detection. Even when a button can't be clicked, the IAB __tcfapi/__gpp APIs and known globals reveal the gate, so a consent-walled site is never falsely reported as clean.
- Company rollup. ~200 curated trackers attributed to parent entities and jurisdictions: Google, Meta, Oracle, LiveRamp, The Trade Desk, and the rest.
- Data brokers & session recorders called out explicitly ("records your screen" — Hotjar, FullStory, Clarity, …).
- Fingerprinting via canvas, WebGL, AudioContext, font enumeration, and navigator probes.
- Security-headers audit. Grades the page A–F on HSTS, CSP, X-Frame-Options, X-Content-Type-Options, Referrer-Policy, and Permissions-Policy.
- Block detection. Flags Cloudflare-style challenge walls, CAPTCHAs, and 4xx/5xx instead of pretending a site is clean.
- Anti-bot context. Realistic user agent, viewport, locale, and no webdriver tell, so sites serve their real page.
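The security-headers audit in the list above can be pictured with a small sketch. It only checks whether each of the six named headers is present and drops one letter per missing header. Both the presence-only check and the letter mapping are assumptions made for illustration; the real audit may also inspect header values and weight headers differently. `grade_headers` is a hypothetical name.

```python
AUDITED_HEADERS = (
    "strict-transport-security",  # HSTS
    "content-security-policy",    # CSP
    "x-frame-options",
    "x-content-type-options",
    "referrer-policy",
    "permissions-policy",
)


def grade_headers(response_headers: dict[str, str]) -> tuple[str, list[str]]:
    """Return (grade, missing header names) for one response."""
    # HTTP header names are case-insensitive, so normalise before comparing.
    present = {name.lower() for name in response_headers}
    missing = [h for h in AUDITED_HEADERS if h not in present]
    # Assumed mapping: one letter lost per missing header, floor at F.
    grade = "ABCDF"[min(len(missing), 4)]
    return grade, missing
```

For example, a response carrying only HSTS, CSP, and X-Content-Type-Options is missing three headers and would get a D under this assumed mapping.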
Install
pip install leakwatch # once published
# or, from source:
pip install -e .
playwright install chromium # one-time: leakwatch drives a real browser
leakwatch ships a browser engine (Playwright/Chromium), so it's a heavier install
than a pure-Python linter — expected for this class of tool. If the leakwatch
command isn't on your PATH, python -m leakwatch ... always works.
Usage
leakwatch example.com # live dashboard
leakwatch example.com --show # run with a visible browser (watch the scan)
leakwatch example.com --no-tui # plain-text report
leakwatch example.com --json # machine-readable output
leakwatch batch sites.txt # ranked leaderboard scorecard
leakwatch batch sites.txt --format markdown --out report.md
leakwatch diff example.com -b baseline.json # CI gate: exit non-zero on new trackers
leakwatch diff example.com --save-baseline baseline.json
leakwatch login example.com -o auth.json # sign in by hand, save the session
leakwatch example.com --storage-state auth.json
In the dashboard: type a domain in the top bar and press Enter (or n to focus
it) to scan a new site without quitting · r re-runs the current site · q quits.
Leaderboard mode — the shareable artifact
Give it a list of sites (one per line); it ranks them by leakage into a scorecard in text, Markdown, or JSON:
leakwatch batch examples/news-sites.txt --format markdown --out leaderboard.md
Example output:
| # | Site | Leakage | Trackers | Brokers | Records screen | Fingerprinting |
|---|---|---|---|---|---|---|
| 1 | example-news.com | 🔴 96 (F) | 41 | 5 | yes | yes |
| 2 | example-shop.com | 🟠 64 (C) | 22 | 1 | no | yes |
| 3 | example-wiki.org | 🟢 0 (A) | 0 | 0 | no | no |
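For readers who want to post-process results themselves, the ranking step can be sketched as a sort followed by table rendering. `SiteResult` and `to_markdown` are hypothetical names and the field layout is an assumption, not leakwatch's real schema; the status emoji from the example output are left out because their thresholds are not documented here.

```python
from dataclasses import dataclass


@dataclass
class SiteResult:  # hypothetical record, not leakwatch's real schema
    site: str
    score: int
    grade: str
    trackers: int
    brokers: int
    records_screen: bool
    fingerprinting: bool


def to_markdown(results: list[SiteResult]) -> str:
    """Rank sites by score, worst first, and render a Markdown table."""
    yes_no = {True: "yes", False: "no"}
    lines = [
        "| # | Site | Leakage | Trackers | Brokers | Records screen | Fingerprinting |",
        "|---|---|---|---|---|---|---|",
    ]
    ranked = sorted(results, key=lambda r: r.score, reverse=True)
    for rank, r in enumerate(ranked, start=1):
        lines.append(
            f"| {rank} | {r.site} | {r.score} ({r.grade}) | {r.trackers} | "
            f"{r.brokers} | {yes_no[r.records_screen]} | {yes_no[r.fingerprinting]} |"
        )
    return "\n".join(lines)
```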
CI mode — a tracker linter for your own site
Save a baseline, then fail the build when a new third party appears between commits:
leakwatch diff https://your-site.com -b baseline.json # exit 1 on new trackers
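Conceptually, the gate is a set difference between the third parties in the baseline and those in the current scan. The sketch below shows that idea only. The JSON shape (`{"trackers": [{"domain": ...}]}`) and the function names are assumptions for illustration; the actual baseline file format is not described here.

```python
import json


def new_third_parties(baseline_json: str, current_json: str) -> list[str]:
    """Return domains present in the current scan but not in the baseline."""
    # Assumed shape: {"trackers": [{"domain": "..."}, ...]}
    baseline = {t["domain"] for t in json.loads(baseline_json)["trackers"]}
    current = {t["domain"] for t in json.loads(current_json)["trackers"]}
    return sorted(current - baseline)


def exit_code(added: list[str]) -> int:
    """Non-zero when anything new appeared, so a CI step fails."""
    return 1 if added else 0
```

Comparing only in one direction means removing a tracker never fails the build; only additions do.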
Auditing pages behind a login
By default leakwatch scans as a fresh anonymous visitor — exactly what you want,
because that's what a first-time visitor leaks. To audit your own authenticated
pages, leakwatch login opens a visible browser, you sign in by hand, and only the
resulting session blob is saved locally. leakwatch never sees or stores a
password. Batch/leaderboard mode is anonymous-only by design.
How the score works
The leakage score runs 0 (clean) to 100 (worst): a small weight per tracker and company, with heavier penalties for data brokers, session recorders, fingerprinting, and trackers that slip through a consent gate. It maps to a grade from A to F.
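A minimal sketch of a score with that shape follows. Every weight and grade boundary in it is invented for illustration and is not leakwatch's actual formula; the only properties taken from the description are the small per-tracker and per-company weights, the heavier penalties, the cap at 100, and the A to F mapping.

```python
def leakage_score(
    trackers: int,
    companies: int,
    brokers: int,
    recorders: int,
    fingerprints: bool,
    pre_consent: int,
) -> int:
    """Combine scan counts into a 0-100 score (illustrative weights)."""
    raw = (
        1 * trackers                    # small weight per tracker
        + 2 * companies                 # small weight per parent company
        + 8 * brokers                   # heavier: data brokers
        + 10 * recorders                # heavier: session recorders
        + (15 if fingerprints else 0)   # heavier: fingerprinting seen
        + 3 * pre_consent               # heavier: fired before consent
    )
    return min(raw, 100)


def grade(score: int) -> str:
    """Map a score to a letter using assumed band boundaries."""
    for upper, letter in ((10, "A"), (35, "B"), (65, "C"), (85, "D")):
        if score <= upper:
            return letter
    return "F"
```

Capping at 100 keeps heavily instrumented sites comparable on one scale, though it also means two very bad sites can tie at the top.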
Privacy & scope
leakwatch records only the tracking surface — network metadata, cookies, storage keys, fingerprinting call counts, and response headers. It never downloads, stores, renders, or serves page content, images, or media. It only loads pages a normal visitor would, and batch scans are public-only.
Development
git clone https://github.com/gazzycodes/leakwatch
cd leakwatch
pip install -e ".[dev]"
playwright install chromium
PYTHONPATH=src python -m unittest discover -s tests -v
ruff check .
The classification, scoring, security-header, and reporting layers have no browser
dependency, so the test suite runs fast and offline. The dataset under
src/leakwatch/data is a curated, license-safe set; update-data can fetch fuller
external lists later at your discretion.
License
MIT — see LICENSE.