Scan a document dump for canaries, trackers, web beacons, and per-recipient fingerprints before interacting with the data.
canary-scan
Scan document data-sources for canaries, trackers, web beacons, and per-recipient fingerprints before interacting with supplied datasets.
When you receive a large document dump from an external party — a leak, legal disclosure, or investigation — those files and documents can contain deliberate or indirect canaries: tracking pixels, embedded JavaScript, remote template links, steganographic watermarks, or per-recipient metadata fingerprints that phone home the moment a file is opened.
canary-scan inspects files without opening them in their native viewers, extracting and analysing raw structure, metadata, embedded objects, and near-duplicate fingerprints to surface anything that could reveal to an external party that the data-source is being examined.
→ Full documentation: psaintelligence.github.io/canary-scan
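To make the idea of inspecting a file without opening it concrete, here is a minimal sketch of one such check: listing the external relationship targets (for example remote templates or linked images) inside an Office Open XML file by reading it as a ZIP archive. This is an illustration only, not canary-scan's own implementation; the function name `external_targets` is hypothetical, and the sketch assumes the input is a well-formed `.docx`, `.xlsx`, or `.pptx`.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OOXML relationship parts (*.rels files inside the archive).
RELS_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_targets(path):
    """Return (part_name, rel_type, target) for each external relationship.

    Hypothetical helper: the file is only read as a ZIP archive, so no
    viewer is launched and no remote target is ever fetched.
    """
    found = []
    with zipfile.ZipFile(path) as archive:
        for name in archive.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(archive.read(name))
            for rel in root.iter(RELS_NS + "Relationship"):
                # TargetMode="External" marks a target outside the package,
                # which is how remote templates and linked images are declared.
                if rel.get("TargetMode") == "External":
                    found.append((name, rel.get("Type", ""), rel.get("Target", "")))
    return found
```

Calling `external_targets("sample.docx")` returns an empty list when the file declares no external targets. For genuinely untrusted input, a hardened XML parser such as defusedxml and limits on decompressed size would be prudent additions.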
Quick Start (Docker)
The recommended way to run canary-scan is via Docker, as the image bundles all required system utilities and dependencies:
# Run the scan using the GitHub Container Registry image
docker run --rm \
  -v /mnt/datasource:/data:ro \
  -v "$(pwd)/canary-scan-out:/output" \
  ghcr.io/psaintelligence/canary-scan:latest scan /data -o /output
# Review findings
jq '.[] | select(.severity=="critical")' canary-scan-out/canary-scan-report.json
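The same review can be done in Python when jq is not available. The sketch below assumes, based on the jq filter above, that the report is a JSON array of objects that each carry a `severity` field; any other field names and the helper names are hypothetical.

```python
import json
from collections import Counter


def load_findings(report_path):
    """Load the report, assumed to be a JSON array of finding objects."""
    with open(report_path, encoding="utf-8") as handle:
        return json.load(handle)


def by_severity(findings, level):
    """Keep findings whose 'severity' equals level (mirrors the jq select)."""
    return [finding for finding in findings if finding.get("severity") == level]


def severity_counts(findings):
    """Count findings per severity; entries without the field count as 'unknown'."""
    return Counter(finding.get("severity", "unknown") for finding in findings)
```

For example, `by_severity(load_findings("canary-scan-out/canary-scan-report.json"), "critical")` would return the same entries as the jq command, under the assumptions stated.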
Quick Start (pipx)
If you prefer to run canary-scan directly on your host machine:
# 1. Install canary-scan
pipx install canary-scan
# 2. Install required system dependencies (Ubuntu 24.04 example)
sudo apt install libimage-exiftool-perl qpdf poppler-utils mupdf-tools \
  ripgrep unzip p7zip-full
# 3. Run the scan
canary-scan scan /mnt/datasource
Detection pipeline
Seven sequential stages: inventory → metadata → remote-refs → embedded → stego → uniqueness → report
Each stage writes a JSONL artefact to .canary-scan/. Run canary-scan --guide for a concise cheat sheet.
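Because each stage emits JSONL, the artefacts can be read line by line with the standard library. The following is a minimal sketch; the `*.jsonl` file extension and the helper names are assumptions, since the artefact file names are not listed here.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Yield one parsed JSON value per non-empty line of a JSONL file."""
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def artefact_record_counts(artefact_dir=".canary-scan"):
    """Map each *.jsonl file name (assumed extension) to its record count."""
    counts = {}
    for path in sorted(Path(artefact_dir).glob("*.jsonl")):
        counts[path.name] = sum(1 for _ in read_jsonl(path))
    return counts
```

Running `artefact_record_counts()` from the directory that was scanned gives a quick view of how much each stage produced before the full report is read.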
License
Apache-2.0. Bundled third-party scripts (pdfid, pdf-parser, rtfdump) are BSD 2-Clause — see src/canary_scan/bundled/README.md.
File details
Details for the file canary_scan-0.1.2.tar.gz.
File metadata
- Download URL: canary_scan-0.1.2.tar.gz
- Upload date:
- Size: 394.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b6e7f0f2de3531796cf491a2257e55fb47e6fb54739ea11e17f43ebe30065ac1 |
| MD5 | d7c29363d977d7c0e684602e23e07ea0 |
| BLAKE2b-256 | 4041eae9fa68067bc38d12aaa7f9e7e40893d99395a961915eb569c95151083b |
File details
Details for the file canary_scan-0.1.2-py3-none-any.whl.
File metadata
- Download URL: canary_scan-0.1.2-py3-none-any.whl
- Upload date:
- Size: 100.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2a04975fb3813c272088451864a2408e38d974919037857a9eec72e518ff331e |
| MD5 | dcd2056106084d56561a845edcd24695 |
| BLAKE2b-256 | c7043aeaf388dcf05f79d6d2a9b72b989b304d5300b29a67744e0f1682ccbad5 |
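A downloaded file can be checked against the SHA256 digests listed above before it is installed. This is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `matches` are illustrative, and the local file path is whatever location the file was downloaded to.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()


# Example (assumes the sdist was downloaded into the current directory):
# matches("canary_scan-0.1.2.tar.gz",
#         "b6e7f0f2de3531796cf491a2257e55fb47e6fb54739ea11e17f43ebe30065ac1")
```

A result of `False` means the file differs from the published release and should not be installed.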