Scan a document dump for canaries, trackers, web beacons, and per-recipient fingerprints before interacting with the data.

canary-scan

Scan document data-sources for canaries, trackers, web beacons, and per-recipient fingerprints before interacting with supplied datasets.

When you receive a large document dump from an external party (a leak, a legal disclosure, or an investigation), the files can contain deliberate or indirect canaries. Some are active: tracking pixels, embedded JavaScript, and remote template links can phone home the moment a file is opened. Others are passive: steganographic watermarks and per-recipient metadata fingerprints can identify which copy you hold.

canary-scan inspects each file without opening it in its native viewer. It extracts and analyses raw structure, metadata, embedded objects, and near-duplicate fingerprints to surface anything that could reveal to an external party that the data source is being examined.
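
To illustrate what inspecting a file "without opening it in its native viewer" can look like, here is a minimal sketch for one case only: listing external relationship targets in an Office Open XML file (for example a .docx) by reading the ZIP container directly. It is not canary-scan's implementation; the function name `external_targets` is hypothetical, and the approach is the generic one of reading `.rels` parts and looking for `TargetMode="External"`.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by Open Packaging Conventions relationship parts (*.rels).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_targets(docx_path):
    """Return (part_name, relationship_type, target) for external relationships.

    The file is treated as a plain ZIP archive; nothing is rendered or executed.
    Note: for hostile input, a hardened XML parser (for example the third-party
    defusedxml package) is a safer choice than the standard library parser.
    """
    results = []
    with zipfile.ZipFile(docx_path) as archive:
        for name in archive.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(archive.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                if rel.get("TargetMode") == "External":
                    results.append((name, rel.get("Type"), rel.get("Target")))
    return results
```

Ordinary hyperlinks also appear as external relationships, so the relationship type matters when triaging the output: a remote template or remotely linked image is fetched when the document is opened, whereas a hyperlink is only followed on a click.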

Full documentation: psaintelligence.github.io/canary-scan


Quick Start (Docker)

The recommended way to run canary-scan is via Docker, as the image bundles all required system utilities and dependencies:

# Run the scan using the GitHub Container Registry image
docker run --rm \
  -v /mnt/datasource:/data:ro \
  -v "$(pwd)/canary-scan-out":/output \
  ghcr.io/psaintelligence/canary-scan:latest scan /data -o /output

# Review findings
jq '.[] | select(.severity=="critical")' canary-scan-out/canary-scan-report.json
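
If jq is not available, the same filter can be sketched in Python. This assumes only what the jq expression above implies: that the report is a JSON array of objects with a `severity` field. The helper name `critical_findings` is hypothetical, and no other report fields are assumed.

```python
import json
from pathlib import Path


def critical_findings(report_path):
    """Return the findings whose severity is "critical".

    Assumes the report is a JSON array of objects carrying a "severity"
    key, mirroring the jq filter; non-object entries are skipped.
    """
    findings = json.loads(Path(report_path).read_text(encoding="utf-8"))
    return [
        finding
        for finding in findings
        if isinstance(finding, dict) and finding.get("severity") == "critical"
    ]


if __name__ == "__main__":
    report = Path("canary-scan-out/canary-scan-report.json")
    if report.exists():
        for finding in critical_findings(report):
            print(json.dumps(finding, indent=2))
```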

Quick Start (pipx)

If you prefer to run canary-scan directly on your host machine:

# 1. Install canary-scan
pipx install canary-scan

# 2. Install required system dependencies (Ubuntu 24.04 example)
sudo apt install libimage-exiftool-perl qpdf poppler-utils mupdf-tools \
    ripgrep unzip p7zip-full

# 3. Run the scan
canary-scan scan /mnt/datasource

Detection pipeline

Seven sequential stages: inventory → metadata → remote-refs → embedded → stego → uniqueness → report

Each stage writes a JSONL artefact to .canary-scan/. Run canary-scan --guide for a concise cheat sheet.
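
Because the per-stage artefacts are JSONL (one JSON value per line), they can be loaded with a few lines of Python for ad hoc inspection. The sketch below is an illustration rather than part of canary-scan: the `*.jsonl` file pattern and the helper name `load_stage_artefacts` are assumptions, and the actual artefact file names and record fields may differ.

```python
import json
from pathlib import Path


def load_stage_artefacts(artefact_dir=".canary-scan"):
    """Map each *.jsonl file name in artefact_dir to its parsed records.

    The "*.jsonl" pattern is an assumption about how artefacts are named.
    """
    artefacts = {}
    for path in sorted(Path(artefact_dir).glob("*.jsonl")):
        records = []
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if line:  # skip blank lines; each remaining line is one JSON value
                    records.append(json.loads(line))
        artefacts[path.name] = records
    return artefacts


if __name__ == "__main__":
    for name, records in load_stage_artefacts().items():
        print(f"{name}: {len(records)} records")
```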


License

Apache-2.0. Bundled third-party scripts (pdfid, pdf-parser, rtfdump) are BSD 2-Clause — see src/canary_scan/bundled/README.md.

Download files

Source Distribution

canary_scan-0.1.2.tar.gz (394.2 kB view details)

Uploaded Source

Built Distribution

canary_scan-0.1.2-py3-none-any.whl (100.6 kB view details)

Uploaded Python 3

File details

Details for the file canary_scan-0.1.2.tar.gz.

File metadata

  • Download URL: canary_scan-0.1.2.tar.gz
  • Size: 394.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for canary_scan-0.1.2.tar.gz
Algorithm Hash digest
SHA256 b6e7f0f2de3531796cf491a2257e55fb47e6fb54739ea11e17f43ebe30065ac1
MD5 d7c29363d977d7c0e684602e23e07ea0
BLAKE2b-256 4041eae9fa68067bc38d12aaa7f9e7e40893d99395a961915eb569c95151083b
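
A downloaded copy of the source distribution can be checked against the SHA256 digest listed above before installation. This is a generic sketch using Python's standard `hashlib`; the helper name `sha256_of` and the local file path are assumptions for illustration.

```python
import hashlib

# SHA256 digest listed above for canary_scan-0.1.2.tar.gz.
EXPECTED_SHA256 = "b6e7f0f2de3531796cf491a2257e55fb47e6fb54739ea11e17f43ebe30065ac1"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


if __name__ == "__main__":
    import os

    local_copy = "canary_scan-0.1.2.tar.gz"  # assumed download location
    if os.path.exists(local_copy):
        matches = sha256_of(local_copy) == EXPECTED_SHA256
        print("digest matches" if matches else "DIGEST MISMATCH")
```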

File details

Details for the file canary_scan-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: canary_scan-0.1.2-py3-none-any.whl
  • Size: 100.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for canary_scan-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2a04975fb3813c272088451864a2408e38d974919037857a9eec72e518ff331e
MD5 dcd2056106084d56561a845edcd24695
BLAKE2b-256 c7043aeaf388dcf05f79d6d2a9b72b989b304d5300b29a67744e0f1682ccbad5
