
Scan a document dump for canaries, trackers, web beacons, and per-recipient fingerprints before interacting with the data.


canary-scan


Scan document data-sources for canaries, trackers, web beacons, and per-recipient fingerprints before interacting with supplied datasets.

When you receive a large document dump from an external party (a leak, legal disclosure, or investigation), the files can contain deliberate or indirect canaries: tracking pixels, embedded JavaScript, and remote template links that phone home the moment a file is opened, as well as steganographic watermarks and per-recipient metadata fingerprints that identify which copy you were given.

canary-scan inspects files without opening them in their native viewers. It extracts and analyses raw structure, metadata, embedded objects, and near-duplicate fingerprints to surface anything that could reveal to an external party that the data-source is being examined.
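To illustrate what inspecting raw structure can look like, here is a minimal sketch that lists external relationships (such as a remote attached template) inside an OOXML package by reading it as a ZIP archive, never loading it in an office application. This is not canary-scan's implementation; `find_external_targets` and `build_sample_docx` are hypothetical names, and the sample document is synthetic. The standard-library XML parser is used for brevity, so a hardened parser may be preferable for genuinely untrusted input.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace used by OPC relationship parts (*.rels) inside OOXML packages.
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"
OFFICE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def find_external_targets(data):
    """Return (part, relationship type, target) for each external relationship."""
    hits = []
    with zipfile.ZipFile(io.BytesIO(data)) as package:
        for name in package.namelist():
            if not name.endswith(".rels"):
                continue
            root = ET.fromstring(package.read(name))
            for rel in root.iter(REL_NS + "Relationship"):
                # TargetMode="External" marks a target outside the package,
                # for example a URL fetched when the file is opened.
                if rel.get("TargetMode") == "External":
                    hits.append((name, rel.get("Type", ""), rel.get("Target", "")))
    return hits


def build_sample_docx():
    """Build a tiny synthetic package with one external and one internal relationship."""
    rels = (
        '<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">'
        f'<Relationship Id="rId1" Type="{OFFICE_REL}/attachedTemplate" '
        'Target="https://example.invalid/template.dotm" TargetMode="External"/>'
        f'<Relationship Id="rId2" Type="{OFFICE_REL}/styles" Target="styles.xml"/>'
        "</Relationships>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        package.writestr("word/_rels/settings.xml.rels", rels)
    return buffer.getvalue()


if __name__ == "__main__":
    for part, rel_type, target in find_external_targets(build_sample_docx()):
        print(part, rel_type.rsplit("/", 1)[-1], target)
```

Only the relationship parts are parsed, so nothing in the document body is rendered or executed.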

Full documentation: psaintelligence.github.io/canary-scan


Quick Start (Docker)

The recommended way to run canary-scan is via Docker, as the image bundles all required system utilities and dependencies:

# Run the scan using the GitHub Container Registry image
docker run --rm \
  -v /mnt/datasource:/data:ro \
  -v "$(pwd)/canary-scan-out:/output" \
  ghcr.io/psaintelligence/canary-scan:latest scan /data -o /output

# Review findings
jq '.[] | select(.severity=="critical")' canary-scan-out/canary-scan-report.json
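If jq is not available, the same filter can be sketched in Python. This assumes, as the jq expression above implies, that the report is a JSON array of objects with a `severity` key; `filter_findings` and `load_report` are hypothetical helper names, and the sample records (including the `path` key) are invented for illustration.

```python
import json
from pathlib import Path


def filter_findings(findings, severity="critical"):
    """Keep only findings whose 'severity' equals the requested value."""
    return [
        item
        for item in findings
        if isinstance(item, dict) and item.get("severity") == severity
    ]


def load_report(report_path):
    """Load the report, assumed to be a single JSON array of findings."""
    return json.loads(Path(report_path).read_text(encoding="utf-8"))


if __name__ == "__main__":
    # Invented sample data; a real run would call
    # load_report("canary-scan-out/canary-scan-report.json") instead.
    sample = [
        {"severity": "critical", "path": "a.docx"},
        {"severity": "low", "path": "b.pdf"},
    ]
    print(json.dumps(filter_findings(sample), indent=2))
```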

Quick Start (pipx)

If you prefer to run canary-scan directly on your host machine:

# 1. Install canary-scan
pipx install canary-scan

# 2. Install required system dependencies (Ubuntu 24.04 example)
sudo apt install libimage-exiftool-perl qpdf poppler-utils mupdf-tools \
    ripgrep unzip p7zip-full

# 3. Run the scan
canary-scan scan /mnt/datasource
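Before a host-based run, it can help to confirm that the system utilities are on the PATH. The sketch below is an assumption-laden convenience check, not part of canary-scan: the mapping from each apt package to an executable name reflects what those packages normally install on Ubuntu, and which binaries canary-scan actually invokes is not stated here.

```python
import shutil

# Assumed mapping: apt package -> one executable it typically provides.
# Whether canary-scan calls these exact binaries is an assumption.
ASSUMED_TOOLS = {
    "libimage-exiftool-perl": "exiftool",
    "qpdf": "qpdf",
    "poppler-utils": "pdftotext",
    "mupdf-tools": "mutool",
    "ripgrep": "rg",
    "unzip": "unzip",
    "p7zip-full": "7z",
}


def missing_tools(tools=ASSUMED_TOOLS):
    """Return the package names whose executable is not found on PATH."""
    return sorted(pkg for pkg, exe in tools.items() if shutil.which(exe) is None)


if __name__ == "__main__":
    missing = missing_tools()
    if missing:
        print("Possibly missing packages:", ", ".join(missing))
    else:
        print("All assumed tools were found on PATH.")
```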

Detection pipeline

Seven sequential stages: inventory → metadata → remote-refs → embedded → stego → uniqueness → report

Each stage writes a JSONL artefact to .canary-scan/. Run canary-scan --guide for a concise cheat sheet.
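Because JSONL holds one JSON value per line, the stage artefacts can be tallied with a few lines of Python. This is a rough sketch: the `.jsonl` file extension and the helper names `parse_jsonl` and `count_records` are assumptions, and no particular record schema is assumed.

```python
import json
from pathlib import Path


def parse_jsonl(text):
    """Parse JSONL text into a list of records, skipping blank lines."""
    records = []
    # Split on "\n" only, so unusual Unicode line separators that may appear
    # inside JSON strings do not break a record apart.
    for line in text.split("\n"):
        line = line.strip()
        if line:
            records.append(json.loads(line))
    return records


def count_records(artefact_dir=".canary-scan"):
    """Map each *.jsonl file name (assumed extension) to its record count."""
    counts = {}
    for path in sorted(Path(artefact_dir).glob("*.jsonl")):
        counts[path.name] = len(parse_jsonl(path.read_text(encoding="utf-8")))
    return counts
```

Calling `count_records()` from the directory where the scan ran returns a dictionary of per-file record counts, which gives a quick view of how much each stage produced.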


License

Apache-2.0. Bundled third-party scripts (pdfid, pdf-parser, rtfdump) are BSD 2-Clause — see src/canary_scan/bundled/README.md.

