Scan a document dump for canaries, trackers, web beacons, and per-recipient fingerprints before interacting with the data.

Project description

canary-scan

Scan document data sources for canaries, trackers, web beacons, and per-recipient fingerprints before interacting with the supplied datasets.

When you receive a large document dump from an external party, such as a leak, a legal disclosure, or an investigation, the files can contain deliberate or incidental canaries. Tracking pixels, embedded JavaScript, and remote template links can phone home the moment a file is opened, while steganographic watermarks and per-recipient metadata fingerprints can reveal which copy, and therefore which recipient, a document came from.
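In Office Open XML files (.docx, .xlsx, .pptx), remote template links and remotely loaded images are stored as package relationships marked TargetMode="External". As an illustration only (this is not canary-scan's own code, and `external_relationships` is a hypothetical name), the sketch below lists those relationships by reading the file as a ZIP archive, so nothing is rendered and no URL is fetched.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespace of OOXML package relationship parts (the *.rels files).
REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def external_relationships(path):
    """Hypothetical helper (not part of canary-scan).

    Return (part, rel_type, target) for every relationship marked
    TargetMode="External" in an OOXML file. The file is read as a ZIP
    archive; nothing is rendered and no URL is fetched.
    """
    findings = []
    with zipfile.ZipFile(path) as zf:
        for name in zf.namelist():
            if not name.endswith(".rels"):
                continue
            # For hostile input, defusedxml.ElementTree.fromstring is a
            # safer drop-in than the standard-library parser used here.
            root = ET.fromstring(zf.read(name))
            for rel in root.iter(f"{REL_NS}Relationship"):
                if rel.get("TargetMode") == "External":
                    # Last segment of Type names the relationship,
                    # e.g. "attachedTemplate" (remote template) or "image".
                    rel_type = rel.get("Type", "").rsplit("/", 1)[-1]
                    findings.append((name, rel_type, rel.get("Target")))
    return findings
```

An attachedTemplate target pointing at an external URL is a classic remote-template beacon; an external image target is the OOXML equivalent of a tracking pixel.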

canary-scan inspects files without opening them in their native viewers. It extracts and analyses raw structure, metadata, and embedded objects, and computes near-duplicate fingerprints, to surface anything that could reveal to an external party that the data source is being examined.
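Near-duplicate fingerprinting can be approximated with word shingles and Jaccard similarity. This is a generic technique chosen for illustration; canary-scan's own uniqueness stage may work differently, and `shingles`, `jaccard`, and `near_duplicates` are hypothetical names. Copies that are highly similar but not identical are the interesting ones, because the few differing words may be a per-recipient mark.

```python
import re
from itertools import combinations


def shingles(text, k=3):
    """Set of overlapping k-word sequences (shingles) in lower-cased text."""
    words = re.findall(r"\w+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard(a, b):
    """|A & B| / |A | B|; defined as 1.0 when both sets are empty."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicates(docs, threshold=0.8):
    """Hypothetical helper: yield (name_a, name_b, score) for similar pairs.

    docs maps a document name to its extracted text. Identical copies
    (score 1.0) are skipped; small differences between otherwise matching
    copies are what may encode a per-recipient fingerprint.
    """
    sets = {name: shingles(text) for name, text in docs.items()}
    for a, b in combinations(sorted(sets), 2):
        score = jaccard(sets[a], sets[b])
        if threshold <= score < 1.0:
            yield a, b, round(score, 3)
```

The threshold is a tuning knob: a single changed word in a short passage alters every shingle that overlaps it, so short documents need a lower threshold than long ones.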

Full documentation: psaintelligence.github.io/canary-scan


Quick Start (Docker)

The recommended way to run canary-scan is via Docker, as the image bundles all required system utilities and dependencies:

# Run the scan using the GitHub Container Registry image
docker run --rm \
  -v /mnt/datasource:/data:ro \
  -v "$(pwd)/canary-scan-out:/output" \
  ghcr.io/psaintelligence/canary-scan:latest scan /data -o /output

# Review findings
jq '.[] | select(.severity=="critical")' canary-scan-out/canary-scan-report.json
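If jq is not available, the same filter can be applied in Python. This sketch assumes, as the jq expression implies, that canary-scan-report.json is a top-level JSON array of finding objects carrying a "severity" field; the helper names are hypothetical.

```python
import json
from collections import Counter


def load_findings(report_path):
    """Hypothetical helper: load the report, assumed to be a JSON array."""
    with open(report_path, encoding="utf-8") as fh:
        return json.load(fh)


def by_severity(findings, severity):
    """Equivalent of jq '.[] | select(.severity=="critical")' for any label."""
    return [f for f in findings if f.get("severity") == severity]


def severity_counts(findings):
    """Count findings per severity label; a missing label counts as 'unknown'."""
    return Counter(f.get("severity", "unknown") for f in findings)
```

For example, severity_counts(load_findings("canary-scan-out/canary-scan-report.json")) gives a per-severity tally before you dig into individual critical findings with by_severity.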

Quick Start (pipx)

If you prefer to run canary-scan directly on your host machine:

# 1. Install canary-scan
pipx install canary-scan

# 2. Install required system dependencies (Ubuntu 24.04 example)
sudo apt install libimage-exiftool-perl qpdf poppler-utils mupdf-tools \
    ripgrep unzip p7zip-full

# 3. Run the scan
canary-scan scan /mnt/datasource
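Before a host-based run, it can help to confirm that the system utilities installed above are on PATH. The package-to-binary mapping below is an assumption (for example, poppler-utils is represented by pdfinfo alone), canary-scan may depend on other executables, and `missing_tools` is a hypothetical helper built on the standard library's shutil.which.

```python
import shutil

# Assumed mapping: Ubuntu package -> one representative executable it provides.
# canary-scan may call additional binaries not listed here.
REQUIRED_TOOLS = {
    "libimage-exiftool-perl": "exiftool",
    "qpdf": "qpdf",
    "poppler-utils": "pdfinfo",
    "mupdf-tools": "mutool",
    "ripgrep": "rg",
    "unzip": "unzip",
    "p7zip-full": "7z",
}


def missing_tools(tools=REQUIRED_TOOLS, which=shutil.which):
    """Hypothetical helper: return {package: binary} for binaries not on PATH.

    `which` is injectable so the check can be tested without touching PATH.
    """
    return {pkg: exe for pkg, exe in tools.items() if which(exe) is None}
```

An empty result means every listed binary resolved; otherwise the dictionary keys are the packages to pass to apt install.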

Detection pipeline

Seven sequential stages: inventory → metadata → remote-refs → embedded → stego → uniqueness → report

Each stage writes a JSONL artefact to .canary-scan/. Run canary-scan --guide for a concise cheat sheet.
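JSONL stores one JSON object per line, so each stage artefact can be streamed record by record. The sketch below counts the records in every *.jsonl file under .canary-scan/. The individual artefact file names and record fields are not documented here, so it assumes nothing beyond the .jsonl extension; the function names are hypothetical.

```python
import json
from pathlib import Path


def read_jsonl(path):
    """Hypothetical helper: yield one decoded object per non-blank line."""
    with open(path, encoding="utf-8") as fh:
        for lineno, line in enumerate(fh, start=1):
            if not line.strip():
                continue  # tolerate blank lines between records
            try:
                yield json.loads(line)
            except json.JSONDecodeError as exc:
                # Report the file and line so a corrupt record is easy to find.
                raise ValueError(f"{path}:{lineno}: {exc}") from exc


def artefact_summary(workdir=".canary-scan"):
    """Hypothetical helper: map each *.jsonl artefact to its record count."""
    return {
        p.name: sum(1 for _ in read_jsonl(p))
        for p in sorted(Path(workdir).glob("*.jsonl"))
    }
```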


License

Apache-2.0. Bundled third-party scripts (pdfid, pdf-parser, rtfdump) are BSD 2-Clause — see src/canary_scan/bundled/README.md.

Download files

Source Distribution

canary_scan-0.1.4.tar.gz (394.4 kB)

Built Distribution

canary_scan-0.1.4-py3-none-any.whl (100.6 kB)

File details

Details for the file canary_scan-0.1.4.tar.gz.

File metadata

  • Download URL: canary_scan-0.1.4.tar.gz
  • Size: 394.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for canary_scan-0.1.4.tar.gz
Algorithm Hash digest
SHA256 f0274ae99e62cf64098d9b1365684464d6ed8d53701d35884eeca9d36b4e5003
MD5 03785ec6d279301f46ea8e52ce3417bf
BLAKE2b-256 90252e34710da6c8c85754559b433f7a43ec2e4015071bd837ee06600a1f0a09
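The published SHA256 digests let you check a downloaded archive before installing it. A minimal standard-library sketch (the helper names are hypothetical) that streams the file through hashlib, so large files need not fit in memory, and compares the result case-insensitively:

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Hypothetical helper: hex SHA-256 of a file, read in 1 MiB chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel keeps reading until read() returns b"".
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA-256 matches the published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, verify("canary_scan-0.1.4.tar.gz", "f0274ae99e62cf64098d9b1365684464d6ed8d53701d35884eeca9d36b4e5003") returns True only if the downloaded source archive matches the digest listed above.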

File details

Details for the file canary_scan-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: canary_scan-0.1.4-py3-none-any.whl
  • Size: 100.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.2.0 CPython/3.12.13

File hashes

Hashes for canary_scan-0.1.4-py3-none-any.whl
Algorithm Hash digest
SHA256 54feb4b6278655579bae8adc073c80a46ef7aa5fc6af5e6b4a376a58ee7528a1
MD5 5b5a361f06c31a0a4e805d1862ea312e
BLAKE2b-256 7811a976989d59af10033b5dda702df6111d01a7ea96556fd0d12ff08138a026
