Search directories and archives for documents containing payment card numbers (PANs).

PANhunt

Introduction

PANhunt is a tool that can be used to search drives for primary account numbers (PANs).

PAN: Acronym for “primary account number.” Unique payment card number (credit, debit, or prepaid cards, etc.) that identifies the issuer and the cardholder account.

The tool is useful for checking PCI DSS scope accuracy. It can serve as a simple control to help confirm that PANs have not leaked out of authorized locations and that no clear-text PANs remain within the infrastructure.

Acknowledgements

PANhunt remains rooted in the original Dionach PANhunt project, created and released by Dionach Ltd. The original project made a simple, practical PAN discovery tool available to the PCI and security community, including the PST parsing foundation that this fork continues to build on.

This fork keeps the Dionach icon as a visible sign of respect for the original source, developer, and project history. The BSD-3-Clause license and copyright notice from Dionach Ltd. are preserved in this repository.

Function

PANhunt uses regular expressions to look for Visa, MasterCard, AMEX and other credit card numbers across a broad set of document, mail, and archive formats. Archive and container formats are recursed so nested documents, emails, and attachments can be searched.
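As a rough illustration of the regular-expression approach, the sketch below matches three card brands and masks each hit the way the example output does (first six and last four digits kept). The patterns, the `find_pans` and `mask` helpers, and the Luhn checksum step are assumptions made for this example; this description does not state PANhunt's actual expressions or whether it applies a checksum.

```python
import re

# Illustrative patterns only; PANhunt's real expressions may differ.
CARD_PATTERNS = {
    "Mastercard": re.compile(r"(?<!\d)5[1-5]\d{14}(?!\d)"),
    "Visa": re.compile(r"(?<!\d)4\d{15}(?!\d)"),
    "AMEX": re.compile(r"(?<!\d)3[47]\d{13}(?!\d)"),
}


def luhn_valid(number: str) -> bool:
    """Luhn checksum (an assumed extra filter to reduce false positives)."""
    total = 0
    for index, char in enumerate(reversed(number)):
        digit = int(char)
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def mask(number: str) -> str:
    """Keep the first six and last four digits, star out the rest."""
    return number[:6] + "*" * (len(number) - 10) + number[-4:]


def find_pans(text: str) -> list:
    """Return (brand, masked PAN) pairs for every checksum-valid match."""
    findings = []
    for brand, pattern in CARD_PATTERNS.items():
        for match in pattern.finditer(text):
            if luhn_valid(match.group()):
                findings.append((brand, mask(match.group())))
    return findings


if __name__ == "__main__":
    sample = "cards: 5105105105105100, 4012888888881881, 371449635398431"
    for brand, masked in find_pans(sample):
        print(f"{brand}:{masked}")
```

The lookarounds `(?<!\d)` and `(?!\d)` keep the patterns from matching inside longer digit runs such as order numbers or timestamps.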

Currently supported searchable formats include:

  • Plain text and text-like files, including CSV, XML, HTML, logs, source files, and flat OpenDocument XML (.fodt, .fods, .fodp) when detected as text
  • Rich Text Format (.rtf)
  • Legacy Microsoft Office binary files (.doc, .xls, .ppt)
  • Modern Microsoft Office Open XML files (.docx, .xlsx, .pptx)
  • OpenDocument containers (.odt, .ott, .ods, .ots, .odp, .otp, .odg, .otg, .odf, .odm)
  • PDF documents (.pdf)
  • Outlook and email stores/messages (.pst, .msg, .eml, .mbox), including supported attachments
  • Recursive archives and compressed files (.zip, .tar, .gz, .xz), including Office and OpenDocument container files

PANhunt lists Access databases but does not yet search them.

Architecture

PANhunt follows a layered, dependency-injected architecture:

  • PanHuntService — orchestrates a full scan session with no UI concerns
  • CliPresenter — handles all terminal output and report file writing
  • Hunter / Dispatcher — file-system traversal and concurrent scanning
  • ScannerFactory / ArchiveFactory — produce scanner and archive-handler instances for each file type; custom scanners can be registered at runtime
  • ScanConfiguration — immutable configuration object created once and injected into all components
  • ScanResult / Finding — data-transfer objects carrying structured output

The service and presenter layers are fully decoupled, making it straightforward to embed PANhunt in a larger application or swap the CLI presenter for a different UI.
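The decoupling described above can be pictured with a small toy model. Every class and function below (`DemoConfig`, `DemoFinding`, `DemoScannerFactory`, `DemoService`, `present`) is a hypothetical stand-in written for this sketch; none of them is PANhunt's real class or method, and the real signatures may look quite different.

```python
import os
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class DemoConfig:
    """Immutable settings, created once and handed to every component."""
    quiet: bool = False


@dataclass(frozen=True)
class DemoFinding:
    """Structured result passed from the service to any presenter."""
    path: str
    brand: str
    masked_pan: str


# In this sketch a scanner is any callable that maps a path to findings.
Scanner = Callable[[str, DemoConfig], List[DemoFinding]]


class DemoScannerFactory:
    """Maps file extensions to scanners; new ones can be added at runtime."""

    def __init__(self) -> None:
        self._scanners: Dict[str, Scanner] = {}

    def register(self, extension: str, scanner: Scanner) -> None:
        self._scanners[extension.lower()] = scanner

    def for_path(self, path: str) -> Optional[Scanner]:
        extension = os.path.splitext(path)[1].lower()
        return self._scanners.get(extension)


class DemoService:
    """Runs a scan and returns data only; it never prints."""

    def __init__(self, config: DemoConfig, factory: DemoScannerFactory) -> None:
        self._config = config
        self._factory = factory

    def scan(self, paths: List[str]) -> List[DemoFinding]:
        findings: List[DemoFinding] = []
        for path in paths:
            scanner = self._factory.for_path(path)
            if scanner is not None:
                findings.extend(scanner(path, self._config))
        return findings


def present(findings: List[DemoFinding]) -> List[str]:
    """A replaceable presenter: turns findings into display lines."""
    return [f"{f.path}: {f.brand}:{f.masked_pan}" for f in findings]


def fake_text_scanner(path: str, config: DemoConfig) -> List[DemoFinding]:
    """Stand-in scanner that reports one fixed finding per file."""
    return [DemoFinding(path, "Visa", "401288******1881")]


if __name__ == "__main__":
    factory = DemoScannerFactory()
    factory.register(".txt", fake_text_scanner)
    service = DemoService(DemoConfig(), factory)
    for line in present(service.scan(["a.txt", "b.bin"])):
        print(line)
```

Because the service returns plain data objects, swapping `present` for a GUI or web front end would not require touching the scanning code.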

Installation and publishing

PANhunt requires Python 3.9 or later. For normal usage, install the package and run the console script:

pipx install panhunt
panhunt --help

libmagic prerequisite on non-Windows systems

PANhunt uses python-magic for file type detection. On Windows, the packaged python-magic-bin dependency provides the native library. On Linux and macOS, python-magic usually requires the OS-level libmagic library to be installed before PANhunt can import and scan files successfully. Install it with your platform package manager before or after installing PANhunt, for example:

# Debian / Ubuntu
sudo apt-get install libmagic1

# Fedora
sudo dnf install file-libs

# macOS with Homebrew
brew install libmagic
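To confirm that the native library is visible to Python after installing it, a quick check along these lines may help. The `libmagic_available` helper is a name made up for this sketch; it relies only on `magic.from_buffer`, which python-magic provides.

```python
def libmagic_available() -> bool:
    """Return True if python-magic imports and can classify a small buffer."""
    try:
        import magic  # python-magic; loads the native libmagic library

        magic.from_buffer(b"plain text sample", mime=True)
    except Exception:
        # Typically an ImportError when libmagic itself is missing.
        return False
    return True


if __name__ == "__main__":
    if libmagic_available():
        print("libmagic is available")
    else:
        print("libmagic is missing; install it with your package manager")
```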

For local development, install the project with its development extras from the repository root:

pip install -e .[dev]
pytest

Testing

The test suite uses pytest and covers the core scanning logic, configuration, factories, service layer, and presenter.

pytest src/tests/

To include coverage details, run pytest --cov=panhunt src/tests/.

Usage

usage: panhunt [-h] [-x EXCLUDE_DIRS] [-o REPORT_DIR] [-j JSON_DIR] [-C CONFIG] [-X EXCLUDE_PAN] [-w WORKERS] [-q] [target_path]

PANHunt : search directories and sub directories for documents containing PANs.

positional arguments:
  target_path      file or directory to search (default: None)

options:
  -h, --help       show this help message and exit
  -x EXCLUDE_DIRS  directories to exclude from the search (use absolute paths) (default: None)
  -o REPORT_DIR    Report file directory for TXT formatted PAN report (default: ./)
  -j JSON_DIR      Report file directory for JSON formatted PAN report (default: None)
  -C CONFIG        configuration file to use (default: None)
  -X EXCLUDE_PAN   PAN to exclude from search (default: None)
  -w WORKERS       Number of worker threads (default: 1) (default: None)
  -q               No terminal output (default: False)
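When PANhunt is driven from another script, the options above can be assembled into an argument list. The `build_panhunt_command` helper below is a hypothetical convenience written for this example; it uses only flags that appear in the help text and does not run anything itself.

```python
from typing import List, Optional


def build_panhunt_command(
    target_path: str,
    report_dir: Optional[str] = None,
    json_dir: Optional[str] = None,
    workers: Optional[int] = None,
    quiet: bool = False,
) -> List[str]:
    """Assemble an argument list for the panhunt console script."""
    command = ["panhunt"]
    if report_dir is not None:
        command += ["-o", report_dir]  # TXT report directory
    if json_dir is not None:
        command += ["-j", json_dir]  # JSON report directory
    if workers is not None:
        command += ["-w", str(workers)]  # number of worker threads
    if quiet:
        command.append("-q")  # suppress terminal output
    command.append(target_path)  # positional argument goes last
    return command


if __name__ == "__main__":
    print(build_panhunt_command("/data", report_dir="/var/reports", workers=2))
```

The resulting list can be passed to `subprocess.run(command, check=True)`; passing a list rather than a shell string avoids quoting problems with paths that contain spaces.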

For advanced scanning controls, use -C config.ini. The configuration file supports additional options beyond the command-line parameters.

Running PANhunt without a target path or -C config.ini no longer starts a root-directory scan. It prints a short reminder to use -h or --help and exits without scanning. Reports are written as panhunt_<timestamp>.report in the report directory, and JSON reports are written as panhunt_<timestamp>.json when -j or the json configuration key is set.
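Because report names embed a timestamp, a follow-up script can pick out the most recent report in a directory. The sketch below assumes the timestamp format `%Y-%m-%d-%H%M%S`, inferred from the sample name `panhunt_2024-09-14-221629.report` in the example output; `report_timestamp` and `latest_report` are hypothetical helper names.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional

# Assumed format, inferred from the sample name panhunt_2024-09-14-221629.report.
TIMESTAMP_FORMAT = "%Y-%m-%d-%H%M%S"
PREFIX = "panhunt_"


def report_timestamp(path: Path) -> Optional[datetime]:
    """Parse the timestamp embedded in a report file name, if any."""
    stem = path.stem
    if not stem.startswith(PREFIX):
        return None
    try:
        return datetime.strptime(stem[len(PREFIX):], TIMESTAMP_FORMAT)
    except ValueError:
        return None


def latest_report(report_dir: str, suffix: str = ".report") -> Optional[Path]:
    """Return the newest panhunt report in report_dir, or None if none exist."""
    newest_path = None
    newest_time = None
    for path in Path(report_dir).glob(f"{PREFIX}*{suffix}"):
        stamp = report_timestamp(path)
        if stamp is not None and (newest_time is None or stamp > newest_time):
            newest_path, newest_time = path, stamp
    return newest_path


if __name__ == "__main__":
    print(latest_report("."))
```

Passing `suffix=".json"` selects JSON reports instead.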

Example Output

FOUND PANs: D:\PANhunt\test\eml\test with attachments.eml (176.91KB)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\eml\test.eml (41.87KB)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\msg\test with attachments.msg (169.50KB)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\msg\test.msg (22.50KB)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\office\test.rtf (40.79KB)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\pdf\test.pdf (39.57KB)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\plain\test.txt (96.00B)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\plain\dir2\test.txt (96.00B)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: test with attachments.eml\test.txt (96.00B)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: success.tar\test.rtf (40.79KB)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: test.eml\None (36.77KB)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\gz\test.txt.gz\test.txt (54.00B)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: test with attachments.msg\test.txt (96.00B)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\office\test.docx\word/document.xml (3.50KB)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\office\test.pptx\ppt/slides/slide1.xml (1.68KB)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\office\test.xlsx\xl/sharedStrings.xml (328.00B)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\tar\success.tar\dir2/test.txt (96.00B)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\xz\test.txt.xz\test.txt (54.00B)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\zip\test.zip\dir2/test.txt (96.00B)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\zip\test.zip\test.txt (96.00B)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\tar\success.tar.gz\success.tar\dir2/test.txt (54.00B)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

FOUND PANs: D:\PANhunt\test\tar\success.tar.xz\success.tar\dir2/test.txt (54.00B)
        Mastercard:510510******5100
        Visa:401288******1881
        AMEX:371449*****8431

Report written to D:\PANhunt\out\panhunt_2024-09-14-221629.report
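Output in this shape is easy to post-process. The sketch below groups masked PANs by file path; it assumes the text follows exactly the layout shown above (a `FOUND PANs:` header line followed by indented `Brand:masked-number` lines), and `parse_findings` is a hypothetical helper, not part of PANhunt.

```python
import re
from typing import Dict, List, Tuple

# Header line: "FOUND PANs: <path> (<size>)"
HEADER = re.compile(r"^FOUND PANs: (?P<path>.+) \((?P<size>[^()]+)\)$")
# Finding line: indented "<brand>:<masked digits>"
FINDING = re.compile(r"^\s+(?P<brand>[^:\s]+):(?P<pan>[\d*]+)$")


def parse_findings(text: str) -> Dict[str, List[Tuple[str, str]]]:
    """Map each reported path to its list of (brand, masked PAN) pairs."""
    results: Dict[str, List[Tuple[str, str]]] = {}
    current = None
    for raw_line in text.splitlines():
        line = raw_line.rstrip()
        header = HEADER.match(line)
        if header:
            current = header.group("path")
            results.setdefault(current, [])
            continue
        finding = FINDING.match(line)
        if finding and current is not None:
            results[current].append((finding.group("brand"), finding.group("pan")))
    return results


if __name__ == "__main__":
    sample = (
        "FOUND PANs: D:\\PANhunt\\test\\plain\\test.txt (96.00B)\n"
        "        Mastercard:510510******5100\n"
        "        Visa:401288******1881\n"
    )
    for path, pans in parse_findings(sample).items():
        print(path, pans)
```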

Configuration

PANhunt accepts a configuration file that sets default values, so you don't need to repeat paths or excluded PANs on the command line.

Example config.ini:

[DEFAULT]
# Target can be supplied as target, search, or file.
search = /data
exclude = /data/logs,/data/tmp
outfile = /var/reports
json = /var/reports
excludepans = 4111111111111111
sizeLimit = 8589934592
workers = 2
quiet = false

# Optional safety/resource limits. Values are bytes unless otherwise noted.
maxScanDepth = 25
maxChildJobs = 100000
maxTotalExpandedBytes = 8589934592
maxArchiveMembers = 10000
maxArchiveCompressionRatio = 100
maxArchivePathLength = 4096
archiveSpoolThreshold = 8388608
maxAttachmentSize = 8589934592
maxAttachmentsPerMessage = 1000
maxTotalAttachmentBytes = 8589934592
allowedArchiveTypes =
deniedArchiveTypes =
parserTimeoutSeconds = 30
parserMemoryLimitBytes = 536870912
maxPdfPages = 100
maxPdfTextBytes = 10485760

Pass the config file with -C config.ini. The configuration file is the preferred way to use advanced scanning controls because it supports more options than the command-line parameters, including safety limits for nested archives, compressed data, attachments, parser isolation, and PDF extraction. Command-line quiet mode (-q) overrides the quiet value from the configuration file. The sizeLimit setting also updates the default total expanded-byte and attachment-byte limits unless those more specific settings are supplied.
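The two precedence rules in the paragraph above can be sketched with the standard `configparser` module. This is an illustration of the described behaviour, not PANhunt's own loader: `effective_settings` is a hypothetical helper, and the sketch assumes that the "more specific settings" are `maxTotalExpandedBytes` and `maxTotalAttachmentBytes`.

```python
import configparser


def effective_settings(config_text: str, cli_quiet: bool = False) -> dict:
    """Resolve quiet mode and byte limits using the precedence described above."""
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    defaults = parser["DEFAULT"]  # option names are matched case-insensitively

    size_limit = defaults.getint("sizeLimit", fallback=None)
    return {
        # -q on the command line wins over the file's quiet value.
        "quiet": cli_quiet or defaults.getboolean("quiet", fallback=False),
        # Assumed mapping: each limit falls back to sizeLimit when not set.
        "maxTotalExpandedBytes": defaults.getint(
            "maxTotalExpandedBytes", fallback=size_limit
        ),
        "maxTotalAttachmentBytes": defaults.getint(
            "maxTotalAttachmentBytes", fallback=size_limit
        ),
    }


if __name__ == "__main__":
    sample = """
[DEFAULT]
search = /data
sizeLimit = 1048576
maxTotalAttachmentBytes = 2048
quiet = false
"""
    print(effective_settings(sample))
    print(effective_settings(sample, cli_quiet=True))
```

In the sample, `maxTotalExpandedBytes` is not set and therefore inherits the `sizeLimit` value, while the explicit `maxTotalAttachmentBytes` is kept.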

Restricting CPU usage

When scanning large compressed files, such as compressed log files that are larger than available memory, PANhunt may consume all available CPU. Limiting its CPU usage can keep it from starving other workloads. On systems that use systemd, a command such as systemd-run --scope -p CPUQuota=60% panhunt <your parameters here> caps the CPU available to the scan.

Download files

Source Distribution

panhunt-2.0.0.tar.gz (102.5 kB view details)

Uploaded Source

Built Distribution

panhunt-2.0.0-py3-none-any.whl (113.9 kB view details)

Uploaded Python 3

File details

Details for the file panhunt-2.0.0.tar.gz.

File metadata

  • Download URL: panhunt-2.0.0.tar.gz
  • Size: 102.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for panhunt-2.0.0.tar.gz
Algorithm Hash digest
SHA256 b6b3092b68ebdb9cdac58a8572f7fcbfc6f60cb4d4a14b950fa1ebccee7b1dc2
MD5 a06ffeeb8a1f8178aed99499b4bbb34e
BLAKE2b-256 57920cc6ae44c14fd9a6bbc4174f5a68de214d1522e48a6e9e8862359bef53f2

File details

Details for the file panhunt-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: panhunt-2.0.0-py3-none-any.whl
  • Size: 113.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.11

File hashes

Hashes for panhunt-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 afe278fdc1b091a18b6958e91f9e7c9823521019fcf59a75b5773bb2f28f8c82
MD5 c3633a4e6fd019e53156ededb5d89c5c
BLAKE2b-256 09984fe9c9749af1d795fb8d15c698d1e3d024cfdbfb279e83ca579ed8868223
