Search directories and archives for documents containing payment card numbers (PANs).
PANhunt
Introduction
PANhunt is a tool that can be used to search drives for primary account numbers (PANs).
PAN: Acronym for “primary account number.” The unique payment card number (credit, debit, or prepaid cards, etc.) that identifies the issuer and the cardholder account.
The tool is useful for checking PCI DSS scope accuracy. It can serve as a simple control to help ensure that PANs do not leak outside authorized locations and that no clear-text PANs remain within the infrastructure.
Acknowledgements
PANhunt remains rooted in the original Dionach PANhunt project, created and released by Dionach Ltd. The original project made a simple, practical PAN discovery tool available to the PCI and security community, including the PST parsing foundation that this fork continues to build on.
This fork keeps the Dionach icon as a visible sign of respect for the original source, developer, and project history. The BSD-3-Clause license and copyright notice from Dionach Ltd. are preserved in this repository.
Function
PANhunt uses regular expressions to look for Visa, MasterCard, AMEX and other credit card numbers across a broad set of document, mail, and archive formats. Archive and container formats are recursed so nested documents, emails, and attachments can be searched.
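As a rough illustration of the regular-expression approach described above, the sketch below matches three card brands and masks each hit the way the example output does (first six and last four digits kept). The patterns and the names `PAN_PATTERNS`, `mask_pan`, and `find_pans` are assumptions made for this example; they are not taken from the PANhunt source, and the real expressions may be stricter or cover more brands.

```python
import re

# Illustrative patterns only; PANhunt's actual expressions may differ.
# The lookarounds stop a match from starting or ending inside a longer digit run.
PAN_PATTERNS = {
    "Mastercard": re.compile(r"(?<!\d)5[1-5]\d{14}(?!\d)"),
    "Visa": re.compile(r"(?<!\d)4\d{15}(?!\d)"),
    "AMEX": re.compile(r"(?<!\d)3[47]\d{13}(?!\d)"),
}


def mask_pan(pan: str) -> str:
    """Keep the first six and last four digits and mask everything between."""
    return pan[:6] + "*" * (len(pan) - 10) + pan[-4:]


def find_pans(text: str) -> list[str]:
    """Return 'Brand:masked-number' strings for every pattern match in text."""
    findings = []
    for brand, pattern in PAN_PATTERNS.items():
        for match in pattern.finditer(text):
            findings.append(f"{brand}:{mask_pan(match.group())}")
    return findings
```

A production scanner would typically add further validation on top of plain pattern matching to cut false positives; whether and how PANhunt does so is not stated here.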
Currently supported searchable formats include:
- Plain text and text-like files, including CSV, XML, HTML, logs, source files, and flat OpenDocument XML (.fodt, .fods, .fodp) when detected as text
- Rich Text Format (.rtf)
- Legacy Microsoft Office binary files (.doc, .xls, .ppt)
- Modern Microsoft Office Open XML files (.docx, .xlsx, .pptx)
- OpenDocument containers (.odt, .ott, .ods, .ots, .odp, .otp, .odg, .otg, .odf, .odm)
- PDF documents (.pdf)
- Outlook and email stores/messages (.pst, .msg, .eml, .mbox), including supported attachments
- Recursive archives and compressed files (.zip, .tar, .gz, .xz), including Office and OpenDocument container files
PANhunt will list but does not yet search Access databases.
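The recursion into nested containers mentioned above can be pictured with a small sketch. It walks a ZIP archive held in memory, descends into members that are themselves ZIP files, and stops at a depth limit. The function name `walk_zip`, its parameters, and the path separator are hypothetical choices for this example (the separator mirrors the nested paths in the example output); this is not PANhunt's archive handler.

```python
import io
import zipfile
from typing import Iterator


def walk_zip(data: bytes, path: str, depth: int = 0,
             max_depth: int = 25) -> Iterator[tuple[str, bytes]]:
    """Yield (nested_path, content) for every non-archive member.

    Members that are themselves ZIP files are opened recursively until
    max_depth is reached; archives nested deeper than that are skipped.
    """
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            member = archive.read(info)
            member_path = f"{path}\\{info.filename}"
            if zipfile.is_zipfile(io.BytesIO(member)):
                # Only descend while the next level stays under the limit.
                if depth + 1 < max_depth:
                    yield from walk_zip(member, member_path, depth + 1, max_depth)
            else:
                yield member_path, member
```

A depth cap of this kind is one simple guard against deeply nested or maliciously crafted archives; the sketch reads each member fully into memory, which a real scanner handling large files would likely avoid.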
Architecture
PANhunt follows a layered, dependency-injected architecture:
- PanHuntService: orchestrates a full scan session with no UI concerns
- CliPresenter: handles all terminal output and report file writing
- Hunter / Dispatcher: file-system traversal and concurrent scanning
- ScannerFactory / ArchiveFactory: produce scanner and archive-handler instances for each file type; custom scanners can be registered at runtime
- ScanConfiguration: immutable configuration object created once and injected into all components
- ScanResult / Finding: data-transfer objects carrying structured output
The service and presenter layers are fully decoupled, making it straightforward to embed PANhunt in a larger application or swap the CLI presenter for a different UI.
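To make the separation between service and presenter concrete, here is a minimal sketch of the pattern. Every name in it (`ScanConfig`, `Finding`, `Presenter`, `ScanService`, `ListPresenter`, `scan_and_present`) is hypothetical and chosen for illustration; the real PANhunt classes and their signatures may look quite different.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ScanConfig:
    """Immutable settings, created once and passed to every component."""
    target: str
    quiet: bool = False


@dataclass(frozen=True)
class Finding:
    """Plain data object describing one match."""
    path: str
    masked_pan: str


class Presenter(Protocol):
    """Anything with a show() method can act as the output layer."""
    def show(self, findings: list[Finding]) -> None: ...


class ScanService:
    """Runs a scan and returns data; it never prints or writes files."""

    def __init__(self, config: ScanConfig) -> None:
        self._config = config

    def run(self) -> list[Finding]:
        # Placeholder result standing in for real traversal and scanning.
        return [Finding(self._config.target, "401288******1881")]


class ListPresenter:
    """Alternative presenter that collects lines instead of printing them."""

    def __init__(self) -> None:
        self.lines: list[str] = []

    def show(self, findings: list[Finding]) -> None:
        self.lines.extend(f"{f.path}: {f.masked_pan}" for f in findings)


def scan_and_present(config: ScanConfig, presenter: Presenter) -> None:
    """Wire the two layers together; either side can be swapped independently."""
    presenter.show(ScanService(config).run())
```

Because the service only returns data objects, a host application could pass in its own presenter (for example, one that feeds a ticketing system) without touching the scanning code.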
Installation and publishing
PANhunt requires Python 3.9 or later. For normal usage, install the package and run the console script:
pipx install panhunt
panhunt --help
libmagic prerequisite on non-Windows systems
PANhunt uses python-magic for file type detection. On Windows, the packaged python-magic-bin dependency provides the native library. On Linux and macOS, python-magic usually requires the OS-level libmagic library to be installed before PANhunt can import and scan files successfully. Install it with your platform package manager before or after installing PANhunt, for example:
# Debian / Ubuntu
sudo apt-get install libmagic1
# Fedora
sudo dnf install file-libs
# macOS with Homebrew
brew install libmagic
For local development, install the project with its development extras from the repository root:
pip install -e .[dev]
pytest
Testing
The test suite uses pytest and covers the core scanning logic, configuration, factories, service layer, and presenter.
pytest src/tests/
To include coverage details, run pytest --cov=panhunt src/tests/.
Usage
usage: panhunt [-h] [-x EXCLUDE_DIRS] [-o REPORT_DIR] [-j JSON_DIR] [-C CONFIG] [-X EXCLUDE_PAN] [-w WORKERS] [-q] [target_path]
PANHunt : search directories and sub directories for documents containing PANs.
positional arguments:
target_path file or directory to search (default: None)
options:
-h, --help show this help message and exit
-x EXCLUDE_DIRS directories to exclude from the search (use absolute paths) (default: None)
-o REPORT_DIR Report file directory for TXT formatted PAN report (default: ./)
-j JSON_DIR Report file directory for JSON formatted PAN report (default: None)
-C CONFIG configuration file to use (default: None)
-X EXCLUDE_PAN PAN to exclude from search (default: None)
-w WORKERS Number of worker threads (default: 1) (default: None)
-q No terminal output (default: False)
For advanced scanning controls, use -C config.ini. The configuration file supports additional options beyond the command-line parameters.
Running PANhunt without a target path or -C config.ini no longer starts a root-directory scan. It prints a short reminder to use -h or --help and exits without scanning. Reports are written as panhunt_<timestamp>.report in the report directory, and JSON reports are written as panhunt_<timestamp>.json when -j or the json configuration key is set.
Example Output
FOUND PANs: D:\PANhunt\test\eml\test with attachments.eml (176.91KB)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\eml\test.eml (41.87KB)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\msg\test with attachments.msg (169.50KB)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\msg\test.msg (22.50KB)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\office\test.rtf (40.79KB)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\pdf\test.pdf (39.57KB)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\plain\test.txt (96.00B)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\plain\dir2\test.txt (96.00B)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: test with attachments.eml\test.txt (96.00B)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: success.tar\test.rtf (40.79KB)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: test.eml\None (36.77KB)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\gz\test.txt.gz\test.txt (54.00B)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: test with attachments.msg\test.txt (96.00B)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\office\test.docx\word/document.xml (3.50KB)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\office\test.pptx\ppt/slides/slide1.xml (1.68KB)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\office\test.xlsx\xl/sharedStrings.xml (328.00B)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\tar\success.tar\dir2/test.txt (96.00B)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\xz\test.txt.xz\test.txt (54.00B)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\zip\test.zip\dir2/test.txt (96.00B)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\zip\test.zip\test.txt (96.00B)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\tar\success.tar.gz\success.tar\dir2/test.txt (54.00B)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
FOUND PANs: D:\PANhunt\test\tar\success.tar.xz\success.tar\dir2/test.txt (54.00B)
Mastercard:510510******5100
Visa:401288******1881
AMEX:371449*****8431
Report written to D:\PANhunt\out\panhunt_2024-09-14-221629.report
Configuration
The script allows for a configuration file that sets default values, so you don't need to repeatedly specify paths or PANs to exclude on the command line.
Example config.ini:
[DEFAULT]
# Target can be supplied as target, search, or file.
search = /data
exclude = /data/logs,/data/tmp
outfile = /var/reports
json = /var/reports
excludepans = 4111111111111111
sizeLimit = 8589934592
workers = 2
quiet = false
# Optional safety/resource limits. Values are bytes unless otherwise noted.
maxScanDepth = 25
maxChildJobs = 100000
maxTotalExpandedBytes = 8589934592
maxArchiveMembers = 10000
maxArchiveCompressionRatio = 100
maxArchivePathLength = 4096
archiveSpoolThreshold = 8388608
maxAttachmentSize = 8589934592
maxAttachmentsPerMessage = 1000
maxTotalAttachmentBytes = 8589934592
allowedArchiveTypes =
deniedArchiveTypes =
parserTimeoutSeconds = 30
parserMemoryLimitBytes = 536870912
maxPdfPages = 100
maxPdfTextBytes = 10485760
Pass the config file with -C config.ini. The configuration file is the preferred way to use advanced scanning controls because it supports more options than the command-line parameters, including safety limits for nested archives, compressed data, attachments, parser isolation, and PDF extraction. Command-line quiet mode (-q) overrides the quiet value from the configuration file. The sizeLimit setting also updates the default total expanded-byte and attachment-byte limits unless those more specific settings are supplied.
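The precedence rules in the paragraph above can be sketched with the standard-library `configparser`. This is only an illustration of the described behavior, not PANhunt's loader: the function `load_limits`, the returned dictionary keys, and the fallback value for `sizeLimit` (copied from the example file) are assumptions, and the sketch assumes that "attachment-byte limits" refers to `maxTotalAttachmentBytes`.

```python
import configparser

SAMPLE = """
[DEFAULT]
search = /data
sizeLimit = 1048576
quiet = false
"""


def load_limits(ini_text: str, cli_quiet: bool = False) -> dict:
    """Resolve effective limits from INI text plus the -q command-line flag."""
    parser = configparser.ConfigParser()
    parser.read_string(ini_text)
    section = parser["DEFAULT"]

    # Assumed default, taken from the example config.ini.
    size_limit = section.getint("sizeLimit", fallback=8589934592)
    return {
        "size_limit": size_limit,
        # The specific keys win when present; otherwise sizeLimit is used.
        "max_total_expanded_bytes": section.getint(
            "maxTotalExpandedBytes", fallback=size_limit),
        "max_total_attachment_bytes": section.getint(
            "maxTotalAttachmentBytes", fallback=size_limit),
        # -q on the command line overrides the file's quiet value.
        "quiet": cli_quiet or section.getboolean("quiet", fallback=False),
    }
```

With `SAMPLE`, both derived limits resolve to the `sizeLimit` value; adding an explicit `maxTotalExpandedBytes` line changes only that one limit.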
Restricting resource usage
When PANhunt processes large compressed files, such as compressed log files larger than available memory, it may consume all available CPU, so capping its CPU usage can prevent problems. On systemd-based systems, a command such as systemd-run --scope -p CPUQuota=60% panhunt <your parameters here> limits the CPU time available to the scan.
File details
Details for the file panhunt-2.0.0.tar.gz.
File metadata
- Download URL: panhunt-2.0.0.tar.gz
- Upload date:
- Size: 102.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b6b3092b68ebdb9cdac58a8572f7fcbfc6f60cb4d4a14b950fa1ebccee7b1dc2 |
| MD5 | a06ffeeb8a1f8178aed99499b4bbb34e |
| BLAKE2b-256 | 57920cc6ae44c14fd9a6bbc4174f5a68de214d1522e48a6e9e8862359bef53f2 |
File details
Details for the file panhunt-2.0.0-py3-none-any.whl.
File metadata
- Download URL: panhunt-2.0.0-py3-none-any.whl
- Upload date:
- Size: 113.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | afe278fdc1b091a18b6958e91f9e7c9823521019fcf59a75b5773bb2f28f8c82 |
| MD5 | c3633a4e6fd019e53156ededb5d89c5c |
| BLAKE2b-256 | 09984fe9c9749af1d795fb8d15c698d1e3d024cfdbfb279e83ca579ed8868223 |