Advanced phishing actor attribution using Bayesian inference and graph analysis
HUNTERTRACE
Advanced phishing actor attribution using multi-signal Bayesian inference and infrastructure graph analysis
Current release: 1.2.2
Overview
HUNTERTRACE is an open-source phishing attribution engine that estimates the geographic origin of phishing actors through multi-signal Bayesian inference, combining 8+ orthogonal signals to work around VPN and proxy obfuscation. Evaluated on 53 labeled emails, it achieves 52.8% country-level and 56.6% region-level accuracy. At country level this is well above the IP-geolocation-only baseline but only marginally above timezone-only attribution (~52%). Larger-scale validation is ongoing.
Traditional email forensics relies on IP geolocation alone (~31% accuracy). HUNTERTRACE fuses 8+ orthogonal signals through Bayesian inference:
| Signal | Source | VPN-Resistant |
|---|---|---|
| Webmail IP leaks | X-Originating-IP, X-Sender-IP headers | Yes |
| Timezone offset | Date header / Received chain | Yes |
| Language fingerprint | Content-Type charset, Subject encoding | Yes |
| Infrastructure reuse | Graph centrality across campaigns | Yes |
| Hop chain forgery | Received header consistency | Partial |
| VPN exit node mapping | ASN + hosting provider classification | N/A |
| SPF/DKIM/DMARC/ARC | Authentication results (incl. ARC chain validation) | Partial |
| Webmail provider | Header fingerprinting (Gmail/Yahoo/Outlook) | Yes |
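To make the fusion step concrete, here is a minimal naive-Bayes sketch. It assumes the signals are conditionally independent given the region, which is a simplification. The function `fuse_signals`, the region names, and every likelihood value are hypothetical and are not taken from HUNTERTRACE's implementation.

```python
from math import exp, log

def fuse_signals(prior, likelihoods):
    """Combine per-signal likelihoods P(signal | region) with a prior.

    Works in log space to avoid underflow, then normalizes so the
    posterior sums to 1. All probabilities must be strictly positive.
    """
    log_post = {region: log(p) for region, p in prior.items()}
    for signal in likelihoods:
        for region in log_post:
            log_post[region] += log(signal[region])
    # Normalize with the log-sum-exp trick.
    peak = max(log_post.values())
    unnorm = {r: exp(v - peak) for r, v in log_post.items()}
    total = sum(unnorm.values())
    return {r: v / total for r, v in unnorm.items()}

# Hypothetical inputs: a uniform prior over three candidate regions.
prior = {"region_a": 1 / 3, "region_b": 1 / 3, "region_c": 1 / 3}
signals = [
    {"region_a": 0.6, "region_b": 0.3, "region_c": 0.1},  # timezone offset
    {"region_a": 0.5, "region_b": 0.4, "region_c": 0.1},  # language fingerprint
    {"region_a": 0.2, "region_b": 0.2, "region_c": 0.6},  # IP geolocation (VPN exit)
]
posterior = fuse_signals(prior, signals)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 3))
```

In this toy input the IP-based signal points at `region_c`, yet the two VPN-resistant signals outweigh it and the posterior favors `region_a`. That is the intuition behind fusing several signals instead of trusting geolocation alone.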
Architecture
┌─────────────────────────────────────────────────────────────┐
│ HUNTERTRACE PIPELINE │
├─────────────────────────────────────────────────────────────┤
│ │
│ Stage 1: Header Extraction (RFC 2822 parsing) │
│ ↓ │
│ Webmail IP Leak Detection (X-Originating-IP extraction) │
│ ↓ │
│ Stage 2: IP Classification (VPN/Tor/Proxy/Residential) │
│ ↓ │
│ Stage 3A: Enrichment (WHOIS, ASN, hosting provider) │
│ ↓ │
│ VPN Backtrack Analysis (12 bypass techniques) │
│ ↓ │
│ Real IP Extraction (strips proxy layers) │
│ ↓ │
│ Stage 3B: Threat Intelligence │
│ Stage 3C: Correlation Analysis │
│ ↓ │
│ Stage 4: Geolocation (city-level, IPv4 + IPv6) │
│ ↓ │
│ Stage 5: Attribution Analysis (evidence packaging) │
│ ↓ │
│ Bayesian Multi-Signal Fusion (ACI confidence scoring) │
│ ↓ │
│ Sender Classification (hop forgery + timezone analysis) │
│ ↓ │
│ Output: JSON report + text summary + attack graph HTML │
│ │
└─────────────────────────────────────────────────────────────┘
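Stage 1 and the webmail leak check can be approximated with the Python standard library alone. The sketch below is an illustration under assumptions, not the project's code: `LEAK_HEADERS`, `extract_leaked_ips`, and `date_utc_offset_hours` are hypothetical names, and the sample message is invented. The header names come from the signal table above.

```python
import ipaddress
from email import message_from_string
from email.utils import parsedate_to_datetime

# Headers that some webmail providers populate with the client IP.
LEAK_HEADERS = ("X-Originating-IP", "X-Sender-IP")

def extract_leaked_ips(msg):
    """Return globally routable IPs found in webmail leak headers."""
    found = []
    for name in LEAK_HEADERS:
        for value in msg.get_all(name, []):
            candidate = value.strip().strip("[]")
            try:
                ip = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # malformed header value; skip it
            if ip.is_global:  # drop private and reserved ranges
                found.append(str(ip))
    return found

def date_utc_offset_hours(msg):
    """Return the UTC offset of the Date header in hours, or None."""
    raw = msg.get("Date")
    if raw is None:
        return None
    try:
        parsed = parsedate_to_datetime(raw)
    except (TypeError, ValueError):
        return None  # unparseable Date header
    offset = parsed.utcoffset()
    return None if offset is None else offset.total_seconds() / 3600

# Invented sample message; 8.8.8.8 is only a placeholder public address.
RAW = (
    "From: sender@example.com\n"
    "To: victim@example.org\n"
    "Subject: Account notice\n"
    "Date: Tue, 03 Mar 2026 14:05:00 +0530\n"
    "X-Originating-IP: [8.8.8.8]\n"
    "X-Sender-IP: 10.0.0.5\n"
    "\n"
    "Body text.\n"
)
msg = message_from_string(RAW)
leaked = extract_leaked_ips(msg)
offset = date_utc_offset_hours(msg)
print(leaked, offset)
```

The private `10.0.0.5` address is filtered out because it says nothing about geographic origin, while the `+0530` offset survives even when the sending IP belongs to a VPN.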
Quick Start
Installation
pip install huntertrace
Python API
from huntertrace import HunterTrace
# Run the full 7-stage pipeline
pipeline = HunterTrace(verbose=True)
result = pipeline.run("phishing.eml")
# Generate text report
report = result.generate_report()
print(report.generate_text_report())
# Access Bayesian attribution
bayes = result.bayesian_attribution
if bayes:
    print(f"Region: {bayes.primary_region}")
    print(f"Confidence: {bayes.aci_adjusted_prob:.1%}")
    print(f"Tier: {bayes.tier} - {bayes.tier_label}")
Command Line
# Single email analysis
huntertrace analyze phishing.eml --verbose
# Batch processing
huntertrace batch emails/ -o results/
# Campaign correlation (cross-email actor linking)
huntertrace campaign emails/ -o campaign_report/
Performance
Evaluated on a labeled corpus of 53 phishing emails with known ground-truth origins:
| Method | Top-1 Accuracy | Level | Notes |
|---|---|---|---|
| IP Geolocation Only | ~31% | Country | Industry baseline |
| Timezone Only | ~52% | Country | VPN-resistant, coarse |
| HUNTERTRACE (Bayesian) | 52.8% | Country | Multi-signal fusion |
| HUNTERTRACE (+ Graph) | 56.6% | Region | Region-level, so not directly comparable to the country-level rows |
95% Confidence Interval (country-level accuracy): 39.7% – 65.6% (n=53)
Webmail IP Leak Rate: 37.7% of analyzed emails
Coverage: 100% (a prediction was produced for every email)
⚠️ Note: Performance numbers are based on an initial corpus of 53 labeled emails. Larger-scale validation is in progress. Region-level accuracy (56.6%) is more reliable than country-level given current corpus size.
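The 95% confidence interval reported in this section is consistent with a Wilson score interval for 28 correct predictions out of 53 (28/53 ≈ 52.8%). Both the count of 28 and the choice of the Wilson method are inferred from the published figures and are not stated by the project. A minimal sketch:

```python
from math import sqrt

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z=1.96 for 95%)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Assumed counts: 28 of 53 emails attributed to the correct country.
low, high = wilson_interval(28, 53)
print(f"{low:.1%} to {high:.1%}")  # 39.7% to 65.6%
```

The interval is wide because n is small. Its lower bound sits above the ~31% IP-only baseline, but the interval comfortably contains the ~52% timezone-only figure, so the corpus cannot yet separate those two methods.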
✨ Key Features
- 🎯 Multi-Signal Attribution (8+ signals)
- 🔓 VPN Bypass (webmail leaks, timezone)
- 🕸️ Graph Analysis (infrastructure reuse)
- 📊 Bayesian Fusion (probabilistic)
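As a rough illustration of the infrastructure-reuse idea, the sketch below counts how many campaigns touch each piece of infrastructure. This is plain degree counting on a campaign-to-infrastructure graph, the simplest centrality measure, and stands in for whatever measure HUNTERTRACE actually uses. The function names, campaign IDs, hostnames, and ASN are all hypothetical.

```python
from collections import defaultdict

def infrastructure_degree(campaigns):
    """Map each infrastructure node to the number of campaigns using it."""
    usage = defaultdict(set)
    for campaign_id, nodes in campaigns.items():
        for node in nodes:
            usage[node].add(campaign_id)
    return {node: len(ids) for node, ids in usage.items()}

def shared_nodes(campaigns, min_campaigns=2):
    """Return nodes reused by at least min_campaigns, most reused first."""
    degree = infrastructure_degree(campaigns)
    reused = [n for n, d in degree.items() if d >= min_campaigns]
    return sorted(reused, key=lambda n: (-degree[n], n))

# Hypothetical observations: infrastructure seen per campaign.
campaigns = {
    "campaign_1": {"relay.example.net", "AS64500", "a.example.com"},
    "campaign_2": {"relay.example.net", "AS64500"},
    "campaign_3": {"relay.example.net", "b.example.org"},
}
shared = shared_nodes(campaigns)
print(shared)
```

Nodes that recur across campaigns, such as the relay host here, are candidate links between emails that may share an operator.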
🚀 Quick Start (from source)
git clone https://github.com/akshaydotweb/HunterTrace.git
cd HunterTrace
pip install -r requirements.txt
# Analyze email
python hunterTrace.py analyze phishing.eml
📖 Documentation
🔬 Evaluation
Dataset: 53 labeled phishing emails
Methodology: Manual OSINT labeling with ground truth
- Top-1 Country Accuracy: 52.8%
- Top-1 Region Accuracy: 56.6%
- 95% Confidence Interval (country-level): 39.7% – 65.6%
- Webmail Leak Rate: 37.7%
- Macro F1: 0.37
See evaluation/ for full results.
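Macro F1 averages the per-class F1 score with equal weight per class, so countries with few examples pull it down. That is one plausible reason a macro F1 of 0.37 can sit below a top-1 accuracy of 52.8%. The sketch below shows the effect on invented labels; `macro_f1` and the toy data are hypothetical, and whether HUNTERTRACE averages over the same label set is an assumption.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen."""
    labels = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        denom = 2 * tp + fp + fn
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0 when the class never occurs.
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)

# Toy data: a classifier that always predicts the majority country "A".
y_true = ["A", "A", "A", "A", "B", "C"]
y_pred = ["A", "A", "A", "A", "A", "A"]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
score = macro_f1(y_true, y_pred)
print(f"accuracy={accuracy:.3f} macro_f1={score:.3f}")
```

Here accuracy is 4/6 while macro F1 is only 0.8/3, because classes "B" and "C" each contribute an F1 of zero.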
🎓 Citation
@software{huntertrace2026,
author = {[Your Name]},
title = {HUNTERTRACE: Multi-Signal Phishing Attribution},
year = {2026},
url = {https://github.com/akshaydotweb/HunterTrace}
}
📄 License
MIT License - See LICENSE
Black Hat Arsenal 2026 Submission
File details
Details for the file huntertrace-1.2.2.tar.gz.
File metadata
- Download URL: huntertrace-1.2.2.tar.gz
- Upload date:
- Size: 597.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5652fd76eae6943fdbbc8ed110a8a343a73c2eee8248382a0189401f899e976c |
| MD5 | 0bd8fdb358ded65237b702882e0468ac |
| BLAKE2b-256 | f47bda4f4a0cfaad92debcae877426386a689897b5c6897b5f7350dbe93455a9 |
File details
Details for the file huntertrace-1.2.2-py3-none-any.whl.
File metadata
- Download URL: huntertrace-1.2.2-py3-none-any.whl
- Upload date:
- Size: 459.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 347d6916f35a81193c2b5ec633e33c742d36ef67ad332834fc6c29a18c01e131 |
| MD5 | 65f495b38a47bac62908ff616a856469 |
| BLAKE2b-256 | b8d00ccbdafefa9ffa5cba56737b0a84369600c53748ab19168c52d0cc8ddd4c |