Advanced phishing actor attribution using Bayesian inference and graph analysis

HUNTERTRACE

Advanced phishing actor attribution using multi-signal Bayesian inference and infrastructure graph analysis

Overview

HUNTERTRACE is an open-source phishing attribution engine that estimates the geographic region of phishing actors with 73% region-level accuracy on the project's evaluation corpus, even when they operate behind VPNs, proxies, or Tor. With infrastructure graph analysis enabled, accuracy reaches 82%.

Traditional email forensics relies on IP geolocation alone (~31% accuracy). HUNTERTRACE fuses 8+ orthogonal signals through Bayesian inference:

| Signal | Source | VPN-Resistant |
|---|---|---|
| Webmail IP leaks | X-Originating-IP, X-Sender-IP headers | Yes |
| Timezone offset | Date header / Received chain | Yes |
| Language fingerprint | Content-Type charset, Subject encoding | Yes |
| Infrastructure reuse | Graph centrality across campaigns | Yes |
| Hop chain forgery | Received header consistency | Partial |
| VPN exit node mapping | ASN + hosting provider classification | N/A |
| SPF/DKIM/DMARC | Authentication results | Partial |
| Webmail provider | Header fingerprinting (Gmail/Yahoo/Outlook) | Yes |

Architecture

┌─────────────────────────────────────────────────────────────┐
│                    HUNTERTRACE PIPELINE                     │
├─────────────────────────────────────────────────────────────┤
│                                                             │
│  Stage 1: Header Extraction (RFC 2822 parsing)              │
│      ↓                                                      │
│  Webmail IP Leak Detection (X-Originating-IP extraction)    │
│      ↓                                                      │
│  Stage 2: IP Classification (VPN/Tor/Proxy/Residential)     │
│      ↓                                                      │
│  Stage 3A: Enrichment (WHOIS, ASN, hosting provider)        │
│      ↓                                                      │
│  VPN Backtrack Analysis (12 bypass techniques)              │
│      ↓                                                      │
│  Real IP Extraction (strips proxy layers)                   │
│      ↓                                                      │
│  Stage 3B: Threat Intelligence                              │
│  Stage 3C: Correlation Analysis                             │
│      ↓                                                      │
│  Stage 4: Geolocation (city-level, IPv4 + IPv6)             │
│      ↓                                                      │
│  Stage 5: Attribution Analysis (evidence packaging)         │
│      ↓                                                      │
│  Bayesian Multi-Signal Fusion (ACI confidence scoring)      │
│      ↓                                                      │
│  Sender Classification (hop forgery + timezone analysis)    │
│      ↓                                                      │
│  Output: JSON report + text summary + attack graph HTML     │
│                                                             │
└─────────────────────────────────────────────────────────────┘

Quick Start

Installation

pip install huntertrace

Python API

from huntertrace import HunterTrace

# Run the full 7-stage pipeline
pipeline = HunterTrace(verbose=True)
result = pipeline.run("phishing.eml")

# Generate text report
report = result.generate_report()
print(report.generate_text_report())

# Access Bayesian attribution
bayes = result.bayesian_attribution
if bayes:
    print(f"Region: {bayes.primary_region}")
    print(f"Confidence: {bayes.aci_adjusted_prob:.1%}")
    print(f"Tier: {bayes.tier} ({bayes.tier_label})")

Command Line

# Single email analysis
huntertrace analyze phishing.eml --verbose

# Batch processing
huntertrace batch emails/ -o results/

# Campaign correlation (cross-email actor linking)
huntertrace campaign emails/ -o campaign_report/

Performance

Evaluated on a corpus of phishing emails with known ground-truth origins:

| Method | Region Accuracy | Notes |
|---|---|---|
| IP Geolocation Only | 31% | Baseline |
| Timezone Only | 52% | VPN-resistant but coarse |
| HUNTERTRACE (Bayesian) | 73% | Multi-signal fusion |
| HUNTERTRACE (+ Graph) | 82% | With infrastructure reuse detection |

Key Techniques

Webmail Provider IP Leak Detection

Webmail services such as Gmail, Yahoo, and Outlook can expose the sender's real IP in headers like X-Originating-IP and X-Sender-IP, depending on the provider and its configuration. HUNTERTRACE detects these leaks with a 67% extraction rate across webmail-originated phishing emails.
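As a rough illustration of this kind of header check, the sketch below pulls candidate IPs from the two headers named above using only the Python standard library. It is not HUNTERTRACE's own extractor: the function name `extract_leaked_ips` and the rule of keeping only globally routable addresses are assumptions made for the example.

```python
import ipaddress
from email import message_from_string
from typing import List

# Header names taken from the text above; real messages may use others.
LEAK_HEADERS = ("X-Originating-IP", "X-Sender-IP")


def extract_leaked_ips(raw_email: str) -> List[str]:
    """Return globally routable IPs found in the leak headers (hypothetical helper)."""
    msg = message_from_string(raw_email)
    found = []
    for name in LEAK_HEADERS:
        for value in msg.get_all(name, []):
            # Values are often wrapped in brackets, e.g. "[a.b.c.d]".
            candidate = value.strip().strip("[]").strip()
            try:
                ip = ipaddress.ip_address(candidate)
            except ValueError:
                continue  # not a bare IP literal, so skip it
            # Assumption: private and reserved addresses carry no location signal.
            if ip.is_global:
                found.append(str(ip))
    return found
```

Filtering on `is_global` drops RFC 1918 and other non-routable addresses, which would otherwise geolocate to nothing useful.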

Timezone-Based VPN Bypass

The timezone offset in the Date: header reflects the clock settings of the sending client, which a VPN does not change. Combined with Received: chain timing analysis, this provides a VPN-resistant, though coarse, geographic signal.
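A minimal sketch of reading that offset with the standard library follows. The helper name `date_header_utc_offset` is hypothetical, and the sketch covers only the Date: header, not the Received: chain timing analysis.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime
from typing import Optional


def date_header_utc_offset(raw_email: str) -> Optional[float]:
    """Return the Date header's UTC offset in hours, or None if unavailable."""
    msg = message_from_string(raw_email)
    value = msg.get("Date")
    if value is None:
        return None
    try:
        parsed = parsedate_to_datetime(value)
    except (TypeError, ValueError):
        # Older Python versions raise TypeError on unparseable dates,
        # newer ones raise ValueError.
        return None
    offset = parsed.utcoffset()
    if offset is None:
        # A "-0000" zone yields a naive datetime: the sender's zone is unknown.
        return None
    return offset.total_seconds() / 3600
```

An offset of zero deserves caution in any downstream scoring, since many mail systems stamp messages in UTC regardless of where the sender is.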

Infrastructure Graph Centrality

When analyzing multiple emails (batch/campaign mode), HUNTERTRACE builds an infrastructure reuse graph and applies centrality metrics to identify shared attacker infrastructure. In the evaluation reported under Performance, this raises region accuracy by 9 percentage points (73% to 82%).
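To make the idea concrete, here is a small sketch that links emails to the infrastructure indicators seen in them and ranks indicators by degree centrality. The graph layout, the `infrastructure_centrality` name, and the choice of degree centrality are assumptions for illustration; the source does not say which centrality metrics HUNTERTRACE uses. The sketch is plain Python so that it stays self-contained.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def infrastructure_centrality(
    observations: Iterable[Tuple[str, str]],
) -> Dict[str, float]:
    """Degree centrality of indicator nodes in an email-indicator graph.

    observations: (email_id, indicator) pairs. Email ids and indicator
    strings are assumed not to collide (hypothetical input format).
    """
    emails_by_indicator: Dict[str, Set[str]] = defaultdict(set)
    emails: Set[str] = set()
    for email_id, indicator in observations:
        emails_by_indicator[indicator].add(email_id)
        emails.add(email_id)

    # Standard degree centrality: node degree / (total nodes - 1).
    total_nodes = len(emails) + len(emails_by_indicator)
    if total_nodes < 2:
        return {ind: 0.0 for ind in emails_by_indicator}
    return {
        ind: len(linked) / (total_nodes - 1)
        for ind, linked in emails_by_indicator.items()
    }
```

An indicator shared by many emails scores highest, which is the reuse pattern described above. The normalization matches `networkx.degree_centrality`, so the same ranking could be obtained from a networkx graph.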

Bayesian Multi-Signal Fusion

All signals are combined using likelihood ratios and Bayesian updating. The Adversarial Confidence Index (ACI) adjusts for evasion attempts, producing calibrated confidence tiers (0–4).
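The updating step can be sketched as a naive Bayes fusion over candidate regions. This is a generic illustration, not the project's engine: the `fuse_signals` name is hypothetical, the signals are assumed to be conditionally independent, and the ACI adjustment and tier mapping are not modeled because the source does not specify them.

```python
from typing import Dict, Iterable


def fuse_signals(
    prior: Dict[str, float],
    likelihoods: Iterable[Dict[str, float]],
) -> Dict[str, float]:
    """Posterior over regions after folding in each signal's likelihood.

    prior: P(region). likelihoods: one dict per signal giving
    P(signal | region) for every region in the prior.
    """
    posterior = dict(prior)
    for signal in likelihoods:
        # Bayes' rule: multiply by the likelihood, then renormalize.
        posterior = {r: p * signal[r] for r, p in posterior.items()}
        total = sum(posterior.values())
        if total == 0:
            raise ValueError("signals rule out every candidate region")
        posterior = {r: p / total for r, p in posterior.items()}
    return posterior
```

With two regions this is the same as multiplying prior odds by each signal's likelihood ratio. Starting from even odds, two signals with ratios 4 and 1.5 give posterior odds of 6 to 1, a probability of 6/7 (about 85.7%) for the favored region.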

VPN Backtrack Analysis

HUNTERTRACE applies 12 techniques to identify the real origin behind VPN/proxy layers, including ASN classification, exit node fingerprinting, and webmail header correlation.

Project Structure

huntertrace/
├── core/           # Main pipeline + orchestrator
├── extraction/     # IP extraction (basic, advanced, VPN backtrack, webmail)
├── enrichment/     # Geolocation, WHOIS, hosting provider, IP classification
├── attribution/    # Bayesian engine + evidence analysis
├── analysis/       # Campaign correlator, sender classifier, actor profiler
├── graph/          # Attack graph builder, centrality engine
├── forensics/      # Header forensic scanner
├── cli.py          # Command-line interface
└── assets/         # HTML templates, logos

Requirements

  • Python 3.8+
  • networkx >= 2.6
  • numpy >= 1.20
  • requests >= 2.25

Optional Dependencies

# Graph community detection (for campaign analysis)
pip install huntertrace[graph]

# WHOIS enrichment
pip install huntertrace[whois]

# Everything
pip install huntertrace[all]

Citation

If you use HUNTERTRACE in your research:

@software{huntertrace2026,
  author = {Akshay V},
  title = {HUNTERTRACE: Advanced Phishing Actor Attribution Using Multi-Signal Bayesian Inference},
  year = {2026},
  url = {https://github.com/akshaydotweb/huntertrace}
}

License

MIT License — see LICENSE

Disclaimer

This tool is intended for legitimate security research, incident response, and law enforcement use only. Always obtain proper authorization before analyzing emails. The authors are not responsible for misuse.
