Heuristic phishing URL analyzer for SOC/DFIR workflows

barb

Catch phishing URLs before they catch you.

Heuristic phishing URL analyzer for SOC/DFIR workflows. Offline core — no API keys, never fetches the analyzed URL. Optional --osint flag adds DNS/RDAP enrichment.


Features

  • 12 heuristic analyzers: entropy, homoglyph, TLD, subdomain, brand impersonation, URL shortener, encoding abuse, IP-based URLs, typosquat, keyword, lexical, file extension
  • 5-tier verdict: SAFE / LOW_RISK / SUSPICIOUS / HIGH_RISK / PHISHING with severity-floor escalation
  • Zero API keys required for core analysis — offline, no external calls
  • Opt-in --osint enrichment: DNS resolution + RDAP registration lookups (stdlib only, no API key); never fetches the analyzed URL
  • Allowlist false-positive suppression: ~71 known-good domains suppress noisy domain-based signals; path/query signals still fire
  • OSINT result cache: SQLite cache at ~/.barb/cache.db (default TTL 6 h); bypass with --no-cache
  • Output formats: Rich tables, console, JSON, NDJSON, CSV, STIX 2.1
  • --explain flag: template-based explanation by default, optional LLM (Anthropic Claude, OpenAI)
  • --version flag: report the installed version (barb --version or barb version)
  • Offline eval harness (eval/): measures precision/recall/F1 against a labeled URL corpus; wired into CI as a detection-quality regression gate
  • Batch processing: analyze URL lists from files, stdin, or multiple arguments
  • Automation-ready: exit codes (0=SAFE/LOW_RISK, 1=SUSPICIOUS/HIGH_RISK, 2=PHISHING, 3=error), --threshold filtering
  • IOC defanging: automatic in terminal output (hxxps[://]evil[.]com)
  • Configurable scoring: per-analyzer weights and verdict thresholds via YAML
  • Minimal dependencies: 5 core packages (typer, rich, pydantic, pyyaml, python-dotenv)

Quick Start

Installation

From PyPI:

pip install barb-phish

With LLM support (optional):

pip install barb-phish[llm]

From source:

git clone https://github.com/duathron/barb.git
cd barb
pip install -e ".[dev]"

Usage

Analyze a single URL:

barb analyze https://suspicious-site.tk/paypal-login

Batch analysis from file:

barb analyze -f urls.txt -o json

With explanation:

barb analyze https://pаypal.com --explain

With OSINT enrichment (DNS + RDAP, opt-in):

barb analyze https://suspicious-site.tk/paypal-login --osint

Force fresh OSINT lookups, bypass cache:

barb analyze https://suspicious-site.tk/paypal-login --osint --no-cache

Pipe from stdin:

cat urls.txt | barb analyze -o csv

Output Examples

Rich Output (default)

╭──────────────────────── barb ────────────────────────╮
│ URL       hxxp[://]192[.]168[.]1[.]1/paypal-login    │
│ Verdict   ⚠ SUSPICIOUS                               │
│ Score     4.0                                         │
╰──────────────────────────────────────────────────────╯
 Severity   Analyzer     Finding
 HIGH       ip_url       URL uses IP address instead of domain
 LOW        subdomain    Domain has 4 levels
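
The URL in the panel above is defanged in the same style as the feature list's example (hxxps[://]evil[.]com). A minimal sketch of that transformation, assuming only the scheme and host are rewritten; `defang` is a hypothetical helper name, not part of barb's API, and barb's own rules may differ:

```python
from urllib.parse import urlsplit


def defang(url: str) -> str:
    """Rewrite a URL so it is no longer clickable (hypothetical helper)."""
    parts = urlsplit(url)
    scheme = parts.scheme.replace("http", "hxxp")  # http -> hxxp, https -> hxxps
    host = parts.netloc.replace(".", "[.]")        # break every dot in the host
    rest = parts.path
    if parts.query:
        rest += "?" + parts.query
    if parts.fragment:
        rest += "#" + parts.fragment
    return f"{scheme}[://]{host}{rest}"


print(defang("http://192.168.1.1/paypal-login"))
```

Leaving the path untouched is an assumption of this sketch; the examples in this README contain no dots in the path, so they do not settle that detail.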

JSON Output

barb analyze http://evil.tk/login -o json
{
  "url": "http://evil.tk/login",
  "verdict": "SUSPICIOUS",
  "risk_score": 4.0,
  "signals": [
    {"analyzer": "tld", "severity": "MEDIUM", "detail": "Suspicious TLD: .tk"}
  ]
}
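
For scripting, a report like the one above can be consumed with the standard library. The sketch below assumes the output has exactly the fields shown in the sample (url, verdict, risk_score, signals). `signals_at_or_above` and the severity ordering are illustrative assumptions, not barb's API:

```python
import json

# Assumed ordering of the severity labels that appear in this README.
SEVERITY_ORDER = ["INFO", "LOW", "MEDIUM", "HIGH", "CRITICAL"]


def signals_at_or_above(report: dict, minimum: str) -> list:
    """Return the signals whose severity is at least `minimum`."""
    floor = SEVERITY_ORDER.index(minimum)
    return [
        signal
        for signal in report.get("signals", [])
        if SEVERITY_ORDER.index(signal["severity"]) >= floor
    ]


# The sample report from this section, embedded so the sketch runs offline.
sample = """
{
  "url": "http://evil.tk/login",
  "verdict": "SUSPICIOUS",
  "risk_score": 4.0,
  "signals": [
    {"analyzer": "tld", "severity": "MEDIUM", "detail": "Suspicious TLD: .tk"}
  ]
}
"""
report = json.loads(sample)
for signal in signals_at_or_above(report, "MEDIUM"):
    print(signal["analyzer"], signal["severity"], signal["detail"])
```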

NDJSON Output

One compact JSON object per line — suitable for streaming pipelines and log aggregators.

barb analyze http://evil.tk/login -o ndjson
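
Because each line is an independent JSON object, a consumer can process results one at a time without loading the whole batch. A sketch, assuming each NDJSON line carries the same fields as the JSON output (not confirmed by this README); the function names and the second sample line are made up for illustration:

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def iter_reports(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one parsed report per non-empty line."""
    for line in lines:
        line = line.strip()
        if line:
            yield json.loads(line)


def verdict_counts(lines: Iterable[str]) -> Counter:
    """Count how many reports landed on each verdict."""
    return Counter(report["verdict"] for report in iter_reports(lines))


# Illustrative input; in a pipeline, `lines` could be sys.stdin instead.
sample_lines = [
    '{"url": "http://evil.tk/login", "verdict": "SUSPICIOUS", "risk_score": 4.0, "signals": []}',
    "",
    '{"url": "https://example.com/", "verdict": "SAFE", "risk_score": 0.0, "signals": []}',
]
print(verdict_counts(sample_lines))
```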

STIX 2.1 Output

Emits a STIX bundle with indicator objects for SUSPICIOUS / HIGH_RISK / PHISHING verdicts (deterministic IDs, confidence mapped from verdict).

barb analyze http://evil.tk/login -o stix
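
To illustrate what "deterministic IDs" and "confidence mapped from verdict" can mean in practice, here is a sketch that derives the indicator ID from the URL with a UUIDv5. The namespace, the confidence values, and the function names are all assumptions made for this example; barb's actual choices are not documented here.

```python
import uuid

# Hypothetical namespace and verdict-to-confidence mapping for this sketch only.
NAMESPACE = uuid.uuid5(uuid.NAMESPACE_URL, "https://example.invalid/barb-sketch")
CONFIDENCE = {"SUSPICIOUS": 50, "HIGH_RISK": 75, "PHISHING": 90}


def indicator_for(url: str, verdict: str, timestamp: str):
    """Build a STIX 2.1 indicator dict, or None for verdicts that emit nothing."""
    if verdict not in CONFIDENCE:
        return None
    # Escape backslashes and single quotes for the STIX pattern string literal.
    escaped = url.replace("\\", "\\\\").replace("'", "\\'")
    return {
        "type": "indicator",
        "spec_version": "2.1",
        "id": f"indicator--{uuid.uuid5(NAMESPACE, url)}",  # same URL, same ID
        "created": timestamp,
        "modified": timestamp,
        "valid_from": timestamp,
        "pattern_type": "stix",
        "pattern": f"[url:value = '{escaped}']",
        "confidence": CONFIDENCE[verdict],
    }


def bundle_of(indicators: list) -> dict:
    """Wrap indicators in a bundle whose ID is derived from its members."""
    joined = "|".join(sorted(obj["id"] for obj in indicators))
    return {
        "type": "bundle",
        "id": f"bundle--{uuid.uuid5(NAMESPACE, joined)}",
        "objects": indicators,
    }


first = indicator_for("http://evil.tk/login", "SUSPICIOUS", "2024-01-01T00:00:00.000Z")
print(bundle_of([first])["objects"][0]["id"])
```

Deterministic IDs mean that re-analyzing the same URL yields the same indicator ID, so downstream platforms can deduplicate instead of accumulating copies.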

Analyzers

Heuristic analyzers (offline)

| Analyzer | What it detects | Example |
|---|---|---|
| Entropy | High Shannon entropy in domain/path | x7k2m9p.evil.com |
| Homoglyph | Unicode confusables + mixed-script labels (Latin+Cyrillic); pure non-ASCII IDN emits a LOW informational signal | pаypal.com (Cyrillic 'а') |
| TLD | Suspicious top-level domains | paypal-login.tk |
| Subdomain | Excessive depth / squatting patterns | secure.paypal.com.evil.com |
| Brand | Brand name in non-brand domain | paypal-secure.evil.com |
| Shortener | Known URL shortener services | bit.ly/abc123 |
| Encoding | Percent-encoding / punycode abuse | %70%61%79pal.com |
| IP URL | IP address instead of domain; @-obfuscation on a domain host → CRITICAL | http://192.168.1.1/login, paypal.com@evil.com |
| Typosquat | ASCII brand lookalikes via Levenshtein 1–2 + digit↔letter swaps; skips official brand domains | paypa1.com, g00gle.com |
| Keyword | Phishing keywords in path/query (login, verify, secure, webscr, bank, …); one aggregated LOW signal | /login/verify-account |
| Lexical | URL length, hyphen count, digit ratio; LOW signals | my-secure-bank-update-2024.com |
| File Ext | Suspicious file extensions in the URL path; double-extension masquerade → HIGH, single executable/script → LOW, archive → INFO | invoice.pdf.exe, setup.ps1 |
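
Three of the techniques named in the table (Shannon entropy, mixed-script detection, and Levenshtein-based typosquat matching) are compact enough to sketch. This is not barb's implementation: the function names, the digit swap table, and the use of an exact label match to stand in for "official brand domains" are assumptions for illustration.

```python
import math
import unicodedata
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Bits per character; random-looking labels score higher."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def scripts_in(label: str) -> set:
    """Approximate each letter's script by the first word of its Unicode name."""
    return {unicodedata.name(ch, "UNKNOWN").split()[0] for ch in label if ch.isalpha()}


def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, one row at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


# Hypothetical digit-to-letter swaps; the real table is not given in this README.
DIGIT_SWAPS = str.maketrans({"0": "o", "1": "l", "3": "e", "5": "s"})


def is_typosquat(label: str, brands: set) -> bool:
    """True if `label` is 1-2 edits (or a digit swap) away from a brand name."""
    label = label.lower()
    if label in brands:  # the brand itself is not a lookalike
        return False
    if label.translate(DIGIT_SWAPS) in brands:
        return True
    return any(1 <= levenshtein(label, brand) <= 2 for brand in brands)


print(scripts_in("p\u0430ypal"))  # the second letter is Cyrillic U+0430
print(is_typosquat("paypa1", {"paypal", "google"}))
```

A label whose letters span more than one script, such as Latin plus Cyrillic, is the mixed-script case the Homoglyph row describes.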

OSINT enrichers (--osint)

Opt-in, off by default, fail-open. Queries infrastructure metadata about the domain — never fetches the analyzed URL.

| Enricher | What it checks | Signals |
|---|---|---|
| DNS | Resolves the host via socket.getaddrinfo (stdlib, timeout 2 s) | HIGH on loopback/sinkhole IP; MEDIUM on private IP or NXDOMAIN |
| RDAP | IANA RDAP bootstrap, urllib (stdlib, no API key, timeout 5 s) | HIGH if domain <30 days old; MEDIUM if <90 days; LOW if registrant privacy/redacted |
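
The severity rules in the table can be sketched with the standard library. Assumptions in this example: the function names are hypothetical, the unspecified address 0.0.0.0 stands in for "sinkhole" because the real sinkhole list is not given here, and the 2 s DNS timeout is omitted because socket.getaddrinfo takes no timeout argument.

```python
import ipaddress
import socket
from datetime import datetime, timezone


def resolve(host: str) -> list:
    """Resolve a host to its addresses; an empty list means resolution failed."""
    try:
        infos = socket.getaddrinfo(host, None)
    except socket.gaierror:
        return []
    return sorted({info[4][0] for info in infos})


def dns_severity(ip: str):
    """Map a resolved address to a severity, or None if it looks ordinary."""
    addr = ipaddress.ip_address(ip)
    # Check loopback first: ipaddress also reports 127.0.0.0/8 as private.
    if addr.is_loopback or addr.is_unspecified:
        return "HIGH"
    if addr.is_private:
        return "MEDIUM"
    return None


def age_severity(registered: datetime, now: datetime):
    """Map domain age to a severity using the 30 and 90 day cut-offs."""
    age_days = (now - registered).days
    if age_days < 30:
        return "HIGH"
    if age_days < 90:
        return "MEDIUM"
    return None


now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(dns_severity("192.168.1.1"))
print(age_severity(datetime(2024, 5, 20, tzinfo=timezone.utc), now))
```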

Results are cached per host in ~/.barb/cache.db (SQLite, TTL 6 h). Use --no-cache to force fresh lookups.
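
A per-host cache with a time-to-live fits in a few lines of sqlite3. The sketch below shows the general technique only; the table layout and the `TTLCache` class are assumptions, not barb's actual schema. It uses an in-memory database so it can run without touching ~/.barb.

```python
import json
import sqlite3
import time


class TTLCache:
    """Hypothetical per-host cache: entries expire after `ttl_seconds`."""

    def __init__(self, path: str = ":memory:", ttl_seconds: float = 6 * 3600):
        self.ttl = ttl_seconds
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS cache ("
            "host TEXT PRIMARY KEY, payload TEXT NOT NULL, stored_at REAL NOT NULL)"
        )

    def put(self, host: str, result: dict, now: float = None) -> None:
        now = time.time() if now is None else now
        self.db.execute(
            "INSERT OR REPLACE INTO cache VALUES (?, ?, ?)",
            (host, json.dumps(result), now),
        )
        self.db.commit()

    def get(self, host: str, now: float = None):
        """Return the cached result, or None if missing or older than the TTL."""
        now = time.time() if now is None else now
        row = self.db.execute(
            "SELECT payload, stored_at FROM cache WHERE host = ?", (host,)
        ).fetchone()
        if row is None or now - row[1] >= self.ttl:
            return None
        return json.loads(row[0])


cache = TTLCache()
cache.put("evil.tk", {"dns": "MEDIUM"}, now=0)
print(cache.get("evil.tk", now=3600))      # still fresh after one hour
print(cache.get("evil.tk", now=6 * 3600))  # expired at the 6 h TTL
```

In this model, bypassing the cache (as --no-cache does) amounts to skipping the `get` call and going straight to the network.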


Configuration

Create ~/.barb/config.yaml:

scoring:
  weights:
    entropy: 1.0
    homoglyph: 1.5
    brand: 1.2
    typosquat: 1.3
    keyword: 0.6
    lexical: 0.5
  thresholds:
    suspicious: 4
    phishing: 13

explain:
  provider: "template"     # template | anthropic | openai
  send_url: true           # send defanged URL to LLM

output:
  default_format: "rich"
  quiet: false

osint:
  dns_timeout: 2           # seconds per DNS lookup
  rdap_timeout: 5          # seconds per RDAP request
  cache_ttl_hours: 6       # SQLite cache TTL (~/.barb/cache.db)

Environment variable: set BARB_LLM_KEY to your LLM API key.
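
To show how per-analyzer weights and verdict thresholds from the scoring section could interact, here is a deliberately simplified sketch. The scoring formula is not documented in this README, so everything here is an assumption: the severity point values, the weighted sum, and the function names. Only the two thresholds shown in the config are modeled; LOW_RISK, HIGH_RISK, and severity-floor escalation are left out.

```python
# Hypothetical points per severity; barb's real values are not documented here.
SEVERITY_POINTS = {"INFO": 0.0, "LOW": 1.0, "MEDIUM": 2.0, "HIGH": 4.0, "CRITICAL": 8.0}


def weighted_score(signals: list, weights: dict) -> float:
    """Sum severity points, scaling each signal by its analyzer's weight."""
    return sum(
        weights.get(analyzer, 1.0) * SEVERITY_POINTS[severity]
        for analyzer, severity in signals
    )


def coarse_verdict(score: float, thresholds: dict) -> str:
    """Map a score onto the two thresholds that appear in the config."""
    if score >= thresholds["phishing"]:
        return "PHISHING"
    if score >= thresholds["suspicious"]:
        return "SUSPICIOUS"
    return "BELOW_SUSPICIOUS"  # placeholder; SAFE vs LOW_RISK is not modeled


weights = {"homoglyph": 1.5, "keyword": 0.6}
thresholds = {"suspicious": 4, "phishing": 13}
score = weighted_score([("homoglyph", "HIGH"), ("keyword", "LOW")], weights)
print(round(score, 2), coarse_verdict(score, thresholds))
```

The inclusive comparison (`>=`) matches the output examples in this README, where a score of 4.0 is reported as SUSPICIOUS.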


Comparison

| Feature | barb | VirusTotal URL Scan | URLScan.io | PhishTank |
|---|---|---|---|---|
| Offline analysis | Core offline; opt-in --osint for DNS/RDAP | No | No | No |
| API key required | No | Yes | Yes | Optional |
| Heuristic detection | 12 analyzers | Signature-based | Browser-based | Community |
| CLI tool | Yes | Web/API | Web/API | Web/API |
| LLM explanation | Optional | No | No | No |
| Self-hosted | Yes | No | No | No |

Use barb for offline heuristic URL triage. Use vex for VirusTotal IOC enrichment. Pipe barb JSON output into vex for full enrichment (v1.1).


Exit Codes

| Code | Meaning |
|---|---|
| 0 | SAFE or LOW_RISK |
| 1 | SUSPICIOUS or HIGH_RISK |
| 2 | PHISHING |
| 3 | Error (invalid input, missing file) |
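
In automation, these exit codes can drive a decision without parsing any output. A sketch of a Python wrapper, assuming barb is installed and on PATH; `describe_exit` and `run_barb` are hypothetical helper names, and only flags shown elsewhere in this README are used:

```python
import subprocess

# Meanings copied from the exit code table above.
EXIT_MEANINGS = {
    0: "SAFE or LOW_RISK",
    1: "SUSPICIOUS or HIGH_RISK",
    2: "PHISHING",
    3: "error",
}


def describe_exit(code: int) -> str:
    """Translate a barb exit code into its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_barb(url: str) -> str:
    """Run `barb analyze <url> -o json` and describe how it exited."""
    completed = subprocess.run(
        ["barb", "analyze", url, "-o", "json"],
        capture_output=True,
        text=True,
        check=False,  # non-zero exits are verdicts here, not failures
    )
    return describe_exit(completed.returncode)


print(describe_exit(2))
```

Passing `check=False` matters: exit codes 1 and 2 are verdicts rather than failures, so they should not raise CalledProcessError.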

Development

git clone https://github.com/duathron/barb.git
cd barb
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest tests/ -v

Security

  • No HTTP requests are ever made to analyzed URLs — this holds unconditionally, including when --osint is enabled
  • The offline core is pure string-based heuristics with no external calls
  • The optional --osint flag performs DNS resolution and RDAP lookups about the domain (infrastructure metadata only); it never fetches the URL itself
  • URL length capped at 2048 characters
  • Config directory secured with 0o700 permissions
  • LLM and OSINT dependencies are optional extras — core install has zero network deps
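
Two of the safeguards above, the URL length cap and the 0o700 config directory, can be sketched as follows. Whether barb rejects or truncates over-long URLs is not stated; this example assumes rejection. The helper names are hypothetical, and the permission bits apply to POSIX systems.

```python
import os
import stat
import tempfile
from pathlib import Path

MAX_URL_LENGTH = 2048  # the cap stated in the list above


def check_length(url: str) -> str:
    """Reject URLs longer than the cap (assumed behavior)."""
    if len(url) > MAX_URL_LENGTH:
        raise ValueError(f"URL exceeds {MAX_URL_LENGTH} characters")
    return url


def ensure_private_dir(path: Path) -> Path:
    """Create a directory that only its owner can read, write, or enter."""
    path.mkdir(parents=True, exist_ok=True)
    # mkdir's mode argument is filtered by the umask, so set the bits explicitly.
    os.chmod(path, 0o700)
    return path


with tempfile.TemporaryDirectory() as tmp:
    config_dir = ensure_private_dir(Path(tmp) / ".barb")
    print(oct(stat.S_IMODE(config_dir.stat().st_mode)))
```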

Privacy footprint of --osint

The offline core makes zero outbound connections. When you opt into --osint, barb makes three kinds of request — never to the analyzed host itself:

| Connection | Endpoint | What it reveals | Notes |
|---|---|---|---|
| DNS resolution | Your system resolver (/etc/resolv.conf: ISP/router/corporate DNS, port 53) | The domain being looked up | Same lookup any browser would do |
| RDAP bootstrap | https://data.iana.org/rdap/dns.json | That you use barb/RDAP | Fetched at most once per 7 days (cached at ~/.barb/rdap_bootstrap.json) |
| RDAP query | The TLD's registry RDAP server (e.g. rdap.verisign.com for .com, rdap.pir.org for .org) | The domain being investigated | No API key; stdlib urllib only |
  • The suspect host is never contacted — no HTTP GET/HEAD to the URL, no DNS beacon to attacker-controlled infrastructure beyond normal name resolution.
  • No credentials are ever transmitted.
  • OSINT results are cached per host in ~/.barb/cache.db (default TTL 6 h), so repeat lookups make no network calls; --no-cache forces fresh requests.
  • All OSINT calls are fail-open: a timeout or error simply drops the enrichment signals and analysis continues offline.
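
The bootstrap and query steps can be pictured as a lookup in the IANA bootstrap document followed by a URL join. The sketch below works on a small inline sample in the bootstrap file's `services` layout, so it makes no network calls. The sample entries and the function name are illustrative assumptions, and real server URLs should be taken from the live file.

```python
# Illustrative excerpt in the layout of the IANA bootstrap file:
# each service is [list of TLDs, list of RDAP base URLs].
SAMPLE_BOOTSTRAP = {
    "services": [
        [["com"], ["https://rdap.verisign.com/com/v1/"]],
        [["example-tld"], ["https://rdap.example.invalid"]],  # made-up entry
    ]
}


def rdap_domain_url(bootstrap: dict, domain: str):
    """Return the RDAP query URL for `domain`, or None if its TLD is unlisted."""
    name = domain.rstrip(".").lower()
    tld = name.rsplit(".", 1)[-1]
    for tlds, base_urls in bootstrap.get("services", []):
        if tld in tlds:
            base = base_urls[0]
            if not base.endswith("/"):
                base += "/"
            return f"{base}domain/{name}"
    return None


print(rdap_domain_url(SAMPLE_BOOTSTRAP, "example.com"))
```

Only the domain name appears in the query URL; nothing about the path or query string of the analyzed URL is sent to the registry.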

License

MIT License. See LICENSE.md.


Author: Christian Huhn

Download files

Source Distribution

barb_phish-1.3.0.tar.gz (55.9 kB)

Uploaded Source

Built Distribution

barb_phish-1.3.0-py3-none-any.whl (48.2 kB)

Uploaded Python 3

File details

Details for the file barb_phish-1.3.0.tar.gz.

File metadata

  • Download URL: barb_phish-1.3.0.tar.gz
  • Size: 55.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for barb_phish-1.3.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 86acd0185fe8ccfe5a977357c5804130ef28d116b2d966bbb00f0b478396fd50 |
| MD5 | 80eda08efe74132f519810a5e05917fa |
| BLAKE2b-256 | 857554769f439fbe82ad8d6a8ab1fd08da53aa214cd0bdcaa32f9c719ba29c83 |

Provenance

The following attestation bundles were made for barb_phish-1.3.0.tar.gz:

Publisher: publish.yml on duathron/barb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file barb_phish-1.3.0-py3-none-any.whl.

File metadata

  • Download URL: barb_phish-1.3.0-py3-none-any.whl
  • Upload date:
  • Size: 48.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for barb_phish-1.3.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | b92c49a166bc86cad2265b13a6edb9a54a68eb0b66116f295c3d3557bb3b8c90 |
| MD5 | 905433daef0b9a875b3bc819c2260072 |
| BLAKE2b-256 | a5efa9d7c4a52cadf2abae297dcac37711ba698468d657ba316c8460b6a4f4ad |

Provenance

The following attestation bundles were made for barb_phish-1.3.0-py3-none-any.whl:

Publisher: publish.yml on duathron/barb

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
