Subdomain reconnaissance with LLM-powered interestingness scoring for bug bounty hunters and pentesters.

SubSift

Subdomain reconnaissance that actually ranks what matters.

subfinder gives you 5 000 subdomains. SubSift gives you the 20 that probably have a vulnerability — and tells you why.

License: AGPL v3 · Python 3.11+ · Code style: ruff · Type checked: mypy strict · Tests: 239 passing


Why

The standard recon pipeline (subfinder → httpx → eyeball every line) doesn't scale. A modern enterprise has 5–10 k subdomains and you can't manually triage that. SubSift bolts an interestingness model onto the pipeline: every subdomain gets a 0–100 score with one-sentence reasoning, and the UI/CLI ranks them so the suspicious ones surface first.

The scoring rubric — admin / dev / staging / vpn names, auth-boundary status codes (401/403), outdated tech, exposed cloud storage — is in src/subsift/llm/prompts.py. Tune it for your engagement.
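
To make the "0–100 score with one-sentence reasoning" idea concrete, here is a minimal sketch of how scored hosts could be validated and ranked. The `ScoredHost` class and both helper functions are hypothetical names for illustration, not SubSift's own models; the default threshold of 70 is borrowed from the "High-score (≥70)" figure in this README.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredHost:
    """Hypothetical stand-in for one scored subdomain (not SubSift's model)."""
    fqdn: str
    score: int       # 0-100 interestingness
    reasoning: str   # one-sentence justification from the scorer

    def __post_init__(self) -> None:
        # Reject anything outside the documented 0-100 range.
        if not 0 <= self.score <= 100:
            raise ValueError(f"score out of range for {self.fqdn}: {self.score}")


def rank(hosts: list[ScoredHost]) -> list[ScoredHost]:
    """Highest score first; ties broken alphabetically for stable output."""
    return sorted(hosts, key=lambda h: (-h.score, h.fqdn))


def high_score(hosts: list[ScoredHost], threshold: int = 70) -> list[ScoredHost]:
    """Keep only hosts at or above the triage threshold, already ranked."""
    return [h for h in rank(hosts) if h.score >= threshold]


hosts = [
    ScoredHost("www.example.com", 12, "marketing site behind CDN"),
    ScoredHost("admin.staging.example.com", 92, "admin keyword + 401 auth boundary"),
    ScoredHost("jenkins.example.com", 88, "exposed Jenkins UI, default branding"),
]
for h in high_score(hosts):
    print(f"{h.score:>3}  {h.fqdn:<28} {h.reasoning}")
```

Only the two hosts scoring 70 or more are printed, highest first.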

Real-world result — tesla.com, 2 m 11 s

A single scan against tesla.com from the private Fly.io deploy (subfinder + crt.sh, httpx probe, OpenAI gpt-5-mini scorer):

Subdomains discovered  726
Probed (live HTTP)     258
Scored                 347
High-score (≥70)       220

The model surfaced VPN endpoints, authentication services, password-reset (SSPR) backends, financial gateways, MFA origins, and production vehicle-file storage. The first six rows of the ranked output:

Score  FQDN                                       Status  Reasoning
98     origin-finplat-prd.tesla.com                       Production financial platform origin - extremely sensitive backend
95     apacvpn.tesla.com                                  Regional VPN endpoint, almost certainly auth / remote access
95     auth-global-stage.tesla.com                403     Global staging auth service behind an Access-Denied boundary
95     auth.prd.usw.vn.cloud.tesla.com            200     Production auth service exposing the login flow (Envoy / hCaptcha)
95     sspr.tesla.com                             403     Self-service password reset behind 403 - top-tier account-takeover risk
95     vehicle-files.prd.usw2.vn.cloud.tesla.com  403     Production vehicle-file storage behind auth

The rest of the long tail (marketing, CDN edges, redirects) sits comfortably below 50 — exactly where you want it during triage.

The pipeline at a glance

                    ┌──────────────────────────────────────┐
   subsift scan ──▶ │ ScanOrchestrator                     │
   POST /scans  ──▶ │   1. enumerate (subfinder + crt.sh)  │
   POST /ui/scans   │   2. dedupe + scope-filter (RFC1035) │
                    │   3. upsert subdomains (+ junction)  │
                    │   4. probe (httpx PD: code/tech/ip)  │
                    │   5. score 0-100 (Ollama / Claude)   │
                    │   6. persist Probe + ScoreResult     │
                    └──────────────────────────────────────┘
                          │
   ┌──────────────────────┼──────────────────────────────────┐
   ▼                      ▼                                  ▼
 CLI tables            HTML UI at /ui                  JSON API at /scans
 (Rich, ranked)        (HTMX, ranked, filterable)      (REST, paginated)
                                                          │
                                                          └─▶ exports:
                                                              .json .csv .txt .md

Tool       Output                                              Ranking
subfinder  raw subdomain list                                  none
amass      subdomains + DNS data                               none
httpx      live hosts + tech                                   by status code
SubSift    subdomains + probes + LLM scores + diffs over time  by interestingness, with reasoning + history
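
Step 2 of the pipeline above (dedupe + scope-filter) can be sketched roughly as follows. This is an illustration under stated assumptions, not SubSift's implementation: the function names are hypothetical, wildcard prefixes are simply stripped, and the label check follows RFC 1035 as relaxed by RFC 1123 (labels may start with a digit).

```python
import re

# One DNS label: letters, digits, hyphens; 1-63 chars; no leading/trailing hyphen.
# RFC 1035 proper requires a leading letter; RFC 1123 relaxed that to allow digits.
_LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def is_valid_hostname(name: str) -> bool:
    """True if every dot-separated label is well-formed and the name fits in 253 chars."""
    if not name or len(name) > 253:
        return False
    return all(_LABEL.match(label) for label in name.split("."))


def normalise(raw: str) -> str:
    """Lowercase, trim whitespace, drop a trailing dot and a leading wildcard."""
    name = raw.strip().lower().rstrip(".")
    if name.startswith("*."):
        name = name[2:]
    return name


def dedupe_in_scope(candidates, root: str) -> list[str]:
    """Return the sorted, unique, valid names that are the root or fall under it."""
    root = normalise(root)
    seen: set[str] = set()
    for raw in candidates:
        name = normalise(raw)
        # "evil-example.com" must not pass as a subdomain of "example.com".
        in_scope = name == root or name.endswith("." + root)
        if in_scope and is_valid_hostname(name):
            seen.add(name)
    return sorted(seen)


print(dedupe_in_scope(
    ["API.example.com.", "api.example.com", "*.dev.example.com",
     "evil-example.com", "bad_label.example.com", "example.com"],
    "example.com",
))
# → ['api.example.com', 'dev.example.com', 'example.com']
```

Matching on `"." + root` rather than a bare suffix is the important detail: it keeps look-alike domains out of scope.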

Enumeration sources

Seven passive sources run concurrently behind an asyncio Semaphore. Each is a Protocol impl — adding more is a one-file change (no schema migration: the scan records its sources in a single sources_used column).

Source        Kind                           Default in registry  Notes
subfinder     ProjectDiscovery binary        yes                  broad passive recon, fast
crtsh         Certificate Transparency logs  yes                  finds names from TLS certs only
wayback       Internet Archive CDX API       yes                  historical URLs → hostnames
otx           AlienVault OTX passive DNS     yes                  optional API key boosts rate limits
amass         OWASP binary, -passive mode    yes                  slower but very thorough
anubis        jldc.me Anubis DB              yes                  free JSON API, no key
hackertarget  HackerTarget hostsearch        yes                  free, rate-limited (fails soft)

Use subsift scan example.com -s crtsh -s wayback to run a subset. Sources whose binaries aren't installed (amass, subfinder) fail soft and the scan continues with the rest.
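
The concurrency and fail-soft behaviour described in this section can be sketched as below. Everything here is an assumption for illustration: the `Enumerator` Protocol shape, the toy source classes, and the `run_all` helper are hypothetical and do not reproduce SubSift's actual interfaces.

```python
import asyncio
from typing import Protocol


class Enumerator(Protocol):
    """Hypothetical shape of a source: a name plus one async method."""
    name: str

    async def enumerate(self, domain: str) -> set[str]: ...


class StaticSource:
    """Toy source returning canned names; stands in for crtsh, wayback, etc."""

    def __init__(self, name: str, hosts: set[str]) -> None:
        self.name, self._hosts = name, hosts

    async def enumerate(self, domain: str) -> set[str]:
        await asyncio.sleep(0)  # pretend to do network I/O
        return {h for h in self._hosts if h.endswith(domain)}


class MissingBinary:
    """Toy source that fails the way an uninstalled binary might."""
    name = "amass"

    async def enumerate(self, domain: str) -> set[str]:
        raise FileNotFoundError("amass not found on PATH")


async def run_all(sources: list[Enumerator], domain: str, limit: int = 3):
    sem = asyncio.Semaphore(limit)  # cap how many sources run at once

    async def guarded(src: Enumerator):
        async with sem:
            try:
                return src.name, "ok", await src.enumerate(domain)
            except Exception as exc:  # fail soft: record the error and carry on
                return src.name, f"failed: {exc}", set()

    results = await asyncio.gather(*(guarded(s) for s in sources))
    status = {name: state for name, state, _ in results}
    merged = set().union(*(hosts for _, _, hosts in results))
    return status, merged


sources = [
    StaticSource("crtsh", {"a.example.com", "b.example.com"}),
    StaticSource("wayback", {"b.example.com", "c.example.com"}),
    MissingBinary(),
]
status, hosts = asyncio.run(run_all(sources, "example.com"))
print(status)
print(sorted(hosts))
```

Because each source is wrapped individually, the failing one shows up in `status` while the names from the other two are still merged.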

Quickstart

One-line scan

subsift scan example.com

On a first run this enumerates subdomains (crt.sh and subfinder in the sample below), probes live hosts (httpx), then asks the LLM to score each one. Output:

 scan_id       1
 domain        example.com
 duration      18.42s
 total unique  87
 inserted      87
 updated       0
 probes        62 persisted

Per-source results
┌──────────┬────────┬───────┬────────┐
│ Source   │ Status │ Count │ Time   │
├──────────┼────────┼───────┼────────┤
│ crtsh    │ ok     │   142 │ 4.10s  │
│ subfinder│ ok     │    71 │ 6.85s  │
└──────────┴────────┴───────┴────────┘

LLM scoring
┌──────────┬───────────────────┬──────┬───────────┬────────┐
│ Provider │ Model             │ Stat │ Persisted │ Time   │
├──────────┼───────────────────┼──────┼───────────┼────────┤
│ ollama   │ llama3.2:3b       │ ok   │       62  │ 4.92s  │
└──────────┴───────────────────┴──────┴───────────┴────────┘

Then run subsift scores 1 to see the ranked table (highest first):

Score  FQDN                            Reasoning
  92   admin.staging.example.com       admin keyword + 401 auth boundary
  88   jenkins.example.com             exposed Jenkins UI, default branding
  74   gitlab-internal.example.com     internal name leaked publicly
  ...
  12   www.example.com                 marketing site behind CDN

Web UI

subsift serve --reload
# open http://localhost:8000/ui

Three pages: home (recent scans + form), scan detail (ranked table with live filter + export buttons + polling badge), diff view (added/removed/score-changed buckets).

Diff against last week's scan

subsift diff --domain example.com
# or explicitly:
subsift diff 1 2 --threshold 20

Shows what appeared, what disappeared, and which scores moved significantly between two scans of the same domain.
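
The three buckets can be thought of as plain set operations plus a score-delta check. The sketch below is illustrative only: `diff_scans` is a hypothetical helper, scans are reduced to `{fqdn: score}` maps, and treating the threshold as inclusive is an assumption.

```python
def diff_scans(old: dict[str, int], new: dict[str, int], threshold: int = 20):
    """Compare two scans of the same domain, each given as {fqdn: score}."""
    added = sorted(new.keys() - old.keys())      # present only in the newer scan
    removed = sorted(old.keys() - new.keys())    # present only in the older scan
    changed = sorted(
        (fqdn, old[fqdn], new[fqdn])
        for fqdn in old.keys() & new.keys()      # present in both scans
        if abs(new[fqdn] - old[fqdn]) >= threshold
    )
    return {"added": added, "removed": removed, "score_changed": changed}


last_week = {"www.example.com": 12, "admin.example.com": 60, "old.example.com": 30}
today = {"www.example.com": 15, "admin.example.com": 92, "jenkins.example.com": 88}

for bucket, items in diff_scans(last_week, today).items():
    print(bucket, items)
# added ['jenkins.example.com']
# removed ['old.example.com']
# score_changed [('admin.example.com', 60, 92)]
```

The 3-point move on www.example.com stays out of the report because it is below the threshold.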

Alert when a high-score subdomain appears

Wire a webhook so SubSift pings you (Slack, Discord, PagerDuty, your own endpoint) the moment a new finding with score ≥ 80 lands:

subsift alerts add "admin-watch" "https://hooks.slack.com/..." \
    --domain example.com --min-score 80 --trigger added
subsift alerts test 1   # synthetic payload, audited in alert_deliveries

Then cron a nightly scan:

0 3 * * *  subsift scan example.com

Every scan that has a previous scan to diff against evaluates each active rule against that diff and POSTs a JSON payload to the webhook of every rule whose threshold matched. Failures are isolated: one broken endpoint never affects other rules or the scan itself, and every attempt (sent / failed / skipped) gets a row in alert_deliveries for audit.
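
A rough sketch of that evaluation loop, with failures isolated per rule. The `Rule` class, the payload shape, and the injected `send` callable are all hypothetical; a real sender would perform the HTTP POST, and the returned list stands in for rows in alert_deliveries.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Rule:
    """Hypothetical alert rule: a webhook URL plus a minimum score."""
    name: str
    url: str
    min_score: int


def evaluate_rules(rules, added: dict[str, int], send: Callable[[str, dict], None]):
    """Return one (rule name, outcome) pair per rule: sent, failed or skipped."""
    deliveries = []
    for rule in rules:
        matches = {f: s for f, s in added.items() if s >= rule.min_score}
        if not matches:
            deliveries.append((rule.name, "skipped"))
            continue
        try:
            send(rule.url, {"rule": rule.name, "findings": matches})
            deliveries.append((rule.name, "sent"))
        except Exception:  # isolate: a broken endpoint must not stop other rules
            deliveries.append((rule.name, "failed"))
    return deliveries


def fake_send(url: str, payload: dict) -> None:
    """Test double standing in for an HTTP POST."""
    if "broken" in url:
        raise ConnectionError("endpoint unreachable")


rules = [
    Rule("admin-watch", "https://hooks.example/ok", 80),
    Rule("broken-hook", "https://hooks.example/broken", 80),
    Rule("paranoid", "https://hooks.example/ok2", 99),
]
log = evaluate_rules(rules, {"admin.example.com": 92, "www.example.com": 12}, fake_send)
print(log)
# → [('admin-watch', 'sent'), ('broken-hook', 'failed'), ('paranoid', 'skipped')]
```

Passing the sender in as a parameter keeps the loop testable without network access.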

Install

Requirements

  • Python 3.11+ with uv for dependency management
  • ProjectDiscovery binaries (subfinder, httpx, dnsx) — install with Go, see docs/CONFIGURATION.md
  • LLM — choose one:
    • Ollama running locally with any 3B+ instruct model (default, free)
    • Anthropic API key — opt-in via SUBSIFT_LLM_PROVIDER=claude

From PyPI

pip install subsift          # or: uv tool install subsift / pipx install subsift
subsift init-db              # create the local SQLite schema
subsift scan example.com

Optional extras: pip install "subsift[screenshots]" (Playwright capture + thumbnails) and pip install "subsift[storage-s3]" (S3-compatible blob storage). You still need the ProjectDiscovery binaries on PATH and an LLM (Ollama running locally, or an API key) — see Requirements above.

From source (for development)

git clone https://github.com/Ataraxia-ia-labs/Subsift.git
cd Subsift
cp .env.example .env
uv sync
uv run alembic upgrade head   # migration-managed schema (vs. init-db)
uv run subsift --help

Docker (when WSL2 / Docker Desktop is available)

cp .env.example .env
docker compose up --build -d
docker compose exec ollama ollama pull llama3.2:3b
curl http://localhost:8000/health

The docker-compose.yml ships an Ollama service alongside the app so a fresh clone works without external dependencies.

Configuration

Everything is driven by environment variables prefixed SUBSIFT_. Copy .env.example to .env and edit. Full reference in docs/CONFIGURATION.md.

Key knobs:

Variable                   Default      What it does
SUBSIFT_LLM_PROVIDER       ollama       ollama (local) or claude (API)
SUBSIFT_OLLAMA_MODEL       llama3.1:8b  Any chat-completion model your Ollama has
SUBSIFT_ANTHROPIC_API_KEY               Required when provider = claude
SUBSIFT_TOOL_RUNNER        native       native (binaries on PATH) or docker (image per tool)
SUBSIFT_HTTPX_BIN          httpx        Absolute path needed on Windows (see docs/CONFIGURATION.md)

Deploy (private, on Fly.io)

SubSift ships a complete private-deploy setup: an HTTPBasic-gated app on Fly.io (gru, the São Paulo region), OpenAI (gpt-5-mini) as the default LLM (Claude and Ollama are one secret swap away), a persistent SQLite volume, and auto-stop for idle machines to keep the bill at ~$0 for personal use.

fly auth login
fly apps create subsift
fly volumes create subsift_data --region gru --size 1
fly secrets set \
    SUBSIFT_AUTH_PASSWORD="$(openssl rand -base64 24)" \
    SUBSIFT_OPENAI_API_KEY="sk-..."
fly deploy

Full step-by-step — smoke tests, log tailing, password rotation, volume resizing, the Tailscale-to-local-Ollama variant — in docs/DEPLOY.md. The committed fly.toml already wires the release-command Alembic migration, the volume mount at /app/data, the /health check, and auto-stop on idle.

Documentation

Roadmap

Phase                                                                    Status
1 - Scaffolding (Python 3.11, FastAPI, SQLModel, uv)                     ✅
2 - Enumeration + persistence (subfinder, crt.sh, SQLite, Alembic)       ✅
3 - Probing + enrichment (httpx PD)                                      ✅
4 - LLM scoring (Ollama + Claude via tool-use)                           ✅
5 - Web UI (Jinja2 + HTMX + Alpine + compiled Tailwind)                  ✅
6 - Exports (JSON / CSV / TXT / Markdown)                                ✅
7 - Historical diffs with junction table                                 ✅
8 - Docs + v0.1.0-alpha release                                          ✅
9 - Webhook alerts on new high-scored findings                           ✅
10 - Wayback + Amass + AlienVault OTX enumerators                        ✅
11a - Screenshot capture per probe (Playwright, local storage)           ✅
11b - Storage abstraction (S3-compatible) + thumbnails                   ✅
12 - HTTPBasic auth + Fly.io deploy (gru, persistent volume, auto-stop)  ✅

Architecture notes

  • Enumerator Protocol + a registry — adding a new source is one file (see src/subsift/core/enumerators/crtsh.py for the smallest example).
  • Prober and LLMClient Protocols for the same reason — swap httpx for naabu, swap Ollama for OpenAI, no orchestrator changes.
  • ToolRunner abstraction so binaries can run native or via Docker without the wrappers caring.
  • Repository pattern so the CLI / API / UI never construct SQL — testable with an in-memory engine.
  • Junction table scan_subdomains so diffs are set operations, not heuristics over first_seen boundaries.
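
To illustrate the last bullet, here is a small in-memory SQLite sketch of a junction table turning "what is new in this scan" into an EXCEPT query. Only the table name scan_subdomains comes from this README; the column names, the `subdomains` table layout, and both helper functions are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subdomains (id INTEGER PRIMARY KEY, fqdn TEXT UNIQUE NOT NULL);
    CREATE TABLE scan_subdomains (
        scan_id INTEGER NOT NULL,
        subdomain_id INTEGER NOT NULL REFERENCES subdomains(id),
        PRIMARY KEY (scan_id, subdomain_id)
    );
""")


def record_scan(scan_id: int, fqdns: list[str]) -> None:
    """Upsert each name once, then link it to the scan through the junction table."""
    for fqdn in fqdns:
        conn.execute("INSERT OR IGNORE INTO subdomains (fqdn) VALUES (?)", (fqdn,))
        conn.execute(
            "INSERT INTO scan_subdomains (scan_id, subdomain_id) "
            "SELECT ?, id FROM subdomains WHERE fqdn = ?",
            (scan_id, fqdn),
        )


def only_in(scan_a: int, scan_b: int) -> list[str]:
    """FQDNs linked to scan_a but not to scan_b: a plain EXCEPT over the junction."""
    rows = conn.execute(
        """
        SELECT s.fqdn FROM scan_subdomains j
          JOIN subdomains s ON s.id = j.subdomain_id WHERE j.scan_id = ?
        EXCEPT
        SELECT s.fqdn FROM scan_subdomains j
          JOIN subdomains s ON s.id = j.subdomain_id WHERE j.scan_id = ?
        ORDER BY 1
        """,
        (scan_a, scan_b),
    )
    return [fqdn for (fqdn,) in rows]


record_scan(1, ["www.example.com", "old.example.com"])
record_scan(2, ["www.example.com", "jenkins.example.com"])
print("added:  ", only_in(2, 1))   # → ['jenkins.example.com']
print("removed:", only_in(1, 2))   # → ['old.example.com']
```

Because membership is recorded per scan, neither query needs to reason about first_seen timestamps.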

Quality gates

Every push to main runs:

  • pre-commit run --all-files — ruff (lint + format), mypy --strict, detect-secrets, file hygiene.
  • pytest --cov on Python 3.11 and 3.12.
  • uvx pip-audit --strict over exported runtime deps — fails on any known CVE.
  • docker build --target runtime followed by a /health smoke-test inside the container.

Local: make check (POSIX) or scripts\tasks.ps1 check (Windows) reproduces lint + types + tests in one shot.

Legal

SubSift is for authorised security testing only — bug bounty programs, your own assets, contracted pentests, CTFs. Unauthorised scanning of third-party infrastructure may violate the Computer Fraud and Abuse Act (US), the Computer Misuse Act (UK), and equivalent legislation elsewhere. You are responsible for your use. Full terms in DISCLAIMER.md.

License

AGPL-3.0-or-later © 2026 KaiserCode. See LICENSE.

SubSift is copyleft: if you run a modified version as a network service, the AGPL requires you to offer that modified source to its users. This keeps the free/core tier open while leaving room for a separately-licensed Pro tier.
