Subdomain reconnaissance with LLM-powered interestingness scoring for bug bounty hunters and pentesters.
SubSift
Subdomain reconnaissance that actually ranks what matters.
subfinder gives you 5 000 subdomains. SubSift gives you the 20 that probably have a vulnerability — and tells you why.
Why
The standard recon pipeline (subfinder → httpx → eyeball every line) doesn't scale. A modern enterprise has 5–10 k subdomains and you can't manually triage that. SubSift bolts an interestingness model onto the pipeline: every subdomain gets a 0–100 score with one-sentence reasoning, and the UI/CLI ranks them so the suspicious ones surface first.
The scoring rubric — admin / dev / staging / vpn names, auth-boundary status codes (401/403), outdated tech, exposed cloud storage — is in src/subsift/llm/prompts.py. Tune it for your engagement.
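The rubric decides *how* the model judges a host. What the pipeline needs back is a bounded score and a one-sentence reason. Below is a minimal sketch of validating a model reply. It assumes the model is asked to return JSON shaped like `{"score": 92, "reasoning": "..."}`; that schema and the `Score` / `parse_score` names are hypothetical, not SubSift's actual contract.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Score:
    fqdn: str
    score: int      # 0-100, higher = more interesting
    reasoning: str  # one sentence from the model


def parse_score(fqdn: str, raw: str) -> Score:
    """Validate a (hypothetical) JSON model reply against the 0-100 contract."""
    data = json.loads(raw)
    score = data["score"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(score, int) or isinstance(score, bool) or not 0 <= score <= 100:
        raise ValueError(f"score out of range for {fqdn}: {score!r}")
    reasoning = str(data.get("reasoning", "")).strip()
    if not reasoning:
        raise ValueError(f"missing reasoning for {fqdn}")
    return Score(fqdn=fqdn, score=score, reasoning=reasoning)


reply = '{"score": 92, "reasoning": "admin keyword + 401 auth boundary"}'
print(parse_score("admin.staging.example.com", reply))
```

Rejecting out-of-range or reason-less replies up front keeps a single malformed model response from polluting the ranked table.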
Real-world result — tesla.com, 2 m 11 s
A single scan against tesla.com from the private Fly.io deploy (subfinder + crt.sh, httpx probe, OpenAI gpt-5-mini scorer):
| Metric | Count |
|---|---|
| Subdomains discovered | 726 |
| Probed (live HTTP) | 258 |
| Scored | 347 |
| High-score (≥70) | 220 |
The model surfaced VPN endpoints, authentication services, password-reset (SSPR) backends, financial gateways, MFA origins, and production vehicle-file storage. The first six rows of the ranked output:
| Score | FQDN | Status | Reasoning |
|---|---|---|---|
| 98 | origin-finplat-prd.tesla.com | — | Production financial platform origin — extremely sensitive backend |
| 95 | apacvpn.tesla.com | — | Regional VPN endpoint, almost certainly auth / remote access |
| 95 | auth-global-stage.tesla.com | 403 | Global staging auth service behind an Access-Denied boundary |
| 95 | auth.prd.usw.vn.cloud.tesla.com | 200 | Production auth service exposing the login flow (Envoy / hCaptcha) |
| 95 | sspr.tesla.com | 403 | Self-service password reset behind 403 — top-tier account-takeover risk |
| 95 | vehicle-files.prd.usw2.vn.cloud.tesla.com | 403 | Production vehicle-file storage behind auth |
The rest of the long tail (marketing, CDN edges, redirects) sits comfortably below 50 — exactly where you want it during triage.
The pipeline at a glance
```
                    ┌──────────────────────────────────────┐
subsift scan   ──▶  │ ScanOrchestrator                     │
POST /scans    ──▶  │ 1. enumerate (subfinder + crt.sh)    │
POST /ui/scans ──▶  │ 2. dedupe + scope-filter (RFC1035)   │
                    │ 3. upsert subdomains (+ junction)    │
                    │ 4. probe (httpx PD: code/tech/ip)    │
                    │ 5. score 0-100 (Ollama / Claude)     │
                    │ 6. persist Probe + ScoreResult       │
                    └──────────────────────────────────────┘
                                       │
         ┌─────────────────────────────┼─────────────────────────────┐
         ▼                             ▼                             ▼
    CLI tables                  HTML UI at /ui              JSON API at /scans
  (Rich, ranked)          (HTMX, ranked, filterable)         (REST, paginated)
                                       │
                                       └─▶ exports:
                                           .json .csv .txt .md
```
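Step 2 of the pipeline normalises and filters names before anything is stored. Here is a minimal sketch of what "dedupe + scope-filter (RFC1035)" can look like. It assumes that "in scope" means "the apex domain or one of its subdomains" and applies RFC 1035 label syntax as relaxed by RFC 1123 (letters, digits and hyphens; 1 to 63 characters per label; no leading or trailing hyphen; at most 253 characters overall). The `dedupe_in_scope` helper is hypothetical, and SubSift's real filter may treat wildcards or underscores differently.

```python
import re

# One DNS label: 1-63 chars of [a-z0-9-], no leading/trailing hyphen
# (RFC 1035 syntax, relaxed by RFC 1123 to allow a leading digit).
LABEL = re.compile(r"^(?!-)[a-z0-9-]{1,63}(?<!-)$")


def dedupe_in_scope(names: list[str], apex: str) -> list[str]:
    """Normalise, validate and dedupe names, keeping only the apex and its subdomains."""
    apex = apex.lower().rstrip(".")
    seen: set[str] = set()
    out: list[str] = []
    for raw in names:
        name = raw.strip().lower().rstrip(".")
        if name.startswith("*."):  # CT logs often contain wildcard certificates
            name = name[2:]
        if len(name) > 253 or not all(LABEL.match(p) for p in name.split(".")):
            continue  # not a syntactically valid hostname
        if name != apex and not name.endswith("." + apex):
            continue  # out of scope (e.g. evil-example.com)
        if name not in seen:
            seen.add(name)
            out.append(name)
    return sorted(out)


raw = ["WWW.example.com.", "*.dev.example.com", "www.example.com",
       "evil-example.com", "bad_host.example.com"]
print(dedupe_in_scope(raw, "example.com"))  # ['dev.example.com', 'www.example.com']
```

The suffix check uses `"." + apex` rather than a bare `endswith(apex)`, so a lookalike such as `evil-example.com` never slips into scope.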
| Tool | Output | Ranking |
|---|---|---|
| `subfinder` | raw subdomain list | none |
| `amass` | subdomains + DNS data | none |
| `httpx` | live hosts + tech | by status code |
| SubSift | subdomains + probes + LLM scores + diffs over time | by interestingness, with reasoning + history |
Enumeration sources
Seven passive sources run concurrently behind an `asyncio.Semaphore`. Each one implements a Protocol, so adding another is a one-file change with no schema migration: the scan records its sources in a single `sources_used` column.
| Source | Kind | Default in registry | Notes |
|---|---|---|---|
| `subfinder` | ProjectDiscovery binary | yes | broad passive recon, fast |
| `crtsh` | Certificate Transparency logs | yes | finds names from TLS certs only |
| `wayback` | Internet Archive CDX API | yes | historical URLs → hostnames |
| `otx` | AlienVault OTX passive DNS | yes | optional API key boosts rate limits |
| `amass` | OWASP binary, `-passive` mode | yes | slower but very thorough |
| `anubis` | jldc.me Anubis DB | yes | free JSON API, no key |
| `hackertarget` | HackerTarget hostsearch | yes | free, rate-limited (fails soft) |
Use `subsift scan example.com -s crtsh -s wayback` to run a subset. Sources whose binaries aren't installed (`amass`, `subfinder`) fail soft, and the scan continues with the rest.
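The concurrency model described above can be sketched in a few lines: Protocol-based sources, a shared `asyncio.Semaphore`, and errors recorded per source instead of aborting the scan. Everything in this sketch is illustrative. The `Enumerator` protocol shape, `run_sources`, and the two toy sources are hypothetical stand-ins, not SubSift's real classes.

```python
import asyncio
from typing import Protocol


class Enumerator(Protocol):
    name: str

    async def enumerate(self, domain: str) -> set[str]: ...


class StaticSource:
    """Toy source standing in for crt.sh, wayback, etc."""

    def __init__(self, name: str, hosts: set[str]) -> None:
        self.name = name
        self.hosts = hosts

    async def enumerate(self, domain: str) -> set[str]:
        await asyncio.sleep(0)  # pretend to do network I/O
        return {h for h in self.hosts if h.endswith("." + domain)}


class MissingBinary:
    """Simulates a binary-backed source (e.g. amass) that isn't installed."""

    name = "amass"

    async def enumerate(self, domain: str) -> set[str]:
        raise FileNotFoundError("amass not on PATH")


async def run_sources(
    domain: str, sources: list[Enumerator], limit: int = 4
) -> tuple[set[str], dict[str, str]]:
    sem = asyncio.Semaphore(limit)  # cap how many sources run at once
    status: dict[str, str] = {}

    async def one(src: Enumerator) -> set[str]:
        async with sem:
            try:
                found = await src.enumerate(domain)
                status[src.name] = "ok"
                return found
            except Exception as exc:  # fail soft: record the error, keep going
                status[src.name] = f"error: {exc}"
                return set()

    results = await asyncio.gather(*(one(s) for s in sources))
    return set().union(*results), status


hosts, status = asyncio.run(run_sources("example.com", [
    StaticSource("crtsh", {"www.example.com", "dev.example.com"}),
    StaticSource("wayback", {"www.example.com", "old.example.com"}),
    MissingBinary(),
]))
print(sorted(hosts), status)
```

Because the protocol is structural, a new source only needs a `name` and an async `enumerate` method; nothing has to inherit from a base class.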
Quickstart
One-line scan
```
subsift scan example.com
```
The first time you run this, SubSift enumerates subdomains from the passive sources, probes live hosts (httpx), then asks the LLM to score each subdomain. Output:
```
scan_id       1
domain        example.com
duration      18.42s
total unique  87
inserted      87
updated       0
probes        62 persisted

Per-source results
┌──────────┬────────┬───────┬────────┐
│ Source   │ Status │ Count │ Time   │
├──────────┼────────┼───────┼────────┤
│ crtsh    │ ok     │ 142   │ 4.10s  │
│ subfinder│ ok     │ 71    │ 6.85s  │
└──────────┴────────┴───────┴────────┘

LLM scoring
┌──────────┬───────────────────┬──────┬───────────┬────────┐
│ Provider │ Model             │ Stat │ Persisted │ Time   │
├──────────┼───────────────────┼──────┼───────────┼────────┤
│ ollama   │ llama3.2:3b       │ ok   │ 62        │ 4.92s  │
└──────────┴───────────────────┴──────┴───────────┴────────┘
```
Then run `subsift scores 1` to see the ranked table (highest first):
```
Score  FQDN                          Reasoning
92     admin.staging.example.com     admin keyword + 401 auth boundary
88     jenkins.example.com           exposed Jenkins UI, default branding
74     gitlab-internal.example.com   internal name leaked publicly
...
12     www.example.com               marketing site behind CDN
```
Web UI
```
subsift serve --reload
# open http://localhost:8000/ui
```
Three pages: home (recent scans + form), scan detail (ranked table with live filter + export buttons + polling badge), diff view (added/removed/score-changed buckets).
Diff against last week's scan
```
subsift diff --domain example.com
# or explicitly:
subsift diff 1 2 --threshold 20
```
Shows what appeared, what disappeared, and which scores moved significantly between two scans of the same domain.
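Because each scan records exactly which subdomains it saw, a diff reduces to set arithmetic plus a score comparison. The sketch below assumes `--threshold` means "the minimum absolute score change worth reporting". That reading is an assumption, and the `ScanDiff` / `diff_scans` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScanDiff:
    added: set[str]
    removed: set[str]
    score_changed: dict[str, tuple[int, int]]  # fqdn -> (old score, new score)


def diff_scans(old: dict[str, int], new: dict[str, int], threshold: int = 20) -> ScanDiff:
    """old/new map FQDN -> score for two scans of the same domain."""
    common = old.keys() & new.keys()
    moved = {
        fqdn: (old[fqdn], new[fqdn])
        for fqdn in common
        if abs(new[fqdn] - old[fqdn]) >= threshold
    }
    return ScanDiff(
        added=set(new.keys() - old.keys()),
        removed=set(old.keys() - new.keys()),
        score_changed=moved,
    )


scan1 = {"www.example.com": 12, "jenkins.example.com": 55, "old.example.com": 30}
scan2 = {"www.example.com": 15, "jenkins.example.com": 88, "admin.staging.example.com": 92}
d = diff_scans(scan1, scan2, threshold=20)
print(d.added, d.removed, d.score_changed)
```

Here `www.example.com` moved by only 3 points, so it stays out of the "score-changed" bucket, while `jenkins.example.com` (55 to 88) is reported.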
Alert when a high-score subdomain appears
Wire a webhook so SubSift pings you (Slack, Discord, PagerDuty, your own endpoint) the moment a new finding with score ≥ 80 lands:
```
subsift alerts add "admin-watch" "https://hooks.slack.com/..." \
  --domain example.com --min-score 80 --trigger added
subsift alerts test 1   # synthetic payload, audited in alert_deliveries
```
Then cron a nightly scan:
```
0 3 * * * subsift scan example.com --no-score
```
Every scan that has a previous scan to diff against evaluates every active rule against the diff and POSTs a JSON payload to webhooks whose threshold matched. Failures are isolated — one broken endpoint never affects other rules or the scan itself, and every attempt (sent / failed / skipped) gets a row in alert_deliveries for audit.
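The rule evaluation and failure isolation described above can be sketched as follows. The rule fields mirror the CLI flags (`--domain`, `--min-score`, `--trigger added`), but the `AlertRule` type, the payload shape and the `deliver_alerts` function are hypothetical. The real HTTP call uses only `urllib.request`, and the sender is injectable so the logic can be exercised without a network.

```python
import json
import urllib.request
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    name: str
    url: str
    domain: str
    min_score: int
    trigger: str = "added"


def post_json(url: str, payload: dict) -> None:
    """POST a JSON body; raises on network errors or HTTP error statuses."""
    req = urllib.request.Request(
        url,
        data=json.dumps(payload).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=10):
        pass


def deliver_alerts(
    domain: str,
    added: dict[str, int],  # newly seen FQDN -> score, taken from the diff
    rules: list[AlertRule],
    send: Callable[[str, dict], None] = post_json,
) -> list[tuple[str, str]]:
    """Return one (rule name, outcome) row per rule, like an audit table."""
    log: list[tuple[str, str]] = []
    for rule in rules:
        if rule.domain != domain or rule.trigger != "added":
            log.append((rule.name, "skipped"))
            continue
        hits = {f: s for f, s in added.items() if s >= rule.min_score}
        if not hits:
            log.append((rule.name, "skipped"))
            continue
        try:
            send(rule.url, {"rule": rule.name, "domain": domain, "findings": hits})
            log.append((rule.name, "sent"))
        except Exception as exc:  # one broken endpoint never stops the loop
            log.append((rule.name, f"failed: {exc}"))
    return log


def fake_send(url: str, payload: dict) -> None:
    if "broken" in url:
        raise ConnectionError("endpoint down")


rules = [
    AlertRule("admin-watch", "https://hooks.example.com/ok", "example.com", 80),
    AlertRule("broken", "https://broken.example.com/hook", "example.com", 80),
    AlertRule("strict", "https://hooks.example.com/ok", "example.com", 99),
]
print(deliver_alerts("example.com", {"admin.staging.example.com": 92}, rules, send=fake_send))
```

Catching exceptions per rule (rather than around the whole loop) is what keeps a failing webhook from suppressing deliveries to the others.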
Install
Requirements
- Python 3.11+ with `uv` for dependency management
- ProjectDiscovery binaries (`subfinder`, `httpx`, `dnsx`): install with Go, see docs/CONFIGURATION.md
- LLM (choose one):
  - Ollama running locally with any 3B+ instruct model (default, free)
  - Anthropic API key, opt in via `SUBSIFT_LLM_PROVIDER=claude`
From PyPI
```
pip install subsift        # or: uv tool install subsift / pipx install subsift
subsift init-db            # create the local SQLite schema
subsift scan example.com
```
Optional extras: pip install "subsift[screenshots]" (Playwright capture +
thumbnails) and pip install "subsift[storage-s3]" (S3-compatible blob
storage). You still need the ProjectDiscovery binaries on PATH and an LLM
(Ollama running locally, or an API key) — see Requirements above.
From source (for development)
```
git clone https://github.com/Ataraxia-ia-labs/Subsift.git
cd Subsift
cp .env.example .env
uv sync
uv run alembic upgrade head   # migration-managed schema (vs. init-db)
uv run subsift --help
```
Docker (when WSL2 / Docker Desktop is available)
```
cp .env.example .env
docker compose up --build -d
docker compose exec ollama ollama pull llama3.2:3b
curl http://localhost:8000/health
```
The docker-compose.yml ships an Ollama service alongside the app so a fresh clone works without external dependencies.
Configuration
Everything is driven by environment variables prefixed SUBSIFT_. Copy .env.example to .env and edit. Full reference in docs/CONFIGURATION.md.
Key knobs:
| Variable | Default | What it does |
|---|---|---|
| `SUBSIFT_LLM_PROVIDER` | `ollama` | `ollama` (local) or `claude` (API) |
| `SUBSIFT_OLLAMA_MODEL` | `llama3.1:8b` | Any chat-completion model your Ollama has |
| `SUBSIFT_ANTHROPIC_API_KEY` | — | Required when provider = `claude` |
| `SUBSIFT_TOOL_RUNNER` | `native` | `native` (binaries on PATH) or `docker` (image per tool) |
| `SUBSIFT_HTTPX_BIN` | `httpx` | Absolute path needed on Windows; see docs/CONFIGURATION.md |
Deploy (private, on Fly.io)
SubSift ships a complete production-deploy story: HTTPBasic-gated app on
Fly.io (São Paulo region), OpenAI (gpt-5-mini) as the LLM by default
(Claude and Ollama are one secret swap away), persistent SQLite volume,
and idle machines auto-stopped to keep the bill at ~$0 for personal use.
```
fly auth login
fly apps create subsift
fly volumes create subsift_data --region gru --size 1
fly secrets set \
  SUBSIFT_AUTH_PASSWORD="$(openssl rand -base64 24)" \
  SUBSIFT_OPENAI_API_KEY="sk-..."
fly deploy
```
Full step-by-step — smoke tests, log tailing, password rotation, volume
resizing, the Tailscale-to-local-Ollama variant — in
docs/DEPLOY.md. The committed fly.toml already
wires the release-command Alembic migration, the volume mount at
/app/data, the /health check, and auto-stop on idle.
Documentation
- docs/ARCHITECTURE.md — module layout, request lifecycle, data model.
- docs/CLI.md — every command, every flag, examples.
- docs/API.md — REST endpoint reference with curl examples.
- docs/CONFIGURATION.md — env var reference + install troubleshooting.
- docs/DEPLOY.md — Fly.io deploy guide, secrets, cost expectations.
- CHANGELOG.md — release notes.
- CONTRIBUTING.md — dev setup, style, PR workflow.
- CODE_OF_CONDUCT.md — Contributor Covenant 2.1.
- SECURITY.md — vulnerability disclosure + responsible-use.
- DISCLAIMER.md — ethical-use terms + legal warning.
Roadmap
| Phase | Status |
|---|---|
| 1 — Scaffolding (Python 3.11, FastAPI, SQLModel, uv) | :white_check_mark: |
| 2 — Enumeration + persistence (subfinder, crt.sh, SQLite, Alembic) | :white_check_mark: |
| 3 — Probing + enrichment (httpx PD) | :white_check_mark: |
| 4 — LLM scoring (Ollama + Claude via tool-use) | :white_check_mark: |
| 5 — Web UI (Jinja2 + HTMX + Alpine + compiled Tailwind) | :white_check_mark: |
| 6 — Exports (JSON / CSV / TXT / Markdown) | :white_check_mark: |
| 7 — Historical diffs with junction table | :white_check_mark: |
| 8 — Docs + v0.1.0-alpha release | :white_check_mark: |
| 9 — Webhook alerts on new high-scored findings | :white_check_mark: |
| 10 — Wayback + Amass + AlienVault OTX enumerators | :white_check_mark: |
| 11a — Screenshot capture per probe (Playwright, local storage) | :white_check_mark: |
| 11b — Storage abstraction (S3-compatible) + thumbnails | :white_check_mark: |
| 12 — HTTPBasic auth + Fly.io deploy (gru, persistent volume, auto-stop) | :white_check_mark: |
Architecture notes
- `Enumerator` Protocol + a registry: adding a new source is one file (see `src/subsift/core/enumerators/crtsh.py` for the smallest example).
- `Prober` and `LLMClient` Protocols for the same reason: swap httpx for naabu, swap Ollama for OpenAI, no orchestrator changes.
- `ToolRunner` abstraction so binaries can run natively or via Docker without the wrappers caring.
- Repository pattern so the CLI / API / UI never construct SQL, which keeps them testable with an in-memory engine.
- Junction table `scan_subdomains` so diffs are set operations, not heuristics over `first_seen` boundaries.
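With a junction table, "what changed between scan 1 and scan 2" is an `EXCEPT` query rather than timestamp arithmetic. The sketch below uses an in-memory SQLite database via the stdlib `sqlite3` module. Only the table name `scan_subdomains` comes from the text above; the other table, every column name, and the `only_in` helper are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE subdomains (id INTEGER PRIMARY KEY, fqdn TEXT UNIQUE NOT NULL);
    CREATE TABLE scan_subdomains (
        scan_id INTEGER NOT NULL,
        subdomain_id INTEGER NOT NULL REFERENCES subdomains(id),
        PRIMARY KEY (scan_id, subdomain_id)
    );
    INSERT INTO subdomains (id, fqdn) VALUES
        (1, 'www.example.com'), (2, 'old.example.com'), (3, 'admin.example.com');
    -- scan 1 saw www + old; scan 2 saw www + admin
    INSERT INTO scan_subdomains VALUES (1, 1), (1, 2), (2, 1), (2, 3);
    """
)

# FQDNs linked to scan `b` but not to scan `a`.
SET_DIFF = """
    SELECT s.fqdn FROM subdomains s
    WHERE s.id IN (
        SELECT subdomain_id FROM scan_subdomains WHERE scan_id = ?
        EXCEPT
        SELECT subdomain_id FROM scan_subdomains WHERE scan_id = ?
    )
    ORDER BY s.fqdn
"""


def only_in(b: int, a: int) -> list[str]:
    return [row[0] for row in conn.execute(SET_DIFF, (b, a))]


added = only_in(2, 1)    # appeared in scan 2
removed = only_in(1, 2)  # disappeared since scan 1
print(added, removed)
```

Swapping the two scan ids turns "added" into "removed", which is why the junction table makes both directions of the diff the same query.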
Quality gates
Every push to `main` runs:
- `pre-commit run --all-files`: ruff (lint + format), mypy `--strict`, detect-secrets, file hygiene.
- `pytest --cov` on Python 3.11 and 3.12.
- `uvx pip-audit --strict` over exported runtime deps; fails on any known CVE.
- `docker build --target runtime` followed by a `/health` smoke test inside the container.
Local: make check (POSIX) or scripts\tasks.ps1 check (Windows) reproduces lint + types + tests in one shot.
Legal
SubSift is for authorised security testing only — bug bounty programs, your own assets, contracted pentests, CTFs. Unauthorised scanning of third-party infrastructure may violate the Computer Fraud and Abuse Act (US), the Computer Misuse Act (UK), and equivalent legislation elsewhere. You are responsible for your use. Full terms in DISCLAIMER.md.
License
AGPL-3.0-or-later © 2026 KaiserCode. See LICENSE.
SubSift is copyleft: if you run a modified version as a network service, the AGPL requires you to offer that modified source to its users. This keeps the free/core tier open while leaving room for a separately-licensed Pro tier.
Download files
File details
Details for the file subsift-0.1.0a6.tar.gz.
File metadata
- Download URL: subsift-0.1.0a6.tar.gz
- Size: 542.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 05151c7b1d182a4f8b3d5500e6bf24890b739e77cae75cf2fa77b3ad37621bf3 |
| MD5 | 99bcdac8c1a98fbceb114fa08bfea0d7 |
| BLAKE2b-256 | 8538beb6f720a8e289cd07c57a112b5c7d67d957a81e16e0ca6726c86f572a3e |
Provenance
The following attestation bundles were made for subsift-0.1.0a6.tar.gz:
- Publisher: publish.yml on Ataraxia-ia-labs/Subsift
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: subsift-0.1.0a6.tar.gz
  - Subject digest: 05151c7b1d182a4f8b3d5500e6bf24890b739e77cae75cf2fa77b3ad37621bf3
- Sigstore transparency entry: 1631843795
- Permalink: Ataraxia-ia-labs/Subsift@a1910e5823b6896e5e946fdf3798f669a2f6dcbb
- Branch / Tag: refs/tags/v0.1.0a6
- Owner: https://github.com/Ataraxia-ia-labs
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a1910e5823b6896e5e946fdf3798f669a2f6dcbb
- Trigger Event: release
File details
Details for the file subsift-0.1.0a6-py3-none-any.whl.
File metadata
- Download URL: subsift-0.1.0a6-py3-none-any.whl
- Size: 199.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

| Algorithm | Hash digest |
|---|---|
| SHA256 | 15b9f33ba2130f27bf43506ac263dd02c57854c42bac4ed99b7fa1e4b460d43d |
| MD5 | f38c2baf67e5c271f9e9701b0351b8ff |
| BLAKE2b-256 | de45f20d81a9c129eb9875a0a4b3678dd08e85a30d11251444cd76e05f327dc4 |
Provenance
The following attestation bundles were made for subsift-0.1.0a6-py3-none-any.whl:
- Publisher: publish.yml on Ataraxia-ia-labs/Subsift
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: subsift-0.1.0a6-py3-none-any.whl
  - Subject digest: 15b9f33ba2130f27bf43506ac263dd02c57854c42bac4ed99b7fa1e4b460d43d
- Sigstore transparency entry: 1631843827
- Permalink: Ataraxia-ia-labs/Subsift@a1910e5823b6896e5e946fdf3798f669a2f6dcbb
- Branch / Tag: refs/tags/v0.1.0a6
- Owner: https://github.com/Ataraxia-ia-labs
- Access: private
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a1910e5823b6896e5e946fdf3798f669a2f6dcbb
- Trigger Event: release