Subdomain reconnaissance with LLM-powered interestingness scoring for bug bounty hunters and pentesters.

Maintainers

KaiserCode


SubSift

Subdomain reconnaissance that actually ranks what matters.

subfinder gives you 5 000 subdomains. SubSift gives you the 20 that probably have a vulnerability — and tells you why.

Why

The standard recon pipeline (subfinder → httpx → eyeball every line) doesn't scale. A modern enterprise has 5–10 k subdomains and you can't manually triage that. SubSift bolts an interestingness model onto the pipeline: every subdomain gets a 0–100 score with one-sentence reasoning, and the UI/CLI ranks them so the suspicious ones surface first.

The scoring rubric — admin / dev / staging / vpn names, auth-boundary status codes (401/403), outdated tech, exposed cloud storage — is in src/subsift/llm/prompts.py. Tune it for your engagement.
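As a rough illustration of the "0–100 score with one-sentence reasoning" contract, the sketch below validates a model reply before it is stored. The JSON shape, the `ScoredHost` class, and `parse_score_reply` are assumptions made for this example; they are not taken from SubSift's source.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredHost:
    """Hypothetical container for one scored subdomain."""
    fqdn: str
    score: int       # always within 0-100 after parsing
    reasoning: str   # one-sentence explanation from the model


def parse_score_reply(fqdn: str, raw_reply: str) -> ScoredHost:
    """Parse an assumed JSON reply like {"score": 92, "reasoning": "..."}."""
    data = json.loads(raw_reply)
    score = int(round(float(data["score"])))
    # Clamp instead of rejecting: a model may return values outside the range.
    score = max(0, min(100, score))
    reasoning = str(data.get("reasoning", "")).strip()
    return ScoredHost(fqdn=fqdn, score=score, reasoning=reasoning)
```

Clamping keeps the ranking usable even when a small local model drifts outside the rubric's range; a stricter variant could raise instead.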

Real-world result — `tesla.com`, 2 m 11 s

A single scan against tesla.com from the private Fly.io deploy (subfinder + crt.sh, httpx probe, OpenAI gpt-5-mini scorer):

Subdomains discovered  726
Probed (live HTTP)     258
Scored                 347
High-score (≥70)       220

The model surfaced VPN endpoints, authentication services, password-reset (SSPR) backends, financial gateways, MFA origins, and production vehicle-file storage. The first six rows of the ranked output:

Score	FQDN	Status	Reasoning
98	`origin-finplat-prd.tesla.com`	—	Production financial platform origin — extremely sensitive backend
95	`apacvpn.tesla.com`	—	Regional VPN endpoint, almost certainly auth / remote access
95	`auth-global-stage.tesla.com`	403	Global staging auth service behind an Access-Denied boundary
95	`auth.prd.usw.vn.cloud.tesla.com`	200	Production auth service exposing the login flow (Envoy / hCaptcha)
95	`sspr.tesla.com`	403	Self-service password reset behind 403 — top-tier account-takeover risk
95	`vehicle-files.prd.usw2.vn.cloud.tesla.com`	403	Production vehicle-file storage behind auth

The rest of the long tail (marketing, CDN edges, redirects) sits comfortably below 50 — exactly where you want it during triage.

The pipeline at a glance

                    ┌──────────────────────────────────────┐
   subsift scan ──▶ │ ScanOrchestrator                     │
   POST /scans  ──▶ │   1. enumerate (subfinder + crt.sh)  │
   POST /ui/scans   │   2. dedupe + scope-filter (RFC1035) │
                    │   3. upsert subdomains (+ junction)  │
                    │   4. probe (httpx PD: code/tech/ip)  │
                    │   5. score 0-100 (Ollama / Claude)   │
                    │   6. persist Probe + ScoreResult     │
                    └──────────────────────────────────────┘
                          │
   ┌──────────────────────┼──────────────────────────────────┐
   ▼                      ▼                                  ▼
 CLI tables            HTML UI at /ui                  JSON API at /scans
 (Rich, ranked)        (HTMX, ranked, filterable)      (REST, paginated)
                                                          │
                                                          └─▶ exports:
                                                              .json .csv .txt .md
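Step 2 of the diagram (dedupe + scope-filter) could look roughly like the sketch below. The function names are hypothetical, and two behaviours are assumptions for illustration: leading digits in labels are accepted (the RFC 1123 relaxation of RFC 1035), and a leading `*.` wildcard is stripped before checking.

```python
import re

# One DNS label: letters, digits, hyphens; alphanumeric at both ends; max 63 chars.
_LABEL = re.compile(r"[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?")


def normalise(name: str) -> str:
    """Lower-case and drop surrounding whitespace and the trailing root dot."""
    return name.strip().lower().rstrip(".")


def in_scope(name: str, root: str) -> bool:
    """True when name is root or a syntactically valid subdomain of it."""
    if len(name) > 253:
        return False
    if name != root and not name.endswith("." + root):
        return False
    return all(_LABEL.fullmatch(label) for label in name.split("."))


def dedupe_and_filter(candidates, root):
    """Return unique in-scope hostnames, preserving first-seen order."""
    root = normalise(root)
    seen = set()
    kept = []
    for raw in candidates:
        name = normalise(raw)
        if name.startswith("*."):
            name = name[2:]  # wildcard entries, as seen in certificate logs
        if name in seen or not in_scope(name, root):
            continue
        seen.add(name)
        kept.append(name)
    return kept
```

Matching on `"." + root` rather than on the bare root keeps look-alike names such as `evil-example.com` out of scope.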

Tool	Output	Ranking
`subfinder`	raw subdomain list	none
`amass`	subdomains + DNS data	none
`httpx`	live hosts + tech	by status code
SubSift	subdomains + probes + LLM scores + diffs over time	by interestingness, with reasoning + history

Enumeration sources

Seven passive sources run concurrently behind an asyncio Semaphore. Each is a Protocol impl — adding more is a one-file change (no schema migration: the scan records its sources in a single sources_used column).

Source	Kind	Default in registry	Notes
`subfinder`	ProjectDiscovery binary	yes	broad passive recon, fast
`crtsh`	Certificate Transparency logs	yes	finds names from TLS certs only
`wayback`	Internet Archive CDX API	yes	historical URLs → hostnames
`otx`	AlienVault OTX passive DNS	yes	optional API key boosts rate limits
`amass`	OWASP binary, `-passive` mode	yes	slower but very thorough
`anubis`	jldc.me Anubis DB	yes	free JSON API, no key
`hackertarget`	HackerTarget hostsearch	yes	free, rate-limited (fails soft)

Use subsift scan example.com -s crtsh -s wayback to run a subset. Sources whose binaries aren't installed (amass, subfinder) fail soft and the scan continues with the rest.
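The concurrency and fail-soft behaviour described above can be sketched as follows. The `Enumerator` Protocol shape, `run_sources`, and the default limit of 3 are assumptions for this example, not SubSift's actual interface.

```python
import asyncio
from typing import Protocol


class Enumerator(Protocol):
    """Assumed shape of a passive source."""
    name: str

    async def enumerate(self, domain: str) -> set[str]: ...


async def run_sources(sources, domain, max_concurrent=3):
    """Run every source concurrently, capped by a semaphore, failing soft."""
    gate = asyncio.Semaphore(max_concurrent)

    async def run_one(source):
        async with gate:
            try:
                return source.name, "ok", await source.enumerate(domain)
            except Exception as exc:
                # A missing binary or a rate limit should not abort the scan.
                return source.name, f"failed: {exc}", set()

    results = await asyncio.gather(*(run_one(s) for s in sources))
    names = set().union(*(found for _, _, found in results))
    status = {name: state for name, state, _ in results}
    return names, status
```

Because each source catches its own exception, `asyncio.gather` always receives a normal result and the remaining sources still contribute their names.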

Quickstart

One-line scan

subsift scan example.com

On the first run, SubSift enumerates subdomains (crt.sh and subfinder in the sample below), probes live hosts with httpx, then asks the LLM to score each subdomain. Sample output:

 scan_id       1
 domain        example.com
 duration      18.42s
 total unique  87
 inserted      87
 updated       0
 probes        62 persisted

Per-source results
┌──────────┬────────┬───────┬────────┐
│ Source   │ Status │ Count │ Time   │
├──────────┼────────┼───────┼────────┤
│ crtsh    │ ok     │   142 │ 4.10s  │
│ subfinder│ ok     │    71 │ 6.85s  │
└──────────┴────────┴───────┴────────┘

LLM scoring
┌──────────┬───────────────────┬──────┬───────────┬────────┐
│ Provider │ Model             │ Stat │ Persisted │ Time   │
├──────────┼───────────────────┼──────┼───────────┼────────┤
│ ollama   │ llama3.2:3b       │ ok   │       62  │ 4.92s  │
└──────────┴───────────────────┴──────┴───────────┴────────┘

Then subsift scores 1 to see the ranked table (highest first):

Score  FQDN                            Reasoning
  92   admin.staging.example.com       admin keyword + 401 auth boundary
  88   jenkins.example.com             exposed Jenkins UI, default branding
  74   gitlab-internal.example.com     internal name leaked publicly
  ...
  12   www.example.com                 marketing site behind CDN

Web UI

subsift serve --reload
# open http://localhost:8000/ui

Three pages: home (recent scans + form), scan detail (ranked table with live filter + export buttons + polling badge), diff view (added/removed/score-changed buckets).

Diff against last week's scan

subsift diff --domain example.com
# or explicitly:
subsift diff 1 2 --threshold 20

Shows what appeared, what disappeared, and which scores moved significantly between two scans of the same domain.
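A minimal sketch of the three diff buckets, assuming each scan is reduced to a mapping of FQDN to score. The `diff_scans` name and the inclusive treatment of the threshold are assumptions; the real command may compare differently.

```python
def diff_scans(old, new, threshold=20):
    """Compare two scans given as {fqdn: score or None} mappings."""
    added = sorted(new.keys() - old.keys())
    removed = sorted(old.keys() - new.keys())
    score_changed = []
    for fqdn in sorted(old.keys() & new.keys()):
        before, after = old[fqdn], new[fqdn]
        if before is None or after is None:
            continue  # an unscored host cannot have a score delta
        if abs(after - before) >= threshold:
            score_changed.append((fqdn, before, after))
    return {"added": added, "removed": removed, "score_changed": score_changed}
```

Added and removed hosts fall out of plain set differences on the keys, which is why a per-scan membership record makes diffs straightforward.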

Alert when a high-score subdomain appears

Wire a webhook so SubSift pings you (Slack, Discord, PagerDuty, your own endpoint) the moment a new finding with score ≥ 80 lands:

subsift alerts add "admin-watch" "https://hooks.slack.com/..." \
    --domain example.com --min-score 80 --trigger added
subsift alerts test 1   # synthetic payload, audited in alert_deliveries

Then cron a nightly scan:

0 3 * * *  subsift scan example.com

Whenever a scan has a previous scan of the same domain to diff against, SubSift evaluates every active rule against that diff and POSTs a JSON payload to each webhook whose threshold matched. Failures are isolated: one broken endpoint never affects other rules or the scan itself, and every attempt (sent / failed / skipped) gets a row in alert_deliveries for audit.
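The isolation described here can be sketched as a loop that records an outcome per rule and never lets one delivery error escape. `AlertRule`, `evaluate_rules`, and the payload shape are hypothetical; the `send` callable is injected so the example stays free of network code.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class AlertRule:
    """Hypothetical rule: notify url when a new host scores >= min_score."""
    name: str
    url: str
    min_score: int
    active: bool = True


def evaluate_rules(rules, added, send: Callable[[str, dict], None]):
    """added is a list of (fqdn, score) pairs new in this scan.

    Returns one (rule_name, outcome) audit row per rule.
    """
    deliveries = []
    for rule in rules:
        matches = [
            {"fqdn": fqdn, "score": score}
            for fqdn, score in added
            if score is not None and score >= rule.min_score
        ]
        if not rule.active or not matches:
            deliveries.append((rule.name, "skipped"))
            continue
        try:
            send(rule.url, {"rule": rule.name, "findings": matches})
            deliveries.append((rule.name, "sent"))
        except Exception:
            # One broken endpoint must not affect the remaining rules.
            deliveries.append((rule.name, "failed"))
    return deliveries
```

Note that a host without a score can never satisfy a minimum-score rule in this sketch.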

Install

Requirements

- Python 3.11+ with uv for dependency management
- ProjectDiscovery binaries (subfinder, httpx, dnsx): install with Go, see docs/CONFIGURATION.md
- LLM, choose one:
  - Ollama running locally with any 3B+ instruct model (default, free)
  - Anthropic API key, opt-in via SUBSIFT_LLM_PROVIDER=claude

From PyPI

pip install subsift          # or: uv tool install subsift / pipx install subsift
subsift init-db              # create the local SQLite schema
subsift scan example.com

Optional extras: pip install "subsift[screenshots]" (Playwright capture + thumbnails) and pip install "subsift[storage-s3]" (S3-compatible blob storage). You still need the ProjectDiscovery binaries on PATH and an LLM (Ollama running locally, or an API key) — see Requirements above.

From source (for development)

git clone https://github.com/Ataraxia-ia-labs/Subsift.git
cd Subsift
cp .env.example .env
uv sync
uv run alembic upgrade head   # migration-managed schema (vs. init-db)
uv run subsift --help

Docker (when WSL2 / Docker Desktop is available)

cp .env.example .env
docker compose up --build -d
docker compose exec ollama ollama pull llama3.2:3b
curl http://localhost:8000/health

The docker-compose.yml ships an Ollama service alongside the app so a fresh clone works without external dependencies.

Configuration

Everything is driven by environment variables prefixed SUBSIFT_. Copy .env.example to .env and edit. Full reference in docs/CONFIGURATION.md.

Key knobs:

Variable	Default	What it does
`SUBSIFT_LLM_PROVIDER`	`ollama`	`ollama` (local) or `claude` (API)
`SUBSIFT_OLLAMA_MODEL`	`llama3.1:8b`	Any chat-completion model your Ollama has
`SUBSIFT_ANTHROPIC_API_KEY`	—	Required when provider = `claude`
`SUBSIFT_TOOL_RUNNER`	`native`	`native` (binaries on PATH) or `docker` (image per tool)
`SUBSIFT_HTTPX_BIN`	`httpx`	Absolute path needed on Windows — see docs/CONFIGURATION.md

Deploy (private, on Fly.io)

SubSift ships a complete production-deploy story: HTTPBasic-gated app on Fly.io (São Paulo region), OpenAI (gpt-5-mini) as the LLM by default (Claude and Ollama are one secret swap away), persistent SQLite volume, and idle machines auto-stopped to keep the bill at ~$0 for personal use.

fly auth login
fly apps create subsift
fly volumes create subsift_data --region gru --size 1
fly secrets set \
    SUBSIFT_AUTH_PASSWORD="$(openssl rand -base64 24)" \
    SUBSIFT_OPENAI_API_KEY="sk-..."
fly deploy

Full step-by-step — smoke tests, log tailing, password rotation, volume resizing, the Tailscale-to-local-Ollama variant — in docs/DEPLOY.md. The committed fly.toml already wires the release-command Alembic migration, the volume mount at /app/data, the /health check, and auto-stop on idle.

Documentation

docs/ARCHITECTURE.md — module layout, request lifecycle, data model.
docs/CLI.md — every command, every flag, examples.
docs/API.md — REST endpoint reference with curl examples.
docs/CONFIGURATION.md — env var reference + install troubleshooting.
docs/DEPLOY.md — Fly.io deploy guide, secrets, cost expectations.
CHANGELOG.md — release notes.
CONTRIBUTING.md — dev setup, style, PR workflow.
CODE_OF_CONDUCT.md — Contributor Covenant 2.1.
SECURITY.md — vulnerability disclosure + responsible-use.
DISCLAIMER.md — ethical-use terms + legal warning.

Roadmap

Phase	Status
1 — Scaffolding (Python 3.11, FastAPI, SQLModel, uv)	✅
2 — Enumeration + persistence (subfinder, crt.sh, SQLite, Alembic)	✅
3 — Probing + enrichment (httpx PD)	✅
4 — LLM scoring (Ollama + Claude via tool-use)	✅
5 — Web UI (Jinja2 + HTMX + Alpine + compiled Tailwind)	✅
6 — Exports (JSON / CSV / TXT / Markdown)	✅
7 — Historical diffs with junction table	✅
8 — Docs + v0.1.0-alpha release	✅
9 — Webhook alerts on new high-scored findings	✅
10 — Wayback + Amass + AlienVault OTX enumerators	✅
11a — Screenshot capture per probe (Playwright, local storage)	✅
11b — Storage abstraction (S3-compatible) + thumbnails	✅
12 — HTTPBasic auth + Fly.io deploy (gru, persistent volume, auto-stop)	✅

Architecture notes

Enumerator Protocol + a registry — adding a new source is one file (see src/subsift/core/enumerators/crtsh.py for the smallest example).
Prober and LLMClient Protocols for the same reason — swap httpx for naabu, swap Ollama for OpenAI, no orchestrator changes.
ToolRunner abstraction so binaries can run native or via Docker without the wrappers caring.
Repository pattern so the CLI / API / UI never construct SQL — testable with an in-memory engine.
Junction table scan_subdomains so diffs are set operations, not heuristics over first_seen boundaries.
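The ToolRunner idea from the notes above can be sketched as two interchangeable command builders. The class and method names and the image mapping are hypothetical, and the Docker variant assumes each image's entrypoint is the tool itself; `-d` and `-silent` are standard subfinder flags.

```python
from typing import Protocol, Sequence


class ToolRunner(Protocol):
    """Assumed interface: turn a tool name and its args into an argv list."""

    def command(self, tool: str, args: Sequence[str]) -> list[str]: ...


class NativeRunner:
    """Runs the binary found on PATH."""

    def command(self, tool, args):
        return [tool, *args]


class DockerRunner:
    """Runs the tool from a container image (image names supplied by caller)."""

    def __init__(self, images):
        self.images = images

    def command(self, tool, args):
        return ["docker", "run", "--rm", self.images[tool], *args]


def subfinder_command(runner: ToolRunner, domain: str) -> list[str]:
    """The wrapper builds tool arguments without knowing how they will run."""
    return runner.command("subfinder", ["-d", domain, "-silent"])
```

The resulting list can be handed to `subprocess.run` unchanged, so the wrapper code is identical for both runners.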

Quality gates

Every push to main runs:

pre-commit run --all-files — ruff (lint + format), mypy --strict, detect-secrets, file hygiene.
pytest --cov on Python 3.11 and 3.12.
uvx pip-audit --strict over exported runtime deps — fails on any known CVE.
docker build --target runtime followed by a /health smoke-test inside the container.

Local: make check (POSIX) or scripts\tasks.ps1 check (Windows) reproduces lint + types + tests in one shot.

Legal

SubSift is for authorised security testing only — bug bounty programs, your own assets, contracted pentests, CTFs. Unauthorised scanning of third-party infrastructure may violate the Computer Fraud and Abuse Act (US), the Computer Misuse Act (UK), and equivalent legislation elsewhere. You are responsible for your use. Full terms in DISCLAIMER.md.

License

SubSift is licensed under the AGPL, a copyleft license: if you run a modified version as a network service, you must offer that modified source to its users. This keeps the free/core tier open while leaving room for a separately-licensed Pro tier.

Release history

0.1.0a6 (pre-release, this version): May 26, 2026
0.1.0a5 (pre-release): May 25, 2026

Download files

Source Distribution

subsift-0.1.0a6.tar.gz (542.3 kB), uploaded May 26, 2026, Source

Built Distribution

subsift-0.1.0a6-py3-none-any.whl (199.3 kB), uploaded May 26, 2026, Python 3

File details

Details for the file subsift-0.1.0a6.tar.gz.

File metadata

Download URL: subsift-0.1.0a6.tar.gz
Upload date: May 26, 2026
Size: 542.3 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for subsift-0.1.0a6.tar.gz
Algorithm	Hash digest
SHA256	`05151c7b1d182a4f8b3d5500e6bf24890b739e77cae75cf2fa77b3ad37621bf3`
MD5	`99bcdac8c1a98fbceb114fa08bfea0d7`
BLAKE2b-256	`8538beb6f720a8e289cd07c57a112b5c7d67d957a81e16e0ca6726c86f572a3e`


Provenance

The following attestation bundles were made for subsift-0.1.0a6.tar.gz:

Publisher: publish.yml on Ataraxia-ia-labs/Subsift

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: subsift-0.1.0a6.tar.gz
- Subject digest: 05151c7b1d182a4f8b3d5500e6bf24890b739e77cae75cf2fa77b3ad37621bf3
- Sigstore transparency entry: 1631843795
- Sigstore integration time: May 26, 2026
Source repository:
- Permalink: Ataraxia-ia-labs/Subsift@a1910e5823b6896e5e946fdf3798f669a2f6dcbb
- Branch / Tag: refs/tags/v0.1.0a6
- Owner: https://github.com/Ataraxia-ia-labs
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a1910e5823b6896e5e946fdf3798f669a2f6dcbb
- Trigger Event: release

File details

Details for the file subsift-0.1.0a6-py3-none-any.whl.

File metadata

Download URL: subsift-0.1.0a6-py3-none-any.whl
Upload date: May 26, 2026
Size: 199.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for subsift-0.1.0a6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`15b9f33ba2130f27bf43506ac263dd02c57854c42bac4ed99b7fa1e4b460d43d`
MD5	`f38c2baf67e5c271f9e9701b0351b8ff`
BLAKE2b-256	`de45f20d81a9c129eb9875a0a4b3678dd08e85a30d11251444cd76e05f327dc4`


Provenance

The following attestation bundles were made for subsift-0.1.0a6-py3-none-any.whl:

Publisher: publish.yml on Ataraxia-ia-labs/Subsift

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: subsift-0.1.0a6-py3-none-any.whl
- Subject digest: 15b9f33ba2130f27bf43506ac263dd02c57854c42bac4ed99b7fa1e4b460d43d
- Sigstore transparency entry: 1631843827
- Sigstore integration time: May 26, 2026
Source repository:
- Permalink: Ataraxia-ia-labs/Subsift@a1910e5823b6896e5e946fdf3798f669a2f6dcbb
- Branch / Tag: refs/tags/v0.1.0a6
- Owner: https://github.com/Ataraxia-ia-labs
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@a1910e5823b6896e5e946fdf3798f669a2f6dcbb
- Trigger Event: release
