doorknock

Detect a website's anti-bot defenses and rate how hard it is to scrape — from the command line or from Python.

doorknock performs a single set of HTTP probes against a target URL and runs a battery of detectors that look for:

- WAFs / CDNs: Cloudflare, Akamai, Sucuri, Imperva (Incapsula), F5 BIG-IP, AWS WAF, Fastly, Azure Front Door, StackPath, Wallarm, Reblaze, Barracuda, …
- Bot management: DataDome, PerimeterX / HUMAN, Kasada, Shape Security (F5), Imperva ABP (Distil), Arkose Labs, Reblaze, Radware, Netacea, …
- CAPTCHAs: Google reCAPTCHA / reCAPTCHA Enterprise, hCaptcha, Cloudflare Turnstile, Arkose FunCaptcha, GeeTest, DataDome captcha, custom image captchas, …
- JavaScript challenges: Cloudflare "Just a moment", Incapsula challenges, Akamai sensor cookies, Kasada KPSDK, PerimeterX interstitials, "checking your browser" pages, SPA / client-only rendering
- Rate-limit signals: RateLimit-*, X-RateLimit-*, Retry-After, hostile statuses (403/406/429/503)
- User-Agent filtering: compares a no-UA probe against a browser-UA probe
- Cookie/session requirements: large initial cookie sets, CSRF/XSRF tokens, __Host- / __Secure- cookies
- TLS / HTTP version: HTTPS, HTTP/2 (relevant to JA3/JA4 fingerprinting)
- robots.txt rules: Disallow: / for everyone, scraper-targeted user agents
- Client-side fingerprinting: FingerprintJS, Castle, Sift, Forter, ThreatMetrix, iovation, custom canvas/WebGL/audio probes
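
To make the idea of a detector concrete, here is a minimal sketch of how the rate-limit signals listed above could be read from a single response. It is an illustration only, not doorknock's actual code: the function name `rate_limit_signals` and the set `HOSTILE_STATUSES` are hypothetical, and the real detector may weigh these signals differently.

```python
from typing import Dict, List

# Statuses the list above describes as "hostile" (assumed to be checked verbatim).
HOSTILE_STATUSES = {403, 406, 429, 503}


def rate_limit_signals(status: int, headers: Dict[str, str]) -> List[str]:
    """Return human-readable rate-limit hints found in one HTTP response."""
    signals = []
    if status in HOSTILE_STATUSES:
        signals.append(f"hostile status {status}")
    for name, value in headers.items():
        lowered = name.lower()  # header names are case-insensitive
        if lowered.startswith(("ratelimit-", "x-ratelimit-")) or lowered == "retry-after":
            signals.append(f"{name}: {value}")
    return signals


print(rate_limit_signals(429, {"Retry-After": "30", "Content-Type": "text/html"}))
# ['hostile status 429', 'Retry-After: 30']
```

The function takes a plain status code and header mapping, so it can be fed from any HTTP client.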

Findings are weighted by severity and aggregated into a single scraping difficulty rating from EASY to EXTREME along with a 0–100 score.

Install

pip install doorknock

For colored, prettier CLI output you can also install the cli extras (uses Rich):

pip install "doorknock[cli]"

Requires Python 3.8+.

CLI

doorknock https://example.com

======================================================================
  Target:     https://example.com
  Final URL:  https://example.com/
  Status:     200
  Difficulty: EASY  (score 0/100)
======================================================================

  Looks easy to scrape. No meaningful anti-bot defenses were detected.
  A plain requests script with a polite User-Agent should work.

Useful flags:

| Flag | Purpose |
| --- | --- |
| `--json` | Emit machine-readable JSON. |
| `--timeout 10` | HTTP timeout in seconds. |
| `--no-verify` | Disable TLS verification. |
| `--no-robots` | Skip the robots.txt fetch. |
| `--no-color` | Disable ANSI colors in human output. |
| `--user-agent "..."` | Override the User-Agent for the main probe. |
| `--exit-code` | Exit non-zero when difficulty is `HARD` or worse (useful in CI). |

You can also invoke the installed module directly instead of going through the `doorknock` console script:

python -m doorknock https://example.com --json
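
If you want to drive the CLI from another Python program, a thin wrapper along these lines may be enough. This is a sketch under assumptions: `build_command` and `run_scan` are hypothetical helper names, only flags documented above are used, and the JSON is assumed to be a single object on stdout (its keys are not documented here, so the sketch does not rely on any).

```python
import json
import subprocess
import sys
from typing import List, Optional


def build_command(url: str, timeout: Optional[int] = None,
                  verify_tls: bool = True, check_robots: bool = True) -> List[str]:
    """Assemble a `python -m doorknock` invocation that emits JSON."""
    cmd = [sys.executable, "-m", "doorknock", url, "--json"]
    if timeout is not None:
        cmd += ["--timeout", str(timeout)]
    if not verify_tls:
        cmd.append("--no-verify")
    if not check_robots:
        cmd.append("--no-robots")
    return cmd


def run_scan(url: str, **options) -> dict:
    """Run the CLI and parse its stdout as JSON (requires doorknock installed).

    Assumption: the top-level JSON value is an object.
    """
    proc = subprocess.run(build_command(url, **options),
                          capture_output=True, text=True, check=True)
    return json.loads(proc.stdout)
```

Note that `--exit-code` is deliberately left out: with `check=True`, a non-zero exit for a `HARD` site would raise `CalledProcessError` before the JSON could be read.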

Library usage

from doorknock import scan

result = scan("https://example.com")

print(result.difficulty)            # Difficulty.EASY
print(result.score)                  # 0..100
print(result.summary)
for f in result.findings:
    print(f.severity.value, f.category.value, f.name)

The same data is available as a plain dict / JSON:

import json
print(json.dumps(result.to_dict(), indent=2))
# or
print(result.to_json())

For more control, use the class directly:

from doorknock import AntiBotScanner

scanner = AntiBotScanner(
    timeout=10,
    user_agent="my-bot/1.0",
    extra_headers={"Accept-Language": "en-GB,en;q=0.9"},
    check_robots=True,
    probe_no_user_agent=True,
)
result = scanner.scan("https://example.com")
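
Once you have a result, the findings can be post-processed like any other Python objects. The sketch below groups them by category using only the attributes shown above (`findings`, `category.value`, `severity.value`, `name`). The helper `findings_by_category` is hypothetical, and the demo uses stand-in objects with placeholder category and severity strings so it runs without network access; the real values may differ.

```python
from collections import defaultdict
from types import SimpleNamespace
from typing import Dict, List


def findings_by_category(result) -> Dict[str, List[str]]:
    """Group finding names (with severity) under their category value."""
    grouped = defaultdict(list)
    for f in result.findings:
        grouped[f.category.value].append(f"{f.name} ({f.severity.value})")
    return dict(grouped)


def _fake_finding(severity: str, category: str, name: str) -> SimpleNamespace:
    """Stand-in for a doorknock finding; category/severity strings are placeholders."""
    return SimpleNamespace(
        severity=SimpleNamespace(value=severity),
        category=SimpleNamespace(value=category),
        name=name,
    )


demo = SimpleNamespace(findings=[
    _fake_finding("high", "captcha", "hCaptcha"),
    _fake_finding("low", "rate_limit", "Retry-After header"),
    _fake_finding("medium", "captcha", "Cloudflare Turnstile"),
])
print(findings_by_category(demo))
```

With a real scan, pass the object returned by `scan(...)` or `scanner.scan(...)` instead of `demo`.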

Difficulty buckets

| Score | Difficulty | What it means |
| --- | --- | --- |
| 0–14 | `EASY` | Nothing meaningful in the way. Plain `requests` works. |
| 15–34 | `MODERATE` | Light defenses: UA filtering, rate limits, generic CDN. Use a session and realistic headers. |
| 35–59 | `HARD` | Real WAF, CAPTCHA, or JS challenges. Plan for a real browser or proxies. |
| 60–84 | `VERY_HARD` | Layered defenses (bot management + CAPTCHA / JS challenge). You will likely need an undetected browser plus residential proxies. |
| 85–100 | `EXTREME` | Top-tier bot management (DataDome, PerimeterX, Kasada, Shape, …). Expect a serious engineering project or commercial unblockers. |
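
The bucket boundaries above translate directly into a lookup. The following sketch mirrors the table; the function name `difficulty_for_score` is hypothetical and it returns plain strings rather than the library's `Difficulty` enum.

```python
# (upper bound inclusive, label), taken from the table above.
BUCKETS = [
    (14, "EASY"),
    (34, "MODERATE"),
    (59, "HARD"),
    (84, "VERY_HARD"),
    (100, "EXTREME"),
]


def difficulty_for_score(score: int) -> str:
    """Map a 0..100 score to its difficulty label."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for upper_bound, label in BUCKETS:
        if score <= upper_bound:
            return label
    raise AssertionError("unreachable: every valid score falls in a bucket")


print(difficulty_for_score(42))  # HARD
```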

How the scoring works

Every finding has a severity (info, low, medium, high, critical) which maps to a weight. Weights are summed per category and capped so one chatty detector cannot dominate the result. A small "synergy bonus" is added when multiple serious categories show up together (e.g. bot management + CAPTCHA + fingerprinting), because layered defenses are meaningfully harder than a single layer.
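
The scheme described above can be sketched in a few lines. Every number here is an assumption made for illustration: the actual weights, the per-category cap, the definition of a "serious" category, and the size of the synergy bonus are not documented in this README, and `score_findings` is a hypothetical name.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical values, chosen only to make the example concrete.
SEVERITY_WEIGHTS = {"info": 0, "low": 5, "medium": 10, "high": 20, "critical": 35}
CATEGORY_CAP = 40                  # max points a single category may contribute
SERIOUS = {"high", "critical"}     # severities that make a category "serious"
SYNERGY_BONUS = 10                 # added per extra serious category


def score_findings(findings: Iterable[Tuple[str, str]]) -> int:
    """Score (category, severity) pairs on a 0..100 scale."""
    per_category: Dict[str, int] = {}
    serious_categories = set()
    for category, severity in findings:
        per_category[category] = per_category.get(category, 0) + SEVERITY_WEIGHTS[severity]
        if severity in SERIOUS:
            serious_categories.add(category)
    # Cap each category so one chatty detector cannot dominate.
    total = sum(min(points, CATEGORY_CAP) for points in per_category.values())
    # Layered defenses: bonus when two or more serious categories co-occur.
    if len(serious_categories) >= 2:
        total += SYNERGY_BONUS * (len(serious_categories) - 1)
    return min(total, 100)


print(score_findings([("rate_limit", "low")] * 20))                          # 40
print(score_findings([("bot_management", "critical"), ("captcha", "high")]))  # 65
```

In the first call, twenty low-severity findings in one category sum to 100 but are capped at 40. In the second, two serious categories contribute 35 + 20 plus one synergy bonus of 10.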

What it does NOT do

- It does not attempt to bypass anything. It performs read-only HTTP requests (GET /, GET /robots.txt, plus a no-UA probe).
- It does not execute JavaScript. Findings come purely from headers, cookies, status codes, and the raw HTML body.
- It is a heuristic tool. Some defenses (TLS / JA3 fingerprinting, behavioral biometrics, server-side ML) cannot be observed directly in plain HTTP responses and are inferred from vendor signatures. False negatives are possible, especially for in-house systems.
- Detection is best-effort: real-world sites mix vendors and rebrand things constantly.
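
Because nothing is executed, detection of this kind comes down to matching known markers in the response. The sketch below shows the general shape of such a check. The marker tables are a small illustrative sample and not doorknock's real signature set: the body phrases come from the detector list earlier in this description, and the cookie names are ones commonly associated with those vendors.

```python
from typing import Dict, List

# Illustrative sample only; a real signature set would be far larger.
BODY_MARKERS = {
    "just a moment": "Cloudflare JS challenge",
    "checking your browser": "generic browser check",
}
COOKIE_MARKERS = {
    "__cf_bm": "Cloudflare bot management",
    "datadome": "DataDome",
    "_abck": "Akamai sensor cookie",
}


def match_signatures(html: str, cookies: Dict[str, str]) -> List[str]:
    """Return labels for every known marker seen in the body or cookie names."""
    hits = []
    lowered = html.lower()  # body phrases are matched case-insensitively
    for marker, label in BODY_MARKERS.items():
        if marker in lowered:
            hits.append(label)
    for cookie_name in cookies:
        label = COOKIE_MARKERS.get(cookie_name)
        if label:
            hits.append(label)
    return hits


print(match_signatures("<title>Just a moment...</title>", {"__cf_bm": "abc"}))
# ['Cloudflare JS challenge', 'Cloudflare bot management']
```

A site that renames its cookies or customizes its challenge page would slip past a table like this, which is one source of the false negatives mentioned above.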

Ethics & legality

Use this tool to evaluate sites you have permission to access, to assess your own infrastructure, or for security research. Respect robots.txt, terms of service, and applicable law in your jurisdiction.
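
One practical way to respect robots.txt is to check it before fetching a page. The sketch below uses the standard library's `urllib.robotparser` on robots.txt text you have already downloaded; `allowed_by_robots` is a hypothetical helper and is not part of doorknock.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


BLANKET_BLOCK = "User-agent: *\nDisallow: /\n"
PARTIAL_BLOCK = "User-agent: *\nDisallow: /private/\n"

print(allowed_by_robots(BLANKET_BLOCK, "my-bot/1.0", "https://example.com/"))      # False
print(allowed_by_robots(PARTIAL_BLOCK, "my-bot/1.0", "https://example.com/docs"))  # True
```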

License

MIT

Version 0.1.0, released Apr 27, 2026.
