is-crawler

Fast, regex-free crawler detection from user agents. Zero deps, ReDoS-safe heuristics, ~40× faster than alternatives.

Docs & live demo: is-crawler.tn3w.dev

Why regex-free?

Regex is a frequent source of ReDoS vulnerabilities: a single un-anchored .* or nested quantifier, fed a hostile UA, can spike CPU time to seconds. Crawler detection runs on every request, so a catastrophic pattern is a denial-of-service primitive. is-crawler implements all heuristics with str.find plus character scans: no regex engine, no backtracking, no ReDoS surface. crawler_info uses re only to match against curated DB patterns (monperrus/crawler-user-agents), which are simple literals (e.g. Googlebot\/, bingbot, AdsBot-Google([^-]|$), [wW]get) with no nested quantifiers and no catastrophic backtracking paths.
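
The core idea, as a minimal sketch (illustrative only, not the library's actual code):

# Linear-time substring checks instead of a regex engine: a hostile UA
# can only slow this down linearly in its own length.
BOT_KEYWORDS = ("bot", "crawl", "spider", "scrape", "headless")

def has_bot_keyword(ua: str) -> bool:
    low = ua.lower()
    return any(keyword in low for keyword in BOT_KEYWORDS)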

Install

pip install is-crawler

Usage

from is_crawler import (
    is_crawler, crawler_signals, crawler_info, crawler_has_tag,
    crawler_name, crawler_version, crawler_url, CrawlerInfo,
)

ua = "Googlebot/2.1 (+http://www.google.com/bot.html)"

is_crawler(ua)                              # True
crawler_signals(ua)                         # ['bot_signal', 'no_browser_signature', 'url_in_ua']
crawler_name(ua)                            # 'Googlebot'
crawler_version(ua)                         # '2.1'
crawler_url(ua)                             # 'http://www.google.com/bot.html'

info = crawler_info(ua)                     # CrawlerInfo(...)
if info is not None:
    info.url                                # 'http://www.google.com/bot.html'
    info.description                        # "Google's main web crawling bot..."
    info.tags                               # ('search-engine',)

crawler_has_tag(ua, "search-engine")        # True
crawler_has_tag(ua, ["ai-crawler", "seo"])  # False

API

is_crawler(ua: str) -> bool

Heuristic detection. Returns True if the UA is a crawler. No DB lookup, no regex.

Three short-circuit rules (sketched in code after the list):

  1. Positive signal: bot keywords (bot, crawl, spider, scrape, headless, slurp, archiv, preview, ...), known tools (playwright, selenium, wget, lighthouse, sqlmap, nikto, nmap, httrack, pingdom, google-safety, ...), or a URL/email embedded in the UA.
  2. No browser signature: none of Mozilla/, WebKit, Gecko, Trident, Presto, KHTML, Links, Lynx, Opera, and no OS token like (Windows, (Linux, (X11, (Macintosh.
  3. Bare (compatible; ...): classic bot block without OS/browser tokens inside.
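
A hypothetical sketch of these rules (token lists abridged; the real implementation also checks known tools and embedded URLs/emails):

BOT_WORDS = ("bot", "crawl", "spider", "scrape", "headless", "slurp")
BROWSER_TOKENS = ("mozilla/", "webkit", "gecko", "trident", "presto",
                  "khtml", "links", "lynx", "opera",
                  "(windows", "(linux", "(x11", "(macintosh")
OS_TOKENS = ("windows", "linux", "x11", "macintosh")

def is_crawler_sketch(ua: str) -> bool:
    low = ua.lower()
    if any(w in low for w in BOT_WORDS):              # rule 1: positive signal
        return True
    if not any(t in low for t in BROWSER_TOKENS):     # rule 2: no browser signature
        return True
    start = low.find("(compatible;")                  # rule 3: bare (compatible; ...)
    if start != -1:
        end = low.find(")", start)
        inner = low[start:end] if end != -1 else low[start:]
        if not any(os_tok in inner for os_tok in OS_TOKENS):
            return True
    return False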

crawler_signals(ua: str) -> list[str]

Which individual rules fired. Subset of: bot_signal, no_browser_signature, bare_compatible, known_tool, url_in_ua. Useful for diagnostics and logging. is_crawler does not call this.
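
For example, a diagnostics hook might look like this (logger name and setup are illustrative):

import logging

from is_crawler import crawler_signals

logger = logging.getLogger("crawler-diagnostics")

def log_ua(ua: str) -> None:
    signals = crawler_signals(ua)
    if signals:
        logger.info("UA %r fired: %s", ua, ", ".join(signals))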

crawler_name(ua: str) -> str | None

Product name extracted from the UA.

  • Googlebot/2.1 ... → 'Googlebot'
  • Mozilla/5.0 (compatible; bingbot/2.0; ...) → 'bingbot'
  • Mozilla/5.0 ... Speedy Spider (...) → 'Speedy Spider'
  • Chrome/Firefox/Safari → None

crawler_version(ua: str) -> str | None

Version token extracted from the UA. Returns None if no non-browser version is detectable.

  • curl/7.64.1 → '7.64.1'
  • Mozilla/5.0 (compatible; Miniflux/2.0.10; ...) → '2.0.10'
  • Googlebot/2.1 ... → '2.1'

crawler_url(ua: str) -> str | None

URL embedded in the UA (after +, ;, or -).

  • Googlebot/2.1 (+http://www.google.com/bot.html) → 'http://www.google.com/bot.html'
  • UA with no embedded URL → None

crawler_info(ua: str) -> CrawlerInfo | None

DB lookup against 646 known crawler patterns. Returns None for browsers (short-circuits via is_crawler).

class CrawlerInfo(NamedTuple):
    url: str                # crawler's info/docs URL (may be '')
    description: str        # human-readable description
    tags: tuple[str, ...]   # classification tags, e.g. ('search-engine',)

crawler_has_tag(ua: str, tags: str | Iterable[str]) -> bool

True if the crawler has any of the given tags. tags accepts a single string or an iterable of strings.

Available tags: search-engine, ai-crawler, seo, social-preview, advertising, archiver, feed-reader, monitoring, scanner, academic, http-library, browser-automation.

Category shortcuts

One-tag wrappers over crawler_has_tag:

is_search_engine(ua)       # 'search-engine'
is_ai_crawler(ua)          # 'ai-crawler'
is_seo(ua)                 # 'seo'
is_social_preview(ua)      # 'social-preview'
is_advertising(ua)         # 'advertising'
is_archiver(ua)            # 'archiver'
is_feed_reader(ua)         # 'feed-reader'
is_monitoring(ua)          # 'monitoring'
is_scanner(ua)             # 'scanner'
is_academic(ua)            # 'academic'
is_http_library(ua)        # 'http-library'
is_browser_automation(ua)  # 'browser-automation'

is_good_crawler(ua) / is_bad_crawler(ua)

Opinionated groupings for quick allow/deny gates.

  • Good (indexing, previews, archives, feeds, research): search-engine, social-preview, feed-reader, archiver, academic.
  • Bad (scraping, scanning, unattributed traffic): ai-crawler, scanner, http-library, browser-automation, seo.

advertising and monitoring are intentionally in neither group: whether to allow them is policy-dependent.
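
For example:

from is_crawler import is_good_crawler, is_bad_crawler

ua = "Googlebot/2.1 (+http://www.google.com/bot.html)"
is_good_crawler(ua)  # True  ('search-engine' is in the good set)
is_bad_crawler(ua)   # False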

Middleware

# Flask shown for illustration; any framework's request hook works.
from flask import Flask, abort, request

from is_crawler import is_crawler, crawler_has_tag

app = Flask(__name__)

@app.before_request
def gate():
    ua = request.headers.get("User-Agent", "")
    if crawler_has_tag(ua, "ai-crawler"):  # deny AI crawlers outright
        abort(403)
    if is_crawler(ua):
        log_crawler(ua)  # your own logging hook

robots.txt helpers

Generate robots.txt directives from DB tags. User-agent names are extracted from DB patterns (slash/URL-only entries are skipped).

from is_crawler import build_robots_txt, robots_agents_for_tags, iter_crawlers

robots_agents_for_tags("ai-crawler")
# ['AI2Bot', 'Applebot-Extended', 'Bytespider', 'CCBot', 'ChatGPT-User', 'Claude-Web', 'GPTBot', ...]

print(build_robots_txt(disallow=["ai-crawler", "scanner"]))
# User-agent: GPTBot
# Disallow: /
#
# User-agent: Nikto
# Disallow: /
# ...

build_robots_txt(allow="search-engine", path="/public")
# User-agent: Googlebot
# Allow: /public
# ...

for info, name in iter_crawlers():      # (CrawlerInfo, robots-name) per DB entry
    ...

CLI

python -m is_crawler "Googlebot/2.1 (+http://www.google.com/bot.html)"
tail -f access.log | awk -F'"' '{print $6}' | python -m is_crawler

Prints one JSON object per UA (argument or stdin line) with keys is_crawler, name, version, url, signals, and info.
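
For the first command above, the output looks roughly like this (field layout is an assumption based on the keys listed; exact formatting may differ):

{"is_crawler": true, "name": "Googlebot", "version": "2.1", "url": "http://www.google.com/bot.html", "signals": ["bot_signal", "no_browser_signature", "url_in_ua"], "info": {"url": "http://www.google.com/bot.html", "description": "...", "tags": ["search-engine"]}}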

Caching

Every public function has a 32k-entry LRU cache. Repeat UAs hit in ~40 ns.
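
If the caches are plain functools.lru_cache wrappers (an assumption; the docs don't say), the standard introspection applies:

# Assumption: public functions are wrapped with functools.lru_cache,
# which would expose hit/miss counters via cache_info().
from is_crawler import is_crawler

is_crawler("Googlebot/2.1")
is_crawler("Googlebot/2.1")     # second call served from the cache
print(is_crawler.cache_info())  # e.g. CacheInfo(hits=1, misses=1, maxsize=32768, currsize=1)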

Benchmarks

Python 3.14, Linux x86_64. Corpus: 1,231 crawler UAs, 15,812 browser UAs. cua = crawler-user-agents v1.44.

Hot-path (warm cache)

Function            is_crawler   cua        speedup
is_crawler (mixed)  0.05 µs      158.9 µs   3000×
crawler_info        0.60 µs      732.0 µs   1220×
crawler_signals     1.13 µs      -          -
crawler_name        0.33 µs      -          -
crawler_version     0.32 µs      -          -
crawler_url         0.09 µs      -          -
crawler_has_tag     0.10 µs      -          -

Cold-cache (per-call, no LRU hits)

Function         Test case   is_crawler   cua         speedup
is_crawler       crawlers    1.94 µs      64.35 µs    33×
is_crawler       browsers    1.85 µs      183.76 µs   99×
is_crawler       mixed       1.85 µs      176.94 µs   96×
crawler_info     -           2.07 µs      733.4 µs    354×
crawler_name     -           1.36 µs      -           -
crawler_version  -           1.37 µs      -           -
crawler_url      -           0.29 µs      -           -

Cold-start

Module               Cold-start
is_crawler           1.29 ms
crawler-user-agents  0.80 ms

DB patterns are compiled lazily, one 48-entry chunk at a time, on first match.
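
A hypothetical sketch of that chunked lazy compilation (names and structure are assumptions, not the library's code):

import re

CHUNK_SIZE = 48                      # per the note above
_compiled: dict[int, re.Pattern] = {}

def _chunk_pattern(patterns: list[str], i: int) -> re.Pattern:
    # Compile a 48-entry alternation only when a lookup first reaches
    # this chunk, keeping cold-start cost low.
    if i not in _compiled:
        start = i * CHUNK_SIZE
        _compiled[i] = re.compile("|".join(patterns[start:start + CHUNK_SIZE]))
    return _compiled[i]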

Formatting

pip install black isort
isort . && black .
npx prtfm

License

Apache-2.0
