guestlist

Filter URLs by whether AI agents can crawl them — before sending them to your computer-use agent.

Most computer-use agents are still routinely blocked by Cloudflare, Akamai, DataDome, and friends. When your agent does a Google search and gets 10 results, the top 3 might all silently fail. guestlist lets you ask, in one call: which of these will actually let an agent in?

from guestlist import check

results = check([
    "https://example.com",
    "https://instagram.com",
    "https://en.wikipedia.org/wiki/Web_scraping",
])

for r in results:
    print(r.url, r.tier, r.success_rate)

Example output:

https://example.com           green   0.98
https://instagram.com         red     0.04
https://en.wikipedia.org/...  green   0.99

Install

pip install guestlist-tools

Quickstart

Set your API key (get one at guestlist.tools):

export GUESTLIST_API_KEY=gst_...

Then:

from guestlist import check, Tier

results = check(["https://example.com", "https://instagram.com"])

allowed = [r.url for r in results if r.tier in (Tier.GREEN, Tier.YELLOW)]

check() accepts up to 500 URLs per call and auto-batches them into requests of 100. Passing more than 500 raises ValueError. A bare string is treated as a single URL. An empty list returns an empty list with no HTTP request.
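
For input lists longer than the 500-URL limit, one option is to split the list before calling check(). The following is a minimal sketch that assumes only the limit stated above. `chunked` and `check_all` are hypothetical helpers, not part of the library, and `check_fn` stands in for guestlist's check() so the helper can be exercised without network access.

```python
from typing import Callable, Iterator, List, Sequence

MAX_URLS_PER_CALL = 500  # limit documented above for a single check() call


def chunked(urls: Sequence[str], size: int = MAX_URLS_PER_CALL) -> Iterator[List[str]]:
    """Yield consecutive slices of `urls`, each at most `size` items long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(urls), size):
        yield list(urls[start:start + size])


def check_all(urls: Sequence[str], check_fn: Callable[[List[str]], list]) -> list:
    """Call `check_fn` once per chunk and concatenate the results in input order.

    `check_fn` is a stand-in for guestlist's check(); it is passed in
    explicitly so this sketch does not depend on the library being installed.
    """
    results: list = []
    for chunk in chunked(urls):
        results.extend(check_fn(chunk))
    return results
```

With the library installed, `check_all(urls, check)` would hand each chunk of at most 500 URLs to check(), which then does its own batching into requests of 100.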

What you get back

@dataclass(frozen=True)
class Result:
    url: str                    # echoed from the request
    domain: str                 # registrable domain the API matched against
    tier: Tier                  # green | yellow | orange | red | unknown
    success_rate: float | None  # successes / samples over the last 90 days
    n_samples: int              # how many probes back that rate
    confidence: float           # 0.0 to 1.0, scales with sample count
    blocker_detected: Blocker | None
    last_tested_at: datetime | None

Tier bands

Tier      What it means
green     Agents work reliably.
yellow    Agents usually work; expect some friction.
orange    Agents are often blocked.
red       Agents are almost always blocked.
unknown   Not enough data yet.

About unknown

The dataset doesn't cover every domain on the web yet. You'll see tier="unknown" for sites we haven't probed enough times to be confident. Treat it as "no signal" rather than "safe to crawl." Coverage is expanding.
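
One way to act on the tiers while keeping unknown as "no signal" is a three-way decision rather than a plain allow/deny filter. This is a sketch, not library behavior. The `Tier` enum here is a local stand-in that mirrors the documented tier names, `decide` is a hypothetical helper, and the 0.5 confidence cutoff is an arbitrary assumption.

```python
from enum import Enum


class Tier(str, Enum):
    # Local stand-in mirroring the documented tier names; the real enum
    # is the one exported by guestlist.
    GREEN = "green"
    YELLOW = "yellow"
    ORANGE = "orange"
    RED = "red"
    UNKNOWN = "unknown"


def decide(tier: Tier, confidence: float, min_confidence: float = 0.5) -> str:
    """Return "visit", "skip", or "probe" for one result.

    "probe" means there is no usable signal: the tier is unknown or the
    confidence is below the (assumed) cutoff, so the caller should fall
    back to its own policy, such as a single attempt with a short timeout.
    """
    if tier is Tier.UNKNOWN or confidence < min_confidence:
        return "probe"
    if tier in (Tier.GREEN, Tier.YELLOW):
        return "visit"
    return "skip"  # orange and red: often or almost always blocked
```

Keeping "probe" separate from "visit" avoids silently treating uncovered domains as safe to crawl.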

How URLs are matched

Each URL is resolved to its registrable domain (e.g. https://api.x.com/users/123 → x.com), and the verdict is for that domain's apex page. The path, query, fragment, and subdomain are not currently part of the lookup. The Result.domain field shows what we matched against.

For example: instagram.com is apex-blocked (hard login wall) even though deep public paths like instagram.com/user/p/<post_id> may load fine. Per-path tiering is on the v2 roadmap.

Different effective TLDs are distinguished: bbc.co.uk is not the same as bbc.com.
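
To make the matching rule concrete, here is a toy illustration of reducing a URL to a registrable domain. It is not the API's implementation: real matching needs the full Public Suffix List, whereas this sketch hard-codes three suffixes, and `registrable_domain` is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Toy suffix set for illustration only; a real implementation would
# consult the full Public Suffix List.
TOY_SUFFIXES = {"com", "org", "co.uk"}


def registrable_domain(url: str) -> str:
    """Return the known suffix plus one label to its left, e.g. "x.com"."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    # Scan from the longest candidate suffix to the shortest, so that
    # "co.uk" is tried before a shorter suffix could match.
    for i in range(len(labels)):
        if ".".join(labels[i:]) in TOY_SUFFIXES:
            if i == 0:
                return host  # the host is itself a bare suffix
            return ".".join(labels[i - 1:])
    return host  # suffix not in the toy set: return the host unchanged


print(registrable_domain("https://api.x.com/users/123"))  # x.com
print(registrable_domain("https://www.bbc.co.uk/news"))   # bbc.co.uk
print(registrable_domain("https://www.bbc.com/news"))     # bbc.com
```

Because the verdict is per domain, URLs that reduce to the same registrable domain will receive the same tier regardless of path or subdomain.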

Configuration

Setting    Constructor arg   Env var              Default
API key    api_key=          GUESTLIST_API_KEY    (required)
Base URL   base_url=         GUESTLIST_API_BASE   https://api.guestlist.tools
Timeout    timeout=          (none listed)        30.0 seconds

For finer control, instantiate Guestlist directly:

from guestlist import Guestlist

urls = ["https://example.com", "https://instagram.com"]

with Guestlist(api_key="gst_...", timeout=10.0) as gl:
    results = gl.check(urls)

Errors

All errors inherit from GuestlistError:

Class                 When
ConfigError           Missing API key or malformed base URL.
AuthenticationError   API returned 401 (bad key).
RateLimitError        API returned 429 and the retry was exhausted. .retry_after may be set.
APIError              Other 4xx/5xx after any retries. Exposes .status_code and .detail.
NetworkError          Connection or timeout error after retry.

Retry policy:

  • 429: respects Retry-After, one retry.
  • 5xx: exponential backoff (250 ms, 1 s), two retries.
  • Network / timeout: one retry.
  • 4xx other than 429: no retry.
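
The retry policy above can be written down as a small decision function. This sketch only restates the documented schedule and is not the library's code. `next_delay` is a hypothetical name, and two values are assumptions because the text does not specify them: the 1.0 s fallback when a 429 has no Retry-After, and the zero delay before a network retry.

```python
from typing import Optional


def next_delay(
    status: Optional[int],
    attempt: int,
    retry_after: Optional[float] = None,
) -> Optional[float]:
    """Seconds to wait before retry number `attempt` (0-based), or None to give up.

    `status` is the HTTP status code, or None for a network/timeout failure.
    """
    if status is None:
        # Network / timeout: one retry. The delay is not documented;
        # 0.0 is an assumption.
        return 0.0 if attempt < 1 else None
    if status == 429:
        # One retry, honoring Retry-After. The 1.0 s fallback is an assumption.
        if attempt >= 1:
            return None
        return retry_after if retry_after is not None else 1.0
    if 500 <= status <= 599:
        # Two retries with exponential backoff: 0.25 s, then 1.0 s.
        return 0.25 * 4 ** attempt if attempt < 2 else None
    return None  # any other status, including 4xx other than 429: no retry
```

For example, a 503 yields delays of 0.25 s and 1.0 s and then gives up, while a 404 gives up immediately.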

Status

guestlist is a small public API on top of an ongoing data-collection effort. The library surface is stable; the underlying dataset is growing. If you hit a domain you'd like covered, open an issue.

License

MIT.
