Check whether AI agents can crawl a website. Filter URLs by agent-friendliness before sending them to your computer-use agent.
guestlist
Filter URLs by whether AI agents can crawl them — before sending them to your computer-use agent.
Most computer-use agents are still routinely blocked by Cloudflare, Akamai, DataDome, and friends. When your agent does a Google search and gets 10 results, the top 3 might all silently fail. guestlist lets you ask, in one call: which of these will actually let an agent in?
from guestlist import check
results = check([
    "https://example.com",
    "https://instagram.com",
    "https://en.wikipedia.org/wiki/Web_scraping",
])
for r in results:
    print(r.url, r.tier, r.success_rate)
https://example.com green 0.98
https://instagram.com red 0.04
https://en.wikipedia.org/... green 0.99
Install
pip install guestlist-tools
Quickstart
Set your API key (get one at guestlist.tools):
export GUESTLIST_API_KEY=gst_...
Then:
from guestlist import check, Tier
results = check(["https://example.com", "https://instagram.com"])
allowed = [r.url for r in results if r.tier in (Tier.GREEN, Tier.YELLOW)]
check() accepts up to 500 URLs per call and auto-batches them into requests of 100. Passing more than 500 raises ValueError. A bare string is treated as a single URL. An empty list returns an empty list with no HTTP request.
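If you have more than 500 URLs, one option is to slice the list yourself before calling check(). The following is a minimal sketch, not part of guestlist: `check_in_chunks` and its `check_fn` parameter are hypothetical names, and the only fact it relies on is the 500-URL per-call limit stated above. In real use you would pass `guestlist.check` as `check_fn`.

```python
from typing import Callable, Iterable, List, Sequence, TypeVar

R = TypeVar("R")

MAX_URLS_PER_CALL = 500  # per-call limit stated in this README


def check_in_chunks(
    urls: Iterable[str],
    check_fn: Callable[[Sequence[str]], List[R]],
    chunk_size: int = MAX_URLS_PER_CALL,
) -> List[R]:
    """Hypothetical helper: call check_fn on slices of at most chunk_size URLs."""
    if not 1 <= chunk_size <= MAX_URLS_PER_CALL:
        raise ValueError(f"chunk_size must be between 1 and {MAX_URLS_PER_CALL}")
    url_list = list(urls)
    results: List[R] = []
    # Walk the list in fixed-size windows and concatenate whatever check_fn returns.
    for start in range(0, len(url_list), chunk_size):
        results.extend(check_fn(url_list[start:start + chunk_size]))
    return results
```

Taking `check_fn` as a parameter keeps the helper easy to exercise with a fake in tests, without an API key or network access.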
What you get back
@dataclass(frozen=True)
class Result:
    url: str                          # echoed from the request
    domain: str                       # registrable domain the API matched against
    tier: Tier                        # green | yellow | orange | red | unknown
    success_rate: float | None        # successes / samples over the last 90 days
    n_samples: int                    # how many probes back that rate
    confidence: float                 # 0.0 to 1.0, scales with sample count
    blocker_detected: Blocker | None
    last_tested_at: datetime | None
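Because success_rate can be None and confidence scales with n_samples, you may want to gate on those fields before trusting a rate. Here is a rough sketch under stated assumptions: `SampledResult` is a local stand-in carrying a subset of the fields above (not the library's class), `rank_by_success` is a hypothetical helper, and the thresholds 0.5 and 20 are arbitrary illustrations, not recommended values.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class SampledResult:
    """Stand-in with a subset of the Result fields; not the library's class."""
    url: str
    success_rate: Optional[float]
    n_samples: int
    confidence: float


def rank_by_success(
    results: Iterable[SampledResult],
    min_confidence: float = 0.5,  # arbitrary illustrative threshold
    min_samples: int = 20,        # arbitrary illustrative threshold
) -> List[SampledResult]:
    """Keep results with a usable rate, then sort best-first."""
    well_sampled = [
        r for r in results
        if r.success_rate is not None
        and r.confidence >= min_confidence
        and r.n_samples >= min_samples
    ]
    return sorted(well_sampled, key=lambda r: r.success_rate, reverse=True)
```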
Tier bands
| Tier | What it means |
|---|---|
| green | Agents work reliably. |
| yellow | Agents usually work; expect some friction. |
| orange | Agents are often blocked. |
| red | Agents are almost always blocked. |
| unknown | Not enough data yet. |
About unknown
The dataset doesn't cover every domain on the web yet. You'll see tier="unknown" for sites we haven't probed enough times to be confident. Treat it as "no signal" rather than "safe to crawl." Coverage is expanding.
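One way to honor "no signal" in practice is to give unknown its own bucket instead of folding it into either the allow list or the block list. The sketch below uses local stand-ins (`StubTier`, `TieredResult`) rather than the library's types, and `triage` is a hypothetical helper; the member names beyond GREEN and YELLOW are assumed from the tier table above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, Iterable, List


class StubTier(str, Enum):
    """Stand-in for the library's Tier; member names are assumptions."""
    GREEN = "green"
    YELLOW = "yellow"
    ORANGE = "orange"
    RED = "red"
    UNKNOWN = "unknown"


@dataclass(frozen=True)
class TieredResult:
    """Stand-in with only the fields this sketch needs."""
    url: str
    tier: StubTier


def triage(results: Iterable[TieredResult]) -> Dict[str, List[str]]:
    """Split URLs into send / skip / no_signal buckets."""
    buckets: Dict[str, List[str]] = {"send": [], "skip": [], "no_signal": []}
    for r in results:
        if r.tier in (StubTier.GREEN, StubTier.YELLOW):
            buckets["send"].append(r.url)
        elif r.tier == StubTier.UNKNOWN:
            # No data either way: leave the decision to the caller.
            buckets["no_signal"].append(r.url)
        else:
            buckets["skip"].append(r.url)
    return buckets
```

What to do with the no_signal bucket (try anyway, deprioritize, or drop) is a policy choice for your agent.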
How URLs are matched
Each URL is resolved to its registrable domain (e.g. https://api.x.com/users/123 → x.com), and the verdict is for that domain's apex page. The path, query, fragment, and subdomain are not currently part of the lookup. The Result.domain field shows what we matched against.
For example: instagram.com is apex-blocked (hard login wall) even though deep public paths like instagram.com/user/p/<post_id> may load fine. Per-path tiering is on the v2 roadmap.
Different effective TLDs are distinguished: bbc.co.uk is not the same as bbc.com.
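To make the matching rule concrete, here is a toy illustration of reducing a URL to its registrable domain. It is not how the API is implemented (the README does not say): `registrable_domain` is a hypothetical function, and `TOY_SUFFIXES` is a three-entry stand-in for the Public Suffix List, which a real implementation would need in full.

```python
from urllib.parse import urlsplit

# Toy stand-in for the Public Suffix List; deliberately incomplete.
TOY_SUFFIXES = {"com", "org", "co.uk"}


def registrable_domain(url: str) -> str:
    """Return the known suffix plus one label to its left (toy version)."""
    host = (urlsplit(url).hostname or "").lower().rstrip(".")
    labels = host.split(".")
    # Try the longest candidate suffix first so "co.uk" wins over a bare "uk".
    for i in range(len(labels)):
        suffix = ".".join(labels[i:])
        if suffix in TOY_SUFFIXES:
            if i == 0:
                raise ValueError(f"host {host!r} is itself a public suffix")
            return ".".join(labels[i - 1:])
    raise ValueError(f"no known suffix for host {host!r}")


print(registrable_domain("https://api.x.com/users/123"))  # x.com
print(registrable_domain("https://www.bbc.co.uk/news"))   # bbc.co.uk
```

The path, query, and subdomain are discarded, which mirrors the lookup behavior described above.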
Configuration
| Setting | Constructor arg | Env var | Default |
|---|---|---|---|
| API key | api_key= | GUESTLIST_API_KEY | (required) |
| Base URL | base_url= | GUESTLIST_API_BASE | https://api.guestlist.tools |
| Timeout | timeout= | (none) | 30.0 seconds |
For finer control, instantiate Guestlist directly:
from guestlist import Guestlist
with Guestlist(api_key="gst_...", timeout=10.0) as gl:
    results = gl.check(["https://example.com", "https://instagram.com"])
Errors
All errors inherit from GuestlistError:
| Class | When |
|---|---|
| ConfigError | Missing API key or malformed base URL. |
| AuthenticationError | API returned 401 (bad key). |
| RateLimitError | API returned 429, retry exhausted. .retry_after may be set. |
| APIError | Other 4xx/5xx after any retries. .status_code, .detail. |
| NetworkError | Connection or timeout error after retry. |
Retry policy:
- 429: respects Retry-After, one retry.
- 5xx: exponential backoff (250 ms, 1 s), two retries.
- Network / timeout: one retry.
- 4xx other than 429: no retry.
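The policy above can be sketched as a small loop. This is an illustration of the listed rules, not the library's code: `call_with_retries`, the `send` callable, and the `(status, retry_after)` tuple shape are all hypothetical, and keeping a separate retry budget per failure type is an assumption the README does not confirm.

```python
import time
from typing import Callable, Optional, Tuple

# Hypothetical response shape: (status_code, Retry-After in seconds or None).
Response = Tuple[int, Optional[float]]

BACKOFF_5XX = (0.25, 1.0)  # 250 ms, then 1 s: two retries


def call_with_retries(
    send: Callable[[], Response],
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Apply the listed retry rules and return the final response."""
    retries_429 = retries_5xx = retries_net = 0
    while True:
        try:
            status, retry_after = send()
        except (ConnectionError, TimeoutError):
            if retries_net >= 1:
                raise  # network / timeout: one retry only
            retries_net += 1
            continue
        if status == 429 and retries_429 < 1:
            retries_429 += 1
            sleep(retry_after or 0.0)  # honor Retry-After when present
            continue
        if 500 <= status < 600 and retries_5xx < len(BACKOFF_5XX):
            sleep(BACKOFF_5XX[retries_5xx])
            retries_5xx += 1
            continue
        # Success, other 4xx, or retries exhausted: hand back as-is.
        return status, retry_after
```

Injecting `sleep` lets tests record the waits instead of actually pausing.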
Status
guestlist is a small public API on top of an ongoing data-collection effort. The library surface is stable; the underlying dataset is growing. If you hit a domain you'd like covered, open an issue.
License
MIT.
File details
Details for the file guestlist_tools-0.2.0.tar.gz.
File metadata
- Download URL: guestlist_tools-0.2.0.tar.gz
- Upload date:
- Size: 14.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ffc07c868a404075826b3a0d1f547aa11b62c0056015265d44d9f6a7962ec214 |
| MD5 | 53fe45b6b84caf8dd829b30237a9bb7c |
| BLAKE2b-256 | 69f3681e95b6155e9796109c5815394ac10179e9a90a56280d806315ae206fb8 |
File details
Details for the file guestlist_tools-0.2.0-py3-none-any.whl.
File metadata
- Download URL: guestlist_tools-0.2.0-py3-none-any.whl
- Upload date:
- Size: 11.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.18
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 186dfe0843f7760ff050f963579f1b629deacd729c8acef147919dbf36fc4558 |
| MD5 | ee7d8f75e79c978691d5cd64e8d2158c |
| BLAKE2b-256 | f1da5949aa133a4398c4819f52eb474b76818723889d38346aa241b81eef38e3 |