Domain allowlist, per-domain rate caps, response audit, and a Streamlit dashboard for Bright Data scraping agents. The picks-and-shovels layer for any agent that consumes the live web.

birddog

Audited Bright Data egress for AI agents. Drop one context manager around an agent that scrapes the web and you get:

Domain allowlist — deny everything outside it, log the attempt
Per-domain rate caps — simple token bucket per host
Response audit log — one JSONL line per fetch (url, status, bytes, ms)
Bright Data Web Unlocker proxy — opt-in: route via Bright Data
Streamlit dashboard — point it at the JSONL, get per-host bytes, denial counts, latency p50

Built for the kind of agent that hits live sites: research bots, price trackers, RAG ingest jobs. If you've ever watched an agent rip through a sponsor's free tier in 30 seconds, this is for you.

Install

pip install birddog                    # core
pip install "birddog[dashboard]"       # + Streamlit dashboard

Requires Python 3.10 or newer.

Why

LLM agents don't know what a sane scraping cadence looks like. They'll hammer a site, ignore robots.txt, follow links into spammy subdomains, and burn through a Bright Data quota in a single run.

birddog puts a leash on the egress side:

Concern	What birddog does
Wandering off-domain	Allowlist with `example.com` + `*.example.com`
Burst scraping	Token bucket per host (qps + burst)
"What did it fetch?"	JSONL audit log, one event per fetch
Anti-bot blocks	Optional Bright Data Web Unlocker proxy
Post-run review	Bundled Streamlit dashboard

It does not parse HTML, manage cookies, render JS, or rotate user agents. That's what Bright Data + your scraping code are for.
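To make the allowlist row concrete, here is a minimal sketch of how `example.com` and `*.example.com` entries could be matched against a URL's host. The helper name `host_allowed` and the wildcard semantics (a `*.` entry matches subdomains only, not the bare domain) are assumptions for illustration; birddog's real matching rules may differ.

```python
from urllib.parse import urlsplit


def host_allowed(url: str, allowed: set[str]) -> bool:
    """Return True if the URL's host matches an exact or '*.' wildcard entry.

    Hypothetical helper, not part of birddog's API.
    """
    host = (urlsplit(url).hostname or "").lower()
    if not host:
        return False
    for pattern in allowed:
        pattern = pattern.lower()
        if pattern.startswith("*."):
            # '*.example.com' matches 'a.example.com' but not 'example.com'
            if host.endswith(pattern[1:]):
                return True
        elif host == pattern:
            return True
    return False


allowed = {"docs.brightdata.com", "*.example.com"}
print(host_allowed("https://docs.brightdata.com/api", allowed))   # True
print(host_allowed("https://shop.example.com/item/1", allowed))   # True
print(host_allowed("https://evil.example/exfil", allowed))        # False
```

Under this reading, covering both a domain and its subdomains takes two entries, which is why the table lists `example.com` and `*.example.com` together.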

Usage

from birddog import Birddog

bd = Birddog(
    allowed_domains={"docs.brightdata.com", "*.example.com"},
    per_domain_qps=1.0,
    per_domain_burst=2.0,
    audit_path="runs/scrape.jsonl",
    # Optional — route through Bright Data Web Unlocker:
    bright_data={
        "host": "brd.superproxy.io:33335",
        "username": "brd-customer-...-zone-web_unlocker",
        "password": "...",
    },
)

with bd.session("research-bot") as s:
    r = s.fetch("https://docs.brightdata.com/api")
    print(r.status, r.bytes_len, "bytes")

    # rapid repeat hits -> RateLimitedError once the bucket is empty (qps=1, burst=2)
    s.fetch("https://docs.brightdata.com/pricing")

    # off-allowlist -> DomainDeniedError, also logged
    s.fetch("https://evil.example/exfil")

FetchResult carries url, status, text, headers, bytes_len, elapsed_ms, and a via_brightdata flag, so downstream code can tell whether the response came through the proxy.
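For readers unfamiliar with how `per_domain_qps` and `per_domain_burst` usually interact, the sketch below shows a textbook token bucket keyed by host. The `TokenBucket` class, its `try_acquire` method, and the choice to start each bucket full are assumptions made for illustration; birddog's own bucket may be initialized or refilled differently, so treat the numbers as indicative only.

```python
import time
from collections import defaultdict


class TokenBucket:
    """Textbook token bucket: refills at `rate` tokens/sec up to `burst`."""

    def __init__(self, rate: float, burst: float, clock=time.monotonic):
        self.rate = rate
        self.burst = burst
        self.clock = clock
        self.tokens = burst          # start full (an assumption)
        self.updated = clock()

    def try_acquire(self) -> bool:
        now = self.clock()
        # Credit tokens earned since the last call, capped at the burst size.
        self.tokens = min(self.burst, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


# One bucket per host, created lazily on first use.
buckets = defaultdict(lambda: TokenBucket(rate=1.0, burst=2.0))

host = "docs.brightdata.com"
# Three back-to-back attempts against the same host.
print([buckets[host].try_acquire() for _ in range(3)])
```

In this sketch the burst value bounds how many requests can go out back to back, while the rate sets how quickly the allowance recovers afterwards.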

Audit log

One JSON object per line, e.g.:

{"ts":1747779600.12,"session_id":"research-bot","kind":"fetch_ok",
 "url":"https://docs.brightdata.com/api","host":"docs.brightdata.com",
 "status":200,"bytes":4221,"elapsed_ms":312.4}
{"ts":1747779600.45,"session_id":"research-bot","kind":"domain_denied",
 "url":"https://evil.example/exfil","host":"evil.example",
 "error":"host 'evil.example' not in allowlist"}

Kinds: session_open, fetch_ok, fetch_failed, domain_denied, rate_limited, session_close.
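Because each event is a standalone JSON object, the log can be read with the standard library alone. The sketch below tallies events by `kind`; the helper name `count_kinds` is hypothetical, and it assumes every event occupies a single physical line in the file (the sample above appears wrapped for readability).

```python
import io
import json
from collections import Counter


def count_kinds(lines) -> Counter:
    """Tally audit events by their 'kind' field, skipping blank lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        event = json.loads(line)
        counts[event.get("kind", "unknown")] += 1
    return counts


# In-memory stand-in for open("runs/scrape.jsonl"), using the sample events.
sample = io.StringIO(
    '{"ts":1747779600.12,"session_id":"research-bot","kind":"fetch_ok",'
    '"url":"https://docs.brightdata.com/api","host":"docs.brightdata.com",'
    '"status":200,"bytes":4221,"elapsed_ms":312.4}\n'
    '{"ts":1747779600.45,"session_id":"research-bot","kind":"domain_denied",'
    '"url":"https://evil.example/exfil","host":"evil.example",'
    '"error":"host \'evil.example\' not in allowlist"}\n'
)
print(count_kinds(sample))
```

The same function accepts an open file handle, since it only iterates over lines.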

Dashboard

pip install "birddog[dashboard]"
streamlit run -m birddog.dashboard -- --audit runs/scrape.jsonl

Shows total fetches, denials, bytes, and a per-host breakdown of fetches + bytes + p50 latency.
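If you want the same numbers without Streamlit, a per-host breakdown can be approximated directly from the audit log. This is a sketch under assumptions, not the dashboard's code: the function name `summarize_by_host` is hypothetical, only `fetch_ok` events are counted, and p50 is taken as `statistics.median`, which averages the two middle values for an even count.

```python
import json
from collections import defaultdict
from statistics import median


def summarize_by_host(lines) -> dict:
    """Aggregate fetch_ok events into per-host fetches, bytes, and p50 latency."""
    bytes_by_host = defaultdict(int)
    latency_by_host = defaultdict(list)
    for line in lines:
        if not line.strip():
            continue
        event = json.loads(line)
        if event.get("kind") != "fetch_ok":
            continue
        host = event["host"]
        bytes_by_host[host] += event.get("bytes", 0)
        latency_by_host[host].append(event["elapsed_ms"])
    return {
        host: {
            "fetches": len(latencies),
            "bytes": bytes_by_host[host],
            "p50_ms": median(latencies),
        }
        for host, latencies in latency_by_host.items()
    }


# Made-up events standing in for lines read from runs/scrape.jsonl.
events = [
    {"kind": "fetch_ok", "host": "a.example.com", "bytes": 100, "elapsed_ms": 10.0},
    {"kind": "fetch_ok", "host": "a.example.com", "bytes": 300, "elapsed_ms": 30.0},
    {"kind": "fetch_ok", "host": "a.example.com", "bytes": 200, "elapsed_ms": 20.0},
    {"kind": "domain_denied", "host": "evil.example"},
]
print(summarize_by_host(json.dumps(e) for e in events))
# {'a.example.com': {'fetches': 3, 'bytes': 600, 'p50_ms': 20.0}}
```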

Demos

Two runnable examples in examples/:

1. Smoke test — scrape_demo.py

python examples/scrape_demo.py

Hits each feature once: happy path, domain denial, rate-limit burst, summary. Runs offline via httpx.MockTransport.

2. Realistic agent — watchdog_agent.py

python examples/watchdog_agent.py

A small price-tracker agent. It polls a watchlist of product pages, extracts prices, and alerts when a price moves by more than a per-product threshold. Three passes show:

allowlist denials (off-domain mirror URL is dropped)
per-domain rate cap kicking in on pass 3
threshold alerts (Δ -6.4% > 3.0%)
a runs/watchdog.jsonl audit log you can dashboard

Set BIRDDOG_USE_BRIGHTDATA=1 along with your Bright Data Web Unlocker environment variables to run the demo through the real proxy.
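The threshold alert in the watchdog demo comes down to a percentage-change comparison. The sketch below shows one plausible version of that check; the function names `pct_change` and `should_alert`, the prices, and the use of the absolute change are assumptions, not code taken from watchdog_agent.py.

```python
def pct_change(old: float, new: float) -> float:
    """Signed percentage change from old to new."""
    if old == 0:
        raise ValueError("old price must be non-zero")
    return (new - old) / old * 100.0


def should_alert(old: float, new: float, threshold_pct: float) -> bool:
    """Alert when the move in either direction exceeds the threshold."""
    return abs(pct_change(old, new)) > threshold_pct


old_price, new_price = 50.00, 46.80   # made-up prices
delta = pct_change(old_price, new_price)
print(f"delta {delta:+.1f}% alert={should_alert(old_price, new_price, 3.0)}")
# delta -6.4% alert=True
```

Comparing the absolute value means a drop and a rise of the same size are treated alike, which matches an alert such as -6.4% against a 3.0% threshold.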

Companion libraries

birddog is the egress half of a small agent stack:

agentleash — USD/call budget cap + tool-arg schema gate
agentvet — tool-arg validation with LLM-friendly retry hints
agentsnap — snapshot tests for agent traces
agenttrace — cost + latency aggregation per run

Pair birddog with agentleash and you have egress allowlist + budget cap on the same agent.

License

MIT

Release history

0.2.0: May 21, 2026
0.1.0: May 20, 2026 (this version)

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

birddog-0.1.0.tar.gz (12.3 kB view details)

Uploaded May 20, 2026 Source

Built Distribution

birddog-0.1.0-py3-none-any.whl (9.5 kB view details)

Uploaded May 20, 2026 Python 3

File details

Details for the file birddog-0.1.0.tar.gz.

File metadata

Download URL: birddog-0.1.0.tar.gz
Upload date: May 20, 2026
Size: 12.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for birddog-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`6ed459047b1007606b2e16bb185374e4fc9c7914f0e9b72041610fef9c8b2fe2`
MD5	`fea76af8be4b6107654274d3a17837b4`
BLAKE2b-256	`c3a919a59a1d0932269c9a0b7667f54a90b5ac2915ce371ff2a648bdca9dc80a`

File details

Details for the file birddog-0.1.0-py3-none-any.whl.

File metadata

Download URL: birddog-0.1.0-py3-none-any.whl
Upload date: May 20, 2026
Size: 9.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.14.4

File hashes

Hashes for birddog-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1b89fb5de280bfa895a6e1cb5ebfd55295c695d0d84e79e7836bf54e65565e17`
MD5	`e6d9250a1985ce48df0157705657cf8e`
BLAKE2b-256	`c655170dc27bed0d8dbac921060d8cf2b2e5bc4f77e0ab21588c71170a7d79d7`
