Python client for the Bulk URL Checker API. Skip the proxy-rotation + rate-limiter + soft-404-detector you would otherwise have to build.

Maintainers

carlosofscience

Project description

bulkurlchecker

Python client for the Bulk URL Checker API.

Skip the proxy-rotation, rate-limiter, soft-404 detector, and retry classifier you would otherwise spend two weeks building. Submit thousands of URLs, get status codes, redirect chains, and broken-link detection back as plain Python objects. Backed by a managed cloud service with residential proxies and per-domain throttling.

Install

pip install bulkurlchecker

5-line example

from bulkurlchecker import Client

client = Client(api_key="uck_live_...")
results = client.check_urls(["https://example.com", "https://example.org"])
for r in results.results:
    print(r.url, r.status_code, "BROKEN" if r.is_broken else "ok")

Get an API key at https://app.bulkurlchecker.com/dashboard/api-keys. First 300 URLs are free, no card required.

What you get back

results = client.check_urls(urls)

results.status            # 'completed' | 'paused' | 'failed' | 'cancelled'
results.timed_out         # True if the wait deadline passed (job still running)
results.total_urls        # how many URLs the engine accepted
results.completed_urls    # how many it finished checking
results.duplicates_removed
results.invalid_urls_rejected

for r in results.results:
    r.url                # the original URL you submitted
    r.final_url          # after redirects
    r.status_code        # 200, 301, 404, 429, 500, ...
    r.redirect_chain     # list of intermediate URLs
    r.is_broken          # True if the engine flagged this as broken
    r.is_soft_404        # True if 200 OK but page content says "not found"
    r.response_time_ms

# Convenience properties:
results.broken           # list of URLResult where is_broken == True
results.soft_404s        # list where is_soft_404 == True
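
As a small illustration of working with these fields, the sketch below tallies results by status class and collects the problem URLs. It relies only on the attribute names listed above. The `summarize` helper and the `SimpleNamespace` stand-ins are hypothetical and exist only so the snippet runs without an API key.

```python
from collections import Counter
from types import SimpleNamespace


def summarize(url_results):
    """Tally results by status class (2xx, 3xx, ...) and list problem URLs.

    `url_results` is any iterable of objects exposing the documented
    attributes: url, status_code, is_broken, is_soft_404.
    """
    by_class = Counter()
    broken, soft_404s = [], []
    for r in url_results:
        # Assumption: status_code may be None/0 when no response came back.
        key = f"{r.status_code // 100}xx" if r.status_code else "no response"
        by_class[key] += 1
        if r.is_broken:
            broken.append(r.url)
        if r.is_soft_404:
            soft_404s.append(r.url)
    return {"by_class": dict(by_class), "broken": broken, "soft_404s": soft_404s}


# Made-up sample data; real code would pass results.results instead.
sample = [
    SimpleNamespace(url="https://example.com", status_code=200,
                    is_broken=False, is_soft_404=False),
    SimpleNamespace(url="https://example.org/gone", status_code=404,
                    is_broken=True, is_soft_404=False),
    SimpleNamespace(url="https://example.net/missing", status_code=200,
                    is_broken=True, is_soft_404=True),
]
summary = summarize(sample)
print(summary)
```

With a real client, `summarize(results.results)` should work the same way, provided the result objects expose the attributes shown above.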

Command-line usage

Install with the [cli] extra to get the bulkurlchecker command:

pip install 'bulkurlchecker[cli]'
export BULKURLCHECKER_API_KEY=uck_live_...

# One URL per line in urls.txt → CSV on stdout
bulkurlchecker check urls.txt > report.csv

# Pipe from anywhere
sitemap-extractor mydomain.com | bulkurlchecker check - --output jsonl > report.jsonl

# Only show broken URLs, ad-hoc input
bulkurlchecker check --urls "https://example.com,https://example.org" --only-broken

# Two-step (submit, then poll & fetch later)
JOB=$(bulkurlchecker submit urls.txt)
bulkurlchecker status "$JOB"
bulkurlchecker results "$JOB" --output csv > report.csv

Output formats: csv (default), json, jsonl. Run bulkurlchecker check --help for the full flag list.
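
The `check` command expects one URL per line. If your URLs live in a Python list, a helper along these lines can produce that file. This is a minimal sketch: `write_url_file` is a hypothetical name, not part of the SDK, and the local de-duplication is optional since the service reports `duplicates_removed` on its own.

```python
import tempfile
from pathlib import Path


def write_url_file(urls, path):
    """Write unique, non-empty URLs to `path`, one per line, keeping order."""
    seen = set()
    lines = []
    for raw in urls:
        url = raw.strip()
        if not url or url in seen:
            continue  # skip blanks and repeats
        seen.add(url)
        lines.append(url)
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


# Demonstration in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "urls.txt"
    count = write_url_file(
        ["https://example.com", "  ", "https://example.org", "https://example.com"],
        target,
    )
    content = target.read_text(encoding="utf-8")

print(count)
```

The resulting file can then be passed to `bulkurlchecker check urls.txt` as shown above.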

Larger jobs: submit and poll

check_urls() blocks while the server processes the job, for up to 15 minutes. For lists too large to finish within that window, use the two-step pattern:

job = client.submit(my_500k_urls)
print(f"Submitted {job.job_id}, {job.total_urls} URLs queued")

# Poll explicitly, or use the convenience method
done = client.wait_until_done(job.job_id, timeout=3600)

# Stream results in pages
for batch in client.iter_results(job.job_id, page_size=1000):
    for r in batch:
        if r.is_broken:
            print(r.url, r.status_code)
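
If you prefer to poll explicitly rather than call `wait_until_done()`, the loop might look like the sketch below. It is written against a plain callable because the SDK's status-fetching method is not shown in this README; `poll_until_done`, the `"running"` value, and the choice to stop on all four documented status values are assumptions.

```python
import time


def poll_until_done(get_status, interval=5.0, timeout=3600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it returns a stop state or the timeout passes.

    Returns the last stop state seen, or None if the deadline passed first.
    """
    # Assumption: the four documented result statuses all mean "stop polling".
    stop_states = {"completed", "paused", "failed", "cancelled"}
    deadline = clock() + timeout
    while True:
        status = get_status()
        if status in stop_states:
            return status
        if clock() >= deadline:
            return None
        sleep(interval)


# Demonstration with a fake status source and a no-op sleep.
fake_statuses = iter(["running", "running", "completed"])
final = poll_until_done(lambda: next(fake_statuses), interval=0,
                        sleep=lambda _seconds: None)
print(final)
```

In real use, `get_status` would wrap whatever call returns the job's current status for `job.job_id`.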

Safe retries with `idempotency_key`

Pass an idempotency key on submit() or check_urls() to make retries safe under network failures. The server caches the response for 24 hours; a retry with the same key + same body returns the original result without creating a duplicate job.

import uuid

key = str(uuid.uuid4())  # generate once per logical request

# First call: creates a new job.
job = client.submit(urls, idempotency_key=key)

# Network blip, no clean response received? Just retry with the same
# key -- you'll get the SAME job summary back, no duplicate submission.
same_job = client.submit(urls, idempotency_key=key)
# job.job_id == same_job.job_id

Same idempotency_key + different urls returns 409 Conflict (raised as ValidationError) so client bugs that reuse a key against a new payload are caught loudly instead of silently mapping to the wrong cached response.
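
The retry pattern above can be wrapped in a small helper so the key is generated exactly once per logical request. This is a sketch under assumptions: `submit_with_retry` is a hypothetical helper, and `ConnectionError`/`TimeoutError` are built-in placeholders, since the exception the SDK raises on a network failure is not documented here.

```python
import uuid


def submit_with_retry(submit, urls, attempts=3,
                      retry_on=(ConnectionError, TimeoutError)):
    """Call submit(urls, idempotency_key=...) up to `attempts` times.

    The same key is reused on every attempt, so a retry after a lost
    response maps to the original job instead of creating a new one.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    key = str(uuid.uuid4())  # generated once, reused on every attempt
    last_error = None
    for _ in range(attempts):
        try:
            return submit(urls, idempotency_key=key)
        except retry_on as exc:
            last_error = exc  # remember it and try again with the same key
    raise last_error


# Demonstration with a fake submit that fails once, then succeeds.
calls = []


def flaky_submit(urls, idempotency_key):
    calls.append(idempotency_key)
    if len(calls) == 1:
        raise ConnectionError("simulated network blip")
    return {"job_id": "job_demo", "urls": len(urls)}


job = submit_with_retry(flaky_submit, ["https://example.com"])
print(job["job_id"], calls[0] == calls[1])
```

With the real client you would pass `client.submit` as the first argument. A production version would likely add a delay between attempts; it is omitted here for brevity.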

Receiving webhooks

When a job finishes, we POST to your registered endpoint with a signed payload. Verify the signature before trusting the body — anyone who knows your endpoint URL can otherwise send fake events.

import os

from flask import Flask, request
from bulkurlchecker import verify_signature, InvalidSignatureError

SECRET = os.environ["MY_WEBHOOK_SECRET"]  # the signing_secret we showed once

app = Flask(__name__)

@app.route("/webhook/bulkurlchecker", methods=["POST"])
def webhook():
    try:
        verify_signature(
            request.get_data(),  # RAW bytes — not request.get_json()
            request.headers.get("Bulkurlchecker-Signature", ""),
            SECRET,
        )
    except InvalidSignatureError:
        return "", 401
    event = request.get_json()
    if event["type"] == "job.completed":
        job_id = event["data"]["job_id"]
        # ... fetch results, update your DB, ping Slack, etc ...
    return "", 200

Register endpoints + get secrets at https://app.bulkurlchecker.com/dashboard/webhooks (or via POST /api/v2/webhooks/endpoints).

verify_signature() enforces a 5-minute timestamp tolerance by default to defeat replays. Tune via tolerance_seconds=.
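
To make the idea of a signed payload with a timestamp tolerance concrete, here is a generic HMAC-SHA256 sketch. It is not the SDK's implementation: the header layout (`t=<unix>,v1=<hex>`), the signed message (`<timestamp>.<body>`), and the `sign`/`verify` names are all assumptions chosen for illustration. In practice, use `verify_signature()`.

```python
import hashlib
import hmac
import time


def sign(payload: bytes, secret: str, timestamp: int) -> str:
    """Build a header of the ASSUMED form 't=<unix>,v1=<hex hmac-sha256>'."""
    message = f"{timestamp}.".encode() + payload
    digest = hmac.new(secret.encode(), message, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={digest}"


def verify(payload: bytes, header: str, secret: str,
           tolerance_seconds: int = 300, now=None) -> bool:
    """Return True only if the signature matches and the timestamp is fresh."""
    try:
        parts = dict(item.split("=", 1) for item in header.split(","))
        timestamp = int(parts["t"])
        received = parts["v1"]
    except (KeyError, ValueError):
        return False  # malformed header
    now = time.time() if now is None else now
    if abs(now - timestamp) > tolerance_seconds:
        return False  # outside the tolerance window: possible replay
    expected = sign(payload, secret, timestamp).split("v1=", 1)[1]
    # Constant-time comparison avoids leaking how many characters matched.
    return hmac.compare_digest(expected.encode(), received.encode())


body = b'{"type": "job.completed"}'
header = sign(body, "demo-secret", timestamp=1_700_000_000)

ok = verify(body, header, "demo-secret", now=1_700_000_060)
tampered = verify(b'{"type": "job.failed"}', header, "demo-secret", now=1_700_000_060)
stale = verify(body, header, "demo-secret", now=1_700_000_000 + 301)
print(ok, tampered, stale)
```

The sketch also shows why the raw request bytes matter: the signature covers the exact bytes sent, so re-serialized JSON would generally not match.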

Error handling

All errors derive from BulkURLCheckerError. Catch specific subclasses when you want to branch on the failure mode:

from bulkurlchecker import (
    Client,
    BulkURLCheckerError,
    AuthenticationError,
    RateLimitError,
    QuotaError,
    ValidationError,
)

try:
    results = client.check_urls(urls)
except QuotaError:
    print("Out of credits. Top up at https://app.bulkurlchecker.com/billing")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s.")
except AuthenticationError:
    print("API key rejected — check it's not revoked.")
except ValidationError as e:
    print(f"Bad request: {e}")  # bad URLs, too many URLs, etc.
except BulkURLCheckerError as e:
    print(f"Other error: {e} (request_id={e.request_id})")

Every error carries status_code, code (server's machine-readable string), request_id (for support), and details (when the server provides them).
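
One common way to act on `retry_after` is to wait and try again. The sketch below shows that pattern with a stand-in exception so it runs on its own. `RateLimited` and `call_with_rate_limit_retry` are hypothetical names, and the one-second fallback when `retry_after` is missing is an assumption.

```python
import time


class RateLimited(Exception):
    """Stand-in for the SDK's RateLimitError so the sketch is self-contained."""

    def __init__(self, retry_after):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_rate_limit_retry(func, max_attempts=5, sleep=time.sleep,
                               rate_limit_error=RateLimited):
    """Run func(); on a rate-limit error wait retry_after seconds and retry."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except rate_limit_error as exc:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller handle it
            # Assumption: fall back to 1 second if retry_after is absent.
            sleep(getattr(exc, "retry_after", None) or 1)


# Demonstration: two rate-limit responses, then success.
waits = []
outcomes = iter([RateLimited(2), RateLimited(3), "ok"])


def fake_check():
    outcome = next(outcomes)
    if isinstance(outcome, Exception):
        raise outcome
    return outcome


result = call_with_rate_limit_retry(fake_check, sleep=waits.append)
print(result, waits)
```

With the real SDK you would pass `rate_limit_error=RateLimitError` and something like `lambda: client.check_urls(urls)` as `func`.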

Why use this instead of writing your own checker with httpx + asyncio?

Honest answer: for ≤500 URLs you don't need this. The standard requests/httpx toolchain handles it fine.

The trouble starts at scale:

Problem	Rolling your own	This SDK
Concurrency	`asyncio` + careful semaphores	done
Proxy rotation across residential IPs	$90+/mo Webshare / Bright Data subscription + custom code	done
Per-domain rate limiting (so you don't hammer one host)	wire it yourself	done
Distinguishing real 403 from "you got blocked" 403	guess and check	done
Detecting soft 404s (200 OK + "not found" body)	regex / heuristic per template	done
Retry classification (transient vs permanent)	tune for weeks	done
Long-running job state (resume after crash)	Redis + queue + worker infra	done
Engineer time, weeks 1-4	$$$	nothing, ship today

If you've already lost a weekend to httpx + proxy rotation, you know what we're talking about.
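
For a sense of what the "rolling your own" column involves, here is a minimal sketch of just two rows, concurrency and per-domain limiting, using one `asyncio.Semaphore` per host. It says nothing about how the service works internally. `check_all` and `fake_fetch` are hypothetical, and the fake fetch stands in for a real HTTP call such as one made with httpx.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit


async def check_all(urls, fetch, per_host_limit=2):
    """Run fetch(url) for every URL, at most per_host_limit in flight per host."""
    semaphores = defaultdict(lambda: asyncio.Semaphore(per_host_limit))

    async def guarded(url):
        host = urlsplit(url).hostname or ""
        async with semaphores[host]:
            return await fetch(url)

    # gather() preserves input order in its result list.
    return await asyncio.gather(*(guarded(u) for u in urls))


# Fake fetch that records peak concurrency per host instead of using the network.
in_flight = defaultdict(int)
peak = defaultdict(int)


async def fake_fetch(url):
    host = urlsplit(url).hostname
    in_flight[host] += 1
    peak[host] = max(peak[host], in_flight[host])
    await asyncio.sleep(0.01)
    in_flight[host] -= 1
    return 200


urls = [f"https://example.com/{i}" for i in range(5)] + ["https://example.org/"]
statuses = asyncio.run(check_all(urls, fake_fetch))
print(statuses, dict(peak))
```

Proxy rotation, block detection, soft-404 heuristics, retry classification, and crash recovery would all still need to be layered on top of this.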

Pricing

Free tier: 300 URL checks. No card required.
Starter: $9/month or $90/year (~17% off) — 15,000 URLs/month
Pro: $29/month or $290/year — 50,000 URLs/month, 5 scheduled checks, daily monitoring
Agency: $99/month or $990/year — 200,000 URLs/month, 50 schedules, Slack + webhook alerts

Top-up credit packs available beyond the monthly pool. Credits never expire.

Full pricing: https://bulkurlchecker.com/#pricing

Stability

The SDK follows semver. While we're at 0.x, breaking changes can land in minor releases (we'll always note them in CHANGELOG.md). Once we hit 1.0 you can pin major versions safely.

License

MIT. See LICENSE.

Release history

0.5.1 (May 28, 2026), this version
0.5.0 (May 28, 2026)
0.2.0 (May 27, 2026)
0.1.0 (May 27, 2026)

Download files

Source Distribution

bulkurlchecker-0.5.1.tar.gz (23.5 kB)

Uploaded May 28, 2026 Source

Built Distribution

bulkurlchecker-0.5.1-py3-none-any.whl (20.5 kB)

Uploaded May 28, 2026 Python 3

File details

Details for the file bulkurlchecker-0.5.1.tar.gz.

File metadata

Download URL: bulkurlchecker-0.5.1.tar.gz
Upload date: May 28, 2026
Size: 23.5 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for bulkurlchecker-0.5.1.tar.gz
Algorithm	Hash digest
SHA256	`5a1ae0f11062e7f4611430e46d8d956939520cb40cc870c2c216bb24a101119d`
MD5	`3b4609bbb3432811a75095645e09f2b4`
BLAKE2b-256	`35c6667e71580d8725d2208d951944f5b671351cc304219e1a944920db94ef1f`
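
To check a downloaded file against the SHA256 digest listed above, you can compute the hash locally with the standard library. This is a minimal sketch: `sha256_of` is a hypothetical helper name, and the demonstration hashes a small temporary file rather than the real archive.

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demonstration on a throwaway file with known contents.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample.bin"
    sample.write_bytes(b"hello")
    matches = sha256_of(sample) == hashlib.sha256(b"hello").hexdigest()

print(matches)

# For the real archive, compare against the published digest, e.g.:
# sha256_of("bulkurlchecker-0.5.1.tar.gz") == "5a1ae0f11062e7f4611430e46d8d956939520cb40cc870c2c216bb24a101119d"
```

pip can also enforce hashes at install time via `--require-hashes` with a requirements file that pins each digest.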

Provenance

The following attestation bundles were made for bulkurlchecker-0.5.1.tar.gz:

Publisher: publish.yml on carlosofscience/bulkurlchecker-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bulkurlchecker-0.5.1.tar.gz
- Subject digest: 5a1ae0f11062e7f4611430e46d8d956939520cb40cc870c2c216bb24a101119d
- Sigstore transparency entry: 1656595897
- Sigstore integration time: May 28, 2026
Source repository:
- Permalink: carlosofscience/bulkurlchecker-python@c60ec6031c250bfe5e3e34cdd4f269afa0147988
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/carlosofscience
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c60ec6031c250bfe5e3e34cdd4f269afa0147988
- Trigger Event: push

File details

Details for the file bulkurlchecker-0.5.1-py3-none-any.whl.

File metadata

Download URL: bulkurlchecker-0.5.1-py3-none-any.whl
Upload date: May 28, 2026
Size: 20.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for bulkurlchecker-0.5.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`b8f84b63495ed4107a8c87a0bd305d3d3962cce94bccd5fa83dbcd1851334454`
MD5	`9722b6eaf36a46fb125008f1f685c92c`
BLAKE2b-256	`e0457ae39e44b30eb018241b54109afdb184b5320bde3ba84481f0ae1c165ca2`

Provenance

The following attestation bundles were made for bulkurlchecker-0.5.1-py3-none-any.whl:

Publisher: publish.yml on carlosofscience/bulkurlchecker-python

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bulkurlchecker-0.5.1-py3-none-any.whl
- Subject digest: b8f84b63495ed4107a8c87a0bd305d3d3962cce94bccd5fa83dbcd1851334454
- Sigstore transparency entry: 1656596043
- Sigstore integration time: May 28, 2026
Source repository:
- Permalink: carlosofscience/bulkurlchecker-python@c60ec6031c250bfe5e3e34cdd4f269afa0147988
- Branch / Tag: refs/tags/v0.5.1
- Owner: https://github.com/carlosofscience
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@c60ec6031c250bfe5e3e34cdd4f269afa0147988
- Trigger Event: push
