bulkurlchecker

Python client for the Bulk URL Checker API.

Skip the proxy-rotation, rate-limiter, soft-404 detector, and retry classifier you would otherwise spend two weeks building. Submit thousands of URLs, get status codes, redirect chains, and broken-link detection back as plain Python objects. Backed by a managed cloud service with residential proxies and per-domain throttling.

Install

pip install bulkurlchecker

5-line example

from bulkurlchecker import Client

client = Client(api_key="uck_live_...")
results = client.check_urls(["https://example.com", "https://example.org"])
for r in results.results:
    print(r.url, r.status_code, "BROKEN" if r.is_broken else "ok")

Get an API key at https://app.bulkurlchecker.com/dashboard/api-keys. First 300 URLs are free, no card required.

What you get back

results = client.check_urls(urls)

results.status            # 'completed' | 'paused' | 'failed' | 'cancelled'
results.timed_out         # True if the wait deadline passed (job still running)
results.total_urls        # how many URLs the engine accepted
results.completed_urls    # how many it finished checking
results.duplicates_removed
results.invalid_urls_rejected

for r in results.results:
    r.url                # the original URL you submitted
    r.final_url          # after redirects
    r.status_code        # 200, 301, 404, 429, 500, ...
    r.redirect_chain     # list of intermediate URLs
    r.is_broken          # True if the engine flagged this as broken
    r.is_soft_404        # True if 200 OK but page content says "not found"
    r.response_time_ms

# Convenience properties:
results.broken           # list of URLResult where is_broken == True
results.soft_404s        # list where is_soft_404 == True
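As a rough sketch of how the per-URL fields above might be rolled up into a report, the helper below tallies status codes and collects the URLs worth a second look. It relies only on the attribute names listed above. The `summarize` function and the sample rows are illustrative and not part of the SDK; the rows are stand-in objects with invented values so the snippet runs without an API key.

```python
from collections import Counter
from types import SimpleNamespace


def summarize(url_results):
    """Tally status codes and collect the URLs worth a second look."""
    rows = list(url_results)  # accept generators; we iterate several times
    return {
        "by_status": Counter(r.status_code for r in rows),
        "broken": [r.url for r in rows if r.is_broken],
        "soft_404s": [r.url for r in rows if r.is_soft_404],
        "redirected": [r.url for r in rows if r.final_url != r.url],
    }


# Stand-in rows with invented values; real rows come from results.results.
# Whether the engine also sets is_broken for soft 404s is not stated in this
# README, so the last row picks False arbitrarily.
sample = [
    SimpleNamespace(url="https://example.com/", final_url="https://example.com/",
                    status_code=200, is_broken=False, is_soft_404=False),
    SimpleNamespace(url="http://example.com/old", final_url="https://example.com/new",
                    status_code=200, is_broken=False, is_soft_404=False),
    SimpleNamespace(url="https://example.com/missing", final_url="https://example.com/missing",
                    status_code=404, is_broken=True, is_soft_404=False),
    SimpleNamespace(url="https://example.com/gone", final_url="https://example.com/gone",
                    status_code=200, is_broken=False, is_soft_404=True),
]

summary = summarize(sample)
print(dict(summary["by_status"]))
print(summary["broken"], summary["soft_404s"], summary["redirected"])
```

With a real job you would pass `results.results` instead of `sample`.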

Command-line usage

Install with the [cli] extra to get the bulkurlchecker command:

pip install 'bulkurlchecker[cli]'
export BULKURLCHECKER_API_KEY=uck_live_...

# One URL per line in urls.txt → CSV on stdout
bulkurlchecker check urls.txt > report.csv

# Pipe from anywhere
sitemap-extractor mydomain.com | bulkurlchecker check - --output jsonl > report.jsonl

# Only show broken URLs, ad-hoc input
bulkurlchecker check --urls "https://example.com,https://example.org" --only-broken

# Two-step (submit, then poll & fetch later)
JOB=$(bulkurlchecker submit urls.txt)
bulkurlchecker status "$JOB"
bulkurlchecker results "$JOB" --output csv > report.csv

Output formats: csv (default), json, jsonl. Run bulkurlchecker check --help for the full flag list.
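The `sitemap-extractor` command in the pipe example stands for whatever tool produces one URL per line. If you have nothing like it to hand, a minimal standard-library sketch is below. It assumes a standard sitemaps.org `sitemap.xml`; the function name `urls_from_sitemap` is made up for this example and is not part of the SDK or CLI.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 schema.
SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def urls_from_sitemap(xml_text):
    """Return every <loc> value in a sitemap document, in document order.

    For a sitemap index file, the <loc> values are child sitemaps rather
    than pages, so they would need to be fetched and parsed in turn.
    """
    root = ET.fromstring(xml_text)
    return [loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text]


SAMPLE = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/about</loc></url>
</urlset>
"""

# One URL per line, the input shape `bulkurlchecker check` expects.
for url in urls_from_sitemap(SAMPLE.strip()):
    print(url)
```

To use it in a pipeline, read the XML from `sys.stdin` instead of `SAMPLE` and pipe the printed lines into `bulkurlchecker check -`.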

Larger jobs: submit and poll

check_urls() blocks until the job finishes, up to a 15-minute server-side limit. For lists too large to finish within that window, use the two-step pattern:

job = client.submit(my_500k_urls)
print(f"Submitted {job.job_id}, {job.total_urls} URLs queued")

# Poll explicitly, or use the convenience method
done = client.wait_until_done(job.job_id, timeout=3600)

# Stream results in pages
for batch in client.iter_results(job.job_id, page_size=1000):
    for r in batch:
        if r.is_broken:
            print(r.url, r.status_code)
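If you would rather poll explicitly than call `wait_until_done`, the loop might look like the sketch below. This README does not show the SDK's status-fetching call, so the helper takes any zero-argument callable that returns the current status string. The name `poll_until_terminal`, the backoff values, and the `"running"` status are assumptions for illustration. The set of statuses that stop the loop is taken from the `results.status` values listed earlier.

```python
import time

# Statuses listed for results.status; treating all four as "stop polling"
# is an assumption of this sketch.
TERMINAL_STATUSES = {"completed", "paused", "failed", "cancelled"}


def poll_until_terminal(fetch_status, timeout=3600.0, initial_delay=2.0,
                        max_delay=60.0, sleep=time.sleep, clock=time.monotonic):
    """Call fetch_status() until it returns a terminal status or time runs out.

    Returns the last status seen. A non-terminal return value means the
    deadline passed while the job was still running.
    """
    deadline = clock() + timeout
    delay = initial_delay
    while True:
        status = fetch_status()
        if status in TERMINAL_STATUSES:
            return status
        remaining = deadline - clock()
        if remaining <= 0:
            return status
        sleep(min(delay, remaining))       # never sleep past the deadline
        delay = min(delay * 2, max_delay)  # exponential backoff, capped


# Demo with a fake status source; "running" is a hypothetical in-progress status.
fake_statuses = iter(["running", "running", "completed"])
final = poll_until_terminal(lambda: next(fake_statuses), sleep=lambda seconds: None)
print(final)  # prints "completed"
```

Backing off between polls keeps the request count low on long jobs. Passing `sleep` and `clock` in as arguments makes the loop easy to test without waiting.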

Error handling

All errors derive from BulkURLCheckerError. Catch specific subclasses when you want to branch on the failure mode:

from bulkurlchecker import (
    Client,
    BulkURLCheckerError,
    AuthenticationError,
    RateLimitError,
    QuotaError,
    ValidationError,
)

try:
    results = client.check_urls(urls)
except QuotaError:
    print("Out of credits. Top up at https://app.bulkurlchecker.com/billing")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s.")
except AuthenticationError:
    print("API key rejected — check it's not revoked.")
except ValidationError as e:
    print(f"Bad request: {e}")  # bad URLs, too many URLs, etc.
except BulkURLCheckerError as e:
    print(f"Other error: {e} (request_id={e.request_id})")

Every error carries status_code, code (server's machine-readable string), request_id (for support), and details (when the server provides them).
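One way to act on `retry_after` instead of just printing it is a small retry wrapper, sketched below. It is generic and not part of the SDK: `call_with_retry`, `default_wait`, and the `FakeRateLimitError` stand-in are names invented for this example. The only assumption about the real exception is the `retry_after` attribute shown above.

```python
import time


def call_with_retry(fn, retry_on, max_attempts=5, default_wait=5.0, sleep=time.sleep):
    """Call fn(); on an exception listed in retry_on, wait and try again.

    The wait is the exception's retry_after attribute when it is set,
    otherwise default_wait. After max_attempts failed calls the last
    exception is re-raised.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on as exc:
            if attempt == max_attempts:
                raise
            wait = getattr(exc, "retry_after", None)
            sleep(default_wait if wait is None else float(wait))


# Stand-in for the SDK's RateLimitError so the demo runs without the package.
class FakeRateLimitError(Exception):
    def __init__(self, retry_after):
        super().__init__("rate limited")
        self.retry_after = retry_after


calls = []
waits = []


def flaky():
    """Fail twice with a rate-limit error, then succeed."""
    calls.append(1)
    if len(calls) < 3:
        raise FakeRateLimitError(retry_after=2)
    return "ok"


result = call_with_retry(flaky, retry_on=(FakeRateLimitError,), sleep=waits.append)
print(result, waits)
```

With the real SDK the call would be along the lines of `call_with_retry(lambda: client.check_urls(urls), retry_on=(RateLimitError,))`. Errors such as `QuotaError` or `AuthenticationError` are deliberately not retried, since waiting does not fix them.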

Why use this instead of writing your own checker with httpx + asyncio?

Honest answer: for ≤500 URLs you don't need this. The standard requests/httpx toolchain handles it fine.

You hit the wall at scale:

| Problem | Rolling your own | This SDK |
| --- | --- | --- |
| Concurrency | asyncio + careful semaphores | done |
| Proxy rotation across residential IPs | $90+/mo Webshare / Bright Data subscription + custom code | done |
| Per-domain rate limiting (so you don't hammer one host) | wire it yourself | done |
| Distinguishing real 403 from "you got blocked" 403 | guess and check | done |
| Detecting soft 404s (200 OK + "not found" body) | regex / heuristic per template | done |
| Retry classification (transient vs permanent) | tune for weeks | done |
| Long-running job state (resume after crash) | Redis + queue + worker infra | done |
| Engineer time, weeks 1-4 | $$$ | nothing, ship today |

If you've already lost a weekend to httpx + proxy rotation, you know what we're talking about.
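To give a sense of what the "asyncio + careful semaphores" and "wire it yourself" rows involve, here is a minimal do-it-yourself sketch. It caps concurrent requests per host, which is a simpler technique than true rate limiting (requests per second), and it says nothing about how the managed service works internally. `check_all` and `fake_fetch` are hypothetical names. A real version would pass an httpx-based coroutine as `fetch`.

```python
import asyncio
from collections import defaultdict
from urllib.parse import urlsplit


async def check_all(urls, fetch, per_host_limit=2, global_limit=50):
    """Run fetch(url) for every URL with a global cap and a per-host cap."""
    global_sem = asyncio.Semaphore(global_limit)
    host_sems = defaultdict(lambda: asyncio.Semaphore(per_host_limit))

    async def one(url):
        host = urlsplit(url).hostname or ""
        # Take the per-host slot first, so requests queued behind a busy
        # host do not occupy global slots while they wait.
        async with host_sems[host]:
            async with global_sem:
                return await fetch(url)

    # gather() returns results in the same order as the input URLs.
    return await asyncio.gather(*(one(u) for u in urls))


# Fake fetch that records how many requests per host are in flight at once.
in_flight = defaultdict(int)
peak = defaultdict(int)


async def fake_fetch(url):
    host = urlsplit(url).hostname
    in_flight[host] += 1
    peak[host] = max(peak[host], in_flight[host])
    await asyncio.sleep(0.01)  # stands in for network latency
    in_flight[host] -= 1
    return 200


urls = [f"https://example.com/{i}" for i in range(6)]
urls += [f"https://example.org/{i}" for i in range(3)]
statuses = asyncio.run(check_all(urls, fake_fetch, per_host_limit=2))
print(len(statuses), dict(peak))
```

This covers only the first and third rows of the table. Proxy rotation, block detection, soft-404 heuristics, and crash-safe job state are all still missing.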

Pricing

  • Free tier: 300 URL checks. No card required.
  • Starter: $9/month or $90/year (~17% off) — 15,000 URLs/month
  • Pro: $29/month or $290/year — 50,000 URLs/month, 5 scheduled checks, daily monitoring
  • Agency: $99/month or $990/year — 200,000 URLs/month, 50 schedules, Slack + webhook alerts

Top-up credit packs available beyond the monthly pool. Credits never expire.

Full pricing: https://bulkurlchecker.com/#pricing

Stability

The SDK follows semver. While we're at 0.x, breaking changes can land in minor releases (we'll always note them in CHANGELOG.md). Once we hit 1.0 you can pin major versions safely.

License

MIT. See LICENSE.
