
Check if a URL is reachable


Reachable checks if a URL exists and is reachable.

Features

  • Use HEAD requests instead of GET to save bandwidth
  • Follow redirects
  • Handle local redirects (without full URL in location header)
  • Record all the URLs of the redirection chain
  • Check if the redirected URL matches the TLD of the source URL
  • Detect Cloudflare protection
  • Avoid basic bot detectors
    • Use a random Chrome user agent
    • Wait between consecutive requests to the same host
    • Include Host header
    • Can use Playwright to make the request
  • Use of HTTP/2
  • Detect parking domains
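
Two of the features above, handling local redirects and checking the TLD, can be illustrated with the standard library alone. This is a minimal sketch and not Reachable's actual implementation: the helper names `resolve_location` and `same_tld` are hypothetical, and the TLD comparison naively looks at the last hostname label only (so it would treat "co.uk" as "uk").

```python
from urllib.parse import urljoin, urlsplit


def resolve_location(request_url: str, location: str) -> str:
    """Turn a possibly relative Location header into an absolute URL.

    Hypothetical helper: absolute locations are returned unchanged,
    relative ones are resolved against the URL that was requested.
    """
    return urljoin(request_url, location)


def same_tld(source_url: str, redirected_url: str) -> bool:
    """Naive TLD check: compare the last label of both hostnames."""
    source_host = urlsplit(source_url).hostname or ""
    target_host = urlsplit(redirected_url).hostname or ""
    return source_host.rsplit(".", 1)[-1] == target_host.rsplit(".", 1)[-1]


# A local redirect: the server only sent a path in the Location header.
absolute = resolve_location("https://example.com/old/page", "/new/page")
print(absolute)
print(same_tld("https://example.com", absolute))
print(same_tld("https://example.com", "https://example.org/"))
```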

Installation

You can install it with pip:

pip install reachable

If you want to use Playwright:

pip install reachable[playwright]

Or clone this repository and run:

cd reachable/
pip install -e .

Usage

Simple URL

from reachable import is_reachable
result = is_reachable("https://google.com")

The output will look like this:

{
    "original_url": "https://google.com",
    "final_url": "https://www.google.com/",
    "response": null, 
    "status_code": 200,
    "success": true,
    "error_name": null,
    "cloudflare_protection": false,
    "redirect": {
        "chain": ["https://www.google.com/"],
        "final_url": "https://www.google.com/",
        "tld_match": true
    }
}
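
The fields above might be read as in the following sketch. It assumes the result behaves like a Python dict with exactly the keys shown in the sample output, which is an inference from that output rather than a documented type. `describe` is a hypothetical helper, and `sample` is a hand-written copy of the output, so no network request is made.

```python
def describe(result: dict) -> str:
    """Build a one-line, human-readable summary of a result (hypothetical helper)."""
    if not result["success"]:
        return f'{result["original_url"]}: failed ({result["error_name"]})'

    notes = []
    redirect = result.get("redirect")
    if redirect:
        notes.append(f'redirected to {redirect["final_url"]}')
        if not redirect["tld_match"]:
            notes.append("TLD changed")
    if result["cloudflare_protection"]:
        notes.append("behind Cloudflare")

    suffix = f' ({", ".join(notes)})' if notes else ""
    return f'{result["original_url"]}: HTTP {result["status_code"]}{suffix}'


# Hand-written copy of the sample output above (assumed to be a dict).
sample = {
    "original_url": "https://google.com",
    "final_url": "https://www.google.com/",
    "response": None,
    "status_code": 200,
    "success": True,
    "error_name": None,
    "cloudflare_protection": False,
    "redirect": {
        "chain": ["https://www.google.com/"],
        "final_url": "https://www.google.com/",
        "tld_match": True,
    },
}

print(describe(sample))
```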

Multiple URLs

from reachable import is_reachable
result = is_reachable(["https://google.com", "http://bing.com"])

The output will look like this:

[
    {
        "original_url": "https://google.com",
        "final_url": "https://www.google.com/",
        "response": null, 
        "status_code": 200,
        "success": true,
        "error_name": null,
        "cloudflare_protection": false,
        "redirect": {
            "chain": ["https://www.google.com/"],
            "final_url": "https://www.google.com/",
            "tld_match": true
        }
    },
    {
        "original_url": "http://bing.com",
        "final_url": "https://www.bing.com/?toWww=1&redig=16A78C94",
        "response": null,
        "status_code": 200,
        "success": true,
        "error_name": null,
        "cloudflare_protection": false,
        "redirect": {
            "chain": ["https://www.bing.com:443/?toWww=1&redig=16A78C94"],
            "final_url": "https://www.bing.com/?toWww=1&redig=16A78C94",
            "tld_match": true
        }
    }
]

Async

import asyncio
from reachable import is_reachable_async

result = asyncio.run(is_reachable_async("https://google.com"))

or

import asyncio
from reachable import is_reachable_async

urls = ["https://google.com", "https://bing.com"]

try:
    loop = asyncio.get_running_loop()
except RuntimeError:
    # No loop is running yet, so we create one
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)
try:
    result = loop.run_until_complete(asyncio.gather(*[is_reachable_async(url) for url in urls]))
finally:
    loop.close()

Handling high volumes with TaskPool

If you want to process a large number of URLs (more than 500), you will quickly hit the limits of your hardware or OS, because only a limited number of connections can be open at the same time.

To work around this problem, you can use the TaskPool class. It uses asyncio semaphores to limit the number of asyncio tasks running concurrently: each worker acquires the semaphore when it starts and releases it when it is done. This keeps a bounded number of asyncio workers active at all times without overwhelming the OS.

import asyncio

from reachable import is_reachable_async
from reachable.client import AsyncClient
from reachable.pool import TaskPool


urls = ["https://google.com", "https://bing.com"]


async def worker(url, client):
    result = await is_reachable_async(url, client=client)
    return result


async def workers_builder(urls, pool_size: int = 100):
    async with AsyncClient() as client:
        tasks = TaskPool(workers=pool_size)

        for url in urls:
            await tasks.put(worker(url, client=client))

        await tasks.join()

    return tasks._results


try:
    loop = asyncio.get_running_loop()
except RuntimeError:
    # No loop is running yet, so we create one
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

try:
    result = loop.run_until_complete(workers_builder(urls))
    print(result)
finally:
    loop.close()
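
The semaphore mechanism described for TaskPool can be sketched with plain asyncio. This is not TaskPool's implementation, only an illustration of the idea under stated assumptions: `bounded_gather` and `fake_check` are hypothetical names, and `fake_check` stands in for a real request so the example runs without network access.

```python
import asyncio


async def bounded_gather(coros, limit: int = 100):
    """Run coroutines concurrently, with at most `limit` running at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def run(coro):
        # Acquire a slot before starting the work, release it when done.
        async with semaphore:
            return await coro

    # gather() returns results in the same order as the inputs.
    return await asyncio.gather(*(run(c) for c in coros))


async def fake_check(url: str) -> dict:
    """Hypothetical stand-in for a real reachability check."""
    await asyncio.sleep(0.01)
    return {"original_url": url, "success": True}


urls = [f"https://example.com/{i}" for i in range(10)]
results = asyncio.run(bounded_gather([fake_check(u) for u in urls], limit=3))
print(len(results))
```

Unlike this sketch, which creates every coroutine object up front, the TaskPool example above adds work one item at a time with `put`.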
