Check if a URL is reachable
Reachable checks if a URL exists and is reachable.
Features
- Use HEAD requests instead of GET to save bandwidth
- Follow redirects
- Handle local redirects (a Location header without a full URL)
- Record all the URLs of the redirection chain
- Check whether the redirected URL matches the TLD of the source URL
- Detect Cloudflare protection
- Avoid basic bot detectors
- Use a random Chrome user agent
- Wait between consecutive requests to the same host
- Include the Host header
- Use HTTP/2
- Detect parking domains
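The redirect features listed above (resolving local redirects, recording the chain, comparing TLDs) can be illustrated with a small standard-library sketch. This is not Reachable's code: resolve_location, last_label, tld_matches and build_chain are hypothetical helpers, and the TLD rule used here (comparing the last label of each hostname) is an assumption; the library's own rule may differ.

```python
from urllib.parse import urljoin, urlsplit


def resolve_location(current_url: str, location: str) -> str:
    """Turn a Location header value into an absolute URL.

    Local redirects such as "/login" are resolved against the URL that
    produced them; absolute values are returned unchanged.
    """
    return urljoin(current_url, location)


def last_label(url: str) -> str:
    """Return the last label of the hostname, e.g. "com" for www.google.com."""
    host = urlsplit(url).hostname or ""
    return host.rstrip(".").rsplit(".", 1)[-1]


def tld_matches(source_url: str, final_url: str) -> bool:
    # Naive comparison of the last hostname label (assumption); it does not
    # understand multi-part suffixes such as "co.uk".
    return last_label(source_url) == last_label(final_url)


def build_chain(start_url: str, locations: list[str]) -> list[str]:
    """Record every hop of a redirect chain from the Location headers seen."""
    chain = []
    current = start_url
    for location in locations:
        current = resolve_location(current, location)
        chain.append(current)
    return chain
```

For example, build_chain("http://example.com/a", ["https://example.com/a", "/b"]) yields ["https://example.com/a", "https://example.com/b"], where the second hop was a local redirect.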
Installation
You can install it with pip:
pip install reachable
Or clone this repository and run:
cd reachable/
pip install -e .
Usage
Simple URL
from reachable import is_reachable
result = is_reachable("https://google.com")
The output will look like this:
{
  "original_url": "https://google.com",
  "final_url": "https://www.google.com/",
  "response": null,
  "status_code": 200,
  "success": true,
  "error_name": null,
  "cloudflare_protection": false,
  "redirect": {
    "chain": ["https://www.google.com/"],
    "final_url": "https://www.google.com/",
    "tld_match": true
  }
}
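Since the result is a plain dictionary, it can be consumed directly. The sketch below turns one result into a one-line summary; summarize is a hypothetical helper, and it assumes only the keys shown in the output above, using .get so that a missing or null redirect is treated as "no redirect".

```python
def summarize(result: dict) -> str:
    """Build a one-line summary from a dict shaped like the output above."""
    url = result.get("original_url")
    if not result.get("success"):
        # error_name is assumed to hold the failure reason when success is false
        return f"{url}: unreachable ({result.get('error_name')})"

    line = f"{url} -> {result.get('final_url')} [{result.get('status_code')}]"
    if result.get("cloudflare_protection"):
        line += " (Cloudflare)"
    # A missing or null "redirect" is treated as no redirect (assumption)
    redirect = result.get("redirect") or {}
    if redirect.get("tld_match") is False:
        line += " (TLD changed)"
    return line
```

Applied to the output above, summarize(result) returns "https://google.com -> https://www.google.com/ [200]".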
Multiple URLs
from reachable import is_reachable
result = is_reachable(["https://google.com", "http://bing.com"])
The output will look like this:
[
  {
    "original_url": "https://google.com",
    "final_url": "https://www.google.com/",
    "response": null,
    "status_code": 200,
    "success": true,
    "error_name": null,
    "cloudflare_protection": false,
    "redirect": {
      "chain": ["https://www.google.com/"],
      "final_url": "https://www.google.com/",
      "tld_match": true
    }
  },
  {
    "original_url": "http://bing.com",
    "final_url": "https://www.bing.com/?toWww=1&redig=16A78C94",
    "response": null,
    "status_code": 200,
    "success": true,
    "error_name": null,
    "cloudflare_protection": false,
    "redirect": {
      "chain": ["https://www.bing.com:443/?toWww=1&redig=16A78C94"],
      "final_url": "https://www.bing.com/?toWww=1&redig=16A78C94",
      "tld_match": true
    }
  }
]
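With a list of results, a common next step is to split the URLs by outcome. The sketch below does that; partition is a hypothetical helper that assumes each entry has the shape shown above, and it flags redirects whose tld_match is false separately from reachability.

```python
def partition(results: list[dict]) -> tuple[list[str], list[str], list[str]]:
    """Split result dicts into reachable, unreachable and off-TLD URLs."""
    reachable, unreachable, off_tld = [], [], []
    for r in results:
        url = r.get("original_url")
        if r.get("success"):
            reachable.append(url)
        else:
            unreachable.append(url)
        # Flag redirects that ended on a different TLD (tld_match is false)
        redirect = r.get("redirect") or {}
        if redirect.get("tld_match") is False:
            off_tld.append(url)
    return reachable, unreachable, off_tld
```

It can be fed directly with the list returned by is_reachable(["https://google.com", "http://bing.com"]).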
Async
import asyncio
from reachable import is_reachable_async
result = asyncio.run(is_reachable_async("https://google.com"))
Or, for several URLs at once:
import asyncio

from reachable import is_reachable_async

urls = ["https://google.com", "https://bing.com"]

try:
    loop = asyncio.get_running_loop()
except RuntimeError:
    # No loop exists yet, so we create one
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

try:
    # Check all URLs concurrently; results come back in the same order as urls
    result = loop.run_until_complete(
        asyncio.gather(*[is_reachable_async(url) for url in urls])
    )
finally:
    loop.close()
Handling high volumes with TaskPool
If you want to process a large number of URLs (> 500), you will quickly hit the limits of your hardware and/or OS, because only a limited number of connections can be open at the same time.
To work around this, use the TaskPool class. It uses an asyncio Semaphore to limit the number of asyncio tasks running at once: each worker acquires the semaphore when it starts and releases it when it is done. This keeps a steady number of workers busy without overwhelming the OS.
import asyncio

from reachable import is_reachable_async
from reachable.client import AsyncClient
from reachable.pool import TaskPool

urls = ["https://google.com", "https://bing.com"]


async def worker(url, client):
    # All workers share the same client instance
    return await is_reachable_async(url, client=client)


async def workers_builder(urls, pool_size: int = 100):
    async with AsyncClient() as client:
        # At most pool_size workers run at the same time
        tasks = TaskPool(workers=pool_size)
        for url in urls:
            await tasks.put(worker(url, client=client))
        # Wait for every queued worker to finish
        await tasks.join()
        return tasks._results


try:
    loop = asyncio.get_running_loop()
except RuntimeError:
    # No loop exists yet, so we create one
    loop = asyncio.new_event_loop()
    asyncio.set_event_loop(loop)

try:
    result = loop.run_until_complete(workers_builder(urls))
    print(result)
finally:
    loop.close()
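The semaphore mechanism described above can be illustrated without the library. The sketch below is not Reachable's TaskPool: run_bounded is a hypothetical helper that wraps each coroutine in "async with semaphore", so at most limit of them are in flight at once, and fake_check is a stand-in for a real reachability check that records the peak concurrency.

```python
import asyncio
from typing import Any, Awaitable, Iterable


async def run_bounded(coros: Iterable[Awaitable[Any]], limit: int = 100) -> list[Any]:
    """Run awaitables concurrently, never more than `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)

    async def guarded(coro: Awaitable[Any]) -> Any:
        # Acquire a slot before starting; the slot is released when the
        # awaitable finishes, even if it raises.
        async with semaphore:
            return await coro

    # gather() returns results in the same order as its inputs
    return await asyncio.gather(*(guarded(c) for c in coros))


async def demo() -> tuple[list[int], int]:
    running = 0
    peak = 0

    async def fake_check(i: int) -> int:
        # Hypothetical stand-in for a network request
        nonlocal running, peak
        running += 1
        peak = max(peak, running)
        await asyncio.sleep(0.01)
        running -= 1
        return i

    results = await run_bounded((fake_check(i) for i in range(20)), limit=5)
    return results, peak


results, peak = asyncio.run(demo())
print(results, peak)
```

This version creates one task per URL up front and lets the semaphore throttle them, which is simple but keeps every pending task in memory; a queue feeding a fixed number of worker tasks keeps memory flatter for very large URL lists.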
Download files
File details
Details for the file reachable-0.6.0.tar.gz.
File metadata
- Download URL: reachable-0.6.0.tar.gz
- Upload date:
- Size: 11.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9605bb5baed8550d763ded2c2dd7921fefec84abdb57d445cf680b866a17c3b4 |
| MD5 | 70de1adde246d76eea78e0421f64adcf |
| BLAKE2b-256 | 699ff97e7f16a4a88f3933975636c280e052814d66ad73c9c5cd65e13e873ca3 |
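A downloaded file can be checked against the published SHA256 digest with the standard library. The sketch below reads the file in chunks so large files need not fit in memory; sha256_of and verify are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Union[str, Path], expected_hex: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value first
    return sha256_of(path) == expected_hex.strip().lower()
```

For instance, verify("reachable-0.6.0.tar.gz", "9605bb5baed8550d763ded2c2dd7921fefec84abdb57d445cf680b866a17c3b4") should return True for an intact download of the source distribution.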
Provenance
The following attestation bundles were made for reachable-0.6.0.tar.gz:
Publisher: publish.yml on AlexMili/Reachable
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: reachable-0.6.0.tar.gz
- Subject digest: 9605bb5baed8550d763ded2c2dd7921fefec84abdb57d445cf680b866a17c3b4
- Sigstore transparency entry: 152958807
- Sigstore integration time:
- Permalink: AlexMili/Reachable@8879d0f6984c745028c49bbcbb92a3c2d0efd82d
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/AlexMili
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8879d0f6984c745028c49bbcbb92a3c2d0efd82d
- Trigger Event: release
File details
Details for the file reachable-0.6.0-py3-none-any.whl.
File metadata
- Download URL: reachable-0.6.0-py3-none-any.whl
- Upload date:
- Size: 12.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/5.1.1 CPython/3.12.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f577a11c7e9267a37f0bc4eb20afc6e66a800cce1838ebf12d62991d4e2cab60 |
| MD5 | 58588244e707601ef5dcd299707d7a3b |
| BLAKE2b-256 | 41486fa036173ba27776117beb53fc022005eb3be39bb9c41b06ec828e2d0a1d |
Provenance
The following attestation bundles were made for reachable-0.6.0-py3-none-any.whl:
Publisher: publish.yml on AlexMili/Reachable
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: reachable-0.6.0-py3-none-any.whl
- Subject digest: f577a11c7e9267a37f0bc4eb20afc6e66a800cce1838ebf12d62991d4e2cab60
- Sigstore transparency entry: 152958808
- Sigstore integration time:
- Permalink: AlexMili/Reachable@8879d0f6984c745028c49bbcbb92a3c2d0efd82d
- Branch / Tag: refs/tags/v0.6.0
- Owner: https://github.com/AlexMili
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8879d0f6984c745028c49bbcbb92a3c2d0efd82d
- Trigger Event: release