A tool to filter out data from robots.txt restricted URL domains.
robots-checker
This package provides convenient robots.txt-based compliance filtering. Compliance filtering here means evaluating robots.txt rules specifically for AI training user agents, as shown in the list below.
"AI2Bot", # AI2
"Applebot-Extended", # Apple
"Bytespider", # Bytedance
"CCBot", # Common Crawl
"CCBot/2.0", # Common Crawl
"CCBot/1.0", # Common Crawl
"ClaudeBot", # Anthropic
"cohere-training-data-crawler", # Cohere
"Diffbot", # Diffbot
"Meta-ExternalAgent", # Meta
"Google-Extended", # Google
"GPTBot", # OpenAI
"PanguBot", # Huawei
"*"
Installation
Currently, the package only supports checking against robots.txt data as of January 2025.
Install the package
pip install robots-checker==1.2.2
Usage
import url_checker
checker = url_checker.RobotsTxtComplianceChecker() # "Jan-2025"
status = checker.is_compliant("https://blog.example.com/some-page")
print(status) # ➜ "Compliant" or "NonCompliant"
More?
For more information, please check our 🕸️ website
File details
Details for the file robots-checker-1.2.3.tar.gz.
File metadata
- Download URL: robots-checker-1.2.3.tar.gz
- Upload date:
- Size: 2.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | dbef6d1826822958eb9946bf11f4b851868abd654c2c097dbc154b9c102c4416 |
| MD5 | 6b3d4c5ae804a2935a2905b28f36b839 |
| BLAKE2b-256 | 883e913e2c9cfe4beec324c14968b63f0b63c735be86c94c0dbcb27060af028f |
File details
Details for the file robots_checker-1.2.3-py3-none-any.whl.
File metadata
- Download URL: robots_checker-1.2.3-py3-none-any.whl
- Upload date:
- Size: 3.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 536868a912b92f3446d183f00c50e3bb14f4dfbc453e41f313361fba2e211d90 |
| MD5 | 0d15540e328e7644427c0c8d747f78c3 |
| BLAKE2b-256 | 9a31fe5fea7c51dd46667809d8f38ec808a7a62e597f6d59366f56702a4e91ec |