
A tool to filter out data from URL domains restricted by robots.txt.

Project description

robots-checker

This is a package for convenient robots.txt-based compliance filtering. Compliance filtering here means evaluating robots.txt rules specifically for the AI-training user agents listed below.

"AI2Bot",                       # AI2  
"Applebot-Extended",            # Apple  
"Bytespider",                   # Bytedance  
"CCBot",                        # Common Crawl  
"CCBot/2.0",                    # Common Crawl  
"CCBot/1.0",                    # Common Crawl  
"ClaudeBot",                    # Anthropic  
"cohere-training-data-crawler", # Cohere  
"Diffbot",                      # Diffbot  
"Meta-ExternalAgent",           # Meta  
"Google-Extended",              # Google  
"GPTBot",                       # OpenAI  
"PanguBot",                     # Huawei  
"*"                            # wildcard: all user agents
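To illustrate what checking these agents involves, here is a minimal sketch built on Python's standard `urllib.robotparser`. It is not the package's implementation (the package checks against its own January 2025 data). The helpers `robots_url_for` and `is_compliant` are hypothetical, and the rule that a URL counts as "Compliant" only when every listed agent may fetch it is an assumption. Note also that `urllib.robotparser` applies the first matching rule and does not interpret `*` or `$` wildcards inside paths, so its answers can differ from crawlers that follow RFC 9309's longest-match rule.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser

# AI-training user agents, copied from the list above.
AI_USER_AGENTS = [
    "AI2Bot", "Applebot-Extended", "Bytespider", "CCBot", "CCBot/2.0",
    "CCBot/1.0", "ClaudeBot", "cohere-training-data-crawler", "Diffbot",
    "Meta-ExternalAgent", "Google-Extended", "GPTBot", "PanguBot", "*",
]


def robots_url_for(page_url: str) -> str:
    """robots.txt lives at the root of the page's scheme + host."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def is_compliant(robots_txt: str, page_url: str) -> str:
    """Hypothetical rule: "Compliant" only if every listed agent may fetch page_url.

    Uses the stdlib parser (first-match semantics, no path wildcards),
    so this is an approximation, not the package's own logic.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    if all(parser.can_fetch(agent, page_url) for agent in AI_USER_AGENTS):
        return "Compliant"
    return "NonCompliant"
```

For example, a site whose robots.txt contains `User-agent: GPTBot` followed by `Disallow: /` makes every page on that host "NonCompliant" under this rule, even if the `*` group allows it.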

Installation

Currently, only robots.txt rules as of January 2025 are supported.

Install the package

pip install robots-checker==1.2.3

Usage

import url_checker
checker = url_checker.RobotsTxtComplianceChecker() # "Jan-2025"
status = checker.is_compliant("https://blog.example.com/some-page")
print(status)   # ➜  "Compliant"  or  "NonCompliant"
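Since the package's purpose is filtering data, a natural next step is applying the checker to a batch of records. The sketch below assumes each record is a dict with a `"url"` field and relies only on `is_compliant()` returning the strings shown above; `filter_records`, `ComplianceChecker`, and `url_key` are hypothetical names, not part of the package.

```python
from typing import Any, Dict, Iterable, List, Protocol


class ComplianceChecker(Protocol):
    """Structural type: any object with is_compliant(url) -> str, e.g. the checker above."""

    def is_compliant(self, url: str) -> str: ...


def filter_records(
    records: Iterable[Dict[str, Any]],
    checker: ComplianceChecker,
    url_key: str = "url",
) -> List[Dict[str, Any]]:
    """Keep only records whose URL the checker reports as "Compliant", in input order."""
    return [rec for rec in records if checker.is_compliant(rec[url_key]) == "Compliant"]
```

With the package installed, this might be called as `filter_records(records, url_checker.RobotsTxtComplianceChecker())`; records whose status is "NonCompliant" are dropped.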

More?

For more information, please check our 🕸️ website

Download files

Source Distribution

robots-checker-1.2.3.tar.gz (2.8 kB)

Built Distribution

robots_checker-1.2.3-py3-none-any.whl (3.1 kB)

File details

Details for the file robots-checker-1.2.3.tar.gz.

File metadata

  • Download URL: robots-checker-1.2.3.tar.gz
  • Upload date:
  • Size: 2.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for robots-checker-1.2.3.tar.gz
Algorithm     Hash digest
SHA256        dbef6d1826822958eb9946bf11f4b851868abd654c2c097dbc154b9c102c4416
MD5           6b3d4c5ae804a2935a2905b28f36b839
BLAKE2b-256   883e913e2c9cfe4beec324c14968b63f0b63c735be86c94c0dbcb27060af028f
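These digests let you check a download before installing it. A minimal sketch using Python's standard `hashlib`, assuming the source distribution was saved locally under its published file name; `sha256_of` and `verify` are hypothetical helper names. The same check works for the wheel by passing its SHA256 digest instead.

```python
import hashlib
from pathlib import Path

# SHA256 digest published above for robots-checker-1.2.3.tar.gz.
EXPECTED_SHA256 = "dbef6d1826822958eb9946bf11f4b851868abd654c2c097dbc154b9c102c4416"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify(Path("robots-checker-1.2.3.tar.gz"))` should return `True` for an untampered download of the sdist.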

File details

Details for the file robots_checker-1.2.3-py3-none-any.whl.

File metadata

  • Download URL: robots_checker-1.2.3-py3-none-any.whl
  • Upload date:
  • Size: 3.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.4

File hashes

Hashes for robots_checker-1.2.3-py3-none-any.whl
Algorithm     Hash digest
SHA256        536868a912b92f3446d183f00c50e3bb14f4dfbc453e41f313361fba2e211d90
MD5           0d15540e328e7644427c0c8d747f78c3
BLAKE2b-256   9a31fe5fea7c51dd46667809d8f38ec808a7a62e597f6d59366f56702a4e91ec
