A Flask extension to detect and block AI crawlers based on User-Agent headers.

🤖 ConfusedAICrawlers

ConfusedAICrawlers is a lightweight Flask extension designed to detect and block AI web crawlers using their User-Agent strings.

It reads a JSON configuration that blacklists or whitelists crawlers, works alongside Flask-Limiter for rate limiting, and includes an admin route that reloads the configuration without restarting your app.

🚀 Features

  • Block known AI crawlers by inspecting the User-Agent header.
  • Whitelist trusted crawlers (e.g., Googlebot).
  • Plug-and-play integration using a Flask Blueprint.
  • Optional /robots.txt route to discourage all crawlers.
  • JSON config file for flexible control.
  • Reload crawler configuration on the fly with an admin route.
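
To illustrate the /robots.txt feature, here is a minimal sketch of a "disallow everything" policy, checked with Python's standard urllib.robotparser. The exact body the extension serves is not shown in this README, so ROBOTS_TXT and is_allowed are hypothetical names, and the conventional two-line policy is an assumption.

```python
from urllib.robotparser import RobotFileParser

# Assumed policy: the conventional "disallow everything" robots.txt body.
# The text actually served by the extension may differ.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def is_allowed(user_agent, path="/"):
    """Return True if ROBOTS_TXT permits user_agent to fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)
```

robots.txt is advisory: well-behaved crawlers honor it, which is why the README describes this route as discouraging crawlers while the User-Agent check does the actual blocking.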

📦 Installation

Install via pip:

pip install confusedaicrawlers

🛠️ Usage

1. Basic Integration

from flask import Flask
from confusedaicrawlers import FlaskAIBlocker

app = Flask(__name__)
ai_blocker = FlaskAIBlocker()  # Defaults to ai_crawlers.json
ai_blocker.init_app(app)

@app.route('/')
def index():
    return "Hello, human!"

if __name__ == '__main__':
    app.run()

2. With Flask-Limiter

from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
from confusedaicrawlers import FlaskAIBlocker

app = Flask(__name__)

# Rate-limit requests per client IP address.
limiter = Limiter(key_func=get_remote_address)
limiter.init_app(app)

# Load crawler rules from a custom config file instead of the default.
ai_blocker = FlaskAIBlocker(config_path="your_config_path.json")
ai_blocker.init_app(app)

⚙️ Configuration (JSON)

The extension expects a JSON file like this:

{
  "blacklist": {
    "chatgpt": "ChatGPT",
    "openai": "OpenAI"
  },
  "whitelist": {
    "google": "Googlebot",
    "bing": "Bingbot"
  }
}

  • Blacklist: User-Agent fragments to block (matched case-insensitively).
  • Whitelist: Allowed bots; checked before the blacklist.
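
The matching rules described above can be sketched in plain Python. This is an illustration of "whitelist first, then case-insensitive blacklist fragments", not the extension's actual code: load_rules and is_blocked are hypothetical helpers, and substring matching on the config values is an assumption.

```python
import json

# Same shape as the example configuration in this README.
EXAMPLE_CONFIG = """
{
  "blacklist": {"chatgpt": "ChatGPT", "openai": "OpenAI"},
  "whitelist": {"google": "Googlebot", "bing": "Bingbot"}
}
"""


def load_rules(config_text):
    """Parse the JSON config into lowercase whitelist and blacklist fragments."""
    config = json.loads(config_text)
    whitelist = [value.lower() for value in config.get("whitelist", {}).values()]
    blacklist = [value.lower() for value in config.get("blacklist", {}).values()]
    return whitelist, blacklist


def is_blocked(user_agent, whitelist, blacklist):
    """Return True if the User-Agent should be blocked.

    Assumption: a fragment matches when it appears anywhere in the
    User-Agent string, compared case-insensitively.
    """
    ua = (user_agent or "").lower()
    # The whitelist is checked first, so a trusted bot is never blocked.
    if any(fragment in ua for fragment in whitelist):
        return False
    return any(fragment in ua for fragment in blacklist)


WHITELIST, BLACKLIST = load_rules(EXAMPLE_CONFIG)
```

With these rules, a User-Agent containing "chatgpt" in any casing is blocked, one containing "Googlebot" is allowed, and a missing User-Agent header is treated as not blocked in this sketch.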

🔁 Admin Endpoint

Reload the config file at runtime without restarting the app:

POST /admin/reload-config

Response:

{
  "status": "success",
  "message": "Configuration reloaded"
}
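
As a usage sketch, the reload endpoint can be called from Python with only the standard library. The helper names and the base URL http://127.0.0.1:5000 (Flask's default development address) are assumptions; adjust them to your deployment.

```python
import json
import urllib.request


def build_reload_request(base_url):
    """Build a POST request for the /admin/reload-config endpoint."""
    url = base_url.rstrip("/") + "/admin/reload-config"
    # An empty body is sent; the README does not document any payload.
    return urllib.request.Request(url, data=b"", method="POST")


def reload_config(base_url="http://127.0.0.1:5000"):
    """Send the reload request and return the decoded JSON response."""
    with urllib.request.urlopen(build_reload_request(base_url)) as response:
        return json.load(response)
```

Calling reload_config() against a running app should return the JSON response shown above.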

📁 Project Structure

confusedaicrawlers/
├── confusedaicrawlers/
│   ├── __init__.py
│   ├── blocker.py
│   └── ai_crawlers.json
├── tests/
│   └── test_blocker.py
├── README.md
├── pyproject.toml
├── requirements.txt
└── MANIFEST.in

✅ License

This project is licensed under the MIT License.

🤝 Contributing

Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.

Download files

Source Distribution

blockaicrawlers-0.1.0.tar.gz (4.8 kB view details)

Uploaded Source

Built Distribution

blockaicrawlers-0.1.0-py3-none-any.whl (5.1 kB view details)

Uploaded Python 3

File details

Details for the file blockaicrawlers-0.1.0.tar.gz.

File metadata

  • Download URL: blockaicrawlers-0.1.0.tar.gz
  • Upload date:
  • Size: 4.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for blockaicrawlers-0.1.0.tar.gz
Algorithm Hash digest
SHA256 0de1bd59293028be3f9b730ed8757b89e6f0aec884b07b8e2307d6eab804ac42
MD5 4b51627abcc85522cb7175762fb34039
BLAKE2b-256 9619fbfa290a193c59a7159404e83434e0da3e23b954e0b9f3ba752e9613d218

File details

Details for the file blockaicrawlers-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for blockaicrawlers-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 19bf87ce8a6d56349d84c135d4f20e6936483b23553e5d5b71744481794008d6
MD5 b0e77f2098e33eebcd9c7734aaadea75
BLAKE2b-256 78e4523633d6e1424dc14056920d939be78cfbf7536557eb6f8c6e9729a48fec
