A Flask extension to detect and block AI crawlers based on User-Agent headers.
🤖 ConfusedAICrawlers
ConfusedAICrawlers is a lightweight Flask extension designed to detect and block AI web crawlers using their User-Agent strings.
It supports a customizable JSON configuration for blacklisting or whitelisting crawlers, integrates seamlessly with flask-limiter for rate limiting, and includes a simple admin route for reloading configurations without restarting your app.
🚀 Features
- Block known AI crawlers by inspecting the User-Agent header.
- Whitelist trusted crawlers (e.g., Googlebot).
- Plug-and-play integration using a Flask Blueprint.
- Optional /robots.txt route to discourage all crawlers.
- JSON config file for flexible control.
- Reload crawler configuration on the fly with an admin route.
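The core idea behind these features — checking the User-Agent header in a before-request hook, with the whitelist taking precedence — can be sketched in plain Flask. This is a simplified illustration of the mechanism, not the extension's actual implementation:

```python
from flask import Flask, abort, request

# Example fragments; the real extension loads these from a JSON config file.
BLACKLIST = {"chatgpt": "ChatGPT", "openai": "OpenAI"}
WHITELIST = {"google": "Googlebot", "bing": "Bingbot"}

app = Flask(__name__)

@app.before_request
def block_ai_crawlers():
    ua = (request.headers.get("User-Agent") or "").lower()
    # Whitelisted bots pass through before any blacklist check.
    if any(fragment.lower() in ua for fragment in WHITELIST.values()):
        return
    # Case-insensitive substring match against blacklisted fragments.
    if any(fragment.lower() in ua for fragment in BLACKLIST.values()):
        abort(403)

@app.route("/")
def index():
    return "Hello, human!"
```

With this hook in place, a request whose User-Agent contains a blacklisted fragment receives a 403, while whitelisted bots and ordinary browsers are served normally.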
📦 Installation
Install via pip:

```shell
pip install confusedaicrawlers
```
🛠️ Usage
1. Basic Integration
```python
from flask import Flask
from confusedaicrawlers import FlaskAIBlocker

app = Flask(__name__)
ai_blocker = FlaskAIBlocker()  # Defaults to ai_crawlers.json
ai_blocker.init_app(app)

@app.route('/')
def index():
    return "Hello, human!"

if __name__ == '__main__':
    app.run()
```
2. With Flask-Limiter
```python
from flask import Flask
from flask_limiter import Limiter
from flask_limiter.util import get_remote_address
from confusedaicrawlers import FlaskAIBlocker

app = Flask(__name__)
limiter = Limiter(key_func=get_remote_address)
limiter.init_app(app)
ai_blocker = FlaskAIBlocker(config_path="your_config_path.json")
ai_blocker.init_app(app)
```
⚙️ Configuration (JSON)
The extension expects a JSON file like this:
```json
{
  "blacklist": {
    "chatgpt": "ChatGPT",
    "openai": "OpenAI"
  },
  "whitelist": {
    "google": "Googlebot",
    "bing": "Bingbot"
  }
}
```
- Blacklist: Crawler fragments to block (case-insensitive).
- Whitelist: Allowed bots (checked first).
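Reading such a file might look like the sketch below, which uses a hypothetical `load_crawler_config` helper; the extension's internal loader may differ:

```python
import json

def load_crawler_config(path="ai_crawlers.json"):
    """Load blacklist/whitelist fragments from the JSON config file."""
    with open(path) as f:
        config = json.load(f)
    # Missing sections default to empty dicts rather than raising KeyError.
    return config.get("blacklist", {}), config.get("whitelist", {})
```

Because the helper simply re-reads the file on each call, the same function can back a runtime reload endpoint.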
🔁 Admin Endpoint
Reload the config file at runtime without restarting the app:
```
POST /admin/reload-config
```
Response:
```json
{
  "status": "success",
  "message": "Configuration reloaded"
}
```
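A minimal version of such a route — a hypothetical sketch, not the shipped implementation — could look like this:

```python
from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/admin/reload-config", methods=["POST"])
def reload_config():
    # In the real extension this would re-read the JSON config from disk
    # and swap in the new blacklist/whitelist before responding.
    return jsonify(status="success", message="Configuration reloaded")
```

In production, an endpoint like this should be protected (e.g., by authentication or network restrictions) so that only administrators can trigger a reload.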
📁 Project Structure
```
confusedaicrawlers/
├── confusedaicrawlers/
│   ├── __init__.py
│   ├── blocker.py
│   └── ai_crawlers.json
├── tests/
│   └── test_blocker.py
├── README.md
├── pyproject.toml
├── requirements.txt
└── MANIFEST.in
```
✅ License
This project is licensed under the MIT License.
🤝 Contributing
Pull requests are welcome! For major changes, please open an issue first to discuss what you'd like to change.
File details
Details for the file blockaicrawlers-0.1.0.tar.gz.
File metadata
- Download URL: blockaicrawlers-0.1.0.tar.gz
- Upload date:
- Size: 4.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `0de1bd59293028be3f9b730ed8757b89e6f0aec884b07b8e2307d6eab804ac42` |
| MD5 | `4b51627abcc85522cb7175762fb34039` |
| BLAKE2b-256 | `9619fbfa290a193c59a7159404e83434e0da3e23b954e0b9f3ba752e9613d218` |
File details
Details for the file blockaicrawlers-0.1.0-py3-none-any.whl.
File metadata
- Download URL: blockaicrawlers-0.1.0-py3-none-any.whl
- Upload date:
- Size: 5.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `19bf87ce8a6d56349d84c135d4f20e6936483b23553e5d5b71744481794008d6` |
| MD5 | `b0e77f2098e33eebcd9c7734aaadea75` |
| BLAKE2b-256 | `78e4523633d6e1424dc14056920d939be78cfbf7536557eb6f8c6e9729a48fec` |