Make it slightly harder for bots to steal your content

MkDocs Anti AI Scraper Plugin

This plugin tries to prevent AI scrapers from easily ingesting your website's contents. The implementation is probably rough, and by design anyone who invests a bit of time can bypass it, but it is probably better than nothing.

Installation

Install the plugin with pip:

pip install mkdocs-anti-ai-scraper-plugin

Then add the plugin to your mkdocs.yml:

plugins:
- search
- anti_ai_scraper

Or with all config options:

plugins:
- search
- anti_ai_scraper:
    robots_txt: True
    sitemap_xml: True
    encode_html: True
    debug: False

Implemented Techniques

Technique | Scraper Protection | Impact on human visitors | Enabled by default
--- | --- | --- | ---
Add robots.txt | weak | none | yes
Remove sitemap.xml | very weak | none | yes
Encode HTML | only against simple HTML parser based scrapers | slows down page loading, may break page events | yes

Add robots.txt

This technique is enabled by default, and can be disabled by setting the option robots_txt: False in mkdocs.yml. If enabled, it adds a robots.txt with the following contents to the output directory:

User-agent: *
Disallow: /

This hints to crawlers that they should not crawl your site.

This technique does not hinder normal users from using the site at all. However, robots.txt does not enforce anything. It just tells well-behaved bots how you would like them to behave, and many AI bots may simply ignore it (Source).
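To illustrate what the rule above means for a crawler that does honor robots.txt, here is a minimal sketch using Python's standard `urllib.robotparser`. The helper name `is_allowed`, the user agent `ExampleBot`, and the URL are made up for illustration and are not part of the plugin.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt contents quoted above.
ROBOTS_TXT = "User-agent: *\nDisallow: /\n"


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if a rule-abiding crawler may fetch the given URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # "ExampleBot" and the URL are hypothetical values.
    print(is_allowed("ExampleBot", "https://example.com/docs/page/"))
```

A well-behaved crawler performs a check like this before every request. A crawler that skips the check is not stopped by anything, which is why the protection is rated as weak.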

Remove sitemap.xml

This technique is enabled by default, and can be disabled by setting the option sitemap_xml: False in mkdocs.yml. If enabled, it removes the sitemap.xml and sitemap.xml.gz files. This prevents leaking the paths to pages not referenced by your navigation.
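The effect can be sketched in a few lines of Python. This is not the plugin's actual code, just an assumed equivalent: the function name `remove_sitemaps` is hypothetical, and the demo runs against a temporary directory standing in for the MkDocs output directory.

```python
import tempfile
from pathlib import Path


def remove_sitemaps(site_dir: Path) -> list[str]:
    """Delete sitemap.xml and sitemap.xml.gz from site_dir if present.

    Returns the names of the files that were actually removed.
    """
    removed = []
    for name in ("sitemap.xml", "sitemap.xml.gz"):
        path = site_dir / name
        if path.is_file():
            path.unlink()
            removed.append(name)
    return removed


if __name__ == "__main__":
    # Temporary directory standing in for the built site (assumption).
    with tempfile.TemporaryDirectory() as tmp:
        site = Path(tmp)
        (site / "sitemap.xml").write_text("<urlset/>")
        (site / "index.html").write_text("<html></html>")
        print(remove_sitemaps(site))
```

Pages that are linked from the navigation remain discoverable by following links, so this only hides pages that nothing else points to.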

Encode HTML

This technique is enabled by default, and can be disabled by setting the option encode_html: False in mkdocs.yml. If enabled, it encodes (zip + ASCII85) each page's contents and decodes them in the user's browser with JavaScript. This obscures the page contents from simple scrapers that just download and parse your HTML. It will not work against bots that use remote-controlled browsers (Selenium or similar tech).

The decoding takes some time, so browser events (like onload) are fired before the page is decoded. This may break functionality that listens to these events and expects the page contents to be present when they fire.
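The encoding step can be sketched in Python as follows. This assumes that "zip" refers to zlib/deflate compression. The plugin's exact wire format and its JavaScript decoder may differ, and the function names `encode_page` and `decode_page` are hypothetical.

```python
import base64
import zlib


def encode_page(html: str) -> str:
    """Compress the HTML with zlib, then ASCII85-encode it (assumed format)."""
    compressed = zlib.compress(html.encode("utf-8"))
    return base64.a85encode(compressed).decode("ascii")


def decode_page(encoded: str) -> str:
    """Reverse of encode_page; in the plugin this step runs in the browser."""
    compressed = base64.a85decode(encoded)
    return zlib.decompress(compressed).decode("utf-8")


if __name__ == "__main__":
    page = "<h1>Hello world</h1><p>Some page contents.</p>"
    encoded = encode_page(page)
    # The readable text no longer appears in what a simple scraper downloads.
    print("Hello world" in encoded)
    # Decoding restores the original page.
    print(decode_page(encoded) == page)
```

Because the decoder has to ship with the page, anyone who reads it can reverse the encoding. This matches the table above: the technique only helps against scrapers that parse the raw HTML.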

Planned Techniques

Remove sitemap.xml(.gz): Just obscures a bit, since the nav will still point to most pages.
Encode the page contents and decode them with JS: Will prevent basic HTML parsers from getting the contents, but anything using a browser (Selenium, Puppeteer, etc.) will still work.
Encrypt the page contents and add a client-side "CAPTCHA" to generate the key: Should help against primitive browser-based bots. It would probably make sense to let the user solve the CAPTCHA once and cache the key as a cookie or in localStorage.
Bot detection JS: Will be a cat-and-mouse game, but should help against badly written crawlers.

Suggestions welcome: If you know bot detection mechanisms that can be used with static websites, feel free to open an issue :D

Problems and Considerations

Similar to the encryption plugin, encrypting the search index is hard. So it is best to disable search to prevent anyone from accessing its index.
Obviously, to protect your contents from scraping, you should not host their source code in public repos ;D
By blocking bots, you also prevent search engines like Google from properly indexing your site.

Notable changes

Version 0.1.0

Added encode_html option
Added sitemap_xml option

Version 0.0.1

Added robots_txt option

Development Commands

This repo is managed using poetry. You can install poetry with pip install poetry or pipx install poetry.

Clone repo:

git clone git@github.com:six-two/mkdocs-anti-ai-scraper-plugin.git

Install/update extension locally:

poetry install

Build test site:

poetry run mkdocs build

Serve test site:

poetry run mkdocs serve

Release

Set PyPI API token (only needed once):

poetry config pypi-token.pypi YOUR_PYPI_TOKEN_HERE

Build extension:

poetry build

Upload extension:

poetry publish

Release history

0.1.0 (this version): Aug 28, 2025
0.0.1: Aug 27, 2025

Download files

Source Distribution

mkdocs_anti_ai_scraper_plugin-0.1.0.tar.gz (4.0 kB)

Uploaded Aug 28, 2025 Source

Built Distribution

mkdocs_anti_ai_scraper_plugin-0.1.0-py3-none-any.whl (4.9 kB)

Uploaded Aug 28, 2025 Python 3

File details

Details for the file mkdocs_anti_ai_scraper_plugin-0.1.0.tar.gz.

File metadata

Download URL: mkdocs_anti_ai_scraper_plugin-0.1.0.tar.gz
Upload date: Aug 28, 2025
Size: 4.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.4 CPython/3.9.6 Darwin/24.6.0

File hashes

Hashes for mkdocs_anti_ai_scraper_plugin-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`e31f41d4593d2557d2a727547d8a3726d214793f729dc7c81e9b711bf52aca79`
MD5	`407ac5d4e226f53ba0a45124fad31094`
BLAKE2b-256	`d76c762190c005e081223db2ae03320844e4fb42ace280bd1f9fdeea608e8cd8`

File details

Details for the file mkdocs_anti_ai_scraper_plugin-0.1.0-py3-none-any.whl.

File metadata

Download URL: mkdocs_anti_ai_scraper_plugin-0.1.0-py3-none-any.whl
Upload date: Aug 28, 2025
Size: 4.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: poetry/2.1.4 CPython/3.9.6 Darwin/24.6.0

File hashes

Hashes for mkdocs_anti_ai_scraper_plugin-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2ddcefc73d47db96038cd7f88b59dde7016ebb8356c4c180bf540cf380361ee5`
MD5	`b7d750eee1a1bdc63820a74262ebc35b`
BLAKE2b-256	`6fa8c5075deff5fe2d3f0a3231f6d2381dbc83f6f8cba9d727aa182cf03cf170`
