
Datasette plugin that blocks all robots using robots.txt


datasette-block-robots


Datasette plugin that blocks robots and crawlers using robots.txt

Installation

Install this plugin in the same environment as Datasette.

$ pip install datasette-block-robots

Usage

Once the plugin is installed, /robots.txt on your Datasette instance will return the following:

User-agent: *
Disallow: /

This will request all robots and crawlers not to visit any of the pages on your site.
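Because robots.txt is only a request, it can be useful to check how a compliant crawler reads it. Here is a minimal sketch using Python's standard-library urllib.robotparser; ExampleBot and the example.com URLs are placeholders, not part of the plugin.

```python
from urllib.robotparser import RobotFileParser

# The default robots.txt served by the plugin
DEFAULT_ROBOTS_TXT = """User-agent: *
Disallow: /"""

parser = RobotFileParser()
# parse() accepts the file's lines directly, so no HTTP request is made
parser.parse(DEFAULT_ROBOTS_TXT.splitlines())

# Every path is disallowed for every user agent
for path in ["/", "/mydatabase", "/mydatabase/mytable"]:
    print(path, parser.can_fetch("ExampleBot", "https://example.com" + path))
```

Real crawlers may treat edge cases differently from robotparser, but for a blanket Disallow: / they all agree.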

Here's a demo of the plugin in action: https://sqlite-generate-demo.datasette.io/robots.txt

Configuration

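By default the plugin asks crawlers to stay away from the entire site, using Disallow: /. This is advisory: well-behaved crawlers honour it, but it does not prevent anyone from accessing your pages.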

If you want search engines to be able to index your instance's index page without crawling the database, table or row pages themselves, use the allow_only_index option:

{
    "plugins": {
        "datasette-block-robots": {
            "allow_only_index": true
        }
    }
}

This will return a /robots.txt like so:

User-agent: *
Disallow: /db1
Disallow: /db2

There is one Disallow line for every attached database.
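The shape of that output, and how a crawler would read it, can be sketched as follows. allow_only_index_robots_txt() is a hypothetical helper written for illustration; it is not the plugin's own code, and db1/db2 are placeholder database names.

```python
from urllib.robotparser import RobotFileParser


def allow_only_index_robots_txt(database_names):
    """Hypothetical helper: build robots.txt text shaped like the
    allow_only_index output, with one Disallow line per database."""
    lines = ["User-agent: *"]
    lines.extend("Disallow: /{}".format(name) for name in database_names)
    return "\n".join(lines)


robots_txt = allow_only_index_robots_txt(["db1", "db2"])
print(robots_txt)

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())
# The index page stays crawlable...
print(parser.can_fetch("ExampleBot", "https://example.com/"))  # True
# ...while database, table and row pages under /db1 are disallowed
print(parser.can_fetch("ExampleBot", "https://example.com/db1/mytable"))  # False
```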

To block access to specific areas of the site using custom paths, add this to your metadata.json configuration file:

{
    "plugins": {
        "datasette-block-robots": {
            "disallow": ["/mydatabase/mytable"]
        }
    }
}

This will result in a /robots.txt that looks like this:

User-agent: *
Disallow: /mydatabase/mytable
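Disallow rules match by path prefix, so this single line also covers the table's row pages and JSON export, plus any other path that merely begins with the same characters. A sketch checking this with urllib.robotparser; mytable_archive is a hypothetical table name used only to show the prefix effect.

```python
from urllib.robotparser import RobotFileParser

robots_txt = """User-agent: *
Disallow: /mydatabase/mytable"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

paths = [
    "/mydatabase",                  # database page: not covered by the rule
    "/mydatabase/mytable",          # the table page itself
    "/mydatabase/mytable.json",     # JSON export shares the prefix
    "/mydatabase/mytable/1",        # row pages share the prefix
    "/mydatabase/mytable_archive",  # hypothetical table, also a prefix match
]
results = {
    path: parser.can_fetch("ExampleBot", "https://example.com" + path)
    for path in paths
}
print(results)
```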

Alternatively you can set the full contents of the robots.txt file using the literal configuration option. Here's how to do that if you are using YAML rather than JSON and have a metadata.yml file:

plugins:
    datasette-block-robots:
        literal: |-
            User-agent: *
            Disallow: /
            User-agent: Bingbot
            User-agent: Googlebot
            Disallow:

This example would block all crawlers with the exception of Googlebot and Bingbot, which are allowed to crawl the entire site.
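To confirm the grouping behaves as described, the same literal text can be fed to urllib.robotparser. A sketch; ExampleBot stands in for any crawler not named in the file.

```python
from urllib.robotparser import RobotFileParser

# The literal robots.txt from the YAML example above
robots_txt = """User-agent: *
Disallow: /
User-agent: Bingbot
User-agent: Googlebot
Disallow:"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())

url = "https://example.com/mydatabase/mytable"
for agent in ["Googlebot", "Bingbot", "ExampleBot"]:
    # An empty Disallow: means "allow everything" for that group
    print(agent, parser.can_fetch(agent, url))
```

A User-agent line that follows a rule line starts a new group, so Bingbot and Googlebot share the empty Disallow: while every other crawler falls back to the * group.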

Extending this with other plugins

This plugin adds a new plugin hook to Datasette called block_robots_extra_lines(), which other plugins can use to add their own lines to the robots.txt file.

The hook can optionally accept these parameters:

  • datasette: The current Datasette instance. You can use this to execute SQL queries or read plugin configuration settings.
  • request: The Request object representing the incoming request to /robots.txt.

The hook should return a list of strings, each representing a line to be added to the robots.txt file.

It can also return an async def function, which will be awaited and used to generate a list of lines. Use this option if you need to make await calls inside your hook implementation.
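The two return shapes can be told apart at runtime. Below is a minimal sketch that assumes nothing about the plugin's internals: resolve_extra_lines(), sync_hook() and async_hook() are hypothetical names, and asyncio.sleep(0) stands in for a real awaited query.

```python
import asyncio
import inspect


async def resolve_extra_lines(value):
    """Hypothetical helper: turn a hook return value into a list of lines.

    Handles a plain list of strings, or an async def function that
    returns one when called and awaited.
    """
    if callable(value):
        value = value()
    if inspect.isawaitable(value):
        value = await value
    return list(value or [])


def sync_hook():
    # Shape 1: return the lines directly
    return ["Disallow: /private"]


def async_hook():
    # Shape 2: return an async function that produces the lines
    async def inner():
        await asyncio.sleep(0)  # stands in for an awaited database query
        return ["Disallow: /secret"]
    return inner


async def main():
    lines = []
    for hook_result in (sync_hook(), async_hook()):
        lines.extend(await resolve_extra_lines(hook_result))
    return lines


print(asyncio.run(main()))  # ['Disallow: /private', 'Disallow: /secret']
```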

This example uses the hook to add a Sitemap: http://example.com/sitemap.xml line to the robots.txt file:

from datasette import hookimpl

@hookimpl
def block_robots_extra_lines(datasette, request):
    return [
        # absolute_url() builds a full URL for the path, based on the incoming request
        "Sitemap: {}".format(datasette.absolute_url(request, "/sitemap.xml")),
    ]
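Sitemap lines sit outside the User-agent groups, so parsers record them regardless of where they appear in the file. A sketch with urllib.robotparser (site_maps() needs Python 3.8 or later), assuming the hook's line ends up after the default rules; example.com is a placeholder for your instance's URL.

```python
from urllib.robotparser import RobotFileParser

# Default rules plus the extra line the hook above would contribute
lines = [
    "User-agent: *",
    "Disallow: /",
    "Sitemap: http://example.com/sitemap.xml",
]

parser = RobotFileParser()
parser.parse(lines)

print(parser.site_maps())  # ['http://example.com/sitemap.xml']
# The default Disallow: / is a prefix of every path, including the sitemap URL
print(parser.can_fetch("ExampleBot", "http://example.com/sitemap.xml"))  # False
```

As the last check shows, the default Disallow: / also covers /sitemap.xml itself, so a sitemap line is most useful alongside one of the less restrictive configurations.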

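This example disallows paths based on a database query:

from datasette import hookimpl

@hookimpl
def block_robots_extra_lines(datasette):
    async def inner():
        # With no name argument, get_database() returns the first attached database
        db = datasette.get_database()
        result = await db.execute("select path from mytable")
        # One Disallow line for each path stored in mytable
        return [
            "Disallow: /{}".format(row["path"]) for row in result
        ]
    # Return the async function itself; the plugin awaits it to get the lines
    return inner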
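If values in the path column might already begin with /, or might be empty, the format string in that example would emit Disallow: //... or a bare Disallow: / that asks crawlers to skip the whole site. A hedged sketch of a hypothetical disallow_lines() helper that normalizes them; mytable and its path column come from the example, everything else is illustrative.

```python
def disallow_lines(paths):
    """Hypothetical helper: build Disallow lines from stored path values."""
    lines = []
    for path in paths:
        # Treat None as empty and drop any leading slashes
        cleaned = (path or "").lstrip("/")
        if not cleaned:
            # An empty value would become "Disallow: /" and block the whole site
            continue
        lines.append("Disallow: /{}".format(cleaned))
    return lines


print(disallow_lines(["private", "/mydatabase/secret_table", "", None, "/"]))
# ['Disallow: /private', 'Disallow: /mydatabase/secret_table']
```

Inside inner(), the return statement could then become return disallow_lines(row["path"] for row in result).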

datasette-sitemap is an example of a plugin that uses this hook.

Development

To set up this plugin locally, first check out the code. Then create a new virtual environment:

cd datasette-block-robots
python3 -mvenv venv
source venv/bin/activate

Or if you are using pipenv:

pipenv shell

Now install the dependencies, including the test dependencies:

pip install -e '.[test]'

To run the tests:

pytest



Download files


Source Distribution

datasette-block-robots-1.1.tar.gz (4.1 kB)

Uploaded Source

Built Distribution


datasette_block_robots-1.1-py3-none-any.whl (4.4 kB)

Uploaded Python 3

File details

Details for the file datasette-block-robots-1.1.tar.gz.

File metadata

  • Download URL: datasette-block-robots-1.1.tar.gz
  • Upload date:
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.8.13

File hashes

Hashes for datasette-block-robots-1.1.tar.gz
Algorithm Hash digest
SHA256 e679fc43c5694194a6e393902bc9e06d611563f4947a8ad7ac3127877f0a9f74
MD5 02d9fef22e47b885b0ec2082ca450a2d
BLAKE2b-256 009b983c94277d304381ee875500e8146e3d3c1456e01cd584c81e3bbcccf1c7


File details

Details for the file datasette_block_robots-1.1-py3-none-any.whl.

File hashes

Hashes for datasette_block_robots-1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ea1e4046fd1aab1db13db09ea89dba95714aabf034f464cff92f2b6f0a123ef8
MD5 b7255c9a09ba4b04bc70701c8e131a85
BLAKE2b-256 7d99189fc74fc96a2c223fbc384962df0d68ca8b260457bf23e63402b82eb8aa

