# robot_detection

robot_detection is a Python module for detecting whether a given HTTP User Agent is a web crawler. It uses the list of registered robots from http://www.robotstxt.org: [Robots Database](http://www.robotstxt.org/db.html).
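The package appears on PyPI under the distribution name robot-detection (with a hyphen), so it should be installable with pip; this is inferred from the distribution name rather than stated in the README:

```
$ pip install robot-detection
```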

## Usage

There is only one function, `is_robot`, which takes a string (unicode or not) and returns True iff that string matches a known robot in the robotstxt.org robot database.

### Example

```python
>>> import robot_detection
>>> robot_detection.is_robot(user_agent_string)
```
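As a concrete illustration, a registered crawler such as Googlebot should match, while an ordinary browser string should not. The strings and return values below are assumptions for demonstration; actual results depend on the database version shipped with your copy:

```python
>>> import robot_detection
>>> robot_detection.is_robot("Googlebot/2.1 (+http://www.google.com/bot.html)")
True
>>> robot_detection.is_robot("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
False
```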

## Updating

You can download a new version of the Robot Database from [this link](http://www.robotstxt.org/dbexport.html).

Download the database dump, then run robot_detection.py with that file as the first argument:

```
$ wget http://www.robotstxt.org/db/all.txt
$ python robot_detection.py all.txt
```

If the database has changed, the script prints out the new version of the `robot_useragents` variable, which you then need to paste into the source code.

## Tests

Some simple unit tests are included; running the tests.py file runs them.
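A minimal invocation, assuming tests.py sits next to robot_detection.py in the source checkout:

```
$ python tests.py
```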
