Skip to main content

CrawlerDetect is a Python class for detecting bots/crawlers/spiders via the user agent.

Project description

About CrawlerDetect

CrawlerDetect is a Python version of PHP class @CrawlerDetect.

It helps to detect bots/crawlers/spiders via the user agent and other HTTP-headers. Currently able to detect 1,000's of bots/spiders/crawlers.

Installation

Run pip install crawlerdetect

Usage

Variant 1

from crawlerdetect import CrawlerDetect
crawler_detect = CrawlerDetect()
crawler_detect.isCrawler('Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)')
# true if crawler user agent detected

Variant 2

from crawlerdetect import CrawlerDetect
crawler_detect = CrawlerDetect(user_agent='Mozilla/5.0 (iPhone; CPU iPhone OS 7_1 like Mac OS X) AppleWebKit (KHTML, like Gecko) Mobile (compatible; Yahoo Ad monitoring; https://help.yahoo.com/kb/yahoo-ad-monitoring-SLN24857.html)')
crawler_detect.isCrawler()
# true if crawler user agent detected

Variant 3

from crawlerdetect import CrawlerDetect
crawler_detect = CrawlerDetect(headers={'DOCUMENT_ROOT': '/home/test/public_html', 'GATEWAY_INTERFACE': 'CGI/1.1', 'HTTP_ACCEPT': '*/*', 'HTTP_ACCEPT_ENCODING': 'gzip, deflate', 'HTTP_CACHE_CONTROL': 'no-cache', 'HTTP_CONNECTION': 'Keep-Alive', 'HTTP_FROM': 'googlebot(at)googlebot.com', 'HTTP_HOST': 'www.test.com', 'HTTP_PRAGMA': 'no-cache', 'HTTP_USER_AGENT': 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_8_4) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/28.0.1500.71 Safari/537.36', 'PATH': '/bin:/usr/bin', 'QUERY_STRING': 'order=closingDate', 'REDIRECT_STATUS': '200', 'REMOTE_ADDR': '127.0.0.1', 'REMOTE_PORT': '3360', 'REQUEST_METHOD': 'GET', 'REQUEST_URI': '/?test=testing', 'SCRIPT_FILENAME': '/home/test/public_html/index.php', 'SCRIPT_NAME': '/index.php', 'SERVER_ADDR': '127.0.0.1', 'SERVER_ADMIN': 'webmaster@test.com', 'SERVER_NAME': 'www.test.com', 'SERVER_PORT': '80', 'SERVER_PROTOCOL': 'HTTP/1.1', 'SERVER_SIGNATURE': '', 'SERVER_SOFTWARE': 'Apache', 'UNIQUE_ID': 'Vx6MENRxerBUSDEQgFLAAAAAS', 'PHP_SELF': '/index.php', 'REQUEST_TIME_FLOAT': 1461619728.0705, 'REQUEST_TIME': 1461619728})
crawler_detect.isCrawler()
# true if crawler user agent detected

Output the name of the bot that matched (if any)

from crawlerdetect import CrawlerDetect
crawler_detect = CrawlerDetect()
crawler_detect.isCrawler('Mozilla/5.0 (compatible; Sosospider/2.0; +http://help.soso.com/webspider.htm)')
# true if crawler user agent detected
crawler_detect.getMatches()
# Sosospider

Contributing

If you find a bot/spider/crawler user agent that CrawlerDetect fails to detect, please submit a pull request with the regex pattern added to the array in providers/crawlers.py and add the failing user agent to tests/crawlers.txt.

Failing that, just create an issue with the user agent you have found, and we'll take it from there :)

ES6 Library

To use this library with NodeJS or any ES6 application based, check out es6-crawler-detect.

.NET Library

To use this library in a .net standard (including .net core) based project, check out NetCrawlerDetect.

Nette Extension

To use this library with the Nette framework, checkout NetteCrawlerDetect.

Ruby Gem

To use this library with Ruby on Rails or any Ruby-based application, check out crawler_detect gem.

Parts of this class are based on the brilliant MobileDetect

Analytics

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crawlerdetect-coreteam-0.1.5.tar.gz (17.6 kB view details)

Uploaded Source

Built Distribution

crawlerdetect_coreteam-0.1.5-py2.py3-none-any.whl (17.2 kB view details)

Uploaded Python 2 Python 3

File details

Details for the file crawlerdetect-coreteam-0.1.5.tar.gz.

File metadata

  • Download URL: crawlerdetect-coreteam-0.1.5.tar.gz
  • Upload date:
  • Size: 17.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.12

File hashes

Hashes for crawlerdetect-coreteam-0.1.5.tar.gz
Algorithm Hash digest
SHA256 b1220d573409861abab40a01ce09a5780744d16667438071d7bbc3e21e1bf87e
MD5 d893536ed157ef5f66f76ddf36a36c70
BLAKE2b-256 25bdf8c19076101ffe40b92f65962b48c16b8e0aa349d406785db793bac2cea4

See more details on using hashes here.

File details

Details for the file crawlerdetect_coreteam-0.1.5-py2.py3-none-any.whl.

File metadata

  • Download URL: crawlerdetect_coreteam-0.1.5-py2.py3-none-any.whl
  • Upload date:
  • Size: 17.2 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.6.12

File hashes

Hashes for crawlerdetect_coreteam-0.1.5-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 d655966bcc348d913ac9f14c8e9b7afceac5187487019026e06064d3dd375f3c
MD5 31075a2e15750d7aa5d220f7fa32e4c3
BLAKE2b-256 03f4d3e72284794c634b8293e4e5616a0b471eff851187cbed39ab9909e21c8f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page