
Project description

crawler-user-agents

This repository contains a list of HTTP user-agents used by robots, crawlers, and spiders, in a single JSON file.

Install

Direct download

Download the crawler-user-agents.json file from this repository directly.
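
If you would rather fetch the file programmatically, here is a minimal sketch in Python (the raw GitHub URL is an assumption; adjust it if the repository layout changes):

import json
import urllib.request

# Assumed raw URL of the JSON file in this repository.
URL = "https://raw.githubusercontent.com/monperrus/crawler-user-agents/master/crawler-user-agents.json"

with urllib.request.urlopen(URL) as response:
    crawlers = json.load(response)

print(len(crawlers), "crawler patterns loaded")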

Npm / Yarn

crawler-user-agents is deployed on npmjs.com: https://www.npmjs.com/package/crawler-user-agents

To install it with npm or yarn:

npm install --save crawler-user-agents
# OR
yarn add crawler-user-agents

In Node.js, you can require the package to get an array of crawler user agents.

const crawlers = require('crawler-user-agents');
console.log(crawlers);

Usage

Each pattern is a regular expression. It should work out-of-the-box with your favorite regex library:

  • JavaScript: if (RegExp(entry.pattern).test(req.headers['user-agent'])) { ... }
  • PHP: add a slash before and after the pattern: if (preg_match('/'.$entry['pattern'].'/', $_SERVER['HTTP_USER_AGENT'])): ...
  • Python: if re.search(entry['pattern'], ua): ... (a fuller sketch follows this list)
  • Go: use this package; it provides the global variable Crawlers (kept in sync with crawler-user-agents.json) and the functions IsCrawler and MatchingCrawlers.
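
For Python, a fuller sketch, assuming crawler-user-agents.json has been downloaded into the working directory:

import json
import re

with open('crawler-user-agents.json') as f:
    crawlers = json.load(f)

ua = 'Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)'

# An entry matches when its pattern occurs anywhere in the user-agent string.
matching = [entry for entry in crawlers if re.search(entry['pattern'], ua)]
print('is crawler:', bool(matching))
print('matching URLs:', [entry.get('url') for entry in matching])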

Example of Go program:

package main

import (
	"fmt"

	"github.com/monperrus/crawler-user-agents"
)

func main() {
	userAgent := "Mozilla/5.0 (compatible; Discordbot/2.0; +https://discordapp.com)"

	isCrawler := agents.IsCrawler(userAgent)
	fmt.Println("isCrawler:", isCrawler)

	indices := agents.MatchingCrawlers(userAgent)
	fmt.Println("crawlers' indices:", indices)
	fmt.Println("crawler' URL:", agents.Crawlers[indices[0]].URL)
}

Output:

isCrawler: true
crawlers' indices: [237]
crawler's URL: https://discordapp.com

Contributing

I do welcome additions contributed as pull requests.

The pull requests should:

  • contain a single addition
  • specify a discriminating, relevant syntactic fragment (for example "totobot" and not "Mozilla/5 totobot v20131212.alpha1")
  • contain the pattern (a generic regular expression), the discovery date (year/month/day), and the official URL of the robot
  • result in a valid JSON file (don't forget the comma between items; a validity check is sketched below)

Example:

{
  "pattern": "rogerbot",
  "addition_date": "2014/02/28",
  "url": "http://moz.com/help/pro/what-is-rogerbot-",
  "instances" : ["rogerbot/2.3 example UA"]
}
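
A quick way to check that an addition leaves the file valid, sketched in Python:

import json

# json.load raises an error on malformed JSON, e.g. a missing comma between items.
with open('crawler-user-agents.json') as f:
    entries = json.load(f)

print('valid JSON with', len(entries), 'entries')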

License

The list is under an MIT License. The versions prior to Nov 7, 2016 were under a CC-SA license.

Related work

There are a few wrapper libraries that use this data to detect bots, as well as other systems for spotting robots, crawlers, and spiders that you may want to consider.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

crawler_user_agents-0.3.0.tar.gz (48.8 kB, Source)

Built Distribution

crawler_user_agents-0.3.0-py3-none-any.whl (3.6 kB, Python 3)

File details

Details for the file crawler_user_agents-0.3.0.tar.gz.

File metadata

  • Download URL: crawler_user_agents-0.3.0.tar.gz
  • Size: 48.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.0.0 CPython/3.12.3

File hashes

Hashes for crawler_user_agents-0.3.0.tar.gz
  • SHA256: 6e3aecce347615a5cf82d3c95f92c03a6bce3d34a1ed959f1b890aca0ed561b3
  • MD5: bfc1a36da38669d050f943bc4de2ddad
  • BLAKE2b-256: 192d1e46cd19f20c2d8b9ae4284f543f727b3845f0467def01b24b6b81108d66

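To check a downloaded file against the published SHA256 digest, a minimal sketch in Python (assuming the archive sits in the current directory):

import hashlib

# Expected digest, copied from the hash list above.
EXPECTED = "6e3aecce347615a5cf82d3c95f92c03a6bce3d34a1ed959f1b890aca0ed561b3"

with open("crawler_user_agents-0.3.0.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

print("hash matches:", digest == EXPECTED)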

File details

Details for the file crawler_user_agents-0.3.0-py3-none-any.whl.

File hashes

Hashes for crawler_user_agents-0.3.0-py3-none-any.whl
  • SHA256: c4128ada991c29a33c9dca57c307aab97971fb79aa662f5d277c3268d2fb8e8c
  • MD5: 61918280a2aa77f61b21833dca8ae60a
  • BLAKE2b-256: 9eb223aeab2d939793ad6ec2937ea6a5f0a7e0b8668c600836506d3e10263a91

