A Python package that makes parsing robots files faster and simpler.

Project description

robotsparse


Usage

Basic usage, such as getting robots contents:

import robotsparse

# NOTE: With `find_url=True`, the given URL is redirected to the site's default robots file location.
robots = robotsparse.getRobots("https://github.com/", find_url=True)
print(list(robots)) # output: ['user-agents']

The user-agents key contains each user-agent found in the robots file, along with the information associated with it.
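To make the idea of a per-user-agent structure concrete, here is a minimal sketch that uses only the standard library. It is not robotsparse's implementation: the helper names `default_robots_url` and `group_by_user_agent`, and the inner layout of each user-agent entry (`allow`, `disallow`, `crawl-delay`), are assumptions made for illustration. Only the top-level `user-agents` key comes from the example above.

```python
from urllib.parse import urljoin


def default_robots_url(url: str) -> str:
    """Point any page URL at the conventional /robots.txt of its host."""
    return urljoin(url, "/robots.txt")


def group_by_user_agent(text: str) -> dict:
    """Group Allow/Disallow/Crawl-delay rules under each User-agent line.

    The per-agent layout used here is an assumed shape, not the library's.
    """
    agents: dict = {}
    current: list = []        # user-agents the following rules apply to
    reading_agents = False    # True while consecutive User-agent lines are read
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and whitespace
        if ":" not in line:
            continue
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()
        if name == "user-agent":
            if not reading_agents:
                current = []                  # a new group starts here
            reading_agents = True
            current.append(value)
            agents.setdefault(
                value, {"allow": [], "disallow": [], "crawl-delay": None}
            )
        elif name in ("allow", "disallow"):
            reading_agents = False
            if value:                         # skip empty rules
                for agent in current:
                    agents[agent][name].append(value)
        elif name == "crawl-delay":
            reading_agents = False
            for agent in current:
                agents[agent]["crawl-delay"] = value
    return {"user-agents": agents}
```

Passing the text of a robots file to `group_by_user_agent` returns a dictionary whose only top-level key is `user-agents`, matching the output shown above. Consecutive `User-agent` lines are treated as one group that shares the rules that follow.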

Alternatively, we can load the robots contents into an object, which gives more convenient access:

import robotsparse

# This function returns an object whose attributes hold the parsed data.
robots = robotsparse.getRobotsObject("https://duckduckgo.com/", find_url=True)
assert isinstance(robots, object)
print(robots.allow) # Prints allowed locations
print(robots.disallow) # Prints disallowed locations
print(robots.crawl_delay) # Prints found crawl-delays
print(robots.robots) # This output is equivalent to the above example
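The attribute-style access above can be pictured with a small sketch. Everything here is hypothetical: `RobotsView` and `to_object` are illustrative names, and the dictionary layout (each user-agent mapped to `allow`, `disallow`, and `crawl-delay` entries) is an assumption rather than robotsparse's documented format. Only the attribute names `allow`, `disallow`, `crawl_delay`, and `robots` come from the example.

```python
from dataclasses import dataclass, field


@dataclass
class RobotsView:
    """Hypothetical stand-in for the object returned by getRobotsObject."""
    robots: dict                                      # the dictionary form
    allow: list = field(default_factory=list)         # allowed locations
    disallow: list = field(default_factory=list)      # disallowed locations
    crawl_delay: list = field(default_factory=list)   # crawl-delays found


def to_object(robots: dict) -> RobotsView:
    """Flatten per-agent rules into attributes for direct access."""
    view = RobotsView(robots=robots)
    for rules in robots.get("user-agents", {}).values():
        view.allow.extend(rules.get("allow", []))
        view.disallow.extend(rules.get("disallow", []))
        delay = rules.get("crawl-delay")
        if delay is not None:
            view.crawl_delay.append(delay)
    return view
```

The point of such a wrapper is that the rules are collected once, so later lookups are plain attribute reads instead of repeated walks through nested dictionaries.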

Additional Features

When parsing robots files, it may also be useful to parse sitemap files:

import robotsparse
sitemap = robotsparse.getSitemap("https://pypi.org/", find_url=True)

The sitemap variable now holds a list of entries that look like this:

[{"url": "", "lastModified": ""}]
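For readers curious how a list of that shape can be produced, the sketch below parses a sitemap document with the standard library. It is an illustration, not the code robotsparse runs: `parse_sitemap` is a hypothetical helper, and it assumes a plain `urlset` sitemap that follows the sitemaps.org protocol (no sitemap index files, no compression).

```python
import xml.etree.ElementTree as ET

# XML namespace defined by the sitemaps.org protocol.
NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list:
    """Turn <url> entries into dicts with "url" and "lastModified" keys."""
    root = ET.fromstring(xml_text)
    entries = []
    for node in root.findall("sm:url", NS):
        entries.append({
            # <loc> holds the page address; <lastmod> is optional.
            "url": node.findtext("sm:loc", default="", namespaces=NS).strip(),
            "lastModified": node.findtext(
                "sm:lastmod", default="", namespaces=NS
            ).strip(),
        })
    return entries
```

Entries without a `<lastmod>` element get an empty string, which keeps every dictionary in the list the same shape.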

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

robotsparse-1.0.tar.gz (4.9 kB)

Uploaded Source

Built Distribution

robotsparse-1.0-py3-none-any.whl (5.6 kB)

Uploaded Python 3

File details

Details for the file robotsparse-1.0.tar.gz.

File metadata

  • Download URL: robotsparse-1.0.tar.gz
  • Upload date:
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for robotsparse-1.0.tar.gz
Algorithm Hash digest
SHA256 2bed0da0873c055653e39cc67bbea96fb8c9de3d1e7c5ada77003d7b86615479
MD5 ccda89d76500ae098ca82b54d9468837
BLAKE2b-256 9afcd560faeb84d68802cc3ab4459a5353cedaadf021e9aee6ed08626936a577

File details

Details for the file robotsparse-1.0-py3-none-any.whl.

File metadata

  • Download URL: robotsparse-1.0-py3-none-any.whl
  • Upload date:
  • Size: 5.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.8

File hashes

Hashes for robotsparse-1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 aad90a9604b8ca94f47e0a151f6352e356512c48dc52140245d7a8591996d736
MD5 a40feb6f4ea4395b979ced91cc822402
BLAKE2b-256 6d309ee2722e62100da6ac9f15fcbdb75d818aa06cdf2bc401e86a85e1e1275e
