
robotsparse

A Python package for fast and simple parsing of robots.txt files.

Usage

Basic usage, such as fetching the contents of a robots.txt file:

```python
import robotsparse

# NOTE: find_url=True resolves the given URL to the site's default
# robots.txt location (e.g. https://github.com/robots.txt).
robots = robotsparse.getRobots("https://github.com/", find_url=True)
print(list(robots))  # output: ['user-agents']
```

The `user-agents` key maps each user-agent found in the robots.txt file to the rules associated with it.
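For reference, this kind of grouping can be sketched in plain Python. The sketch below builds a user-agent-to-rules mapping from raw robots.txt text; it is an illustration only, and the exact structure robotsparse returns may differ.

```python
# Minimal sketch of robots.txt parsing, independent of robotsparse.
# Groups Allow/Disallow rules under each User-agent, mirroring the
# `user-agents` structure described above.

def parse_robots(text):
    agents = {}
    current = None
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, value = (part.strip() for part in line.split(":", 1))
        field = field.lower()
        if field == "user-agent":
            current = agents.setdefault(value, {"allow": [], "disallow": []})
        elif field in ("allow", "disallow") and current is not None:
            current[field].append(value)
    return agents

sample = """\
User-agent: *
Disallow: /admin/
Allow: /public/
"""
print(parse_robots(sample))
# {'*': {'allow': ['/public/'], 'disallow': ['/admin/']}}
```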

Alternatively, we can retrieve the robots contents as an object, which allows faster attribute access:

```python
import robotsparse

# This function returns an object whose attributes expose the parsed rules.
robots = robotsparse.getRobotsObject("https://duckduckgo.com/", find_url=True)
assert isinstance(robots, object)
print(robots.allow)        # Prints allowed locations
print(robots.disallow)     # Prints disallowed locations
print(robots.crawl_delay)  # Prints found crawl-delays
print(robots.robots)       # Equivalent to the output of the previous example
```
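Lists like `robots.allow` and `robots.disallow` can feed a simple permission check. The helper below is a hypothetical sketch, not part of robotsparse: it applies longest-match-wins prefix matching, the convention most crawlers follow.

```python
# Hypothetical helper (not part of robotsparse): decide whether a path
# may be crawled, given allow/disallow path lists such as robots.allow
# and robots.disallow above. Longest matching rule wins; allow wins ties.

def can_fetch(path, allow, disallow):
    best_verdict, best_len = True, -1  # default: allowed
    rules = [(r, True) for r in allow] + [(r, False) for r in disallow]
    for rule, verdict in rules:
        if rule and path.startswith(rule) and len(rule) > best_len:
            best_verdict, best_len = verdict, len(rule)
    return best_verdict

print(can_fetch("/admin/users", ["/admin/public/"], ["/admin/"]))    # False
print(can_fetch("/admin/public/x", ["/admin/public/"], ["/admin/"]))  # True
```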

Additional Features

When parsing robots files, it can also be useful to parse sitemap files:

```python
import robotsparse

sitemap = robotsparse.getSitemap("https://pypi.org/", find_url=True)
```

The `sitemap` variable now holds a list of entries of the form:

```python
[{"url": "", "lastModified": ""}]
```
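For comparison, the same shape can be produced from raw sitemap XML with only the standard library. This is an illustrative sketch of the parsing step; `getSitemap` presumably handles the network fetch as well.

```python
# Standard-library sketch producing the same [{"url": ..., "lastModified": ...}]
# shape from sitemap XML text.
import xml.etree.ElementTree as ET

NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def parse_sitemap(xml_text):
    root = ET.fromstring(xml_text)
    entries = []
    for url in root.findall("sm:url", NS):
        loc = url.findtext("sm:loc", default="", namespaces=NS)
        lastmod = url.findtext("sm:lastmod", default="", namespaces=NS)
        entries.append({"url": loc, "lastModified": lastmod})
    return entries

sample = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc><lastmod>2024-01-01</lastmod></url>
</urlset>"""
print(parse_sitemap(sample))
# [{'url': 'https://example.com/', 'lastModified': '2024-01-01'}]
```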
