robotsparse
A Python package that speeds up and simplifies parsing of robots files.
Usage
Basic usage, such as fetching a site's robots contents:
import robotsparse
# NOTE: With `find_url=True`, the URL is resolved to the site's default robots.txt location.
robots = robotsparse.urlRobots("https://github.com/", find_url=True)
print(list(robots)) # output: ['user-agents']
The `user-agents` key contains each user-agent found in the robots file, along with the information associated with it.
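To illustrate the kind of structure such a mapping could hold (the exact layout robotsparse produces is not documented here, so the field names below are assumptions), a minimal parser that groups robots.txt rules by user-agent can be sketched as:

```python
# Sketch: group robots.txt rules by user-agent into a dict.
# The resulting layout only illustrates what a 'user-agents'
# mapping could plausibly contain; it is NOT robotsparse's
# actual internal format.

def parse_robots(text):
    agents = {}
    current = None
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or ":" not in line:
            continue
        field, _, value = line.partition(":")
        field, value = field.strip().lower(), value.strip()
        if field == "user-agent":
            # Start (or reuse) the entry for this agent.
            current = agents.setdefault(value, {"allow": [], "disallow": []})
        elif current is not None and field in ("allow", "disallow"):
            current[field].append(value)
    return agents

sample = """
User-agent: *
Disallow: /search
Allow: /search/about
"""
rules = parse_robots(sample)
```

Here `rules["*"]` holds the allow and disallow paths declared for the wildcard agent.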
Alternatively, we can load the robots contents as an object, which allows faster accessibility:
import robotsparse
# This function returns an object.
robots = robotsparse.getRobots("https://duckduckgo.com/", find_url=True)
assert isinstance(robots, object)
print(robots.allow) # Prints allowed locations
print(robots.disallow) # Prints disallowed locations
print(robots.crawl_delay) # Prints found crawl-delays
print(robots.robots) # This output is equivalent to the above example
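If the `allow` and `disallow` attributes are plain lists of path prefixes (an assumption; check the package's output for your target site), a common way to decide whether a path may be crawled is the longest-match rule, where the most specific matching rule wins:

```python
# Longest-match check over allow/disallow path-prefix lists.
# A sketch under the assumption that robots.allow / robots.disallow
# are lists of plain path prefixes; ties go to Allow, and a path
# matching no rule is allowed by default.

def is_allowed(path, allow, disallow):
    best_len, best_allowed = -1, True  # no match => allowed
    candidates = [(r, True) for r in allow] + [(r, False) for r in disallow]
    for rule, allowed in candidates:
        if rule and path.startswith(rule) and len(rule) > best_len:
            best_len, best_allowed = len(rule), allowed
    return best_allowed

allow = ["/search/about"]
disallow = ["/search"]
```

With these lists, `/search/about` is permitted (the longer Allow rule wins over `Disallow: /search`), while `/search/images` is not.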
Download files
Source distribution: robotsparse-0.1.tar.gz (4.2 kB)
Hashes for robotsparse-0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | bd95c9a78d12a69b6e00f1f063448aa81f5d893753b7da359df84f0ff426dfcc
MD5 | 619c45b604167a63d6b50a8ee25c2d48
BLAKE2b-256 | bf6e1515da793dbab42debbdaab8d508f9e5200a29651f5bd5c84746cfe32547