A Python package that makes parsing robots files faster and simpler.
robotsparse
Usage
Basic usage, such as getting robots contents:
import robotsparse
# NOTE: With `find_url=True`, the URL is redirected to the default robots location.
robots = robotsparse.getRobots("https://github.com/", find_url=True)
print(list(robots)) # output: ['user-agents']
The `user-agents` key contains each user-agent found in the robots file, along with the information associated with it.
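To illustrate what redirecting a URL to the default robots location can look like, here is a minimal sketch using only the standard library. It assumes the default location is `/robots.txt` at the root of the host; the helper name `default_robots_url` is hypothetical and is not part of robotsparse, whose internals may differ.

```python
from urllib.parse import urlsplit, urlunsplit


def default_robots_url(url: str) -> str:
    """Return the conventional robots.txt location for the site serving `url`.

    Hypothetical helper for illustration; not a robotsparse function.
    """
    parts = urlsplit(url)
    # Keep only the scheme and host; drop the path, query and fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(default_robots_url("https://github.com/"))
print(default_robots_url("https://github.com/python/cpython?tab=readme"))
```

Both calls resolve to the same address, because only the scheme and host of the input are kept.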
Alternatively, we can load the robots contents as an object, which allows faster access:
import robotsparse
# This function returns an object whose attributes hold the parsed rules.
robots = robotsparse.getRobotsObject("https://duckduckgo.com/", find_url=True)
assert isinstance(robots, object)
print(robots.allow) # Prints allowed locations
print(robots.disallow) # Prints disallowed locations
print(robots.crawl_delay) # Prints found crawl-delays
print(robots.robots) # This output is equivalent to the above example
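For readers who want a feel for what such an object might hold, the sketch below collects `Allow`, `Disallow` and `Crawl-delay` values from robots text into attributes with the same names as above. It is an assumption-laden illustration: `RobotsRules` and `collect_rules` are hypothetical names, the values are gathered across all user-agents without grouping, and robotsparse itself may structure its data differently.

```python
from dataclasses import dataclass, field


@dataclass
class RobotsRules:
    """Hypothetical container; not the class robotsparse returns."""
    allow: list[str] = field(default_factory=list)
    disallow: list[str] = field(default_factory=list)
    crawl_delay: list[str] = field(default_factory=list)


def collect_rules(text: str) -> RobotsRules:
    """Gather rule values from robots text, ignoring user-agent grouping."""
    rules = RobotsRules()
    for raw_line in text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop trailing comments
        if ":" not in line:
            continue  # blank or malformed line
        name, value = (part.strip() for part in line.split(":", 1))
        name = name.lower()  # directive names are matched case-insensitively
        if name == "allow":
            rules.allow.append(value)
        elif name == "disallow":
            rules.disallow.append(value)
        elif name == "crawl-delay":
            rules.crawl_delay.append(value)
    return rules


SAMPLE = """\
User-agent: *
Disallow: /private/
Allow: /private/public.html
Crawl-delay: 10
"""

rules = collect_rules(SAMPLE)
print(rules.allow)        # ['/private/public.html']
print(rules.disallow)     # ['/private/']
print(rules.crawl_delay)  # ['10']
```

Values are kept as strings here; converting crawl delays to numbers is left to the caller.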
Additional Features
When parsing robots files, it may sometimes be useful to parse sitemap files as well:
import robotsparse
sitemap = robotsparse.getSitemap("https://pypi.org/", find_url=True)
The `sitemap` variable in the code above holds information that looks like this:
[{"url": "", "lastModified": ""}]
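As a rough illustration of how a sitemap could be turned into entries of that shape, the following sketch parses sitemap XML with the standard library. It assumes the standard sitemaps.org namespace and maps `<loc>` to `url` and `<lastmod>` to `lastModified`; the function name `parse_sitemap` and the sample document are hypothetical, and this is not necessarily how robotsparse does it.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text: str) -> list[dict]:
    """Return one {"url", "lastModified"} dict per <url> element."""
    root = ET.fromstring(xml_text)
    entries = []
    for url_node in root.findall("sm:url", SITEMAP_NS):
        # findtext falls back to "" when the child element is missing.
        loc = url_node.findtext("sm:loc", default="", namespaces=SITEMAP_NS)
        lastmod = url_node.findtext("sm:lastmod", default="", namespaces=SITEMAP_NS)
        entries.append({"url": loc.strip(), "lastModified": lastmod.strip()})
    return entries


SAMPLE = """\
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url>
    <loc>https://example.com/</loc>
    <lastmod>2024-01-01</lastmod>
  </url>
  <url>
    <loc>https://example.com/about</loc>
  </url>
</urlset>
"""

for entry in parse_sitemap(SAMPLE):
    print(entry)
```

An entry without a `<lastmod>` element ends up with an empty `lastModified` string, matching the placeholder shape shown above.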
File details
Details for the file robotsparse-1.0.tar.gz.
File metadata
- Download URL: robotsparse-1.0.tar.gz
- Upload date:
- Size: 4.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2bed0da0873c055653e39cc67bbea96fb8c9de3d1e7c5ada77003d7b86615479
MD5 | ccda89d76500ae098ca82b54d9468837
BLAKE2b-256 | 9afcd560faeb84d68802cc3ab4459a5353cedaadf021e9aee6ed08626936a577
File details
Details for the file robotsparse-1.0-py3-none-any.whl.
File metadata
- Download URL: robotsparse-1.0-py3-none-any.whl
- Upload date:
- Size: 5.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.8
File hashes
Algorithm | Hash digest
---|---
SHA256 | aad90a9604b8ca94f47e0a151f6352e356512c48dc52140245d7a8591996d736
MD5 | a40feb6f4ea4395b979ced91cc822402
BLAKE2b-256 | 6d309ee2722e62100da6ac9f15fcbdb75d818aa06cdf2bc401e86a85e1e1275e
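The digests listed for each file can be compared against a downloaded copy. The sketch below shows one way to do that with `hashlib`; the helper names `sha256_hex` and `matches_digest` are hypothetical, and the demonstration uses empty input rather than a real download.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the SHA256 digest of `data` as a lowercase hex string."""
    return hashlib.sha256(data).hexdigest()


def matches_digest(data: bytes, expected_hex: str) -> bool:
    """Compare the digest of `data` with a published hex digest."""
    return sha256_hex(data) == expected_hex.strip().lower()


# Well-known SHA256 digest of empty input, used here only as a self-check.
EMPTY_SHA256 = "e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
print(matches_digest(b"", EMPTY_SHA256))
```

For a real check, the bytes would come from the downloaded file, for example via `pathlib.Path(...).read_bytes()`, and the expected value from the SHA256 row for that file.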