
Parse robots.txt files and find indexed URLs


robotsparser

A Python library that parses robots.txt files and crawls the sitemaps they reference to collect indexed URLs.

Functionalities

  • Automatically discover all sitemap files
  • Decompress gzipped sitemap files
  • Fetch all URLs from sitemaps

Install

pip install robotsparser

Usage

from robotsparser.parser import Robotparser

robots_url = "https://www.example.com/robots.txt"
rb = Robotparser(url=robots_url, verbose=True)
rb.read()  # crawl the sitemaps and collect indexed URLs

# Inspect the results
rb.get_urls()  # returns a list of all URLs found in the sitemaps
rb.get_sitemaps()  # returns all sitemap locations
rb.get_sitemap_entries()  # returns all sitemap indexes that contain URLs
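
A common follow-up is to persist the collected URLs for later processing. The following is a minimal sketch, assuming get_urls() returns plain URL strings as shown above; the output filename is illustrative.

from robotsparser.parser import Robotparser

rb = Robotparser(url="https://www.example.com/robots.txt", verbose=False)
rb.read()

# Write every discovered URL to a text file, one per line
# (indexed_urls.txt is a hypothetical output path)
with open("indexed_urls.txt", "w") as f:
    for url in rb.get_urls():
        f.write(url + "\n")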



Download files

Source distribution: robotsparser-0.0.2.tar.gz (3.7 kB)

Built distribution: robotsparser-0.0.2-py3-none-any.whl (3.8 kB, Python 3)
