A package to bulk-match URLs against robots.txt files
Project description
DeepCrawl Robots.txt live checker
> cat urls.txt
https://www.ebay.at/
https://www.ebay.at/adchoice
https://www.ebay.at/sl/sell
https://www.ebay.at/mye/myebay/watchlist
https://www.ebay.at/sch/ebayadvsearch
https://www.ebay.at/sch/Kleidung-Accessoires-/11450/i.html
https://www.ebay.at/sch/Auto-Tuning-Styling-/107059/i.html
https://www.ebay.at/sch/Modeschmuck-/10968/i.html
https://www.ebay.at/sch/Damenschuhe-/3034/i.html
> cat robots.txt
User-agent: *
Disallow: /sch/
pip install deepcrawl_robots
from deepcrawl_robots import Processor

urls_path = "urls.txt"          # path to the file of URLs to check
robots_txt_path = "robots.txt"  # path to the robots.txt file

processor = Processor(
    user_agent="*",  # user agent to evaluate the robots.txt rules for
    urls_file_path=urls_path,
    robots_file_path=robots_txt_path
)
> cat result.txt
https://www.ebay.at/mye/myebay/watchlist,true
https://www.ebay.at/adchoice,true
https://www.ebay.at/,true
https://www.ebay.at/sl/sell,true
https://www.ebay.at/sch/Kleidung-Accessoires-/11450/i.html,false
https://www.ebay.at/sch/ebayadvsearch,true
https://www.ebay.at/sch/Modeschmuck-/10968/i.html,false
https://www.ebay.at/sch/Auto-Tuning-Styling-/107059/i.html,false
https://www.ebay.at/sch/Damenschuhe-/3034/i.html,false
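The allow/disallow decisions above follow standard robots.txt prefix matching: any path starting with /sch/ is disallowed for all user agents. A minimal sketch of the same check using Python's stdlib urllib.robotparser (not the deepcrawl_robots implementation itself):

```python
from urllib.robotparser import RobotFileParser

# Parse the robots.txt shown above from a list of lines
parser = RobotFileParser()
parser.parse(["User-agent: *", "Disallow: /sch/"])

urls = [
    "https://www.ebay.at/",
    "https://www.ebay.at/adchoice",
    "https://www.ebay.at/sch/Damenschuhe-/3034/i.html",
]
for url in urls:
    # Emit the same url,true/false format as result.txt
    print(f"{url},{str(parser.can_fetch('*', url)).lower()}")
```

This prints true for the first two URLs and false for the /sch/ URL, matching the corresponding lines of result.txt.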
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
deepcrawl_robots-0.0.5.tar.gz (1.3 MB)
Built Distribution
Hashes for deepcrawl_robots-0.0.5-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a40dfb7d24230d5fd8121284d41f6b8d8f625613e348a55274e32895eaadbbe0
MD5 | ae5a9392a6e8cb9b7e554bd65b9280cd
BLAKE2b-256 | a1d2596eac60e310048a99b4ef1fc9f3a23abf0bc969df7f2d011749dadc54ac