A package to bulk-match URLs against robots.txt files
DeepCrawl Robots.txt live checker
Example input, a file with one URL per line:
> cat urls.txt
https://www.ebay.at/
https://www.ebay.at/adchoice
https://www.ebay.at/sl/sell
https://www.ebay.at/mye/myebay/watchlist
https://www.ebay.at/sch/ebayadvsearch
https://www.ebay.at/sch/Kleidung-Accessoires-/11450/i.html
https://www.ebay.at/sch/Auto-Tuning-Styling-/107059/i.html
https://www.ebay.at/sch/Modeschmuck-/10968/i.html
https://www.ebay.at/sch/Damenschuhe-/3034/i.html
The robots.txt file to check them against:
> cat robots.txt
User-agent: *
Disallow: /sch/
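Under the usual robots.txt matching rules (standardised in RFC 9309), each Allow or Disallow value is a path prefix, the longest matching rule wins, an Allow wins a tie, and a URL that matches no rule is allowed. The sketch below applies that logic to a robots.txt like the one above. It is a simplified illustration, not deepcrawl_robots' own implementation: it reads only groups that name User-agent: * and ignores the * and $ wildcards. The function names parse_rules and is_allowed are hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def parse_rules(robots_txt: str) -> list[tuple[bool, str]]:
    """Collect (is_allow, path_prefix) rules from groups that name 'User-agent: *'.

    Simplified: agent-specific groups and the '*' / '$' wildcards are not handled.
    """
    rules: list[tuple[bool, str]] = []
    agents: list[str] = []  # user agents named by the current group
    seen_rule = False       # a rule line ends the run of User-agent lines
    for raw in robots_txt.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if ":" not in line:
            continue
        key, value = (part.strip() for part in line.split(":", 1))
        key = key.lower()
        if key == "user-agent":
            if seen_rule:  # a User-agent line after rules starts a new group
                agents, seen_rule = [], False
            agents.append(value)
        elif key in ("allow", "disallow"):
            seen_rule = True
            if "*" in agents and value:  # an empty Allow/Disallow changes nothing
                rules.append((key == "allow", value))
    return rules


def is_allowed(url: str, rules: list[tuple[bool, str]]) -> bool:
    """Longest matching prefix wins, Allow wins a tie, no match means allowed."""
    parts = urlsplit(url)
    target = (parts.path or "/") + (f"?{parts.query}" if parts.query else "")
    best_len, allowed = -1, True
    for is_allow, prefix in rules:
        if target.startswith(prefix):
            longer = len(prefix) > best_len
            tie_to_allow = len(prefix) == best_len and is_allow
            if longer or tie_to_allow:
                best_len, allowed = len(prefix), is_allow
    return allowed


if __name__ == "__main__":
    rules = parse_rules("User-agent: *\nDisallow: /sch/\n")
    print(is_allowed("https://www.ebay.at/sl/sell", rules))            # True
    print(is_allowed("https://www.ebay.at/sch/ebayadvsearch", rules))  # False
```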
Installation:
pip install deepcrawl_robots
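Usage:
from deepcrawl_robots import Processor

# Placeholder values: replace them with the paths to your own files
urls_path = "Path to urls file"
robots_txt_path = "Path to robots.txt file"

processor = Processor(
    user_agent="User agent",  # user agent whose robots.txt rules are applied
    urls_file_path=urls_path,
    robots_file_path=robots_txt_path,
)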
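The output lists each URL followed by true if the robots.txt rules allow it for the given user agent, or false if they disallow it. The lines are not in the same order as in urls.txt:
> cat result.txt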
https://www.ebay.at/mye/myebay/watchlist,true
https://www.ebay.at/adchoice,true
https://www.ebay.at/,true
https://www.ebay.at/sl/sell,true
https://www.ebay.at/sch/Kleidung-Accessoires-/11450/i.html,false
https://www.ebay.at/sch/ebayadvsearch,true
https://www.ebay.at/sch/Modeschmuck-/10968/i.html,false
https://www.ebay.at/sch/Auto-Tuning-Styling-/107059/i.html,false
https://www.ebay.at/sch/Damenschuhe-/3034/i.html,false
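Because the result lines are not in input order, it is easier to load result.txt into a lookup table keyed by URL. The sketch below does that and, as a cross-check, re-evaluates each URL with Python's standard-library urllib.robotparser against the same two-line robots.txt. The helper names and the user agent "ExampleBot" are hypothetical; the data is copied from the example above. Under standard prefix matching, Disallow: /sch/ also covers /sch/ebayadvsearch, so the true on that line presumably comes from rules not shown in the excerpt (for example an Allow line) or from different matching behaviour in the tool; the example alone cannot tell which.

```python
from __future__ import annotations

from urllib.robotparser import RobotFileParser

# Data copied from the example above
ROBOTS_TXT = "User-agent: *\nDisallow: /sch/\n"
RESULT_TXT = """\
https://www.ebay.at/mye/myebay/watchlist,true
https://www.ebay.at/adchoice,true
https://www.ebay.at/,true
https://www.ebay.at/sl/sell,true
https://www.ebay.at/sch/Kleidung-Accessoires-/11450/i.html,false
https://www.ebay.at/sch/ebayadvsearch,true
https://www.ebay.at/sch/Modeschmuck-/10968/i.html,false
https://www.ebay.at/sch/Auto-Tuning-Styling-/107059/i.html,false
https://www.ebay.at/sch/Damenschuhe-/3034/i.html,false
"""


def parse_results(text: str) -> dict[str, bool]:
    """Parse 'URL,true|false' lines into {url: allowed}."""
    results: dict[str, bool] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        # Split on the last comma so a comma inside a URL stays with the URL
        url, flag = line.rsplit(",", 1)
        results[url] = flag.strip().lower() == "true"
    return results


def disagreements(results: dict[str, bool], robots_txt: str, user_agent: str) -> list[str]:
    """Return the URLs where urllib.robotparser reaches a different verdict."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse() also marks the rules as loaded
    return sorted(
        url for url, allowed in results.items()
        if parser.can_fetch(user_agent, url) != allowed
    )


if __name__ == "__main__":
    results = parse_results(RESULT_TXT)
    print([url for url, ok in results.items() if not ok])  # URLs reported as blocked
    print(disagreements(results, ROBOTS_TXT, "ExampleBot"))
```

Run against the data above, the cross-check reports only https://www.ebay.at/sch/ebayadvsearch.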
File details
Details for the file deepcrawl_robots-0.0.5.tar.gz.
File metadata
- Download URL: deepcrawl_robots-0.0.5.tar.gz
- Size: 1.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 82480e1c42d0dca6aa5557dceb443ff32a521b5c07d41604edaaf48cda24f228
MD5 | d649370c0cdf922698b98c1e0f3153aa
BLAKE2b-256 | 0471fd97748d4cbfdfa749747fd2024e62b1e6ee82cc92c50204ced9986e616b
File details
Details for the file deepcrawl_robots-0.0.5-py3-none-any.whl.
File metadata
- Download URL: deepcrawl_robots-0.0.5-py3-none-any.whl
- Size: 1.3 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.25.1 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.54.1 CPython/3.7.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | a40dfb7d24230d5fd8121284d41f6b8d8f625613e348a55274e32895eaadbbe0
MD5 | ae5a9392a6e8cb9b7e554bd65b9280cd
BLAKE2b-256 | a1d2596eac60e310048a99b4ef1fc9f3a23abf0bc969df7f2d011749dadc54ac
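To confirm that a downloaded archive matches the SHA256 digest published above, the file can be hashed locally and compared. A minimal sketch using Python's hashlib; the function names and the assumption that the archive sits in the current directory are illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare against a published digest, ignoring case and surrounding spaces."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # SHA256 listed above for deepcrawl_robots-0.0.5.tar.gz
    expected = "82480e1c42d0dca6aa5557dceb443ff32a521b5c07d41604edaaf48cda24f228"
    archive = Path("deepcrawl_robots-0.0.5.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if matches_published(archive, expected) else "MISMATCH")
```

pip can compute the same digest with pip hash <file> (SHA256 by default), and in hash-checking mode a requirements line of the form deepcrawl_robots==0.0.5 --hash=sha256:<digest> makes pip reject a download whose digest differs.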