# scrape
## a webpage scraping tool
## Installation

* pip install scrape
## Usage

```
usage: scrape.py [-h] [-c [CRAWL [CRAWL ...]]] [-ca] [-l LIMIT] [-t]
                 url [keywords [keywords ...]]

a webpage scraping tool

positional arguments:
  url                   url to scrape
  keywords              keywords to search

optional arguments:
  -h, --help            show this help message and exit
  -c [CRAWL [CRAWL ...]], --crawl [CRAWL [CRAWL ...]]
                        crawl links based on these keywords
  -ca, --crawl-all      crawl all links
  -l LIMIT, --limit LIMIT
                        crawl page limit
  -t, --text            write to text instead of pdf
```
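For example, a minimal invocation (the URL and keywords here are placeholders) saves the page as a PDF, while adding keywords together with the --text flag saves only the matching lines instead:

```
scrape.py https://example.com
scrape.py https://example.com python tutorial --text
```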
## Author

* Hunter Hammond (huntrar@gmail.com)
## Notes

* Unless the --text flag is specified, all webpages are saved as PDF files using pdfkit.
* Entering keyword arguments while using the --text flag allows users to save only lines matching one of the given keywords.
* You can crawl subsequent webpages by passing a substring of the URLs you wish to match using --crawl, or by using --crawl-all (see the example below).
* There is no limit to the number of pages crawled unless one is set using the --limit flag.
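For instance, a sketch of a crawl (the URL and the "blog" substring are placeholders) that follows links whose URLs contain "blog", stopping after 10 pages:

```
scrape.py https://example.com --crawl blog --limit 10
```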
## News

### 0.0.5

* Added --verbose argument for use with pdfkit
* Improved output file name processing

### 0.0.4

* Accepts 0 or 1 URLs, allowing a call with just --version

### 0.0.3

* Moved utils.py to scrape/

### 0.0.2

* First entry