# scrape

## a webpage scraping tool

## Installation

* pip install scrape

## Usage

    usage: scrape.py [-h] [-c [CRAWL [CRAWL ...]]] [-ca] [-l LIMIT] [-t]
                     url [keywords [keywords ...]]

    a webpage scraping tool

    positional arguments:
      url                   url to scrape
      keywords              keywords to search

    optional arguments:
      -h, --help            show this help message and exit
      -c [CRAWL [CRAWL ...]], --crawl [CRAWL [CRAWL ...]]
                            crawl links based on these keywords
      -ca, --crawl-all      crawl all links
      -l LIMIT, --limit LIMIT
                            crawl page limit
      -t, --text            write to text instead of pdf
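The help text above has the shape of `argparse` output, so the interface can be sketched as follows. This is a hypothetical reconstruction for illustration, not the project's actual source: the function name `build_parser` and the integer type for `LIMIT` are assumptions, and the `--verbose` and `--version` flags mentioned under News are left out because their exact definitions are not shown.

```python
import argparse


def build_parser():
    """Build a parser mirroring the documented help text (illustrative only)."""
    parser = argparse.ArgumentParser(
        prog="scrape.py", description="a webpage scraping tool"
    )
    parser.add_argument("url", help="url to scrape")
    parser.add_argument("keywords", nargs="*", help="keywords to search")
    parser.add_argument("-c", "--crawl", nargs="*",
                        help="crawl links based on these keywords")
    parser.add_argument("-ca", "--crawl-all", action="store_true",
                        help="crawl all links")
    # Assumption: LIMIT is an integer page count; the help text does not say.
    parser.add_argument("-l", "--limit", type=int, help="crawl page limit")
    parser.add_argument("-t", "--text", action="store_true",
                        help="write to text instead of pdf")
    return parser


if __name__ == "__main__":
    # Example: save only lines mentioning "python" as text, crawling at most 5 pages.
    args = build_parser().parse_args(
        ["http://example.com", "python", "-t", "-l", "5"]
    )
    print(args.url, args.keywords, args.text, args.limit)
```

Because `--crawl` accepts a variable number of values, placing the url and keywords before it on the command line avoids the url being read as a crawl keyword.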

## Author

* Hunter Hammond (huntrar@gmail.com)

## Notes

* Unless the --text flag is given, all webpages are saved as pdf files using pdfkit.

* Entering keyword arguments while using the --text flag saves only lines matching one of the given keywords.

* You can crawl subsequent webpages by passing a substring of the url you wish to match using --crawl, or by using --crawl-all.

* There is no limit to the number of pages to be crawled unless one is set using the --limit flag.
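The keyword and crawl rules in the notes above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the tool's implementation: `filter_lines` and `select_links` are hypothetical names, matching is assumed to be a case-sensitive substring test, and the limit is modelled as a simple cap on the number of links chosen.

```python
def filter_lines(lines, keywords):
    """Keep lines containing at least one keyword; keep all lines if none are given."""
    if not keywords:
        return list(lines)
    return [line for line in lines if any(k in line for k in keywords)]


def select_links(links, crawl=None, crawl_all=False, limit=None):
    """Choose which links to follow.

    crawl_all follows every link, crawl follows links containing any of the
    given substrings, and limit (if set) caps how many links are returned.
    """
    if crawl_all:
        chosen = list(links)
    elif crawl:
        chosen = [link for link in links if any(k in link for k in crawl)]
    else:
        # Neither crawl option given: stay on the starting page.
        chosen = []
    return chosen if limit is None else chosen[:limit]
```

For example, `select_links(links, crawl=["docs"])` would keep only links whose url contains `docs`, which corresponds to passing `--crawl docs` on the command line.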

News

0.0.5

  • added --verbose argument for use with pdfkit

  • improved output file name processing

0.0.4

  • accepts 0 or 1 URLs, allowing a call with just --version

0.0.3

  • Moved utils.py to scrape/

0.0.2

  • First entry


Built Distribution

scrape-0.0.5-py2-none-any.whl (6.2 kB)

Uploaded Python 2
