easierscrape
A library for basic web scraping.
Overview
easierscrape is a library that helps with basic web scraping operations. Scraping code is often written and rewritten with slightly different parameters to fit each target website. This library is an easy-to-use tool that scrapes the essentials from websites (tables, links, files, etc.). It can also generate hyperlink trees using anytree.
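To make the idea of scraping links concrete, here is a minimal sketch that collects hyperlinks from an HTML string using only the Python standard library. It is not easierscrape's implementation: the names LinkCollector and extract_links and the sample HTML are hypothetical and chosen for illustration only.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Hypothetical helper: collect absolute URLs from <a href="..."> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links against the page URL.
                self.links.append(urljoin(self.base_url, value))


def extract_links(html, base_url):
    """Return every link found in `html`, in document order."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


# Illustrative input only; a real scraper would fetch the page first.
sample_html = (
    '<p><a href="/login">Login</a> '
    '<a href="http://books.toscrape.com">Books</a></p>'
)
links = extract_links(sample_html, "http://quotes.toscrape.com")
print(links)
```

Writing this kind of parser by hand for each site is the repetition the library aims to remove.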
Details
This project is a pure Python project using modern tooling. It uses a Makefile as a command registry, with the following commands:
- make: list available commands
- make develop: install and build this library and its dependencies using pip
- make build: build the library using setuptools
- make lint: perform static analysis of this library with flake8 and black
- make format: autoformat this library using black
- make annotate: run type checking using mypy
- make test: run automated tests with pytest
- make coverage: run automated tests with pytest and collect coverage information
- make dist: package library for distribution
Basic Usage
Install with pip: pip install easierscrape
Import Scraper from easierscrape and instantiate it with a URL as seen below:
from easierscrape import Scraper
scraper = Scraper("https://quotes.toscrape.com/login")
From there, call the scraper's methods to scrape various resources.
Usage examples:
>>> scraper.parse_text()
["Quotes to Scrape", "Quotes to Scrape", "Login", "Username", "Password", "Quotes by:", "GoodReads.com", "Made with", "❤", "by", "Scrapinghub"]
>>> scraper.print_tree(1)
https://toscrape.com
├── http://books.toscrape.com
├── http://quotes.toscrape.com
├── http://quotes.toscrape.com/scroll
├── http://quotes.toscrape.com/js
├── http://quotes.toscrape.com/js-delayed
├── http://quotes.toscrape.com/tableful
├── http://quotes.toscrape.com/login
├── http://quotes.toscrape.com/search.aspx
└── http://quotes.toscrape.com/random
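As a rough illustration of how a depth-limited hyperlink tree like the one above can be drawn, the sketch below renders a tree from a plain dictionary of links. It uses only the standard library rather than anytree, and it is not the library's own code: render_tree and link_map are hypothetical names, and the link data is a hand-written subset of the example output.

```python
def render_tree(root, children, depth):
    """Return the lines of a tree drawing rooted at `root`.

    `children` maps a URL to the list of URLs it links to, and `depth`
    limits how many levels below the root are drawn.
    """
    lines = [root]

    def walk(node, prefix, remaining):
        if remaining == 0:
            return
        kids = children.get(node, [])
        for index, kid in enumerate(kids):
            last = index == len(kids) - 1
            lines.append(prefix + ("└── " if last else "├── ") + kid)
            # Children of the last entry are indented with spaces,
            # others keep a vertical bar to continue the branch.
            walk(kid, prefix + ("    " if last else "│   "), remaining - 1)

    walk(root, "", depth)
    return lines


# Hand-written sample data, assumed for illustration.
link_map = {
    "https://toscrape.com": [
        "http://books.toscrape.com",
        "http://quotes.toscrape.com",
    ],
    "http://quotes.toscrape.com": ["http://quotes.toscrape.com/login"],
}
tree_lines = render_tree("https://toscrape.com", link_map, 1)
print("\n".join(tree_lines))
```

Because the depth counter decreases on every level, the walk terminates even when pages link back to each other.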
Command Line Usage
When installed, you can invoke easierscrape from the command-line to generate a hyperlink tree, get a screenshot, download all image, txt, and pdf files, and scrape any tables for a given url and depth:
usage: python -m easierscrape [-h] url depth download_path
positional arguments:
  url            the url to scrape
  depth          the depth of the scrape tree
  download_path  the location to download files to
optional arguments:
  -h, --help     show this help message and exit
Usage example:
$ python -m easierscrape https://toscrape.com/ 1 example_down_path
https://toscrape.com
├── http://books.toscrape.com
├── http://quotes.toscrape.com
├── http://quotes.toscrape.com/scroll
├── http://quotes.toscrape.com/js
├── http://quotes.toscrape.com/js-delayed
├── http://quotes.toscrape.com/tableful
├── http://quotes.toscrape.com/login
├── http://quotes.toscrape.com/search.aspx
└── http://quotes.toscrape.com/random
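For readers who want to see how a command line with this shape can be declared, here is a minimal argparse sketch that reproduces the usage message above. It is a reconstruction from the help text, not the library's actual entry point: build_parser is a hypothetical name, and parsing depth as an integer is an assumption.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented usage message."""
    parser = argparse.ArgumentParser(prog="python -m easierscrape")
    parser.add_argument("url", help="the url to scrape")
    # Assumption: depth is converted to int; the help text does not say.
    parser.add_argument("depth", type=int, help="the depth of the scrape tree")
    parser.add_argument("download_path", help="the location to download files to")
    return parser


# Parse the same arguments as the usage example above.
args = build_parser().parse_args(
    ["https://toscrape.com/", "1", "example_down_path"]
)
print(args.url, args.depth, args.download_path)
```

All three arguments are positional, so they must be given in the order url, depth, download_path.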
Contributing
See CONTRIBUTING for more information.
License
This software is licensed under the Apache 2.0 license. Please see LICENSE for more information.