`scrape_files` is a tool for scraping online content to your local machine.
Currently, it supports scraping HTML pages and converting them to well-formatted Markdown for easy reading, as well as scraping and downloading images of various formats from a web page.
Scraping HTML pages to your local machine
The HTML parsing logic is similar to that of a browser's easy-read extension: it trims all the unnecessary decoration from a web page and keeps only the title and the article content. The main difference is that the page is downloaded and saved as cleanly formatted Markdown.
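The "keep only the title and the article content" idea can be sketched with the standard library alone. This is a minimal illustration, not the package's actual parser: the names `ArticleExtractor` and `html_to_markdown` are hypothetical, and it assumes the article content lives in `<p>` elements.

```python
from html.parser import HTMLParser


class ArticleExtractor(HTMLParser):
    """Hypothetical helper: collect the <title> text and the text of each <p>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._in_title = False
        self._buffer = None  # text chunks of the <p> currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "p":
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "p" and self._buffer is not None:
            # Collapse runs of whitespace so the Markdown stays tidy.
            text = " ".join("".join(self._buffer).split())
            if text:
                self.paragraphs.append(text)
            self._buffer = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._buffer is not None:
            self._buffer.append(data)


def html_to_markdown(html):
    """Return a Markdown string: the title as a heading, then one block per paragraph."""
    parser = ArticleExtractor()
    parser.feed(html)
    parser.close()
    title = parser.title.strip()
    lines = [f"# {title}", ""] if title else []
    for paragraph in parser.paragraphs:
        lines += [paragraph, ""]
    return "\n".join(lines).rstrip() + "\n"
```

Navigation bars, scripts, and other markup outside `<title>` and `<p>` are simply ignored, which is the trimming behaviour described above in its crudest form.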
It also supports concurrently scraping the links found under `<p>` tags in the current page.
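One way to picture the concurrent link scraping is a thread pool fed with the links found inside `<p>` elements. The sketch below is an assumption about the general approach, not the package's code: `ParagraphLinkParser`, `fetch`, and `scrape_linked_pages` are hypothetical names, and the use of `ThreadPoolExecutor` is a choice made here for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ParagraphLinkParser(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags that sit inside a <p>."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
        elif tag == "a" and self._in_p:
            href = dict(attrs).get("href")
            if href:
                url = urljoin(self.base_url, href)  # resolve relative links
                if url not in self.links:
                    self.links.append(url)

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False


def fetch(url):
    """Download a page and return its text (default fetcher; needs network access)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_linked_pages(page_url, page_html, fetch_page=fetch, max_workers=8):
    """Fetch every link found under <p> concurrently; return {url: page text}."""
    parser = ParagraphLinkParser(page_url)
    parser.feed(page_html)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        bodies = list(pool.map(fetch_page, parser.links))
    return dict(zip(parser.links, bodies))
```

Passing `fetch_page` as a parameter keeps the link discovery testable without a network connection.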
Terminal usage:
scrape html <url> # specify a url for scraping
scrape html <url> -d # specify a directory name for saving files in current folder
scrape html <url> -l # specify a level: 1 by default for the current page; 2 for links in the current page
Scraping images to your local machine
Images are scraped and downloaded concurrently. Supported formats: jpg, png, gif, svg, jpeg, webp. By default, all supported formats are downloaded.
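A rough sketch of how format filtering and concurrent downloading might fit together is shown below. It is illustrative only: `find_images`, `download`, and `download_all` are hypothetical names, and matching on the URL's file extension is an assumption made here, not a documented behaviour of the package.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# Formats listed in the documentation above.
SUPPORTED_FORMATS = ("jpg", "png", "gif", "svg", "jpeg", "webp")


class ImageParser(HTMLParser):
    """Hypothetical helper: collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_images(page_url, html, formats=SUPPORTED_FORMATS):
    """Return absolute image URLs whose file extension is in `formats`."""
    wanted = {f.lower().lstrip(".") for f in formats}
    parser = ImageParser()
    parser.feed(html)
    urls = []
    for src in parser.sources:
        url = urljoin(page_url, src)
        extension = os.path.splitext(urlparse(url).path)[1].lower().lstrip(".")
        if extension in wanted and url not in urls:
            urls.append(url)
    return urls


def download(url, directory):
    """Save one image into `directory` and return the local path (needs network)."""
    os.makedirs(directory, exist_ok=True)
    target = os.path.join(directory, os.path.basename(urlparse(url).path))
    with urlopen(url, timeout=10) as response, open(target, "wb") as out:
        out.write(response.read())
    return target


def download_all(urls, directory, max_workers=8):
    """Download all URLs concurrently with a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda u: download(u, directory), urls))
```

Downloads are I/O-bound, so a thread pool is a reasonable fit here; an asynchronous HTTP client would be another option.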
Terminal usage:
scrape image <url> # specify a url for scraping
scrape image <url> -d # specify a directory name for saving files in current folder
scrape image <url> -f # specify image formats separated by spaces
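For readers who want to see how a command line of this shape could be parsed, here is a small `argparse` sketch that mirrors the documented `html` and `image` subcommands and the `-d`, `-l`, and `-f` flags. It is not the package's own entry point: the function name `build_parser`, the destination names, and the assumption that `-d` and `-l` each take one value are all illustrative.

```python
import argparse

# Formats listed in the documentation above.
SUPPORTED_FORMATS = ["jpg", "png", "gif", "svg", "jpeg", "webp"]


def build_parser():
    """Build a parser mirroring the documented `scrape` commands (illustrative only)."""
    parser = argparse.ArgumentParser(prog="scrape")
    subcommands = parser.add_subparsers(dest="command", required=True)

    html = subcommands.add_parser("html", help="scrape a page to Markdown")
    html.add_argument("url")
    html.add_argument("-d", dest="directory", default=None,
                      help="directory name for saved files")
    html.add_argument("-l", dest="level", type=int, choices=[1, 2], default=1,
                      help="1: current page only; 2: also links in the page")

    image = subcommands.add_parser("image", help="scrape images from a page")
    image.add_argument("url")
    image.add_argument("-d", dest="directory", default=None,
                       help="directory name for saved files")
    image.add_argument("-f", dest="formats", nargs="+",
                       choices=SUPPORTED_FORMATS, default=SUPPORTED_FORMATS,
                       help="image formats, separated by spaces")
    return parser
```

With this sketch, `build_parser().parse_args(["image", "https://example.com", "-f", "png", "gif"])` yields a namespace whose `formats` attribute holds the two requested formats, while omitting `-f` falls back to every supported format.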
Installation
pip install scrape_files
Hashes for scrape_files-0.2.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 49edf14e5deda8af9f7714f01621d9e49f8d56006e7dbdf2f42c96574feec316
MD5 | f8d74b8321522603714cbd3382f4d295
BLAKE2b-256 | dfd2e74469c6ceb878d17c8a4c8b18c1b461cce6a25820be6b16564c6bb53e37