`scrape_files` is a tool for scraping online content to your local machine.
Currently, it supports two tasks: scraping HTML pages and converting them to well-formatted Markdown for easy reading, and scraping and downloading the images in a web page in various formats.
Scraping HTML pages to your local machine
The HTML parsing logic is similar to that of a browser's easy-read extension: it trims all unnecessary decoration from a web page and keeps only the title and the article content. The main difference is that the result is saved as neatly formatted Markdown.
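To make the idea concrete, here is a minimal sketch of that kind of trimming, using only the Python standard library. It is not the package's actual implementation: the class `ArticleToMarkdown`, the function `html_to_markdown`, and the choice of which tags to keep or skip are all assumptions made for illustration.

```python
from html.parser import HTMLParser


class ArticleToMarkdown(HTMLParser):
    """Hypothetical reader-style filter: keep the title, subheadings and paragraphs."""

    # Elements whose content is treated as decoration and dropped (an assumption).
    SKIPPED = {"script", "style", "nav", "header", "footer", "aside"}
    # Kept elements and their Markdown prefix. <h1> is left out because it
    # often repeats the <title>.
    BLOCKS = {"title": "# ", "h2": "## ", "h3": "### ", "p": ""}

    def __init__(self):
        super().__init__()
        self.blocks = []       # finished Markdown blocks, in document order
        self._prefix = None    # prefix of the block being collected, if any
        self._text = []        # text fragments of the current block
        self._skip_depth = 0   # > 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag in self.BLOCKS and self._skip_depth == 0:
            self._prefix, self._text = self.BLOCKS[tag], []

    def handle_endtag(self, tag):
        if tag in self.SKIPPED:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag in self.BLOCKS and self._prefix is not None:
            # Collapse runs of whitespace so the Markdown stays tidy.
            text = " ".join("".join(self._text).split())
            if text:
                self.blocks.append(self._prefix + text)
            self._prefix = None

    def handle_data(self, data):
        if self._prefix is not None and self._skip_depth == 0:
            self._text.append(data)


def html_to_markdown(html):
    """Return the kept blocks joined as Markdown paragraphs."""
    parser = ArticleToMarkdown()
    parser.feed(html)
    parser.close()
    return "\n\n".join(parser.blocks)
```

Calling `html_to_markdown(page_source)` on a page's HTML returns a string that can be written to a `.md` file. A real reader-mode algorithm scores candidate content blocks rather than relying on a fixed tag list, so treat this only as an outline of the approach.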
It also supports concurrently scraping the links found under `<p>` tags in the current page.
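One way such concurrent link scraping could work is sketched below with a thread pool from the standard library. The names `ParagraphLinks`, `fetch`, and `scrape_linked_pages`, the worker count, and the timeout are hypothetical; the package may collect and schedule links differently.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ParagraphLinks(HTMLParser):
    """Collect the targets of <a> tags that appear inside <p> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # absolute URLs, in document order, no duplicates
        self._in_p = False   # True between <p> and </p>

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
        elif tag == "a" and self._in_p:
            href = dict(attrs).get("href")
            if href:
                url = urljoin(self.base_url, href)  # resolve relative links
                if url not in self.links:
                    self.links.append(url)

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False


def fetch(url):
    """Download one page and return its HTML as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def scrape_linked_pages(page_url, html, fetch_page=fetch, workers=8):
    """Fetch every link found under <p> in `html`, several at a time."""
    parser = ParagraphLinks(page_url)
    parser.feed(html)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so pages line up with links.
        pages = list(pool.map(fetch_page, parser.links))
    return dict(zip(parser.links, pages))
```

Threads suit this job because the work is dominated by waiting on the network. The `fetch_page` parameter exists only so the sketch can be exercised without network access, for example by passing a stub function.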
Terminal usage:
scrape html <url> # specify a url for scraping
scrape html <url> -d # specify a directory name for saving files in the current folder
scrape html <url> -l # specify a level: 1 by default for the current page; 2 for links in the current page
Scraping images to your local machine
Images are scraped and downloaded concurrently. Supported formats: jpg, png, gif, svg, jpeg, webp. By default, all supported formats are downloaded.
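The sketch below shows one plausible shape for this: find `<img>` sources, filter them by file extension, then download the matches with a thread pool. All names (`ImageSources`, `find_images`, `download`, `download_images`) are hypothetical, and matching on the URL's file extension is an assumption; the package may detect formats another way.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

# The formats listed in the text above.
SUPPORTED_FORMATS = ("jpg", "png", "gif", "svg", "jpeg", "webp")


class ImageSources(HTMLParser):
    """Collect the src attribute of every <img> tag."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_images(page_url, html, formats=SUPPORTED_FORMATS):
    """Return absolute image URLs whose file extension is in `formats`."""
    parser = ImageSources()
    parser.feed(html)
    wanted = {fmt.lower() for fmt in formats}
    urls = []
    for src in parser.sources:
        url = urljoin(page_url, src)
        # Look only at the path, so query strings do not hide the extension.
        ext = os.path.splitext(urlparse(url).path)[1].lstrip(".").lower()
        if ext in wanted and url not in urls:
            urls.append(url)
    return urls


def download(url, directory):
    """Save one image into `directory` and return the local path."""
    target = os.path.join(directory, os.path.basename(urlparse(url).path))
    with urlopen(url, timeout=10) as response, open(target, "wb") as out:
        out.write(response.read())
    return target


def download_images(urls, directory, workers=8):
    """Download all `urls` into `directory` using a pool of threads."""
    os.makedirs(directory, exist_ok=True)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda url: download(url, directory), urls))
```

A typical call would be `download_images(find_images(page_url, html, ["png", "webp"]), "images")`. Note that two images with the same file name would overwrite each other in this sketch.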
Terminal usage:
scrape image <url> # specify a url for scraping
scrape image <url> -d # specify a directory name for saving files in the current folder
scrape image <url> -f # specify image formats, separated by spaces
Installation
pip install scrape_files
Hashes for scrape_files-0.1.4-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | e49b743121b953a06addbd43d5b9f62f7dc986b6ced25ee448ef2e63f4d7ca6d |
MD5 | 317b40b9f57627661b3a34630d2d5c78 |
BLAKE2b-256 | f9bad0cb940c3c10511674695d883f1067b8c889f3f3592cf4ad3a5142118b96 |