`scrape_files` is a tool for scraping web content to your local machine.
Currently, it supports two tasks: scraping HTML pages and converting them to well-formatted Markdown for easy reading, and downloading images of various formats from a web page.
Scraping HTML pages to your local machine
The HTML parsing logic is similar to that of a browser's "easy read" (reader-mode) extension: it trims all unnecessary decoration from a web page and keeps only the title and the article content. The main difference is that the result is downloaded and converted into a nicely formatted Markdown file.
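The reader-mode idea can be illustrated with a small, standard-library-only sketch. It is not `scrape_files`' actual parser: the set of "decoration" tags, the rule that only headings and paragraphs inside `<article>` count as content, and the names `ReadableMarkdown` and `html_to_markdown` are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: contents of these tags are treated as page decoration and skipped.
SKIP_TAGS = {"script", "style", "nav", "header", "footer", "aside"}
# Heading tags mapped to their Markdown prefixes ("h2" -> "## ").
HEADINGS = {f"h{i}": "#" * i + " " for i in range(1, 7)}


class ReadableMarkdown(HTMLParser):
    """Hypothetical extractor: keeps <title> plus headings/paragraphs in <article>."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.blocks = []   # finished Markdown blocks, in document order
        self._stack = []   # currently open tags
        self._buf = []     # text of the heading/paragraph being built
        self._prefix = ""  # Markdown prefix for the current block

    def handle_starttag(self, tag, attrs):
        self._stack.append(tag)
        if tag in HEADINGS or tag == "p":
            self._prefix = HEADINGS.get(tag, "")
            self._buf = []

    def handle_endtag(self, tag):
        if tag in HEADINGS or tag == "p":
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buf).split())
            if text and "article" in self._stack:
                self.blocks.append(self._prefix + text)
            self._buf = []
        # Pop back to the matching open tag; tolerates unclosed void tags like <img>.
        if tag in self._stack:
            while self._stack and self._stack.pop() != tag:
                pass

    def handle_data(self, data):
        if any(t in SKIP_TAGS for t in self._stack):
            return
        if self._stack and self._stack[-1] == "title":
            self.title += data.strip()
        else:
            self._buf.append(data)


def html_to_markdown(html: str) -> str:
    """Return the page title as an H1 followed by the article's blocks."""
    parser = ReadableMarkdown()
    parser.feed(html)
    parser.close()
    parts = [f"# {parser.title}"] if parser.title else []
    return "\n\n".join(parts + parser.blocks)
```

For a page whose `<nav>` and `<footer>` surround an `<article>` containing `<h2>Intro</h2><p>Hello <b>world</b>.</p>`, this sketch keeps only the title, the heading, and the paragraph text. Real reader-mode implementations use much richer scoring heuristics than a fixed tag list.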
It also supports concurrently scraping the links found inside `<p>` tags on the current page.
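Collecting the links inside `<p>` tags and fetching them concurrently could look roughly like the sketch below. The names `ParagraphLinkParser` and `scrape_paragraph_links`, the thread-pool approach, and the injectable `fetcher` parameter are assumptions for illustration; the package may implement concurrency differently (for example with `asyncio`).

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urljoin
from urllib.request import urlopen


class ParagraphLinkParser(HTMLParser):
    """Hypothetical helper: collect absolute hrefs of <a> tags inside <p> tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.depth = 0               # number of currently open <p> tags
        self.links: List[str] = []   # unique absolute URLs, in page order

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self.depth += 1
        elif tag == "a" and self.depth > 0:
            href = dict(attrs).get("href")
            if href:
                absolute = urljoin(self.base_url, href)
                if absolute not in self.links:
                    self.links.append(absolute)

    def handle_endtag(self, tag):
        if tag == "p" and self.depth > 0:
            self.depth -= 1


def fetch(url: str) -> str:
    """Download a page as text (network access required)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def scrape_paragraph_links(base_url: str, html: str,
                           fetcher: Callable[[str], str] = fetch,
                           max_workers: int = 8) -> Dict[str, str]:
    """Fetch every <p>-nested link concurrently; return {url: page_text}."""
    parser = ParagraphLinkParser(base_url)
    parser.feed(html)
    parser.close()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() keeps input order, so zip pairs each URL with its own page.
        return dict(zip(parser.links, pool.map(fetcher, parser.links)))
```

Passing a stub such as `fetcher=lambda u: "page:" + u` makes the link-selection logic testable without network access.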
Terminal usage:
scrape html <url> # the URL of the page to scrape
scrape html <url> -d # save files into a directory with the given name inside the current folder
scrape html <url> -l # set the scraping level: 1 (default) scrapes the current page; 2 scrapes the links in the current page
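The `html` command line above could be modelled with `argparse` as follows. This is a hypothetical sketch, not the package's real parser: only the `-d` and `-l` flags come from the usage text, while the destination names, the `(1, 2)` choices, and `None` meaning "save directly in the current folder" are assumptions.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `scrape html <url> [-d DIR] [-l LEVEL]`."""
    parser = argparse.ArgumentParser(prog="scrape")
    sub = parser.add_subparsers(dest="command", required=True)

    html = sub.add_parser("html", help="scrape a page and save it as Markdown")
    html.add_argument("url", help="URL of the page to scrape")
    # Assumption: None means "save directly in the current folder".
    html.add_argument("-d", dest="directory", default=None,
                      help="directory name inside the current folder")
    html.add_argument("-l", dest="level", type=int, choices=(1, 2), default=1,
                      help="1: current page only; 2: links in the current page")
    return parser


def parse(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Parse argv (defaults to sys.argv[1:] when None)."""
    return build_parser().parse_args(argv)
```

With this sketch, `parse(["html", "https://example.com", "-l", "2"])` yields `level == 2` and `directory is None`; an unsupported level such as `-l 3` makes `argparse` exit with a usage error.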
Scraping images to your local machine
Images are scraped and downloaded concurrently. Supported formats are jpg, jpeg, png, gif, svg, and webp; by default, all supported formats are downloaded.
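A minimal sketch of concurrent image downloading with a format filter, assuming formats are detected from the file extension in each `<img src>` URL. The names `scrape_images` and `download`, the extension-based detection, and the thread pool are illustrative assumptions rather than the package's documented behaviour.

```python
import os
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from typing import Callable, Iterable, List, Optional
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen

SUPPORTED_FORMATS = ("jpg", "png", "gif", "svg", "jpeg", "webp")


class ImageSrcParser(HTMLParser):
    """Collect absolute URLs from the src attribute of every <img> tag."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.urls: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.urls.append(urljoin(self.base_url, src))


def extension(url: str) -> str:
    """Lower-case file extension of the URL path, without the dot."""
    return os.path.splitext(urlparse(url).path)[1].lstrip(".").lower()


def download(url: str) -> bytes:
    """Fetch raw bytes (network access required)."""
    with urlopen(url, timeout=10) as resp:
        return resp.read()


def scrape_images(base_url: str, html: str, out_dir: str,
                  formats: Optional[Iterable[str]] = None,
                  fetcher: Callable[[str], bytes] = download,
                  max_workers: int = 8) -> List[str]:
    """Download matching images concurrently; return the saved file paths."""
    # Default to every supported format; silently ignore unsupported ones.
    wanted = {f.lower() for f in (formats or SUPPORTED_FORMATS)} & set(SUPPORTED_FORMATS)
    parser = ImageSrcParser(base_url)
    parser.feed(html)
    parser.close()
    # Deduplicate while preserving page order, then filter by format.
    urls = [u for u in dict.fromkeys(parser.urls) if extension(u) in wanted]
    os.makedirs(out_dir, exist_ok=True)

    def save(url: str) -> str:
        # Note: images with the same file name from different paths overwrite each other.
        path = os.path.join(out_dir, os.path.basename(urlparse(url).path))
        with open(path, "wb") as fh:
            fh.write(fetcher(url))
        return path

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(save, urls))
```

Because the work is I/O-bound, threads are a reasonable fit here; an `asyncio`-based HTTP client would be an equally valid design.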
Terminal usage:
scrape image <url> # the URL of the page to scrape
scrape image <url> -d # save files into a directory with the given name inside the current folder
scrape image <url> -f # specify one or more image formats, separated by spaces
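The space-separated `-f` option maps naturally onto `argparse`'s `nargs="+"`. The sketch below is hypothetical: the `-d` and `-f` flags and the format list come from this README, but the destination names and the parser structure are assumptions.

```python
import argparse

SUPPORTED_FORMATS = ("jpg", "png", "gif", "svg", "jpeg", "webp")


def build_image_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring `scrape image <url> [-d DIR] [-f FMT ...]`."""
    parser = argparse.ArgumentParser(prog="scrape")
    sub = parser.add_subparsers(dest="command", required=True)

    image = sub.add_parser("image", help="download the images in a page")
    image.add_argument("url", help="URL of the page to scrape")
    image.add_argument("-d", dest="directory", default=None,
                       help="directory name inside the current folder")
    # nargs="+" accepts one or more space-separated values; each is checked
    # against choices. A fresh list avoids sharing a mutable default.
    image.add_argument("-f", dest="formats", nargs="+",
                       choices=SUPPORTED_FORMATS, default=list(SUPPORTED_FORMATS),
                       help="image formats to download (default: all)")
    return parser
```

Since `-f` greedily consumes the values after it, the URL is best given before the flag, as in the usage lines above.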
Installation
pip install scrape_files
Download files
File details
Details for the file scrape_files-0.2.0.tar.gz.
File metadata
- Download URL: scrape_files-0.2.0.tar.gz
- Upload date:
- Size: 8.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.28.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | c7fbb757c549c82f0a707a7a35c2b4824a52c80c3670f693bb05a2d4a8523595
MD5 | f413573c9e978a1c40ee6b908a4ca734
BLAKE2b-256 | 33153a6271aa3bfdf4fa3e30debc001743fb66aac86d71da51775de988191a4b
File details
Details for the file scrape_files-0.2.0-py3-none-any.whl.
File metadata
- Download URL: scrape_files-0.2.0-py3-none-any.whl
- Upload date:
- Size: 10.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: python-requests/2.28.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 49edf14e5deda8af9f7714f01621d9e49f8d56006e7dbdf2f42c96574feec316
MD5 | f8d74b8321522603714cbd3382f4d295
BLAKE2b-256 | dfd2e74469c6ceb878d17c8a4c8b18c1b461cce6a25820be6b16564c6bb53e37
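A downloaded distribution can be checked against the digests above with Python's `hashlib`. This is a generic sketch, not part of `scrape_files`; the helper names are hypothetical. Note that BLAKE2b-256 means BLAKE2b with a 32-byte digest, not `hashlib`'s default 64-byte BLAKE2b.

```python
import hashlib

# Digest constructors keyed by the algorithm names used in the tables above.
ALGORITHMS = {
    "sha256": hashlib.sha256,
    "md5": hashlib.md5,
    "blake2b-256": lambda: hashlib.blake2b(digest_size=32),
}


def file_digest(path: str, algorithm: str = "sha256", chunk_size: int = 65536) -> str:
    """Hex digest of a file, read in chunks so large files are not loaded at once."""
    h = ALGORITHMS[algorithm.lower()]()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path: str, expected_hex: str, algorithm: str = "sha256") -> bool:
    """True if the file's digest matches the published one (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()
```

For example, `verify("scrape_files-0.2.0.tar.gz", "<SHA256 digest from the table>")` should return `True` for an intact download. pip also offers a built-in hash-checking mode via `--require-hashes` with `--hash=sha256:...` entries in a requirements file.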