Project description

scrape_files is a tool for scraping online content to your local machine. It currently supports scraping HTML pages and converting them to well-formatted Markdown for easy reading, as well as scraping and downloading images of various formats from a web page.

Scraping HTML pages to your local machine

The HTML parsing logic is similar to that of a browser's easyread extension: it trims all the unnecessary decorations from a web page, keeping only the title and the article content. The main difference is that the page is downloaded and converted to neatly formatted Markdown.
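
To illustrate the idea of keeping only the title and the article text, here is a minimal sketch using Python's standard html.parser module. It is not the package's actual implementation: the class ArticleToMarkdown and the function html_to_markdown are hypothetical names, and the sketch assumes the article content lives in <p> tags.

```python
from html.parser import HTMLParser


class ArticleToMarkdown(HTMLParser):
    """Hypothetical helper: keep the <title> and <p> text, drop everything else."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.paragraphs = []
        self._current = None  # tag currently being captured ("title" or "p")
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        # Start capturing text only for the tags we care about.
        if tag in ("title", "p"):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        # Text outside <title>/<p> (navigation, ads, ...) is ignored.
        if self._current is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == self._current:
            # Collapse runs of whitespace into single spaces.
            text = " ".join("".join(self._buffer).split())
            if tag == "title":
                self.title = text
            elif text:
                self.paragraphs.append(text)
            self._current = None


def html_to_markdown(html: str) -> str:
    """Return a Markdown string with the title as a heading, then the paragraphs."""
    parser = ArticleToMarkdown()
    parser.feed(html)
    parser.close()
    parts = [f"# {parser.title}"] if parser.title else []
    parts.extend(parser.paragraphs)
    return "\n\n".join(parts) + "\n"
```

Inline markup inside a paragraph (for example <b>) is flattened to plain text in this sketch; a fuller converter would map it to the corresponding Markdown syntax.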

It also supports concurrently scraping the pages linked from <p> tags in the current page.
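
The link-following step could be sketched roughly as follows, using only the standard library. This is an assumption about the approach, not the package's code: ParagraphLinkCollector, links_in_paragraphs, and scrape_concurrently are hypothetical names, the fetch callable is supplied by the caller, and the sketch assumes each <p> tag is explicitly closed.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin


class ParagraphLinkCollector(HTMLParser):
    """Hypothetical helper: collect href values of <a> tags nested inside <p> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._in_p = False

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
        elif tag == "a" and self._in_p:
            href = dict(attrs).get("href")
            if href:
                # Resolve relative links against the page URL and skip duplicates.
                url = urljoin(self.base_url, href)
                if url not in self.links:
                    self.links.append(url)

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False


def links_in_paragraphs(html, base_url):
    """Return absolute URLs of links found under <p> tags, in document order."""
    collector = ParagraphLinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return collector.links


def scrape_concurrently(urls, fetch, max_workers=8):
    """Apply the caller-supplied fetch(url) to every URL using a thread pool."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```

A thread pool suits this kind of work because each download spends most of its time waiting on the network rather than using the CPU.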

Terminal usage:

scrape html <url>     # specify a url for scraping
scrape html <url> -d  # specify a directory name for saving files in current folder
scrape html <url> -l  # specify a level: 1 by default for the current page; 2 for links in the current page

Scraping images to your local machine

Images are scraped and downloaded concurrently. Supported formats: jpg, png, gif, svg, jpeg, webp. All supported formats are scraped by default.
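
One plausible way to apply the format filter is to compare each image URL's file extension against the requested formats. The sketch below is an assumption for illustration only: SUPPORTED_FORMATS mirrors the list above, while filter_image_urls is a hypothetical helper and not part of the package's documented API.

```python
import os
from urllib.parse import urlparse

# Formats listed in the project description.
SUPPORTED_FORMATS = ("jpg", "png", "gif", "svg", "jpeg", "webp")


def filter_image_urls(urls, formats=None):
    """Keep URLs whose path ends in one of the wanted image extensions.

    When formats is None or empty, every supported format is accepted.
    """
    wanted = {f.lower().lstrip(".") for f in (formats or SUPPORTED_FORMATS)}
    selected = []
    for url in urls:
        # Use only the path so query strings such as ?size=large are ignored.
        path = urlparse(url).path
        extension = os.path.splitext(path)[1].lower().lstrip(".")
        if extension in wanted:
            selected.append(url)
    return selected
```

The filtered URLs could then be handed to a thread pool for concurrent download.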

Terminal usage:

scrape image <url>     # specify a url for scraping
scrape image <url> -d  # specify a directory name for saving files in current folder
scrape image <url> -f  # specify image formats, separated by spaces

Installation

pip install scrape_files

Release history

0.2.0

Mar 23, 2023

0.1.5

Nov 1, 2022

0.1.4

Nov 1, 2022

0.1.3

Oct 31, 2022

0.1.2

Oct 26, 2022

0.1.1

Oct 26, 2022

0.1.0

Oct 26, 2022

Download files

Source Distribution

scrape_files-0.2.0.tar.gz (8.9 kB)

Uploaded Mar 23, 2023 Source

Built Distribution

scrape_files-0.2.0-py3-none-any.whl (10.3 kB)

Uploaded Mar 23, 2023 Python 3

Hashes for scrape_files-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`c7fbb757c549c82f0a707a7a35c2b4824a52c80c3670f693bb05a2d4a8523595`
MD5	`f413573c9e978a1c40ee6b908a4ca734`
BLAKE2b-256	`33153a6271aa3bfdf4fa3e30debc001743fb66aac86d71da51775de988191a4b`

Hashes for scrape_files-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`49edf14e5deda8af9f7714f01621d9e49f8d56006e7dbdf2f42c96574feec316`
MD5	`f8d74b8321522603714cbd3382f4d295`
BLAKE2b-256	`dfd2e74469c6ceb878d17c8a4c8b18c1b461cce6a25820be6b16564c6bb53e37`