`scrape_files` is a tool to help scrape things online to your local machine.

Project description

scrape_files is a tool for scraping online content to your local machine. It currently supports two tasks: scraping HTML pages and converting them to well-formatted Markdown for easy reading, and scraping and downloading images of various formats from a web page.

Scraping HTML pages to your local machine

The HTML parsing logic is similar to that of a browser's easy-read extension: it trims the unnecessary decoration from a web page and keeps only the title and the article content. The main difference is that the page is downloaded and saved as a neatly formatted Markdown file.
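The trimming idea described above can be sketched with the standard library alone. This is a minimal illustration, not the package's actual implementation: the class `ArticleToMarkdown`, the function `html_to_markdown`, and the choice of which tags to skip or keep are all assumptions made for the example.

```python
from html.parser import HTMLParser


class ArticleToMarkdown(HTMLParser):
    """Hypothetical converter: keeps the <title>, headings and paragraphs."""

    # Page decoration that a reader-style view would drop (assumed list).
    SKIP = {"script", "style", "nav", "header", "footer", "aside"}
    # Block tags we keep, mapped to their Markdown prefix.
    BLOCKS = {"h1": "# ", "h2": "## ", "h3": "### ", "p": ""}

    def __init__(self):
        super().__init__()
        self.title = ""
        self.blocks = []       # finished Markdown blocks, in document order
        self._skip_depth = 0   # > 0 while inside a skipped container
        self._current = None   # tag of the block currently being collected
        self._buf = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1
        elif tag == "title":
            self._in_title = True
        elif tag in self.BLOCKS and self._skip_depth == 0:
            self._current = tag
            self._buf = []

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self._skip_depth = max(0, self._skip_depth - 1)
        elif tag == "title":
            self._in_title = False
        elif tag == self._current:
            # Collapse runs of whitespace left over from the HTML source.
            text = " ".join("".join(self._buf).split())
            if text:
                self.blocks.append(self.BLOCKS[tag] + text)
            self._current = None

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._current and self._skip_depth == 0:
            self._buf.append(data)


def html_to_markdown(html):
    """Return a Markdown string with the page title followed by the kept blocks."""
    parser = ArticleToMarkdown()
    parser.feed(html)
    parser.close()
    parts = []
    if parser.title.strip():
        parts.append("# " + parser.title.strip())
    parts.extend(parser.blocks)
    return "\n\n".join(parts) + "\n"
```

Inline markup such as bold or links is flattened to plain text here; a fuller converter would map those tags to Markdown syntax as well.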

It also supports concurrently scraping the pages linked under <p> tags in the current page.
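As a rough sketch of how links under <p> tags could be collected and then fetched concurrently, the example below uses only the standard library. The names `ParagraphLinkParser`, `paragraph_links`, `fetch` and `fetch_all` are hypothetical, and the thread-pool approach is an assumption; the package may use a different concurrency mechanism.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urljoin
from urllib.request import urlopen


class ParagraphLinkParser(HTMLParser):
    """Hypothetical helper: collects href values of <a> tags inside <p> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self._in_p = False  # assumes every <p> has an explicit closing tag

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True
        elif tag == "a" and self._in_p:
            href = dict(attrs).get("href")
            if href:
                url = urljoin(self.base_url, href)  # resolve relative links
                if url not in self.links:           # keep order, drop repeats
                    self.links.append(url)

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False


def paragraph_links(html, base_url):
    """Return absolute URLs of links found under <p> tags, in page order."""
    parser = ParagraphLinkParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.links


def fetch(url):
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=10) as response:
        return response.read()


def fetch_all(urls, fetch=fetch, max_workers=8):
    """Fetch all URLs concurrently and return a {url: result} mapping."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch, urls)))
```

Passing the fetch function as a parameter keeps the concurrency logic testable without network access.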

Terminal usage:

scrape html <url>     # specify a url for scraping
scrape html <url> -d  # specify a directory name for saving files in current folder
scrape html <url> -l  # specify a level: 1 by default for the current page; 2 for links in the current page

Scraping images to your local machine

Images are scraped and downloaded concurrently. Supported formats: jpg, png, gif, svg, jpeg, webp; defaults to all supported formats.
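The format filter described above might look like the following sketch. `SUPPORTED_FORMATS` mirrors the list in the text; the function name `filter_image_urls` and the decision to match on the URL path's file extension are assumptions for illustration, not the package's confirmed behavior.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse

# Formats listed in the description above.
SUPPORTED_FORMATS = ("jpg", "png", "gif", "svg", "jpeg", "webp")


def filter_image_urls(urls, formats=None):
    """Keep URLs whose path ends in one of the wanted image formats.

    `formats` defaults to all supported formats (hypothetical helper).
    """
    wanted = {f.lower().lstrip(".") for f in (formats or SUPPORTED_FORMATS)}
    unknown = wanted - set(SUPPORTED_FORMATS)
    if unknown:
        raise ValueError("unsupported formats: " + ", ".join(sorted(unknown)))

    kept = []
    for url in urls:
        # Use only the path so query strings and fragments are ignored.
        suffix = PurePosixPath(urlparse(url).path).suffix.lower().lstrip(".")
        if suffix in wanted:
            kept.append(url)
    return kept
```

The resulting list could then be handed to a thread pool so that the downloads run concurrently.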

Terminal usage:

scrape image <url>     # specify a url for scraping
scrape image <url> -d  # specify a directory name for saving files in current folder
scrape image <url> -f  # specify image formats separated by spaces

Installation

pip install scrape_files

Download files

Source Distribution

scrape_files-0.2.0.tar.gz (8.9 kB view details)

Uploaded Source

Built Distribution

scrape_files-0.2.0-py3-none-any.whl (10.3 kB view details)

Uploaded Python 3

File details

Details for the file scrape_files-0.2.0.tar.gz.

File metadata

  • Download URL: scrape_files-0.2.0.tar.gz
  • Upload date:
  • Size: 8.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: python-requests/2.28.1

File hashes

Hashes for scrape_files-0.2.0.tar.gz
Algorithm Hash digest
SHA256 c7fbb757c549c82f0a707a7a35c2b4824a52c80c3670f693bb05a2d4a8523595
MD5 f413573c9e978a1c40ee6b908a4ca734
BLAKE2b-256 33153a6271aa3bfdf4fa3e30debc001743fb66aac86d71da51775de988191a4b

File details

Details for the file scrape_files-0.2.0-py3-none-any.whl.

File hashes

Hashes for scrape_files-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 49edf14e5deda8af9f7714f01621d9e49f8d56006e7dbdf2f42c96574feec316
MD5 f8d74b8321522603714cbd3382f4d295
BLAKE2b-256 dfd2e74469c6ceb878d17c8a4c8b18c1b461cce6a25820be6b16564c6bb53e37
