Package and CLI for downloading media from a webpage.

Pixelripper

Install with:

pip install pixelripper

Pixelripper provides a class called PixelRipper and a subclass called PixelRipperSelenium.
PixelRipper fetches webpages with the requests library, while PixelRipperSelenium fetches them with a Selenium-based engine.
The Selenium engine is slower and uses more resources, but it is useful for webpages that only render their media content when JavaScript is executed.
It can use either Firefox or Chrome.
Note: To use PixelRipperSelenium, you must have the webdriver that matches your machine and browser version installed.
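
Before choosing the Selenium engine, it can help to check that a webdriver is reachable. The following is a minimal sketch using only the standard library; it is not part of pixelripper. The helper name find_webdriver is hypothetical, and the sketch assumes the conventional executable names (geckodriver for Firefox, chromedriver for Chrome) and that the driver is expected to be on PATH.

```python
import shutil
from typing import Optional

# Conventional webdriver executable names for each browser.
# These are the usual names, not something pixelripper defines.
DRIVER_NAMES = {"firefox": "geckodriver", "chrome": "chromedriver"}


def find_webdriver(browser: str) -> Optional[str]:
    """Return the path of the webdriver for `browser` if it is on PATH, else None."""
    try:
        driver = DRIVER_NAMES[browser.lower()]
    except KeyError:
        raise ValueError(f"unsupported browser: {browser!r}") from None
    return shutil.which(driver)


for browser in DRIVER_NAMES:
    path = find_webdriver(browser)
    print(f"{browser}: {path or 'webdriver not found on PATH'}")
```

If neither driver is found, the requests-based PixelRipper remains available for pages that do not need JavaScript.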
pixelripper can be used programmatically or from the command line.

Programmatic usage:

from pixelripper import PixelRipper
from pathlib import Path
ripper = PixelRipper()
# Scrape the page for image, video, and audio urls.
ripper.rip("https://somewebsite.com")
# Any content urls found will now be accessible as members of ripper.
print(ripper.image_urls)
print(ripper.video_urls)
print(ripper.audio_urls)
# All the urls found on a page can be accessed through the ripper.scraper member.
all_urls = ripper.scraper.get_links("all")
# The urls can also be filtered according to a list of extensions 
# with the filter_by_extensions function.
# The following will return only .jpg and .mp3 file urls.
urls = ripper.filter_by_extensions([".jpg", ".mp3"])
# The content can then be downloaded.
ripper.download_files(urls, Path.cwd()/"somewebsite")
# Alternatively, everything in ripper.image_urls, ripper.video_urls, and ripper.audio_urls
# can be downloaded with just a call to ripper.download_all()
ripper.download_all(Path.cwd()/"somewebsite")
# Separate subfolders named "images", "videos", and "audio"
# will be created inside the "somewebsite" folder when using this function.
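
To make the idea of filtering by extension concrete, here is a rough standalone sketch of how such a filter could work. It is not pixelripper's implementation: filter_urls_by_extension is a hypothetical helper, and matching case-insensitively on the url path (ignoring any query string) is an assumption.

```python
from pathlib import PurePosixPath
from urllib.parse import urlparse


def filter_urls_by_extension(urls, extensions):
    """Keep the urls whose path ends in one of `extensions` (e.g. ".jpg")."""
    wanted = {ext.lower() for ext in extensions}
    return [
        url
        for url in urls
        # Look only at the path part so "?size=large" does not hide the suffix.
        if PurePosixPath(urlparse(url).path).suffix.lower() in wanted
    ]


urls = [
    "https://somewebsite.com/a/photo.JPG?size=large",
    "https://somewebsite.com/b/song.mp3",
    "https://somewebsite.com/c/clip.mp4",
]
# Keeps the .JPG and .mp3 urls and drops the .mp4 one.
print(filter_urls_by_extension(urls, [".jpg", ".mp3"]))
```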

Command line usage:

>pixelripper -h
usage: pixelripper [-h] [-s] [-nh] [-b BROWSER] [-o OUTPUT_PATH] [-eh [EXTRA_HEADERS ...]] url

positional arguments:
  url                   The url to scrape for media.

options:
  -h, --help            show this help message and exit
  -s, --selenium        Use selenium to get page content instead of requests.
  -nh, --no_headless    Don't use headless mode when using -s/--selenium.
  -b BROWSER, --browser BROWSER
                        The browser to use when using -s/--selenium. Can be 'firefox' or 'chrome'. You must have the appropriate webdriver installed for your machine and browser version in order to use the selenium engine.
  -o OUTPUT_PATH, --output_path OUTPUT_PATH
                        Output directory to save results to. If not specified, a folder with the name of the webpage will be created in the current working directory.
  -eh [EXTRA_HEADERS ...], --extra_headers [EXTRA_HEADERS ...]
                        Extra headers to use when requesting files as key, value pairs. Keys and values should be colon separated and pairs should be space separated. e.g. -eh Referer:website.com/page Host:website.com
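
The key:value format used by -eh/--extra_headers can be illustrated with a short sketch that turns such pairs into a headers dictionary. This is not pixelripper's own parser. The function name parse_extra_headers is hypothetical, and splitting on the first colon only (so that values may themselves contain colons) is an assumption.

```python
def parse_extra_headers(pairs):
    """Turn ["Referer:website.com/page", "Host:website.com"] into a dict."""
    headers = {}
    for pair in pairs:
        # Split on the first colon only, so a value such as
        # "https://website.com/page" keeps its own colon.
        key, sep, value = pair.partition(":")
        if not sep or not key:
            raise ValueError(f"expected KEY:VALUE, got {pair!r}")
        headers[key.strip()] = value.strip()
    return headers


print(parse_extra_headers(["Referer:website.com/page", "Host:website.com"]))
```

A dictionary in this shape is what the requests library accepts as its headers argument.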

Download files

Source Distribution

pixelripper-0.0.1.tar.gz (57.4 kB)

Uploaded Source

Built Distribution

pixelripper-0.0.1-py3-none-any.whl (7.8 kB)

Uploaded Python 3

File details

Details for the file pixelripper-0.0.1.tar.gz.

File metadata

  • Download URL: pixelripper-0.0.1.tar.gz
  • Upload date:
  • Size: 57.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for pixelripper-0.0.1.tar.gz
Algorithm    Hash digest
SHA256       d2cce23ca51db8dcdd9c8acb7c25f7a08d128194bf1b3b296f4dd7de56dde687
MD5          0b13cb0ff105214ab2bbd5513827b207
BLAKE2b-256  db32c14da54fbe9c0dd0b85f2e563478d40457f6c97d52ef12c88b62c8b175f0
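
A published digest can be checked against a downloaded file with Python's standard hashlib module. The sketch below is one way to do that. The helper names sha256_of_file and matches_published_hash are hypothetical, and the local file path is whatever location the archive was saved to.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published one, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

Calling matches_published_hash with the path of the downloaded pixelripper-0.0.1.tar.gz and the SHA256 value listed above should return True for an intact download.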

File details

Details for the file pixelripper-0.0.1-py3-none-any.whl.

File metadata

  • Download URL: pixelripper-0.0.1-py3-none-any.whl
  • Upload date:
  • Size: 7.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.0

File hashes

Hashes for pixelripper-0.0.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       aa7dd6b59fce552b162fc9147c1cb4d72b2897ef620f639d89330f4df8e7e010
MD5          ac5af15e3cb6694152c2b0ed0114006e
BLAKE2b-256  9f203441aaaa8dcee8feab509f4b4c486f8adb39c36b41b00c8ef479bc671250
