
Scans web pages for images



imgscraper: yet another web scraper

imgscraper is a simple library that allows you to retrieve image information from meme sites.

Installation

To install imgscraper, you can use pip:

pip install imgscraper

imgscraper supports Python 3.10 and later.

Usage

from imgscraper.scraper_constructor import create_scraper

img_scraper = create_scraper(
    website_url="https://imagocms.webludus.pl/",
    container_class="image-holder",
    pagination_class="pagination"
)

img_scraper.start_sync()

print(img_scraper.synchronization_data)

Output:

[
    Image(
        source="https://imagocms.webludus.pl/images/01/",
        url_address="https://imagocms.webludus.pl/img/01.jpg",
        title="String"
    ),
    Image(
        source="https://imagocms.webludus.pl/images/02/",
        url_address="https://imagocms.webludus.pl/img/02.jpg",
        title="String"
    )
]

Pages to scan and scraper

The pages_to_scan argument sets how many subpages are scraped, and the scraper argument selects the tool the application uses for scraping (for example, "bs4").

from imgscraper.scraper_constructor import create_scraper

img_scraper = create_scraper(
    website_url="https://imagocms.webludus.pl/",
    container_class="image-holder",
    pagination_class="pagination",
    pages_to_scan=1,
    scraper="bs4"
)

Last sync data

When starting synchronization, you can pass the image URLs (img.src) collected during the previous synchronization. As soon as the scraper encounters one of those images, it stops. The images collected up to that point remain available.

# Pass the image URLs (img.src) from the previous synchronization.
img_scraper.start_sync(
    (
        "https://imagocms.webludus.pl/img/01.jpg",
        "https://imagocms.webludus.pl/img/02.jpg",
    )
)
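
The stop-on-known-image behaviour can be pictured with a small standalone sketch. This is not imgscraper's implementation: take_until_known is a hypothetical helper, the 03.jpg and 04.jpg URLs are made up for illustration, and the sketch assumes the site lists its newest images first.

```python
from typing import Iterable, Iterator


def take_until_known(found_urls: Iterable[str], last_sync: Iterable[str]) -> Iterator[str]:
    """Yield URLs in order and stop at the first one seen in the last sync.

    Hypothetical helper for illustration only; not part of imgscraper.
    """
    known = set(last_sync)
    for url in found_urls:
        if url in known:
            break  # everything from here on was synced before
        yield url


# Assumed page order: newest images first (03 and 04 are invented examples).
found_on_site = [
    "https://imagocms.webludus.pl/img/04.jpg",
    "https://imagocms.webludus.pl/img/03.jpg",
    "https://imagocms.webludus.pl/img/02.jpg",
    "https://imagocms.webludus.pl/img/01.jpg",
]
last_sync = (
    "https://imagocms.webludus.pl/img/01.jpg",
    "https://imagocms.webludus.pl/img/02.jpg",
)

new_urls = list(take_until_known(found_on_site, last_sync))
print(new_urls)
```

Under these assumptions only the two images newer than the last synchronization are collected, which is why passing the previous results can shorten a repeated run.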

Image Object

The Image object provides the .as_dict() method to turn it into a dictionary.

img = Image(
    source="https://imagocms.webludus.pl/images/01/",
    url_address="https://imagocms.webludus.pl/img/01.jpg",
    title="String"
).as_dict()

Output:

img = {
    "source": "https://imagocms.webludus.pl/images/01/",
    "url_address": "https://imagocms.webludus.pl/img/01.jpg",
    "title": "String"
}
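
Because .as_dict() returns a plain dictionary, the results can be handed to any serializer. The sketch below writes them out as JSON. To keep it runnable without the library or network access, it defines a hypothetical stand-in Image dataclass with the three fields shown above; the real class may be implemented differently.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class Image:
    """Hypothetical stand-in for imgscraper's Image, for illustration only."""

    source: str
    url_address: str
    title: str

    def as_dict(self) -> dict[str, str]:
        # Mirrors the documented output: one key per field.
        return asdict(self)


images = [
    Image(
        source="https://imagocms.webludus.pl/images/01/",
        url_address="https://imagocms.webludus.pl/img/01.jpg",
        title="String",
    ),
    Image(
        source="https://imagocms.webludus.pl/images/02/",
        url_address="https://imagocms.webludus.pl/img/02.jpg",
        title="String",
    ),
]

# Serialize every image to JSON, then read it back as plain dictionaries.
payload = json.dumps([img.as_dict() for img in images], indent=2)
restored = json.loads(payload)
print(payload)
```

With the real library, the list comprehension would presumably run over img_scraper.synchronization_data instead of the hand-built list.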


