Scans web pages for images

imgscraper: yet another web scraper

imgscraper is a simple library that allows you to retrieve image information from meme sites.

Installation

To install imgscraper, you can use pip:

pip install imgscraper

imgscraper supports Python 3.10 and later.

Usage

from imgscraper.scraper_constructor import create_scraper

img_scraper = create_scraper(
    website_url="https://imagocms.webludus.pl/",
    container_class="image-holder",
    pagination_class="pagination"
)

img_scraper.start_sync()

print(img_scraper.synchronization_data)

Output:

[
    Image(
        source="https://imagocms.webludus.pl/images/01/",
        url_address="https://imagocms.webludus.pl/img/01.jpg",
        title="String"
    ),
    Image(
        source="https://imagocms.webludus.pl/images/02/",
        url_address="https://imagocms.webludus.pl/img/02.jpg",
        title="String"
    )
]
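To make the role of container_class more concrete, here is a minimal sketch of the general idea: collect the src of every img nested inside an element carrying a given CSS class. This is not imgscraper's implementation, which this README does not show. The ContainerImageParser class and the SAMPLE_HTML markup are hypothetical, and the sketch uses the standard library's html.parser so that it runs without extra dependencies.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not change nesting depth.
VOID_TAGS = {
    "area", "base", "br", "col", "embed", "hr", "img",
    "input", "link", "meta", "source", "track", "wbr",
}


class ContainerImageParser(HTMLParser):
    """Hypothetical helper: gather <img src> values found inside a container class."""

    def __init__(self, container_class):
        super().__init__()
        self.container_class = container_class
        self.depth = 0  # greater than 0 while inside a matching container
        self.sources = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "img":
            # Only keep images that sit inside a matching container.
            if self.depth > 0 and attr_map.get("src"):
                self.sources.append(attr_map["src"])
            return
        if tag in VOID_TAGS:
            return
        classes = (attr_map.get("class") or "").split()
        if self.depth > 0:
            self.depth += 1  # nested element inside the container
        elif self.container_class in classes:
            self.depth = 1  # entering a matching container

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1


# Made-up markup for illustration only.
SAMPLE_HTML = """
<div class="image-holder"><a href="/images/01/"><img src="/img/01.jpg" alt="First"></a></div>
<div class="sidebar"><img src="/img/ad.jpg"></div>
<div class="image-holder featured"><img src="/img/02.jpg"></div>
"""

parser = ContainerImageParser("image-holder")
parser.feed(SAMPLE_HTML)
print(parser.sources)  # the sidebar image is skipped
```

The depth counter is what keeps images outside the container (such as the sidebar image here) out of the result.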

Pages to scan and scraper

You can limit how many subpages are scraped with pages_to_scan and choose the scraping tool with scraper:

from imgscraper.scraper_constructor import create_scraper

img_scraper = create_scraper(
    website_url="https://imagocms.webludus.pl/",
    container_class="image-holder",
    pagination_class="pagination",
    pages_to_scan=1,
    scraper="bs4"
)
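The effect of a page limit such as pages_to_scan can be sketched as a loop that follows "next page" links until the limit is reached. This is an illustration under assumptions, not the library's code: scan_pages, fake_fetch, and FAKE_SITE are hypothetical names, and treating None as "no limit" is an assumption rather than documented imgscraper behavior.

```python
def scan_pages(first_url, fetch_page, pages_to_scan=None):
    """Follow pagination from first_url, visiting at most pages_to_scan pages.

    fetch_page(url) must return (items, next_url), with next_url set to None
    on the last page. pages_to_scan=None means "no limit" in this sketch.
    """
    collected = []
    url = first_url
    scanned = 0
    while url is not None and (pages_to_scan is None or scanned < pages_to_scan):
        items, url = fetch_page(url)
        collected.extend(items)
        scanned += 1
    return collected


# Made-up three-page site: each entry maps a page to (items, next page).
FAKE_SITE = {
    "page-1": (["01.jpg", "02.jpg"], "page-2"),
    "page-2": (["03.jpg"], "page-3"),
    "page-3": (["04.jpg"], None),
}


def fake_fetch(url):
    return FAKE_SITE[url]


print(scan_pages("page-1", fake_fetch, pages_to_scan=1))  # first page only
print(scan_pages("page-1", fake_fetch))                   # every page
```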

Last sync data

When starting synchronization, you can pass the image URLs (img.src) collected during the previous run. As soon as the scraper encounters one of these images, it stops. The images collected up to that point remain available.

img_scraper.start_sync(
    (
        "https://imagocms.webludus.pl/img/01.jpg",
        "https://imagocms.webludus.pl/img/02.jpg",
    )
)
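The stop-at-first-known-image behavior described above can be sketched in a few lines. The function name sync_until_known and the sample URLs are hypothetical, and the sketch assumes the scraper sees images newest first. It only illustrates the early-exit logic, not how imgscraper implements it.

```python
def sync_until_known(scraped_urls, last_sync_urls=()):
    """Return the URLs seen before the first one that was already synced."""
    known = set(last_sync_urls)
    fresh = []
    for url in scraped_urls:
        if url in known:
            break  # reached data from the previous run, so stop here
        fresh.append(url)
    return fresh


# Made-up scrape order, newest first.
scraped = [
    "https://imagocms.webludus.pl/img/04.jpg",
    "https://imagocms.webludus.pl/img/03.jpg",
    "https://imagocms.webludus.pl/img/02.jpg",
    "https://imagocms.webludus.pl/img/01.jpg",
]
previous = (
    "https://imagocms.webludus.pl/img/02.jpg",
    "https://imagocms.webludus.pl/img/01.jpg",
)

print(sync_until_known(scraped, previous))  # only 04.jpg and 03.jpg are new
```

Stopping at the first match avoids re-reading pages whose content was already collected in an earlier run.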

Image Object

The Image object provides the .as_dict() method to turn it into a dictionary.

img = Image(
    source="https://imagocms.webludus.pl/images/01/",
    url_address="https://imagocms.webludus.pl/img/01.jpg",
    title="String"
).as_dict()

Output:

img = {
    "source": "https://imagocms.webludus.pl/images/01/",
    "url_address": "https://imagocms.webludus.pl/img/01.jpg",
    "title": "String"
}
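Because .as_dict() returns a plain dictionary, results can be serialized with standard tools such as json. The sketch below uses a stand-in dataclass, ImageRecord, that mirrors the three fields shown above. It is a hypothetical substitute so the example runs without imgscraper installed, not the library's Image class.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class ImageRecord:
    """Stand-in with the same fields as the Image object shown above."""

    source: str
    url_address: str
    title: str

    def as_dict(self):
        return asdict(self)


records = [
    ImageRecord(
        source="https://imagocms.webludus.pl/images/01/",
        url_address="https://imagocms.webludus.pl/img/01.jpg",
        title="String",
    ),
]

# Serialize every record to JSON, for example to store it between runs.
payload = json.dumps([record.as_dict() for record in records], indent=4)
print(payload)

# The stored url_address values can seed the next synchronization.
last_sync = tuple(item["url_address"] for item in json.loads(payload))
print(last_sync)
```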

Download files

Source Distribution

imgscraper-0.2.0.tar.gz (10.8 kB)

Built Distribution

imgscraper-0.2.0-py3-none-any.whl (10.4 kB, Python 3)
