Scans web pages for images
imgscraper: yet another web scraper
imgscraper is a simple library that allows you to retrieve image information from meme sites.
Installation
To install imgscraper, you can use pip:
pip install imgscraper
imgscraper supports Python 3.10+.
Usage
from imgscraper.scraper_constructor import create_scraper

img_scraper = create_scraper(
    website_url="https://imagocms.webludus.pl/",
    container_class="image-holder",
    pagination_class="pagination"
)
img_scraper.start_sync()
print(img_scraper.synchronization_data)
Output:
[
    Image(
        source="https://imagocms.webludus.pl/images/01/",
        url_address="https://imagocms.webludus.pl/img/01.jpg",
        title="String"
    ),
    Image(
        source="https://imagocms.webludus.pl/images/02/",
        url_address="https://imagocms.webludus.pl/img/02.jpg",
        title="String"
    )
]
Pages to scan and scraper
You can limit how many subpages are scraped with pages_to_scan and choose which scraping tool the application uses with scraper ("bs4" in the example).
from imgscraper.scraper_constructor import create_scraper

img_scraper = create_scraper(
    website_url="https://imagocms.webludus.pl/",
    container_class="image-holder",
    pagination_class="pagination",
    pages_to_scan=1,
    scraper="bs4"
)
Last sync data
When starting the synchronization process, you can provide the image URLs (img.src values) from the last synchronization. If the application encounters one of the provided images, the process stops. Images collected before that point remain available.
img_scraper.start_sync(
    (
        "https://imagocms.webludus.pl/img/01.jpg",
        "https://imagocms.webludus.pl/img/02.jpg",
    )
)
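The stop-on-known-image behaviour described above can be pictured with a small standalone sketch. This is not imgscraper's implementation: the Image dataclass here is a stand-in with the three fields shown in the output earlier, collect_until_known is a hypothetical helper, and the 03 entry is made-up sample data.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Image:
    """Stand-in with the same three fields as the Image shown in the output."""
    source: str
    url_address: str
    title: str


def collect_until_known(scraped: Iterable[Image], last_sync: Iterable[str]) -> list[Image]:
    """Return images in scrape order, stopping at the first already-synced URL."""
    known = set(last_sync)
    fresh: list[Image] = []
    for image in scraped:
        if image.url_address in known:
            break  # reached the point covered by the previous sync
        fresh.append(image)
    return fresh


base = "https://imagocms.webludus.pl"
# Hypothetical scrape result, newest first; 03 is the only unseen image.
scraped = [
    Image(f"{base}/images/03/", f"{base}/img/03.jpg", "String"),
    Image(f"{base}/images/02/", f"{base}/img/02.jpg", "String"),
    Image(f"{base}/images/01/", f"{base}/img/01.jpg", "String"),
]
fresh = collect_until_known(scraped, (f"{base}/img/01.jpg", f"{base}/img/02.jpg"))
print([image.url_address for image in fresh])
```

With an empty last-sync tuple, the helper returns every scraped image, which matches calling start_sync() without arguments in spirit.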
Image Object
The Image object provides the .as_dict() method to turn it into a dictionary.
img = Image(
    source="https://imagocms.webludus.pl/images/01/",
    url_address="https://imagocms.webludus.pl/img/01.jpg",
    title="String"
).as_dict()
Output:
img = {
    "source": "https://imagocms.webludus.pl/images/01/",
    "url_address": "https://imagocms.webludus.pl/img/01.jpg",
    "title": "String"
}
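Because as_dict() returns a plain dictionary, the results can be stored as JSON and the url_address values read back for a later sync. The sketch below is one possible way to do that, not part of imgscraper: Image is a stand-in dataclass that mimics the fields and the as_dict() method shown above, and dump_images and known_urls are hypothetical helper names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Image:
    """Stand-in for the library's Image; only the fields shown above are assumed."""
    source: str
    url_address: str
    title: str

    def as_dict(self) -> dict[str, str]:
        return asdict(self)


def dump_images(images: list[Image]) -> str:
    """Serialize images to a JSON string via as_dict()."""
    return json.dumps([image.as_dict() for image in images], indent=2)


def known_urls(payload: str) -> tuple[str, ...]:
    """Recover the url_address values from a string produced by dump_images."""
    return tuple(item["url_address"] for item in json.loads(payload))


images = [
    Image(
        source="https://imagocms.webludus.pl/images/01/",
        url_address="https://imagocms.webludus.pl/img/01.jpg",
        title="String",
    )
]
payload = dump_images(images)
print(known_urls(payload))
```

The tuple returned by known_urls has the same shape as the argument passed to start_sync in the "Last sync data" section.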