Scans web pages for images
imgscraper: yet another web scraper
imgscraper is a simple library for retrieving image information (source page, image URL, and title) from meme sites.
Installation
To install imgscraper, you can use pip:
pip install imgscraper
imgscraper supports Python 3.10+.
Usage
from imgscraper.scraper_constructor import create_scraper

img_scraper = create_scraper(
    website_url="https://imagocms.webludus.pl/",
    container_class="image-holder",
    pagination_class="pagination"
)
img_scraper.start_sync()
print(img_scraper.synchronization_data)
Output:
[
    Image(
        source="https://imagocms.webludus.pl/images/01/",
        url_address="https://imagocms.webludus.pl/img/01.jpg",
        title="String"
    ),
    Image(
        source="https://imagocms.webludus.pl/images/02/",
        url_address="https://imagocms.webludus.pl/img/02.jpg",
        title="String"
    )
]
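How imgscraper locates images internally is not described here. As a rough illustration of what selecting images by container_class means, the sketch below collects the src of every img nested inside an element carrying a given CSS class, using only the standard library. The class ContainerImageParser, the function find_container_images, and the sample HTML are all hypothetical and are not part of imgscraper.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect nesting depth.
VOID_TAGS = {"img", "br", "hr", "input", "meta", "link"}


class ContainerImageParser(HTMLParser):
    """Hypothetical helper: gathers img src values found inside a container class."""

    def __init__(self, container_class):
        super().__init__()
        self.container_class = container_class
        self.depth = 0      # nesting depth inside a matching container (0 = outside)
        self.sources = []   # collected src values, in document order

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "img":
            # Only keep images that sit inside a matching container.
            if self.depth > 0 and attrs.get("src"):
                self.sources.append(attrs["src"])
            return
        if tag in VOID_TAGS:
            return
        classes = (attrs.get("class") or "").split()
        if self.depth > 0:
            self.depth += 1          # nested element inside the container
        elif self.container_class in classes:
            self.depth = 1           # entering a matching container

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.depth > 0:
            self.depth -= 1          # leaving a nested element or the container


def find_container_images(html, container_class):
    parser = ContainerImageParser(container_class)
    parser.feed(html)
    return parser.sources


# Made-up sample markup for illustration only.
sample_html = """
<div class="image-holder"><a href="/images/01/"><img src="/img/01.jpg"></a></div>
<div class="sidebar"><img src="/img/ad.jpg"></div>
<div class="image-holder"><img src="/img/02.jpg"></div>
"""
found_sources = find_container_images(sample_html, "image-holder")
print(found_sources)
```

The image in the sidebar element is skipped because it is not inside an element with the requested class.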
Pages to scan and scraper
Use pages_to_scan to set how many subpages are scraped, and scraper to choose the tool the application uses for scraping (for example, "bs4").
from imgscraper.scraper_constructor import create_scraper

img_scraper = create_scraper(
    website_url="https://imagocms.webludus.pl/",
    container_class="image-holder",
    pagination_class="pagination",
    pages_to_scan=1,
    scraper="bs4"
)
Last sync data
When starting the synchronization process, you can pass in the image URLs (the img.src values) from the last synchronization. If the scraper encounters one of the provided images, the process terminates. Images synced before that point remain available.
img_scraper.start_sync(
    (
        "https://imagocms.webludus.pl/img/01.jpg",
        "https://imagocms.webludus.pl/img/02.jpg",
    )
)
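The stop-on-known-image behaviour described above can be pictured with a small standalone sketch. It does not use imgscraper itself: the function collect_new_images and the 03.jpg URL are hypothetical, and the sketch assumes images are encountered newest first.

```python
from typing import Iterable


def collect_new_images(found_urls: Iterable[str], last_sync: Iterable[str] = ()) -> list[str]:
    """Hypothetical helper: keep URLs until one from the last sync is seen."""
    known = set(last_sync)
    new_urls = []
    for url in found_urls:
        if url in known:
            break  # a previously synced image was reached, so stop scanning
        new_urls.append(url)
    return new_urls


# Assumed scan order: newest image first (03.jpg is a made-up example).
found = [
    "https://imagocms.webludus.pl/img/03.jpg",
    "https://imagocms.webludus.pl/img/02.jpg",
    "https://imagocms.webludus.pl/img/01.jpg",
]
last_sync = (
    "https://imagocms.webludus.pl/img/01.jpg",
    "https://imagocms.webludus.pl/img/02.jpg",
)
new_images = collect_new_images(found, last_sync)
print(new_images)
```

Only the images seen before the first known URL are kept, which is why passing the last synchronization data avoids rescanning older pages.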
Image Object
The Image object provides the .as_dict() method to turn it into a dictionary.
img = Image(
    source="https://imagocms.webludus.pl/images/01/",
    url_address="https://imagocms.webludus.pl/img/01.jpg",
    title="String"
).as_dict()
Output:
img = {
    "source": "https://imagocms.webludus.pl/images/01/",
    "url_address": "https://imagocms.webludus.pl/img/01.jpg",
    "title": "String"
}
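Because .as_dict() returns a plain dictionary, the results can be serialized with the standard json module. The sketch below uses a stand-in dataclass, StandInImage, that only mirrors the three documented fields; it is an assumption for illustration and not the library's Image class.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class StandInImage:
    """Stand-in with the same three fields as the documented Image object."""
    source: str
    url_address: str
    title: str

    def as_dict(self) -> dict:
        return asdict(self)


images = [
    StandInImage(
        source="https://imagocms.webludus.pl/images/01/",
        url_address="https://imagocms.webludus.pl/img/01.jpg",
        title="String",
    ),
    StandInImage(
        source="https://imagocms.webludus.pl/images/02/",
        url_address="https://imagocms.webludus.pl/img/02.jpg",
        title="String",
    ),
]

# Convert each object to a dictionary, then dump the list as JSON text.
payload = json.dumps([image.as_dict() for image in images], indent=2)
print(payload)
```

With the real library, the same list comprehension would presumably be applied to the objects in synchronization_data.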
File details
Details for the file imgscraper-0.3.0.tar.gz.
File metadata
- Download URL: imgscraper-0.3.0.tar.gz
- Upload date:
- Size: 10.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1676eb024116d67d96a50678c3d9353d2da258054f415be4292a0c10435008db |
| MD5 | 5bc556c076a7ed0ed778ae5f55f85958 |
| BLAKE2b-256 | c89e4f2796000510e0198015ccbb24ff6c901d4abab1e6a04c3e274a77f1c76d |
File details
Details for the file imgscraper-0.3.0-py3-none-any.whl.
File metadata
- Download URL: imgscraper-0.3.0-py3-none-any.whl
- Upload date:
- Size: 10.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ede39317f923d77b5e19c3b70b4e830d1d60c4dc1c5ca665576e6398d3277f75 |
| MD5 | 401097150ae57328153ae5ed2c2a4077 |
| BLAKE2b-256 | a606998e1c5ee4023afecdd16bc3862cf9fca6f9db955db44ae29244d85aba91 |