The ecoindex_scraper module provides a way to scrape data from a given website while simulating a real web browser.
ECOINDEX SCRAPER PYTHON
This module provides a simple interface to get the Ecoindex of a given webpage, using the ecoindex-python module.
Requirements
- Python ^3.8 with pip
- Google Chrome installed on your computer
Install
```
pip install ecoindex-scraper
```
Use
Get a page analysis
You can run a page analysis by calling the function `get_page_analysis()`:

```python
(function) get_page_analysis: (url: HttpUrl, window_size: WindowSize | None = WindowSize(width=1920, height=1080), wait_before_scroll: int | None = 1, wait_after_scroll: int | None = 1) -> Coroutine[Any, Any, Result]
```
Example:

```python
import asyncio
from pprint import pprint

from ecoindex_scraper.scrap import EcoindexScraper

pprint(
    asyncio.run(
        EcoindexScraper(url="http://ecoindex.fr")
        .init_chromedriver()
        .get_page_analysis()
    )
)
```
Result example:

```python
Result(width=1920, height=1080, url=HttpUrl('http://ecoindex.fr'), size=549.253, nodes=52, requests=12, grade='A', score=90.0, ges=1.2, water=1.8, ecoindex_version='5.0.0', date=datetime.datetime(2022, 9, 12, 10, 54, 46, 773443), page_type=None)
```
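The returned object exposes each metric as an attribute, so it can be consumed directly in code. Below is a hypothetical sketch using a stand-in dataclass with the field names from the Result example above (the real object comes from `get_page_analysis()`):

```python
from dataclasses import dataclass

# Stand-in for a few of the fields shown in the Result example above;
# the real Result object is returned by get_page_analysis().
@dataclass
class PageMetrics:
    url: str
    grade: str
    score: float
    requests: int
    size: float  # transferred size in KB

def summarize(result) -> str:
    """One-line summary of an analysis; works on any object exposing
    these attribute names."""
    return (
        f"{result.url}: grade {result.grade} (score {result.score}, "
        f"{result.requests} requests, {result.size} KB)"
    )

print(summarize(PageMetrics("http://ecoindex.fr", "A", 90.0, 12, 549.253)))
# → http://ecoindex.fr: grade A (score 90.0, 12 requests, 549.253 KB)
```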
Default behaviour: by default, the page analysis simulates:
- Window size of 1920x1080 pixels (can be set with parameter `window_size`)
- Wait for 1 second when the page is loaded (can be set with parameter `wait_before_scroll`)
- Scroll to the bottom of the page (if possible)
- Wait for 1 second after having scrolled to the bottom of the page (can be set with parameter `wait_after_scroll`)
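Each of these defaults can be overridden. The sketch below assumes the parameter names from the `get_page_analysis()` signature can be passed through the `EcoindexScraper` constructor, in the style of the examples in this page; the viewport values are illustrative only:

```python
# A sketch of overriding the defaults listed above. Parameter names are
# taken from the get_page_analysis signature; passing them to the
# EcoindexScraper constructor is an assumption based on the examples.
custom_settings = dict(
    window_size={"width": 375, "height": 667},  # simulate a phone viewport
    wait_before_scroll=3,  # seconds to wait once the page has loaded
    wait_after_scroll=3,   # seconds to wait at the bottom of the page
)

def run_custom_analysis(url: str = "http://ecoindex.fr"):
    """Run an analysis with the settings above (requires Google Chrome)."""
    import asyncio
    from ecoindex_scraper.scrap import EcoindexScraper

    return asyncio.run(
        EcoindexScraper(url=url, **custom_settings)
        .init_chromedriver()
        .get_page_analysis()
    )
```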
Get a page analysis and generate a screenshot
It is possible to generate a screenshot of the analyzed page by adding a `ScreenShot` property to the `EcoindexScraper` object.
You have to define an id (it can be any string, but it is recommended to use a unique id) and a path to the screenshot folder (if the folder does not exist, it will be created).
```python
import asyncio
from pprint import pprint
from uuid import uuid1

from ecoindex_scraper.models import ScreenShot
from ecoindex_scraper.scrap import EcoindexScraper

pprint(
    asyncio.run(
        EcoindexScraper(
            url="http://www.ecoindex.fr/",
            screenshot=ScreenShot(id=str(uuid1()), folder="./screenshots"),
        )
        .init_chromedriver()
        .get_page_analysis()
    )
)
```
Contribute
You need Poetry to install and manage dependencies. Once Poetry is installed, run:

```
poetry install
```
Tests
```
poetry run pytest
```