The ecoindex_scraper module provides a way to scrape data from a given website while simulating a real web browser.
ECOINDEX SCRAPER PYTHON
This module provides a simple interface to get the Ecoindex of a given webpage using the ecoindex-python module.
Requirements
- Python ^3.10 with pip
- Google Chrome installed on your computer
Install
pip install ecoindex-scraper
Use
Get a page analysis
You can run a page analysis by calling the function get_page_analysis():
(function) get_page_analysis: (url: HttpUrl, window_size: WindowSize | None = WindowSize(width=1920, height=1080), wait_before_scroll: int | None = 1, wait_after_scroll: int | None = 1) -> Coroutine[Any, Any, Result]
Example:
import asyncio
from pprint import pprint

from ecoindex_scraper.scrap import EcoindexScraper

pprint(
    asyncio.run(
        EcoindexScraper(url="http://ecoindex.fr")
        .init_chromedriver()
        .get_page_analysis()
    )
)
Result example:
Result(width=1920, height=1080, url=HttpUrl('http://ecoindex.fr', ), size=549.253, nodes=52, requests=12, grade='A', score=90.0, ges=1.2, water=1.8, ecoindex_version='5.0.0', date=datetime.datetime(2022, 9, 12, 10, 54, 46, 773443), page_type=None)
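The fields of the result can be read like ordinary attributes. The sketch below is illustrative only: ResultStandIn is a hypothetical dataclass that copies a subset of the fields from the output above so that the snippet runs without Chrome. It is not the library's own Result class. It shows one possible way to flatten such a record to JSON, for example for logging.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime


@dataclass
class ResultStandIn:
    """Hypothetical stand-in mirroring some fields of the Result shown above."""

    url: str
    size: float
    nodes: int
    requests: int
    grade: str
    score: float
    ges: float
    water: float
    date: datetime


def to_json(result: ResultStandIn) -> str:
    # datetime is not JSON serializable by default, so emit it in ISO 8601 form
    record = asdict(result)
    record["date"] = result.date.isoformat()
    return json.dumps(record, sort_keys=True)


# Values copied from the result example above
sample = ResultStandIn(
    url="http://ecoindex.fr", size=549.253, nodes=52, requests=12,
    grade="A", score=90.0, ges=1.2, water=1.8,
    date=datetime(2022, 9, 12, 10, 54, 46, 773443),
)
print(to_json(sample))
```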
Default behaviour: By default, the page analysis simulates:
- Window size of 1920x1080 pixels (can be set with parameter window_size)
- Wait for 1 second when page is loaded (can be set with parameter wait_before_scroll)
- Scroll to the bottom of the page (if it is possible)
- Wait for 1 second after having scrolled to the bottom of the page (can be set with parameter wait_after_scroll)
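The sequence above can be pictured with a small asyncio simulation. This is a sketch under stated assumptions, not the library's internals: FakePage and simulate_analysis are hypothetical names, no browser is driven, and the ordering of the window resize relative to the page load is assumed. Only the parameter names and default values come from the list above.

```python
import asyncio
from dataclasses import dataclass, field


@dataclass
class FakePage:
    """Hypothetical stand-in for a browser page; it only records events."""

    events: list = field(default_factory=list)

    def set_window_size(self, width, height):
        self.events.append(f"resize {width}x{height}")

    def load(self, url):
        self.events.append(f"load {url}")

    def scroll_to_bottom(self):
        self.events.append("scroll to bottom")


async def simulate_analysis(page, url, width=1920, height=1080,
                            wait_before_scroll=1, wait_after_scroll=1):
    # Assumed order: size the window, load, wait, scroll, wait
    page.set_window_size(width, height)
    page.load(url)
    await asyncio.sleep(wait_before_scroll)
    page.events.append(f"waited {wait_before_scroll}s")
    page.scroll_to_bottom()
    await asyncio.sleep(wait_after_scroll)
    page.events.append(f"waited {wait_after_scroll}s")
    return page.events


# Zero-second waits keep the demonstration instant
events = asyncio.run(
    simulate_analysis(FakePage(), "http://ecoindex.fr",
                      wait_before_scroll=0, wait_after_scroll=0)
)
print(events)
```

Longer waits give slow pages more time to finish loading lazy content before and after the scroll, at the cost of a slower analysis.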
Get a page analysis and generate a screenshot
It is possible to generate a screenshot of the analyzed page by adding a ScreenShot property to the EcoindexScraper object.
You have to define an id (any string, although a unique id is recommended) and the folder in which the screenshot file is stored (if the folder does not exist, it will be created).
import asyncio
from pprint import pprint
from uuid import uuid1

from ecoindex_scraper.models import ScreenShot
from ecoindex_scraper.scrap import EcoindexScraper

pprint(
    asyncio.run(
        EcoindexScraper(
            url="http://www.ecoindex.fr/",
            screenshot=ScreenShot(id=str(uuid1()), folder="./screenshots"),
        )
        .init_chromedriver()
        .get_page_analysis()
    )
)
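The id and folder handling described above can be sketched with the standard library alone. prepare_screenshot_path is a hypothetical helper, and the "<folder>/<id>.png" file name is an assumption made for illustration; this README does not state how the library names the file.

```python
from pathlib import Path
from tempfile import TemporaryDirectory
from uuid import uuid1


def prepare_screenshot_path(folder: str, screenshot_id: str) -> Path:
    # Create the folder (and any missing parents) if it does not exist yet
    target = Path(folder)
    target.mkdir(parents=True, exist_ok=True)
    # Assumed naming scheme, for illustration only
    return target / f"{screenshot_id}.png"


with TemporaryDirectory() as tmp:
    # Two analyses with fresh uuid1 ids get two distinct files in one folder
    first = prepare_screenshot_path(f"{tmp}/screenshots", str(uuid1()))
    second = prepare_screenshot_path(f"{tmp}/screenshots", str(uuid1()))
    folder_created = first.parent.is_dir()
    ids_differ = first.name != second.name
    print(folder_created, ids_differ)
```

Using a unique id per run, as recommended, avoids a later analysis overwriting the screenshot of an earlier one.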
Contribute
You need poetry to install and manage dependencies. Once poetry is installed, run:
poetry install
Tests
poetry run pytest
Disclaimer
The LCA values used by ecoindex_scraper to evaluate environmental impacts are not under free license - ©Frédéric Bordage. Please also refer to the mentions provided in the code files for specifics on the IP regime.
Hashes for ecoindex_scraper-2.7.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 890dfeba66c710b17deb508e072dbc25fd1401460062085c2c86769b73c2ea73
MD5 | 09b0c9f009ad8b8deb9e356f4f2a151a
BLAKE2b-256 | 8545e19aa8edf626a391f9816ac74ff031e9ffc85fea712c9212ff59d5e11fde