Selenium-based web scraper to extract data from E-REDES website and load it into database storage.
E-REDES Scraper
Description
This is a web scraper that collects data from the E-REDES website and stores it in a database. Because the data is not exposed through any other interface, scraping is the only available way to collect it. A high-level overview of the process:
- The scraper collects the data from the E-REDES website.
- A file with the energy consumption readings is downloaded.
- The file is parsed and the data is compared to the data in the database to determine if there are new readings.
- If there are new readings, they are stored in the database.
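The comparison step above can be sketched as follows. This is only an illustration, not the package's actual implementation: the function name `find_new_readings`, the `(timestamp, value)` tuple layout, and the sample values are all hypothetical.

```python
from datetime import datetime


def find_new_readings(parsed, stored_timestamps):
    """Return the parsed readings whose timestamp is not yet stored.

    `parsed` is a list of (timestamp, value) tuples read from the
    downloaded file; `stored_timestamps` are the timestamps already
    present in the database. Both shapes are assumptions made for
    this sketch.
    """
    known = set(stored_timestamps)
    return [reading for reading in parsed if reading[0] not in known]


# Made-up sample data: three readings in the file, two already stored.
parsed = [
    (datetime(2023, 10, 1, 0, 0), 0.12),
    (datetime(2023, 10, 1, 0, 15), 0.10),
    (datetime(2023, 10, 1, 0, 30), 0.11),
]
stored = [datetime(2023, 10, 1, 0, 0), datetime(2023, 10, 1, 0, 15)]

# Only the 00:30 reading is new and would be written to the database.
new_readings = find_new_readings(parsed, stored)
```

Keeping the known timestamps in a set makes each lookup constant-time, which matters little for one month of readings but keeps the sketch simple.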
This package supports the E-REDES website as it was available at the time of writing (23/10/2023). The entry point for the scraper is the page https://balcaodigital.e-redes.pt/login.
Installation
The package can be installed using pip:
pip install eredesscraper
Configuration
Usage is based on a YAML configuration file.
A config.yml file is used to specify the credentials for the E-REDES website and, optionally, the database connection. Currently, only InfluxDB is supported as a database sink.
Template config.yml:
eredes:
  # eredes credentials
  nif: <my-eredes-nif>
  pwd: <my-eredes-password>
  # CPE to monitor. e.g. PT00############04TW (where # is a digit). CPE can be found in your bill details
  cpe: PT00############04TW
influxdb:
  # url to InfluxDB. e.g. http://localhost or https://influxdb.my-domain.com
  host: http://localhost
  # default port is 8086
  port: 8086
  bucket: <my-influx-bucket>
  org: <my-influx-org>
  # access token with write access
  token: <token>
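Before running the scraper it can help to check that the configuration has every key from the template. The following is a minimal sketch that works on the configuration once it has been loaded into a dictionary. The helpers `missing_keys` and `influx_url` are hypothetical and not part of eredesscraper, and treating every template key as required is an assumption.

```python
# Keys taken from the template above; treating all of them as required
# is an assumption of this sketch.
REQUIRED_KEYS = {
    "eredes": {"nif", "pwd", "cpe"},
    "influxdb": {"host", "port", "bucket", "org", "token"},
}


def missing_keys(config, require_influxdb=True):
    """Return dotted names of template keys absent from `config`."""
    missing = []
    for section, keys in REQUIRED_KEYS.items():
        # The database section is optional, so it can be skipped.
        if section == "influxdb" and not require_influxdb:
            continue
        present = set(config.get(section) or {})
        missing += [f"{section}.{key}" for key in sorted(keys - present)]
    return missing


def influx_url(config):
    """Join host and port into a single URL string."""
    db = config["influxdb"]
    return f"{db['host'].rstrip('/')}:{db['port']}"


# Placeholder values only; the token is left out on purpose.
config = {
    "eredes": {"nif": "000000000", "pwd": "secret", "cpe": "PT0000000000000000TW"},
    "influxdb": {"host": "http://localhost", "port": 8086,
                 "bucket": "energy", "org": "home"},
}

problems = missing_keys(config)  # the absent token is reported
url = influx_url(config)         # host and port joined with ":"
```

Reporting all missing keys at once, rather than failing on the first, lets a user fix the file in a single pass.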
Usage
Python script:
from pathlib import Path

from eredesscraper.workflows import switchboard

# Run the "current_month" workflow and load the readings into InfluxDB
switchboard(name="current_month",
            db="influxdb",
            config_path=Path("./config.yml"))
CLI:
ers config load "/path/to/config.yml"
ers run
Limitations
Available workflows:
- current_month: Collects the current month's consumption data.
- previous_month: Collects the previous month's consumption data.
- select_month: Collects the consumption data from an arbitrary month specified by the user.
Available databases:
- influxdb: Loads the data into an InfluxDB database. (https://docs.influxdata.com/influxdb/v2/get-started/)
Roadmap
- Add workflow for retrieving previous month data.
- Add workflow for retrieving data from an arbitrary month.
- Build CLI.
- Containerize app.
- Documentation.
- Add CI/CD.
- Add logging.
- Add tests.
- Add runtime support for multiple CPEs.
Contributing
Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.
License
See LICENSE file.