Selenium-based web scraper to extract data from E-REDES website and load it into database storage.

E-REDES Scraper

Description

This is a web scraper that collects data from the E-REDES website and can upload it to a database. Since the data is not exposed through any public interface, this scraper was developed as a way to collect it programmatically. A high-level overview of the process:

  1. The scraper collects the data from the E-REDES website.
  2. A file with the energy consumption readings is downloaded.
  3. [ Optional ] The file is parsed and the data is uploaded to the selected database.
  4. [ Optional ] A feature supporting only the insertion of "deltas" is available.
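The "deltas" idea in step 4 can be illustrated with a minimal sketch. How the package computes deltas internally is not described here, so this assumes a delta is any reading newer than the latest timestamp already stored. The function name `select_deltas` and the `(timestamp, value)` tuple layout are hypothetical, chosen only for illustration.

```python
from datetime import datetime


def select_deltas(readings, last_stored):
    """Return only the readings newer than the newest timestamp already stored.

    `readings` is a list of (timestamp, value) tuples (hypothetical layout).
    `last_stored` is the newest timestamp in the database, or None if empty.
    """
    if last_stored is None:
        # Nothing stored yet: every reading counts as new.
        return list(readings)
    return [(ts, value) for ts, value in readings if ts > last_stored]


# Illustrative data only, not real E-REDES readings.
readings = [
    (datetime(2023, 10, 1, 0, 15), 0.12),
    (datetime(2023, 10, 1, 0, 30), 0.10),
    (datetime(2023, 10, 1, 0, 45), 0.14),
]
new_rows = select_deltas(readings, last_stored=datetime(2023, 10, 1, 0, 15))
print(len(new_rows))  # 2
```

Filtering on the timestamp keeps repeated runs over the same month from writing the same points again.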

This package supports the E-REDES website as available at the time of writing (23/10/2023). The entry point for the scraper is the page https://balcaodigital.e-redes.pt/login.

Installation

The package can be installed using pip:

pip install eredesscraper

Configuration

Usage is based on a YAML configuration file.
config.yml holds the credentials for the E-REDES website and the database connection. Currently, only InfluxDB is supported as a database sink.

Template config.yml:

eredes:
  # eredes credentials
  nif: <my-eredes-nif>
  pwd: <my-eredes-password>
  # CPE to monitor. e.g. PT00############04TW (where # is a digit). CPE can be found in your bill details
  cpe: PT00############04TW


influxdb:
  # url to InfluxDB.  e.g. http://localhost or https://influxdb.my-domain.com
  host: http://localhost
  # default port is 8086
  port: 8086
  bucket: <my-influx-bucket>
  org: <my-influx-org>
  # access token with write access
  token: <token>
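Before running the scraper, it can help to check that the configuration has every key shown in the template. The sketch below is not part of the package: `check_config` is a hypothetical helper that works on an already-parsed dictionary (for example the result of PyYAML's `yaml.safe_load`), and the CPE pattern is only inferred from the example in the template (`PT`, 16 digits, 2 letters).

```python
import re

# Keys taken from the template config.yml above.
REQUIRED = {
    "eredes": ("nif", "pwd", "cpe"),
    "influxdb": ("host", "port", "bucket", "org", "token"),
}

# Assumption: CPE shape inferred from the example PT00############04TW.
CPE_PATTERN = re.compile(r"^PT\d{16}[A-Z]{2}$")


def check_config(cfg):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []
    for section, keys in REQUIRED.items():
        block = cfg.get(section) or {}
        for key in keys:
            if block.get(key) in (None, ""):
                problems.append(f"missing {section}.{key}")
    cpe = str((cfg.get("eredes") or {}).get("cpe", ""))
    if cpe and not CPE_PATTERN.match(cpe):
        problems.append("eredes.cpe does not look like PT + 16 digits + 2 letters")
    return problems


# Placeholder values for illustration only.
cfg = {
    "eredes": {"nif": "123456789", "pwd": "secret", "cpe": "PT0002000012345678TW"},
    "influxdb": {
        "host": "http://localhost",
        "port": 8086,
        "bucket": "energy",
        "org": "home",
        "token": "my-token",
    },
}
print(check_config(cfg))  # []
```

Returning a list of problems instead of raising on the first one lets every missing key be reported in a single pass.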

Usage

CLI:

ers config load "/path/to/config.yml"

# get current month readings
ers run -d influxdb

# get only the deltas from the previous month's readings
ers run -w previous -d influxdb --delta

# get readings from May
ers run -w select -d influxdb -m 5

Docker:

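# get current month readings
# mount the local config.yml into the container (a bind mount needs an absolute host path)
# image: latest `ers` image, followed by the `ers` command to call
docker run --rm -v "$(pwd)/config.yml:/config.yml" \
  ghcr.io/rf-santos/eredesscraper:latest \
  ers run -w current -d influxdb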

Python:

from eredesscraper.workflows import switchboard
from pathlib import Path

# get deltas from current month readings
switchboard(name="current",
            db="influxdb",
            config_path=Path("./config.yml"),
            delta=True)

# get readings from May 2023
switchboard(name="select",
            db="influxdb",
            config_path=Path("./config.yml"),
            month=5,
            year=2023)
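The `select` call above can be repeated to backfill several months. The following is a minimal sketch, not a feature of the package: `backfill` and `dry_run` are hypothetical helpers, and the runner is passed in so the loop can be tried without credentials. It assumes `switchboard` accepts the keyword arguments shown in the examples above.

```python
from pathlib import Path


def backfill(runner, year, months, config_path=Path("./config.yml"), db="influxdb"):
    """Call `runner` once per month using the `select` workflow arguments."""
    done = []
    for month in months:
        runner(name="select", db=db, config_path=config_path, month=month, year=year)
        done.append((year, month))
    return done


def dry_run(**kwargs):
    # Stand-in for `switchboard` that only prints what would be requested.
    print("would run:", kwargs["name"], kwargs["month"], kwargs["year"])


backfill(dry_run, 2023, range(1, 4))

# With the package installed and a valid config, the real call would be:
# from eredesscraper.workflows import switchboard
# backfill(switchboard, 2023, range(1, 4))
```

Each month triggers a full scraping session, so a long range may take a while.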

Features

Available workflows:

  • current: Collects the consumption data for the current month.
  • previous: Collects the consumption data for the previous month.
  • select: Collects the consumption data for an arbitrary month specified by the user.
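The three workflows differ only in which month they target. As a rough illustration, the sketch below maps a workflow name to a `(month, year)` pair. `resolve_period` is a hypothetical helper, not the package's implementation, and defaulting `select` to the current year when none is given is an assumption.

```python
from datetime import date


def resolve_period(workflow, today=None, month=None, year=None):
    """Map a workflow name to the (month, year) it would collect."""
    today = today or date.today()
    if workflow == "current":
        return (today.month, today.year)
    if workflow == "previous":
        # January rolls back to December of the previous year.
        if today.month == 1:
            return (12, today.year - 1)
        return (today.month - 1, today.year)
    if workflow == "select":
        if month is None or not 1 <= month <= 12:
            raise ValueError("select needs a month between 1 and 12")
        # Assumption: fall back to the current year when none is given.
        return (month, year if year is not None else today.year)
    raise ValueError(f"unknown workflow: {workflow!r}")


print(resolve_period("previous", today=date(2024, 1, 15)))  # (12, 2023)
print(resolve_period("select", today=date(2023, 10, 23), month=5))  # (5, 2023)
```

The January case is the one most easily missed when computing the previous month by simple subtraction.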

Available databases:

  • influxdb: InfluxDB (currently the only supported database sink).

Roadmap

  • Add workflow for retrieving previous month data.
  • Add workflow for retrieving data from an arbitrary month.
  • Build CLI.
  • Containerize app.
  • Documentation.
  • Add CI/CD.
  • Add logging.
  • Add tests.
  • Add runtime support for multiple CPEs.

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

License

See LICENSE file.


