A scraper to mirror edi-energy.de

edi-energy.de scraper

License: MIT

The Python package edi_energy_scraper provides easy-to-use methods to mirror the free documents on bdew-mako.de.

Rationale / Why?

If you'd like to be informed about new regulations or data formats being published on bdew-mako.de you can either

  • visit the site every day and hope that you see the changes if this is your favourite hobby,
  • or automate the task.

This repository helps you with the latter. It allows you to create an up-to-date copy of edi-energy.de on your local computer. Unlike a mirror created with wget or curl, the result has a clean and intuitive directory structure.

From there you can, for example, commit the files to a VCS (as we do in our edi_energy_mirror) or scrape the PDF/Word files for later use.
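If you don't want to involve a VCS, a rough way to notice changes between two mirror runs is to compare file hashes. The following is only a sketch: `snapshot` and `diff_snapshots` are hypothetical helper names that are not part of edi_energy_scraper, and the code only assumes that the mirror is an ordinary directory tree on disk.

```python
import hashlib
from pathlib import Path


def snapshot(mirror_root: Path) -> dict[str, str]:
    """Return {relative path: SHA-256 hex digest} for every file below mirror_root."""
    return {
        path.relative_to(mirror_root).as_posix(): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(mirror_root.rglob("*"))
        if path.is_file()
    }


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Classify paths as added, removed, or changed between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(p for p in old.keys() & new.keys() if old[p] != new[p]),
    }
```

Take one snapshot before running the scraper and one afterwards, then pass both to `diff_snapshots`. Committing the mirror to a VCS gives you the same information plus a history.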

We're all hoping for the day of true digitization on which this repository will become obsolete.

See also

There is a similar project in C# by Fabian Wetzel: fabsenet/edi-energy-extracto. Unlike this project, it stores the downloaded data in a database instead of on the file system. It also works with bdew-mako.de.

How to use the Package (as a user)

Install via pip:

pip install edi_energy_scraper

Create a directory in which you'd like to save the mirrored data:

mkdir edi_energy_de

Then import the package and start the download:

import asyncio
from edi_energy_scraper import EdiEnergyScraper


# add the following lines to enable debug logging to stdout (CLI)
# import logging
# import sys
# logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)

async def mirror():
    scraper = EdiEnergyScraper(path_to_mirror_directory="edi_energy_de")
    await scraper.mirror()


if __name__ == "__main__":
    # asyncio.run creates and closes its own event loop, so no manual loop setup is needed
    asyncio.run(mirror())

This creates a directory structure:

-|-your_script_cwd.py
 |-edi_energy_de
    |- FV2310 (contains files valid since 2023-10-01)
        |- ahb.pdf
        |- ahb.docx
        |- ...
    |- FV2404 (contains files valid since 2024-04-03)
        |- mig.pdf
        |- mig.docx
        |- ...
    |- FV2504 (contains files valid since 2025-06-06)
        |- allgemeine_festlegungen.pdf
        |- schema.xsd
        |- ...
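Once the mirror exists, the layout above is easy to walk with the standard library. The sketch below groups file names by format-version folder. `files_by_format_version` is a hypothetical helper, not part of the package, and it assumes that the folders are named `FV` followed by four digits, as in the tree above.

```python
import re
from pathlib import Path

# Assumption: format-version folders are named "FV" + four digits (e.g. FV2310).
_FV_PATTERN = re.compile(r"^FV\d{4}$")


def files_by_format_version(mirror_root: Path) -> dict[str, list[str]]:
    """Map each FV folder name to the sorted names of the files directly inside it."""
    result: dict[str, list[str]] = {}
    for folder in sorted(mirror_root.iterdir()):
        if folder.is_dir() and _FV_PATTERN.match(folder.name):
            result[folder.name] = sorted(p.name for p in folder.iterdir() if p.is_file())
    return result
```

For example, `files_by_format_version(Path("edi_energy_de"))` returns a dictionary keyed by the format-version folders found in the mirror directory created above.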

Tip: You can extract the information encoded in the filenames:

from edi_energy_scraper import DocumentMetadata

structured_information = DocumentMetadata.from_filename("AHB_COMDIS_1.0f_99991231_20250605_20250605_8872.pdf")
# The result is a DocumentMetadata instance. The repr below is abbreviated and apparently stems
# from a different file (a REQOTE MIG), so its values do not match the filename above:
# DocumentMetadata(kind='MIG', edifact_format=<EdifactFormat.REQOTE: 'REQOTE'>, valid_from=datetime.date(2023, 9, 30), valid_unt...traordinary_publication=True, is_error_correction=False, is_informational_reading_version=True, additional_text=None, id=10071)
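To get a feeling for what such a filename carries, here is a deliberately naive, standard-library-only sketch that splits the name at underscores. It is not how `DocumentMetadata.from_filename` works internally, `split_filename` is a hypothetical helper, and the sketch makes no claim about what each of the date fields means. It only assumes that 8-digit tokens are dates in `YYYYMMDD` form.

```python
from datetime import datetime
from pathlib import Path


def split_filename(filename: str) -> dict[str, object]:
    """Naively split a mirrored file name into its underscore-separated parts."""
    parts = Path(filename).stem.split("_")
    # Assumption: every purely numeric 8-character token is a YYYYMMDD date.
    dates = [datetime.strptime(p, "%Y%m%d").date() for p in parts if len(p) == 8 and p.isdigit()]
    return {
        "leading_tokens": [p for p in parts if not p.isdigit()],
        "dates": dates,
        "trailing_number": int(parts[-1]) if parts[-1].isdigit() else None,
    }
```

For real work, prefer `DocumentMetadata.from_filename`, which returns typed fields instead of loose tokens.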

## How to use this Repository on Your Machine (for development)

Please follow the instructions in
our [Python Template Repository](https://github.com/Hochfrequenz/python_template_repository#how-to-use-this-repository-on-your-machine).
For further information, see the [Tox Repository](https://github.com/tox-dev/tox).

## Contribute

You are very welcome to contribute to this repository by opening a pull request against the main branch.
