
A library to scrape .docx files containing 'Entscheidungsbaumdiagramm' tables into a truly machine-readable structure

Project description

ebdamame

License: GPL

This repository contains the source code of the Python package ebdamame (formerly published as ebddocx2table). The package can be used to extract (scrape) machine-readable tables that model a decision tree (Entscheidungsbaumdiagramm, EBD) from .docx files. These decision trees are part of a regulatory framework for the German energy industry and are used in the initial validation checks (Eingangsprüfung) of market communication. The machine-readable tables created with this package can be converted into real graphs and diagrams with rebdhuhn (formerly: ebdtable2graph). Example scraping results are available as .json files in the repository machine-readable_entscheidungsbaumdiagramme.

Rationale

Suppose you want to analyze or visualize the Entscheidungsbaumdiagramme (EBD) published by EDI@Energy. The website edi-energy.de, as always, only provides them as PDF or Word files instead of truly digitized data.

The package ebdamame scrapes the .docx files and returns data in a model defined in the "sister" package rebdhuhn (formerly known as ebdtable2graph).

Once you have scraped the data (using this package), you can plot it with rebdhuhn. Together, both packages form the ebd_toolchain, which scrapes the EBD .docx files from the edi_energy_mirror and pushes the results to machine-readable_entscheidungsbaumdiagramme.

How to use the package

In any case, install the package from PyPI:

pip install ebdamame
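To check that the installation worked, you can ask the standard library's importlib.metadata for the installed version. This is a minimal sketch; the helper `installed_version` is hypothetical and not part of ebdamame.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str) -> Optional[str]:
    """Return the installed version of `distribution`, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        # the distribution is not installed in the current environment
        return None


if __name__ == "__main__":
    # prints the installed version string, or None if `pip install ebdamame` did not succeed
    print(installed_version("ebdamame"))
```

Returning None instead of raising keeps the check usable in scripts that only want to branch on whether the package is available.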

Use as a library

import json
from pathlib import Path

from ebdamame import get_ebd_docx_tables
from ebdamame.docxtableconverter import DocxTableConverter

docx_file_path = Path("unittests/test_data/ebd20230629_v34.docx")
# download this .docx file from edi-energy.de or find it in the unittests of this repository:
# https://github.com/Hochfrequenz/ebddocx2table/blob/main/unittests/test_data/ebd20230629_v34.docx
docx_tables = get_ebd_docx_tables(docx_file_path, ebd_key="E_0003")
converter = DocxTableConverter(
    docx_tables,
    ebd_key="E_0003",
    ebd_name="E_0003_Bestellung der Aggregationsebene RZ prüfen",
    chapter="MaBiS",
    section="7.42.1"
)
result = converter.convert_docx_tables_to_ebd_table()
with open(Path("E_0003.json"), "w+", encoding="utf-8") as result_file:
    # the result file can be found here:
    # https://github.com/Hochfrequenz/machine-readable_entscheidungsbaumdiagramme/tree/main/FV2310
    json.dump(result.model_dump(), result_file, ensure_ascii=False, indent=2, sort_keys=True)
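If you do not have the example .docx file at hand, the following sketch fetches it from GitHub. It assumes that the blob URL from the comment in the example above can be rewritten to its raw.githubusercontent.com counterpart and that the old ebddocx2table location still redirects to the current repository; both helper names are hypothetical and not part of ebdamame.

```python
from pathlib import Path
from urllib.parse import urlparse
from urllib.request import urlretrieve


def blob_to_raw_url(blob_url: str) -> str:
    """Rewrite https://github.com/<owner>/<repo>/blob/<ref>/<path> to the raw.githubusercontent.com form."""
    parsed = urlparse(blob_url)
    parts = parsed.path.strip("/").split("/")
    if parsed.netloc != "github.com" or len(parts) < 5 or parts[2] != "blob":
        raise ValueError(f"not a GitHub blob URL: {blob_url}")
    owner, repo, _, ref, *path = parts
    return f"https://raw.githubusercontent.com/{owner}/{repo}/{ref}/{'/'.join(path)}"


def download_test_docx(blob_url: str, target: Path) -> Path:
    """Download the raw file behind a GitHub blob URL to `target` and return `target`."""
    target.parent.mkdir(parents=True, exist_ok=True)
    # network access happens here
    urlretrieve(blob_to_raw_url(blob_url), str(target))
    return target
```

For example, calling `download_test_docx(url, Path("unittests/test_data/ebd20230629_v34.docx"))` with the GitHub URL from the library example should place the file at the path that `docx_file_path` points to.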

Use as a CLI tool

To be written.

How to use this repository on your machine (for development)

Please follow the instructions in our Python Template Repository. For further information, see the Tox Repository.

Contribute

You are very welcome to contribute to this repository by opening a pull request against the main branch.

Related Tools and Context

This repository is part of the Hochfrequenz Libraries and Tools for a truly digitized market communication.



Download files


Source Distribution

ebdamame-1.0.0.tar.gz (34.6 kB)

Uploaded Source

Built Distribution


ebdamame-1.0.0-py3-none-any.whl (30.7 kB)

Uploaded Python 3
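The wheel's file name follows the wheel naming convention of PEP 427, `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. For ebdamame-1.0.0-py3-none-any.whl, the tags `py3-none-any` mean "any Python 3, no ABI requirement, any platform", which is why the wheel is listed as "Python 3". The following is a minimal stdlib-only sketch of that split; `parse_wheel_filename` here is a hypothetical helper that does no normalization or validation (the `packaging` library ships a more thorough function of the same name in `packaging.utils`).

```python
from typing import NamedTuple, Optional


class WheelName(NamedTuple):
    distribution: str
    version: str
    build: Optional[str]
    python_tag: str
    abi_tag: str
    platform_tag: str


def parse_wheel_filename(filename: str) -> WheelName:
    """Split a wheel file name into its PEP 427 components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        # no optional build tag
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        distribution, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename}")
    return WheelName(distribution, version, build, python_tag, abi_tag, platform_tag)


print(parse_wheel_filename("ebdamame-1.0.0-py3-none-any.whl"))
# → WheelName(distribution='ebdamame', version='1.0.0', build=None, python_tag='py3', abi_tag='none', platform_tag='any')
```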

File details

Details for the file ebdamame-1.0.0.tar.gz.

File metadata

  • Download URL: ebdamame-1.0.0.tar.gz
  • Upload date:
  • Size: 34.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ebdamame-1.0.0.tar.gz
Algorithm Hash digest
SHA256 121500162543d483d6243cd9366b1ea1f8533fb53f1083d5a2119dd5160b0302
MD5 1779fbe8bb4b2033ab9357a5828631fa
BLAKE2b-256 e5e110b8d89b508b72ea484e63ff3c6be3644f5a900fd42d542e6c5160fdf98a
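To check a downloaded archive against the digests listed above, hash it locally and compare. The sketch below uses only hashlib and reads the file in chunks; it assumes that "BLAKE2b-256" denotes BLAKE2b with a 32-byte digest, which is how PyPI computes it. The function names are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Dict

# SHA256 digest published above for ebdamame-1.0.0.tar.gz
EXPECTED_SHA256 = "121500162543d483d6243cd9366b1ea1f8533fb53f1083d5a2119dd5160b0302"


def file_digests(path: Path, chunk_size: int = 1 << 16) -> Dict[str, str]:
    """Compute the SHA256, MD5 and BLAKE2b-256 hex digests of a file in one pass."""
    hashers = {
        "sha256": hashlib.sha256(),
        "md5": hashlib.md5(),
        "blake2b_256": hashlib.blake2b(digest_size=32),
    }
    with path.open("rb") as stream:
        # read fixed-size chunks until read() returns b"" at end of file
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            for hasher in hashers.values():
                hasher.update(chunk)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_sha256(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 digest equals `expected` (compared case-insensitively)."""
    return file_digests(path)["sha256"] == expected.lower()
```

For an untampered download of the source distribution, `matches_sha256(Path("ebdamame-1.0.0.tar.gz"))` should return True. pip can also enforce such checks automatically via its `--require-hashes` mode.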


Provenance

The following attestation bundles were made for ebdamame-1.0.0.tar.gz:

Publisher: python-publish.yml on Hochfrequenz/ebdamame

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file ebdamame-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: ebdamame-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 30.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for ebdamame-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ec59eea5c22790bbe62f879860f8582807131f983ca5ef307ad5fc8672fe9f9f
MD5 d417cc5452eeeeb12f6d01163f161cef
BLAKE2b-256 0201eb8f7153c66382ba074c5e22bdec6350648bc806a6862add81d230040488


Provenance

The following attestation bundles were made for ebdamame-1.0.0-py3-none-any.whl:

Publisher: python-publish.yml on Hochfrequenz/ebdamame

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
