
Converts (already scraped) Entscheidungsbaumdiagramm tables to real graphs

Project description

rebdhuhn


This repository contains the source code of the Python package rebdhuhn. It converts machine-readable tables that were extracted from .docx files and that model a decision tree (Entscheidungsbaumdiagramm, EBD) into real graphs. These decision trees are part of a regulatory framework for the German energy industry and are used in the Eingangsprüfung (input validation) of the market communication.

Rationale

Assume that you have scraped the Entscheidungsbaumdiagramm tables published by EDI@Energy from their somewhat "digitized" PDF/DOCX files (you can use the package ebdamame for this). Also assume that the result of your scraping is a rebdhuhn.models.EbdTable.

The package rebdhuhn contains the logic to convert your scraped data into a graph. This graph can then be exported, e.g. as SVG and/or PlantUML. Combined, ebdamame and rebdhuhn form the core of our ebd_toolchain, which scrapes EBD .docx files from the edi_energy_mirror and pushes the results to machine_readable-entscheidungsbaumdiagramme.
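The idea behind the conversion can be sketched without rebdhuhn itself. The following is a hypothetical, simplified model (plain tuples instead of the real EbdTable/EbdGraph classes; it is not the package's implementation): every step becomes a decision node, and every sub-row becomes an edge labelled "ja" or "nein" that ends either in another step, in "Ende", or in a result code.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical stand-in for an EBD table (NOT rebdhuhn's EbdTable):
# step number -> sub-rows as (check_result, subsequent_step_number, result_code)
SubRow = Tuple[bool, Optional[str], Optional[str]]

E_0003_ROWS: Dict[str, List[SubRow]] = {
    "1": [(False, None, "A01"), (True, "2", None)],
    "2": [(False, None, "A02"), (True, "Ende", None)],
}


def table_to_edges(table: Dict[str, List[SubRow]]) -> List[Tuple[str, str, str]]:
    """Return (source, target, label) edges of the decision graph."""
    edges: List[Tuple[str, str, str]] = []
    for step, sub_rows in table.items():
        for check_result, next_step, result_code in sub_rows:
            label = "ja" if check_result else "nein"
            # Prefer the subsequent step ("Ende" included); otherwise the sub-row ends in a result code.
            target = next_step if next_step is not None else result_code
            if target is None:
                raise ValueError(f"step {step}: sub-row has neither a subsequent step nor a result code")
            edges.append((step, target, label))
    return edges
```

If a sub-row carried both a subsequent step and a result code, this sketch would follow the step and drop the code; the real package handles the conversion itself and returns an EbdGraph.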

How to use rebdhuhn?

Install the package from pypi:

pip install rebdhuhn

Create an Instance of EbdTable

EbdTable holds the raw data published by BDEW in a machine-readable format. Creating EbdTable instances is out of scope for this package; ask Hochfrequenz for support on this topic. In the following example, the data is hard-coded.

from rebdhuhn.graph_conversion import convert_table_to_graph
from rebdhuhn.models import EbdCheckResult, EbdTable, EbdTableMetaData, EbdTableRow, EbdTableSubRow, EbdGraph

ebd_table: EbdTable  # this is the result of scraping the docx file
ebd_table = EbdTable(  # this data shouldn't be handwritten
    metadata=EbdTableMetaData(
        ebd_code="E_0003",
        chapter="MaBiS",
        section="7.39 AD: Bestellung der Aggregationsebene der Bilanzkreissummenzeitreihe auf Ebene der Regelzone",
        ebd_name="Bestellung der Aggregationsebene RZ prüfen",
        role="ÜNB",
    ),
    rows=[
        EbdTableRow(
            step_number="1",
            description="Erfolgt der Eingang der Bestellung fristgerecht?",
            sub_rows=[
                EbdTableSubRow(
                    check_result=EbdCheckResult(result=False, subsequent_step_number=None),
                    result_code="A01",
                    note="Fristüberschreitung",
                ),
                EbdTableSubRow(
                    check_result=EbdCheckResult(result=True, subsequent_step_number="2"),
                    result_code=None,
                    note=None,
                ),
            ],
        ),
        EbdTableRow(
            step_number="2",
            description="Erfolgt die Bestellung zum Monatsersten 00:00 Uhr?",
            sub_rows=[
                EbdTableSubRow(
                    check_result=EbdCheckResult(result=False, subsequent_step_number=None),
                    result_code="A02",
                    note="Gewählter Zeitpunkt nicht zulässig",
                ),
                EbdTableSubRow(
                    check_result=EbdCheckResult(result=True, subsequent_step_number="Ende"),
                    result_code=None,
                    note=None,
                ),
            ],
        ),
    ],
)
assert isinstance(ebd_table, EbdTable)

ebd_graph = convert_table_to_graph(ebd_table)
assert isinstance(ebd_graph, EbdGraph)
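To make the semantics of such a table concrete, here is a hypothetical helper (not part of rebdhuhn) that walks E_0003 for a given set of answers and returns the outcome: a result code such as "A01" when a check fails, or None when the path reaches "Ende". It uses a plain dict mirroring the EbdTable above rather than the real EbdGraph API, and it treats every result code as terminal, which is sufficient for E_0003.

```python
from typing import Dict, Optional, Tuple

# Hypothetical plain-dict mirror of the E_0003 table above (not rebdhuhn's API):
# step -> {check_result: (subsequent_step_number, result_code)}
E_0003_TREE: Dict[str, Dict[bool, Tuple[Optional[str], Optional[str]]]] = {
    "1": {False: (None, "A01"), True: ("2", None)},
    "2": {False: (None, "A02"), True: ("Ende", None)},
}


def evaluate(tree: Dict[str, Dict[bool, Tuple[Optional[str], Optional[str]]]],
             answers: Dict[str, bool], start: str = "1") -> Optional[str]:
    """Follow the decision tree; return the result code reached, or None at 'Ende'."""
    step: Optional[str] = start
    visited = set()
    while step is not None:
        if step in visited:
            raise ValueError(f"cycle detected at step {step}")
        visited.add(step)
        next_step, result_code = tree[step][answers[step]]
        if result_code is not None:
            return result_code  # a failed check ends the validation with this code
        if next_step == "Ende":
            return None  # all checks passed
        step = next_step
    raise ValueError("sub-row without subsequent step or result code")
```

For example, a timely order for a date other than the first of the month yields "A02".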

Export as PlantUML

from rebdhuhn import convert_graph_to_plantuml

plantuml_code = convert_graph_to_plantuml(ebd_graph)
with open("e_0003.puml", "w+", encoding="utf-8") as uml_file:
    uml_file.write(plantuml_code)

The file e_0003.puml now looks like this:

@startuml
...
if (<b>1: </b> Erfolgt der Eingang der Bestellung fristgerecht?) then (ja)
else (nein)
    :A01;
    note left
        Fristüberschreitung
    endnote
    kill;
endif
if (<b>2: </b> Erfolgt die Bestellung zum Monatsersten 00:00 Uhr?) then (ja)
    end
else (nein)
    :A02;
    note left
        Gewählter Zeitpunkt nicht zulässig
    endnote
    kill;
endif
@enduml
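To illustrate how the table structure maps onto this PlantUML activity syntax, the following hypothetical generator (not rebdhuhn's convert_graph_to_plantuml) reproduces the body shown above. It only handles linear EBDs in which "ja" leads to the next step or "Ende" and "nein" ends with a result code and a note, and it omits whatever the "..." in the real output stands for.

```python
from typing import List, Tuple

# Hypothetical input: (step_number, question, next_step_on_yes, result_code_on_no, note_on_no)
Step = Tuple[str, str, str, str, str]


def linear_ebd_to_plantuml(steps: List[Step]) -> str:
    """Render a linear yes/no EBD as a PlantUML activity diagram."""
    lines = ["@startuml"]
    for number, question, next_on_yes, code, note in steps:
        lines.append(f"if (<b>{number}: </b> {question}) then (ja)")
        if next_on_yes == "Ende":
            lines.append("    end")  # the last "ja" branch terminates the diagram
        lines.append("else (nein)")
        lines.append(f"    :{code};")
        lines.append("    note left")
        lines.append(f"        {note}")
        lines.append("    endnote")
        lines.append("    kill;")  # a failed check stops this path
        lines.append("endif")
    lines.append("@enduml")
    return "\n".join(lines)
```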

Export the graph as SVG

To export the graph as SVG, you need a Kroki instance. You can either:

  • Use the public instance at https://kroki.io
  • Run a local instance via Docker: docker run -p 8125:8000 yuzutech/kroki:0.24.1

Then run:

from rebdhuhn import convert_plantuml_to_svg_kroki
from rebdhuhn.kroki import Kroki

kroki_client = Kroki()
svg_code = convert_plantuml_to_svg_kroki(plantuml_code, kroki_client)
with open("e_0003.svg", "w+", encoding="utf-8") as svg_file:
    svg_file.write(svg_code)
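Kroki itself is an HTTP service. Independently of rebdhuhn's Kroki client, Kroki's GET API accepts the diagram source deflate-compressed and base64url-encoded in the path, as `<base_url>/<diagram_type>/<output_format>/<encoded>`. A minimal sketch of that encoding, assuming the local Docker instance from above on port 8125 (the base URL default is an assumption; use https://kroki.io for the public instance):

```python
import base64
import zlib


def kroki_get_url(diagram_source: str, base_url: str = "http://localhost:8125",
                  diagram_type: str = "plantuml", output_format: str = "svg") -> str:
    """Build a Kroki GET URL: zlib/deflate-compress the source, then base64url-encode it."""
    compressed = zlib.compress(diagram_source.encode("utf-8"), 9)
    encoded = base64.urlsafe_b64encode(compressed).decode("ascii")
    return f"{base_url.rstrip('/')}/{diagram_type}/{output_format}/{encoded}"
```

The SVG can then be fetched with a plain GET, e.g. via urllib.request.urlopen(url). For very large diagrams, sending the raw source as the body of a POST request to `<base_url>/plantuml/svg` avoids overly long URLs.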

Error Handling

rebdhuhn provides three base exception classes to help you distinguish between errors in different pipeline stages:

  • GraphConversionError (table → graph): errors during table-to-graph conversion. These affect both SVG and PlantUML output.
  • PlantumlConversionError (graph → puml): errors specific to PlantUML generation.
  • SvgConversionError (graph → dot → svg): errors specific to SVG/DOT generation via Kroki.

This allows you to handle PlantUML failures gracefully while still generating SVG output:

from rebdhuhn import (
    convert_table_to_graph,
    convert_graph_to_plantuml,
    convert_graph_to_dot,
    convert_dot_to_svg_kroki,
    GraphConversionError,
    PlantumlConversionError,
    SvgConversionError,
)
from rebdhuhn.kroki import Kroki

# ebd_table is an instance of EbdTable (see above for how to create one)
kroki_client = Kroki()  # requires a running Kroki instance

try:
    graph = convert_table_to_graph(ebd_table)
except GraphConversionError:
    # Table-to-graph conversion failed - neither SVG nor PlantUML will work
    raise

# SVG generation (primary)
try:
    dot_code = convert_graph_to_dot(graph)
    svg = convert_dot_to_svg_kroki(dot_code, kroki_client)
except SvgConversionError:
    print("SVG generation failed")

# PlantUML generation (secondary)
try:
    puml_code = convert_graph_to_plantuml(graph)
except PlantumlConversionError:
    print("PlantUML generation failed (non-critical)")
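If you have several optional stages, the try/except blocks above can be factored into a small helper. This is a hypothetical sketch (run_optional_stage is not part of rebdhuhn): it catches only the exception type you pass in, records the failure, and lets every other exception propagate so that unexpected bugs are not silently swallowed.

```python
from typing import Callable, List, Optional, Type, TypeVar

T = TypeVar("T")


def run_optional_stage(stage: Callable[[], T], expected: Type[BaseException],
                       name: str, failures: List[str]) -> Optional[T]:
    """Run one pipeline stage; on the expected error, record it and return None."""
    try:
        return stage()
    except expected as error:
        failures.append(f"{name}: {error}")
        return None
```

With the names from the example above, a call could look like run_optional_stage(lambda: convert_graph_to_plantuml(graph), PlantumlConversionError, "plantuml", failures).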

How to use this Repository on Your Machine (for development)

Please follow the instructions in our Python Template Repository. For further information, see the Tox Repository.

Running Tests

Tests use testcontainers to automatically start a Kroki instance when needed. Make sure Docker is installed and running. Tests that require Kroki will be skipped if Docker is not available.
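The repository's actual test setup is not reproduced here. As a hypothetical illustration of the skip logic, a stdlib-only check like the following could decide whether Kroki-dependent tests can run: it requires a docker CLI on the PATH and a daemon that answers `docker info`.

```python
import shutil
import subprocess


def docker_available(timeout_seconds: float = 10.0) -> bool:
    """Return True if the docker CLI exists and its daemon responds to `docker info`."""
    if shutil.which("docker") is None:
        return False
    try:
        completed = subprocess.run(
            ["docker", "info"], capture_output=True, timeout=timeout_seconds, check=False
        )
    except (OSError, subprocess.TimeoutExpired):
        return False
    return completed.returncode == 0
```

In a pytest suite, this could feed a marker such as pytest.mark.skipif(not docker_available(), reason="Docker not available").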

Contribute

You are very welcome to contribute to this repository by opening a pull request against the main branch.

Related Tools and Context

This repository is part of the Hochfrequenz Libraries and Tools for a truly digitized market communication.



Download files

Download the file for your platform.

Source Distribution

rebdhuhn-1.0.1.tar.gz (66.7 kB)

Uploaded Source

Built Distribution


rebdhuhn-1.0.1-py3-none-any.whl (59.6 kB)

Uploaded Python 3

File details

Details for the file rebdhuhn-1.0.1.tar.gz.

File metadata

  • Download URL: rebdhuhn-1.0.1.tar.gz
  • Upload date:
  • Size: 66.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rebdhuhn-1.0.1.tar.gz
  • SHA256: 91b16ef06e6e7d5ebfc9e0396cd192726e2d1456429e0dad79d41fce7710ed7a
  • MD5: 6e397eb14c721fc8b0cab867b5121b6e
  • BLAKE2b-256: 390d14a23fd4e73f2f4bfa2f4cefedaf01cbeda377cbf93cda3964c3f9a8318b


Provenance

The following attestation bundles were made for rebdhuhn-1.0.1.tar.gz:

Publisher: python-publish.yml on Hochfrequenz/rebdhuhn

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file rebdhuhn-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: rebdhuhn-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 59.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for rebdhuhn-1.0.1-py3-none-any.whl
  • SHA256: de2efa469f85a34b8324d8af74c68b49244e7ebb1c9d49a02003a0e660e8ed5c
  • MD5: 3832ca5a22387f2926b0839d1c3876d1
  • BLAKE2b-256: 8d199be0cbd991e8c334b79153df5d0511762b9aaca9841a83f554d9dea6f5c4


Provenance

The following attestation bundles were made for rebdhuhn-1.0.1-py3-none-any.whl:

Publisher: python-publish.yml on Hochfrequenz/rebdhuhn

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
