A library to scrape .docx files with 'Entscheidungsbaumdiagramm' tables into a truly machine-readable structure

Project description

ebdamame

This repository contains the source code of the Python package ebdamame (formerly published as ebddocx2table). It extracts (scrapes) machine-readable tables that model a decision tree (Entscheidungsbaumdiagramm, EBD) from .docx files. These decision trees are part of the regulatory framework for the German energy market and are used in the entry checks (Eingangsprüfung) of the market communication. The machine-readable tables created with this package can be converted into real graphs and diagrams with rebdhuhn (formerly: ebdtable2graph). Example scraping results are available as .json files in the repository machine-readable_entscheidungsbaumdiagramme.

Rationale

Assume that you want to analyse or visualize the Entscheidungsbaumdiagramme (EBD) published by EDI@Energy. The website edi-energy.de, as usual, only provides PDF or Word files instead of truly machine-readable data.

The package ebdamame scrapes the .docx files and returns data in a model defined in the "sister" package rebdhuhn (formerly known as ebdtable2graph).

Once you have scraped the data (using this package), you can plot it with rebdhuhn.

How to use the package

In any case, install the package from PyPI:

pip install ebdamame
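
To check that the installation worked, you can ask the standard library for the installed version. This is only a minimal sketch; the helper name `installed_version` is hypothetical and not part of ebdamame.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution_name)
    except PackageNotFoundError:
        return None


# "ebdamame" is the distribution name used in the pip command above.
ebdamame_version = installed_version("ebdamame")
if ebdamame_version is None:
    print("ebdamame is not installed in this environment")
else:
    print(f"ebdamame {ebdamame_version} is installed")
```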

Use as a library

import json
from pathlib import Path

import cattrs

from ebdamame import TableNotFoundError, get_all_ebd_keys, get_ebd_docx_tables  # type:ignore[import]
from ebdamame.docxtableconverter import DocxTableConverter  # type:ignore[import]

docx_file_path = Path("unittests/test_data/ebd20230629_v34.docx")
# Download this .docx file from edi-energy.de or find it in the unit tests of this repository:
# https://github.com/Hochfrequenz/ebddocx2table/blob/main/unittests/test_data/ebd20230629_v34.docx
docx_tables = get_ebd_docx_tables(docx_file_path, ebd_key="E_0003")
converter = DocxTableConverter(
    docx_tables,
    ebd_key="E_0003",
    chapter="MaBiS",
    sub_chapter="7.42.1: AD: Bestellung der Aggregationsebene der Bilanzkreissummenzeitreihe auf Ebene der Regelzone",
)
result = converter.convert_docx_tables_to_ebd_table()
with open(Path("E_0003.json"), "w", encoding="utf-8") as result_file:
    # the result file can be found here:
    # https://github.com/Hochfrequenz/machine-readable_entscheidungsbaumdiagramme/tree/main/FV2310
    json.dump(cattrs.unstructure(result), result_file, ensure_ascii=False, indent=2, sort_keys=True)
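
The `json.dump` options in the example above matter for German-language content: `ensure_ascii=False` keeps umlauts readable in the file, and `sort_keys=True` makes the output stable across runs. The following sketch illustrates this with a hypothetical dictionary that merely stands in for `cattrs.unstructure(result)`; the real structure is defined by rebdhuhn and is not reproduced here. The helper names `write_json` and `read_json` are assumptions for illustration.

```python
import json
from pathlib import Path
from typing import Any


def write_json(data: Any, target: Path) -> None:
    """Write already-unstructured data with the same options as the example above."""
    with open(target, "w", encoding="utf-8") as result_file:
        json.dump(data, result_file, ensure_ascii=False, indent=2, sort_keys=True)


def read_json(source: Path) -> Any:
    """Load a previously written JSON file again."""
    with open(source, "r", encoding="utf-8") as result_file:
        return json.load(result_file)


# Hypothetical stand-in data; the keys are invented for this illustration only.
sample = {"example_text": "Prüfung", "ebd_key": "E_0003"}

# Default behaviour: non-ASCII characters are escaped (ü becomes \u00fc).
escaped = json.dumps(sample, sort_keys=True)
# With ensure_ascii=False the umlaut is written as-is.
readable = json.dumps(sample, sort_keys=True, ensure_ascii=False)
```

Both variants parse back to the same data; the difference only affects how readable the file is for humans and in diffs.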

Use as a CLI tool

to be written

How to use this Repository on Your Machine (for development)

Please follow the instructions in our Python Template Repository. For further information, see the Tox Repository.

Contribute

You are very welcome to contribute to this repository by opening a pull request against the main branch.

Related Tools and Context

This repository is part of the Hochfrequenz Libraries and Tools for a truly digitized market communication.

Download files

Source Distribution

ebdamame-0.2.0.tar.gz (29.7 kB, Source)

Built Distribution

ebdamame-0.2.0-py3-none-any.whl (25.0 kB, Python 3)

File details

Details for the file ebdamame-0.2.0.tar.gz.

File metadata

  • Download URL: ebdamame-0.2.0.tar.gz
  • Upload date:
  • Size: 29.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for ebdamame-0.2.0.tar.gz
Algorithm Hash digest
SHA256 71a5b1bed69af602b5885b848c9a294f1a3cd8d881d96d5780d6304bd7574e3d
MD5 3c61a46c62b4c6091cb2c56164b89057
BLAKE2b-256 6513a191295342f837e3b0d86cd206028babbb500d5786208ce605b7432c06f8

File details

Details for the file ebdamame-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: ebdamame-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 25.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/5.1.1 CPython/3.12.7

File hashes

Hashes for ebdamame-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 e7720447ed5ba10a94a298f028dcb6bb57ab1194922bb12ad5e34bfcaa60957b
MD5 6c43e430a3c7ebaea746f45d5161d990
BLAKE2b-256 c4e7103b7fde6ad3e763fc94cc395dcdb980ed664e2f1cb02f21e2f70a5e2fa0
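
If you download one of the files manually, you can compare its SHA256 digest with the value listed above. The following is a minimal sketch using only the standard library; the function names `sha256_of_file` and `matches_sha256` are hypothetical helpers, not part of ebdamame or pip.

```python
import hashlib
from pathlib import Path


def sha256_of_file(path: Path, chunk_size: int = 65536) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as file:
        # iter() with a sentinel stops as soon as read() returns empty bytes.
        for chunk in iter(lambda: file.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_sha256(path: Path, expected_hex_digest: str) -> bool:
    """Return True if the file's SHA256 digest equals the expected hex digest."""
    return sha256_of_file(path) == expected_hex_digest.strip().lower()
```

Assuming the source distribution was saved to the current directory, `matches_sha256(Path("ebdamame-0.2.0.tar.gz"), expected)` should return `True` when `expected` is the SHA256 digest listed for that file.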
