Scrapes documents of Bundesnetzagentur Beschlusskammer 6 into a structured, git-diffable mirror

bnetza_bk6_scraper

bnetza_bk6_scraper mirrors the documents published by the German Bundesnetzagentur (BNetzA) Beschlusskammer 6 (BK6) into a structured, git-diffable directory tree. BK6 regulates electricity network access and is a constant source of consultations, rulings (Festlegungen) and their attachments. Because the agency publishes these as loose PDFs on HTML pages with no changelog, tracking what changed and when is painful. This tool discovers every BK6 proceeding, downloads its PDFs and a normalized HTML snapshot of each phase page, and records structured metadata. Committing the output to git turns every regulatory update into a reviewable diff.

Installation

pip install bnetza_bk6_scraper

Usage

The package installs a single console command, bnetza-bk6-scraper, with a mirror subcommand:

bnetza-bk6-scraper mirror --target <dir> [--concurrency N] [--year YYYY] [-v]

Option          Default      Description
--target        (required)   Output directory (the mirror repository root).
--concurrency   4            Number of parallel HTTP fetches.
--year          (all)        Restrict the run to a single year, e.g. 2023.
-v, --verbose   off          Enable debug logging.

For example, to mirror only the 2023 proceedings into ./mirror:

bnetza-bk6-scraper mirror --target ./mirror --year 2023 -v

Each run logs a summary line such as "run summary: 7 proceedings, 16 documents written, 0 failures".
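
For unattended runs it can be useful to turn that summary line into numbers, for example to fail a job when failures are reported. The following is a minimal sketch, assuming the log line keeps exactly the shape of the example above; parse_run_summary is a hypothetical helper and not part of the package:

```python
import re
from typing import Dict, Optional

# Pattern derived from the example line in this README; the real log format may differ.
_SUMMARY = re.compile(
    r"run summary: (?P<proceedings>\d+) proceedings, "
    r"(?P<documents>\d+) documents written, (?P<failures>\d+) failures"
)


def parse_run_summary(log_text: str) -> Optional[Dict[str, int]]:
    """Return the counts from the last summary line in log_text, or None if absent."""
    matches = list(_SUMMARY.finditer(log_text))
    if not matches:
        return None
    return {key: int(value) for key, value in matches[-1].groupdict().items()}
```

A wrapper script could capture the command's log output, pass it to this function, and exit non-zero when the "failures" count is greater than zero.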

Output layout

Proceedings are written under <target>/{year}/{aktenzeichen}/, with a top-level index.json listing every mirrored proceeding:

<target>/
├── index.json                          # summary of all proceedings
└── 2023/
    └── BK6-23-241/
        ├── metadata.json               # structured proceeding metadata
        ├── BK6-23-241_beschluss.html   # normalized HTML snapshot of a phase page
        ├── BK6-23-241_beschluss_vom_07.05.26.pdf
        ├── BK6-23-241_bilarem.pdf
        └── BK6-23-241_anlage_bilarem.pdf
  • metadata.json captures the Aktenzeichen, year, title, status, Stand (last-modified date), any submission deadline (Frist), the phase pages, and one entry per document (title, type, source URL, filename).
  • The normalized *.html files are trimmed, stable snapshots of the source phase pages so that content changes surface as small diffs.
  • The PDFs are the proceeding's documents, downloaded verbatim.
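
Because the layout is predictable, downstream tooling can read a mirror without knowing anything about the scraper. The sketch below relies only on the directory structure described above and loads every metadata.json as a plain dictionary; it makes no assumption about the JSON key names, and load_mirror_metadata is a hypothetical helper rather than a function shipped with the package:

```python
import json
from pathlib import Path
from typing import Dict


def load_mirror_metadata(target: Path) -> Dict[str, dict]:
    """Map 'year/aktenzeichen' to the parsed metadata.json of each proceeding."""
    result = {}
    # Layout from this README: <target>/{year}/{aktenzeichen}/metadata.json
    for metadata_file in sorted(target.glob("*/*/metadata.json")):
        proceeding_dir = metadata_file.parent
        key = f"{proceeding_dir.parent.name}/{proceeding_dir.name}"
        result[key] = json.loads(metadata_file.read_text(encoding="utf-8"))
    return result
```

Calling load_mirror_metadata(Path("./mirror")) after the 2023 example run would yield one entry per proceeding directory, keyed for instance as "2023/BK6-23-241".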

Change detection is intentionally "dumb": the tool always writes the current state, and git diff in the mirror repository reveals what changed.
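
Since git does the change detection, a consumer that wants to know which proceedings were touched by a run can ask git directly. The following sketch assumes the mirror is a git working tree and that git is on the PATH; changed_proceedings and git_status are hypothetical helper names, not part of the package:

```python
import subprocess
from pathlib import Path, PurePosixPath
from typing import List


def changed_proceedings(porcelain_output: str) -> List[str]:
    """Return sorted 'year/aktenzeichen' keys found in `git status --porcelain` text."""
    touched = set()
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue
        # Porcelain v1 lines look like "XY <path>"; renames use "old -> new".
        path = line[3:].split(" -> ")[-1].strip('"')
        parts = PurePosixPath(path).parts
        if len(parts) >= 3:  # {year}/{aktenzeichen}/{file}
            touched.add(f"{parts[0]}/{parts[1]}")
    return sorted(touched)


def git_status(mirror_root: Path) -> str:
    """Run git in the mirror repository and return its porcelain status text."""
    completed = subprocess.run(
        # --untracked-files=all lists new files individually instead of collapsing directories
        ["git", "status", "--porcelain", "--untracked-files=all"],
        cwd=mirror_root, capture_output=True, text=True, check=True,
    )
    return completed.stdout
```

Top-level files such as index.json are skipped on purpose, because they do not belong to a single proceeding.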

Mirror repository

The scraper is designed to feed a separate mirror repository, Hochfrequenz/bnetza_bk6_mirror. A scheduled GitHub Action there will periodically:

pip install bnetza_bk6_scraper
bnetza-bk6-scraper mirror --target .
git add -A && git commit -m "update BK6 mirror"

so that regulatory changes at BK6 become visible as reviewable git diffs and commit history. That Action is future work and does not live in this repository.

WAF / browser User-Agent

The BNetzA website sits behind a Web Application Firewall that rejects non-browser clients by serving a 200 OK "The requested URL was rejected" page instead of the real content. To get through, the scraper sends browser-like User-Agent and Accept headers and treats the rejection page as a retryable error. No credentials or API keys are required.
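
The approach can be illustrated with the standard library alone. This is a sketch under stated assumptions, not the scraper's actual implementation: the header values, the retry count, the backoff, and the names http_get, fetch_with_retry and WafRejected are all illustrative. Only the rejection text is taken from the description above.

```python
import time
import urllib.request
from typing import Callable

# Marker text quoted in this README; the header values below are illustrative assumptions.
REJECTION_MARKER = "The requested URL was rejected"
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


class WafRejected(Exception):
    """The server answered with the WAF rejection page on every attempt."""


def http_get(url: str) -> str:
    """Fetch url with browser-like headers and return the decoded body."""
    request = urllib.request.Request(url, headers=BROWSER_HEADERS)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_retry(
    url: str,
    get: Callable[[str], str] = http_get,
    attempts: int = 3,
    delay: float = 1.0,
) -> str:
    """Return the page body, retrying while the WAF rejection page is served."""
    for attempt in range(1, attempts + 1):
        body = get(url)
        # The status code is 200 either way, so the body itself must be inspected.
        if REJECTION_MARKER not in body:
            return body
        if attempt < attempts:
            time.sleep(delay * attempt)  # simple linear backoff between attempts
    raise WafRejected(f"{url} was rejected {attempts} times")
```

Passing the fetch function in as a parameter keeps the retry logic testable without network access, since a stub can stand in for http_get.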

Contribute

This project uses tox for all quality gates. Create a one-shot development environment with everything installed:

tox -e dev

Individual gates: tox -e tests, tox -e linting, tox -e type_check, tox -e coverage, and tox -e spell_check. Run the full suite with tox.
