Scrapes documents of Bundesnetzagentur Beschlusskammer 6 into a structured, git-diffable mirror

bnetza_bk6_scraper

bnetza_bk6_scraper mirrors the documents published by the German Bundesnetzagentur (BNetzA) Beschlusskammer 6 (BK6) into a structured, git-diffable directory tree. BK6 regulates electricity network access and is a constant source of consultations, rulings (Festlegungen) and their attachments. Because the agency publishes these as loose PDFs on HTML pages with no changelog, tracking what changed and when is painful. This tool discovers every BK6 proceeding, downloads its PDFs and a normalized HTML snapshot of each phase page, and records structured metadata. Committing the output to git turns every regulatory update into a reviewable diff.

Installation

pip install bnetza_bk6_scraper

Usage

The package installs a single console command, bnetza-bk6-scraper, with a mirror subcommand:

bnetza-bk6-scraper mirror --target <dir> [--concurrency N] [--year YYYY] [-v]
Option         Default     Description
--target       (required)  Output directory (the mirror repository root).
--concurrency  4           Number of parallel HTTP fetches.
--year         (all)       Restrict the run to a single year, e.g. 2023.
-v, --verbose  off         Enable debug logging.
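
To make the defaults in the table concrete, here is a minimal sketch of how an interface with these options could be declared with Python's standard argparse module. It is an illustration only and is not taken from the package's source; the function name build_parser and the help texts are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser that mimics the documented options (illustrative only)."""
    parser = argparse.ArgumentParser(prog="bnetza-bk6-scraper")
    subcommands = parser.add_subparsers(dest="command", required=True)
    mirror = subcommands.add_parser("mirror", help="mirror BK6 documents")
    # --target is the only mandatory option.
    mirror.add_argument("--target", required=True, help="output directory")
    # Four parallel fetches unless stated otherwise.
    mirror.add_argument("--concurrency", type=int, default=4, help="parallel HTTP fetches")
    # None stands for "all years".
    mirror.add_argument("--year", type=int, default=None, help="restrict to one year")
    mirror.add_argument("-v", "--verbose", action="store_true", help="debug logging")
    return parser


args = build_parser().parse_args(["mirror", "--target", "./mirror", "--year", "2023", "-v"])
print(args.target, args.concurrency, args.year, args.verbose)
# prints: ./mirror 4 2023 True
```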

Example — mirror only the 2023 proceedings into ./mirror:

bnetza-bk6-scraper mirror --target ./mirror --year 2023 -v

Each run logs a summary line such as "run summary: 7 proceedings, 16 documents written, 0 failures".
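
If you call the command from a script, the summary line can be turned into numbers, for example to fail a job when failures are reported. The following is a sketch, not part of the package: parse_run_summary is a hypothetical helper, and the pattern assumes the log wording matches the example above (it tolerates singular forms such as "1 failure" as a guess).

```python
import re
from typing import Dict, Optional

# Assumption: the wording follows the example summary line shown above.
SUMMARY_PATTERN = re.compile(
    r"run summary: (?P<proceedings>\d+) proceedings?, "
    r"(?P<documents>\d+) documents? written, "
    r"(?P<failures>\d+) failures?"
)


def parse_run_summary(log_text: str) -> Optional[Dict[str, int]]:
    """Return the counts from the last summary line in log_text, or None."""
    matches = list(SUMMARY_PATTERN.finditer(log_text))
    if not matches:
        return None
    last = matches[-1]
    return {name: int(value) for name, value in last.groupdict().items()}


summary = parse_run_summary("INFO run summary: 7 proceedings, 16 documents written, 0 failures")
print(summary)
# prints: {'proceedings': 7, 'documents': 16, 'failures': 0}
```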

Output layout

Proceedings are written under <target>/{year}/{aktenzeichen}/, with a top-level index.json listing every mirrored proceeding:

<target>/
├── index.json                          # summary of all proceedings
└── 2023/
    └── BK6-23-241/
        ├── metadata.json               # structured proceeding metadata
        ├── BK6-23-241_beschluss.html   # normalized HTML snapshot of a phase page
        ├── BK6-23-241_beschluss_vom_07.05.26.pdf
        ├── BK6-23-241_bilarem.pdf
        └── BK6-23-241_anlage_bilarem.pdf
  • metadata.json captures the Aktenzeichen, year, title, status, Stand (last-modified date), any submission deadline (Frist), the phase pages, and one entry per document (title, type, source URL, filename).
  • The normalized *.html files are trimmed, stable snapshots of the source phase pages so that content changes surface as small diffs.
  • The PDFs are the proceeding's documents, downloaded verbatim.
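
Because the layout is regular, a mirror can be inspected with a few lines of Python. The sketch below relies only on the directory structure shown above; iter_proceedings and list_pdfs are hypothetical helpers, and the metadata is returned as a plain dictionary because the exact key names in metadata.json are not specified here.

```python
import json
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_proceedings(target: Path) -> Iterator[Tuple[str, str, dict]]:
    """Yield (year, aktenzeichen, metadata) for every <target>/{year}/{aktenzeichen}/."""
    for metadata_path in sorted(target.glob("*/*/metadata.json")):
        aktenzeichen = metadata_path.parent.name
        year = metadata_path.parent.parent.name
        with metadata_path.open(encoding="utf-8") as handle:
            metadata = json.load(handle)
        yield year, aktenzeichen, metadata


def list_pdfs(target: Path, year: str, aktenzeichen: str) -> List[str]:
    """Return the sorted PDF file names of one proceeding directory."""
    return sorted(path.name for path in (target / year / aktenzeichen).glob("*.pdf"))
```

For example, `for year, az, meta in iter_proceedings(Path("./mirror")): print(year, az, list_pdfs(Path("./mirror"), year, az))` prints one line per mirrored proceeding.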

Change detection is intentionally "dumb": the tool always writes the current state, and git diff in the mirror repository reveals what changed.
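
Since git does the change detection, a reviewer may want the changed files grouped by proceeding. The sketch below parses the output of `git status --porcelain --untracked-files=all` run inside the mirror repository. group_changes_by_proceeding is a hypothetical helper, not part of the package, and it assumes paths follow the {year}/{aktenzeichen}/{file} layout described above.

```python
from collections import defaultdict
from typing import Dict, List


def group_changes_by_proceeding(porcelain_output: str) -> Dict[str, List[str]]:
    """Group changed file names by Aktenzeichen from `git status --porcelain` output."""
    changes = defaultdict(list)
    for line in porcelain_output.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY <path>"
        path = line[3:]  # drop the two status characters and the separating space
        if " -> " in path:
            path = path.split(" -> ", 1)[1]  # renames: keep the new path
        path = path.strip('"')  # git quotes paths containing special characters
        parts = path.split("/")
        # Files directly under the root (e.g. index.json) have no proceeding.
        key = parts[1] if len(parts) >= 3 else "(top level)"
        changes[key].append(parts[-1])
    return dict(changes)
```

The status text can be obtained with `subprocess.run(["git", "status", "--porcelain", "--untracked-files=all"], capture_output=True, text=True, check=True).stdout`.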

Mirror repository

The scraper is designed to feed a separate mirror repository, Hochfrequenz/bnetza_bk6_mirror. A scheduled GitHub Action there will periodically run:

pip install bnetza_bk6_scraper
bnetza-bk6-scraper mirror --target .
git add -A && git commit -m "update BK6 mirror"

so that regulatory changes at BK6 become visible as reviewable git diffs and commit history. That Action is future work and does not live in this repository.
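
One detail to keep in mind for such a scheduled job: git commit exits with a non-zero status when there is nothing to commit, which would fail a run on days without changes. A possible guard, sketched here in Python under the assumption that the job may run a small script, is to commit only when the index differs from HEAD. commit_if_changed is a hypothetical helper; the run parameter exists only so the logic can be exercised without a real repository.

```python
import subprocess
from typing import Callable


def commit_if_changed(
    repo_dir: str,
    message: str = "update BK6 mirror",
    run: Callable = subprocess.run,
) -> bool:
    """Stage everything and commit only if something is staged. Returns True on commit."""
    run(["git", "add", "-A"], cwd=repo_dir, check=True)
    # `git diff --cached --quiet` exits with 0 when nothing is staged and 1 otherwise.
    staged = run(["git", "diff", "--cached", "--quiet"], cwd=repo_dir, check=False)
    if staged.returncode == 0:
        return False
    run(["git", "commit", "-m", message], cwd=repo_dir, check=True)
    return True
```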

WAF / browser User-Agent

The BNetzA website sits behind a Web Application Firewall that rejects non-browser clients by serving a 200 OK "The requested URL was rejected" page instead of the real content. To get through, the scraper sends browser-like User-Agent and Accept headers and treats the rejection page as a retryable error. No credentials or API keys are required.
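
The idea can be sketched with the standard library alone. This is not the scraper's actual code: the header values, the retry count, the back-off, and the names make_fetcher, fetch_with_retry and WafRejectedError are all assumptions chosen for illustration. The key point is that the HTTP status cannot be trusted, so the body is checked for the rejection text.

```python
import time
import urllib.request
from typing import Callable

REJECTION_MARKER = "The requested URL was rejected"

# Assumption: example browser-like headers; the scraper's real values may differ.
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:125.0) Gecko/20100101 Firefox/125.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


class WafRejectedError(Exception):
    """Raised when every attempt returned the WAF rejection page."""


def make_fetcher(url: str) -> Callable[[], str]:
    """Return a zero-argument function that downloads url with browser-like headers."""
    def fetch() -> str:
        request = urllib.request.Request(url, headers=BROWSER_HEADERS)
        with urllib.request.urlopen(request, timeout=30) as response:
            return response.read().decode("utf-8", errors="replace")
    return fetch


def fetch_with_retry(fetch: Callable[[], str], attempts: int = 3, delay_seconds: float = 1.0) -> str:
    """Call fetch until the body is not the rejection page, or attempts run out."""
    for attempt in range(1, attempts + 1):
        body = fetch()
        if REJECTION_MARKER not in body:
            return body
        if attempt < attempts:
            time.sleep(delay_seconds * attempt)  # simple linear back-off
    raise WafRejectedError(f"rejected by the WAF after {attempts} attempts")
```

A call then looks like `fetch_with_retry(make_fetcher(url))`, where url is the page to download. Passing the fetch function in as a parameter keeps the retry logic testable without network access.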

Contribute

This project uses tox for all quality gates. Create a one-shot development environment with everything installed:

tox -e dev

Individual gates: tox -e tests, tox -e linting, tox -e type_check, tox -e coverage, and tox -e spell_check. Run the full suite with tox.
