Scrapes documents of Bundesnetzagentur Beschlusskammer 6 into a structured, git-diffable mirror

bnetza_bk6_scraper

bnetza_bk6_scraper mirrors the documents published by the German Bundesnetzagentur (BNetzA) Beschlusskammer 6 (BK6) into a structured, git-diffable directory tree. BK6 regulates electricity network access and is a constant source of consultations, rulings (Festlegungen) and their attachments. Because the agency publishes these as loose PDFs on HTML pages with no changelog, tracking what changed and when is painful. This tool discovers every BK6 proceeding, downloads its PDFs and a normalized HTML snapshot of each phase page, and records structured metadata. Committing the output to git turns every regulatory update into a reviewable diff.

Installation

pip install bnetza_bk6_scraper

Usage

The package installs a single console command, bnetza-bk6-scraper, with a mirror subcommand:

bnetza-bk6-scraper mirror --target <dir> [--concurrency N] [--year YYYY] [-v]

Option	Default	Description
`--target`	(required)	Output directory (the mirror repository root).
`--concurrency`	`4`	Number of parallel HTTP fetches.
`--year`	(all)	Restrict the run to a single year, e.g. `2023`.
`-v`, `--verbose`	off	Enable debug logging.
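The `--concurrency` option caps the number of simultaneous HTTP fetches. A common way to implement such a cap in asyncio code is a semaphore, and the sketch below shows that pattern. It is an illustration only: `fetch_all` and `fake_fetch` are hypothetical names, the fake fetcher stands in for real HTTP requests, and we are assuming, not stating, that the package throttles this way.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple


async def fetch_all(
    urls: Iterable[str],
    fetch: Callable[[str], Awaitable[bytes]],
    concurrency: int = 4,
) -> Dict[str, bytes]:
    """Fetch every URL while keeping at most `concurrency` fetches in flight."""
    semaphore = asyncio.Semaphore(concurrency)

    async def bounded(url: str) -> Tuple[str, bytes]:
        # Each task waits here until one of the `concurrency` slots is free.
        async with semaphore:
            return url, await fetch(url)

    # gather() starts all tasks at once; the semaphore does the throttling.
    results = await asyncio.gather(*(bounded(u) for u in urls))
    return dict(results)


async def fake_fetch(url: str) -> bytes:
    """Hypothetical stand-in for a real HTTP GET; sleeps to simulate latency."""
    await asyncio.sleep(0.01)
    return url.encode()


pages = asyncio.run(
    fetch_all([f"https://example.invalid/{i}" for i in range(10)], fake_fetch, concurrency=4)
)
```

Raising the limit shortens a full run but puts more load on the BNetzA servers, which matters for a site that already filters clients.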

Example — mirror only the 2023 proceedings into ./mirror:

bnetza-bk6-scraper mirror --target ./mirror --year 2023 -v

Each run logs a summary line such as "run summary: 7 proceedings, 16 documents written, 0 failures".
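If a scheduled job should fail when downloads fail, the summary line can be parsed from the captured log output. A minimal sketch, assuming the line keeps exactly the wording shown above; `RunSummary` and `parse_run_summary` are hypothetical helpers, not part of the package.

```python
import re
from typing import NamedTuple, Optional


class RunSummary(NamedTuple):
    proceedings: int
    documents_written: int
    failures: int


# Modelled on the example summary line in this README; the real wording may differ.
# The optional "s" also accepts singular forms such as "1 failure".
_SUMMARY_RE = re.compile(
    r"run summary: (\d+) proceedings?, (\d+) documents? written, (\d+) failures?"
)


def parse_run_summary(log_text: str) -> Optional[RunSummary]:
    """Return the counts from the last summary line in log_text, or None if absent."""
    matches = _SUMMARY_RE.findall(log_text)
    if not matches:
        return None
    proceedings, written, failures = (int(n) for n in matches[-1])
    return RunSummary(proceedings, written, failures)


log = "INFO run summary: 7 proceedings, 16 documents written, 0 failures"
summary = parse_run_summary(log)
if summary is not None and summary.failures:
    # Exit non-zero so that a CI job is marked as failed.
    raise SystemExit(f"{summary.failures} document(s) failed")
```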

Scoped mirroring (Python API)

The mirror subcommand mirrors all proceedings (optionally restricted to one year with --year). To mirror only a curated subset (e.g. the electricity GPKE/WiM/MaBiS Prozessdokumente), use the Python API instead: give the scraper a list of seed pages to crawl and a list of predicates. Predicates use OR semantics: a document is downloaded if any predicate returns True for it. Each predicate receives a CandidateDocument.

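import asyncio
from bnetza_bk6_scraper import BnetzaBk6Scraper, CandidateDocument

# Seed page to crawl: the GPKE page in the BK6 section of the BNetzA website
GPKE = "https://www.bundesnetzagentur.de/DE/Beschlusskammern/BK06/BK6_83_Zug_Mess/831_gpke/gpke_node.html"

def is_prozessdokument_lesefassung(c: CandidateDocument) -> bool:
    # Keep only the consolidated reading versions ("Lesefassung") of GPKE, WiM or MaBiS documents
    name = c.filename.lower()
    return any(fw in name for fw in ("_gpke_", "_wim_", "_mabis_")) and "lesefassung" in name

# Crawl the seed pages and write every kept document below the current directory
asyncio.run(
    BnetzaBk6Scraper().mirror_seeds(
        target_dir=".", seeds=[GPKE], keep=[is_prozessdokument_lesefassung]
    )
)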

Documents that carry an Aktenzeichen are written under <year>/<aktenzeichen>/; documents without one (e.g. the PID-Liste in the Datenformate tree) go under _other/…. A root manifest.json lists the kept documents.
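Because predicates are plain functions, they can be checked offline before a crawl. The sketch below reproduces the OR semantics described above with a hypothetical `FakeCandidate` stand-in that models only the `filename` attribute used in the example; the real `CandidateDocument` may carry more fields, and `is_kept`, `is_pid_liste` and the sample filenames are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass(frozen=True)
class FakeCandidate:
    """Hypothetical stand-in for CandidateDocument: only `filename` is modelled."""

    filename: str


Predicate = Callable[[FakeCandidate], bool]


def is_kept(candidate: FakeCandidate, keep: Sequence[Predicate]) -> bool:
    # OR semantics: a single matching predicate is enough to keep the document.
    return any(predicate(candidate) for predicate in keep)


def is_prozessdokument_lesefassung(c: FakeCandidate) -> bool:
    # Same logic as the README example.
    name = c.filename.lower()
    return any(fw in name for fw in ("_gpke_", "_wim_", "_mabis_")) and "lesefassung" in name


def is_pid_liste(c: FakeCandidate) -> bool:
    # Hypothetical second predicate that also keeps the PID-Liste.
    name = c.filename.lower()
    return "pid" in name and "liste" in name


keep = [is_prozessdokument_lesefassung, is_pid_liste]
for filename in ("Anlage_GPKE_Lesefassung.pdf", "PID-Liste.pdf", "Beschluss_GPKE.pdf"):
    print(filename, is_kept(FakeCandidate(filename), keep))
```

Duck typing makes this work only as long as the real predicates touch nothing beyond `filename`.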

Output layout

Proceedings are written under <target>/{year}/{aktenzeichen}/, with a top-level index.json listing every mirrored proceeding:

<target>/
├── index.json                          # summary of all proceedings
└── 2023/
    └── BK6-23-241/
        ├── metadata.json               # structured proceeding metadata
        ├── BK6-23-241_beschluss.html   # normalized HTML snapshot of a phase page
        ├── BK6-23-241_beschluss_vom_07.05.26.pdf
        ├── BK6-23-241_bilarem.pdf
        └── BK6-23-241_anlage_bilarem.pdf
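The mapping from a document to its directory can be sketched as follows. This assumes that Aktenzeichen follow the `BK6-<yy>-<number>` pattern and that the year directory is derived from the two-digit `yy` part, as the `2023/BK6-23-241/` example suggests; the package may instead take the year from the BNetzA pages. `proceeding_dir` is a hypothetical helper, and the subpath below `_other/` is left out because it is not documented here.

```python
import re
from pathlib import Path
from typing import Optional

# Assumed Aktenzeichen shape, e.g. "BK6-23-241" -> BK6-<yy>-<number>.
_AZ_RE = re.compile(r"^BK6-(\d{2})-(\d+)$")


def proceeding_dir(target: Path, aktenzeichen: Optional[str]) -> Path:
    """Hypothetical mapping from a document's Aktenzeichen to its mirror directory."""
    if aktenzeichen:
        match = _AZ_RE.match(aktenzeichen)
        if match:
            year = 2000 + int(match.group(1))  # assumes all years are 20yy
            return target / str(year) / aktenzeichen
    # No recognizable Aktenzeichen (e.g. the PID-Liste): fall back to _other.
    return target / "_other"
```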

- metadata.json captures the Aktenzeichen, year, title, status, Stand (last-modified date), any submission deadline (Frist), the phase pages, and one entry per document (title, type, source URL, filename).
- The normalized *.html files are trimmed, stable snapshots of the source phase pages, so that content changes surface as small diffs.
- The PDFs are the proceeding's documents, downloaded verbatim.
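One way to make page snapshots diff-friendly is to drop volatile parts such as scripts and to collapse whitespace, so that only real content changes move lines. The sketch below uses the standard library's `html.parser`. It is a simplification that reduces a page to its visible text, one chunk per line, whereas the scraper keeps trimmed HTML; `normalize_html` is a hypothetical name.

```python
from html.parser import HTMLParser
from typing import List


class _TextExtractor(HTMLParser):
    """Collects visible text chunks, skipping <script> and <style> contents."""

    SKIP = {"script", "style"}

    def __init__(self) -> None:
        super().__init__()
        self._skip_depth = 0
        self.chunks: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            text = " ".join(data.split())  # collapse runs of whitespace
            if text:
                self.chunks.append(text)


def normalize_html(html: str) -> str:
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    # One chunk per line plus a trailing newline keeps git diffs line-oriented.
    return "\n".join(parser.chunks) + "\n"
```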

Change detection is intentionally "dumb": the tool always writes the current state, and git diff in the mirror repository reveals what changed.
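This approach only works if unchanged data is written byte for byte identically on every run. For JSON files such as metadata.json and index.json, a stable key order, fixed indentation and a trailing newline are enough. A minimal sketch with a hypothetical `write_json_stable` helper; whether the package serializes exactly like this is an assumption.

```python
import json
from pathlib import Path
from typing import Any


def write_json_stable(path: Path, data: Any) -> bool:
    """Always write `data` as canonical JSON; return True if the content changed."""
    # sort_keys makes the output independent of dict insertion order.
    text = json.dumps(data, indent=2, sort_keys=True, ensure_ascii=False) + "\n"
    changed = not path.exists() or path.read_text(encoding="utf-8") != text
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(text, encoding="utf-8")  # always write the current state
    return changed
```

With output like this, rerunning the scraper on an unchanged BK6 site leaves `git status` clean.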

Mirror repository

The scraper is designed to feed a separate mirror repository, Hochfrequenz/bnetza_bk6_mirror. A scheduled GitHub Action there will periodically:

pip install bnetza_bk6_scraper
bnetza-bk6-scraper mirror --target .
git add -A && (git diff --cached --quiet || git commit -m "update BK6 mirror")

so that regulatory changes at BK6 become visible as reviewable git diffs and commit history. That Action is future work and does not live in this repository.

WAF / browser User-Agent

The BNetzA website sits behind a Web Application Firewall that rejects non-browser clients by serving a 200 OK "The requested URL was rejected" page instead of the real content. To get through, the scraper sends browser-like User-Agent and Accept headers and treats the rejection page as a retryable error. No credentials or API keys are required.
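The detection-and-retry logic can be sketched independently of any HTTP library by passing in the fetch function. Assumptions: the header values are illustrative examples rather than the ones the package sends, `fetch_with_retry`, `check_not_rejected` and `WafRejected` are hypothetical names, and the retry count and exponential backoff are choices made for this sketch.

```python
import asyncio
from typing import Awaitable, Callable, Dict, Tuple

# Illustrative browser-like headers (assumed values, not the package's own).
BROWSER_HEADERS: Dict[str, str] = {
    "User-Agent": (
        "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
        "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
    ),
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}

# Text of the rejection page quoted in this README.
WAF_MARKER = "The requested URL was rejected"

# A fetch takes (url, headers) and returns (status, body).
Fetch = Callable[[str, Dict[str, str]], Awaitable[Tuple[int, str]]]


class WafRejected(Exception):
    """The WAF served its rejection page instead of the real content."""


def check_not_rejected(body: str) -> str:
    # The WAF answers with 200 OK, so only the body reveals a rejection.
    if WAF_MARKER in body:
        raise WafRejected("WAF rejection page received")
    return body


async def fetch_with_retry(
    fetch: Fetch, url: str, attempts: int = 3, base_delay: float = 1.0
) -> str:
    """Call `fetch` until it returns real content, backing off exponentially."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        # A real client would also raise on non-2xx statuses; ignored here.
        _status, body = await fetch(url, BROWSER_HEADERS)
        try:
            return check_not_rejected(body)
        except WafRejected:
            if attempt == attempts:
                raise  # give up: propagate the last rejection
            await asyncio.sleep(base_delay * 2 ** (attempt - 1))  # 1 s, 2 s, 4 s, ...
    raise RuntimeError("unreachable")
```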

Contribute

This project uses tox for all quality gates. Create a one-shot development environment with everything installed:

tox -e dev

Individual gates: tox -e tests, tox -e linting, tox -e type_check, tox -e coverage, and tox -e spell_check. Run the full suite with tox.

Maintainers

hochfrequenz unlimitedfox

Release history

- 0.1.0 (this version): Jul 4, 2026
- 0.0.3: Jul 3, 2026
- 0.0.2: Jul 3, 2026
- 0.0.1: Jul 3, 2026

Download files

Source Distribution

bnetza_bk6_scraper-0.1.0.tar.gz (51.4 kB)

Uploaded Jul 4, 2026 Source

Built Distribution

bnetza_bk6_scraper-0.1.0-py3-none-any.whl (17.5 kB)

Uploaded Jul 4, 2026 Python 3

File details

Details for the file bnetza_bk6_scraper-0.1.0.tar.gz.

File metadata

Download URL: bnetza_bk6_scraper-0.1.0.tar.gz
Upload date: Jul 4, 2026
Size: 51.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for bnetza_bk6_scraper-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`463ab8c8e45d61f1acc87497cad740c36fab24d66f12c9547c0d3f40363fdcf9`
MD5	`1a8a92c02f3a218cad3b8941305d021c`
BLAKE2b-256	`f49b8ecab753cbf5e323436c177bf0d6d19aeca2f38f72b7e48831504636135b`
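To check a downloaded sdist against the SHA256 digest listed in this table, hash the file in chunks with `hashlib`. A minimal sketch; `sha256_of` and `verify` are hypothetical helper names, and the expected digest is copied from the table.

```python
import hashlib
from pathlib import Path

# SHA256 digest of bnetza_bk6_scraper-0.1.0.tar.gz, copied from the table.
EXPECTED_SHA256 = "463ab8c8e45d61f1acc87497cad740c36fab24d66f12c9547c0d3f40363fdcf9"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    # hexdigest() is lowercase, so normalize the expected value too.
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check during installation: add one `--hash=sha256:<digest>` option per distribution file (sdist and wheel) to the requirement line and run `pip install --require-hashes -r requirements.txt`.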

Provenance

The following attestation bundles were made for bnetza_bk6_scraper-0.1.0.tar.gz:

Publisher: python-publish.yml on Hochfrequenz/bnetza_bk6_scraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bnetza_bk6_scraper-0.1.0.tar.gz
- Subject digest: 463ab8c8e45d61f1acc87497cad740c36fab24d66f12c9547c0d3f40363fdcf9
- Sigstore transparency entry: 2069468051
- Sigstore integration time: Jul 4, 2026
Source repository:
- Permalink: Hochfrequenz/bnetza_bk6_scraper@8213cfdd8dde16ac99a3a646c17de6480ddde262
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Hochfrequenz
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@8213cfdd8dde16ac99a3a646c17de6480ddde262
- Trigger Event: release

File details

Details for the file bnetza_bk6_scraper-0.1.0-py3-none-any.whl.

File metadata

Download URL: bnetza_bk6_scraper-0.1.0-py3-none-any.whl
Upload date: Jul 4, 2026
Size: 17.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for bnetza_bk6_scraper-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`db0e3a7bd150aff53aa38a8a5a71e0a027fab5782522ae4a779a63ae7e576d71`
MD5	`42006c7e9b0b2351adabb5708cc2e397`
BLAKE2b-256	`2cef1d5516a4417083e1ffc9aa8c8f10077be13db72bf037d322bc7aa5437329`

Provenance

The following attestation bundles were made for bnetza_bk6_scraper-0.1.0-py3-none-any.whl:

Publisher: python-publish.yml on Hochfrequenz/bnetza_bk6_scraper

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: bnetza_bk6_scraper-0.1.0-py3-none-any.whl
- Subject digest: db0e3a7bd150aff53aa38a8a5a71e0a027fab5782522ae4a779a63ae7e576d71
- Sigstore transparency entry: 2069468702
- Sigstore integration time: Jul 4, 2026
Source repository:
- Permalink: Hochfrequenz/bnetza_bk6_scraper@8213cfdd8dde16ac99a3a646c17de6480ddde262
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/Hochfrequenz
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@8213cfdd8dde16ac99a3a646c17de6480ddde262
- Trigger Event: release
