Scrapes documents of Bundesnetzagentur Beschlusskammer 6 into a structured, git-diffable mirror
bnetza_bk6_scraper
bnetza_bk6_scraper mirrors the documents published by the German
Bundesnetzagentur (BNetzA) Beschlusskammer 6 (BK6) into a structured,
git-diffable directory tree. BK6 regulates electricity network access and is a
constant source of consultations, rulings (Festlegungen) and their attachments.
Because the agency publishes these as loose PDFs on HTML pages with no changelog,
tracking what changed and when is painful. This tool discovers every BK6
proceeding, downloads its PDFs and a normalized HTML snapshot of each phase page,
and records structured metadata. Committing the output to git turns every
regulatory update into a reviewable diff.
Installation
pip install bnetza_bk6_scraper
Usage
The package installs a single console command, bnetza-bk6-scraper, with a
mirror subcommand:
bnetza-bk6-scraper mirror --target <dir> [--concurrency N] [--year YYYY] [-v]
| Option | Default | Description |
|---|---|---|
| --target | (required) | Output directory (the mirror repository root). |
| --concurrency | 4 | Number of parallel HTTP fetches. |
| --year | (all) | Restrict the run to a single year, e.g. 2023. |
| -v, --verbose | off | Enable debug logging. |
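To make the option semantics concrete, here is a minimal argparse sketch that mirrors the documented flags and defaults. It is an illustration only: build_parser is a hypothetical name, and the package's real command-line code may be structured differently.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the documented options (not the package's own code)."""
    parser = argparse.ArgumentParser(prog="bnetza-bk6-scraper")
    subparsers = parser.add_subparsers(dest="command", required=True)

    mirror = subparsers.add_parser("mirror", help="mirror BK6 proceedings")
    mirror.add_argument("--target", required=True, help="output directory")
    mirror.add_argument("--concurrency", type=int, default=4, help="parallel HTTP fetches")
    mirror.add_argument("--year", type=int, default=None, help="restrict to one year")
    mirror.add_argument("-v", "--verbose", action="store_true", help="debug logging")
    return parser


args = build_parser().parse_args(["mirror", "--target", "./mirror", "--year", "2023", "-v"])
print(args.target, args.concurrency, args.year, args.verbose)  # ./mirror 4 2023 True
```

Leaving --year at None stands in for the "(all)" default in the table.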
Example: mirror only the 2023 proceedings into ./mirror:
bnetza-bk6-scraper mirror --target ./mirror --year 2023 -v
Each run logs a summary such as
run summary: 7 proceedings, 16 documents written, 0 failures.
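If a wrapper script needs to react to failures, the summary line can be parsed from the captured log output. The sketch below assumes the wording shown above stays stable; the names parse_summary and RunSummary are hypothetical and not part of the package.

```python
import re
from typing import NamedTuple, Optional

# Pattern derived from the sample line above; the optional "s" tolerates
# singular forms, which is an assumption about how the tool pluralizes.
SUMMARY_RE = re.compile(
    r"run summary: (\d+) proceedings?, (\d+) documents? written, (\d+) failures?"
)


class RunSummary(NamedTuple):
    proceedings: int
    documents: int
    failures: int


def parse_summary(log_text: str) -> Optional[RunSummary]:
    """Return the counts from the first summary line found, or None if absent."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    return RunSummary(*(int(group) for group in match.groups()))
```

A scheduled job could, for example, exit non-zero whenever the parsed failures count is greater than zero.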
Output layout
Proceedings are written under <target>/{year}/{aktenzeichen}/, with a top-level
index.json listing every mirrored proceeding:
<target>/
├── index.json                          # summary of all proceedings
└── 2023/
    └── BK6-23-241/
        ├── metadata.json               # structured proceeding metadata
        ├── BK6-23-241_beschluss.html   # normalized HTML snapshot of a phase page
        ├── BK6-23-241_beschluss_vom_07.05.26.pdf
        ├── BK6-23-241_bilarem.pdf
        └── BK6-23-241_anlage_bilarem.pdf
- metadata.json captures the Aktenzeichen, year, title, status, Stand (last-modified date), any submission deadline (Frist), the phase pages, and one entry per document (title, type, source URL, filename).
- The normalized *.html files are trimmed, stable snapshots of the source phase pages so that content changes surface as small diffs.
- The PDFs are the proceeding's documents, downloaded verbatim.
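Because the layout is a plain directory tree, a mirror can be inspected without knowing the JSON schema. The following sketch relies only on the {year}/{aktenzeichen}/ convention described above; iter_proceedings is a hypothetical helper, and it deliberately does not read metadata.json, since the exact key names are not documented here.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def iter_proceedings(target: Path) -> Iterator[Tuple[str, str, List[str]]]:
    """Yield (year, aktenzeichen, pdf file names) for each mirrored proceeding."""
    # Only all-digit directory names count as years, so .git and similar are skipped.
    year_dirs = sorted(p for p in target.iterdir() if p.is_dir() and p.name.isdigit())
    for year_dir in year_dirs:
        for proc_dir in sorted(p for p in year_dir.iterdir() if p.is_dir()):
            pdfs = sorted(f.name for f in proc_dir.glob("*.pdf"))
            yield year_dir.name, proc_dir.name, pdfs


if __name__ == "__main__":
    for year, aktenzeichen, pdfs in iter_proceedings(Path("./mirror")):
        print(year, aktenzeichen, len(pdfs), "PDFs")
```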
Change detection is intentionally "dumb": the tool always writes the current
state, and git diff in the mirror repository reveals what changed.
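Since git does the change detection, a reviewer may want pending changes grouped by proceeding. The sketch below is one way to do that: it takes the text produced by `git status --porcelain --untracked-files=all` (status code in the first two columns, path from the fourth column on) and groups entries by their {year}/{aktenzeichen} prefix. group_changes is a hypothetical helper, and quoted or renamed paths are not handled.

```python
from collections import defaultdict
from typing import Dict, List


def group_changes(porcelain: str) -> Dict[str, List[str]]:
    """Group `git status --porcelain` lines by proceeding directory."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for line in porcelain.splitlines():
        if len(line) < 4:
            continue  # too short to hold "XY <path>"
        status, path = line[:2].strip(), line[3:]
        parts = path.split("/")
        # Paths such as 2023/BK6-23-241/metadata.json belong to a proceeding;
        # everything else (e.g. index.json) is reported as top level.
        if len(parts) >= 3 and parts[0].isdigit():
            key = "/".join(parts[:2])
        else:
            key = "(top level)"
        grouped[key].append(f"{status} {parts[-1]}")
    return dict(grouped)
```

The output of `subprocess.run(["git", "status", "--porcelain", "--untracked-files=all"], capture_output=True, text=True).stdout`, run inside the mirror repository, can be passed to this function directly.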
Mirror repository
The scraper is designed to feed a separate mirror repository,
Hochfrequenz/bnetza_bk6_mirror. A scheduled GitHub Action there will
periodically:
pip install bnetza_bk6_scraper
bnetza-bk6-scraper mirror --target .
git add -A && git commit -m "update BK6 mirror"
so that regulatory changes at BK6 become visible as reviewable git diffs and commit history. That Action is future work and does not live in this repository.
WAF / browser User-Agent
The BNetzA website sits behind a Web Application Firewall that rejects
non-browser clients by serving a 200 OK "The requested URL was rejected" page
instead of the real content. To get through, the scraper sends browser-like
User-Agent and Accept headers and treats the rejection page as a retryable
error. No credentials or API keys are required.
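The behaviour described above can be sketched with the standard library alone. This is an illustration under assumptions, not the scraper's actual code: the header values, the retry count, the backoff, and the names http_get, fetch_with_retry and WafRejected are all hypothetical. Only the rejection text is taken from the description above.

```python
import time
import urllib.request
from typing import Callable

# Marker text of the WAF page, as quoted above; it arrives with status 200,
# so the body has to be inspected rather than the status code.
REJECTION_MARKER = "The requested URL was rejected"

# Illustrative browser-like headers (assumed values).
BROWSER_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8",
}


class WafRejected(Exception):
    """Raised when every attempt returned the WAF rejection page."""


def http_get(url: str) -> str:
    """Fetch a page as text while sending browser-like headers."""
    request = urllib.request.Request(url, headers=BROWSER_HEADERS)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8", errors="replace")


def fetch_with_retry(
    fetch: Callable[[str], str], url: str, attempts: int = 3, delay: float = 1.0
) -> str:
    """Call fetch(url), retrying while the body is the WAF rejection page."""
    for attempt in range(1, attempts + 1):
        body = fetch(url)
        if REJECTION_MARKER not in body:
            return body
        if attempt < attempts:
            time.sleep(delay * attempt)  # simple linear backoff between attempts
    raise WafRejected(f"{url}: rejected on all {attempts} attempts")
```

Passing the fetch function as a parameter keeps the retry logic testable without network access; in real use it would be called as fetch_with_retry(http_get, url).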
Contribute
This project uses tox for all quality gates. Create a one-shot development environment with everything installed:
tox -e dev
Individual gates: tox -e tests, tox -e linting, tox -e type_check,
tox -e coverage, and tox -e spell_check. Run the full suite with tox.