a scraper to mirror edi-energy.de
edi-energy.de scraper
The Python package edi_energy_scraper provides easy-to-use methods to mirror the website edi-energy.de.
Rationale / Why?
If you'd like to be informed about new regulations or data formats published on edi-energy.de, you can either
- visit the site every day and hope that you spot the changes (if this is your favourite hobby),
- or automate the task.
This repository helps you with the latter. It allows you to create an up-to-date copy of edi-energy.de on your local
computer. Unlike mirroring the files with wget or curl, you get a clean and intuitive directory structure.
From there you can, for example, commit the files into a VCS or scrape the PDF/Word files for later use.
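As an alternative to a VCS, changes between two runs can also be detected by comparing file hashes. The following is a minimal sketch using only the standard library; the helper names `snapshot` and `diff_snapshots` are hypothetical and not part of edi_energy_scraper, and the sketch assumes the mirror directory is small enough to read each file fully into memory.

```python
import hashlib
from pathlib import Path
from typing import Dict, List


def snapshot(mirror_root: Path) -> Dict[str, str]:
    """Map each file path (relative to mirror_root) to its SHA-256 hex digest."""
    digests = {}
    for path in sorted(mirror_root.rglob("*")):
        if path.is_file():
            relative = path.relative_to(mirror_root).as_posix()
            digests[relative] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def diff_snapshots(old: Dict[str, str], new: Dict[str, str]) -> Dict[str, List[str]]:
    """Report which files were added, removed or changed between two snapshots."""
    return {
        "added": sorted(new.keys() - old.keys()),
        "removed": sorted(old.keys() - new.keys()),
        "changed": sorted(
            name for name in old.keys() & new.keys() if old[name] != new[name]
        ),
    }
```

One possible workflow is to take a snapshot before and after each mirror run and report any non-empty entry of the diff.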
We're all hoping for the day of true digitization on which this repository will become obsolete.
How to use the Package (as a user)
Install via pip:
pip install edi_energy_scraper
Create a directory in which you'd like to save the mirrored data:
mkdir edi_energy_de
Then import it and start the download:
from edi_energy_scraper import EdiEnergyScraper
scraper = EdiEnergyScraper(path_to_mirror_directory="edi_energy_de")
scraper.mirror()
This creates a directory structure:
|- your_script_cwd.py
|- edi_energy_de
   |- past (contains archived files)
      |- ahb.pdf
      |- ahb.docx
      |- ...
   |- current (contains files valid as of today)
      |- mig.pdf
      |- mig.docx
      |- ...
   |- future (contains files valid in the future)
      |- allgemeine_festlegungen.pdf
      |- schema.xsd
      |- ...
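To get a quick overview of what a mirror run produced, the three validity folders can be inspected with the standard library. This is a minimal sketch, assuming the folder names past, current and future from the structure above and that files sit directly inside them; `count_by_extension` is a hypothetical helper, not part of the package.

```python
from collections import Counter
from pathlib import Path
from typing import Dict

# Folder names taken from the directory structure described in this README.
VALIDITY_FOLDERS = ("past", "current", "future")


def count_by_extension(mirror_root: Path) -> Dict[str, Counter]:
    """Count files per (lower-cased) extension in each validity folder that exists."""
    inventory = {}
    for folder in VALIDITY_FOLDERS:
        folder_path = mirror_root / folder
        if folder_path.is_dir():
            inventory[folder] = Counter(
                p.suffix.lower() for p in folder_path.iterdir() if p.is_file()
            )
    return inventory
```

Calling `count_by_extension(Path("edi_energy_de"))` after a mirror run returns one `Counter` per existing folder, keyed by extension such as ".pdf" or ".docx".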
To avoid overloading the server (DoS), the script by default waits a random time between 1 and 10 seconds before each file download. You can override this behaviour by providing your own "slow down" method:
from edi_energy_scraper import EdiEnergyScraper
from time import sleep
scraper = EdiEnergyScraper(
    path_to_mirror_directory="edi_energy_de",
    dos_waiter=lambda: sleep(0),  # disable DOS protection
)
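Instead of disabling the delay entirely, a gentler option is to shorten it. The sketch below builds such a waiter; it assumes, based on the lambda example above, that dos_waiter accepts any callable taking no arguments. `make_dos_waiter` is a hypothetical helper and not part of edi_energy_scraper.

```python
import random
import time
from typing import Callable


def make_dos_waiter(min_seconds: float, max_seconds: float) -> Callable[[], None]:
    """Build a zero-argument callable that sleeps a random time within the given bounds."""
    if not 0 <= min_seconds <= max_seconds:
        raise ValueError("expected 0 <= min_seconds <= max_seconds")

    def wait() -> None:
        time.sleep(random.uniform(min_seconds, max_seconds))

    return wait


# Usage (assumes edi_energy_scraper is installed; shown as comments only):
# from edi_energy_scraper import EdiEnergyScraper
# scraper = EdiEnergyScraper(
#     path_to_mirror_directory="edi_energy_de",
#     dos_waiter=make_dos_waiter(0.5, 2.0),
# )
# scraper.mirror()
```

Keeping some delay is more considerate towards the server than `sleep(0)`, which removes the protection altogether.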
How to use this Repository on Your Machine (for development)
Please follow the instructions in our Python Template Repository. For further information, see the Tox Repository.
Contribute
You are very welcome to contribute to this repository by opening a pull request against the main branch.
Hashes for edi_energy_scraper-0.0.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | b773d8ad4d9f21afaa3300661c8d2baece36ae50f0e324e462feb549b0d0b89b
MD5 | 3b80886eeb622a1f2c451961358fe456
BLAKE2b-256 | 4ca207db3de49211de12f9dbc0f5ac5e981c0e66de174f1dcccfa87e9344a9c2