Scraper for ALLRIS
Project description
This scraper offers both public and private scraping. The latter requires your username and password and performs the following tasks for you:
login
download of all agendas and motions related to upcoming meetings of committees and plenary sessions
Only meetings to which you have already been formally invited through ALLRIS are considered.
IMPORTANT: All districts are supported, but official committee abbreviations currently work only for Eimsbüttel.
Use the public scraper with care: it requests every accessible page for an entire month. Currently, the month is hardcoded to June 2020, and the public scraper can only be run in a GUI environment.
Requirements
Python 3.7+
Firefox installed
geckodriver binary in PATH
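The requirements above can be checked before installation with a short script. This is only a sketch, not part of the package: the function name `check_requirements` is made up for illustration, and it assumes the Firefox executable is called `firefox` and is on PATH, which may not hold on your system (the config file lets you point to the binary explicitly).

```python
import shutil
import sys


def check_requirements(min_python=(3, 7), executables=("firefox", "geckodriver")):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    # Compare only (major, minor) against the required minimum version.
    if tuple(sys.version_info[:2]) < tuple(min_python):
        problems.append(
            "Python %d.%d+ is required, found %d.%d"
            % (tuple(min_python) + tuple(sys.version_info[:2]))
        )
    for name in executables:
        # shutil.which returns None when the executable is not found on PATH.
        if shutil.which(name) is None:
            problems.append(f"{name} not found in PATH")
    return problems
```

Calling `check_requirements()` and printing each returned entry shows what is still missing.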
Initial setup
Install ALLRIS scraper: pip install twomartens.allrisscraper (you need Python 3.7+)
Run ALLRIS scraper a first time: tm-allrisscraper (creates a config ini file in your current working directory)
Fill out the config file with your login credentials and an absolute path on your system where the downloaded PDFs will be stored
Configuration
[Default]
; possible values for district: Altona, Bergedorf, Eimsbüttel, Hamburg-Nord,
; Hamburg-Mitte, Harburg, Wandsbek
district = Eimsbüttel
; if you are not from Eimsbüttel your domain ending will differ
username = max.mustermann@eimsbuettel.de
; password is stored in clear text, therefore the ini file should have the most
; restrictive read permissions
password = VerySecurePassword
; location for storage of PDFs (trailing slash is IMPORTANT)
pdflocation = /path/to/storage/of/PDFs/
; location of the firefox binary
firefoxBinary = /path/to/firefox.exe
; location of the geckodriver binary
geckodriver = /path/to/geckodriver
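The comments in the config file imply a few checks that are easy to get wrong: the district must be one of the listed values, the PDF location needs its trailing slash, and the file should be readable by its owner only because the password is stored in clear text. A minimal sketch of these checks with the standard library follows. The helper names `check_config` and `restrict_permissions` are hypothetical and not part of the scraper, and the path to the ini file has to be supplied by you, since its file name is not stated here.

```python
import configparser
import os
import stat

# Values taken from the comment at the top of the config file.
DISTRICTS = {
    "Altona", "Bergedorf", "Eimsbüttel", "Hamburg-Nord",
    "Hamburg-Mitte", "Harburg", "Wandsbek",
}


def check_config(path):
    """Return a list of problems found in the ini file at `path`."""
    parser = configparser.ConfigParser()
    with open(path, encoding="utf-8") as handle:
        parser.read_file(handle)
    if not parser.has_section("Default"):
        return ["missing [Default] section"]
    section = parser["Default"]
    problems = []
    if section.get("district") not in DISTRICTS:
        problems.append("district is not one of the supported values")
    for key in ("username", "password"):
        if not section.get(key):
            problems.append(f"{key} is empty")
    if not section.get("pdflocation", fallback="").endswith("/"):
        problems.append("pdflocation must end with a trailing slash")
    return problems


def restrict_permissions(path):
    """Make the file readable and writable by its owner only (POSIX systems)."""
    os.chmod(path, stat.S_IRUSR | stat.S_IWUSR)
```

On Windows, `os.chmod` can only toggle the read-only flag, so the permission step there would need a different mechanism.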
Usage after initial setup
Run ALLRIS scraper: tm-allrisscraper (takes a few seconds to finish)
In the specified location for download you will find the following structure:
YYYY-MM-DD_Abbreviation of committee or plenary session/ (one directory for each meeting)
files inside the directory: Einladung.pdf (contains invitation), Mappe.pdf (contains all motions in one document), and Tagesordnung.pdf (agenda)
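Given the layout described above, a downloaded tree can be inspected with a few lines of Python. This is an illustrative sketch only: `list_meetings` is a made-up helper, and it assumes that every meeting directory name starts with an ISO date followed by an underscore and that the three PDF names are exactly as listed.

```python
from datetime import date
from pathlib import Path

EXPECTED_FILES = ("Einladung.pdf", "Mappe.pdf", "Tagesordnung.pdf")


def list_meetings(pdf_location):
    """Yield (meeting_date, abbreviation, missing_files) per meeting directory."""
    for directory in sorted(Path(pdf_location).iterdir()):
        if not directory.is_dir():
            continue
        date_part, separator, abbreviation = directory.name.partition("_")
        if not separator:
            continue  # name is not of the form YYYY-MM-DD_Abbreviation
        try:
            meeting_date = date.fromisoformat(date_part)
        except ValueError:
            continue  # prefix is not a valid date, so skip this directory
        missing = [
            name for name in EXPECTED_FILES if not (directory / name).is_file()
        ]
        yield meeting_date, abbreviation, missing
```

Iterating over `list_meetings("/path/to/storage/of/PDFs/")` then shows, for each meeting, which of the three files (if any) was not downloaded.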
File details
Details for the file twomartens.allrisscraper-0.5.5.tar.gz.
File metadata
- Download URL: twomartens.allrisscraper-0.5.5.tar.gz
- Upload date:
- Size: 20.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 6e858d82eab40b74424ef57548f248833653ad0ba810c466f56544b3003253cd
MD5 | 57f8befae4fd6e15c4aa37eb867169b5
BLAKE2b-256 | 761136d47a9d779cb07069a755c956a5a85c0587c47bbd37168fbee5701da19b
File details
Details for the file twomartens.allrisscraper-0.5.5-py3-none-any.whl.
File metadata
- Download URL: twomartens.allrisscraper-0.5.5-py3-none-any.whl
- Upload date:
- Size: 21.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/44.0.0 requests-toolbelt/0.9.1 tqdm/4.56.0 CPython/3.8.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | b802f7156f848450ba459d0ed0b7971c7d0a953630f6929db268b4c015e5f2ed
MD5 | 858e61852201143f4fab243c99f47f97
BLAKE2b-256 | 93d7c067e4f00bdc79a2c324854bfdaee7930028b294d54a1c81b0fb31602f70
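A manually downloaded file can be compared against the SHA256 digests listed above. The following is a small sketch using only `hashlib`; the helper names are hypothetical, and the path to the downloaded file is whatever you saved it as.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with a published one, ignoring case and spaces."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `matches_published_hash("twomartens.allrisscraper-0.5.5.tar.gz", "6e858d82eab40b74424ef57548f248833653ad0ba810c466f56544b3003253cd")` should return True for an intact copy of the source distribution.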