Parse the PDF-to-XML converted lobby list of the German Bundestag
Use pdftohtml to convert the PDF into an XML file:
pdftohtml -xml input.pdf output.xml
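If the conversion should run from a Python script rather than the shell, the command above could be wrapped roughly as follows. This is only a sketch: the helper names `build_pdftohtml_command` and `convert_pdf_to_xml` are hypothetical and not part of this project, and it assumes `pdftohtml` is installed and on the `PATH`.

```python
import shutil
import subprocess


def build_pdftohtml_command(pdf_path, xml_path):
    """Return the argument list for the pdftohtml call shown above.

    Hypothetical helper, not part of this project.
    """
    return ["pdftohtml", "-xml", pdf_path, xml_path]


def convert_pdf_to_xml(pdf_path, xml_path):
    """Run pdftohtml and raise if it is missing or exits with an error."""
    if shutil.which("pdftohtml") is None:
        raise FileNotFoundError("pdftohtml was not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    subprocess.run(build_pdftohtml_command(pdf_path, xml_path), check=True)
```

Calling `convert_pdf_to_xml("input.pdf", "output.xml")` would then be equivalent to the shell command above.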
Then run the extractor, passing the first and last relevant page numbers as arguments, to convert the XML on standard input into parsed JSON on standard output:
python extract_lobby.py 4 690 < lobbylist.xml > lobbylist.json
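To illustrate what the two page-number arguments select, here is a minimal sketch of reading `pdftohtml -xml` output and keeping only the text inside a page range. It is not the logic of `extract_lobby.py`; the function name `texts_in_page_range` and the sample data are made up for illustration. It assumes the usual pdftohtml layout of `<page number="…">` elements containing `<text top="…" left="…">` children with integer coordinates.

```python
import xml.etree.ElementTree as ET


def texts_in_page_range(xml_source, first_page, last_page):
    """Yield (page, top, left, text) for every non-empty <text> element
    on pages first_page..last_page (inclusive).

    Hypothetical helper for illustration, not taken from extract_lobby.py.
    """
    root = ET.fromstring(xml_source)
    for page in root.iter("page"):
        number = int(page.get("number"))
        if not first_page <= number <= last_page:
            continue
        for node in page.iter("text"):
            # itertext() also collects text nested in <b> or <i> tags.
            content = "".join(node.itertext()).strip()
            if content:
                yield number, int(node.get("top")), int(node.get("left")), content


# Made-up sample in the assumed pdftohtml XML layout.
SAMPLE = """<pdf2xml>
<page number="3" position="absolute" top="0" left="0" height="1262" width="892">
<text top="10" left="20" width="50" height="12" font="0">Cover page</text>
</page>
<page number="4" position="absolute" top="0" left="0" height="1262" width="892">
<text top="100" left="50" width="10" height="12" font="1"><b>1</b></text>
<text top="120" left="50" width="200" height="12" font="0">Example Association</text>
</page>
</pdf2xml>"""

rows = list(texts_in_page_range(SAMPLE, 4, 690))
```

With the sample above, only the two text elements on page 4 are kept; page 3 lies outside the requested range and is skipped.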
Here is the [extracted JSON (as of 15 June 2012)](http://stefanwehrmeyer.com/projects/verbaendeliste/20120615.json).
License: MIT License
Hashes for verbaendeliste-bundestag-0.1.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 8be6cd3c9afe8a097e2ec9f42b45163b257cdb01a129fe84efbe7d5642989f3d
MD5 | c5a8f33a6de4b4718094a42cea20c97e
BLAKE2b-256 | 933b30c36f7c73c7e8e7a6ae9c36fbe771b0f7192173de1323f2faa1fd1a86da
Hashes for verbaendeliste_bundestag-0.1.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7f03ba7d810b2f70759afb85168045a5458814f740b6876bf554af9c4f2570f2
MD5 | c4cbfbb7be7f58a195a726f64d9e4a50
BLAKE2b-256 | d06539a2d87598578f6be3c62af800dd0ae97c606ab9a7787632f51f123741d1