Extract amendments from European Parliament docx files
EuroParl-Amendment-Extract
This is a simple script that converts amendments from the European Parliament (EP) to JSON. It follows the format defined in the dataset at https://zenodo.org/record/4709248#.YXesJS8itqs, published for the article War of Words: https://github.com/indy-lab/war-of-words
Prerequisites
Build the MEPs dataset with the script:
python3 meps.py
Debugger
To run the debugger, run 'streamlit run diff_visualizer.py'. It runs the am labeler on the ep8 dataset and visualizes the first error it comes across.
Sequence matcher update
diff.py now contains both extract_opcodes and extract_opcodes_v2. The v2 variant merges consecutive 'replace' operations, as well as 'delete' operations followed by a 'replace'. The script now also reports an accuracy metric at the end, which gives a rough estimate of the am labeler's performance. The metric penalizes the algorithm for each edit it gets wrong, in proportion to the length of the text attributed to that edit. With this metric we observed high accuracy (over 99%) on the ep8 dataset.
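The merging rule and the length-weighted metric can be sketched as follows. This is a minimal illustration, not the code in diff.py. It assumes opcodes are tuples in the (tag, i1, i2, j1, j2) format returned by difflib.SequenceMatcher.get_opcodes(), and that the accuracy is one minus the share of text attributed to wrong edits. The names merge_opcodes and length_weighted_accuracy are hypothetical.

```python
def merge_opcodes(opcodes):
    """Merge consecutive 'replace' ops, and a 'delete' directly followed
    by a 'replace', into a single 'replace' (hypothetical helper)."""
    merged = []
    for tag, i1, i2, j1, j2 in opcodes:
        if merged:
            ptag, pi1, pi2, pj1, pj2 = merged[-1]
            # Only merge spans that touch in both sequences.
            adjacent = pi2 == i1 and pj2 == j1
            if adjacent and tag == "replace" and ptag in ("replace", "delete"):
                merged[-1] = ("replace", pi1, i2, pj1, j2)
                continue
        merged.append((tag, i1, i2, j1, j2))
    return merged


def length_weighted_accuracy(edits):
    """edits: iterable of (text_length, is_correct) pairs.
    Assumed formula: 1 - (length of wrong edits) / (total length)."""
    edits = list(edits)
    total = sum(length for length, _ in edits)
    if total == 0:
        return 1.0  # nothing to get wrong
    wrong = sum(length for length, ok in edits if not ok)
    return 1.0 - wrong / total


# Hand-written opcodes for illustration only.
example = [
    ("equal", 0, 2, 0, 2),
    ("delete", 2, 3, 2, 2),
    ("replace", 3, 5, 2, 4),
    ("replace", 5, 6, 4, 5),
    ("equal", 6, 8, 5, 7),
]
print(merge_opcodes(example))
# [('equal', 0, 2, 0, 2), ('replace', 2, 6, 2, 5), ('equal', 6, 8, 5, 7)]
```

Under this assumed formula, a wrong edit covering 10 characters out of 100 costs ten times as much as a wrong edit covering a single character, which is how a handful of small mistakes can still leave the score above 99%.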
Usage
from ep_amendment_extract import extract_amendments
extract_amendments('file.docx')