
Extract amendments from European Parliament docx files

Project description

EuroParl-Amendment-Extract

This is a simple script that converts amendments from the European Parliament (EP) to JSON. It follows the format defined in the dataset https://zenodo.org/record/4709248#.YXesJS8itqs, published for the War of Words article: https://github.com/indy-lab/war-of-words

Prerequisites

Build the MEPs dataset by running the script: python3 meps.py

Debugger

To start the debugger, run 'streamlit run diff_visualizer.py'. This runs the am labeler on the ep8 dataset and visualizes the first error it encounters.

Sequence matcher update

diff.py now contains both extract_opcodes and extract_opcodes_v2. The v2 variant merges consecutive 'replace' operations, as well as a 'delete' operation followed by a 'replace'. An accuracy metric is now also displayed at the end, giving a rough picture of the am labeler's performance. The metric penalizes each edit the algorithm gets wrong in proportion to the length of the text attributed to that edit. Under this metric we observed high accuracy (over 99%) on the ep8 dataset.
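The merging rule and the length-weighted accuracy can be illustrated with a minimal sketch. This is not the code from diff.py: the function names merge_opcodes and length_weighted_accuracy are hypothetical, the opcode tuples are assumed to have the same (tag, i1, i2, j1, j2) shape that Python's difflib.SequenceMatcher.get_opcodes() returns, and the accuracy formula (share of edited text length that was labeled correctly) is one plausible reading of the description above.

```python
from typing import List, Tuple

# Assumed opcode shape, mirroring difflib: (tag, i1, i2, j1, j2)
Opcode = Tuple[str, int, int, int, int]


def merge_opcodes(opcodes: List[Opcode]) -> List[Opcode]:
    """Hypothetical sketch of the v2 merging rule.

    A 'replace' that directly follows an adjacent 'replace' or 'delete'
    is folded into it, producing a single larger 'replace'.
    """
    merged: List[Opcode] = []
    for tag, i1, i2, j1, j2 in opcodes:
        if merged:
            ptag, pi1, pi2, pj1, pj2 = merged[-1]
            # Only merge spans that touch in both sequences.
            adjacent = pi2 == i1 and pj2 == j1
            if adjacent and tag == "replace" and ptag in ("replace", "delete"):
                merged[-1] = ("replace", pi1, i2, pj1, j2)
                continue
        merged.append((tag, i1, i2, j1, j2))
    return merged


def length_weighted_accuracy(edits: List[Tuple[int, bool]]) -> float:
    """Assumed metric: each edit is (text_length, is_correct).

    A wrong edit costs as much as the length of text attributed to it.
    """
    total = sum(length for length, _ in edits)
    if total == 0:
        return 1.0
    correct = sum(length for length, ok in edits if ok)
    return correct / total


example = [
    ("equal", 0, 2, 0, 2),
    ("delete", 2, 3, 2, 2),
    ("replace", 3, 4, 2, 3),
    ("replace", 4, 6, 3, 4),
    ("equal", 6, 7, 4, 5),
]
merged_example = merge_opcodes(example)
accuracy_example = length_weighted_accuracy([(10, True), (30, True), (10, False)])
```

In this sketch the 'delete' and the two 'replace' operations collapse into one 'replace' spanning all three, while 'equal' spans are left untouched. The actual rules in extract_opcodes_v2 may differ in detail.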

Usage

from ep_amendment_extract import extract_amendments

extract_amendments('file.docx')
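Since the goal is JSON output, a small wrapper for saving the result to disk might look like the sketch below. It assumes that extract_amendments returns a JSON-serializable object (for example a list of dicts), which the description above suggests but does not state. The helper name save_amendments is hypothetical, and the extractor is passed in as an argument so the helper does not depend on a particular return type beyond serializability.

```python
import json
from pathlib import Path
from typing import Any, Callable


def save_amendments(docx_path: str, out_path: str,
                    extract: Callable[[str], Any]) -> Any:
    """Run an extractor on a .docx file and write its result as JSON.

    `extract` is expected to be extract_amendments from
    ep_amendment_extract; its return value is assumed to be
    JSON-serializable.
    """
    amendments = extract(docx_path)
    text = json.dumps(amendments, ensure_ascii=False, indent=2)
    Path(out_path).write_text(text, encoding="utf-8")
    return amendments
```

With the package installed, this would be called as save_amendments('file.docx', 'file.json', extract_amendments).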
