Extract amendments from European Parliament docx files

Project description

EuroParl-Amendment-Extract

This is a simple script that converts European Parliament (EP) amendments to JSON. It follows the format defined in the dataset at https://zenodo.org/record/4709248#.YXesJS8itqs, published for the article War of Words: https://github.com/indy-lab/war-of-words

Prerequisites

Build the MEPs dataset by running the script: python3 meps.py

Debugger

To run the debugger, run 'streamlit run diff_visualizer.py'. It runs the amendment labeler on the ep8 dataset and visualizes the first error it encounters.

Sequence matcher update

diff.py now provides both extract_opcodes and extract_opcodes_v2. The v2 variant merges consecutive 'replace' operations, as well as a 'delete' operation followed by a 'replace'. The script now also reports an accuracy metric at the end, which gives a rough indication of the amendment labeler's performance. The accuracy penalizes the algorithm for each edit it gets wrong, weighted by the length of the text attributed to that edit. With this metric we observed high accuracy (over 99%) on the ep8 dataset.
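The merging rule and the metric described above can be illustrated with a minimal sketch. This is not the code from diff.py: the names merge_opcodes and length_weighted_accuracy are hypothetical, opcodes are assumed to be difflib-style (tag, i1, i2, j1, j2) tuples, two operations are assumed to be "consecutive" when their index ranges touch, and the accuracy formula is only one possible reading of the description.

```python
from typing import List, Tuple

# difflib-style opcode: (tag, i1, i2, j1, j2)
Opcode = Tuple[str, int, int, int, int]


def merge_opcodes(opcodes: List[Opcode]) -> List[Opcode]:
    """Fold a 'replace' into a directly preceding 'replace' or 'delete'.

    Assumption: two operations are consecutive when the previous one ends
    exactly where the next one starts, in both sequences.
    """
    merged: List[Opcode] = []
    for tag, i1, i2, j1, j2 in opcodes:
        if merged:
            p_tag, p_i1, p_i2, p_j1, p_j2 = merged[-1]
            touching = p_i2 == i1 and p_j2 == j1
            if tag == "replace" and p_tag in ("replace", "delete") and touching:
                # Extend the previous operation to cover this one as well.
                merged[-1] = ("replace", p_i1, i2, p_j1, j2)
                continue
        merged.append((tag, i1, i2, j1, j2))
    return merged


def length_weighted_accuracy(edits: List[Tuple[int, bool]]) -> float:
    """Accuracy where each wrong edit is penalized by its text length.

    Each item is (length of text attributed to the edit, labeled correctly?).
    Assumed formula: 1 - (length of wrong edits) / (length of all edits).
    """
    total = sum(length for length, _ in edits)
    if total == 0:
        return 1.0
    wrong = sum(length for length, correct in edits if not correct)
    return 1.0 - wrong / total


if __name__ == "__main__":
    ops = [
        ("equal", 0, 2, 0, 2),
        ("delete", 2, 3, 2, 2),
        ("replace", 3, 4, 2, 3),
        ("replace", 4, 6, 3, 5),
        ("equal", 6, 7, 5, 6),
    ]
    print(merge_opcodes(ops))
    print(length_weighted_accuracy([(10, True), (30, True), (10, False)]))
```

In this sketch the 'delete' and the two 'replace' operations collapse into a single 'replace' spanning all three, while 'equal' and 'insert' operations are left untouched.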

Usage

from ep_amendment_extract import extract_amendments

extract_amendments('file.docx')
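Since the goal of the script is to produce JSON, a small wrapper for saving the result to disk may be convenient. The sketch below is illustrative only: dump_amendments is a hypothetical helper, not part of the package, and it assumes that the value returned by extract_amendments is JSON-serializable (the return type is not documented here). The extractor is passed in as an argument so the helper itself has no dependency on the package.

```python
import json
from pathlib import Path
from typing import Any, Callable


def dump_amendments(extract: Callable[[str], Any], docx_path: str, out_path: str) -> Any:
    """Run an extractor on a .docx file and write its result as UTF-8 JSON.

    Assumption: the extractor's return value can be serialized by json.dumps.
    """
    amendments = extract(docx_path)
    text = json.dumps(amendments, ensure_ascii=False, indent=2)
    Path(out_path).write_text(text, encoding="utf-8")
    return amendments
```

With the package installed, this could be called as dump_amendments(extract_amendments, 'file.docx', 'file.json').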

Download files

Source Distribution

europarl_amendment_extract-1.1.1.tar.gz (10.2 kB)

Uploaded Source

Built Distribution

europarl_amendment_extract-1.1.1-py3-none-any.whl (10.9 kB)

Uploaded Python 3