Extract amendments from European Parliament docx files

Project description

EuroParl-Amendment-Extract

This is a simple script that converts amendments from the European Parliament (EP) to JSON. It follows the format defined in the dataset at https://zenodo.org/record/4709248#.YXesJS8itqs, which accompanies the War of Words article: https://github.com/indy-lab/war-of-words

Prerequisites

Build the MEPs dataset by running the script: python3 meps.py

Debugger

To start the debugger, run 'streamlit run diff_visualizer.py'. It runs the am labeler on the ep8 dataset and visualizes the first error it comes across.

Sequence matcher update

diff.py now contains both extract_opcodes and extract_opcodes_v2. The v2 variant merges consecutive 'replace' operations, as well as a 'delete' operation followed by a 'replace'. At the end of a run we also display an accuracy metric that gives a rough picture of the am labeler's performance. Each edit the algorithm gets wrong is penalized in proportion to the length of the text attributed to that edit. With this metric we observed high accuracy (above 99%) on the ep8 dataset.
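
The merging rule and the length-weighted metric described above can be sketched as follows. This is a minimal illustration, not the code in diff.py: the names merge_opcodes and length_weighted_accuracy are hypothetical, the opcode tuples are assumed to have the (tag, i1, i2, j1, j2) shape returned by difflib.SequenceMatcher.get_opcodes(), and the accuracy formula is only one possible reading of "penalized in proportion to the length".

```python
from difflib import SequenceMatcher


def merge_opcodes(opcodes):
    """Hypothetical helper, not the project's extract_opcodes_v2.

    Merges a 'replace' into the previous operation when that operation is an
    adjacent 'replace' or 'delete'. Opcodes are (tag, i1, i2, j1, j2) tuples.
    """
    merged = []
    for tag, i1, i2, j1, j2 in opcodes:
        if merged:
            ptag, pi1, pi2, pj1, pj2 = merged[-1]
            # Adjacent means the previous spans end exactly where these begin.
            adjacent = pi2 == i1 and pj2 == j1
            if adjacent and tag == "replace" and ptag in ("replace", "delete"):
                merged[-1] = ("replace", pi1, i2, pj1, j2)
                continue
        merged.append((tag, i1, i2, j1, j2))
    return merged


def length_weighted_accuracy(edits):
    """Assumed reading of the metric: 1 - (length of wrong edits / total length).

    `edits` is a sequence of (length, is_correct) pairs.
    """
    edits = list(edits)
    total = sum(length for length, _ in edits)
    if total == 0:
        return 1.0  # nothing to get wrong
    wrong = sum(length for length, ok in edits if not ok)
    return 1.0 - wrong / total


if __name__ == "__main__":
    old = "the committee shall adopt the report".split()
    new = "the committee may reject the report".split()
    # Raw opcodes from the standard library, then the merged view.
    raw = SequenceMatcher(None, old, new).get_opcodes()
    print(merge_opcodes(raw))
```

Weighting by length means that a mislabeled one-word edit costs far less than a mislabeled paragraph, which is one reason a metric of this kind can stay high even when a few short edits are wrong.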

Usage

from ep_amendment_extract import extract_amendments

extract_amendments('file.docx')
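
Since the goal is JSON output, a small wrapper can serialize the result. The sketch below assumes that extract_amendments returns a JSON-serializable structure (for example a list of dicts), which the project description suggests but does not state. The helper name amendments_to_json is hypothetical, and the extractor is passed in as an argument so the helper can be tried without the package installed.

```python
import json


def amendments_to_json(docx_path, extract):
    """Hypothetical helper: run `extract` on a .docx path and return JSON text.

    Assumes `extract(docx_path)` returns something json.dumps can serialize.
    """
    amendments = extract(docx_path)
    # ensure_ascii=False keeps accented characters in names and text readable.
    return json.dumps(amendments, ensure_ascii=False, indent=2)
```

With the package installed, this would be called as amendments_to_json('file.docx', extract_amendments).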

Project details

Release history

1.1.1

Sep 23, 2023

1.1.0

Sep 23, 2023

1.0.3

Sep 22, 2023

1.0.2

Sep 22, 2023

Download files

Source Distribution

europarl_amendment_extract-1.1.1.tar.gz (10.2 kB)

Uploaded Sep 23, 2023 Source

Built Distribution

europarl_amendment_extract-1.1.1-py3-none-any.whl (10.9 kB)

Uploaded Sep 23, 2023 Python 3

Hashes for europarl_amendment_extract-1.1.1.tar.gz
Algorithm	Hash digest
SHA256	`82397bad3b2660745326b141eae972e69189a2a47d3644f51a3b23857eb11e61`
MD5	`d9e718b8467923f06afe6e9895d97681`
BLAKE2b-256	`f8deb432cace178d5ddc69c3549a4efbbcc498e72fe1919030226dcb5b7b49a0`

Hashes for europarl_amendment_extract-1.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ea4c65d09869916c716aa14453404b553c947571b0a33750c79c471b26a5a42f`
MD5	`f52201b79e431c408aa2e057a24d1a44`
BLAKE2b-256	`48d3d7274075f2183c05646be8f743daa34c8049548e6c89ad47431ca620cd9a`