Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and prompting mass-media news into datasets for ML-model training

AREkit 0.25.0

AREkit (Attitude and Relation Extraction Toolkit) is a Python toolkit devoted to document-level Attitude and Relation Extraction between text objects in mass-media news.

Description

This toolkit aims at memory-efficient data processing in Relation Extraction (RE) tasks.

Figure: AREkit pipeline design. For more details, see the paper "ARElight: Context Sampling of Large Texts for Deep Learning Relation Extraction".

In particular, this framework provides the following features:

  • pipelines and iterators for serializing large-scale collections without out-of-memory issues;
  • 🔗 EL (entity-linking) API support for objects;
  • ➰ avoidance of cyclic connections;
  • 📏 distance consideration between relation participants (measured in terms or sentences);
  • 📑 relation annotation and filtering rules;
  • *️⃣ entity formatting or masking, and more.
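
To make the idea behind the first features concrete, here is a minimal sketch of a lazy, generator-based pipeline in plain Python. It does not use the AREkit API: the `Pair` structure, the document layout, and all function names are hypothetical, and "avoidance of cyclic connections" is read here in its simplest form, as dropping relations from an object to itself.

```python
from typing import Iterable, Iterator, NamedTuple


class Pair(NamedTuple):
    """A candidate relation between two objects (hypothetical structure)."""
    doc_id: int
    source: str
    target: str
    source_sent: int  # index of the sentence that mentions the source
    target_sent: int  # index of the sentence that mentions the target


def read_pairs(docs: Iterable[dict]) -> Iterator[Pair]:
    """Lazily yield every ordered pair of objects found in each document."""
    for doc in docs:
        entities = doc["entities"]  # assumed layout: list of (name, sentence_index)
        for s_name, s_sent in entities:
            for t_name, t_sent in entities:
                yield Pair(doc["id"], s_name, t_name, s_sent, t_sent)


def drop_self_relations(pairs: Iterable[Pair]) -> Iterator[Pair]:
    """Skip pairs whose source and target are the same object."""
    return (p for p in pairs if p.source != p.target)


def within_distance(pairs: Iterable[Pair], max_sentences: int) -> Iterator[Pair]:
    """Keep pairs whose participants are at most `max_sentences` sentences apart."""
    return (p for p in pairs
            if abs(p.source_sent - p.target_sent) <= max_sentences)


def run(docs: Iterable[dict], max_sentences: int = 1) -> Iterator[Pair]:
    """Chain the stages; nothing is computed until the result is iterated."""
    pairs = read_pairs(docs)
    pairs = drop_self_relations(pairs)
    return within_distance(pairs, max_sentences)
```

Because every stage is a generator, only one candidate pair is held in memory at a time, so the same code can be pointed at a collection that does not fit in RAM and written out pair by pair.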

The core functionality includes:

  • API for document presentation with EL (Entity Linking, i.e. Object Synonymy) support, used to prepare sentence-level relations (dubbed contexts);
  • API for context extraction;
  • Transfer of relations from the sentence level to the document level, and more.
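
As an illustration of the last point, the sketch below aggregates sentence-level labels into one document-level label per pair of linked objects. It is plain Python rather than the AREkit API: the synonym table, the `link` helper, and the majority-vote rule are all assumptions chosen for the example, not the toolkit's documented method.

```python
from collections import Counter, defaultdict

# Hypothetical synonym table: surface form -> canonical object id.
SYNONYMS = {
    "USA": "US", "United States": "US", "Washington": "US",
    "Iran": "IR", "Tehran": "IR",
}


def link(name):
    """Map a mention to its canonical object; unknown names map to themselves."""
    return SYNONYMS.get(name, name)


def to_document_level(contexts):
    """Aggregate sentence-level labels by majority vote.

    `contexts` is an iterable of (doc_id, source, target, label) tuples.
    Mentions are linked first, so synonyms vote for the same object pair.
    On a tie, the label that was seen first wins.
    """
    votes = defaultdict(Counter)
    for doc_id, source, target, label in contexts:
        votes[(doc_id, link(source), link(target))][label] += 1
    return {key: counter.most_common(1)[0][0] for key, counter in votes.items()}
```

For example, three contexts that mention "USA", "Washington", and "United States" as the source and "Iran" or "Tehran" as the target all contribute to the single document-level pair `(doc_id, "US", "IR")`.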

Installation

pip install git+https://github.com/nicolay-r/AREkit.git@0.25.0-rc
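
After installing, a quick way to confirm which version is present is to query the package metadata from Python. This is a small sketch using only the standard library (Python 3.8+); it assumes the distribution is registered under the name `arekit`, and the helper `installed_version` is introduced here for illustration.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# Prints the version if the package is installed, otherwise None.
print(installed_version("arekit"))
```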

Usage

Please follow the tutorial section on the project Wiki for more details.

How to cite

Great research is also accompanied by faithful references. If you use or extend our work, please cite it as follows:

@inproceedings{rusnachenko2024arelight,
  title={ARElight: Context Sampling of Large Texts for Deep Learning Relation Extraction},
  author={Rusnachenko, Nicolay and Liang, Huizhi and Kolomeets, Maxim and Shi, Lei},
  booktitle={European Conference on Information Retrieval},
  year={2024},
  organization={Springer}
}
