Document level Attitude and Relation Extraction toolkit (AREkit) for sampling and prompting mass-media news into datasets for ML-model training
Project description
AREkit 0.25.0
AREkit (Attitude and Relation Extraction Toolkit) is a Python toolkit for document-level Attitude and Relation Extraction between text objects in mass-media news.
Description
This toolkit aims at memory-efficient data processing in Relation Extraction (RE) tasks.
Figure: AREkit pipeline design. For more details, see the paper "ARElight: Context Sampling of Large Texts for Deep Learning Relation Extraction".
In particular, this framework provides the following features:
- ➿ pipelines and iterators for serializing large-scale collections without out-of-memory issues,
- 🔗 EL (entity-linking) API support for objects,
- ➰ avoidance of cyclic connections,
- 📏 distance consideration between relation participants (in terms or sentences),
- 📑 relations annotations and filtering rules,
- *️⃣ entities formatting or masking, and more.
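The iterator-based processing and the distance rule from the list above can be illustrated with plain Python generators. This is a minimal sketch and not AREkit's API: `iter_sentences`, `iter_entities`, `iter_pairs`, and the `max_term_distance` parameter are hypothetical names, and the capitalized-word entity detector is a toy assumption made only to keep the example self-contained.

```python
from itertools import combinations
from typing import Iterable, Iterator, List, Tuple

# Hypothetical types for illustration only; these are not AREkit classes.
Sentence = List[str]        # a sentence as a list of terms
Entity = Tuple[int, str]    # (term index within the sentence, entity value)
Pair = Tuple[Entity, Entity]


def iter_sentences(lines: Iterable[str]) -> Iterator[Sentence]:
    """Lazily split lines into terms, yielding one sentence at a time."""
    for line in lines:
        terms = line.split()
        if terms:  # skip empty lines
            yield terms


def iter_entities(sentence: Sentence) -> Iterator[Entity]:
    """Toy entity detector (assumption): every capitalized term is an entity."""
    for index, term in enumerate(sentence):
        if term[0].isupper():
            yield index, term


def iter_pairs(sentences: Iterable[Sentence],
               max_term_distance: int) -> Iterator[Tuple[Sentence, Pair]]:
    """Yield entity pairs whose distance in terms does not exceed the limit."""
    for sentence in sentences:
        entities = list(iter_entities(sentence))
        # combinations() never pairs an entity with itself.
        for source, target in combinations(entities, 2):
            if abs(source[0] - target[0]) <= max_term_distance:
                yield sentence, (source, target)


if __name__ == "__main__":
    lines = [
        "Alice met Bob yesterday",
        "Carol travelled across the whole country to see Dave",
    ]
    for _, (source, target) in iter_pairs(iter_sentences(lines), max_term_distance=3):
        print(source[1], "->", target[1])
```

Because every stage is a generator, the input can be any iterable, including an open file handle, so the whole collection never needs to be loaded at once.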
The core functionality includes:
- API for document presentation with EL (Entity Linking, i.e. Object Synonymy) support, used to prepare sentence-level relations (dubbed contexts);
- API for context extraction;
- Transfer of relations from the sentence level to the document level, and more.
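One way to picture the transfer from sentence level to document level is label aggregation over contexts. The sketch below uses a simple majority vote, which is an assumption chosen for illustration and not necessarily the strategy AREkit implements. The `ContextRelation` record and the `to_document_level` function are hypothetical names; the integer ids stand in for entity-linking (synonym group) identifiers.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple

# Hypothetical record for one context (sentence-level relation):
# (source group id, target group id, label). Not an AREkit type.
ContextRelation = Tuple[int, int, str]


def to_document_level(contexts: Iterable[ContextRelation]) -> Dict[Tuple[int, int], str]:
    """Aggregate context labels into one label per (source, target) pair."""
    votes = defaultdict(Counter)
    for source_id, target_id, label in contexts:
        # Contexts that mention the same linked objects vote on the same pair.
        votes[(source_id, target_id)][label] += 1
    # Keep the most frequent label for each pair (majority vote, an assumption).
    return {pair: counter.most_common(1)[0][0] for pair, counter in votes.items()}


if __name__ == "__main__":
    contexts = [
        (1, 2, "positive"),
        (1, 2, "negative"),
        (1, 2, "positive"),
        (3, 1, "negative"),
    ]
    print(to_document_level(contexts))
```

Grouping by linked ids rather than by surface strings is what lets mentions such as different spellings of the same object contribute to a single document-level relation.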
Installation
pip install git+https://github.com/nicolay-r/AREkit.git@0.25.0-rc
Usage
Please follow the tutorial section on the project Wiki for more details.
How to cite
Great research is also accompanied by faithful references. If you use or extend our work, please cite as follows:
@inproceedings{rusnachenko2024arelight,
title={ARElight: Context Sampling of Large Texts for Deep Learning Relation Extraction},
author={Rusnachenko, Nicolay and Liang, Huizhi and Kolomeets, Maxim and Shi, Lei},
booktitle={European Conference on Information Retrieval},
year={2024},
organization={Springer}
}
Project details
Download files
File details
Details for the file arekit-0.25.0.tar.gz.
File metadata
- Download URL: arekit-0.25.0.tar.gz
- Upload date:
- Size: 121.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | ebd116054919cf0c22322fddcd34bacb39d7d262bbbf34e00d960697c85eec9b
MD5 | 25efdb6a8a93a986d0b65b0cc66c131c
BLAKE2b-256 | 70580af69a8ce177f7ad952924c8de175dc53f6591d5a276ee5a742858d747bc
File details
Details for the file arekit-0.25.0-py3-none-any.whl.
File metadata
- Download URL: arekit-0.25.0-py3-none-any.whl
- Upload date:
- Size: 180.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.9.5
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1f81560666e9c2d155e4a5566ad6824323c70a0b77f255102f5120311154354a
MD5 | b658c890010e3c1f4cb0d363f6bead50
BLAKE2b-256 | b44700ac31b9e91fa14fb02312b19219e4eaf5fa973de26915d1addc07ba460a
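A downloaded file can be checked against the SHA256 digests listed in the file hashes. The following is a minimal sketch using only the Python standard library; `sha256_of` and `verify` are hypothetical helper names, and the script assumes the downloaded file keeps its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hashes listed for each distribution.
EXPECTED_SHA256 = {
    "arekit-0.25.0.tar.gz":
        "ebd116054919cf0c22322fddcd34bacb39d7d262bbbf34e00d960697c85eec9b",
    "arekit-0.25.0-py3-none-any.whl":
        "1f81560666e9c2d155e4a5566ad6824323c70a0b77f255102f5120311154354a",
}


def sha256_of(path, chunk_size=1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected


if __name__ == "__main__":
    for name in EXPECTED_SHA256:
        if Path(name).exists():
            print(name, "OK" if verify(name) else "MISMATCH")
```

Reading in fixed-size chunks keeps memory use constant regardless of the file size.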