This is a general framework for creating and annotating ArangoDB graphs.
Welcome to the Corpus Annotation Graph Builder (CAG)
cag is a Python library offering an architectural framework to employ the build-annotate pattern when building graphs.
The Corpus Annotation Graph builder (CAG) is an architectural framework that employs the build-and-annotate pattern for creating a graph. CAG is built on top of ArangoDB and its Python driver (PyArango). The build-and-annotate pattern consists of two phases (see Figure below): (1) Data is collected from different sources (e.g., publication databases, online encyclopedias, news feeds, web portals, electronic libraries, repositories, media platforms) and preprocessed to build the core nodes, which we call Objects of Interest (OOIs). The component responsible for this phase is the Graph-Creator. (2) Annotations are extracted from the OOIs, and corresponding annotation nodes are created and linked to the core nodes. The component responsible for this phase is the Graph-Annotator.
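The two phases described above can be illustrated with a minimal, library-agnostic sketch. It does not use the cag API or ArangoDB: the Graph class, the create_graph and annotate_graph functions, the node keys, and the "length" annotation are all hypothetical names chosen for illustration, and a plain in-memory dictionary stands in for the database.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """A tiny in-memory stand-in for a graph database (hypothetical, not ArangoDB)."""
    nodes: dict = field(default_factory=dict)  # node key -> attribute dict
    edges: list = field(default_factory=list)  # (from_key, relation, to_key)

    def upsert_node(self, key, **attrs):
        # Create the node if it is missing, otherwise merge the new attributes.
        self.nodes.setdefault(key, {}).update(attrs)
        return key

    def add_edge(self, src, relation, dst):
        # Store each edge only once.
        edge = (src, relation, dst)
        if edge not in self.edges:
            self.edges.append(edge)


def create_graph(graph, documents):
    """Phase 1 (Graph-Creator role): build core OOI nodes from source data."""
    for doc in documents:
        graph.upsert_node(f"text/{doc['id']}", kind="ooi", text=doc["text"].strip())


def annotate_graph(graph):
    """Phase 2 (Graph-Annotator role): derive annotation nodes and link them to OOIs."""
    # Collect the OOI keys first so the node dict is not mutated while iterating it.
    ooi_keys = [k for k, v in graph.nodes.items() if v.get("kind") == "ooi"]
    for key in ooi_keys:
        n_tokens = len(graph.nodes[key]["text"].split())
        label = "long" if n_tokens > 5 else "short"
        ann = graph.upsert_node(f"length/{label}", kind="annotation", label=label)
        graph.add_edge(key, "has_annotation", ann)


graph = Graph()
create_graph(graph, [
    {"id": "a", "text": "Graphs link documents."},
    {"id": "b", "text": "Annotations are extracted from each object of interest."},
])
annotate_graph(graph)
print(graph.edges)
```

Keeping the two phases in separate functions means the annotation step can be rerun or extended without rebuilding the core nodes, which is the separation the pattern is meant to provide.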
This framework aims to offer researchers a flexible but unified and reproducible way of organizing and maintaining their interlinked document collections in a Corpus Annotation Graph.
Installation
Direct install via pip
The package can be installed directly via pip:
pip install cag
This will allow you to use the module cag from any Python script locally. The two main packages are cag.framework and cag.view_wrapper.
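To confirm that the installation worked, a short check along these lines may help. It uses only the Python standard library. The helper name is_importable is an assumption introduced here, not part of cag.

```python
import importlib.util


def is_importable(module_name):
    """Return True if Python can locate the named module (hypothetical helper)."""
    try:
        return importlib.util.find_spec(module_name) is not None
    except ImportError:
        # Raised for dotted names when the parent package (e.g. "cag")
        # is not installed or fails to import.
        return False


# Report whether the package and its two main subpackages can be found.
for name in ("cag", "cag.framework", "cag.view_wrapper"):
    print(f"{name}: {'found' if is_importable(name) else 'missing'}")
```

Note that looking up a dotted name such as cag.framework imports the parent package cag as a side effect.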
Manual cloning
Clone the repository, go to the root folder and then run:
pip install -e .
Citation
Please cite us if you use CAG:
@inproceedings{el-baff-etal-2023-corpus,
title = "Corpus Annotation Graph Builder ({CAG}): An Architectural Framework to Create and Annotate a Multi-source Graph",
author = "El Baff, Roxanne and
Hecking, Tobias and
Hamm, Andreas and
Korte, Jasper W. and
Bartsch, Sabine",
booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
month = may,
year = "2023",
address = "Dubrovnik, Croatia",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.eacl-demo.28",
pages = "248--255"
}
Usage
- After the installation, a project scaffold can be created with the command
cag start-project
- Graph Creation [jupyter notebook]
- Graph Annotation [jupyter notebook]