
A general framework for creating and annotating ArangoDB graphs.

Project description

Welcome to the Corpus Annotation Graph Builder (CAG)

License: MIT

cag is a Python library that provides an architectural framework for applying the build-annotate pattern when building graphs.


Official Documentation.

Corpus Annotation Graph Builder (CAG) is an architectural framework that uses the build-and-annotate pattern to create a graph. CAG is built on top of ArangoDB and its Python drivers (PyArango). The build-and-annotate pattern consists of two phases (see Figure below): (1) Data is collected from different sources (e.g., publication databases, online encyclopedias, news feeds, web portals, electronic libraries, repositories, media platforms) and preprocessed to build the core nodes, which we call Objects of Interest (OOIs). The component responsible for this phase is the Graph-Creator. (2) Annotations are extracted from the OOIs, and corresponding annotation nodes are created and linked to the core nodes. The component responsible for this phase is the Graph-Annotator.

cag

This framework aims to offer researchers a flexible but unified and reproducible way of organizing and maintaining their interlinked document collections in a Corpus Annotation Graph.
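The two phases described above can be illustrated with a small, library-independent sketch. It does not use the cag API or ArangoDB: the `Graph` class, the `build` and `annotate` functions, the node id scheme, and the toy "long word" keyword rule are all hypothetical names and choices made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class Graph:
    """Hypothetical in-memory stand-in for a graph database (not cag's API)."""
    nodes: dict = field(default_factory=dict)  # node id -> attribute dict
    edges: list = field(default_factory=list)  # (from_id, relation, to_id)

    def upsert_node(self, node_id, **attrs):
        # Create the node if missing, otherwise merge the new attributes.
        self.nodes.setdefault(node_id, {}).update(attrs)
        return node_id

    def link(self, src, relation, dst):
        # Add the edge only once so repeated runs do not duplicate it.
        edge = (src, relation, dst)
        if edge not in self.edges:
            self.edges.append(edge)


def build(graph, documents):
    """Phase 1 (Graph-Creator role): one core node (OOI) per document."""
    for doc in documents:
        graph.upsert_node(
            f"text/{doc['id']}", kind="ooi",
            source=doc["source"], text=doc["text"],
        )


def annotate(graph):
    """Phase 2 (Graph-Annotator role): derive annotation nodes from OOIs."""
    for node_id, attrs in list(graph.nodes.items()):
        if attrs.get("kind") != "ooi":
            continue
        for token in sorted(set(attrs["text"].lower().split())):
            if len(token) > 6:  # toy rule: treat long words as keywords
                kw = graph.upsert_node(f"keyword/{token}", kind="annotation")
                graph.link(node_id, "has_keyword", kw)


docs = [
    {"id": "1", "source": "news", "text": "Graph databases store documents"},
    {"id": "2", "source": "wiki", "text": "Annotated documents form a corpus"},
]
graph = Graph()
build(graph, docs)
annotate(graph)
print(len(graph.nodes), "nodes,", len(graph.edges), "edges")
```

Because annotation nodes are keyed by their content, both documents end up linked to the same `keyword/documents` node, which is the kind of interlinking between sources that the pattern is meant to produce.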

Installation

Direct install via pip

The package can be installed directly via pip.

pip install cag

This lets you use the cag module from any local Python script. The two main packages are cag.framework and cag.view_wrapper.
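To confirm that the installation worked, a short check using only the Python standard library can look like the sketch below. The helper names `installed_version` and `has_module` are hypothetical and not part of cag. It assumes the distribution name matches the `pip install cag` command above. Checking a submodule imports its parent package as a side effect.

```python
from importlib import metadata, util


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def has_module(module_name):
    """Return True if the module can be located in this environment."""
    try:
        return util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        # Raised when the parent package is missing or fails to import.
        return False


print("cag version:", installed_version("cag"))
for name in ("cag.framework", "cag.view_wrapper"):
    print(name, "available:", has_module(name))
```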

Manual cloning

Clone the repository, go to the root folder and then run:

pip install -e .

Citation

If you use CAG, please cite:

@inproceedings{el-baff-etal-2023-corpus,
  title = "Corpus Annotation Graph Builder ({CAG}): An Architectural Framework to Create and Annotate a Multi-source Graph",
  author = "El Baff, Roxanne  and
    Hecking, Tobias  and
    Hamm, Andreas  and
    Korte, Jasper W.  and
    Bartsch, Sabine",
  booktitle = "Proceedings of the 17th Conference of the European Chapter of the Association for Computational Linguistics: System Demonstrations",
  month = may,
  year = "2023",
  address = "Dubrovnik, Croatia",
  publisher = "Association for Computational Linguistics",
  url = "https://aclanthology.org/2023.eacl-demo.28",
  pages = "248--255"
}

Usage

  • After the installation, a project scaffold can be created with the command cag start-project
  • Graph Creation [jupyter notebook]
  • Graph Annotation [jupyter notebook]

Zenodo refs

  • v1.5.17 DOI
  • v1.5.0 DOI
  • v1.4.0 DOI

