GATE NLP implementation in Python.

Project description

Python library gatenlp

Join the chat at https://gitter.im/GateNLP/python-gatenlp

This package represents the basic elements of text processing and NLP in a way that is very similar to the Java GATE NLP framework. It also supports manipulating GateNLP documents and interacting with Java GATE and the GATE Python plugin.

Documentation and feedback

If you find bugs or want to request a feature or change, please use the issue tracker.

For more general discussions about the package and for communication among current and future users, please use the Discussions.

Overview

Python GateNLP is an NLP and text processing framework implemented in Python.

Python GateNLP represents documents and stand-off annotations very similarly to the Java GATE framework: annotations describe arbitrary character ranges in the text, and each annotation can have an arbitrary number of features. Documents can have arbitrary features and an arbitrary number of named annotation sets, where each annotation set can have an arbitrary number of annotations which can overlap in any way. Python GateNLP documents can be exchanged with Java GATE by using the bdocjs/bdocym/bdocmp formats, which are supported in Java GATE via the Format Bdoc Plugin.
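The data model described above can be pictured with a few lines of plain Python. This is only an illustrative sketch using the standard library, not the gatenlp API: the class names `Ann` and `Doc`, their fields, and the set names "entities" and "gold" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Ann:
    """A stand-off annotation: a typed character range plus arbitrary features."""
    start: int
    end: int
    type: str
    features: dict = field(default_factory=dict)


@dataclass
class Doc:
    """Text plus document features and any number of named annotation sets."""
    text: str
    features: dict = field(default_factory=dict)
    annsets: dict = field(default_factory=dict)  # set name -> list of Ann

    def add(self, setname, start, end, type, **features):
        # Annotations only store offsets; the text itself is never modified.
        ann = Ann(start, end, type, features)
        self.annsets.setdefault(setname, []).append(ann)
        return ann

    def covered_text(self, ann):
        return self.text[ann.start:ann.end]


doc = Doc("Barack Obama visited New York.", features={"lang": "en"})
person = doc.add("entities", 0, 12, "Person", gender="male")
first = doc.add("entities", 0, 6, "FirstName")  # overlaps with the Person span
place = doc.add("gold", 21, 29, "Location")     # lives in a second, separate set
```

Because annotations refer to the text only by offsets, overlapping spans and several independent annotation sets can coexist on the same document.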

Unlike many other Python NLP tools, GateNLP does not require text to be split into tokens in any specific way. Tokens can be represented by annotations in any way, and a document can have several different tokenizations simultaneously if needed. Similarly, entities can be represented by annotations without restriction: they do not need to start or end at token boundaries and can overlap arbitrarily.
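To make the idea of coexisting tokenizations concrete, here is a small standard-library sketch. It does not use gatenlp itself; the set names "ws" and "fine" and the two regular expressions are assumptions chosen for illustration.

```python
import re

text = "It's a state-of-the-art tool."

# Tokenization 1: split on whitespace only.
ws_tokens = [(m.start(), m.end()) for m in re.finditer(r"\S+", text)]

# Tokenization 2: runs of word characters, or single punctuation marks.
fine_tokens = [(m.start(), m.end()) for m in re.finditer(r"\w+|[^\w\s]", text)]

# Both tokenizations live side by side, keyed by a (hypothetical) set name.
annsets = {"ws": ws_tokens, "fine": fine_tokens}

# An entity span is just another offset pair. This one starts in the
# middle of a whitespace token, which is perfectly legal for stand-off spans.
start = text.index("art")
entity = (start, start + len("art"))
```

Nothing forces the entity span to line up with the "ws" tokens, since each annotation is defined by its own offsets rather than by token indices.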

GateNLP provides ways to process text and create annotations using annotating pipelines, which are sequences of one or more annotators. There are annotators for matching text against gazetteer lists and annotators for complex matching of annotation and text sequences (see PAMPAC).
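The notion of a pipeline as a sequence of annotators, with a gazetteer as one kind of annotator, can be sketched in plain Python as follows. This is not how gatenlp implements it: the function names, the dictionary-based document, and the simple substring matching are assumptions made to keep the example short.

```python
def gazetteer_annotator(entries, anntype):
    """Return an annotator that marks every occurrence of a gazetteer entry."""
    def annotate(doc):
        for entry in entries:
            start = doc["text"].find(entry)
            while start != -1:
                doc["anns"].append((start, start + len(entry), anntype))
                start = doc["text"].find(entry, start + 1)
        return doc
    return annotate


def sort_annotator(doc):
    """Keep annotations ordered by offset."""
    doc["anns"].sort()
    return doc


def run_pipeline(annotators, doc):
    """Apply each annotator in turn, passing the document along."""
    for annotator in annotators:
        doc = annotator(doc)
    return doc


pipeline = [
    gazetteer_annotator(["London", "Paris"], "City"),
    gazetteer_annotator(["Alice"], "Person"),
    sort_annotator,
]
doc = run_pipeline(pipeline, {"text": "Alice flew from Paris to London.", "anns": []})
```

Each annotator only needs to accept a document and return it, so new processing steps can be added to the list without changing the others.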

There is also support for creating GateNLP annotations with other NLP packages such as spaCy or Stanford Stanza.

The GateNLP document representation also optionally allows tracking all changes made to the document in a "change log" (a gatenlp.ChangeLog instance). Such changes can later be applied to other Python GateNLP documents or to Java GATE documents.
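The record-and-replay idea behind a change log can be illustrated with a minimal sketch. This is not the gatenlp.ChangeLog implementation: the class `LoggedDoc`, its methods, and the command names used in the log entries are hypothetical.

```python
class LoggedDoc:
    """Document that optionally records every modification in a change log."""

    def __init__(self, text, changelog=None):
        self.text = text
        self.features = {}
        self.anns = []
        self.changelog = changelog  # a plain list, or None to disable logging

    def _log(self, change):
        if self.changelog is not None:
            self.changelog.append(change)

    def set_feature(self, name, value):
        self.features[name] = value
        self._log({"command": "feature:set", "name": name, "value": value})

    def add_ann(self, start, end, anntype):
        self.anns.append((start, end, anntype))
        self._log({"command": "annotation:add",
                   "start": start, "end": end, "type": anntype})

    def apply_changes(self, changes):
        """Replay recorded changes on this document."""
        for change in changes:
            if change["command"] == "feature:set":
                self.set_feature(change["name"], change["value"])
            elif change["command"] == "annotation:add":
                self.add_ann(change["start"], change["end"], change["type"])


log = []
original = LoggedDoc("Some text", changelog=log)
original.set_feature("lang", "en")
original.add_ann(0, 4, "Token")

# Replay the recorded changes on a second document with the same text.
copy = LoggedDoc("Some text")
copy.apply_changes(log)
```

Because the log holds only plain data, it could be serialized and applied elsewhere, which is the property that makes exchanging changes between processes possible.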

This library also supports interaction with a Java GATE process in two different ways:

  • The Java GATE Python plugin can invoke a Python process to annotate GATE documents with Python code
  • Python code can remote-control a Java GATE instance

Versions and Roadmap

  • Versions 0.x are unpublished
  • Versions 1.0.x are public releases whose APIs and main parts of the software may still change in response to feedback
  • Versions 1.x are public stable releases

Default branch renamed to "main"

If you have a clone of the repository, you need to rename the branch in your local copy as well:

git branch -m master main
git fetch origin
git branch -u origin/main main

NOTE: The previous PyPI project "gatenlp" has moved to gatenlphiltlab.


Download files

Source distribution: gatenlp-1.0.4.tar.gz (2.3 MB)

Built distribution: gatenlp-1.0.4-py3-none-any.whl (196.4 kB, Python 3)
