Annotator combining different NLP pipelines

Project description

Automated annotation of natural languages using selected toolchains

License: MIT

This project has just had its first release and is still under development.

Description

The nlpannotator package serves as a modular toolchain that combines different natural language processing (NLP) tools to annotate texts. Supported annotations are sentencizing, tokenization, part-of-speech (POS) tagging, and lemmatization.

Options

All input options are provided in an input dictionary. Three processing options are available: two pre-set toolchains and a manual mode. The fast toolchain uses spaCy for all annotations. The accurate toolchain uses SoMaJo for sentencizing and tokenization, and stanza for POS and lemma. In manual mode, any combination of spaCy, stanza, SoMaJo, Flair, and TreeTagger can be used, provided the tool supports the selected annotation and language.
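
To illustrate how such an input dictionary could select a toolchain, here is a minimal sketch in plain Python. It does not use the nlpannotator API: the key names ("processing_option", "processing_type", "tool"), the step names, and the function resolve_toolchain are hypothetical and chosen only for illustration. The tool assignments for the fast and accurate presets follow the description above.

```python
# Hypothetical sketch: key names, step names, and this function are assumptions
# made for illustration and are not taken from the nlpannotator API.
STEPS = ["sentencize", "tokenize", "pos", "lemma"]

# Preset toolchains as described in the text above.
PRESETS = {
    "fast": {step: "spacy" for step in STEPS},
    "accurate": {
        "sentencize": "somajo",
        "tokenize": "somajo",
        "pos": "stanza",
        "lemma": "stanza",
    },
}


def resolve_toolchain(input_dict):
    """Map an input dictionary to a {annotation step: tool} assignment."""
    option = input_dict["processing_option"]
    if option in PRESETS:
        # Return a copy so callers cannot modify the preset itself.
        return dict(PRESETS[option])
    if option == "manual":
        steps = input_dict["processing_type"]
        tools = input_dict["tool"]
        if len(steps) != len(tools):
            raise ValueError("each annotation step needs exactly one tool")
        return dict(zip(steps, tools))
    raise ValueError(f"unknown processing option: {option!r}")


if __name__ == "__main__":
    manual_input = {
        "processing_option": "manual",
        "processing_type": ["sentencize", "tokenize", "pos"],
        "tool": ["somajo", "somajo", "flair"],
    }
    print(resolve_toolchain({"processing_option": "accurate"}))
    print(resolve_toolchain(manual_input))
```

A real implementation would also need to check that each tool supports the requested annotation and language, which this sketch leaves out.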

Installation

Install the project and its dependencies from PyPI:

pip install nlpannotator

The language models need to be installed separately. A convenience script is available that installs all language models for all languages that have been implemented for spaCy and stanza.
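
If you prefer to install models for a single language by hand, the sketch below builds the commands that would do so through the standard spaCy and stanza download mechanisms. It is not the project's convenience script. The model names are common spaCy packages and are an assumption here, since the models nlpannotator actually expects may differ, and the helper model_install_commands is hypothetical.

```python
import sys

# Assumption: these are standard small spaCy models, not necessarily the
# ones that nlpannotator expects for each language.
SPACY_MODELS = {"en": "en_core_web_sm", "de": "de_core_news_sm"}


def model_install_commands(lang):
    """Return the commands that would install spaCy and stanza models for lang."""
    # spaCy models are installed through the "spacy download" command.
    spacy_cmd = [sys.executable, "-m", "spacy", "download", SPACY_MODELS[lang]]
    # stanza models are fetched with stanza.download(<language code>).
    stanza_cmd = [sys.executable, "-c", f"import stanza; stanza.download({lang!r})"]
    return [spacy_cmd, stanza_cmd]


if __name__ == "__main__":
    # Only print the commands; nothing is downloaded here.
    for cmd in model_install_commands("en"):
        print(" ".join(cmd))
```

To actually run the commands, each list can be passed to subprocess.run(cmd, check=True) once spaCy and stanza are installed.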

Usage

Take a look at the DemoNotebook or run it on Binder.

