NLP pipeline software using Common Workflow Language

Project description

nlppln is a Python package for creating NLP pipelines using Common Workflow Language (CWL). It provides steps for (generic) NLP functionality, such as tokenization, lemmatization, and part-of-speech tagging, and helps users construct workflows from these steps.

A text processing step consists of a (Python) command line tool and a CWL specification for using this tool. Most tools provided by nlppln wrap existing NLP functionality. The command line tools are made with Click, a Python package for creating command line interfaces.
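To make the CWL half of such a step more concrete, here is a minimal sketch of a CWL v1.0 CommandLineTool description, built as a Python dict and printed as JSON (CWL documents may be written in JSON as well as YAML). The module name `my_project.lowercase`, the input name `in_file`, and the output name `out_files` are hypothetical and chosen only for illustration; the specifications shipped with nlppln may be structured differently.

```python
import json


def make_tool_spec(module, doc):
    """Return a minimal CWL v1.0 CommandLineTool description as a dict.

    `module` is a hypothetical Python module that holds the command line tool.
    """
    return {
        "cwlVersion": "v1.0",
        "class": "CommandLineTool",
        "doc": doc,
        # The runner invokes the tool as `python -m <module> <in_file>`.
        "baseCommand": ["python", "-m", module],
        "inputs": {
            "in_file": {"type": "File", "inputBinding": {"position": 1}},
        },
        "outputs": {
            # Assumption: the tool writes its .txt results to the working directory.
            "out_files": {
                "type": {"type": "array", "items": "File"},
                "outputBinding": {"glob": "*.txt"},
            },
        },
    }


spec = make_tool_spec("my_project.lowercase", "Convert a text file to lowercase.")
print(json.dumps(spec, indent=2))
```

Saving the printed JSON to a file such as `lowercase.cwl` would give a CWL runner something to execute, provided the hypothetical module actually exists and behaves as assumed.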

To create a workflow, you have to write a Python script:

from nlppln import WorkflowGenerator

with WorkflowGenerator() as wf:
  txt_dir = wf.add_inputs(txt_dir='Directory')

  frogout = wf.frog_dir(in_dir=txt_dir)
  saf = wf.frog_to_saf(in_files=frogout)
  ner_stats = wf.save_ner_data(in_files=saf)
  new_saf = wf.replace_ner(metadata=ner_stats, in_files=saf)
  txt = wf.saf_to_txt(in_files=new_saf)

  wf.add_outputs(ner_stats=ner_stats, txt=txt)

  wf.save('anonymize.cwl')

The resulting workflow can be run using a CWL runner, such as cwltool:

cwltool anonymize.cwl --txt_dir /path/to/directory/with/txt/files/
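If you would rather start the workflow from Python than from a shell, the same command can be assembled and run with the standard library. This is only a sketch: the helper names `build_cwltool_command` and `run_workflow` are hypothetical and not part of nlppln, and it assumes cwltool is installed and available on the PATH.

```python
import subprocess


def build_cwltool_command(workflow, **inputs):
    """Build the argument list for `cwltool <workflow> --name value ...`."""
    cmd = ["cwltool", workflow]
    for name, value in inputs.items():
        cmd += ["--" + name, str(value)]
    return cmd


def run_workflow(workflow, **inputs):
    """Run the workflow; raises CalledProcessError if cwltool exits with an error."""
    return subprocess.run(build_cwltool_command(workflow, **inputs), check=True)


cmd = build_cwltool_command(
    "anonymize.cwl", txt_dir="/path/to/directory/with/txt/files/"
)
print(" ".join(cmd))
```

Calling `run_workflow("anonymize.cwl", txt_dir="/path/to/directory/with/txt/files/")` would then execute the workflow in the same way as the command above.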

To create new (e.g., project-specific) NLP functionality, you can use nlppln-gen to generate boilerplate (i.e., empty) command line tools and CWL specifications.

The full documentation can be found on Read the Docs.


Install nlppln using pip:

pip install nlppln

Please check the installation guidelines for additional required software.

Download files

Files for nlppln, version 0.3.0
Filename, size: nlppln-0.3.0.tar.gz (18.9 kB)
File type: Source
Python version: None
