Skip to main content

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for example the computation of n-grams, frequency lists and distributions, language models. There are also more complex data types, such as Priority Queues, and search algorithms, such as Beam Search.

Project description

https://travis-ci.org/proycon/pynlpl.svg?branch=master

PyNLPl, pronounced as “pineapple”, is a Python library for Natural Language Processing. It is a collection of various independent or loosely interdependent modules useful for common, and less common, NLP tasks. PyNLPl can be used for example the computation of n-grams, frequency lists and distributions, language models. There are also more complex data types, such as Priority Queues, and search algorithms, such as Beam Search.

The library is a divided into several packages and modules. It works on Python 2.7, as well as Python 3.

The following modules are available:

  • pynlpl.datatypes - Extra datatypes (priority queues, patterns, tries)
  • pynlpl.evaluation - Evaluation & experiment classes (parameter search, wrapped progressive sampling, class evaluation (precision/recall/f-score/auc), sampler, confusion matrix, multithreaded experiment pool)
  • pynlpl.formats.cgn - Module for parsing CGN (Corpus Gesproken Nederlands) part-of-speech tags
  • pynlpl.formats.folia - Extensive library for reading and manipulating the documents in FoLiA format (Format for Linguistic Annotation).
  • pynlpl.formats.fql - Extensive library for the FoLiA Query Language (FQL), built on top of pynlpl.formats.folia. FQL is currently documented here.
  • pynlpl.formats.cql - Parser for the Corpus Query Language (CQL), as also used by Corpus Workbench and Sketch Engine. Contains a convertor to FQL.
  • pynlpl.formats.giza - Module for reading GIZA++ word alignment data
  • pynlpl.formats.moses - Module for reading Moses phrase-translation tables.
  • pynlpl.formats.sonar - Largely obsolete module for pre-releases of the SoNaR corpus, use pynlpl.formats.folia instead.
  • pynlpl.formats.timbl - Module for reading Timbl output (consider using python-timbl instead though)
  • pynlpl.lm.lm - Module for simple language model and reader for ARPA language model data as well (used by SRILM).
  • pynlpl.search - Various search algorithms (Breadth-first, depth-first, beam-search, hill climbing, A star, various variants of each)
  • pynlpl.statistics - Frequency lists, Levenshtein, common statistics and information theory functions
  • pynlpl.textprocessors - Simple tokeniser, n-gram extraction

API Documentation can be found here.

Project details


Release history Release notifications

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
PyNLPl-0.7.6.4.tar.gz (152.9 kB) Copy SHA256 hash SHA256 Source None

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page