Project description

PyNLPl, pronounced as ‘pineapple’, is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as extracting n-grams and frequency lists, and for building simple language models. There are also more complex data types and algorithms. Moreover, there are parsers for file formats common in NLP (e.g. FoLiA/Giza/Moses/ARPA/Timbl/CQL). There are also clients to interface with various NLP-specific servers. PyNLPl most notably features a very extensive library for working with FoLiA XML (Format for Linguistic Annotation).
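To make the basic tasks mentioned above concrete, here is a minimal sketch in plain Python of n-gram extraction, a frequency list, and a maximum-likelihood bigram model. It deliberately does not use PyNLPl's own classes: the function names `ngrams`, `frequency_list` and `bigram_probability` are hypothetical and chosen only for illustration, so consult the API documentation for the library's actual interface.

```python
from collections import Counter


def ngrams(tokens, n):
    """Yield every contiguous n-gram in `tokens` as a tuple."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])


def frequency_list(items):
    """Count how often each item occurs (hypothetical helper, not PyNLPl's)."""
    return Counter(items)


def bigram_probability(bigram_counts, history, word):
    """Unsmoothed maximum-likelihood estimate of P(word | history)."""
    # Total count of all bigrams that start with `history`.
    history_total = sum(
        count for (first, _), count in bigram_counts.items() if first == history
    )
    if history_total == 0:
        return 0.0
    return bigram_counts[(history, word)] / history_total


tokens = "the cat saw the cat on the mat".split()
bigram_counts = frequency_list(ngrams(tokens, 2))

print(bigram_counts.most_common(1))
print(bigram_probability(bigram_counts, "the", "cat"))
```

The model here is unsmoothed, so any bigram that was never seen gets probability zero; a practical language model would add smoothing or back-off.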

The library is divided into several packages and modules. It works on Python 2.7, as well as Python 3.

The following modules are available:

  • pynlpl.datatypes - Extra datatypes (priority queues, patterns, tries)

  • pynlpl.evaluation - Evaluation & experiment classes (parameter search, wrapped progressive sampling, class evaluation (precision/recall/f-score/auc), sampler, confusion matrix, multithreaded experiment pool)

  • pynlpl.formats.cgn - Module for parsing CGN (Corpus Gesproken Nederlands) part-of-speech tags

  • pynlpl.formats.folia - Extensive library for reading and manipulating documents in FoLiA format (Format for Linguistic Annotation).

  • pynlpl.formats.fql - Extensive library for the FoLiA Query Language (FQL), built on top of pynlpl.formats.folia. FQL is currently documented here.

  • pynlpl.formats.cql - Parser for the Corpus Query Language (CQL), as also used by Corpus Workbench and Sketch Engine. Contains a converter to FQL.

  • pynlpl.formats.giza - Module for reading GIZA++ word alignment data

  • pynlpl.formats.moses - Module for reading Moses phrase-translation tables.

  • pynlpl.formats.sonar - Largely obsolete module for pre-releases of the SoNaR corpus; use pynlpl.formats.folia instead.

  • pynlpl.formats.timbl - Module for reading Timbl output (consider using python-timbl instead though)

  • pynlpl.lm.lm - Simple language model, plus a reader for ARPA language model data (the format used by SRILM).

  • pynlpl.search - Various search algorithms (Breadth-first, depth-first, beam-search, hill climbing, A star, various variants of each)

  • pynlpl.statistics - Frequency lists, Levenshtein, common statistics and information theory functions

  • pynlpl.textprocessors - Simple tokeniser, n-gram extraction
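Two of the techniques named in the list above, Levenshtein distance (pynlpl.statistics) and precision/recall/f-score (pynlpl.evaluation), can be sketched in plain Python as follows. This is an illustration of the underlying computations under simple assumptions, not the library's implementation: `levenshtein` and `precision_recall_f1` are hypothetical names, and the tag sequences are made-up toy data.

```python
def levenshtein(a, b):
    """Edit distance between two sequences (strings or token lists)."""
    # previous[j] holds the distance between the prefix of `a` seen so far
    # and the first j items of `b`.
    previous = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        current = [i]
        for j, y in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,             # delete x
                current[j - 1] + 1,          # insert y
                previous[j - 1] + (x != y),  # substitute x by y (free if equal)
            ))
        previous = current
    return previous[-1]


def precision_recall_f1(gold, predicted, label):
    """Per-class precision, recall and F1 for two aligned label sequences."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


# Toy data: gold-standard and predicted part-of-speech tags.
gold = ["N", "V", "N", "N", "V"]
predicted = ["N", "N", "N", "V", "V"]

print(levenshtein("kitten", "sitting"))
print(precision_recall_f1(gold, predicted, "N"))
```

The distance function keeps only two rows of the dynamic-programming table, so memory use grows with the length of the second sequence rather than with the product of both lengths.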

API Documentation can be found here.
