A simple lemmatizer based on Unitex word lists

Project description

This is a simple lemmatization module based on the Unitex inflected word lists (DELAF dictionaries). It therefore needs a Unitex dictionary file to work.

So far, I’ve only used it with Portuguese, with the DELAF_PB file provided by NILC.
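
A DELAF word list pairs each inflected form with its lemma and grammatical codes, one entry per line (for example `casas,casa.N:fp`, an illustrative entry). The sketch below shows one way such a list could be turned into a lookup table. It is not this package's code: `parse_delaf_line`, `build_index` and `load_delaf` are hypothetical names, the line layout is an assumption based on the usual DELAF convention, and escaped commas or dots are not handled.

```python
def parse_delaf_line(line):
    """Split one DELAF entry into (inflected form, lemma, category).

    Assumed layout: inflected,lemma.CATEGORY[+traits][:inflection codes],
    e.g. 'casas,casa.N:fp'. Backslash-escaped commas or dots are NOT handled.
    Returns None for blank or malformed lines.
    """
    # Drop surrounding whitespace and a leading byte-order mark, if any.
    line = line.strip().lstrip("\ufeff")
    inflected, comma, rest = line.partition(",")
    lemma, dot, codes = rest.partition(".")
    if not inflected or not comma or not dot:
        return None
    # The grammatical category is the code before any '+traits' or ':inflection'.
    category = codes.split(":", 1)[0].split("+", 1)[0]
    # Assumption: an empty lemma means the lemma equals the inflected form.
    return inflected, lemma or inflected, category


def build_index(lines):
    """Map (lowercased inflected form, category) to a lemma."""
    index = {}
    for line in lines:
        entry = parse_delaf_line(line)
        if entry is None:
            continue
        inflected, lemma, category = entry
        # Keep the first lemma seen when a (form, category) pair is ambiguous.
        index.setdefault((inflected.lower(), category), lemma)
    return index


def load_delaf(path, encoding="utf-8"):
    """Read a DELAF file from disk.

    Unitex dictionaries are often stored as UTF-16, so the encoding is a
    parameter here rather than a fixed choice.
    """
    with open(path, encoding=encoding) as fh:
        return build_index(fh)
```

Keying on the (form, category) pair rather than on the form alone is what lets a part-of-speech tag disambiguate forms that belong to several categories.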

Installing

You can either clone the repository and install with

$ python setup.py install

or install through pip

$ pip install unitexlemmatizer

Usage

To use the Unitex Lemmatizer, first tell it where the word list is:

>>> import unitexlemmatizer as ul
>>> ul.load_unitex_dictionary('/path/to/delaf.dic')

Then call the get_lemma function, passing the inflected word and its part-of-speech tag (from the Universal Dependencies tagset):

>>> ul.get_lemma('corpora', 'noun')
'corpus'
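
Because get_lemma takes a Universal Dependencies tag while DELAF entries carry Unitex category codes, some mapping between the two is needed. The following is a minimal sketch under stated assumptions, not the package's implementation: `UD_TO_DELAF` and `lookup_lemma` are hypothetical, the category codes on the right are assumptions about the dictionary in use, and returning the word unchanged when no entry is found is a choice made only for this sketch.

```python
# Hypothetical mapping from lowercased UD tags to DELAF category codes.
# The codes are assumptions; check them against your own DELAF file.
UD_TO_DELAF = {
    "noun": "N",
    "propn": "N",
    "verb": "V",
    "aux": "V",
    "adj": "A",
    "adv": "ADV",
}


def lookup_lemma(index, word, ud_tag):
    """Return the lemma for (word, ud_tag), or the word itself if unknown.

    `index` maps (lowercased inflected form, DELAF category) to a lemma.
    Falling back to the input word is a design choice of this sketch.
    """
    category = UD_TO_DELAF.get(ud_tag.lower())
    if category is None:
        # Tag not covered by the mapping (e.g. punctuation): leave word as is.
        return word
    return index.get((word.lower(), category), word)
```

For instance, `lookup_lemma({("corpora", "N"): "corpus"}, "corpora", "noun")` returns `'corpus'`, while an unknown word or an unmapped tag such as `'punct'` comes back unchanged.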

Download files

Source Distribution

unitexlemmatizer-1.0.0.tar.gz (3.0 kB)

Uploaded Source

Built Distributions

unitexlemmatizer-1.0.0-py2.py3-none-any.whl (4.9 kB)

Uploaded for Python 2 and Python 3

unitexlemmatizer-1.0.0-py2.7.egg (5.1 kB)

Uploaded Egg

File details

Details for the file unitexlemmatizer-1.0.0.tar.gz.

File hashes

Hashes for unitexlemmatizer-1.0.0.tar.gz
Algorithm Hash digest
SHA256 6602ab1bdd8fd0946f6348718a6f6473814f81e8f77144e647dfee3645ff62a5
MD5 e2b5ef3622bf8939bf6a9a39ce385bb4
BLAKE2b-256 f57b61b0192d541ccb055603d75bf52021b9399a5cf4a2ef22f0b09a34bbc208
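
The SHA256 digests above can be used to check a downloaded file before installing it. A minimal sketch using only the standard library's `hashlib` follows; `sha256_of` and `matches` are illustrative helper names, and the expected value is copied from the table above.

```python
import hashlib

# SHA256 digest listed above for the source distribution.
EXPECTED_SDIST_SHA256 = "6602ab1bdd8fd0946f6348718a6f6473814f81e8f77144e647dfee3645ff62a5"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until EOF so large files are not loaded at once.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.strip().lower()
```

After downloading the archive, `matches("unitexlemmatizer-1.0.0.tar.gz", EXPECTED_SDIST_SHA256)` should be true. pip can perform the same check itself in hash-checking mode, using `--hash=sha256:...` entries in a requirements file together with `--require-hashes`.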

File details

Details for the file unitexlemmatizer-1.0.0-py2.py3-none-any.whl.

File hashes

Hashes for unitexlemmatizer-1.0.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 a493635169a21456d66e7587a065ec86e0fb80b926516198077335a60fd38df3
MD5 36a8f4d39f2d0b494158320ecb3faf6e
BLAKE2b-256 1ed639ad1bd2dce9bd0d90faa64373a2fee48e89fe03e95da9b7a04cded0339b

File details

Details for the file unitexlemmatizer-1.0.0-py2.7.egg.

File hashes

Hashes for unitexlemmatizer-1.0.0-py2.7.egg
Algorithm Hash digest
SHA256 5a7a4699e10a1b37efaac2e9404e8766c0e664c907bc1a89fcd37910756dac08
MD5 4811bf793feb638b997305efa3654171
BLAKE2b-256 f9aebee3a227b4c623abd36c2354909354783a0a413e9bc11c5421c00b1ae1e9