Multilingual natural language tools, wrapping NLTK and other systems.
Project description
This package wraps NLTK and other systems to provide convenient natural language tools, such as:
Tokenizers
Stopword removers
Word frequency lookup
Lemmatizers (which reduce words to their root form, possibly taking part-of-speech tags into account)
Analyzers for East Asian languages (for example, we currently use a MeCab process to find word breaks in Japanese)
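To make the first three items concrete, here is a minimal sketch of tokenizing, stopword removal, and word counting using only the Python standard library. It does not use metanl's or NLTK's API, which this description does not document. The function names and the tiny stopword list are hypothetical, chosen only for illustration.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list; real lists are much longer
# and language-specific.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is"}


def tokenize(text):
    """Lowercase the text and split it into runs of word characters."""
    return re.findall(r"\w+", text.lower())


def remove_stopwords(tokens, stopwords=STOPWORDS):
    """Drop every token that appears in the stopword set."""
    return [token for token in tokens if token not in stopwords]


def word_frequencies(tokens):
    """Count how often each token occurs in this token list."""
    return Counter(tokens)


tokens = remove_stopwords(tokenize("The cat and the dog chased the other cat."))
freqs = word_frequencies(tokens)
print(tokens)
print(freqs.most_common(1))
```

Note that this counts words within a single text, whereas the word frequency lookup described here draws on external corpora. A regular-expression tokenizer like this one also assumes words are separated by spaces or punctuation, which is why languages such as Japanese need a dedicated analyzer.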
For word frequencies in some languages, metanl uses corpora from the University of Leeds Center for Translation Studies (http://corpus.leeds.ac.uk/list.html), whose data is released under the Creative Commons Attribution license.
Author: Rob Speer