Multilingual natural language tools, wrapping NLTK and other systems.
This package wraps NLTK and other systems to provide convenient natural language tools, such as:
- Stopword removers
- Word frequency lookup
- Lemmatizers (which reduce words to their root form, possibly taking part-of-speech tags into account)
- Analyzers for East Asian languages (for example, we currently use a MeCab process to find word breaks in Japanese)
For word frequencies in some languages, metanl uses corpora from the University of Leeds Centre for Translation Studies (http://corpus.leeds.ac.uk/list.html), whose data is released under the Creative Commons Attribution license.
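As a rough illustration of two of the tools listed above (stopword removal and word frequency lookup), here is a minimal sketch in plain Python. It does not use metanl's actual API: the names STOPWORDS, remove_stopwords, build_frequency_table, and word_frequency are hypothetical, and the tiny stopword set and corpus stand in for real resources such as NLTK's stopword lists or the Leeds frequency data.

```python
from collections import Counter

# Hypothetical stopword set for illustration only; a real tool would load
# a per-language list from NLTK or a similar resource.
STOPWORDS = {"the", "a", "of", "and", "to", "in"}


def remove_stopwords(tokens):
    """Return the tokens that are not in the stopword set (case-insensitive)."""
    return [t for t in tokens if t.lower() not in STOPWORDS]


def build_frequency_table(tokens):
    """Map each lowercased token to its relative frequency in the token list."""
    counts = Counter(t.lower() for t in tokens)
    total = sum(counts.values())
    return {word: n / total for word, n in counts.items()}


def word_frequency(word, table, default=0.0):
    """Look up a word's relative frequency, falling back to a default."""
    return table.get(word.lower(), default)


# A toy corpus standing in for a real frequency corpus.
corpus = "the cat and the dog chased the ball in the garden".split()
content_words = remove_stopwords(corpus)
table = build_frequency_table(corpus)
```

Here content_words keeps only the non-stopword tokens, and word_frequency("the", table) returns the share of corpus tokens that are "the". Relative frequencies are used rather than raw counts so that values from corpora of different sizes stay comparable.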
Author: Rob Speer