Text Mining and Topic Modeling Toolkit

Project description

tmtoolkit is a set of tools for text mining and topic modeling with Python. It contains functions for text preprocessing such as lemmatization, stemming, and POS tagging, especially for English and German texts. Preprocessing runs in parallel, using all available processors on your machine. The topic modeling features include topic model evaluation metrics, which let you compute models with different parameters in parallel and compare them (e.g. to find the best number of topics for a given set of documents). Topic models can be generated in parallel for different corpora and/or parameter sets using the LDA implementations from lda, scikit-learn, or gensim.
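
The parallel parameter comparison described above can be illustrated with a minimal sketch that uses only the Python standard library. It does not use tmtoolkit's own API: the names score_model, evaluate_in_parallel, and best_params are hypothetical, and the scoring formula is a toy stand-in for fitting an LDA model and computing an evaluation metric.

```python
import concurrent.futures


def score_model(params):
    """Hypothetical stand-in for 'fit a topic model and compute a metric'.

    A real version would train an LDA model with these parameters and
    return an evaluation metric. A toy formula keeps the sketch runnable.
    """
    n_topics = params["n_topics"]
    alpha = params["alpha"]
    # Toy score that is highest near n_topics == 20 (purely illustrative).
    return -abs(n_topics - 20) - alpha


def evaluate_in_parallel(param_sets,
                         executor_cls=concurrent.futures.ProcessPoolExecutor):
    """Score every parameter set in a worker pool.

    With the default executor and no worker limit, the pool size is
    derived from the number of processors on the machine.
    Returns a list of (params, score) pairs in input order.
    """
    with executor_cls() as pool:
        scores = list(pool.map(score_model, param_sets))
    return list(zip(param_sets, scores))


def best_params(results):
    """Return the parameter set with the highest score."""
    return max(results, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    # Candidate numbers of topics, each with an assumed alpha of 1 / k.
    candidates = [{"n_topics": k, "alpha": 1.0 / k} for k in (10, 20, 30, 40)]
    results = evaluate_in_parallel(candidates)
    print(best_params(results))
```

Each parameter set is scored independently, so the work maps cleanly onto a pool of worker processes. In an interactive session, where process pools can be awkward to use, passing concurrent.futures.ThreadPoolExecutor as executor_cls runs the same sketch with threads.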

Download files


Files for tmtoolkit, version 0.4.0
Filename                               Size     File type  Python version
tmtoolkit-0.4.0-py2.py3-none-any.whl   15.3 MB  Wheel      3.5
tmtoolkit-0.4.0.tar.gz                 15.2 MB  Source     None
