Topic Modeling Toolkit

Project description

This library aims to automate Topic Modeling research activities:

  • Data preprocessing and dataset generation
  • Model training (with parameter grid-search), evaluation and comparison
  • Graph building
  • Computing KL-divergence between p(c|t) distributions (see the sketch after this list)
  • Reporting on datasets, models and KL-distances
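Below is a minimal sketch of the KL-divergence computation mentioned above, treating each p(c|t) as a discrete distribution over classes c for a fixed topic t; the function name and the smoothing constant are illustrative, and the toolkit's own implementation may differ.

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) = sum over c of p(c) * log(p(c) / q(c))."""
    p = np.asarray(p, dtype=float) + eps   # smooth to avoid log(0) and division by zero
    q = np.asarray(q, dtype=float) + eps
    p, q = p / p.sum(), q / q.sum()        # renormalize after smoothing
    return float(np.sum(p * np.log(p / q)))

# e.g. compare p(c|t1) against p(c|t2) over three classes
print(kl_divergence([0.7, 0.2, 0.1], [0.5, 0.3, 0.2]))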

Overview

This library serves as a higher-level API around the BigARTM library (the artm Python interface) and exposes it conveniently through the command line.
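For orientation, here is a minimal sketch of the kind of lower-level artm calls that the toolkit wraps. It assumes a directory of BigARTM batches already exists at ‘my-dataset’; the topic count, score name and path are illustrative, not the toolkit's own defaults.

import artm

# load pre-built BigARTM batches and gather a dictionary for the collection
batch_vectorizer = artm.BatchVectorizer(data_path='my-dataset', data_format='batches')

# define a plain topic model and track its perplexity during training
model = artm.ARTM(num_topics=20, dictionary=batch_vectorizer.dictionary)
model.scores.add(artm.PerplexityScore(name='perplexity',
                                      dictionary=batch_vectorizer.dictionary))

# fit the model offline over several passes of the collection
model.fit_offline(batch_vectorizer=batch_vectorizer, num_collection_passes=10)
print(model.score_tracker['perplexity'].last_value)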

Key features of the library:

  • Flexible preprocessing pipelines
  • Optimization of classification scheme with an evolutionary algorithm
  • Fast model inference with parallel/multicore execution
  • Persistence of models and experimental results
  • Visualization

Installation

The Topic Modeling Toolkit depends on the BigARTM C++ library, so you should first build and install it,
either by following the instructions here or by using
the ‘build_artm.sh’ script provided. For example, for Python 3 you can use the following:
$ git clone https://github.com/boromir674/topic-modeling-toolkit.git
$ chmod +x topic-modeling-toolkit/build_artm.sh
$ # build and install BigARTM library in /usr/local and create python3 wheel
$ topic-modeling-toolkit/build_artm.sh
$ ls bigartm/build/python/bigartm*.whl
Now you should have the ‘bigartm’ executable in your PATH and a built Python wheel in ‘bigartm/build/python/’.
Install the wheel in your environment, for example with the command
$ python -m pip install bigartm/build/python/path-python-wheel
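To sanity-check the BigARTM installation, you can try importing the module from Python (this assumes artm.version() is exposed by the installed wheel):

import artm                # the Python interface installed from the wheel
print(artm.version())      # prints the BigARTM library version if the install succeeded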
You can install the package from source with the following commands.
Once the package is hosted on PyPI, you will also be able to install it with ‘pip install topic-modeling-toolkit’.
$ cd topic-modeling-toolkit
$ pip install .

If the above fails, try again, this time manually installing the dependencies first:

$ cd topic-modeling-toolkit
$ pip install -r requirements.txt
$ pip install .

Usage

A sample usage session is shown below: a dataset is computed from the ‘posts’ collection using the preprocessing pipeline in ‘pipeline.cfg’, a model named ‘plsa-model’ is trained on it and saved, and graphs of the tracked metrics are built and opened.

$ current_dir=$PWD
$ export COLLECTIONS_DIR=$current_dir/datasets-dir
$ mkdir $COLLECTIONS_DIR

$ transform posts pipeline.cfg my-dataset
$ train my-dataset train.cfg plsa-model --save
$ make-graphs --model-labels "plsa-model" --allmetrics --no-legend
$ xdg-open $COLLECTIONS_DIR/plsa-model/graphs/plsa*prpl*

Citation

  1. Vorontsov, K. and Potapenko, A. (2015). Additive regularization of topic models. Machine Learning, 101(1):303–323.
