
Topic Modeling Toolkit

Project description

This library aims to automate common Topic Modeling research activities:

  • Data preprocessing and dataset creation

  • Model training (with parameter grid search), evaluation, and comparison

  • Graph building

  • Computing the KL divergence between p(c|t) distributions (see the sketch after this list)

  • Reporting on datasets, models, and KL distances
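
As an illustration of the KL-divergence feature above, the snippet below computes KL(p || q) for two discrete p(c|t) distributions. It is a stand-alone sketch using NumPy with toy values, not the toolkit's own API.

import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) = sum_c p(c) * log(p(c) / q(c)) for discrete distributions."""
    p = np.clip(np.asarray(p, dtype=float), eps, None)  # clip to avoid log(0)
    q = np.clip(np.asarray(q, dtype=float), eps, None)
    return float(np.sum(p * np.log(p / q)))

# Toy p(c|t) distributions over three document classes c, for two topics t1 and t2
p_c_given_t1 = [0.7, 0.2, 0.1]
p_c_given_t2 = [0.3, 0.4, 0.3]
print(kl_divergence(p_c_given_t1, p_c_given_t2))  # asymmetric: KL(t1||t2) != KL(t2||t1)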


Overview

This library serves as a higher-level API around the BigARTM library (its artm Python interface) and exposes it conveniently through the command line.

Key features of the Library:

  • Flexible preprocessing pipelines

  • Optimization of classification scheme with an evolutionary algorithm

  • Fast model inference with parallel/multicore execution

  • Persistence of models and experimental results

  • Visualization

Installation

The Topic Modeling Toolkit depends on the BigARTM C++ library, so you should first build and install it, either by following the official BigARTM instructions or by using the provided ‘build_artm.sh’ script. For example, for Python 3 you can use the following:
$ git clone https://github.com/boromir674/topic-modeling-toolkit.git
$ chmod +x topic-modeling-toolkit/build_artm.sh
$ # build and install BigARTM library in /usr/local and create python3 wheel
$ topic-modeling-toolkit/build_artm.sh
$ ls bigartm/build/python/bigartm*.whl
Now you should have the ‘bigartm’ executable in your PATH and a built Python wheel in ‘bigartm/build/python/’.
Install the wheel in your environment, for example with:
python -m pip install bigartm/build/python/path-python-wheel
You can then install the package from source with the following commands (once the package is hosted on PyPI, it can also be installed directly with pip):
$ cd topic-modeling-toolkit
$ pip install .

If the above fails, try again after manually installing the dependencies:

$ cd topic-modeling-toolkit
$ pip install -r requirements.txt
$ pip install .
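
To verify the installation, a minimal check such as the following should run without errors. This is only a sketch: ‘artm’ is the module installed by the BigARTM wheel, and the version call is assumed to be available.

import artm  # BigARTM Python interface installed from the wheel
print(artm.version())  # prints the installed BigARTM version string (assumed API)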

Usage

A sample usage example is shown below.

$ export COLLECTIONS_DIR=$PWD/datasets-dir
$ mkdir -p $COLLECTIONS_DIR

$ # preprocess the 'posts' collection into a dataset named 'my-dataset', as defined in pipeline.cfg
$ transform posts pipeline.cfg my-dataset
$ # train a model labeled 'plsa-model' on 'my-dataset' according to train.cfg and persist the results
$ train my-dataset train.cfg plsa-model --save
$ # build graphs of all tracked metrics for the trained model
$ make-graphs --model-labels "plsa-model" --allmetrics --no-legend
$ # view the generated perplexity graph
$ xdg-open $COLLECTIONS_DIR/plsa-model/graphs/plsa*prpl*
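
The last command opens the perplexity graph produced for ‘plsa-model’. For reference, perplexity is derived from the held-out log-likelihood as exp(-total log-likelihood / total token count); the sketch below shows that computation on toy numbers and is not the toolkit's own code.

import math

def perplexity(doc_log_likelihoods, doc_token_counts):
    """exp(- sum of per-document log-likelihoods / total number of tokens)."""
    return math.exp(-sum(doc_log_likelihoods) / sum(doc_token_counts))

# Toy values for two documents: log p(doc) and token counts
print(perplexity([-120.0, -90.0], [40, 30]))  # lower perplexity means a better fit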

Citation

  1. Vorontsov, K. and Potapenko, A. (2015). Additive regularization of topic models. Machine Learning, 101(1):303–323.

