Topic Modeling Toolkit
Project description
This library aims to automate common activities in topic modeling research:
Data preprocessing and dataset computing
Model training (with parameter grid-search), evaluating and comparing
Graph building
Computing KL-divergence between p(c|t) distributions
Datasets/models/kl-distances reporting
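As an illustration of the KL-divergence item above, here is a minimal sketch of the divergence between two discrete distributions. It assumes that c is a class label and t a topic, so that each p(c|t) is a probability vector over the same classes. The function names are hypothetical and are not part of this toolkit's API; the toolkit's own computation may differ (for example in smoothing or in how it symmetrizes the measure).

```python
import math


def kl_divergence(p, q):
    """Return KL(p || q) in nats for two discrete distributions.

    Both arguments are sequences of probabilities over the same classes.
    Terms with p_i == 0 contribute nothing; if q_i == 0 while p_i > 0
    the divergence is infinite.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue
        if q_i == 0.0:
            return math.inf
        total += p_i * math.log(p_i / q_i)
    return total


def symmetric_kl(p, q):
    """Symmetrized variant: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical p(c|t) vectors for two topics over three classes.
p_c_given_t1 = [0.7, 0.2, 0.1]
p_c_given_t2 = [0.1, 0.3, 0.6]
distance = symmetric_kl(p_c_given_t1, p_c_given_t2)
```

KL-divergence is not symmetric, which is why a symmetrized form is often used when the result is reported as a distance between topics.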
Overview
This library serves as a higher-level API around the BigARTM library (through its artm Python interface) and exposes it conveniently through the command line.
Key features of the library:
Flexible preprocessing pipelines
Optimization of classification scheme with an evolutionary algorithm
Fast model inference with parallel/multicore execution
Persisting of models and experimental results
Visualization
Installation
$ git clone https://github.com/boromir674/topic-modeling-toolkit.git
$ chmod +x topic-modeling-toolkit/build_artm.sh
$ # build and install BigARTM library in /usr/local and create python3 wheel
$ topic-modeling-toolkit/build_artm.sh
$ ls bigartm/build/python/bigartm*.whl
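Install the wheel listed by the previous command, replacing path-python-wheel with its file name:
$ python -m pip install bigartm/build/python/path-python-wheel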
$ cd topic-modeling-toolkit
$ pip install .
If the above fails, try again, installing the dependencies manually first:
$ cd topic-modeling-toolkit
$ pip install -r requirements.txt
$ pip install .
Usage
A sample session is shown below.
$ current_dir=$(echo $PWD)
$ export COLLECTIONS_DIR=$current_dir/datasets-dir
$ mkdir $COLLECTIONS_DIR
$ transform posts pipeline.cfg my-dataset
$ train my-dataset train.cfg plsa-model --save
$ make-graphs --model-labels "plsa-model" --allmetrics --no-legend
$ xdg-open $COLLECTIONS_DIR/plsa-model/graphs/plsa*prpl*
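For scripted experiments, the same session could be driven from Python. The sketch below only reuses the commands and flags shown above; the helper names build_commands and run_session are hypothetical and not part of the toolkit, and the sketch assumes that pipeline.cfg and train.cfg exist in the working directory and that the CLI tools are on the PATH. By default it prints the commands instead of running them.

```python
import os
import shlex
import subprocess
from pathlib import Path


def build_commands(dataset="my-dataset", model="plsa-model"):
    """Return the three CLI invocations from the sample session as argv lists."""
    return [
        ["transform", "posts", "pipeline.cfg", dataset],
        ["train", dataset, "train.cfg", model, "--save"],
        ["make-graphs", "--model-labels", model, "--allmetrics", "--no-legend"],
    ]


def run_session(collections_dir, dry_run=True):
    """Create the collections directory, then print or run each command.

    COLLECTIONS_DIR is passed to the child processes through their
    environment, mirroring the `export` line of the shell session.
    """
    collections_dir = Path(collections_dir).resolve()
    collections_dir.mkdir(parents=True, exist_ok=True)
    env = dict(os.environ, COLLECTIONS_DIR=str(collections_dir))
    for argv in build_commands():
        if dry_run:
            print(shlex.join(argv))  # show the command without executing it
        else:
            subprocess.run(argv, env=env, check=True)  # stop on first failure
    return env["COLLECTIONS_DIR"]


if __name__ == "__main__":
    run_session("datasets-dir", dry_run=True)
```

Passing check=True makes a failed preprocessing or training step abort the run rather than letting later steps operate on missing output.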
Citation
Vorontsov, K. and Potapenko, A. (2015). Additive regularization of topic models. Machine Learning, 101(1):303–323.
Download files
Source Distribution
Hashes for topic-modeling-toolkit-0.5.6.tar.gz
Algorithm | Hash digest |
---|---|
SHA256 | 654818986aaaf12b96b02d408f37684ea3691e515d8e736f6293eb73fd6fe999 |
MD5 | e1a40a78c2cd3308babd198804e5fb33 |
BLAKE2b-256 | a9605af5385f7c3ebbb25fe5611df20c89d2a5bb3654885b81b1c71f4819d8b3 |
Built Distribution
Hashes for topic_modeling_toolkit-0.5.6-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | da3b5c6d4122c65c472c8214a042358e05023d91dca4be05f0cfd279b7586fd4 |
MD5 | cc6cbf3532d7755e0ee41a3c0675fe5c |
BLAKE2b-256 | 3ded644f0d0e707f0367b6b2df944b498ff9359708ce86d295a512fc6e9c6e0f |
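A downloaded file can be checked against the SHA256 digests listed above. The following is a minimal sketch using only the Python standard library; the function names are hypothetical, and the file path in the usage comment assumes the archive was saved to the current directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage, assuming the source distribution was downloaded to the current directory:
# matches_published_hash(
#     "topic-modeling-toolkit-0.5.6.tar.gz",
#     "654818986aaaf12b96b02d408f37684ea3691e515d8e736f6293eb73fd6fe999",
# )
```

Reading in chunks keeps memory use constant regardless of the file size.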