Topic Modeling Toolkit
Project description
This library aims to automate common activities in topic modeling research:
- Data preprocessing and dataset computing
- Model training (with parameter grid search), evaluation, and comparison
- Graph building
- Computing KL-divergence between p(c|t) distributions
- Reporting on datasets, models, and KL distances
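To make the KL-divergence item concrete, here is a minimal sketch in plain Python. It is not the toolkit's own implementation. It assumes that p(c|t) is a discrete distribution over class labels c for a given topic t, and the function names, the `eps` floor, and the example values are all hypothetical.

```python
import math


def kl_divergence(p, q, eps=1e-12):
    """Return KL(p || q) in nats for two discrete distributions.

    `p` and `q` are sequences of probabilities over the same classes.
    `eps` is an assumed floor that avoids division by zero when q has
    a zero entry where p does not.
    """
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i > 0.0:  # terms with p_i == 0 contribute nothing
            total += p_i * math.log(p_i / max(q_i, eps))
    return total


def symmetric_kl(p, q):
    """Symmetrised variant: KL(p || q) + KL(q || p)."""
    return kl_divergence(p, q) + kl_divergence(q, p)


# Hypothetical p(c|t) values for two topics over three classes.
p_c_given_t1 = [0.7, 0.2, 0.1]
p_c_given_t2 = [0.1, 0.3, 0.6]

print(kl_divergence(p_c_given_t1, p_c_given_t2))
print(symmetric_kl(p_c_given_t1, p_c_given_t2))
```

Plain KL-divergence is not symmetric, so a symmetrised form like the one above is one common choice when a distance-like score between topics is wanted. Whether the toolkit uses this form is not stated here.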
Overview
This library serves as a higher-level API around the BigARTM library (through its artm Python interface) and exposes it conveniently through the command line.
Key features of the library:
- Flexible preprocessing pipelines
- Optimization of classification scheme with an evolutionary algorithm
- Fast model inference with parallel/multicore execution
- Persisting of models and experimental results
- Visualization
Installation
$ git clone https://github.com/boromir674/topic-modeling-toolkit.git
$ chmod +x topic-modeling-toolkit/build_artm.sh
$ # build and install BigARTM library in /usr/local and create python3 wheel
$ topic-modeling-toolkit/build_artm.sh
$ ls bigartm/build/python/bigartm*.whl
$ python -m pip install bigartm/build/python/path-python-wheel
$ cd topic-modeling-toolkit
$ pip install .
If the above fails, try again with a manual installation of the dependencies:
$ cd topic-modeling-toolkit
$ pip install -r requirements.txt
$ pip install .
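After installing, it can help to confirm that the BigARTM Python interface is visible to the interpreter before running any commands. The sketch below uses only the standard library. The helper name `is_importable` is hypothetical, and `artm` is the module name taken from the Overview.

```python
import importlib.util


def is_importable(module_name):
    """Return True if a top-level module can be found on the current Python path."""
    return importlib.util.find_spec(module_name) is not None


# `artm` is the BigARTM Python interface mentioned in the Overview.
if is_importable("artm"):
    print("artm is importable")
else:
    print("artm not found; re-check the BigARTM wheel installation step")
```

This only checks that the module can be located. It does not verify that the native BigARTM library built correctly.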
Usage
A sample workflow is shown below.
$ current_dir=$(echo $PWD)
$ export COLLECTIONS_DIR=$current_dir/datasets-dir
$ mkdir $COLLECTIONS_DIR
$ transform posts pipeline.cfg my-dataset
$ train my-dataset train.cfg plsa-model --save
$ make-graphs --model-labels "plsa-model" --allmetrics --no-legend
$ xdg-open $COLLECTIONS_DIR/plsa-model/graphs/plsa*prpl*
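On systems without xdg-open, the generated graphs can be located from Python instead. This is a minimal sketch that assumes the directory layout implied by the last command above (`$COLLECTIONS_DIR/<model-label>/graphs/`). The helper `find_graphs` and the `datasets-dir` fallback are hypothetical and not part of the toolkit.

```python
import os
from pathlib import Path


def find_graphs(collections_dir, model_label, pattern="*"):
    """List graph files under <collections_dir>/<model_label>/graphs.

    The layout is inferred from the sample commands, not from the
    toolkit's documentation. Returns a sorted list of matching files,
    or an empty list if the graphs directory does not exist.
    """
    graphs_dir = Path(collections_dir) / model_label / "graphs"
    if not graphs_dir.is_dir():
        return []
    return sorted(p for p in graphs_dir.glob(pattern) if p.is_file())


# Fall back to a relative "datasets-dir" if COLLECTIONS_DIR is not set.
collections_dir = os.environ.get("COLLECTIONS_DIR", "datasets-dir")
for graph_path in find_graphs(collections_dir, "plsa-model", "plsa*prpl*"):
    print(graph_path)
```

The glob pattern `plsa*prpl*` mirrors the one passed to xdg-open in the sample workflow.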
Citation
Vorontsov, K. and Potapenko, A. (2015). Additive regularization of topic models. Machine Learning, 101(1):303–323.
Download files
Source Distribution
Built Distribution
File details
Details for the file topic-modeling-toolkit-0.5.6.tar.gz.
File metadata
- Download URL: topic-modeling-toolkit-0.5.6.tar.gz
- Upload date:
- Size: 10.8 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 654818986aaaf12b96b02d408f37684ea3691e515d8e736f6293eb73fd6fe999 |
MD5 | e1a40a78c2cd3308babd198804e5fb33 |
BLAKE2b-256 | a9605af5385f7c3ebbb25fe5611df20c89d2a5bb3654885b81b1c71f4819d8b3 |
File details
Details for the file topic_modeling_toolkit-0.5.6-py3-none-any.whl.
File metadata
- Download URL: topic_modeling_toolkit-0.5.6-py3-none-any.whl
- Upload date:
- Size: 15.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.36.1 CPython/3.7.4
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | da3b5c6d4122c65c472c8214a042358e05023d91dca4be05f0cfd279b7586fd4 |
MD5 | cc6cbf3532d7755e0ee41a3c0675fe5c |
BLAKE2b-256 | 3ded644f0d0e707f0367b6b2df944b498ff9359708ce86d295a512fc6e9c6e0f |