
Benchmarking topic models for a paper


topic-benchmark

Command Line Interface for benchmarking topic models.

The package contains catalogue registries of all models, datasets, and metrics used for model evaluation, along with scripts for producing the tables and figures in the $S^3$ paper.

Usage

Installation

You can install the package from PyPI.

pip install topic-benchmark

Commands

run

Run the benchmark using a given embedding model. Runs that are abruptly stopped can be resumed from the results file.

python3 -m topic_benchmark run -e "embedding_model_name"
| argument | description | type | default |
| --- | --- | --- | --- |
| `--encoder_model` (`-e`) | The encoder model to use for the benchmark. | `str` | `"all-MiniLM-L6-v2"` |
| `--out_file` (`-o`) | The output path of the benchmark results. By default, results are saved to `results/{encoder_model}.jsonl`. | `str` | `None` |
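Because each finished record is appended to the JSONL results file as the run progresses, an interrupted run leaves behind exactly the records that completed, which is what makes resuming possible. As a rough illustration (the record fields below are made up for the example, not the package's actual schema), the state of a run can be checked by counting the lines already written:

```python
import json
from pathlib import Path

def completed_records(results_file: str) -> int:
    """Count result records already written to a JSONL file.

    Each line of a JSONL file is one standalone JSON object, so an
    abruptly stopped run leaves a valid file containing only the
    records that finished.
    """
    path = Path(results_file)
    if not path.exists():
        return 0
    with path.open() as f:
        return sum(1 for line in f if line.strip())

# Example with two illustrative (made-up) records:
demo = Path("demo_results.jsonl")
demo.write_text(
    '{"dataset": "20ng", "score": 0.41}\n'
    '{"dataset": "bbc", "score": 0.47}\n'
)
print(completed_records("demo_results.jsonl"))  # 2
```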

table

Creates a LaTeX table of the benchmark results.

python3 -m topic_benchmark table -o results.tex
| argument | description | type | default |
| --- | --- | --- | --- |
| `results_folder` | The folder where all result files are located. | `str` | `"results/"` |
| `--out_file` (`-o`) | The output path of the LaTeX table. By default, the table is printed to stdout. | `str` | `None` |

Reproducing $S^3$ paper results

Result files for all runs in the $S^3$ publication can be found in the results/ folder of the repository. To reproduce the results reported in our paper, please do the following.

First, install this package by running the following command:

pip install topic-benchmark==0.3.0

Then, reproduce results for all the embedding models tested in the paper by running the following CLI commands:

python3 -m topic_benchmark run -e all-MiniLM-L6-v2
python3 -m topic_benchmark run -e all-mpnet-base-v2
python3 -m topic_benchmark run -e average_word_embeddings_glove.6B.300d
python3 -m topic_benchmark run -e intfloat/e5-large-v2

The results for each embedding model will be saved in the results/ folder (unless a value for --out_file is explicitly passed).
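After all four runs, the results/ folder holds one JSONL file per encoder. A quick sketch for gathering them into a single structure for inspection (no assumptions are made here about the record fields, only that each file is JSONL named after its encoder):

```python
import json
from pathlib import Path

def collect_results(results_folder: str = "results") -> dict:
    """Gather every encoder's JSONL result file from one folder.

    Keys are the file stems (the encoder names used for the runs);
    values are lists of parsed records, whatever fields the benchmark
    wrote for each run.
    """
    results = {}
    for path in Path(results_folder).glob("*.jsonl"):
        with path.open() as f:
            results[path.stem] = [json.loads(line) for line in f if line.strip()]
    return results

# Example with a throwaway folder and an illustrative record:
folder = Path("demo_results")
folder.mkdir(exist_ok=True)
(folder / "all-mpnet-base-v2.jsonl").write_text(
    '{"metric": "diversity", "value": 0.9}\n'
)
print(sorted(collect_results("demo_results")))  # ['all-mpnet-base-v2']
```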

To produce figures and tables in the paper, you can use the scripts in the s3_paper_scripts/ folder.

pip install -r s3_paper_scripts/requirements.txt

# Table 3: Main Table (tables/main_table.tex)
python3 s3_paper_scripts/main_table.py

# Figure 2: Preprocessing effects (figures/effect_of_preprocessing.png)
python3 s3_paper_scripts/effect_of_preprocessing.py

# Figure 3: Stop word frequency in topic descriptions (figures/stop_freq.png)
python3 s3_paper_scripts/stop_words_figure.py

# Table 4: Average percentage runtime difference from S^3 (tables/speed.tex)
python3 s3_paper_scripts/speed.py

# Table 5: Topics in ArXiv ML (tables/arxiv_ml_topics.tex)
# Figure 4: Compass of Concepts in ArXiv ML (figures/arxiv_ml_map.png)
python3 s3_paper_scripts/arxiv_ml_compass.py

##################
#### APPENDIX ####
##################

# Table 6: NPMI Coherence of topics (tables/npmi_table.tex)
python3 s3_paper_scripts/npmi_table.py

# Figures 5-9: Disaggregated results (figures/disaggregated_{metric_name}.png)
python3 s3_paper_scripts/disaggregated_results_figures.py
