
CLI suite for benchmarking topic models

Project description

topic-benchmark

Command Line Interface for benchmarking topic models.

The package contains catalogue registries of all models, datasets, and metrics used for model evaluation, along with scripts for producing the tables and figures in the $S^3$ paper.

Usage

Installation

You can install the package from PyPI.

pip install topic-benchmark

Commands

run

Run the benchmark. Defaults to running all models with the benchmark used in Kardos et al. (2024).

python3 -m topic_benchmark run
| Argument | Short Flag | Description | Type | Default |
| --- | --- | --- | --- | --- |
| `--out_dir OUT_DIR` | `-o` | Output directory for the results. | str | `results/` |
| `--encoders ENCODERS` | `-e` | Which encoders should be used for the runs. | str | None |
| `--models MODELS` | `-m` | Which subset of models the benchmark should be run on. | Optional[list[str]] | None |
| `--datasets DATASETS` | `-d` | Which datasets the models should be evaluated on. | Optional[list[str]] | None |
| `--metrics METRICS` | `-t` | Which metrics the models should be evaluated with. | Optional[list[str]] | None |
| `--seeds SEEDS` | `-s` | Which random seeds the models should be evaluated with. | Optional[list[int]] | None |
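The flags above can be combined to restrict a run to a subset of models, datasets, and seeds. Below is a minimal sketch of assembling such an invocation with Python's `subprocess` module; the model name and seed values are hypothetical placeholders, and the exact accepted names are defined by the package's registries (consult `--help` for the authoritative syntax).

```python
import subprocess

def build_run_command(out_dir="results/", models=None, datasets=None, seeds=None):
    """Assemble the argv list for `python3 -m topic_benchmark run`.

    List-valued flags are passed as space-separated values after the flag,
    mirroring the table above; this is an assumption about the CLI's syntax.
    """
    cmd = ["python3", "-m", "topic_benchmark", "run", "-o", out_dir]
    if models:
        cmd += ["-m", *models]
    if datasets:
        cmd += ["-d", *datasets]
    if seeds:
        cmd += ["-s", *map(str, seeds)]
    return cmd

# Requires the package to be installed:
# subprocess.run(build_run_command(models=["NMF"], seeds=[42]), check=True)
```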

push_to_hub

Push results to a HuggingFace repository.

python3 -m topic_benchmark push_to_hub "your_user/your_repo"
| Argument | Description | Type | Default |
| --- | --- | --- | --- |
| `hf_repo` | HuggingFace repository to push results to. | str | N/A |
| `results_folder` | Folder containing results for all embedding models. | str | `results/` |

Reproducing $S^3$ paper results

Result files for all runs in the $S^3$ publication can be found in the results/ folder of the repository. To reproduce the results reported in our paper, do the following.

First, install this package:

pip install topic-benchmark

Then run the full benchmark:

python3 -m topic_benchmark run -o results/

The results for each embedding model will be placed in the results/ folder (unless a different value is passed with --out_dir).
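Once a run has finished, the per-model result files can be gathered programmatically. The sketch below assumes one JSON file per embedding model under the results directory; the actual file layout and format are not documented here, so adjust the glob pattern accordingly.

```python
import json
from pathlib import Path

def collect_results(results_dir="results/"):
    """Load every JSON result file found under the results directory.

    Assumes one JSON file per embedding model (an assumption about the
    output layout); returns a dict keyed by file stem.
    """
    results = {}
    for path in Path(results_dir).glob("**/*.json"):
        with path.open() as f:
            results[path.stem] = json.load(f)
    return results
```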

To produce figures and tables in the paper, you can use the scripts in the scripts/s3_paper/ folder.

Download files

Download the file for your platform.

Source Distribution

topic_benchmark-0.6.0.tar.gz (16.0 kB)

Uploaded Source

Built Distribution

topic_benchmark-0.6.0-py3-none-any.whl (23.0 kB)

Uploaded Python 3

File details

Details for the file topic_benchmark-0.6.0.tar.gz.

File metadata

  • Download URL: topic_benchmark-0.6.0.tar.gz
  • Upload date:
  • Size: 16.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.5 Linux/5.15.0-124-generic

File hashes

Hashes for topic_benchmark-0.6.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 73e3b58f4b8925cfb279a92f1fd3b5cd7a3e150be558a434ec4c510f3de6adde |
| MD5 | 64349294faac2c1d85e04746d78809c5 |
| BLAKE2b-256 | 1aaac306464c4660319e0004e8679b11b53d4a0f62dbcbfda41f070cc954b55c |

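A downloaded archive can be checked against the published digests using Python's standard `hashlib` module; a minimal sketch (the file path is a placeholder for wherever the archive was saved):

```python
import hashlib

def sha256_of(path, chunk_size=8192):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

# Compare against the digest published above, e.g.:
# sha256_of("topic_benchmark-0.6.0.tar.gz") should equal the SHA256 value listed.
```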

File details

Details for the file topic_benchmark-0.6.0-py3-none-any.whl.

File metadata

  • Download URL: topic_benchmark-0.6.0-py3-none-any.whl
  • Upload date:
  • Size: 23.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.1 CPython/3.11.5 Linux/5.15.0-124-generic

File hashes

Hashes for topic_benchmark-0.6.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | dfc7884c3148b58c91e125ba021ca13abc83116bf07042c4dc22634fd1d99bc6 |
| MD5 | c57917e97c00fcd32cc951ede7064930 |
| BLAKE2b-256 | fdf0194187c269f17dc34552e76ca1c7621ded94bfa948c20c43ea9118f8a6f6 |

