
A toolkit to quickly evaluate topic model goodness across different numbers of topics


Topic Modeling Evaluation


Metrics

Coherence measure to be used: one of 'u_mass', 'c_v', 'c_uci' or 'c_npmi'.

  • 'u_mass' is the fastest method. 'c_uci' is also known as 'c_pmi'.

  • For 'u_mass', a corpus should be provided; if texts are provided instead, they are converted to a corpus using the dictionary.

  • For 'c_v', 'c_uci' and 'c_npmi', texts should be provided (a corpus isn't needed).
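To make the corpus-versus-texts distinction concrete, here is a minimal pure-Python sketch of two of the ideas. A UMass-style score only needs to know which documents contain which words, so a bag-of-words corpus is enough. A UCI-style (PMI) score counts co-occurrence inside sliding windows over the original token order, so it needs texts. The helpers umass_coherence, sliding_windows and uci_coherence are hypothetical and not part of tm-eval. The smoothing, the default window size of 10 and the averaging over word pairs are illustrative choices, so the numbers will not exactly match a library implementation.

```python
import math
from itertools import combinations


def umass_coherence(top_words, documents):
    """UMass-style coherence of one topic from document co-occurrence.

    top_words: the topic's top words, most probable first; each word is
               assumed to occur in at least one document.
    documents: iterable of token collections; word order is ignored, so a
               bag-of-words corpus carries all the information needed.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_count(*words):
        # Number of documents that contain every word in `words`
        return sum(1 for d in doc_sets if all(w in d for w in words))

    scores = []
    for i in range(1, len(top_words)):
        for j in range(i):
            # log((D(w_i, w_j) + 1) / D(w_j)), where w_j ranks above w_i;
            # the +1 keeps the log finite when the pair never co-occurs
            both = doc_count(top_words[i], top_words[j])
            scores.append(math.log((both + 1) / doc_count(top_words[j])))
    # Averaged over pairs here; the original UMass formulation sums them
    return sum(scores) / len(scores)


def sliding_windows(texts, window_size):
    """Split each token list into overlapping windows (sets of tokens)."""
    windows = []
    for tokens in texts:
        if len(tokens) <= window_size:
            windows.append(set(tokens))  # short text: a single window
        else:
            for start in range(len(tokens) - window_size + 1):
                windows.append(set(tokens[start:start + window_size]))
    return windows


def uci_coherence(top_words, texts, window_size=10, eps=1e-12):
    """UCI-style (PMI) coherence estimated from sliding windows over texts.

    texts: token lists in their original order; order matters because
           co-occurrence is counted inside windows of `window_size` tokens.
           Every top word is assumed to appear in at least one window.
    """
    windows = sliding_windows(texts, window_size)
    n = len(windows)

    def prob(*words):
        # Fraction of windows that contain every word in `words`
        return sum(1 for w in windows if all(x in w for x in words)) / n

    # PMI(a, b) = log((P(a, b) + eps) / (P(a) * P(b))), averaged over pairs
    scores = [
        math.log((prob(a, b) + eps) / (prob(a) * prob(b)))
        for a, b in combinations(top_words, 2)
    ]
    return sum(scores) / len(scores)
```

The NPMI variant ('c_npmi') divides each PMI value by -log P(a, b), which bounds the score to [-1, 1].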

Examples

Example 1: estimate metrics for one topic model with a specific number of topics

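from tm_eval import evaluate_all_metrics_from_lda_model

# Input: a pickled dict mapping each document key to its terms, split by ','
input_file = "datasets/covid19_symptoms.pickle"
output_folder = "outputs"   # folder where results are written
model_name = "symptom"      # label identifying this model
num_topics = 10             # number of topics for the LDA model
# Build an LDA model with num_topics topics and compute all metrics for it
results = evaluate_all_metrics_from_lda_model(input_file=input_file,
                                              output_folder=output_folder,
                                              model_name=model_name,
                                              num_topics=num_topics)
print(results)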
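Both examples read a pickled dictionary keyed by document. If you need to build such a file from your own documents, the standard pickle module is enough. This is a sketch under assumptions: save_input_dict and load_term_lists are hypothetical helpers, not tm-eval functions, and the README does not say whether each value should be one comma-separated string or an already-split list of terms, so the loader accepts both.

```python
import os
import pickle
import tempfile


def save_input_dict(docs, path):
    """Pickle a {document_id: terms} dict for use as an input file.

    Assumption: `terms` is a comma-separated string such as "fever,cough";
    check against tm-eval before relying on this layout.
    """
    with open(path, "wb") as f:
        pickle.dump(docs, f)


def load_term_lists(path):
    """Load the pickle and return {document_id: [term, ...]}."""
    with open(path, "rb") as f:
        docs = pickle.load(f)
    result = {}
    for doc_id, terms in docs.items():
        # Split comma-separated strings; pass lists of terms through as-is
        parts = terms.split(",") if isinstance(terms, str) else terms
        result[doc_id] = [t.strip() for t in parts if t.strip()]
    return result


if __name__ == "__main__":
    # Round-trip demo with a temporary file and made-up sample documents
    path = os.path.join(tempfile.mkdtemp(), "symptoms.pickle")
    save_input_dict({"doc1": "fever,cough", "doc2": "fatigue,fever"}, path)
    print(load_term_lists(path))
```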

Example 2: track how model goodness changes with the number of topics

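from tm_eval import explore_topic_model_metrics, show_topic_model_metric_change

# Guard the entry point (recommended when code may spawn worker processes)
if __name__ == "__main__":
    # --- configuration ---
    # Input: a pickled dict with document IDs as keys and each document's
    # terms, split by ',', as values
    input_file = "datasets/covid19_symptoms.pickle"
    output_folder = "outputs"
    model_name = "symptom"
    start = 2   # lower bound of the range of topic numbers to try
    end = 20    # upper bound of the range of topic numbers to try
    # --- end configuration ---

    # Train and evaluate a topic model for each number of topics in the range
    list_results = explore_topic_model_metrics(input_file=input_file,
                                               output_folder=output_folder,
                                               model_name=model_name,
                                               start=start,
                                               end=end)
    # Summarize how the metrics change with the number of topics and save them as CSV
    show_topic_model_metric_change(list_results,
                                   save=True,
                                   save_path=f"{output_folder}/metrics.csv")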
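Once metrics.csv has been written, a small helper can pick the number of topics with the best score for a chosen metric. This is a hypothetical sketch: best_num_topics is not part of tm-eval, and the column names num_topics, c_v and u_mass are assumptions, so check the header of the file the toolkit actually produces and pass the real names in.

```python
import csv
import io


def best_num_topics(f, metric, k_column="num_topics", higher_is_better=True):
    """Return the number of topics with the best value of `metric`.

    f: an open text file (or file-like object) holding the metrics CSV.
    metric, k_column: column names; these are assumptions, not tm-eval's
    documented output format.
    """
    rows = list(csv.DictReader(f))
    if not rows:
        raise ValueError("the metrics CSV has no data rows")
    pick = max if higher_is_better else min
    best = pick(rows, key=lambda row: float(row[metric]))
    return int(best[k_column])


if __name__ == "__main__":
    # Made-up numbers for illustration only, not real results
    sample = "num_topics,c_v,u_mass\n2,0.41,-1.9\n3,0.47,-1.6\n4,0.44,-2.3\n"
    print(best_num_topics(io.StringIO(sample), "c_v"))
    # With a real file: open("outputs/metrics.csv", newline="") as f
```

For coherence scores, higher is generally better (for 'u_mass', values closer to zero); pass higher_is_better=False for any metric where lower is better.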

License

The tm-eval toolkit is provided by Donghua Chen under the MIT License.



Download files

Source distribution: tm-eval-0.0.1a0.tar.gz (11.2 kB)

Built distribution (Python 3): tm_eval-0.0.1a0-py3-none-any.whl (8.5 kB)
