Topic Modeling Evaluation

A toolkit to quickly evaluate topic model quality across different numbers of topics.

Metrics

The coherence measure to use:

  • 'u_mass' is the fastest method. 'c_uci' is also known as 'c_pmi'.

  • 'u_mass' requires a corpus. If texts are provided instead, they are converted to a corpus using the dictionary.

  • 'c_v', 'c_uci' and 'c_npmi' require texts (a corpus isn't needed).
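
To make the corpus requirement of 'u_mass' concrete, here is a minimal pure-Python sketch of a UMass-style score. It assumes the commonly cited formulation based on document co-occurrence counts with +1 smoothing; it is not taken from tm-eval, and the function name `umass_coherence` and the toy documents are hypothetical. Because the score needs only counts of which documents contain which words, a bag-of-words corpus is enough, whereas measures computed over sliding windows need the original token sequences (the texts).

```python
import math


def umass_coherence(top_words, documents):
    """UMass-style coherence for one topic (illustrative sketch, not tm-eval's code).

    top_words: the topic's top words, most probable first.
    documents: list of token lists.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain every word in `words`.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    for i in range(1, len(top_words)):
        for j in range(i):
            later, earlier = top_words[i], top_words[j]
            denom = doc_freq(earlier)
            if denom == 0:
                continue  # skip words that never occur in the documents
            # +1 smoothing keeps the logarithm finite when the pair never co-occurs.
            score += math.log((doc_freq(later, earlier) + 1) / denom)
    return score


# Toy documents, made up purely for illustration.
documents = [
    ["fever", "cough", "fatigue"],
    ["fever", "cough"],
    ["headache", "nausea"],
    ["fever", "headache"],
]

coherent = umass_coherence(["fever", "cough"], documents)
incoherent = umass_coherence(["cough", "nausea"], documents)
print(coherent, incoherent)
```

Words that often share documents score closer to zero; words that never co-occur score more negative.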

Examples

Example 1: estimate metrics for one topic model with a specific number of topics

from tm_eval import evaluate_all_metrics_from_lda_model

# Input: a pickled dictionary that maps each document key to its terms joined by ','.
input_file = "datasets/covid19_symptoms.pickle"
output_folder = "outputs"
model_name = "symptom"
num_topics = 10

# Evaluate all metrics for an LDA model with num_topics topics.
results = evaluate_all_metrics_from_lda_model(input_file=input_file,
                                              output_folder=output_folder,
                                              model_name=model_name,
                                              num_topics=num_topics)
print(results)
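
The input format described in the comment above can be sketched as follows. This assumes the pickle holds a plain Python dict of document id to comma-joined term string; the file name `my_terms.pickle` and the sample documents are hypothetical, and the exact format tm-eval expects should be checked against the bundled dataset.

```python
import os
import pickle
import tempfile

# Hypothetical documents: key = document id, value = terms joined by ','.
docs = {
    "doc1": "fever,cough,fatigue",
    "doc2": "headache,nausea",
    "doc3": "fever,headache",
}

# Write the dictionary to a pickle file (a temporary folder is used here).
path = os.path.join(tempfile.mkdtemp(), "my_terms.pickle")
with open(path, "wb") as f:
    pickle.dump(docs, f)

# Read it back and split each value into a term list.
with open(path, "rb") as f:
    loaded = pickle.load(f)
texts = [value.split(",") for value in loaded.values()]
print(texts)
```

A file written this way could then be passed as input_file, provided the assumption about the format holds.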

Example 2: track how model quality changes with the number of topics

from tm_eval import (explore_topic_model_metrics,
                     show_topic_model_metric_change,
                     plot_tm_metric_change)

if __name__ == "__main__":
    # Configuration
    # Input: a pickled dictionary with the document id as key and
    # the document's terms joined by ',' as value.
    input_file = "datasets/covid19_symptoms.pickle"
    output_folder = "outputs"
    model_name = "symptom"
    start = 2
    end = 5

    # Evaluate the metrics for each number of topics from start to end.
    list_results = explore_topic_model_metrics(input_file=input_file,
                                               output_folder=output_folder,
                                               model_name=model_name,
                                               start=start,
                                               end=end)

    # Summarize the results and save them as CSV.
    show_topic_model_metric_change(list_results, save=True,
                                   save_path=f"{output_folder}/metrics.csv")

    # Plot how each metric changes and save the figures.
    plot_tm_metric_change(csv_path=f"{output_folder}/metrics.csv",
                          save=True, save_folder=output_folder)
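
Once a coherence score is available for each number of topics, a common next step is to pick the topic count with the highest score. The sketch below shows that selection only. The layout of list_results and the columns of metrics.csv are not documented here, so it works on a plain dict; the helper `best_num_topics` and the scores are hypothetical placeholders, not real results.

```python
def best_num_topics(scores):
    """Return the number of topics with the highest coherence score."""
    return max(scores, key=scores.get)


# Placeholder values for illustration only (not produced by tm-eval).
example_scores = {2: 0.31, 3: 0.42, 4: 0.38, 5: 0.35}

best = best_num_topics(example_scores)
print(best)
```

Higher coherence is generally read as better, but it is worth inspecting the topics themselves before settling on a topic count.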

Output results

[Figures: one per metric: c_v, u_mass, c_npmi, c_uci]

License

The tm-eval toolkit is provided by Donghua Chen under the MIT License.

