Topic Modeling Evaluation
A toolkit for quickly evaluating topic model goodness across different numbers of topics.
Metrics
Coherence measure to be used:
- The fastest method is 'u_mass'. 'c_uci' is also known as c_pmi.
- For 'u_mass', a corpus should be provided. If texts is provided instead, it will be converted to a corpus using the dictionary.
- For 'c_v', 'c_uci' and 'c_npmi', texts should be provided (a corpus isn't needed).
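To illustrate why 'u_mass' needs only document-level counts, which is also why it tends to be the fastest measure, here is a minimal sketch of a UMass-style coherence score for a single topic in plain Python. It is a simplified illustration, not the toolkit's or gensim's exact implementation; the function name `umass_coherence` and the sample documents are assumptions made up for this example.

```python
import math
from itertools import combinations


def umass_coherence(topic_words, documents):
    """Simplified UMass-style coherence for one topic (illustrative only).

    topic_words: top words of the topic, most probable first.
    documents: list of token lists.
    """
    doc_sets = [set(doc) for doc in documents]

    def doc_freq(*words):
        # Number of documents that contain all of the given words.
        return sum(1 for d in doc_sets if all(w in d for w in words))

    score = 0.0
    # Each pair is (higher-ranked word, lower-ranked word).
    for earlier, later in combinations(topic_words, 2):
        denom = doc_freq(earlier)
        if denom == 0:
            continue  # skip words that never occur in the documents
        # Smoothed log conditional probability of the pair.
        score += math.log((doc_freq(earlier, later) + 1) / denom)
    return score


# Hypothetical toy documents, invented for this sketch.
docs = [
    ["fever", "cough"],
    ["fever", "cough", "fatigue"],
    ["headache"],
]
coherent = umass_coherence(["fever", "cough"], docs)
incoherent = umass_coherence(["fever", "headache"], docs)
print(coherent > incoherent)
```

Words that co-occur in the same documents score higher than words that never appear together. The 'c_v', 'c_uci' and 'c_npmi' measures instead rely on sliding windows over the tokenized texts, which is why they need texts rather than a bag-of-words corpus.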
Examples
Example 1: estimate metrics for one topic model with a specific number of topics
from tm_eval import *

# Input: a pickled dictionary that maps each document key to its term list,
# with the terms joined by ','.
input_file = "datasets/covid19_symptoms.pickle"
output_folder = "outputs"
model_name = "symptom"
num_topics = 10

# Run the evaluation and print all metrics
results = evaluate_all_metrics_from_lda_model(input_file=input_file,
                                              output_folder=output_folder,
                                              model_name=model_name,
                                              num_topics=num_topics)
print(results)
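The comments in the examples describe the input as a pickled dictionary with a document id as key and a comma-joined term string as value. The following sketch shows one way to build and read such a file. It is based only on that description, so treat the exact format as an assumption; the helper names, the sample documents and the file name `example_symptoms.pickle` are hypothetical.

```python
import os
import pickle
import tempfile

# Hypothetical documents: document id -> terms joined by ','.
documents = {
    "doc1": "fever,cough,fatigue",
    "doc2": "cough,headache",
}


def save_documents(docs, path):
    # Write the dictionary to a pickle file.
    with open(path, "wb") as f:
        pickle.dump(docs, f)


def load_term_lists(path):
    # Read the pickle back and split each value into a list of terms.
    with open(path, "rb") as f:
        docs = pickle.load(f)
    return {doc_id: terms.split(",") for doc_id, terms in docs.items()}


# Use a temporary folder so the sketch does not touch real data.
path = os.path.join(tempfile.mkdtemp(), "example_symptoms.pickle")
save_documents(documents, path)
term_lists = load_term_lists(path)
print(term_lists["doc1"])
```

A file written this way could then be passed as `input_file`, provided the toolkit expects the format assumed here.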
Example 2: track how model goodness changes with the number of topics
from tm_eval import *

if __name__ == "__main__":
    # Start of configuration
    # Input: a pickled dictionary (key, value) with the document id as key
    # and its term list, joined by ',', as value.
    input_file = "datasets/covid19_symptoms.pickle"
    output_folder = "outputs"
    model_name = "symptom"
    start = 2
    end = 5
    # End of configuration

    # Run the evaluation for each number of topics in the range
    list_results = explore_topic_model_metrics(input_file=input_file,
                                               output_folder=output_folder,
                                               model_name=model_name,
                                               start=start,
                                               end=end)

    # Summarize the results and save them as CSV
    show_topic_model_metric_change(list_results, save=True,
                                   save_path=f"{output_folder}/metrics.csv")

    # Plot how each metric changes
    plot_tm_metric_change(csv_path=f"{output_folder}/metrics.csv",
                          save=True, save_folder=output_folder)
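Once the metrics are saved as CSV, a common next step is to pick the number of topics with the best coherence. The sketch below shows that selection with the standard `csv` module. The column names `num_topics` and `c_v` are assumptions, since the layout of `metrics.csv` is not documented here, and the sample values are invented purely for illustration.

```python
import csv
import io


def best_num_topics(csv_file, topics_col="num_topics", metric_col="c_v"):
    """Return the number of topics whose metric value is highest.

    The column names are hypothetical; adjust them to the real CSV header.
    """
    rows = list(csv.DictReader(csv_file))
    best = max(rows, key=lambda row: float(row[metric_col]))
    return int(best[topics_col])


# Invented sample data standing in for a real metrics file.
sample = io.StringIO("num_topics,c_v\n2,0.31\n3,0.42\n4,0.38\n")
print(best_num_topics(sample))
```

For the coherence measures listed under Metrics, a higher value generally indicates more coherent topics, so taking the maximum is a reasonable first heuristic. Inspecting the plotted curve is still advisable before settling on a value.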
Output results
License
The tm-eval toolkit is provided by Donghua Chen under the MIT License.