Extracts keywords with the TF-IDF algorithm
Project description
"topicextractor" extracts topic keywords from documents using the TF-IDF algorithm.
Usage
import topicextractor as te

sample_docs = [["doc1_str"], ["doc2_str"], ["doc3_str"] ....]

# extract noun counts from 'sample_docs'
count_container = te.extract_noun_counts(sample_docs)
print(count_container)
# [{"a": 3, "b": 2}, {"c": 5, "d": 3}, {"e": 7, "f": 12} ....]

# extract keywords for 'count_container[0]' (= the first document in 'sample_docs')
keywords = te.tfidf(count_container[0], count_container)
print(keywords)
# [("keyword1", 2.1104), ("keyword2", 2.0012), ("keyword3", 1.8892) ....]
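For readers who want to see what a TF-IDF score over such count dictionaries might look like, here is a minimal sketch. It uses the textbook definition (term frequency times the log of inverse document frequency) and is not necessarily the formula "topicextractor" applies internally, so its scores need not match the package's output. The function name `tfidf_sketch` is hypothetical and is not part of the package.

```python
import math
from typing import Dict, List, Tuple


def tfidf_sketch(
    target: Dict[str, int], all_counts: List[Dict[str, int]]
) -> List[Tuple[str, float]]:
    """Rank the terms of one document by a textbook TF-IDF score.

    Assumption: this is a generic formulation, not topicextractor's own.
    """
    n_docs = len(all_counts)
    total_terms = sum(target.values())
    scores = []
    for term, count in target.items():
        # Term frequency: share of this term among all counted terms in the document.
        tf = count / total_terms
        # Document frequency: number of documents that contain the term at all.
        df = sum(1 for doc in all_counts if term in doc)
        # Inverse document frequency; 0.0 if the term appears in no document of the corpus.
        idf = math.log(n_docs / df) if df else 0.0
        scores.append((term, tf * idf))
    # Highest-scoring terms first.
    scores.sort(key=lambda pair: pair[1], reverse=True)
    return scores


counts = [{"a": 3, "b": 2}, {"c": 5, "d": 3}, {"e": 7, "f": 12}]
print(tfidf_sketch(counts[0], counts))
```

With this formulation, a term that occurs in every document gets a score of zero, which is why TF-IDF favours words that are frequent in one document but rare across the collection.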
Thanks, and please contact the author by e-mail with any comments.
Hashes for topicextractor-0.0.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 57b1d03ee998561705dd43a633cd991030d88f9a4222ac6c645da3b71dee5c42
MD5 | 150581b3b10226a4087fa36848905ed8
BLAKE2b-256 | 29dd8f966387184db4b86fc8fe5c4632ef9acb49c39c8a061a8b12803e551e12