Extracts keywords with the TF-IDF algorithm
Project description
"topicextractor" extracts topic keywords from documents using the TF-IDF algorithm.
Usage
import topicextractor as te

sample_docs = [["doc1_str"], ["doc2_str"], ["doc3_str"] ....]

# extract noun counts from 'sample_docs'
count_container = te.extract_noun_counts(sample_docs)
print(count_container)
[{"a": 3, "b": 2}, {"c": 5, "d": 3}, {"e": 7, "f": 12} ....]

# extract keywords for 'count_container[0]' (= the first document in 'sample_docs')
keywords = te.tfidf(count_container[0], count_container)
print(keywords)
[("keyword1", 2.1104), ("keyword2", 2.0012), ("keyword3", 1.8892) ....]
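To make the scoring step more concrete, here is a minimal sketch of a textbook TF-IDF ranking over per-document count dictionaries shaped like 'count_container'. It is not the source of "topicextractor": the function name 'tfidf_sketch', the sample counts, and the exact weighting (relative term frequency times the natural log of N/df) are assumptions for illustration, and the package may use a different variant.

```python
import math


def tfidf_sketch(doc_counts, all_counts):
    """Rank the terms of one document by a textbook TF-IDF score.

    Hypothetical helper, not part of topicextractor.
    doc_counts: {term: count} for the document of interest.
    all_counts: list of {term: count} dicts, one per document in the corpus.
    """
    total_terms = sum(doc_counts.values())
    n_docs = len(all_counts)
    scores = []
    for term, count in doc_counts.items():
        # Term frequency: share of this document's counted terms.
        tf = count / total_terms
        # Document frequency: number of documents containing the term.
        # max(..., 1) avoids division by zero if doc_counts is not in all_counts.
        df = max(sum(1 for counts in all_counts if term in counts), 1)
        # Inverse document frequency: terms found in fewer documents score higher.
        idf = math.log(n_docs / df)
        scores.append((term, tf * idf))
    # Highest-scoring terms first.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Made-up counts standing in for the output of the noun-counting step.
counts = [{"cat": 3, "dog": 1}, {"dog": 2, "fish": 2}, {"bird": 4, "dog": 1}]
ranked = tfidf_sketch(counts[0], counts)
print(ranked)
```

In this sample, "dog" appears in every document, so its inverse document frequency is zero and it ranks below "cat", which occurs only in the first document.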
Thanks. Please contact the author by e-mail with any comments.
Hashes for topicextractor-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7ba8a354cdc8fdde56819a00b7553f1ed257514369d99b875052c5c181af4d2d
MD5 | c3006c854b2858947766664d4ff1bb03
BLAKE2b-256 | 76655ce27da7265f0fc8d59e39c558003322bd83771a3750b6fa38c4bd92341c