Extracts keywords with the TF-IDF algorithm
Project description
"topicextractor" extracts topic keywords from documents using the TF-IDF (term frequency-inverse document frequency) algorithm.
Usage
import topicextractor as te
sample_docs = [["doc1_str"], ["doc2_str"], ["doc3_str"]]  # .... and so on, one entry per document
# extract noun counts from 'sample_docs' (you can omit this step and preprocess the data in your own way)
count_container = te.extract_noun_counts(sample_docs)
print(count_container)
# [{"a": 3, "b": 2}, {"c": 5, "d": 3}, {"e": 7, "f": 12}, ....]
# extract keywords for 'count_container[0]' (= the first document in 'sample_docs')
keywords = te.tfidf(count_container[0], count_container)
print(keywords)
# [("keyword1", 2.1104), ("keyword2", 2.0012), ("keyword3", 1.8892), ....]
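For readers who preprocess the data themselves, the following is a minimal sketch of how a TF-IDF ranking over count dictionaries shaped like 'count_container' could work. It is not the package's implementation: 'count_tokens' and 'tfidf_sketch' are hypothetical names, plain word counting stands in for noun extraction, and the weighting (tf = count / document length, idf = log(N / document frequency)) is an assumption, so the scores will not necessarily match those returned by 'te.tfidf'.

```python
import math
import re
from collections import Counter


def count_tokens(text):
    """Hypothetical stand-in for noun extraction: count lowercase word tokens."""
    return dict(Counter(re.findall(r"[a-z']+", text.lower())))


def tfidf_sketch(target_counts, all_counts):
    """Rank the terms of one document against the whole collection.

    Assumed weighting (the package may differ):
    tf = count / total terms in the document,
    idf = log(N / number of documents containing the term).
    """
    n_docs = len(all_counts)
    if n_docs == 0:
        return []
    total = sum(target_counts.values())
    scores = []
    for term, count in target_counts.items():
        doc_freq = sum(1 for counts in all_counts if term in counts)
        # The target may not be part of all_counts; avoid dividing by zero.
        idf = math.log(n_docs / max(doc_freq, 1))
        scores.append((term, (count / total) * idf))
    # Highest score first; ties keep their original order.
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


docs = ["the cat sat on the mat", "the dog chased the cat", "the bird sang"]
counts = [count_tokens(d) for d in docs]
ranked = tfidf_sketch(counts[0], counts)
print(ranked[:3])
```

In this toy collection, "the" appears in every document, so its idf is log(1) = 0 and it falls to the bottom of the ranking, while words unique to the first document rise to the top.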
Thanks, and please contact the author by e-mail with any comments.
Download files
Built Distribution
Hashes for topicextractor-0.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | dcc3725c4b3c30042cedb45da0a1587dd91749161db436624027644e918d9496
MD5 | c699783ced52ac1650da29070fe62f98
BLAKE2b-256 | 99c3eed23434d2793d014a45bf84ce27a2fcca336ad2923d9adcf5e0a24aa073