KWE
kwe is a tool for extracting keywords from a text file. It provides a single interface to several keyword-extraction models, which are based on statistical, graph-based, and embedding-based methods. The tool is designed to be easy to use.
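To make the "statistical" family concrete, here is a minimal sketch of frequency-based keyword scoring in plain Python. It is not kwe's implementation: the function name `top_keywords_by_count` and the tiny stop-word set are hypothetical, chosen only for illustration.

```python
from collections import Counter

# Hypothetical, deliberately tiny stop-word set; a real run would use a fuller list.
STOP_WORDS = {"the", "over", "and"}


def top_keywords_by_count(documents, k=3, stop_words=STOP_WORDS):
    """Return the k most frequent non-stop-word tokens across all documents."""
    counts = Counter()
    for text in documents:
        for raw in text.lower().split():
            # Strip surrounding punctuation so "dog." and "dog" count as one token.
            token = raw.strip(".,;:!?\"'()")
            if token and token not in stop_words:
                counts[token] += 1
    return counts.most_common(k)


docs = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog and the quick brown fox jumps over the lazy dog.",
]
print(top_keywords_by_count(docs))
```

Graph-based and embedding-based methods replace the raw count with a different score (for example, a graph centrality or a vector similarity), but the overall shape, score candidate tokens and keep the top k, stays the same.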
Installation
pip install kwe
Example
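import nltk
from nltk.corpus import stopwords

from kwe.models import load_model

# Fetch the NLTK stop-word list if it is not already available locally.
nltk.download("stopwords", quiet=True)

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog and the quick brown fox jumps over the lazy dog.",
]
# Whitespace tokenization; models with "use_tokens": True receive these lists.
tokens = [sentence.split() for sentence in corpus]
stop_words_list = stopwords.words("english")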
model_configs = [
    {
        "model_name": "countwords",
        "model_kwargs": {
            "token_freq_threshold": 100,
            "stopwords": stop_words_list,
        },
        "use_tokens": True,
        "do_train": True,
    },
    {
        "model_name": "bm25",
        "model_kwargs": {"token_freq_threshold": 100},
        "use_tokens": True,
        "do_train": True,
    },
    {
        "model_name": "textrank",
        "model_kwargs": {
            "token_freq_threshold": 100,
            "stopwords": stop_words_list,
        },
        "use_tokens": True,
        "do_train": True,
    },
    {
        "model_name": "embedrank",
        "model_kwargs": {
            "epochs": 15,
            "stopwords": stop_words_list,
        },
        "use_tokens": True,
        "do_train": True,
    },
    {
        "model_name": "keybert",
        "model_kwargs": {},
        "use_tokens": False,
        "do_train": False,
    },
]
sample_size = None  # None keeps every document; set an int to use only the first N
results = {}
for model_config in model_configs:
    model_name = model_config["model_name"]
    model_kwargs = model_config.get("model_kwargs", {})
    model = load_model(model_name, model_kwargs)
    # Token-based models take the tokenized corpus; the others take raw strings.
    train_data = tokens if model_config["use_tokens"] else corpus
    if model_config["do_train"]:
        model.train(train_data)
    top_words, top_scores = model.get_top_keywords(train_data[:sample_size])
    results[model_name] = top_words
print(results)
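Because the loop reads `model_kwargs` with `.get(..., {})`, a misspelled key in a config entry is silently ignored and the model is built with default arguments. The following sketch shows one way to catch that before loading any model. The helper `find_unknown_keys` is hypothetical and not part of kwe; it only checks the four keys the example above reads.

```python
# Keys the example loop reads from each config entry.
EXPECTED_KEYS = {"model_name", "model_kwargs", "use_tokens", "do_train"}


def find_unknown_keys(model_configs):
    """Map each model_name to the sorted list of keys the loop would ignore."""
    problems = {}
    for config in model_configs:
        unknown = sorted(set(config) - EXPECTED_KEYS)
        if unknown:
            problems[config.get("model_name", "<unnamed>")] = unknown
    return problems


configs = [
    {
        "model_name": "bm25",
        "model_kwargs": {"token_freq_threshold": 100},
        "use_tokens": True,
        "do_train": True,
    },
    {
        "model_name": "keybert",
        "model_kwrags": {},  # misspelled on purpose to show the check
        "use_tokens": False,
        "do_train": False,
    },
]
print(find_unknown_keys(configs))  # {'keybert': ['model_kwrags']}
```

Running a check like this on the config list before the loop turns a silent fallback into a visible report.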
Implemented models