KWE

kwe is a simple tool for extracting keywords from a text file. It provides a single interface for several models, which are based on statistical, graph-based, and embedding-based methods.

Installation

pip install kwe

Example

from kwe.models import load_model
from nltk.corpus import stopwords

corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog and the quick brown fox jumps over the lazy dog."
]
tokens = [sentence.split() for sentence in corpus]

# Requires the NLTK stopword data; run nltk.download("stopwords") once beforehand.
stop_words_list = stopwords.words("english")

model_configs = [
    {
        "model_name": "countwords",
        "model_kwargs": {
            "token_freq_threshold": 100,
            "stopwords": stop_words_list,
        },
        "use_tokens": True,
        "do_train": True,
    },
    {
        "model_name": "bm25",
        "model_kwargs": {"token_freq_threshold": 100},
        "use_tokens": True,
        "do_train": True,
    },
    {
        "model_name": "textrank",
        "model_kwargs": {
            "token_freq_threshold": 100,
            "stopwords": stop_words_list,
        },
        "use_tokens": True,
        "do_train": True,
    },
    {
        "model_name": "embedrank",
        "model_kwargs": {
            "epochs": 15,
            "stopwords": stop_words_list,
        },
        "use_tokens": True,
        "do_train": True,
    },
    {
        "model_name": "keybert",
        "model_kwargs": {},
        "use_tokens": False,
        "do_train": False,
    },
]

# None keeps every document; set an integer to score only the first N documents.
sample_size = None
results = {}
for model_config in model_configs:
    model_name = model_config["model_name"]
    model_kwargs = model_config.get("model_kwargs", {})
    model = load_model(model_name, model_kwargs)
    train_data = tokens if model_config["use_tokens"] else corpus

    if model_config["do_train"]:
        model.train(train_data)

    top_words, top_scores = model.get_top_keywords(train_data[:sample_size])

    results[model_name] = top_words

print(results)
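
Because the loop reads `model_config.get("model_kwargs", {})`, a misspelled key in a config entry would be ignored silently and the model would be built with default arguments. The following is a minimal sketch of a check that could run before the loop. `check_model_config` and `ALLOWED_KEYS` are hypothetical names, not part of kwe, and the allowed key set is assumed from the four keys used in the example above.

```python
# Hypothetical helper, not part of kwe: reject config keys outside the
# four keys that the example above uses.
ALLOWED_KEYS = {"model_name", "model_kwargs", "use_tokens", "do_train"}


def check_model_config(config):
    """Raise ValueError if the config has unknown keys or no model name."""
    unknown = set(config) - ALLOWED_KEYS
    if unknown:
        raise ValueError(f"unknown config keys: {sorted(unknown)}")
    if "model_name" not in config:
        raise ValueError("config is missing 'model_name'")
    return config


good = {"model_name": "keybert", "model_kwargs": {}, "use_tokens": False, "do_train": False}
bad = {"model_name": "keybert", "model_kwrags": {}, "use_tokens": False, "do_train": False}

check_model_config(good)  # passes silently
try:
    check_model_config(bad)
except ValueError as err:
    print(err)  # unknown config keys: ['model_kwrags']
```

Calling `check_model_config(model_config)` as the first statement of the loop body would surface such a typo before any model is loaded.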

Implemented models
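
The example above loads five models by name: countwords, bm25, textrank, embedrank, and keybert. As a rough illustration of the statistical family only, here is a minimal count-based sketch. It is an assumption about the general idea, not kwe's implementation of `countwords`; `top_keywords_by_count`, `STOPWORDS`, and `top_n` are hypothetical names, and the stopword set is a small hand-written one so the sketch runs without NLTK data.

```python
from collections import Counter

# Illustrative only: a tiny count-based extractor, not kwe's "countwords" model.
# STOPWORDS is a small hand-written set so the sketch runs without NLTK data.
STOPWORDS = {"the", "over", "and"}


def top_keywords_by_count(token_lists, top_n=3):
    """Return the top_n most frequent non-stopword tokens and their counts."""
    counts = Counter()
    for tokens in token_lists:
        for token in tokens:
            # Lowercase and strip surrounding punctuation before counting.
            word = token.lower().strip(".,;:!?")
            if word and word not in STOPWORDS:
                counts[word] += 1
    ranked = counts.most_common(top_n)
    words = [word for word, _ in ranked]
    scores = [count for _, count in ranked]
    return words, scores


corpus = [
    "The quick brown fox jumps over the lazy dog.",
    "The quick brown fox jumps over the lazy dog and the quick brown fox jumps over the lazy dog.",
]
tokens = [sentence.split() for sentence in corpus]
words, scores = top_keywords_by_count(tokens)
print(words, scores)
```

The function returns words and scores as two parallel lists, mirroring the `(top_words, top_scores)` pair that `get_top_keywords` returns in the example. On this corpus every content word occurs the same number of times, so a pure count cannot separate them; this is one reason to compare several models.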
