A Python library for extracting and matching keywords with semantic and entity-based boosting.

KeywordX

KeywordX is a lightweight Python library for extracting and matching keywords from text using semantic similarity and entity-based boosting.
It is suited to NLP pipelines, chatbots, search systems, and event extraction.


Features

  • Extract keywords with semantic similarity scoring
  • Boost keyword matches using entities (dates, times, places, etc.)
  • Supports custom IDF weighting for better relevance
  • Easy-to-use API for integration into NLP pipelines
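To make the IDF feature concrete, here is a minimal sketch of how an IDF weight table could be computed from a small corpus. The function name `idf_weights` is hypothetical and not part of keywordx, and this README does not document how the library accepts custom weights, so the sketch only illustrates what such a table looks like: terms that appear in fewer documents receive higher weights.

```python
import math
from collections import Counter


def idf_weights(documents):
    """Return a smoothed inverse document frequency for each term.

    Hypothetical helper for illustration; not part of keywordx.
    """
    n_docs = len(documents)
    doc_freq = Counter()
    for doc in documents:
        # Count each term once per document, however often it repeats.
        doc_freq.update(set(doc.lower().split()))
    # Smoothed IDF: log((1 + N) / (1 + df)) + 1 keeps every weight positive.
    return {
        term: math.log((1 + n_docs) / (1 + df)) + 1
        for term, df in doc_freq.items()
    }


corpus = [
    "team meeting tomorrow",
    "lunch meeting at noon",
    "flight to bangalore",
]
weights = idf_weights(corpus)

# "meeting" occurs in two documents, "bangalore" in one, so it weighs less.
print(weights["meeting"] < weights["bangalore"])  # True
```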

Installation

Install from PyPI:

pip install keywordx

The en_core_web_md spaCy model is required for the library to function. Install it using the following command:

python -m spacy download en_core_web_md

If en_core_web_md is not available, the library attempts to fall back to the smaller en_core_web_sm model, which may reduce accuracy. You can install the fallback model using:

python -m spacy download en_core_web_sm

Or install from source:

git clone https://github.com/keikurono7/keywordx.git
cd keywordx
pip install -e .
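Because the library depends on one of the two spaCy models mentioned above, it can be useful to check which one is installed before running it. The sketch below relies on the fact that spaCy models installed with `spacy download` are importable Python packages. The helper name `first_installed` is hypothetical, and this is not necessarily how keywordx performs its own fallback.

```python
import importlib.util


def first_installed(candidates):
    """Return the first importable package name, or None if none is installed.

    Hypothetical helper for illustration; not part of keywordx.
    """
    for name in candidates:
        # find_spec returns None when a top-level package cannot be found.
        if importlib.util.find_spec(name) is not None:
            return name
    return None


# Preferred model first, smaller fallback second.
model_name = first_installed(["en_core_web_md", "en_core_web_sm"])
if model_name is None:
    print("No spaCy model found; run: python -m spacy download en_core_web_md")
else:
    print(f"Available model: {model_name}")
```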

Quick Start

Here is a quick example to get you started:

from keywordx import KeywordExtractor

ke = KeywordExtractor()
text = "Tomorrow I have a work meeting at 5pm in Bangalore."
keywords = ["meeting", "time", "place", "date"]

result = ke.extract(text, keywords)
print(result)

Example Output

The result will include extracted entities and semantic matches with scores:

{
  "entities": [
    {"span": [0, 8], "text": "Tomorrow", "type": "DATE"},
    {"span": [34, 37], "text": "5pm", "type": "TIME"},
    {"span": [41, 50], "text": "Bangalore", "type": "GPE"}
  ],
  "semantic_matches": [
    {"keyword": "meeting", "match": "meeting", "score": 0.99},
    {"keyword": "time", "match": "5pm", "score": 1.0},
    {"keyword": "place", "match": "Bangalore", "score": 1.0},
    {"keyword": "date", "match": "Tomorrow", "score": 1.0}
  ]
}
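A result shaped like the one above can be reduced to a simple keyword-to-match mapping. The following is a minimal sketch that assumes the dictionary layout shown in the example output. The helper `best_matches` and its `min_score` threshold are hypothetical and not part of keywordx; the sample data is copied from the example so the snippet runs without the library installed.

```python
# Sample result copied from the example output above.
result = {
    "entities": [
        {"span": [0, 8], "text": "Tomorrow", "type": "DATE"},
        {"span": [34, 37], "text": "5pm", "type": "TIME"},
        {"span": [41, 50], "text": "Bangalore", "type": "GPE"},
    ],
    "semantic_matches": [
        {"keyword": "meeting", "match": "meeting", "score": 0.99},
        {"keyword": "time", "match": "5pm", "score": 1.0},
        {"keyword": "place", "match": "Bangalore", "score": 1.0},
        {"keyword": "date", "match": "Tomorrow", "score": 1.0},
    ],
}


def best_matches(result, min_score=0.5):
    """Map each keyword to its highest-scoring match at or above min_score."""
    best = {}
    for m in result["semantic_matches"]:
        if m["score"] < min_score:
            continue  # Drop weak matches.
        current = best.get(m["keyword"])
        if current is None or m["score"] > current["score"]:
            best[m["keyword"]] = m
    return {keyword: m["match"] for keyword, m in best.items()}


print(best_matches(result))
# {'meeting': 'meeting', 'time': '5pm', 'place': 'Bangalore', 'date': 'Tomorrow'}
```

Raising `min_score` (for example to 0.995) keeps only the matches scored 1.0 in this sample.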

API Reference

  • KeywordExtractor()
    Initializes the keyword extractor.

  • .extract(text, keywords) → dict
    Extracts keywords and entities from text.

    Parameters:

    • text: input string
    • keywords: list of keywords to match

    Returns:

    • entities: list of named entities, each with span, text, and type (DATE, TIME, GPE, etc.)
    • semantic_matches: list of matched keywords with similarity scores
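The return shape described above can be written down as type hints, which helps when passing results between functions. This is a sketch inferred from the example output only: the class names are hypothetical, keywordx does not necessarily export any such types, and the reading of `span` as half-open character offsets is an assumption that happens to hold for the example sentence.

```python
from typing import List, TypedDict


class Entity(TypedDict):
    span: List[int]  # Assumed [start, end) character offsets into the text
    text: str
    type: str        # Entity label such as DATE, TIME, or GPE


class SemanticMatch(TypedDict):
    keyword: str
    match: str
    score: float


class ExtractResult(TypedDict):
    entities: List[Entity]
    semantic_matches: List[SemanticMatch]


def spans_are_consistent(text: str, result: ExtractResult) -> bool:
    """Check that every entity's span slices back to its reported text."""
    return all(
        text[e["span"][0]:e["span"][1]] == e["text"]
        for e in result["entities"]
    )


text = "Tomorrow I have a work meeting at 5pm in Bangalore."
sample: ExtractResult = {
    "entities": [
        {"span": [0, 8], "text": "Tomorrow", "type": "DATE"},
        {"span": [34, 37], "text": "5pm", "type": "TIME"},
        {"span": [41, 50], "text": "Bangalore", "type": "GPE"},
    ],
    "semantic_matches": [],
}
print(spans_are_consistent(text, sample))  # True
```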

Use Cases

  • Event and meeting extraction for calendar assistants
  • Chatbot intent detection
  • Automatic tagging of documents and notes
  • Context-aware search and indexing
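As an illustration of the calendar-assistant use case, the sketch below turns the entities from a result into a simple event record. The helper `to_event` and the mapping from entity types to field names are hypothetical choices for this example, not part of keywordx; the input mirrors the entity format shown in the example output.

```python
def to_event(result):
    """Collect the first entity of each known type into an event record."""
    # Hypothetical mapping from entity labels to calendar fields.
    field_for_type = {"DATE": "date", "TIME": "time", "GPE": "location"}
    event = {}
    for entity in result["entities"]:
        field = field_for_type.get(entity["type"])
        # Keep only the first entity seen for each field.
        if field and field not in event:
            event[field] = entity["text"]
    return event


extracted = {
    "entities": [
        {"span": [0, 8], "text": "Tomorrow", "type": "DATE"},
        {"span": [34, 37], "text": "5pm", "type": "TIME"},
        {"span": [41, 50], "text": "Bangalore", "type": "GPE"},
    ]
}
print(to_event(extracted))
# {'date': 'Tomorrow', 'time': '5pm', 'location': 'Bangalore'}
```

A real assistant would still need to resolve relative expressions such as "Tomorrow" into concrete dates, which is outside the scope of this sketch.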

Contributing

Contributions are welcome. For significant changes, please open an issue first to discuss the proposal.

License

This project is licensed under the MIT License. See the LICENSE file for details.
