A Python library for extracting and matching keywords with semantic and entity-based boosting.

KeywordX

KeywordX is a lightweight Python library for extracting and matching keywords from text using semantic similarity and entity-based boosting. It is intended for NLP pipelines, chatbots, search systems, and event extraction.


Features

  • Extract keywords with semantic similarity scoring
  • Boost keyword matches using entities (dates, times, places, etc.)
  • Apply custom IDF weighting for better relevance
  • Integrate into NLP pipelines through a simple API
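The README does not document how KeywordX computes its similarity scores or applies IDF weights. As a generic illustration of the two ideas named in the feature list, here is a plain-Python sketch of cosine similarity between vectors and one common smoothed IDF formula, log((1 + N) / (1 + df)) + 1. The function names are hypothetical and this is not the library's implementation.

```python
import math
from collections import Counter


def idf_weights(documents: list[list[str]]) -> dict[str, float]:
    """Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1."""
    n_docs = len(documents)
    # Count each term at most once per document (document frequency).
    doc_freq = Counter(term for doc in documents for term in set(doc))
    return {term: math.log((1 + n_docs) / (1 + df)) + 1 for term, df in doc_freq.items()}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


docs = [["meeting", "at", "5pm"], ["meeting", "tomorrow"], ["lunch", "at", "noon"]]
weights = idf_weights(docs)
# "5pm" occurs in one document and "meeting" in two, so "5pm" gets the larger weight.
print(weights["5pm"] > weights["meeting"])  # True
```

Rarer terms get larger weights, which is why IDF is typically used to keep common words from dominating a relevance score.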

Installation

Install from PyPI:

pip install keywordx

For better accuracy, also install the en_core_web_md spaCy model:

python -m spacy download en_core_web_md

If en_core_web_md is not installed, the library automatically falls back to the smaller en_core_web_sm model, which may reduce accuracy.
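The README does not say how the fallback is implemented. If you want to check up front which model your environment will end up with, the same try-in-order pattern can be sketched as below. `load_first_available` is a hypothetical helper, not part of KeywordX; it relies only on spaCy's `spacy.load` raising `OSError` when a model package is missing.

```python
from typing import Callable, Sequence, Tuple, TypeVar

Model = TypeVar("Model")


def load_first_available(
    names: Sequence[str], loader: Callable[[str], Model]
) -> Tuple[str, Model]:
    """Return (name, model) for the first name that `loader` can load.

    spaCy's `spacy.load` raises OSError when a model package is missing,
    so an OSError is treated as "try the next name".
    """
    for name in names:
        try:
            return name, loader(name)
        except OSError:
            continue  # model not installed; try the next candidate
    raise OSError(f"none of these models could be loaded: {', '.join(names)}")


# Usage with spaCy installed (not run here):
#   import spacy
#   name, nlp = load_first_available(["en_core_web_md", "en_core_web_sm"], spacy.load)
#   print(f"loaded {name}")
```

Passing the loader in as a parameter keeps the helper testable without spaCy installed.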

Alternatively, install KeywordX from source:

git clone https://github.com/keikurono7/keywordx.git
cd keywordx
pip install -e .

Quick Start

Here is a quick example to get you started:

from keywordx import KeywordExtractor

ke = KeywordExtractor()
text = "Tomorrow I have a work meeting at 5pm in Bangalore."
keywords = ["meeting", "time", "place", "date"]

result = ke.extract(text, keywords)
print(result)

Example Output

The result will include extracted entities and semantic matches with scores:

{
  "entities": [
    {"span": [0, 8], "text": "Tomorrow", "type": "DATE"},
    {"span": [34, 37], "text": "5pm", "type": "TIME"},
    {"span": [41, 50], "text": "Bangalore", "type": "GPE"}
  ],
  "semantic_matches": [
    {"keyword": "meeting", "match": "meeting", "score": 0.99},
    {"keyword": "time", "match": "5pm", "score": 1.0},
    {"keyword": "place", "match": "Bangalore", "score": 1.0},
    {"keyword": "date", "match": "Tomorrow", "score": 1.0}
  ]
}
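In the example output, each `span` appears to be a half-open character range `[start, end)` into the input string: `text[0:8]` is `"Tomorrow"` and `text[41:50]` is `"Bangalore"`. This reading is inferred from the example rather than documented. A minimal sketch that relies on it to group entities by type, checking each span against its text along the way (`entities_by_type` is a hypothetical helper):

```python
# Input and entities copied from the Quick Start example and its output.
text = "Tomorrow I have a work meeting at 5pm in Bangalore."
entities = [
    {"span": [0, 8], "text": "Tomorrow", "type": "DATE"},
    {"span": [34, 37], "text": "5pm", "type": "TIME"},
    {"span": [41, 50], "text": "Bangalore", "type": "GPE"},
]


def entities_by_type(text: str, entities: list[dict]) -> dict[str, list[str]]:
    """Group entity strings by type, recovering each one from its character span."""
    grouped: dict[str, list[str]] = {}
    for entity in entities:
        start, end = entity["span"]
        surface = text[start:end]  # assumes a half-open [start, end) range
        if surface != entity["text"]:
            raise ValueError(f"span {entity['span']} does not match {entity['text']!r}")
        grouped.setdefault(entity["type"], []).append(surface)
    return grouped


print(entities_by_type(text, entities))
# {'DATE': ['Tomorrow'], 'TIME': ['5pm'], 'GPE': ['Bangalore']}
```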

API Reference

  • KeywordExtractor()
    Initializes the keyword extractor.

  • .extract(text, keywords) → dict
    Extracts keywords and entities from text.

    • text: input string
    • keywords: list of keywords to match
    • Returns: a dict with two keys:
      • entities: named entities (DATE, TIME, GPE, etc.), each with its character span, text, and type
      • semantic_matches: list of matched keywords with similarity scores
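The return value is described only informally above. The `TypedDict` declarations below are one reading of it, based on the example output; they are not types exported by KeywordX. `confident_matches` and its `min_score` default of 0.8 are likewise hypothetical. The README does not say whether a keyword can appear more than once in `semantic_matches`, so the helper keeps the highest-scoring match in case it does.

```python
from typing import TypedDict


class Entity(TypedDict):
    span: list[int]  # [start, end) character offsets, as in the example output
    text: str
    type: str        # e.g. "DATE", "TIME", "GPE"


class SemanticMatch(TypedDict):
    keyword: str
    match: str
    score: float


class ExtractResult(TypedDict):
    entities: list[Entity]
    semantic_matches: list[SemanticMatch]


def confident_matches(result: ExtractResult, min_score: float = 0.8) -> dict[str, str]:
    """Map each keyword to its highest-scoring match, dropping scores below min_score."""
    best: dict[str, SemanticMatch] = {}
    for m in result["semantic_matches"]:
        if m["score"] < min_score:
            continue
        current = best.get(m["keyword"])
        if current is None or m["score"] > current["score"]:
            best[m["keyword"]] = m
    return {keyword: m["match"] for keyword, m in best.items()}


# Semantic matches copied from the example output.
example: ExtractResult = {
    "entities": [],
    "semantic_matches": [
        {"keyword": "meeting", "match": "meeting", "score": 0.99},
        {"keyword": "time", "match": "5pm", "score": 1.0},
        {"keyword": "place", "match": "Bangalore", "score": 1.0},
        {"keyword": "date", "match": "Tomorrow", "score": 1.0},
    ],
}
print(confident_matches(example, min_score=0.995))
# {'time': '5pm', 'place': 'Bangalore', 'date': 'Tomorrow'}
```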

Use Cases

  • Event and meeting extraction for calendar assistants
  • Chatbot intent detection
  • Automatic tagging of documents and notes
  • Context-aware search and indexing

Contributing

Contributions are welcome. For significant changes, please open an issue first to discuss the proposal.

License

This project is licensed under the MIT License. See the LICENSE file for details.

Download files

Source Distribution

keywordx-1.0.3.tar.gz (7.7 kB)

Built Distribution

keywordx-1.0.3-py3-none-any.whl (7.3 kB)

File details

Details for the file keywordx-1.0.3.tar.gz.

File metadata

  • Download URL: keywordx-1.0.3.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for keywordx-1.0.3.tar.gz
Algorithm Hash digest
SHA256 3e15b4ab840c1c6757bc19f4bbdfa1036e099327347eef0c62d102d76ba9e779
MD5 02d700f431a7121ab3c25d50bce610e1
BLAKE2b-256 5b6778ca6a5c195011a5ebd419af126262a51ec5b10beb9091b4be28a6697dc3

File details

Details for the file keywordx-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: keywordx-1.0.3-py3-none-any.whl
  • Upload date:
  • Size: 7.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.2

File hashes

Hashes for keywordx-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 6e48aa4ee574adb67dc7f9f143f7b4791ad97c8ce9cbe726393c8ca6c6af285b
MD5 4d7b4e9b7712811efb038c669e2588a5
BLAKE2b-256 0ab97a295720ee9c039564e5fb77dc1c22ca775fb2ced37f4d7c033b25f2f464
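The digests above let you check a downloaded file before installing it. A minimal sketch using only the standard library's `hashlib`; `sha256_of` and `matches_digest` are hypothetical helper names.

```python
import hashlib
from pathlib import Path
from typing import Union


def sha256_of(path: Union[str, Path], chunk_size: int = 64 * 1024) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps calling f.read until it returns b"" (end of file).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: Union[str, Path], expected_sha256: str) -> bool:
    """Compare a file's SHA256 digest with a published one, ignoring case and surrounding spaces."""
    return sha256_of(path) == expected_sha256.strip().lower()


# Usage, after downloading the wheel into the current directory (not run here):
#   matches_digest("keywordx-1.0.3-py3-none-any.whl",
#                  "6e48aa4ee574adb67dc7f9f143f7b4791ad97c8ce9cbe726393c8ca6c6af285b")
```

pip can perform the same check during installation in hash-checking mode, by adding `--hash=sha256:<digest>` to the `keywordx==1.0.3` line of a requirements file; in that mode pip then requires hashes for every dependency as well.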
