A Python library for extracting and matching keywords with semantic and entity-based boosting.

KeywordX

KeywordX is a lightweight Python library for extracting and matching keywords from text using semantic similarity and entity-based boosting.
It is intended for NLP pipelines, chatbots, search systems, and event extraction.

Features

  • Extract keywords with semantic similarity scoring
  • Boost keyword matches using entities (dates, times, places, etc.)
  • Supports custom IDF weighting for better relevance
  • Easy-to-use API for integration into NLP pipelines

Installation

Install from PyPI:

pip install keywordx

The en_core_web_md spaCy model is required for the library to function. Install it using the following command:

python -m spacy download en_core_web_md

If the en_core_web_md model is not available, the library will attempt to fall back to the smaller en_core_web_sm model. However, this may result in reduced accuracy. You can install the fallback model using:

python -m spacy download en_core_web_sm

Alternatively, install KeywordX itself from source instead of from PyPI:

git clone https://github.com/keikurono7/keywordx.git
cd keywordx
pip install -e .
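
The model fallback described in this section can be pictured with a short sketch. This is not KeywordX's own loading code, which the README does not show. It is a minimal illustration that assumes the library tries en_core_web_md first and en_core_web_sm second. The helper name load_first_available and its loader parameter are hypothetical. The only spaCy behavior relied on is that spacy.load raises OSError when a model package is not installed.

```python
from typing import Callable, Optional, Sequence, Tuple


def load_first_available(
    names: Sequence[str] = ("en_core_web_md", "en_core_web_sm"),
    loader: Optional[Callable[[str], object]] = None,
) -> Tuple[str, object]:
    """Return (model_name, pipeline) for the first model that loads.

    Hypothetical helper for illustration only; not part of KeywordX.
    """
    if loader is None:
        # Imported lazily so the helper can be exercised without spaCy installed.
        import spacy

        loader = spacy.load

    for name in names:
        try:
            return name, loader(name)
        except OSError:
            # spacy.load raises OSError when the model package is missing.
            continue

    raise OSError("No spaCy model installed; tried: " + ", ".join(names))
```

Calling load_first_available() with no arguments returns the name of the model that was actually loaded, which makes it easy to log a warning when the smaller model is in use.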

Quick Start

Here is a quick example to get you started:

from keywordx import KeywordExtractor

ke = KeywordExtractor()
text = "Tomorrow I have a work meeting at 5pm in Bangalore."
keywords = ["meeting", "time", "place", "date"]

result = ke.extract(text, keywords)
print(result)

Example Output

The result will include extracted entities and semantic matches with scores:

{
  "entities": [
    {"span": [0, 8], "text": "Tomorrow", "type": "DATE"},
    {"span": [34, 37], "text": "5pm", "type": "TIME"},
    {"span": [41, 50], "text": "Bangalore", "type": "GPE"}
  ],
  "semantic_matches": [
    {"keyword": "meeting", "match": "meeting", "score": 0.99},
    {"keyword": "time", "match": "5pm", "score": 1.0},
    {"keyword": "place", "match": "Bangalore", "score": 1.0},
    {"keyword": "date", "match": "Tomorrow", "score": 1.0}
  ]
}
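
Once the result is available, a common next step is to reduce semantic_matches to one match per keyword. The sketch below works on the example output shown above, copied in as a literal so that it runs without KeywordX installed. The helper best_matches and the 0.5 threshold are illustrative assumptions, not part of the library.

```python
# Example output from the Quick Start, copied as a plain dict.
result = {
    "entities": [
        {"span": [0, 8], "text": "Tomorrow", "type": "DATE"},
        {"span": [34, 37], "text": "5pm", "type": "TIME"},
        {"span": [41, 50], "text": "Bangalore", "type": "GPE"},
    ],
    "semantic_matches": [
        {"keyword": "meeting", "match": "meeting", "score": 0.99},
        {"keyword": "time", "match": "5pm", "score": 1.0},
        {"keyword": "place", "match": "Bangalore", "score": 1.0},
        {"keyword": "date", "match": "Tomorrow", "score": 1.0},
    ],
}


def best_matches(result, min_score=0.5):
    """Map each keyword to its highest-scoring match at or above min_score.

    Hypothetical helper; the threshold is an arbitrary illustrative value.
    """
    best = {}
    for m in result["semantic_matches"]:
        if m["score"] < min_score:
            continue  # drop weak matches
        current = best.get(m["keyword"])
        if current is None or m["score"] > current["score"]:
            best[m["keyword"]] = m
    return {keyword: m["match"] for keyword, m in best.items()}


print(best_matches(result))
# {'meeting': 'meeting', 'time': '5pm', 'place': 'Bangalore', 'date': 'Tomorrow'}
```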

API Reference

  • KeywordExtractor()
    Initializes the keyword extractor.

  • .extract(text, keywords) → dict
    Extracts entities from text and matches the given keywords against it.

    Parameters:

    • text: input string
    • keywords: list of keywords to match

    Returns a dict with:

    • entities: named entities (DATE, TIME, GPE, etc.), each with span, text, and type
    • semantic_matches: list of matched keywords, each with keyword, match, and similarity score
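
The span values in the Quick Start output are consistent with [start, end) character offsets into the input text, although the README does not state this explicitly. Under that assumption, the sketch below checks that each entity's span slices back to its text. The helper name mismatched_spans is hypothetical, and the entities are copied from the example output.

```python
text = "Tomorrow I have a work meeting at 5pm in Bangalore."

# Entities copied from the example output in this README.
entities = [
    {"span": [0, 8], "text": "Tomorrow", "type": "DATE"},
    {"span": [34, 37], "text": "5pm", "type": "TIME"},
    {"span": [41, 50], "text": "Bangalore", "type": "GPE"},
]


def mismatched_spans(text, entities):
    """Return the entities whose span does not slice back to their text.

    Assumes span is [start, end) in character offsets (an assumption).
    """
    bad = []
    for entity in entities:
        start, end = entity["span"]
        if text[start:end] != entity["text"]:
            bad.append(entity)
    return bad


print(mismatched_spans(text, entities))  # [] for the example above
```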

Use Cases

  • Event and meeting extraction for calendar assistants
  • Chatbot intent detection
  • Automatic tagging of documents and notes
  • Context-aware search and indexing

Contributing

Contributions are welcome. For significant changes, please open an issue first to discuss the proposal.

License

This project is licensed under the MIT License. See the LICENSE file for details.
