A Python library for extracting and matching keywords with semantic and entity-based boosting.
KeywordX
KeywordX is a lightweight Python library for extracting and matching keywords from text using semantic similarity and entity-based boosting.
It is suited to NLP pipelines, chatbots, search systems, and event extraction.
Features
- Extract keywords with semantic similarity scoring
- Boost keyword matches using entities (dates, times, places, etc.)
- Supports custom IDF weighting for better relevance
- Easy-to-use API for integration into NLP pipelines
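The two scoring ideas in the feature list can be illustrated with a small sketch. This is not KeywordX's implementation. It assumes that scores come from cosine similarity between embedding vectors (spaCy's `Doc.similarity` uses cosine similarity by default) and models entity boosting as a capped additive bonus. `KEYWORD_TO_LABEL`, `boosted_score`, and the 0.5 default boost are hypothetical.

```python
from __future__ import annotations

import math
from collections.abc import Sequence

# Hypothetical mapping from generic keywords to spaCy-style entity labels.
KEYWORD_TO_LABEL: dict[str, str] = {"date": "DATE", "time": "TIME", "place": "GPE"}


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def boosted_score(
    keyword: str, similarity: float, entity_labels: set[str], boost: float = 0.5
) -> float:
    """Add `boost` when the text contains an entity of the keyword's type, capped at 1.0."""
    label = KEYWORD_TO_LABEL.get(keyword)
    if label is not None and label in entity_labels:
        return min(1.0, similarity + boost)
    # No mapped label, or no matching entity in the text: keep the raw similarity.
    return similarity
```

For example, `boosted_score("time", 0.6, {"TIME"})` returns 1.0 because the bonus is capped, while a keyword such as "meeting" with no mapped label keeps its raw similarity. Capping keeps boosted and unboosted scores on the same 0 to 1 scale.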
Installation
Install from PyPI:
pip install keywordx
Or install from source:
git clone https://github.com/keikurono7/keywordx.git
cd keywordx
pip install -e .
For better results, also install the en_core_web_md spaCy model:
python -m spacy download en_core_web_md
If the en_core_web_md model is not available, the library automatically falls back to the smaller en_core_web_sm model, which may reduce accuracy.
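The automatic fallback from en_core_web_md to en_core_web_sm described above follows a common "try the preferred model, then the next" pattern. The sketch below is not KeywordX's code: `load_first_available` is a hypothetical helper, and it relies on spaCy's `spacy.load` raising `OSError` when a model package is not installed.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import TypeVar

T = TypeVar("T")


def load_first_available(names: Sequence[str], loader: Callable[[str], T]) -> tuple[str, T]:
    """Try each model name in order and return (name, model) for the first that loads.

    spacy.load raises OSError when a model package is missing, so an OSError
    is treated as "try the next name".
    """
    last_error: OSError | None = None
    for name in names:
        try:
            return name, loader(name)
        except OSError as exc:
            last_error = exc
    raise OSError(f"none of these models could be loaded: {list(names)}") from last_error
```

With spaCy installed, `load_first_available(["en_core_web_md", "en_core_web_sm"], spacy.load)` would return the medium model when present and the small one otherwise. Passing the loader in as a parameter keeps the helper testable without spaCy.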
Quick Start
Here is a quick example to get you started:
from keywordx import KeywordExtractor
ke = KeywordExtractor()
text = "Tomorrow I have a work meeting at 5pm in Bangalore."
keywords = ["meeting", "time", "place", "date"]
result = ke.extract(text, keywords)
print(result)
Example Output
The result will include extracted entities and semantic matches with scores:
{
  "entities": [
    {"span": [0, 8], "text": "Tomorrow", "type": "DATE"},
    {"span": [34, 37], "text": "5pm", "type": "TIME"},
    {"span": [41, 50], "text": "Bangalore", "type": "GPE"}
  ],
  "semantic_matches": [
    {"keyword": "meeting", "match": "meeting", "score": 0.99},
    {"keyword": "time", "match": "5pm", "score": 1.0},
    {"keyword": "place", "match": "Bangalore", "score": 1.0},
    {"keyword": "date", "match": "Tomorrow", "score": 1.0}
  ]
}
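A typical next step is to post-process this dictionary. The sketch below copies the example output as a literal so it runs without KeywordX; the helper names and the 0.9 threshold are illustrative assumptions. In this sample the span values behave as half-open [start, end) character offsets into the input text, which `spans_match_text` checks; that reading is inferred from this one example, not from the library's documentation.

```python
from __future__ import annotations

# Copied from the example output above so the sketch runs without KeywordX.
TEXT = "Tomorrow I have a work meeting at 5pm in Bangalore."
RESULT = {
    "entities": [
        {"span": [0, 8], "text": "Tomorrow", "type": "DATE"},
        {"span": [34, 37], "text": "5pm", "type": "TIME"},
        {"span": [41, 50], "text": "Bangalore", "type": "GPE"},
    ],
    "semantic_matches": [
        {"keyword": "meeting", "match": "meeting", "score": 0.99},
        {"keyword": "time", "match": "5pm", "score": 1.0},
        {"keyword": "place", "match": "Bangalore", "score": 1.0},
        {"keyword": "date", "match": "Tomorrow", "score": 1.0},
    ],
}


def confident_matches(result: dict, threshold: float = 0.9) -> dict[str, str]:
    """Map each keyword to its matched text, keeping matches with score >= threshold."""
    return {
        m["keyword"]: m["match"]
        for m in result["semantic_matches"]
        if m["score"] >= threshold
    }


def spans_match_text(text: str, entities: list[dict]) -> bool:
    """Check that each entity's span, read as [start, end) offsets, slices out its text."""
    return all(text[e["span"][0]:e["span"][1]] == e["text"] for e in entities)


print(confident_matches(RESULT))
# {'meeting': 'meeting', 'time': '5pm', 'place': 'Bangalore', 'date': 'Tomorrow'}
print(spans_match_text(TEXT, RESULT["entities"]))
# True
```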
API Reference
- KeywordExtractor()
  Initializes the keyword extractor.
- .extract(text, keywords) → dict
  Extracts keywords and entities from text.
  - text: input string
  - keywords: list of keywords to match
  Returns a dict containing:
  - entities: named entities (DATE, TIME, GPE, etc.), each with span, text, and type
  - semantic_matches: list of matched keywords, each with keyword, match, and a similarity score
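For type checking in calling code, the return value described above can be written as `TypedDict` definitions. These classes and `match_for` are hypothetical (KeywordX is not known to export them), and the field names are taken from the example output rather than from the library's source.

```python
from typing import Optional, TypedDict


class Entity(TypedDict):
    span: list[int]  # [start, end) character offsets, as in the example output
    text: str
    type: str  # entity label such as DATE, TIME, or GPE


class SemanticMatch(TypedDict):
    keyword: str
    match: str
    score: float


class ExtractResult(TypedDict):
    entities: list[Entity]
    semantic_matches: list[SemanticMatch]


def match_for(result: ExtractResult, keyword: str) -> Optional[SemanticMatch]:
    """Return the first semantic match for `keyword`, or None if it was not matched."""
    return next((m for m in result["semantic_matches"] if m["keyword"] == keyword), None)
```

`TypedDict` adds no runtime checks; it only lets a type checker such as mypy flag misspelled keys like `result["semantic_match"]`.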
Use Cases
- Event and meeting extraction for calendar assistants
- Chatbot intent detection
- Automatic tagging of documents and notes
- Context-aware search and indexing
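For the calendar-assistant use case, the extracted entities can be turned into a draft event. In this sketch, `LABEL_TO_FIELD`, `event_draft`, and the `title` parameter are hypothetical, not part of KeywordX. Values such as "Tomorrow" and "5pm" stay as raw strings; resolving them to an actual date and time is left out.

```python
from __future__ import annotations

# Hypothetical mapping from entity labels to calendar fields; not part of KeywordX.
LABEL_TO_FIELD = {"DATE": "date", "TIME": "time", "GPE": "location"}


def event_draft(title: str, entities: list[dict]) -> dict[str, str]:
    """Build a draft event, keeping the first entity found for each mapped label."""
    draft = {"title": title}
    for entity in entities:
        field = LABEL_TO_FIELD.get(entity["type"])
        if field is not None and field not in draft:
            draft[field] = entity["text"]
    return draft


# Entities copied from the example output.
entities = [
    {"span": [0, 8], "text": "Tomorrow", "type": "DATE"},
    {"span": [34, 37], "text": "5pm", "type": "TIME"},
    {"span": [41, 50], "text": "Bangalore", "type": "GPE"},
]
print(event_draft("work meeting", entities))
# {'title': 'work meeting', 'date': 'Tomorrow', 'time': '5pm', 'location': 'Bangalore'}
```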
Contributing
Contributions are welcome. For significant changes, please open an issue first to discuss the proposal.
Contributors
- Madhusudan
  - Email: dmpathani@gmail.com
  - GitHub: keikurono7
  - Role: Code implementation
- Saniya Naaz
  - Email: saniyanaaz2k4@gmail.com
  - GitHub: Saniyanaaz11
  - Role: Research work
- Dr. Nandeeswar S B
  - Email: hodcse.aiml@amceducation.in
  - Role: Concept and idea generation
License
This project is licensed under the MIT License. See the LICENSE file for details.
Download files
File details
Details for the file keywordx-1.0.3.tar.gz.
File metadata
- Download URL: keywordx-1.0.3.tar.gz
- Upload date:
- Size: 7.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e15b4ab840c1c6757bc19f4bbdfa1036e099327347eef0c62d102d76ba9e779 |
| MD5 | 02d700f431a7121ab3c25d50bce610e1 |
| BLAKE2b-256 | 5b6778ca6a5c195011a5ebd419af126262a51ec5b10beb9091b4be28a6697dc3 |
File details
Details for the file keywordx-1.0.3-py3-none-any.whl.
File metadata
- Download URL: keywordx-1.0.3-py3-none-any.whl
- Upload date:
- Size: 7.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6e48aa4ee574adb67dc7f9f143f7b4791ad97c8ce9cbe726393c8ca6c6af285b |
| MD5 | 4d7b4e9b7712811efb038c669e2588a5 |
| BLAKE2b-256 | 0ab97a295720ee9c039564e5fb77dc1c22ca775fb2ced37f4d7c033b25f2f464 |
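To confirm that a downloaded file matches the published SHA256 digest, you can hash it locally with the standard library. `sha256_of` and `matches_published_hash` are illustrative helper names, not part of KeywordX.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str | Path, expected_sha256: str) -> bool:
    """Compare a file's SHA256 with a published digest, ignoring case and surrounding whitespace."""
    return sha256_of(path) == expected_sha256.strip().lower()
```

For the source distribution listed above, `matches_published_hash("keywordx-1.0.3.tar.gz", "3e15b4ab840c1c6757bc19f4bbdfa1036e099327347eef0c62d102d76ba9e779")` should return True for an unmodified download. pip's hash-checking mode, using a requirements line such as `keywordx==1.0.3 --hash=sha256:<digest>`, performs the same check during installation.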