A modification of the KeyBERT method that extracts keywords and keyphrases using chunks. This provides better results, especially when handling long documents.
ChunkeyBERT
Overview
ChunkeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings for unsupervised keyphrase extraction from text documents. It is a modification of the KeyBERT method that handles documents of arbitrary length with better results. ChunkeyBERT splits each document into chunks, uses KeyBERT to extract candidate keywords/keyphrases from every chunk, and then applies a similarity-based selection stage to produce the final keywords for the entire document. ChunkeyBERT can use any document chunking method that can be wrapped in a simple function; it can also work without a chunker, in which case the entire document is processed as a single chunk. ChunkeyBERT works with any KeyBERT configuration and can process batches of documents.
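To make the chunk, extract, select pipeline concrete, here is a minimal, self-contained sketch in plain Python. It is not the chunkey-bert implementation or API. It assumes a toy bag-of-words embedding (`toy_embed`) in place of BERT, a hypothetical `extract_candidates` in place of KeyBERT, a hypothetical `word_chunker`, and one plausible selection rule: rank the pooled candidates by cosine similarity to the mean of the chunk embeddings. The library's actual selection stage may differ.

```python
"""Chunk-then-select keyword extraction: an illustrative sketch only.

Assumptions (hypothetical, not the chunkey-bert API): toy_embed stands in
for a BERT embedding, extract_candidates stands in for KeyBERT, and the
selection rule ranks pooled candidates against the mean chunk embedding.
"""
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Optional

Vector = Dict[str, float]
STOP_WORDS = {"the", "and", "for", "with", "that", "this", "are", "from"}


def normalize(v: Vector) -> Vector:
    """Scale a sparse vector to unit L2 length (empty vectors stay empty)."""
    norm = math.sqrt(sum(x * x for x in v.values())) or 1.0
    return {k: x / norm for k, x in v.items()}


def toy_embed(text: str) -> Vector:
    """Hypothetical stand-in for a BERT embedding: normalised word counts."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    return normalize({w: float(c) for w, c in counts.items()})


def cosine(a: Vector, b: Vector) -> float:
    """Dot product; equals cosine similarity when both inputs are unit length."""
    return sum(x * b.get(k, 0.0) for k, x in a.items())


def word_chunker(text: str, size: int = 50) -> List[str]:
    """Hypothetical chunker: consecutive, non-overlapping windows of `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def extract_candidates(chunk: str, top_n: int = 5) -> List[str]:
    """Stand-in for KeyBERT: rank a chunk's words by similarity to the chunk."""
    chunk_emb = toy_embed(chunk)
    words = {w for w in re.findall(r"[a-z]+", chunk.lower())
             if len(w) > 2 and w not in STOP_WORDS}
    # Sort by descending similarity, breaking ties alphabetically.
    return sorted(words, key=lambda w: (-cosine(toy_embed(w), chunk_emb), w))[:top_n]


def chunked_keywords(document: str,
                     chunker: Optional[Callable[[str], List[str]]] = None,
                     top_n: int = 5) -> List[str]:
    # Without a chunker, the whole document is processed as a single chunk.
    chunks = chunker(document) if chunker else [document]
    chunks = [c for c in chunks if c.strip()]
    if not chunks:
        return []
    # Candidate stage: pool the keywords extracted from every chunk.
    candidates = {kw for c in chunks for kw in extract_candidates(c, top_n)}
    # Document embedding: normalised mean of the chunk embeddings.
    mean: Vector = {}
    for emb in (toy_embed(c) for c in chunks):
        for k, x in emb.items():
            mean[k] = mean.get(k, 0.0) + x / len(chunks)
    doc_emb = normalize(mean)
    # Selection stage: keep the candidates most similar to the whole document.
    return sorted(candidates, key=lambda kw: (-cosine(toy_embed(kw), doc_emb), kw))[:top_n]


if __name__ == "__main__":
    doc = "graph graph graph neural neural network"
    print(chunked_keywords(doc, top_n=2))  # ['graph', 'neural']
    print(chunked_keywords(doc, chunker=lambda t: word_chunker(t, size=3), top_n=3))
    # ['graph', 'neural', 'network']
```

Any function that maps a string to a list of strings can serve as the chunker here, which mirrors the point that a chunking method only needs to be wrapped in a simple function; passing no chunker falls back to treating the document as a single chunk.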
Installation
Install from PyPI using pip (preferred method):
pip install chunkey-bert
Experimental results
Limited experimental results and a demonstration of the library on a small number of documents are available at https://nbviewer.org/github/yaniv-shulman/chunkey-bert/tree/main/src/experiments/.
Contribution and feedback
Contributions and feedback are most welcome. Please see CONTRIBUTING.md for further details.
Hashes for chunkey_bert-0.2.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 250b25912548e17c679e39599d069ecac588f19ac37c6ee7b04277e7c2621d31
MD5 | 42f1228ff05562d753ad722fa1e57f5d
BLAKE2b-256 | e3188ec7ace43589906f70a54f89ecc61bd24164d88415616baa6e7b31521a03