A modification of the KeyBERT method that extracts keywords and keyphrases using chunks. This provides better results, especially when handling long documents.
Project description
ChunkeyBERT
Overview
ChunkeyBERT is a minimal, easy-to-use keyword extraction technique that leverages BERT embeddings for unsupervised keyphrase extraction from text documents. It modifies the KeyBERT method to handle documents of arbitrary length with better results. ChunkeyBERT chunks each document, uses KeyBERT to extract candidate keywords and keyphrases from every chunk, and then applies a similarity-based selection stage to produce the final keywords for the entire document. Any document chunking method can be used as long as it can be wrapped in a simple function; ChunkeyBERT can also work without a chunker and process the entire document as a single chunk. It works with any configuration of KeyBERT and can handle batches of documents.
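The chunk, extract, then select flow described above can be illustrated with a small standalone sketch. This is not the ChunkeyBERT API and it does not call KeyBERT: word-count vectors stand in for BERT embeddings, and the names `sentence_chunker`, `bag_of_words`, `cosine` and `extract_keywords_sketch` are hypothetical. Ranking pooled candidates by similarity to the whole document is likewise an assumption made for illustration; the actual selection stage may differ. The chunker signature shown here (a function from a string to a list of strings) is also assumed.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple


def sentence_chunker(text: str, sentences_per_chunk: int = 2) -> List[str]:
    """Hypothetical chunker: group consecutive sentences into chunks."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i : i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for a BERT embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def extract_keywords_sketch(
    text: str,
    chunker: Optional[Callable[[str], List[str]]] = None,
    per_chunk: int = 3,
    top_n: int = 3,
) -> List[Tuple[str, float]]:
    # Without a chunker, the whole document is treated as a single chunk.
    chunks = chunker(text) if chunker else [text]

    # Stage 1 (assumed): pick the words most similar to each chunk as candidates.
    candidates = set()
    for chunk in chunks:
        chunk_vec = bag_of_words(chunk)
        words = {w for w in chunk_vec if len(w) > 3}
        ranked = sorted(words, key=lambda w: (-cosine(bag_of_words(w), chunk_vec), w))
        candidates.update(ranked[:per_chunk])

    # Stage 2 (assumed): rank pooled candidates by similarity to the full document.
    doc_vec = bag_of_words(text)
    scored = [(w, cosine(bag_of_words(w), doc_vec)) for w in candidates]
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


if __name__ == "__main__":
    doc = (
        "Keyword extraction finds keywords. Keyword models embed text. "
        "Chunks keep long documents manageable. Chunks are scored separately."
    )
    print(extract_keywords_sketch(doc, chunker=sentence_chunker))
```

In this sketch, passing `chunker=None` reproduces the single-chunk behaviour mentioned above, and any other splitting strategy can be swapped in as long as it is wrapped in a function with the same assumed shape.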
Installation
Install from PyPI using pip (preferred method):
pip install chunkey-bert
Experimental results
Very limited experimental results and a demonstration of the library on a small number of documents are available at https://nbviewer.org/github/yaniv-shulman/chunkey-bert/tree/main/src/experiments/.
Contribution and feedback
Contributions and feedback are most welcome. Please see CONTRIBUTING.md for further details.
Hashes for chunkey_bert-0.1.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a16b592204aa257adcc293aa67c75eb06bf34b886671e08e2d7c644d8f9b4bc3
MD5 | 1d6e518f88ce7d567d5261ab2255285c
BLAKE2b-256 | efc0cb1701194e41daf082c3ac16474d45745bd6e7e15daa2e0a338abd0da23f