A modification of the KeyBERT method that extracts keywords and keyphrases using chunks. This provides better results, especially when handling long documents.
ChunkeyBERT
Overview
ChunkeyBERT is a minimal, easy-to-use keyword extraction technique that leverages BERT embeddings for unsupervised keyphrase extraction from text documents. It modifies the KeyBERT method to handle documents of arbitrary length with better results. ChunkeyBERT splits each document into chunks, uses KeyBERT to extract candidate keywords and keyphrases from every chunk, and then applies a similarity-based selection stage to produce the final keywords for the entire document. Any document chunking method can be used as long as it can be wrapped in a simple function; without a chunker, ChunkeyBERT processes the entire document as a single chunk. ChunkeyBERT works with any configuration of KeyBERT and can handle batches of documents.
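The chunk, extract, and select flow described above can be illustrated with a small self-contained sketch. This is not the library's implementation or API: bag-of-words count vectors stand in for BERT embeddings, a frequency-style ranking stands in for KeyBERT, and the selection rule (mean similarity to all chunks) is only an assumption about what a similarity-based stage might look like. All names here (`word_window_chunker`, `chunk_candidates`, `extract_keywords`) are hypothetical.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple

# A chunker is any function that maps a document to a list of chunks.
Chunker = Callable[[str], List[str]]

# Tiny illustrative stopword list (an assumption, not taken from the library).
STOPWORDS = {"a", "an", "and", "are", "also", "into", "is", "of", "the", "to"}


def word_window_chunker(max_words: int) -> Chunker:
    """Build a chunker that splits text into windows of at most max_words words."""
    def chunk(text: str) -> List[str]:
        words = text.split()
        return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]
    return chunk


def tokenize(text: str) -> List[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def embed(text: str) -> Counter:
    # Stand-in for a BERT embedding: a bag-of-words count vector.
    return Counter(tokenize(text))


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def chunk_candidates(chunk: str, top_n: int) -> List[str]:
    # Stand-in for KeyBERT: rank the chunk's words by similarity to the chunk itself.
    chunk_vec = embed(chunk)
    words = sorted(set(tokenize(chunk)) - STOPWORDS)
    words.sort(key=lambda w: cosine(embed(w), chunk_vec), reverse=True)
    return words[:top_n]


def extract_keywords(
    doc: str,
    chunker: Optional[Chunker] = None,
    per_chunk: int = 5,
    num_keywords: int = 3,
) -> List[Tuple[str, float]]:
    # Without a chunker, the whole document is treated as a single chunk.
    chunks = chunker(doc) if chunker else [doc]
    chunk_vecs = [embed(c) for c in chunks]
    # Pool the candidates extracted from every chunk.
    pooled = {kw for c in chunks for kw in chunk_candidates(c, per_chunk)}
    # Assumed selection rule: score each candidate by its mean similarity to all chunks.
    scored = [
        (kw, sum(cosine(embed(kw), v) for v in chunk_vecs) / len(chunk_vecs))
        for kw in pooled
    ]
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))[:num_keywords]
```

Calling `extract_keywords(text, chunker=word_window_chunker(50))` processes the text chunk by chunk, while omitting `chunker` falls back to the single-chunk behaviour. The point of the sketch is the shape of the chunker (a plain function from a string to a list of strings), which is what allows any chunking method to be plugged in.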
Installation
Install from PyPI using pip (preferred method):
pip install chunkey-bert
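After installing, one way to confirm that the package is visible to the current interpreter is to query its distribution metadata with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical, and only the distribution name `chunkey-bert` comes from the install command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "chunkey-bert") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "chunkey-bert is not installed")
```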
Experimental results
Very limited experimental results and a demonstration of the library on a small number of documents are available at https://nbviewer.org/github/yaniv-shulman/chunkey-bert/tree/main/src/experiments/.
Contribution and feedback
Contributions and feedback are most welcome. Please see CONTRIBUTING.md for further details.
File details
Details for the file chunkey_bert-0.2.0.tar.gz.
File metadata
- Download URL: chunkey_bert-0.2.0.tar.gz
- Upload date:
- Size: 7.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.8.10 Linux/5.15.0-107-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2764e83d0ec420ceb18eb0ca5f1c7818afcdb55ce3d32f96439eea2f38a14b9f
MD5 | 50d5b8ba35be5008932b58f3e831c3b6
BLAKE2b-256 | b893a18712c152e5291adcf1f135199e0e818f5314d3ab0df7ee657f32de1671
File details
Details for the file chunkey_bert-0.2.0-py3-none-any.whl.
File metadata
- Download URL: chunkey_bert-0.2.0-py3-none-any.whl
- Upload date:
- Size: 7.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.3 CPython/3.8.10 Linux/5.15.0-107-generic
File hashes
Algorithm | Hash digest
---|---
SHA256 | 250b25912548e17c679e39599d069ecac588f19ac37c6ee7b04277e7c2621d31
MD5 | 42f1228ff05562d753ad722fa1e57f5d
BLAKE2b-256 | e3188ec7ace43589906f70a54f89ecc61bd24164d88415616baa6e7b31521a03
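A downloaded file can be checked against the published digests with Python's `hashlib`. The sketch below assumes the file has been saved locally; the helper names `file_digests` and `matches_published` are hypothetical, and BLAKE2b-256 is computed as BLAKE2b with a 32-byte digest. The digest values in `PUBLISHED_WHEEL` are the ones listed above for the wheel.

```python
import hashlib
from typing import Dict

# Digests published above for chunkey_bert-0.2.0-py3-none-any.whl.
PUBLISHED_WHEEL = {
    "SHA256": "250b25912548e17c679e39599d069ecac588f19ac37c6ee7b04277e7c2621d31",
    "MD5": "42f1228ff05562d753ad722fa1e57f5d",
    "BLAKE2b-256": "e3188ec7ace43589906f70a54f89ecc61bd24164d88415616baa6e7b31521a03",
}


def file_digests(path: str) -> Dict[str, str]:
    """Compute the SHA256, MD5, and BLAKE2b-256 hex digests of a file."""
    hashers = {
        "SHA256": hashlib.sha256(),
        "MD5": hashlib.md5(),
        "BLAKE2b-256": hashlib.blake2b(digest_size=32),
    }
    with open(path, "rb") as fh:
        # Read in 1 MiB blocks so large files do not need to fit in memory.
        for block in iter(lambda: fh.read(1 << 20), b""):
            for hasher in hashers.values():
                hasher.update(block)
    return {name: hasher.hexdigest() for name, hasher in hashers.items()}


def matches_published(path: str, published: Dict[str, str]) -> bool:
    """Return True only if every published digest matches the file's digest."""
    actual = file_digests(path)
    return all(actual.get(name) == digest.lower() for name, digest in published.items())
```

For example, `matches_published("chunkey_bert-0.2.0-py3-none-any.whl", PUBLISHED_WHEEL)` should return `True` for an intact download, assuming the wheel sits in the current directory under that file name.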