A modification of the KeyBERT method that extracts keywords and keyphrases using chunks. This provides better results, especially when handling long documents.

ChunkeyBERT

Overview

ChunkeyBERT is a minimal, easy-to-use keyword extraction technique that leverages BERT embeddings for unsupervised keyphrase extraction from text documents. It is a modification of the KeyBERT method that handles documents of arbitrary length with better results. ChunkeyBERT chunks each document, uses KeyBERT to extract candidate keywords and keyphrases from every chunk, and then applies a similarity-based selection stage to produce the final keywords for the entire document. Any document chunking method can be used as long as it can be wrapped in a simple function; ChunkeyBERT can also work without a chunker, in which case it processes the entire document as a single chunk. It works with any configuration of KeyBERT and can handle batches of documents.
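
The flow described above (chunk, extract candidates per chunk, then select by similarity) can be illustrated with a small standard-library sketch. This is not the library's implementation and does not use its API: every name below (`sentence_window_chunker`, `toy_extract`, `keywords_for_document`) is hypothetical, the per-chunk extractor is a word-frequency stand-in for KeyBERT, and the selection step ranks pooled candidates by bag-of-words cosine similarity to the whole document, which is an assumption made purely for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple

# A chunker is any function that maps a document to a list of chunks.
Chunker = Callable[[str], List[str]]


def tokens(text: str) -> List[str]:
    """Lowercase words longer than three letters (crude stop-word filter)."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if len(w) > 3]


def sentence_window_chunker(text: str, sentences_per_chunk: int = 2) -> List[str]:
    """Hypothetical chunker: group consecutive sentences into fixed-size chunks."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    return [
        " ".join(sentences[i : i + sentences_per_chunk])
        for i in range(0, len(sentences), sentences_per_chunk)
    ]


def toy_extract(chunk: str, top_n: int = 3) -> List[str]:
    """Stand-in for KeyBERT: the most frequent words in the chunk, ties broken alphabetically."""
    counts = Counter(tokens(chunk))
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:top_n]]


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two bag-of-words vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def keywords_for_document(
    text: str, chunker: Optional[Chunker] = None, top_n: int = 3
) -> List[Tuple[str, float]]:
    """Pool candidates from every chunk, then keep those most similar to the document."""
    # Without a chunker the whole document is treated as a single chunk.
    chunks = chunker(text) if chunker is not None else [text]
    candidates = {kw for chunk in chunks for kw in toy_extract(chunk)}
    doc_vector = Counter(tokens(text))
    scored = [(kw, cosine(Counter(tokens(kw)), doc_vector)) for kw in candidates]
    return sorted(scored, key=lambda kv: (-kv[1], kv[0]))[:top_n]


document = (
    "Solar panels convert sunlight into power. Solar panels need little upkeep. "
    "Wind turbines convert wind into power. Wind turbines need open land."
)
print(keywords_for_document(document, chunker=sentence_window_chunker))
```

Passing `chunker=None` exercises the single-chunk path mentioned above. In the real library the candidates and similarities come from BERT embeddings via KeyBERT rather than from word counts, so the scores here are only meaningful as a demonstration of the control flow.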

Installation

Install from PyPI using pip (preferred method):

pip install chunkey-bert

Experimental results

Very limited experimental results and a demonstration of the library on a small number of documents are available at https://nbviewer.org/github/yaniv-shulman/chunkey-bert/tree/main/src/experiments/.

Contribution and feedback

Contributions and feedback are most welcome. Please see CONTRIBUTING.md for further details.

Release history

0.2.0 (Jun 7, 2024, this version)

0.1.0 (Jun 7, 2024)

Download files

Source Distribution

chunkey_bert-0.1.0.tar.gz (7.5 kB)

Uploaded Jun 7, 2024 Source

Built Distribution

chunkey_bert-0.1.0-py3-none-any.whl (7.1 kB)

Uploaded Jun 7, 2024 Python 3

Hashes for chunkey_bert-0.1.0.tar.gz

Algorithm	Hash digest
SHA256	`27d819c2c7bd3eeaa60eeff3a2f338c04514bfb1f2f5361c97efdd5d0b8dcfe8`
MD5	`ca3008c7afafa7c089056c6452a24702`
BLAKE2b-256	`f957a2c5c03d76705433f34f26f49ef0357981d41c26e004fcdc08d1f991c978`

Hashes for chunkey_bert-0.1.0-py3-none-any.whl

Algorithm	Hash digest
SHA256	`a16b592204aa257adcc293aa67c75eb06bf34b886671e08e2d7c644d8f9b4bc3`
MD5	`1d6e518f88ce7d567d5261ab2255285c`
BLAKE2b-256	`efc0cb1701194e41daf082c3ac16474d45745bd6e7e15daa2e0a338abd0da23f`