Modification of the KeyBERT method to extract keywords and keyphrases using chunks. This provides better results, especially when handling long documents.

ChunkeyBERT

Overview

ChunkeyBERT is a minimal, easy-to-use keyword extraction technique that leverages BERT embeddings for unsupervised keyphrase extraction from text documents. It modifies the KeyBERT method to handle documents of arbitrary length with better results. ChunkeyBERT chunks each document, uses KeyBERT to extract candidate keywords and keyphrases from every chunk, and then applies a similarity-based selection stage to produce the final keywords for the entire document. Any document chunking method can be used as long as it can be wrapped in a simple function; alternatively, ChunkeyBERT can work without a chunker and process the entire document as a single chunk. ChunkeyBERT works with any configuration of KeyBERT and can handle batches of documents.
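The chunk, extract, and select flow described above can be illustrated with a small self-contained sketch. This is not the library's implementation or API: every name here (`fixed_size_chunker`, `bag_of_words`, `chunk_candidates`, `extract_keywords`) is hypothetical, word-count vectors stand in for BERT embeddings, a simple per-chunk ranking stands in for the KeyBERT call, and scoring candidates against the whole document is only one assumed way a similarity-based selection stage could work.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Optional, Tuple


def fixed_size_chunker(text: str, words_per_chunk: int = 50) -> List[str]:
    """Hypothetical chunker: split text into consecutive runs of words."""
    words = text.split()
    return [
        " ".join(words[i : i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


def bag_of_words(text: str) -> Counter:
    """Toy stand-in for a BERT embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def chunk_candidates(chunk: str, top_n: int) -> List[str]:
    """Toy stand-in for the per-chunk KeyBERT call (assumption, not KeyBERT)."""
    chunk_vec = bag_of_words(chunk)
    words = sorted(chunk_vec)  # alphabetical order makes ties deterministic
    words.sort(key=lambda w: cosine(bag_of_words(w), chunk_vec), reverse=True)
    return words[:top_n]


def extract_keywords(
    text: str,
    num_keywords: int = 5,
    chunker: Optional[Callable[[str], List[str]]] = None,
    per_chunk: int = 5,
) -> List[Tuple[str, float]]:
    # Without a chunker, the whole document is treated as a single chunk.
    chunks = chunker(text) if chunker else [text]
    # Stage 1: gather candidate keywords from every chunk.
    candidates = {c for chunk in chunks for c in chunk_candidates(chunk, per_chunk)}
    # Stage 2 (assumed form): keep the candidates most similar to the document.
    doc_vec = bag_of_words(text)
    scored = [(c, cosine(bag_of_words(c), doc_vec)) for c in sorted(candidates)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:num_keywords]


if __name__ == "__main__":
    document = "cats chase mice. " * 3 + "dogs chase cats."
    chunker = lambda t: fixed_size_chunker(t, words_per_chunk=3)
    print(extract_keywords(document, num_keywords=3, chunker=chunker))
```

Passing the chunker as a plain function mirrors the point that any chunking method can be plugged in, and omitting it falls back to treating the document as one chunk.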

Installation

Install from PyPI using pip (preferred method):

pip install chunkey-bert
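To confirm that the install succeeded, one option is to query the installed distribution's metadata with the standard library. The helper name `installed_version` is hypothetical and only wraps `importlib.metadata`; it makes no assumptions about the package's own API.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if chunkey-bert is installed, otherwise None.
print(installed_version("chunkey-bert"))
```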

Experimental results

A very limited set of experimental results, together with a demonstration of the library on a small number of documents, is available at https://nbviewer.org/github/yaniv-shulman/chunkey-bert/tree/main/src/experiments/.

Contribution and feedback

Contributions and feedback are most welcome. Please see CONTRIBUTING.md for further details.
