AdaptKeyBERT extends keyphrase extraction with zero-shot and few-shot semi-supervised domain adaptation.

AdaptKeyBERT

KeyBERT is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to create keywords and keyphrases that are most similar to a document.
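The ranking idea described above can be sketched without any model at all. The block below is a toy illustration only: `embed` uses word counts as a stand-in for BERT embeddings, and `tokenize`, `embed`, `cosine`, `rank_candidates` and the stopword list are hypothetical names invented for this sketch, not part of KeyBERT or AdaptKeyBERT.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption of this sketch).
STOPWORDS = {"a", "an", "the", "of", "to", "is", "it", "in", "and", "that", "for", "on", "from"}


def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stopwords."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


def embed(text):
    """Stand-in for a BERT embedding: a sparse word-count vector."""
    return Counter(tokenize(text))


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm = math.sqrt(sum(x * x for x in u.values())) * math.sqrt(sum(x * x for x in v.values()))
    return dot / norm if norm else 0.0


def rank_candidates(doc, top_n=3):
    """Score every candidate word by its similarity to the whole document."""
    doc_vec = embed(doc)
    candidates = sorted(set(tokenize(doc)))
    scored = [(c, cosine(embed(c), doc_vec)) for c in candidates]
    # Highest score first; ties are broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


sample = "Supervised learning infers a function from labeled training data."
print(rank_candidates(sample, top_n=3))
```

With real embeddings the scores reflect semantic closeness rather than raw frequency, but the shape of the computation (embed the document, embed each candidate, rank by cosine similarity) is the same.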

AdaptKeyBERT extends KeyBERT by integrating semi-supervised attention to provide few-shot domain adaptation for keyphrase extraction. It also adds zero-shot word seeding, which allows better performance on topic-relevant documents.
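To give an intuition for word seeding, the sketch below blends a seed-word vector into a document vector before ranking candidates. It is a toy illustration and not AdaptKeyBERT's attention-based mechanism: the count vectors stand in for real embeddings, and `seeded_rank`, `unit` and `seed_weight` are hypothetical names (in particular, `seed_weight` is unrelated to the library's `adaptive_thr` parameter).

```python
import math
from collections import Counter


def unit(vec):
    """Scale a sparse vector to unit length (empty dict if the norm is zero)."""
    norm = math.sqrt(sum(x * x for x in vec.values()))
    return {k: x / norm for k, x in vec.items()} if norm else {}


def seeded_rank(doc_tokens, seed_words, seed_weight=0.5, top_n=3):
    """Rank document tokens against a blend of document and seed vectors."""
    doc_vec = unit(Counter(doc_tokens))
    seed_vec = unit(Counter(seed_words))
    # Blend: seed_weight=0 ignores the seeds, seed_weight=1 ignores the document.
    keys = set(doc_vec) | set(seed_vec)
    blended = {
        k: (1 - seed_weight) * doc_vec.get(k, 0.0) + seed_weight * seed_vec.get(k, 0.0)
        for k in keys
    }
    blended_norm = math.sqrt(sum(x * x for x in blended.values()))
    scored = []
    for cand in sorted(set(doc_tokens)):
        # Each candidate is a one-hot vector, so its cosine similarity to the
        # blend is the blend's component for that word divided by the blend norm.
        score = blended.get(cand, 0.0) / blended_norm if blended_norm else 0.0
        scored.append((cand, score))
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top_n]


tokens = ["learning", "learning", "learning", "supervised", "data"]
print(seeded_rank(tokens, ["supervised"], seed_weight=0.0))
print(seeded_rank(tokens, ["supervised"], seed_weight=0.5))
```

In this toy setup, raising `seed_weight` pulls seed-related candidates up the ranking, which is the general effect seeding is meant to have on topic-relevant documents.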

Basic use:

See runner.py for a complete example.

from adaptkeybert import KeyBERT

doc = """
         Supervised learning is the machine learning task of learning a function that
         maps an input to an output based on example input-output pairs. It infers a
         function from labeled training data consisting of a set of training examples.
         In supervised learning, each example is a pair consisting of an input object
         (typically a vector) and a desired output value (also called the supervisory signal).
         A supervised learning algorithm analyzes the training data and produces an inferred function,
         which can be used for mapping new examples. An optimal scenario will allow for the
         algorithm to correctly determine the class labels for unseen instances. This requires
         the learning algorithm to generalize from the training data to unseen situations in a
         'reasonable' way (see inductive bias). But then what about supervision and unsupervision, what happens to unsupervised learning.
      """
# Baseline: plain keyword extraction with no adaptation.
kw_model = KeyBERT()
keywords = kw_model.extract_keywords(doc, top_n=10)
print(keywords)


# Few-shot domain adaptation: pre-train on documents paired with example keyphrases.
kw_model = KeyBERT(domain_adapt=True)
kw_model.pre_train([doc], [['supervised', 'unsupervised']], lr=1e-3)
keywords = kw_model.extract_keywords(doc, top_n=10)
print(keywords)


# Zero-shot adaptation: seed the model with domain words only.
kw_model = KeyBERT(zero_adapt=True)
kw_model.zeroshot_pre_train(['supervised', 'unsupervised'], adaptive_thr=0.15)
keywords = kw_model.extract_keywords(doc, top_n=10)
print(keywords)


# Both combined: few-shot pre-training followed by zero-shot seeding.
kw_model = KeyBERT(domain_adapt=True, zero_adapt=True)
kw_model.pre_train([doc], [['supervised', 'unsupervised']], lr=1e-3)
kw_model.zeroshot_pre_train(['supervised', 'unsupervised'], adaptive_thr=0.15)
keywords = kw_model.extract_keywords(doc, top_n=10)
print(keywords)
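
The four runs above each print their own keyword list, so a small helper can make the effect of adaptation easier to see. The sketch below assumes `extract_keywords` returns a list of `(keyword, score)` pairs, as upstream KeyBERT does for a single document; `compare_keywords` is a hypothetical helper, and the sample lists are made-up placeholders, not real model output.

```python
def compare_keywords(before, after):
    """Summarize how a keyword list changed between two extraction runs.

    Both arguments are lists of (keyword, score) pairs.
    """
    before_set = {kw for kw, _ in before}
    after_set = {kw for kw, _ in after}
    return {
        "kept": sorted(before_set & after_set),
        "dropped": sorted(before_set - after_set),
        "gained": sorted(after_set - before_set),
    }


# Placeholder data for illustration only; scores are invented.
baseline = [("learning", 0.5), ("training", 0.4), ("function", 0.3)]
adapted = [("supervised", 0.6), ("learning", 0.5), ("unsupervised", 0.4)]
print(compare_keywords(baseline, adapted))
```

Passing the outputs of the baseline run and an adapted run to this helper shows which keywords the adaptation introduced and which it displaced.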
