spaCy pipeline component for adding Keyphrase Extraction.

spacycaKE: Keyphrase Extraction for spaCy

spaCy v2.0 extension and pipeline component that adds keyphrase extraction metadata to Doc objects.

Installation

spacycaKE requires spaCy v2.0.0 or higher and spacybert v1.0.0 or higher.

Usage

import spacy
from spacycake import BertKeyphraseExtraction as bake
nlp = spacy.load('en')

Then add bake to the spaCy pipeline:

cake = bake(nlp, from_pretrained='bert-base-cased', top_k=3)
nlp.add_pipe(cake, last=True)

Extract the keyphrases.

doc = nlp("This is a test but obviously you need to place a bigger document here to extract meaningful keyphrases")
print(doc._.extracted_phrases)  # <-- List of 3 keyphrases

Available attributes

The extension sets attributes on the Doc object. You can change the attribute names when initializing the extension.

name | type | description
Doc._.bert_repr | torch.Tensor | Document BERT embedding
Doc._.noun_phrases | List[str] | List of the candidate phrases from the document
Doc._.extracted_phrases | List[str] | List of the final extracted keyphrases
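
As a minimal sketch of renaming the attributes, the snippet below passes a custom attr_names tuple at initialization and reads the keyphrases back under the new name. The three names in CUSTOM_ATTR_NAMES and the helper functions build_pipeline and extract_keyphrases are hypothetical, chosen only for illustration. The sketch assumes the tuple is applied in the documented order (embedding, candidate phrases, extracted phrases).

```python
# Hypothetical attribute names, in the documented order:
# (document embedding, candidate phrases, extracted keyphrases).
CUSTOM_ATTR_NAMES = ("doc_embedding", "candidate_phrases", "keyphrases")


def build_pipeline(model_name="bert-base-cased", attr_names=CUSTOM_ATTR_NAMES):
    """Build a spaCy v2 pipeline with the keyphrase component appended."""
    # Imported lazily so the heavy dependencies load only when a pipeline is built.
    import spacy
    from spacycake import BertKeyphraseExtraction as bake

    nlp = spacy.load("en")
    cake = bake(nlp, from_pretrained=model_name, attr_names=attr_names)
    nlp.add_pipe(cake, last=True)
    return nlp


def extract_keyphrases(nlp, text, attr_name=CUSTOM_ATTR_NAMES[2]):
    """Run the pipeline and read the keyphrases from the renamed attribute."""
    doc = nlp(text)
    # doc._.get(name) is spaCy's accessor for a custom extension attribute.
    return doc._.get(attr_name)
```

With these names, the keyphrases would be read from doc._.keyphrases rather than doc._.extracted_phrases.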

Settings

On initialization of bake, you can define the following:

name | type | default | description
nlp | spacy.lang.(...) | - | Only used to get the language vocabulary to initialize the phrase matcher
from_pretrained | str | None | Path to a BERT model directory, or the name of pre-trained BERT weights from HuggingFace transformers, e.g., bert-base-cased
attr_names | Tuple[str] | ('bert_repr', 'noun_phrases', 'extracted_phrases') | Names of the attributes set on the ._ property (in order)
force_extension | bool | True | Whether to re-create an extension attribute of the same name when the component is initialized again
top_k | int | 5 | Maximum number of extracted phrases
mmr_lambda | float | 0.5 | Maximal Marginal Relevance lambda parameter. Controls the diversity of the extracted keyphrases: the closer to 1, the more diverse the results; the closer to 0, the more similar the extracted phrases will be to the source document.
kws | kwargs | - | Additional keyword arguments passed to spacybert.BertInference()
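
To illustrate how top_k and mmr_lambda interact, here is a pure-Python sketch of greedy Maximal Marginal Relevance selection over toy vectors. It is not taken from the spacycaKE source. It follows the convention described in the table above, where lambda weights the diversity penalty (the classic formulation usually puts lambda on the relevance term instead), and whether the library computes its scores exactly this way is an assumption. The function names and the two-dimensional example vectors are hypothetical.

```python
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(u: Vector, v: Vector) -> float:
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def mmr_select(doc_vec: Vector, candidates: Dict[str, Vector],
               top_k: int = 5, mmr_lambda: float = 0.5) -> List[str]:
    """Greedily pick up to top_k phrases, trading relevance against redundancy."""
    selected: List[str] = []
    remaining = list(candidates)
    while remaining and len(selected) < top_k:
        best_phrase, best_score = None, -math.inf
        for phrase in remaining:
            # Relevance: similarity of the candidate to the whole document.
            relevance = cosine(candidates[phrase], doc_vec)
            # Redundancy: similarity to the closest phrase already selected.
            redundancy = max(
                (cosine(candidates[phrase], candidates[s]) for s in selected),
                default=0.0,
            )
            # Assumed convention: a larger lambda penalizes redundancy more.
            score = (1.0 - mmr_lambda) * relevance - mmr_lambda * redundancy
            if score > best_score:
                best_phrase, best_score = phrase, score
        selected.append(best_phrase)
        remaining.remove(best_phrase)
    return selected


# Toy embeddings: two near-duplicate phrases and one distinct phrase.
doc_vec = [1.0, 1.0]
candidates = {
    "neural network": [1.0, 0.9],
    "neural networks": [1.0, 0.8],
    "training data": [0.2, 1.0],
}
relevance_only = mmr_select(doc_vec, candidates, top_k=2, mmr_lambda=0.0)
balanced = mmr_select(doc_vec, candidates, top_k=2, mmr_lambda=0.5)
```

With these toy vectors, mmr_lambda=0.0 keeps both near-duplicate phrases because only relevance counts, while mmr_lambda=0.5 replaces the second near-duplicate with the more distinct phrase.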

Roadmap

This extension is still experimental. Possible future updates include:

  • Adding other keyphrase extraction methods.
