spaCy pipeline component for adding Keyphrase Extraction.
spacycaKE: Keyphrase Extraction for spaCy
spaCy v2.0 extension and pipeline component for adding Keyphrase Extraction methods metadata to Doc objects.
Installation
spacycaKE requires spacy v2.0.0 or higher and spacybert v1.0.0 or higher.
Usage
import spacy
from spacycake import BertKeyphraseExtraction as bake
nlp = spacy.load('en')
Then use bake as part of the spaCy pipeline:
cake = bake(nlp, from_pretrained='bert-base-cased', top_k=3)
nlp.add_pipe(cake, last=True)
Extract the keyphrases.
doc = nlp("This is a test but obviously you need to place a bigger document here to extract meaningful keyphrases")
print(doc._.extracted_phrases) # <-- List of 3 keyphrases
Available attributes
The extension sets attributes on the Doc object. You can change the attribute names when initializing the extension.
name | type | description |
---|---|---|
Doc._.bert_repr | torch.Tensor | Document BERT embedding |
Doc._.noun_phrases | List[str] | List of the candidate phrases from the document |
Doc._.extracted_phrases | List[str] | List of the final extracted keyphrases |
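The attributes above live on spaCy's `._` extension namespace. The following is a minimal sketch of that mechanism only, not of spacycaKE's internals: it uses spaCy's own `Doc.set_extension` on a blank English pipeline, so no BERT model is loaded. The attribute name `demo_phrases` and the value assigned to it are hypothetical and chosen for illustration.

```python
import spacy
from spacy.tokens import Doc

# Register a custom attribute on Doc; "demo_phrases" is a hypothetical name.
# force=True overwrites an existing registration instead of raising.
Doc.set_extension("demo_phrases", default=None, force=True)

# Registering the same name again without force raises a ValueError.
try:
    Doc.set_extension("demo_phrases", default=None)
    clash_raised = False
except ValueError:
    clash_raised = True

# A blank pipeline is enough to build a Doc; no trained model is needed.
nlp = spacy.blank("en")
doc = nlp("Keyphrase extraction with custom attributes")

# A pipeline component would normally fill this in; here it is set by hand.
doc._.demo_phrases = ["keyphrase extraction"]
print(doc._.demo_phrases)
```

This is the situation the force_extension setting appears to address: creating the component a second time in the same process would otherwise try to register attribute names that already exist.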
Settings
On initialization of bake, you can define the following:
name | type | default | description |
---|---|---|---|
nlp | spacy.lang.(...) | - | Only used to get the language vocabulary to initialize the phrase matcher |
from_pretrained | str | None | Path to a BERT model directory or name of HuggingFace transformers pre-trained BERT weights, e.g., bert-base-cased |
attr_names | Tuple[str] | ('bert_repr', 'noun_phrases', 'extracted_phrases') | Names of the attributes set on the ._ property (in order) |
force_extension | bool | True | Whether to overwrite an existing extension attribute of the same name when the component is initialized again |
top_k | int | 5 | Max number of extracted phrases |
mmr_lambda | float | .5 | Maximal Marginal Relevance lambda parameter. Controls the diversity of the extracted keyphrases: the closer to 1.0, the more diverse the results; the closer to 0.0, the more similar the extracted phrases are to the source document. |
kws | kwargs | - | More keyword arguments to supply to spacybert.BertInference() |
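To make the interplay of top_k and mmr_lambda more concrete, here is a minimal pure-Python sketch of greedy Maximal Marginal Relevance selection. It is not spacycaKE's actual implementation: the function names, the toy 2-dimensional vectors, and the exact weighting are assumptions. The weighting is chosen only so that it matches the description in the table (closer to 1.0 gives more diverse results, closer to 0.0 gives phrases more similar to the document).

```python
import math
from typing import Dict, List, Sequence


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def mmr_select(
    doc_vec: Sequence[float],
    candidates: Dict[str, Sequence[float]],
    top_k: int = 5,
    mmr_lambda: float = 0.5,
) -> List[str]:
    """Greedily pick up to top_k phrases, trading relevance against redundancy.

    Assumed weighting: (1 - lambda) * relevance - lambda * redundancy,
    so a larger lambda penalizes similarity to already selected phrases more.
    """
    selected: List[str] = []
    remaining = dict(candidates)

    def score(name: str) -> float:
        relevance = cosine(remaining[name], doc_vec)
        redundancy = max(
            (cosine(remaining[name], candidates[s]) for s in selected),
            default=0.0,
        )
        return (1.0 - mmr_lambda) * relevance - mmr_lambda * redundancy

    while remaining and len(selected) < top_k:
        best = max(remaining, key=score)
        selected.append(best)
        del remaining[best]
    return selected


# Toy embeddings (hypothetical): two near-duplicate phrases and one distinct one.
doc_vec = [1.0, 1.0]
candidates = {
    "neural network": [1.0, 0.9],
    "neural networks": [1.0, 0.8],
    "training data": [0.2, 1.0],
}

relevance_only = mmr_select(doc_vec, candidates, top_k=2, mmr_lambda=0.0)
balanced = mmr_select(doc_vec, candidates, top_k=2, mmr_lambda=0.5)
print(relevance_only)
print(balanced)
```

With these toy vectors, a lambda of 0.0 keeps both near-duplicate phrases, while 0.5 replaces the second one with the less redundant "training data".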
Roadmap
This extension is still experimental. Possible future updates include:
- Adding other keyphrase extraction methods.