Keyword extraction with spaCy
spacy_ke: Keyword Extraction with spaCy.
⏳ Installation
pip install spacy_ke
🚀 Quickstart
Usage as a spaCy pipeline component (spaCy v2.x.x and v3.0.x)
import spacy
import spacy_ke
# load spacy model
nlp = spacy.load("en_core_web_sm")
# spaCy v3.0.x: add the component by its factory name.
# If you're using spaCy v2.x.x, switch to `nlp.add_pipe(spacy_ke.Yake(nlp))`.
nlp.add_pipe("yake")
doc = nlp(
    "Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence "
    "concerned with the interactions between computers and human language, in particular how to program computers "
    "to process and analyze large amounts of natural language data. "
)
for keyword, score in doc._.extract_keywords(n=3):
    print(keyword, "-", score)
Configure the pipeline component
You will usually want to tune the keyword extraction component for your data. The options below are shown with their default values.
from spacy_ke import Yake

window: int = 2  # default
lemmatize: bool = False  # default
candidate_selection: str = "ngram"  # default, use "chunk" for noun phrase selection

# spaCy v2.x.x style: pass the component instance to add_pipe
nlp.add_pipe(
    Yake(
        nlp,
        window=window,
        lemmatize=lemmatize,
        candidate_selection=candidate_selection,
    )
)
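To make the options above more concrete, here is a minimal plain-Python sketch of what n-gram candidate selection and a co-occurrence window mean in general. It is an illustration only, not spacy_ke's implementation: `ngram_candidates` and `cooccurrences` are hypothetical helpers, and it assumes that `window` plays the role it has in the YAKE algorithm, namely the size of the neighbourhood used to count co-occurring words.

```python
from collections import Counter
from typing import List, Tuple


def ngram_candidates(tokens: List[str], n: int = 3) -> List[Tuple[str, ...]]:
    """Return every contiguous run of 1..n tokens (hypothetical helper)."""
    candidates = []
    for size in range(1, n + 1):
        for start in range(len(tokens) - size + 1):
            candidates.append(tuple(tokens[start:start + size]))
    return candidates


def cooccurrences(tokens: List[str], window: int = 2) -> Counter:
    """Count unordered token pairs at most `window` positions apart (hypothetical helper)."""
    counts: Counter = Counter()
    for i, left in enumerate(tokens):
        # Only look forward so each pair is counted once per occurrence.
        for right in tokens[i + 1:i + 1 + window]:
            counts[tuple(sorted((left, right)))] += 1
    return counts


tokens = ["natural", "language", "processing"]
print(ngram_candidates(tokens, n=2))
print(cooccurrences(tokens, window=1))
```

A larger `window` links words that are further apart, and a larger n-gram size yields longer candidate phrases. A real selector would typically also filter out stop words and punctuation.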
To define a custom candidate selection, register a function under a name of your choice and pass that name to the component, as in the example below.
from typing import Iterable

from spacy.tokens import Doc
from spacy_ke import Yake
from spacy_ke.util import registry, Candidate


# Register the function under the name "custom"
@registry.candidate_selection.register("custom")
def custom_selection(doc: Doc, n=3) -> Iterable[Candidate]:
    ...


nlp.add_pipe(
    Yake(
        nlp,
        candidate_selection="custom"
    )
)
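The string passed as candidate_selection is resolved to a function by name. The sketch below shows the general decorator-registry pattern in plain Python, so the mechanism is easier to follow. `SelectionRegistry`, `long_words`, and `min_len` are hypothetical names made up for this illustration. This is not spacy_ke's actual registry, and it returns plain strings rather than `Candidate` objects.

```python
from typing import Callable, Dict, List


class SelectionRegistry:
    """Hypothetical name -> function registry, for illustration only."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., List[str]]] = {}

    def register(self, name: str) -> Callable:
        # Returns a decorator that stores the function under `name`.
        def decorator(func: Callable[..., List[str]]) -> Callable[..., List[str]]:
            self._functions[name] = func
            return func

        return decorator

    def get(self, name: str) -> Callable[..., List[str]]:
        if name not in self._functions:
            raise KeyError(f"no candidate selection registered as {name!r}")
        return self._functions[name]


candidate_selection = SelectionRegistry()


@candidate_selection.register("custom")
def long_words(tokens: List[str], min_len: int = 5) -> List[str]:
    """Toy selection rule: keep tokens with at least `min_len` characters."""
    return [token for token in tokens if len(token) >= min_len]


# A component configured with the string "custom" would look the function up like this.
selector = candidate_selection.get("custom")
print(selector(["natural", "data", "language"]))
```

Registering by name keeps the component's configuration a plain string, which is easier to serialize than a function object.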
Development
Set up virtualenv
$ python -m venv .venv
$ source .venv/bin/activate
Install dependencies
$ pip install -U pip
$ pip install -r requirements-dev.txt
Run unit tests
$ pytest
Run black (code formatter)
$ black spacy_ke/ --config=pyproject.toml
Release package (via twine)
$ python setup.py upload