Keyword extraction with spaCy
spacy_ke: Keyword Extraction with spaCy.
⏳ Installation
pip install spacy_ke
🚀 Quickstart
Usage as a spaCy pipeline component (spaCy v3.x; the v2.x alternative is noted in the comments)
import spacy
import spacy_ke
# load spacy model
nlp = spacy.load("en_core_web_sm")
# spaCy v3.x: add the component by its factory name.
# If you're using spaCy v2.x, switch to `nlp.add_pipe(spacy_ke.Yake(nlp))`.
nlp.add_pipe("yake")
doc = nlp(
    "Natural language processing (NLP) is a subfield of linguistics, computer science, and artificial intelligence "
    "concerned with the interactions between computers and human language, in particular how to program computers "
    "to process and analyze large amounts of natural language data. "
)
for keyword, score in doc._.extract_keywords(n=3):
    print(keyword, "-", score)
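The loop above yields (keyword, score) pairs. As a minimal sketch of one way to post-process them, the helper below drops case-insensitive duplicates while keeping the input order. `unique_keywords` is a hypothetical helper, not part of spacy_ke, and the sample pairs are made-up placeholders standing in for real output.

```python
from typing import Iterable, List, Tuple


def unique_keywords(pairs: Iterable[Tuple[object, float]]) -> List[Tuple[str, float]]:
    """Drop case-insensitive duplicates, keeping the first occurrence and the input order."""
    seen = set()
    result = []
    for keyword, score in pairs:
        # str() works for plain strings and for span-like objects alike.
        text = str(keyword).strip()
        key = text.lower()
        if key in seen:
            continue
        seen.add(key)
        result.append((text, score))
    return result


# Placeholder pairs standing in for doc._.extract_keywords(n=...); the scores are invented.
sample = [("Natural language", 0.1), ("natural language", 0.2), ("computers", 0.3)]
print(unique_keywords(sample))
```

Because the helper only calls `str()` on each keyword, it should work whether the extractor returns strings or spaCy spans.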
Configure the pipeline component
You can tune the keyword extraction component through the parameters below, shown with their default values.
from spacy_ke import Yake

window: int = 2  # default
lemmatize: bool = False  # default
candidate_selection: str = "ngram"  # default, use "chunk" for noun phrase selection

# spaCy v2.x style: pass a component instance to add_pipe.
nlp.add_pipe(
    Yake(
        nlp,
        window=window,
        lemmatize=lemmatize,
        candidate_selection=candidate_selection,
    )
)
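To give a feel for what these parameters control, here is a minimal pure-Python sketch of n-gram candidate generation and of a co-occurrence window. It is an illustration under assumptions, not spacy_ke's implementation: `ngram_candidates` and `cooccurrences` are hypothetical names, and the exact way the library interprets `window` (for example, whether it looks left, right, or both ways) is not confirmed here.

```python
from typing import List, Tuple


def ngram_candidates(tokens: List[str], n: int = 3) -> List[Tuple[str, ...]]:
    """Return every contiguous run of 1..n tokens, ordered by start position."""
    candidates = []
    for start in range(len(tokens)):
        for size in range(1, n + 1):
            if start + size > len(tokens):
                break
            candidates.append(tuple(tokens[start:start + size]))
    return candidates


def cooccurrences(tokens: List[str], window: int = 2) -> List[Tuple[str, str]]:
    """Pair each token with the tokens up to `window` positions to its right (an assumed reading)."""
    pairs = []
    for i, left in enumerate(tokens):
        for right in tokens[i + 1:i + 1 + window]:
            pairs.append((left, right))
    return pairs


tokens = ["natural", "language", "processing"]
print(ngram_candidates(tokens, n=2))
print(cooccurrences(tokens, window=1))
```

A larger `n` produces longer candidate phrases, and a larger `window` links each word to more of its neighbors.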
To define a custom candidate selection, register a function under a name and pass that name as candidate_selection, as in the example below.
from typing import Iterable

from spacy.tokens import Doc
from spacy_ke import Yake
from spacy_ke.util import registry, Candidate


@registry.candidate_selection.register("custom")
def custom_selection(doc: Doc, n=3) -> Iterable[Candidate]:
    ...  # build and return the candidates here


nlp.add_pipe(
    Yake(
        nlp,
        candidate_selection="custom"
    )
)
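The example above relies on a name-to-function registry: a decorator stores the function under a string, and the component later looks it up by that string. The sketch below shows the general pattern in plain Python. `SelectionRegistry` is a hypothetical stand-in written for illustration; it is not spacy_ke's registry, and the toy selection rule works on plain token strings rather than on a spaCy Doc.

```python
from typing import Callable, Dict, Iterable, List


class SelectionRegistry:
    """Hypothetical name-to-function registry, for illustration only."""

    def __init__(self) -> None:
        self._functions: Dict[str, Callable[..., Iterable[str]]] = {}

    def register(self, name: str):
        """Return a decorator that stores the decorated function under `name`."""
        def decorator(func):
            self._functions[name] = func
            return func
        return decorator

    def get(self, name: str):
        """Look up a previously registered function by name."""
        if name not in self._functions:
            raise KeyError(f"no candidate selection registered as {name!r}")
        return self._functions[name]


candidate_selection = SelectionRegistry()


@candidate_selection.register("custom")
def custom_selection(tokens: List[str], n: int = 3) -> Iterable[str]:
    # Toy rule: keep capitalized tokens only, at most n of them.
    return [t for t in tokens if t[:1].isupper()][:n]


# A component configured with candidate_selection="custom" would resolve the name like this.
select = candidate_selection.get("custom")
print(select(["Natural", "language", "Processing"]))
```

Resolving functions by name keeps the component's configuration a plain string, which makes it easy to store in config files.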
Development
Set up virtualenv
$ python -m venv .venv
$ source .venv/bin/activate
Install dependencies
$ pip install -U pip
$ pip install -r requirements-dev.txt
Run unit tests
$ pytest
Run black (code formatter)
$ black spacy_ke/ --config=pyproject.toml
Release package (via twine)
$ python setup.py upload