A keyphrase extractor for Persian
Perke
perke is an open-source, Python-based keyphrase extraction toolkit for the
Persian language. It provides an end-to-end keyphrase extraction pipeline in
which each component can be easily modified or extended to develop new models.
Installation
- To pip install perke from GitHub:
pip install git+https://github.com/alirezah320/perke.git
- perke also requires a POS tagger model, which can be obtained from here and must be placed in the resources directory.
- perke supports Python 3.x.
Minimal example
perke provides a standardized API for extracting keyphrases from a document.
Start by following the four steps below. To use another model, simply replace
TextRank with that model.
from perke.unsupervised.graph_based import TextRank
# Define the set of valid part of speech tags to occur in the model.
valid_pos_tags = {'N', 'Ne', 'AJ', 'AJe'}
# 1. Create a TextRank extractor.
extractor = TextRank(valid_pos_tags=valid_pos_tags)
# 2. Load the text.
extractor.load_text(input='text or path/to/input_file',
                    word_normalization_method=None)
# 3. Build the graph representation of the text and weight the
# words. Keyphrase candidates are composed from the 33 percent
# highest weighted words.
extractor.weight_candidates(window_size=2, top_t_percent=0.33)
# 4. Get the 10 highest weighted candidates as keyphrases.
keyphrases = extractor.get_n_best(n=10)
Detailed examples are provided in the examples directory.
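
To make steps 3 and 4 more concrete, here is a minimal, self-contained sketch of TextRank-style word weighting in plain Python. It is not perke's implementation: the function names (`build_cooccurrence_graph`, `rank_words`, `top_words`, `compose_candidates`) are hypothetical, part-of-speech filtering is omitted, and the ranking is a simple power-iteration PageRank assumed here for illustration.

```python
from collections import defaultdict


def build_cooccurrence_graph(words, window_size=2):
    """Link each word to the words that follow it within the window."""
    graph = defaultdict(set)
    for i, word in enumerate(words):
        for j in range(i + 1, min(i + window_size, len(words))):
            other = words[j]
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def rank_words(graph, damping=0.85, iterations=50):
    """Score every node with a plain power-iteration PageRank."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        scores = {
            node: (1 - damping) + damping * sum(
                scores[neighbor] / len(graph[neighbor])
                for neighbor in graph[node]
            )
            for node in graph
        }
    return scores


def top_words(scores, top_t_percent=0.33):
    """Keep the highest-scoring fraction of words (at least one)."""
    keep = max(1, int(len(scores) * top_t_percent))
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:keep]


def compose_candidates(words, selected):
    """Merge runs of adjacent selected words into candidate phrases."""
    candidates, current = [], []
    for word in words:
        if word in selected:
            current.append(word)
        elif current:
            candidates.append(" ".join(current))
            current = []
    if current:
        candidates.append(" ".join(current))
    return candidates
```

Given a list of tokens, the sketch would be chained as `build_cooccurrence_graph`, then `rank_words`, then `top_words`, and finally `compose_candidates` with the selected words as a set. The `window_size` and `top_t_percent` parameters mirror the names used in the example above, but their exact semantics in perke may differ.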
Implemented models
perke currently implements the following keyphrase extraction models:
- Unsupervised models
  - Graph-based models
    - TextRank: article by Mihalcea and Tarau, 2004
    - SingleRank: article by Wan and Xiao, 2008
    - TopicRank: article by Bougouin et al., 2013
    - PositionRank: article by Florescu and Caragea, 2017
    - MultipartiteRank: article by Boudin, 2018