Keyword extraction Python package

Project description

Yet Another Keyword Extractor (Yake)

Unsupervised Approach for Automatic Keyword Extraction using Text Features

Main Features

  • Unsupervised approach

  • Multi-Language Support

  • Single document

Rationale

Extracting keywords from texts has become a challenge for individuals and organizations as information grows in complexity and size. The need to automate this task, so that texts can be processed in a timely and adequate manner, has led to the emergence of automatic keyword extraction tools. Despite these advances, there is a clear lack of multilingual online tools that automatically extract keywords from single documents. Yake! is a novel feature-based system for multilingual keyword extraction that supports texts of different sizes, domains, and languages. Unlike other approaches, Yake! relies on neither dictionaries nor thesauri, and it is not trained against any corpus. Instead, it follows an unsupervised approach that builds on features extracted from the text itself, which makes it applicable to documents written in different languages without further knowledge. This is beneficial for the many tasks and situations where access to training corpora is limited or restricted.
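
To make the idea of "features extracted from the text" concrete, here is a toy sketch of corpus-free keyword ranking. It is not Yake's algorithm: the three features used (term frequency, index of the first sentence containing the word, and mid-sentence capitalisation), the scoring formula, the `min_len` filter, and the function name `rank_keywords` are all assumptions chosen for illustration. It only shows that a ranking can be computed from a single document with no dictionary and no training data.

```python
import re
from collections import defaultdict


def rank_keywords(text, top=5, min_len=4):
    """Rank single words using only statistics taken from `text` itself.

    Toy scoring (an assumption, not Yake's formula):
        score = (1 + mid-sentence capitalised count) * frequency
                / (1 + index of the first sentence containing the word)
    Higher scores rank first in this sketch.
    """
    stats = defaultdict(lambda: {"tf": 0, "cap": 0, "first": None})

    # Naive sentence split: break after '.', '!' or '?' followed by whitespace.
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

    for sent_idx, sentence in enumerate(sentences):
        # Runs of Unicode letters, so the same code works across languages.
        tokens = re.findall(r"[^\W\d_]+", sentence)
        for tok_idx, token in enumerate(tokens):
            if len(token) < min_len:
                continue  # crude length filter instead of a stopword list
            entry = stats[token.lower()]
            entry["tf"] += 1
            if entry["first"] is None:
                entry["first"] = sent_idx
            # Capitalised away from the sentence start hints at a proper noun.
            if tok_idx > 0 and token[0].isupper():
                entry["cap"] += 1

    scored = [
        (word, (1 + e["cap"]) * e["tf"] / (1 + e["first"]))
        for word, e in stats.items()
    ]
    # Highest score first; ties broken alphabetically for determinism.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:top]


if __name__ == "__main__":
    sample = (
        "Google buys Kaggle. Kaggle hosts data contests. "
        "Analysts expect Google to comment."
    )
    for word, score in rank_keywords(sample, top=3):
        print(word, score)
```

Because every feature is derived from the input document, nothing in the sketch depends on the language of the text beyond the notion of letters and sentence punctuation.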

Requirements

Python 3

Installation

To install Yake from your terminal:

pip install yake

Usage

How to use it from the command line:

yake --input_file [text file] --language en --ngram_size 3

How to use it in Python:

from yake.yake import YakeKeywordExtractor

text_content = """
        Sources tell us that Google is acquiring Kaggle, a platform that hosts data science and machine learning
        competitions. Details about the transaction remain somewhat vague , but given that Google is hosting
        its Cloud Next conference in San Francisco this week, the official announcement could come as early
        as tomorrow.  Reached by phone, Kaggle co-founder CEO Anthony Goldbloom declined to deny that the
        acquisition is happening. Google itself declined 'to comment on rumors'.
"""

# Assuming default parameters
simple_kwextractor = YakeKeywordExtractor()
keywords = simple_kwextractor.extract_keywords(text_content)

for kw in keywords:
    print(kw)

# Specifying parameters
custom_kwextractor = YakeKeywordExtractor(lan="en", n=3, dedupLim=0.8, windowsSize=2, top=20)
keywords = custom_kwextractor.extract_keywords(text_content)

for kw in keywords:
    print(kw)
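
This description does not explain the `dedupLim=0.8` argument. One plausible reading, and it is only an assumption, is that it acts as a similarity threshold: a candidate that is at least that similar to an already accepted keyword is dropped as a near-duplicate. The sketch below illustrates that idea with `difflib.SequenceMatcher` from the standard library; the helper name `deduplicate` and the choice of similarity measure are hypothetical and may differ from what Yake does internally.

```python
from difflib import SequenceMatcher


def deduplicate(candidates, threshold=0.8):
    """Keep candidates in their given order, skipping near-duplicates.

    A candidate is skipped when its similarity ratio to any already kept
    candidate is greater than or equal to `threshold` (assumed semantics).
    """
    kept = []
    for cand in candidates:
        is_new = all(
            SequenceMatcher(None, cand.lower(), other.lower()).ratio() < threshold
            for other in kept
        )
        if is_new:
            kept.append(cand)
    return kept


if __name__ == "__main__":
    ranked = ["machine learning", "machine learnings", "Google"]
    print(deduplicate(ranked, threshold=0.8))
```

Under this reading, a lower threshold removes more candidates and a threshold of 1.0 removes only candidates that match a kept one exactly (ignoring case).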

Credits

  • Vítor Mangaravite

  • Arian Pasquali

  • Ricardo Campos

  • Alípio Jorge

  • Adam Jatowt

  • Célia Nunes

Paper Citation

Please cite the paper when it applies:

Ricardo Campos; Vítor Mangaravite; Alípio Mário Jorge; Célia Nunes; Arian Pasquali; Adam Jatowt
Unsupervised Approach for Automatic Keyword Extraction using Text Features, 2017

History

0.1.0 (2017-10-03)

  • First release on PyPI.

Download files

Source Distribution

yake-0.2.0.tar.gz (61.5 kB)

Uploaded Source

File details

Details for the file yake-0.2.0.tar.gz.

File metadata

  • Download URL: yake-0.2.0.tar.gz
  • Upload date:
  • Size: 61.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for yake-0.2.0.tar.gz

Algorithm    Hash digest
SHA256       7f1d4bde24808b4b7d9346827b6cfe224f1852c76e877f33722a6471321305d3
MD5          1e9c8cbfd9e4f8179360d318759cfd59
BLAKE2b-256  e868c94d377fa0f2ea030f4efe3e26b466daa314e64f8ff30282573b3ba43212
