
RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing the frequency of word appearance and its co-occurrence with other words in the text.
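The frequency and co-occurrence idea can be sketched in plain Python. This is a minimal sketch of the scoring described by Rose et al., not rake-nltk's code: it assumes a small hard-coded stop-word list and a regex split in place of NLTK's tokenizers, and `candidate_phrases` and `rake` are hypothetical names. Text is cut into candidate phrases at stop words and punctuation, each word gets a frequency f(w) and a degree d(w), and a phrase scores the sum of d(w)/f(w) over its words.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stop-word list. rake-nltk draws its
# stop words from NLTK's per-language corpus instead.
STOPWORDS = {
    "a", "an", "and", "are", "as", "by", "for", "in", "is", "it",
    "of", "on", "or", "over", "the", "to", "with",
}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and stop words."""
    phrases = []
    # Assumption: a regex split on punctuation stands in for NLTK's
    # sentence and word tokenizers.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                # A stop word ends the phrase being built.
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)
    return phrases


def rake(text, stopwords=STOPWORDS):
    """Return (score, phrase) pairs, highest score first."""
    phrases = candidate_phrases(text, stopwords)
    freq = defaultdict(int)    # f(w): occurrences of w across all phrases
    degree = defaultdict(int)  # d(w): co-occurrences of w, counting itself
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)
    # Word score is d(w) / f(w); a phrase scores the sum of its word scores.
    # Each distinct phrase is listed once.
    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in phrases
    }
    return sorted(((s, p) for p, s in scores.items()), reverse=True)


if __name__ == "__main__":
    text = (
        "Compatibility of systems of linear constraints over the set of "
        "natural numbers. Criteria of compatibility of a system of linear "
        "Diophantine equations."
    )
    for score, phrase in rake(text)[:3]:
        print(score, phrase)
    # 8.5 linear diophantine equations
    # 4.5 linear constraints
    # 4.0 natural numbers
```

In this example, multi-word phrases whose words co-occur with other content words ("linear diophantine equations") rank highest, while words that stand alone, such as "compatibility", score only 1.0.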



Features

  • Ridiculously simple interface.

  • Configurable word and sentence tokenizers, language-based stop words, etc.

  • Configurable ranking metric.
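For reference, the word scoring from the Rose et al. paper can be written out as below. Here $f(w)$ is a word's frequency and $d(w)$ its degree, i.e. how many words it co-occurs with across the candidate phrases, counting itself. The paper's default is $d(w)/f(w)$; the assumption here is that the configurable ranking metric chooses among this ratio, $d(w)$ alone, or $f(w)$ alone, and that a candidate phrase $p$ scores the sum of its word scores.

```latex
\begin{align*}
f(w) &= \text{number of times } w \text{ occurs in the candidate phrases},\\
d(w) &= \sum_{p \ni w} |p| \quad \text{(counted once per occurrence of } w\text{)},\\
\operatorname{score}(w) &\in \left\{ \frac{d(w)}{f(w)},\; d(w),\; f(w) \right\},\\
\operatorname{score}(p) &= \sum_{w \in p} \operatorname{score}(w).
\end{align*}
```

For example, with the candidates "linear constraints" and "linear diophantine equations", $f(\text{linear}) = 2$ and $d(\text{linear}) = 2 + 3 = 5$, so its default score is $5/2 = 2.5$.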

Setup

Using pip

pip install rake-nltk

Directly from the repository

git clone https://github.com/csurfer/rake-nltk.git
python rake-nltk/setup.py install

Quick Start

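from rake_nltk import Rake

# Uses NLTK's English stop words and all punctuation characters by
# default.
r = Rake()

# Extraction given the text.
r.extract_keywords_from_text(
    "Compatibility of systems of linear constraints over the set of natural numbers."
)

# Extraction given a list of strings where each string is a sentence.
r.extract_keywords_from_sentences(
    [
        "Compatibility of systems of linear constraints over the set of natural numbers.",
        "Criteria of compatibility of a system of linear Diophantine equations.",
    ]
)

# To get keyword phrases ranked highest to lowest.
ranked_phrases = r.get_ranked_phrases()

# To get keyword phrases ranked highest to lowest with scores,
# as a list of (score, phrase) tuples.
ranked_phrases_with_scores = r.get_ranked_phrases_with_scores()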

Debugging Setup

If you see a stopwords error, the NLTK stopwords corpus has not been downloaded. You can download it with the command below.

python -c "import nltk; nltk.download('stopwords')"

References

This is a Python implementation of the algorithm described in the paper Automatic keyword extraction from individual documents by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley.

Why did I choose to implement it myself?

  • It is extremely fun to implement algorithms by reading papers. It is the digital equivalent of DIY kits.

  • There are some rather popular implementations out there, in Python (aneesha/RAKE) and Node (waseem18/node-rake), but neither seemed to use the power of NLTK. By making NLTK an integral part of the implementation, I get the flexibility and power to extend it in other creative ways later, if I see fit, without having to implement everything myself.

  • I plan to use it in my other pet projects and wanted it to be modular and tunable; this way I have complete control.

Contributing

Bug Reports and Feature Requests

Please use the issue tracker for reporting bugs or feature requests.

Development

  1. Check out the repository.

  2. Make your changes and add or update relevant tests.

  3. Install `poetry` using `pip install poetry`.

  4. Run `poetry install` to create project’s virtual environment.

  5. Run tests using `poetry run tox`. Tests for any Python versions you don't have installed will fail. Fix failing tests and repeat.

  6. Make any relevant documentation changes.

  7. Install `pre-commit` using `pip install pre-commit` and run `pre-commit run --all-files` to do lint checks.

  8. Generate documentation using `poetry run sphinx-build -b html docs/ docs/_build/html`.

  9. Generate `requirements.txt` for automated testing using `poetry export --dev --without-hashes -f requirements.txt > requirements.txt`.

  10. Commit the changes and raise a pull request.

Buy the developer a cup of coffee!

If you found the utility helpful, you can buy me a cup of coffee using the Donate link.



Download files


Source Distribution

rake-nltk-1.0.6.tar.gz (9.1 kB)

Built Distribution

rake_nltk-1.0.6-py3-none-any.whl (9.1 kB, Python 3)

File details

Details for the file rake-nltk-1.0.6.tar.gz.

File metadata

  • Download URL: rake-nltk-1.0.6.tar.gz
  • Size: 9.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.8 CPython/3.8.10 Linux/5.11.0-25-generic

File hashes

Hashes for rake-nltk-1.0.6.tar.gz
Algorithm     Hash digest
SHA256        7813d680b2ce77b51cdac1757f801a87ff47682c9dbd2982aea3b66730346122
MD5           f916f6b2ceb4e191bc61ccf6e9d7c16f
BLAKE2b-256   dab153392b9ba76fdb1e9de3198f63eb1cb92529c80201e0709162d140134b30



Details for the file rake_nltk-1.0.6-py3-none-any.whl.

File metadata

  • Download URL: rake_nltk-1.0.6-py3-none-any.whl
  • Size: 9.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.1.8 CPython/3.8.10 Linux/5.11.0-25-generic

File hashes

Hashes for rake_nltk-1.0.6-py3-none-any.whl
Algorithm     Hash digest
SHA256        1c1ffdb64cae8cb99d169d53a5ffa4635f1c4abd3a02c6e22d5d083136bdc5c1
MD5           47317c3149911d055055cbf3b9d44143
BLAKE2b-256   3be518876d587142df57b1c70ef752da34664bb7dd383710ccf3ccaefba2aa0c

