
Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK

Project description


RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing how frequently each word appears and how often it co-occurs with other words in the text.
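To make the idea of frequency and co-occurrence concrete, here is a minimal pure-Python sketch of RAKE-style scoring. It is not the rake-nltk implementation: the tiny `STOPWORDS` set and the helper names `candidate_phrases`, `word_scores` and `rank_phrases` are assumptions made for illustration, and the degree of a word is approximated as the summed length of the phrases it appears in.

```python
import re
from collections import defaultdict

# Hypothetical, tiny stopword set for illustration only.
STOPWORDS = {"a", "an", "and", "the", "of", "in", "is", "for", "to", "over"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Split at punctuation first so that no phrase spans it.
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def word_scores(phrases):
    """Score each word as degree / frequency."""
    frequency = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            frequency[word] += 1
            # Each occurrence co-occurs with every word of its phrase,
            # itself included, so it adds the phrase length to the degree.
            degree[word] += len(phrase)
    return {word: degree[word] / frequency[word] for word in frequency}


def rank_phrases(text):
    """Return (score, phrase) pairs, highest score first."""
    phrases = candidate_phrases(text)
    scores = word_scores(phrases)
    ranked = {p: sum(scores[w] for w in p) for p in set(phrases)}
    return sorted(((s, " ".join(p)) for p, s in ranked.items()), reverse=True)
```

Calling `rank_phrases("Compatibility of systems of linear constraints over the set of natural numbers.")` ranks the two-word phrases above the single words, because each of their words co-occurs with a neighbour.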



Using pip

pip install rake-nltk

Directly from the repository

git clone
python rake-nltk/ install

Post setup

If you see a stopwords error, it means that you have not downloaded the stopwords corpus from NLTK. You can download it using the command below.

python -c "import nltk; nltk.download('stopwords')"

Basic Usage

from rake_nltk import Rake

r = Rake()  # Uses English stopwords from NLTK and all punctuation characters.

r.extract_keywords_from_text(<text to process>)

r.get_ranked_phrases() # To get keyword phrases ranked highest to lowest.

Advanced Usage

from rake_nltk import Metric, Rake

# To use it with a specific language supported by nltk.
r = Rake(language=<language>)

# If you want to provide your own set of stop words and punctuations to use.
r = Rake(
    stopwords=<list of stopwords>,
    punctuations=<string of punctuations to ignore>
)

# If you want to control the metric for ranking. Paper uses d(w)/f(w) as the
# metric. You can use this API with the following metrics:
# 1. d(w)/f(w) (Default metric) Ratio of degree of word to its frequency.
# 2. d(w) Degree of word only.
# 3. f(w) Frequency of word only.

r = Rake(ranking_metric=Metric.DEGREE_TO_FREQUENCY_RATIO)
r = Rake(ranking_metric=Metric.WORD_DEGREE)
r = Rake(ranking_metric=Metric.WORD_FREQUENCY)

# If you want to control the minimum or maximum number of words a phrase may
# have to be considered for ranking, you can initialize a Rake instance as below:

r = Rake(min_length=2, max_length=4)
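The effect of the three ranking metrics and of the length limits can be seen in a small standalone sketch. It does not call rake-nltk: `SketchMetric` and `score_phrases` are hypothetical names that only mirror the options listed above, the degree of a word is approximated as the summed length of the phrases it appears in, and phrases are assumed to be filtered by length before any word is scored.

```python
from collections import Counter
from enum import Enum


class SketchMetric(Enum):
    """Illustrative stand-in for the metric options; not the library's enum."""
    DEGREE_TO_FREQUENCY_RATIO = 0
    WORD_DEGREE = 1
    WORD_FREQUENCY = 2


def score_phrases(phrases, metric=SketchMetric.DEGREE_TO_FREQUENCY_RATIO,
                  min_length=1, max_length=100000):
    """Return (score, phrase) pairs sorted from highest to lowest score."""
    # Assumption: phrases outside the length limits are dropped before scoring.
    kept = [p for p in phrases if min_length <= len(p) <= max_length]

    frequency = Counter(word for p in kept for word in p)
    degree = Counter()
    for p in kept:
        for word in p:
            degree[word] += len(p)

    def word_score(word):
        if metric is SketchMetric.WORD_DEGREE:
            return degree[word]
        if metric is SketchMetric.WORD_FREQUENCY:
            return frequency[word]
        return degree[word] / frequency[word]

    # A phrase's score is the sum of the scores of its words.
    scored = ((sum(word_score(w) for w in p), " ".join(p)) for p in set(kept))
    return sorted(scored, reverse=True)
```

With the phrases `("linear", "constraints")`, `("linear", "diophantine", "equations")` and `("systems",)`, all three metrics rank the three-word phrase first, but the gap between phrases differs: the degree-based metrics favour words that occur in longer phrases, while the frequency metric favours words that simply occur often.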


This is a Python implementation of the algorithm described in the paper "Automatic keyword extraction from individual documents" by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley.

Why I chose to implement it myself?

  • It is extremely fun to implement algorithms by reading papers. It is the digital equivalent of DIY kits.
  • There are some rather popular implementations out there, in Python (aneesha/RAKE) and Node (waseem18/node-rake), but neither seemed to use the power of NLTK. By making NLTK an integral part of the implementation, I get the flexibility and power to extend it in other creative ways, if I see fit later, without having to implement everything myself.
  • I plan to use it in my other pet projects and wanted it to be modular and tunable. This way I have complete control.


Bug Reports and Feature Requests

Please use the issue tracker for reporting bugs or requesting features.


Pull requests are most welcome.

Buy the developer a cup of coffee!

If you found the utility helpful you can buy me a cup of coffee using



Files for rake-nltk, version 1.0.4
Filename, size: rake_nltk-1.0.4.tar.gz (7.6 kB)
File type: Source
Python version: None
