Python implementation of the Rapid Automatic Keyword Extraction algorithm using NLTK
Project description
RAKE, short for Rapid Automatic Keyword Extraction, is a domain-independent keyword extraction algorithm that tries to determine key phrases in a body of text by analyzing how frequently each word appears and how often it co-occurs with other words in the text.
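To make the frequency and co-occurrence idea concrete, here is a minimal sketch in plain Python. It is not the code shipped in this package: the tiny `STOPWORDS` set, the regular expression used as a phrase delimiter, and the helper names `candidate_phrases`, `word_scores` and `rank_phrases` are all assumptions made for illustration. It scores each word by the ratio of its degree (co-occurrences inside candidate phrases, counting itself) to its frequency, which is one of the word scores discussed in the paper, and scores a phrase as the sum of its word scores.

```python
import re
from collections import defaultdict

# Hypothetical, deliberately tiny stopword list used only for this sketch.
STOPWORDS = {"a", "an", "and", "are", "by", "for", "in", "is", "of", "the", "to", "with"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Anything that is not a word character, whitespace, apostrophe or
    # hyphen is treated as a hard phrase boundary (an assumption).
    for fragment in re.split(r"[^\w\s'-]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def word_scores(phrases):
    """Score each word as degree / frequency over the candidate phrases."""
    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Each occurrence co-occurs with every word in its phrase,
            # itself included.
            degree[word] += len(phrase)
    return {word: float(degree[word]) / freq[word] for word in freq}


def rank_phrases(text):
    """Return (phrase, score) pairs, highest score first."""
    phrases = candidate_phrases(text)
    scores = word_scores(phrases)
    ranked = {" ".join(p): sum(scores[w] for w in p) for p in set(phrases)}
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(ranked.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = "Keyword extraction is fun. Rapid keyword extraction of key phrases."
    for phrase, score in rank_phrases(sample):
        print(score, phrase)
```

Longer phrases built from words that tend to appear together rise to the top, because every word in them picks up a high degree relative to its frequency.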
References
This is a Python implementation of the algorithm described in the paper Automatic keyword extraction from individual documents by Stuart Rose, Dave Engel, Nick Cramer and Wendy Cowley.
Why did I choose to implement it myself?
It is extremely fun to implement algorithms by reading papers. It is the digital equivalent of DIY kits.
There are some rather popular implementations out there, in Python (aneesha/RAKE) and Node (waseem18/node-rake), but neither seemed to use the power of NLTK. By making NLTK an integral part of the implementation, I get the flexibility and power to extend it in other creative ways later, if I see fit, without having to implement everything myself.
I plan to use it in my other pet projects to come and wanted it to be modular and tunable; this way I have complete control.
Versions of Python this code is tested against
2.7
3.4
3.5
3.6
Contributing
Bug Reports and Feature Requests
Please use the issue tracker to report bugs or request features.
Development
Pull requests are most welcome.
Built Distribution
Hashes for rake_nltk-1.0.0-py2.py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 7e992647bd16902dd96a4d366eef5752b165633fc7b5f84de677a46589ac80b0
MD5 | 513ee50934e9ab55e8daf337a226d522
BLAKE2b-256 | cac024cdfd8759616348f586a9cf360219e43153d65b671e5d901f2531f744de
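
The digests listed above can be used to check a downloaded wheel before installing it. The following is a minimal sketch using only the standard library `hashlib` module; the helper names `sha256_of_file` and `matches_expected` and the local file path in the usage note are assumptions for illustration, not part of this package.

```python
import hashlib

# Digest copied from the SHA256 row listed for rake_nltk-1.0.0-py2.py3-none-any.whl.
EXPECTED_SHA256 = "7e992647bd16902dd96a4d366eef5752b165633fc7b5f84de677a46589ac80b0"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops once read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected one."""
    return sha256_of_file(path) == expected.lower()
```

Assuming the wheel was saved in the current directory, `matches_expected("rake_nltk-1.0.0-py2.py3-none-any.whl")` should return `True` for an intact download.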