A Python library for simple keyword extraction from any text using the RAKE (Rapid Automatic Keyword Extraction) algorithm.

Project description

rake_new2

rake_new2 is a Python library for simple and fast keyword extraction from any text. As the name implies, it is built on the RAKE (Rapid Automatic Keyword Extraction) algorithm.

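It identifies the key phrases in a text by scoring each candidate phrase from two statistics: how often each of its words co-occurs with other words inside key phrases (the word's degree) and how often each word appears in the entire text (its frequency).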
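The scoring idea can be illustrated with a minimal sketch of classic RAKE. This is not rake_new2's own implementation: the function names, the tiny `STOPWORDS` set, and the splitting regex are assumptions made for illustration (the library relies on the NLTK stopwords corpus). Each word is scored as degree divided by frequency, and a phrase's score is the sum of its word scores.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stopword list (an assumption, not the NLTK corpus).
STOPWORDS = {"a", "an", "and", "are", "in", "is", "of", "the", "to"}


def candidate_phrases(text, stopwords=STOPWORDS):
    """Split text into candidate phrases at punctuation and stopwords."""
    phrases = []
    # Break on any run of characters that is not a word character,
    # whitespace, or apostrophe (i.e. on punctuation).
    for fragment in re.split(r"[^\w\s']+", text.lower()):
        current = []
        for word in fragment.split():
            if word in stopwords:
                # A stopword closes the phrase collected so far.
                if current:
                    phrases.append(tuple(current))
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(tuple(current))
    return phrases


def rake_scores(text, stopwords=STOPWORDS):
    """Return {phrase: score} using word score = degree / frequency."""
    phrases = candidate_phrases(text, stopwords)
    freq = Counter()
    degree = defaultdict(float)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            # Degree counts co-occurrences within the phrase, including the word itself.
            degree[word] += len(phrase)
    return {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in set(phrases)
    }


scores = rake_scores("Red apples are good in taste.")
```

For this sentence, 'red' and 'apples' each have degree 2 and frequency 1, so 'red apples' scores 4.0, while 'good' and 'taste' each score 1.0.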

New in this version

Handles repeated keywords and key phrases.
Handles consecutive punctuation marks.
Handles HTML tags in text: the keep_html_tags option lets the user choose whether HTML tags are kept as keywords too.
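As a rough illustration of what this kind of cleanup can involve, the sketch below strips or keeps HTML tags and collapses runs of punctuation before any keyword extraction. It is an assumption-laden example, not the library's code: the `preprocess` function, its regular expressions, and the set of punctuation marks are all hypothetical, and it does not aim to reproduce rake_new2's exact output.

```python
import re

# Any HTML-like tag, e.g. "<h1>" or "</h1>".
TAG_RE = re.compile(r"<[^>]+>")
# Captures the tag name so it can be kept as an ordinary word.
TAG_NAME_RE = re.compile(r"</?\s*([A-Za-z][A-Za-z0-9]*)[^>]*>")
# A punctuation mark followed by one or more further marks (optionally spaced).
PUNCT_RUN_RE = re.compile(r"([!?.,;:])(?:\s*[!?.,;:])+")


def preprocess(text, keep_html_tags=False):
    """Normalize text: handle HTML tags, then collapse punctuation runs."""
    if keep_html_tags:
        # "<h1>" and "</h1>" both become the plain word "h1".
        text = TAG_NAME_RE.sub(r" \1 ", text)
    else:
        # Drop the tags entirely.
        text = TAG_RE.sub(" ", text)
    # Keep only the first mark of each punctuation run.
    text = PUNCT_RUN_RE.sub(r"\1", text)
    # Collapse whitespace left behind by the substitutions.
    return re.sub(r"\s+", " ", text).strip()


without_tags = preprocess("<h1> Hello world !</h1>")
with_tags = preprocess("<h1> Hello world !</h1>", keep_html_tags=True)
collapsed = preprocess("Wait... what?!")
```

Here `without_tags` is "Hello world !", `with_tags` is "h1 Hello world ! h1", and `collapsed` is "Wait. what?".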

Installation

Use the package manager pip to install rake_new2.

pip install rake_new2

Quick Start

from rake_new2 import Rake

text = "Red apples are good in taste."
text2 = "<h1> Hello world !</h1>"
rk, rk_new1, rk_new2 = Rake(), Rake(keep_html_tags=True), Rake(keep_html_tags=False)

# Case 1 : Plain text
# Run the extraction before querying any results
rk.get_keywords_from_raw_text(text)
kw_s = rk.get_keywords_with_scores()  
# Returns keywords with degree scores : {(1.0, 'taste'), (1.0, 'good'), (4.0, 'red apples')}
kw = rk.get_ranked_keywords() 
# Returns keywords only : ['red apples', 'taste', 'good']
f = rk.get_word_freq()
# Returns word frequencies as a Counter object : {'red': 1, 'apples': 1, 'good': 1, 'taste': 1}
deg = rk.get_kw_degree()
# Returns word degrees as defaultdict object : {'red': 2.0, 'apples': 2.0, 'good': 1.0, 'taste': 1.0}

# Case 2 : Sample case for testing the 'keep_html_tags' parameter. Default = False
print("\nORIGINAL TEXT : {}".format(text2))
# Sub Case 1 : Keeping the HTML tags
rk_new1.get_keywords_from_raw_text(text2)
kw_s1 = rk_new1.get_keywords_with_scores()
kw1 = rk_new1.get_ranked_keywords()
print("Keeping the tags : ",kw1)

# Sub Case 2 : Eliminating the HTML tags
rk_new2.get_keywords_from_raw_text(text2)
kw_s2 = rk_new2.get_keywords_with_scores()
kw2 = rk_new2.get_ranked_keywords()
print("Eliminating the tags : ",kw2)

'''OUTPUT >>
ORIGINAL TEXT : <h1> Hello world !</h1>
Keeping the tags :  {'h1', 'hello'}
Eliminating the tags :  {'hello world'} 
'''

Debugging

You might come across a stopwords error. It means that the NLTK stopwords corpus has not been downloaded yet.

To download it, run the command below.

python -c "import nltk; nltk.download('stopwords')"
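The same check can be done from Python so that the corpus is downloaded only when it is missing. The sketch below is one possible approach, not part of rake_new2: `ensure_stopwords` and its `find` and `download` parameters are hypothetical names, while `nltk.data.find` and `nltk.download` are the NLTK functions they default to.

```python
def ensure_stopwords(find=None, download=None):
    """Download the NLTK stopwords corpus only if it is missing.

    Returns True if a download was triggered, False if the corpus
    was already available.
    """
    if find is None or download is None:
        # Imported lazily so the helper can be exercised with stub functions.
        import nltk
        find = find or nltk.data.find
        download = download or nltk.download
    try:
        # The lookup function raises LookupError when the resource is absent.
        find("corpora/stopwords")
        return False
    except LookupError:
        download("stopwords")
        return True
```

Calling `ensure_stopwords()` once before creating a `Rake` object would then avoid the error on a fresh environment, assuming network access is available for the download.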

Contributing

Pull requests are welcome. For major changes, please open an issue first to discuss what you would like to change.

Please make sure to update tests as appropriate.

License

MIT

Project details

Release history

1.0.5 (this version): Jul 31, 2020
1.0.4: Jul 31, 2020
1.0.3: Jul 31, 2020
1.0.2: Jul 31, 2020
1.0.0: Jul 30, 2020

Download files

Source Distribution

rake_new2-1.0.5.tar.gz (4.9 kB)

Uploaded Jul 31, 2020 Source

Built Distribution

rake_new2-1.0.5-py3-none-any.whl (6.2 kB)

Uploaded Jul 31, 2020 Python 3

File details

Details for the file rake_new2-1.0.5.tar.gz.

File metadata

Download URL: rake_new2-1.0.5.tar.gz
Upload date: Jul 31, 2020
Size: 4.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for rake_new2-1.0.5.tar.gz
Algorithm	Hash digest
SHA256	`6a48686b20d048c2294a89146c93ed0b73674f997e30d2419f0ae48344e845c3`
MD5	`c402ba5e60a396e776da120fa8265a6f`
BLAKE2b-256	`ac355ddfa8b5dc9eef44e1a61ff01157ed89f3ad6024f96e9c4f7b08858a7b7b`

File details

Details for the file rake_new2-1.0.5-py3-none-any.whl.

File metadata

Download URL: rake_new2-1.0.5-py3-none-any.whl
Upload date: Jul 31, 2020
Size: 6.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.32.1 CPython/3.7.3

File hashes

Hashes for rake_new2-1.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`d846098a213a1c8d71ed0b858e31a2a9b01c2c884dfbc541e3582594061fd1b7`
MD5	`330e2ee2e6cf9e324a21108c8399cbe1`
BLAKE2b-256	`92cc735ecde47d1eac530558039742d6295452ae142965fb830069dde6b58678`
