Extract the most important keywords from any website
Project description
Keyword-Extractor-for-Websites-with-NLP-application
A tool for extracting keywords and finding relevant words with cosine similarity, the TextRank algorithm, and the KeyBERT transformer model.
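The README does not show how cosine similarity is applied, so the following is only a minimal sketch of the measure itself, using the standard library and plain bag-of-words counts. The helper names `bag_of_words` and `cosine_similarity` are hypothetical and not part of this project; the project may well compare embedding vectors instead.

```python
import math
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Turn a text into a sparse vector of lowercase word counts."""
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine of the angle between two sparse count vectors (0.0 if either is empty)."""
    # Only words present in both vectors contribute to the dot product.
    dot = sum(a[word] * b[word] for word in a.keys() & b.keys())
    norm_a = math.sqrt(sum(count * count for count in a.values()))
    norm_b = math.sqrt(sum(count * count for count in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


if __name__ == "__main__":
    page = bag_of_words("keyword extraction for websites")
    query = bag_of_words("keyword extraction")
    print(cosine_similarity(page, query))
```

A score near 1 means the two texts use almost the same words in the same proportions, and a score of 0 means they share no words.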
Keyword Extractor: a smart, automatic, fast, and lightweight keyword extractor built with deep learning in Python
This project extracts semantically similar keywords from websites and their metadata. It can be used in SEO, marketing, and a few other related applications. It takes the URL or the HTML content of a web page, together with a list of sample data that we want to scrape from that page. This data can be text, a URL, or any HTML tag value from that page.
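As a rough illustration of the first step described above (turning the HTML content of a page into text and metadata), here is a minimal sketch that uses only the standard library. The class `PageTextParser` and the function `extract_page_text` are hypothetical names chosen for this example; they are not this project's API, and the project itself may rely on a different parser.

```python
from html.parser import HTMLParser


class PageTextParser(HTMLParser):
    """Collects visible text and <meta name=... content=...> pairs from HTML."""

    SKIPPED_TAGS = {"script", "style"}  # their contents are not visible text

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.text_parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "meta" and attrs.get("name") and attrs.get("content"):
            self.meta[attrs["name"].lower()] = attrs["content"]
        elif tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside <script> or <style>.
        if self._skip_depth == 0 and data.strip():
            self.text_parts.append(data.strip())


def extract_page_text(html: str):
    """Return (visible_text, meta_dict) for an HTML string."""
    parser = PageTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.text_parts), parser.meta


if __name__ == "__main__":
    sample_html = (
        '<html><head><meta name="keywords" content="seo, keyword extraction">'
        "<style>body { color: red; }</style></head>"
        "<body><h1>Keyword extraction</h1><p>Finds important words.</p></body></html>"
    )
    text, meta = extract_page_text(sample_html)
    print(text)
    print(meta)
```

To start from a URL rather than an HTML string, the page could first be downloaded, for example with `urllib.request.urlopen`, and the decoded response passed to `extract_page_text`.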
Installation
It is compatible with Python 3.
- Install the latest version from the Git repository using pip:
$ pip install git+https://github.com/elvinaqa/Keyword-Extractor-for-Websites.git
Also install the dependencies listed in requirements.txt:
$ pip install -r requirements.txt
How to use
How does it look?
Results
Example code from the project
# Requires gensim < 4.0: the gensim.summarization module was removed in gensim 4.0.
from gensim.summarization import keywords
from gensim.summarization.keywords import get_graph
import networkx as nx
import matplotlib.pyplot as plt

# Alternative entry point that prints the extracted keywords instead of drawing the graph:
#
# if __name__ == "__main__":
#     text = "Keywords extraction is a subtask of the Information Extraction field which is responsible for extracting keywords from a given text or from a collection of texts to help us summarize the content. This is useful in the context of the huge amount of information we deal with every day. We need to index this information, to organise it and retrieve it later. Keywords extraction becomes more and more important these days and keywords extraction algorithms are researched and improved continuously."
#
#     print(keywords(text).split('\n'))


def displayGraph(textGraph):
    """Draw the word graph built by gensim using networkx and matplotlib."""
    graph = nx.Graph()
    for edge in textGraph.edges():
        graph.add_node(edge[0])
        graph.add_node(edge[1])
        # Copy each edge, together with its weight, into the networkx graph.
        graph.add_weighted_edges_from([(edge[0], edge[1], textGraph.edge_weight(edge))])
    pos = nx.spring_layout(graph)
    plt.figure()
    nx.draw(graph, pos, edge_color='black', width=1, linewidths=1,
            node_size=500, node_color='seagreen', alpha=0.9,
            labels={node: node for node in graph.nodes()})
    plt.axis('off')
    plt.show()


if __name__ == "__main__":
    text = "Keywords extraction is a subtask of the Information Extraction field which is responsible for extracting keywords from a given text or from a collection of texts to help us summarize the content. This is useful in the context of the huge amount of information we deal with every day. We need to index this information, to organise it and retrieve it later. Keywords extraction becomes more and more important these days and keywords extraction algorithms are researched and improved continuously."
    displayGraph(get_graph(text))
The output is a summarized version of the text, made up of the most important words selected from it:
[
]
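For readers who want to see the idea behind TextRank without installing gensim, here is a dependency-free sketch. It is an illustration under simplifying assumptions (a tiny hand-picked stopword list, a fixed co-occurrence window, a fixed number of iterations), not gensim's implementation and not this project's code; the function name `textrank_keywords` and the `STOPWORDS` set are hypothetical.

```python
import re
from collections import defaultdict

# Deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "and", "for", "from", "with", "this", "that", "which",
             "are", "was", "has", "have", "more", "these", "every"}


def textrank_keywords(text, top_n=5, window=2, damping=0.85, iterations=50):
    """Rank words by a PageRank-style score on a word co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if len(w) > 2 and w not in STOPWORDS]

    # Undirected graph: two words are linked if they occur within `window` positions.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Each word repeatedly receives an equal share of every neighbor's score.
    scores = {word: 1.0 for word in neighbors}
    for _ in range(iterations):
        scores = {
            word: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[word]
            )
            for word in neighbors
        }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    sample = ("Keywords extraction is a subtask of the Information Extraction "
              "field which is responsible for extracting keywords from a given text.")
    print(textrank_keywords(sample))
```

Words that co-occur with many other well-connected words end up with the highest scores, which is why frequently linked terms tend to surface as keywords.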
Issues
Feel free to open an issue if you have any problem using the module.
Support the project
Happy Coding ♥️