
Extract the most important keywords from any website

Project description

Keyword-Extractor-for-Websites-with-NLP-application

A tool for extracting keywords and finding relevant words using cosine similarity, the TextRank algorithm, and the KeyBERT transformer model.
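As a rough illustration of the cosine similarity idea mentioned above, the sketch below scores candidate phrases against a document using plain word counts. The function names (`bag_of_words`, `cosine_similarity`, `rank_candidates`) are hypothetical and are not part of this project's API; the project itself may compute similarity on embeddings rather than on raw counts.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lowercase the text and count its alphabetic tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors."""
    # Only tokens present in both vectors contribute to the dot product.
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty vector is treated as similar to nothing
    return dot / (norm_a * norm_b)


def rank_candidates(document, candidates):
    """Sort candidate phrases by their similarity to the document."""
    doc_vec = bag_of_words(document)
    scored = [(c, cosine_similarity(doc_vec, bag_of_words(c))) for c in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)
```

Calling `rank_candidates(page_text, ["seo marketing", "football"])` returns the candidates ordered from most to least similar, each paired with its score.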

Keyword Extractor: A Smart, Automatic, Fast and Lightweight Keyword Extractor in Python with Deep Learning


This project extracts semantically similar keywords from websites and their metadata. It can be used in SEO, marketing, and a few related applications. It takes the URL or the HTML content of a web page, together with a list of sample data that we want to scrape from that page. This data can be text, a URL, or any HTML tag value on that page.
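To make the "HTML content plus metadata" input more concrete, here is a minimal sketch that pulls the visible text and the description/keywords meta tags out of an HTML string using only the standard library. `PageTextParser` and `extract_page_text` are hypothetical names chosen for illustration; the project's own scraping code may work differently.

```python
from html.parser import HTMLParser


class PageTextParser(HTMLParser):
    """Collect visible text and selected <meta> values from an HTML page."""

    SKIPPED = {"script", "style", "noscript"}  # tags whose content is not visible text

    def __init__(self):
        super().__init__()
        self.meta = {}
        self.chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1
        elif tag == "meta":
            attrs = dict(attrs)
            name = (attrs.get("name") or "").lower()
            if name in ("description", "keywords") and attrs.get("content"):
                self.meta[name] = attrs["content"]

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def extract_page_text(html):
    """Return (visible_text, meta_dict) for an HTML string."""
    parser = PageTextParser()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks), parser.meta
```

To start from a URL instead of an HTML string, the page could first be downloaded with `urllib.request.urlopen(url).read().decode("utf-8", errors="replace")` and the result passed to `extract_page_text`.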

Installation

It is compatible with Python 3.

  • Install the latest version from the git repository using pip:
$ pip install git+https://github.com/elvinaqa/Keyword-Extractor-for-Websites.git

Also install the dependencies listed in the repository's requirements.txt:

$ pip install -r requirements.txt

How to use

How it looks


Results


Example code from the project

# Note: gensim.summarization was removed in gensim 4.0,
# so this example needs gensim < 4.0.
from gensim.summarization import keywords
from gensim.summarization.keywords import get_graph
import networkx as nx
import matplotlib.pyplot as plt


def displayGraph(textGraph):
    """Draw the word graph that gensim builds for a text."""
    graph = nx.Graph()
    for edge in textGraph.edges():
        # add_edge also creates both endpoint nodes if they are missing
        graph.add_edge(edge[0], edge[1], weight=textGraph.edge_weight(edge))

    pos = nx.spring_layout(graph)
    plt.figure()
    nx.draw(graph, pos, edge_color='black', width=1, linewidths=1,
            node_size=500, node_color='seagreen', alpha=0.9,
            labels={node: node for node in graph.nodes()})
    plt.axis('off')
    plt.show()


if __name__ == "__main__":
    text = "Keywords extraction is a subtask of the Information Extraction field which is responsible for extracting keywords from a given text or from a collection of texts to help us summarize the content. This is useful in the context of the huge amount of information we deal with every day. We need to index this information, to organise it and retrieve it later. Keywords extraction becomes more and more important these days and keywords extraction algorithms are researched and improved continuously."

    # Print the extracted keywords as a list, one entry per keyword
    print(keywords(text).split('\n'))

    # Show the word graph the keywords were selected from
    displayGraph(get_graph(text))

The output is a summary of the text: a list of its most important keywords.

[

]
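Because `gensim.summarization` is no longer available in gensim 4.0 and later, the TextRank idea can also be sketched without it. The block below is a simplified, unweighted variant written for illustration only: it links words that appear close together and ranks them with a PageRank-style iteration. The function name `textrank_keywords`, the tiny stop-word list, and all parameter defaults are assumptions, not this project's or gensim's implementation.

```python
import re
from collections import defaultdict

# A deliberately tiny stop-word list; a real one would be much longer.
STOPWORDS = {"the", "and", "for", "with", "from", "that", "this", "which",
             "are", "was", "have", "has", "its", "more", "these", "into"}


def textrank_keywords(text, window=2, damping=0.85, iterations=50, top_n=5):
    """Rank words by a PageRank-style score on a co-occurrence graph."""
    words = [w for w in re.findall(r"[a-z]+", text.lower())
             if w not in STOPWORDS and len(w) > 2]

    # Connect each word to the words that follow it within the window.
    neighbors = defaultdict(set)
    for i, word in enumerate(words):
        for other in words[i + 1 : i + 1 + window]:
            if other != word:
                neighbors[word].add(other)
                neighbors[other].add(word)

    # Each word repeatedly receives an equal share of its neighbors' scores.
    scores = {w: 1.0 for w in neighbors}
    for _ in range(iterations):
        scores = {
            w: (1 - damping) + damping * sum(
                scores[n] / len(neighbors[n]) for n in neighbors[w])
            for w in neighbors
        }

    # Highest score first; ties are broken alphabetically for stable output.
    ranked = sorted(scores, key=lambda w: (-scores[w], w))
    return ranked[:top_n]
```

Calling `textrank_keywords(text, top_n=10)` on the sample text from the example above returns up to ten single-word keywords; multi-word phrases and edge weights are left out to keep the sketch short.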

Issues

Feel free to open an issue if you have any problem using the module.

Support the project

Buy Me A Coffee

Happy Coding ♥️
