
selenium-seo extends Selenium by providing a keyword analyzer for a given webpage

Project description

SeleniumSEO

SeleniumSEO is a Python package designed to help with SEO keyword extraction from web pages using Selenium and BeautifulSoup. It allows you to extract and analyze content from specified HTML tags (like paragraphs, headings, and lists), clean the data by removing stop words, and generate a keyword frequency report. This is particularly useful for SEO professionals and web scraping enthusiasts looking to analyze web pages for SEO optimization.

Features

  • Extracts text from common HTML tags such as <p>, <h1>, <h2>, <h3>, and <li>.
  • Cleans the extracted text by removing common stop words.
  • Removes non-alphabetical characters to focus on meaningful keywords.
  • Provides a frequency count of keywords from a webpage.
  • Easily configurable to specify custom tags and stop words.
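The clean-and-count step described above can be sketched in plain Python using only the standard library. This is a hypothetical illustration of the idea; the function name and stop-word list are my own, not the package's internals:

```python
import re
from collections import Counter

def keyword_frequencies(text, ignore_words):
    # Lowercase and keep only runs of alphabetical characters
    words = re.findall(r"[a-z]+", text.lower())
    # Drop stop words, then count what remains
    counts = Counter(w for w in words if w not in ignore_words)
    # Most frequent keywords first
    return counts.most_common()

stop_words = {"the", "and", "a", "is", "of", "for"}
text = "SEO is the practice of optimizing a page, and good SEO matters."
print(keyword_frequencies(text, stop_words))
# → [('seo', 2), ('practice', 1), ('optimizing', 1), ('page', 1), ('good', 1), ('matters', 1)]
```

The same three stages (strip non-alphabetical characters, filter stop words, count) are what the package applies to text pulled out of the configured tags.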

Installation

You can install selenium-seo via pip. First, make sure you have Selenium and BeautifulSoup installed, as they are required dependencies:

pip install selenium beautifulsoup4

Then, install selenium-seo:

pip install selenium-seo

Usage

Here's a basic example of how to use the SeleniumSEO class to extract and analyze keywords from a webpage:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium_seo import SeleniumSEO

# Initialize WebDriver (Chrome in this case)
driver = webdriver.Chrome()

# Specify the URL to scrape
url = "https://www.example.com"
driver.get(url)

# Wait for the page to fully load
WebDriverWait(driver, 10).until(lambda driver: driver.execute_script("return document.readyState") == "complete")

# Initialize the SeleniumSEO class with the driver
seo = SeleniumSEO(driver)

# Get the cleaned keyword frequencies
keywords = seo.process_keywords()

# Print the keywords and their counts
for word, count in keywords:
    print(f"{word}: {count}")

# Close the driver after scraping
driver.quit()

Available Methods

process_keywords() Extracts keywords from the page and returns a sorted list of keywords with their frequency count.

set_tags(tags) Sets the tags to be processed (default: ['p', 'h1', 'h2', 'h3', 'li']).

get_tags() Returns the current tags being processed.

set_ignore_words(words) Sets the list of stop words to ignore during keyword extraction.

get_ignore_words() Returns the current list of stop words being ignored.
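The tag selection that set_tags controls can be illustrated with a standalone standard-library sketch. This is my own illustration using html.parser, not the package's implementation (which relies on BeautifulSoup); only the default tag list ['p', 'h1', 'h2', 'h3', 'li'] comes from the documentation above:

```python
from html.parser import HTMLParser

class TagTextExtractor(HTMLParser):
    """Collects text that appears inside a chosen set of tags."""
    def __init__(self, tags):
        super().__init__()
        self.tags = set(tags)
        self.depth = 0       # how many watched tags we are currently inside
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.tags:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag in self.tags and self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth and data.strip():
            self.chunks.append(data.strip())

html = "<h1>Title</h1><p>Body text.</p><div>Ignored.</div><li>Item</li>"

# Mirroring the package's default tag set
extractor = TagTextExtractor(["p", "h1", "h2", "h3", "li"])
extractor.feed(html)
print(extractor.chunks)
# → ['Title', 'Body text.', 'Item']
```

Narrowing the tag list (the equivalent of calling set_tags(['h1', 'h2'])) simply shrinks the text that feeds the keyword counter, which is useful when headings carry most of a page's SEO weight.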

License

This project is licensed under the terms of the Apache License 2.0.

Acknowledgments

  • Selenium for browser automation.
  • BeautifulSoup for HTML parsing.



Download files

Download the file for your platform.

Source Distribution

selenium_seo-1.0.1.tar.gz (8.1 kB)

Uploaded Source

Built Distribution


selenium_seo-1.0.1-py3-none-any.whl (8.0 kB)

Uploaded Python 3

File details

Details for the file selenium_seo-1.0.1.tar.gz.

File metadata

  • Download URL: selenium_seo-1.0.1.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.0

File hashes

Hashes for selenium_seo-1.0.1.tar.gz

  • SHA256: d2b2e5188bb79478d75a3e71027a30ab7a8468e0299d3fcdfd7f31a71f9a8592
  • MD5: 882320429c513a0068e3391c8ddc48c5
  • BLAKE2b-256: 3a61c40be9142ee84b4e337535f123c4d79ea259208bf1c93514a2fdf034b5d9


File details

Details for the file selenium_seo-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: selenium_seo-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 8.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.13.0

File hashes

Hashes for selenium_seo-1.0.1-py3-none-any.whl

  • SHA256: 34badd953cd698f9c10fad9a412cd04fc19e09e2a3eeca1df7ff521210fddd20
  • MD5: fc44cbe8a0a972cc1abe3e6abe1ae9be
  • BLAKE2b-256: 07776c1655f58c2ffde396894032a8f8324c9e3564f8f61d35c75e02649cd092

