
htmltagparse

A tool designed to quickly parse HTML tags and elements.

Prerequisites

  • Pip packages:
    • timeoutcall==1.*
    • beautifulsoup4==4.*
    • html5lib==1.*

Usage

Reading Page Titles

If you only need a page's title, use the titleFromUri function:

from htmltagparse import titleFromUri

websiteTitle = titleFromUri("https://github.com/")
print(websiteTitle)  # example output (the live title may differ): GitHub: Let’s build from here · GitHub
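The package's own implementation isn't shown here, but the idea behind reading a title can be sketched with the standard library alone. The helper below, title_from_html, is hypothetical (it is not part of htmltagparse) and uses html.parser.HTMLParser instead of the beautifulsoup4/html5lib stack the package depends on. It takes HTML you have already downloaded and returns the whitespace-collapsed text of the first <title> element, or None if there is none.

```python
from __future__ import annotations

from html.parser import HTMLParser


class _TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (illustrative only)."""

    def __init__(self) -> None:
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts: list[str] = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> counts; later ones are ignored.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self) -> str | None:
        # Collapse runs of whitespace, the way browsers display titles.
        return " ".join("".join(self._parts).split()) or None


def title_from_html(html: str) -> str | None:
    """Hypothetical helper: return the page title, or None if there is none."""
    parser = _TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title
```

Working from an HTML string keeps the sketch independent of any network library; fetching the page is a separate step.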

Building Pages

Building Pages via URI

from htmltagparse import build

brave = build.fromUri("https://search.brave.com/", timeout=20)
print(brave.tags)  # list of tags found on the page
print(brave.searchTag("footer"))  # list of the inner HTML of each <footer> tag
print(brave.searchTag("footer", htmlFormat=False)[0])  # text of the first <footer> tag, e.g.: © Brave Software Brave Search API Summarizer Helpful answers Report a security issue
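As a rough, standard-library-only sketch of the kind of information tags and searchTag(..., htmlFormat=False) report, the hypothetical TagCollector below records every start tag in document order and the whitespace-normalized text of each element with a given name. This is an assumption about the behaviour, not the package's code, and it returns plain text only (no inner HTML). Nested elements with the same name are treated as part of the outermost one.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Hypothetical: records start tags and the text of each <target> element."""

    def __init__(self, target: str) -> None:
        super().__init__()
        self.target = target
        self.tags: list[str] = []    # every start tag, in document order
        self.texts: list[str] = []   # text content of each target element
        self._depth = 0              # > 0 while inside a target element
        self._current: list[str] = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)
        if tag == self.target:
            if self._depth == 0:
                self._current = []   # start collecting a new element
            self._depth += 1

    def handle_endtag(self, tag):
        if tag == self.target and self._depth > 0:
            self._depth -= 1
            if self._depth == 0:
                # Join the collected fragments and normalize whitespace.
                self.texts.append(" ".join("".join(self._current).split()))

    def handle_data(self, data):
        if self._depth > 0:
            self._current.append(data)


def search_tag_text(html: str, name: str) -> tuple[list[str], list[str]]:
    """Return (all start tags, text of every <name> element)."""
    collector = TagCollector(name)
    collector.feed(html)
    collector.close()
    return collector.tags, collector.texts
```

For example, search_tag_text(html, "footer")[1][0] would give the text of the first footer, similar in spirit to the last line of the example above.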

Building Pages via HTML

from htmltagparse import NewPage
from requests import get  # requests is not a dependency of htmltagparse; install it separately

htmlContent = get("https://duckduckgo.com/").text
ddg = NewPage(htmlContent)
print(list(ddg.sources))  # output: script
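requests is not listed among the prerequisites. If you would rather not install it, the HTML for NewPage could also be fetched with urllib from the standard library. fetch_html is a hypothetical helper name, and the browser-like User-Agent header is an assumption (some sites reject urllib's default one).

```python
from urllib.request import Request, urlopen


def fetch_html(url: str, timeout: float = 20) -> str:
    """Hypothetical helper: download a page and return its HTML as text."""
    # Assumption: a browser-like User-Agent avoids some sites' bot filtering.
    request = Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urlopen(request, timeout=timeout) as response:
        # Use the charset the server declares, falling back to UTF-8.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")
```

The returned string can be passed to NewPage in the same way as htmlContent, e.g. NewPage(fetch_html("https://duckduckgo.com/")).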

Searching A Page

You can also search the raw HTML of a built page with a regular expression, using the page's regex method:

from htmltagparse import build
import re

videoId = ""  # set this to the ID of the video you want to inspect
page = build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
try:
    # NOTE: regex() already applies re's MULTILINE and DOTALL flags
    # get the video's list of tags via this regex pattern
    videoTags = page.regex(r"\"keywords\":(?P<tags>\[.*?),\"channelId\":").group("tags")
    # convert the matched string into a list of tags
    videoTags = re.findall(r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", videoTags)
except AttributeError:  # assumes regex() returns None when nothing matches
    videoTags = "no tags found"

print(videoTags)
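The example notes that regex() applies re.MULTILINE and re.DOTALL. Assuming it otherwise behaves like re.search (returning None when nothing matches), the same lookup can be sketched with re alone. Because the captured keywords value is a JSON array, json.loads is used here in place of the second regex. regex_search and video_keywords are hypothetical names, not part of htmltagparse.

```python
from __future__ import annotations

import json
import re

# Assumption: page.regex() behaves like re.search with these two flags.
FLAGS = re.MULTILINE | re.DOTALL


def regex_search(html: str, pattern: str):
    """Hypothetical stand-in for page.regex(): first match, or None."""
    return re.search(pattern, html, FLAGS)


def video_keywords(html: str) -> list[str]:
    """Extract the "keywords" JSON array from a page's inline data."""
    match = regex_search(html, r'"keywords":(?P<tags>\[.*?\]),"channelId":')
    if match is None:
        return []  # no keywords block on this page
    # The captured text is a JSON array, so decode it rather than re-parse it.
    return json.loads(match.group("tags"))
```

A benefit of json.loads is that it decodes JSON escape sequences such as \u0026 and \", which the regex-only approach leaves as raw text.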

Developers

Building To Wheel File

  • cd into the root directory of this repository
  • install the build tool if it is missing: pip install build
  • run python3 -m build

[!NOTE] Build errors may be caused by external package requirements. If this occurs, use python3 -m build -n instead; -n (--no-isolation) builds in the current environment rather than in an isolated one.

Contributions

Contributions must not include:

  • Major changes
  • Breaking code
  • Changes to version number

Download files

Source distribution: htmltagparse-1.0.tar.gz (5.2 kB)

Built distribution: htmltagparse-1.0-py3-none-any.whl (5.9 kB, Python 3)