A tool designed to quickly parse html tags and elements.
Project description
htmltagparse
A tool designed to quickly parse HTML tags and elements.
Prerequisites
-
Pip packages:
- beautifulsoup4==4.*
- html5lib==1.*
- requests==2.*
-
Optional packages:
- timeoutcall==1.*
Usage
Reading Page Titles
Firstly, if you would like to view page info alone, you could use a few functions for this:
import htmltagparse
title = htmltagparse.titleFromUri("https://github.com/")
print(title) # ouput: GitHub: Let’s build from here · GitHub
metadata = htmltagparse.metadataFromUri("https://github.com/") # meta tags from github
Building Pages
Building Pages via URI
from htmltagparse import build
brave = build.fromUri("https://search.brave.com/")
print(brave.response) #output: (200, 'OK')
print(brave.tags) #list of tags found on the specified page
print(brave.elapsed) #the time taken to create the html page class
print(brave.title) #title of the html page
This is not limited to these values alone; there are more values associated with an html page.
Building Pages via HTML
from htmltagparse import HtmlPage
from requests import get
htmlContent = get("https://duckduckgo.com/").text
ddg = HtmlPage(htmlContent)
print(list(ddg.sources)) #output: ['script']
Searching A Page
With this package, you have the ability to search the html page you have created directly through a function:
from htmltagparse import build
import re
videoId = ""
page = build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
try:
#NOTE: the regex function already has re's MULTILINE and DOTALL flags in use
#get a list of tags to the youtube video via this regex pattern
videoTags = page.regex(r"\"keywords\":(?P<tags>\[.*?),\"channelId\":").group("tags")
#converting from string to array
videoTags = re.findall(r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", videoTags)
except:
videoTags = "no tags found"
print(videoTags)
Another way you could get tags from a Youtube video is with the find
function, example:
import htmltagparse
videoId = "" #video id here
yt = htmltagparse.build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
elTagOpening = yt.find("meta", attrs={"name": "keywords"})[0]
videoKeywords = htmltagparse.getElementAttributeValue(elTagOpening, "content").split(", ")
print(videoKeywords) # tags of the youtube video
Developers
Building to Wheel File
- cd into root directory of this repository
- run
python3 -m build
[!NOTE] Errors building this package may be due to this packages requirements, if this occurs, use
python3 -m build -n
instead.
Contributions
Must not include:
- Major changes
- Breaking code
- Changes to version number
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Hashes for htmltagparse-3.0-py3-none-any.whl
Algorithm | Hash digest | |
---|---|---|
SHA256 | dd8fed787ee5dbba0efa78d0b10aa99e185b8fbbb46e3235e678b74038928f39 |
|
MD5 | 424878dd9ea3e45ea0e78532bba50640 |
|
BLAKE2b-256 | 4df2c704e7b4781df6812abdfc94215638acc4212a444a2f1772f8af97b38c3c |