htmltagparse
A tool designed to quickly parse html tags and elements.
Prerequisites
- Pip packages:
- timeoutcall==1.*
- beautifulsoup4==4.*
- html5lib==1.*
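The pins above use prefix matching, so `==1.*` accepts any release in the 1.x series. As a rough sketch, the check below reports which prerequisites are missing or on a different major version. The names `REQUIRED` and `check_prerequisites` are made up for this example and are not part of htmltagparse.

```python
from importlib.metadata import PackageNotFoundError, version

# Major versions taken from the prerequisite list above.
REQUIRED = {"timeoutcall": 1, "beautifulsoup4": 4, "html5lib": 1}


def check_prerequisites(required=REQUIRED, get_version=version):
    """Return a dict of package name -> problem description (empty if all is well)."""
    problems = {}
    for name, major in required.items():
        try:
            installed = get_version(name)
        except PackageNotFoundError:
            problems[name] = "not installed"
            continue
        # "==N.*" matches any version whose first component is N.
        if installed.split(".")[0] != str(major):
            problems[name] = "found %s, expected %d.*" % (installed, major)
    return problems


if __name__ == "__main__":
    print(check_prerequisites() or "all prerequisites satisfied")
```

The `get_version` parameter exists only so the check can be exercised without installing anything; in normal use the default is enough.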
Usage
Reading Page Titles
To read only a page's title, use the titleFromUri function:
from htmltagparse import titleFromUri
websiteTitle = titleFromUri("https://github.com/")
print(websiteTitle) # output: GitHub: Let’s build from here · GitHub
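This README does not say what titleFromUri does when a page is unreachable or has no title. The sketch below assumes it may either raise an exception or return an empty value, and wraps the call so the caller always gets a string back. `safe_title` and its `default` parameter are hypothetical helpers, not part of htmltagparse.

```python
def safe_title(uri, fetch_title, default="(no title)"):
    """Return the page title for `uri`, or `default` if it cannot be read."""
    try:
        title = fetch_title(uri)
    except Exception:
        # Assumption: network errors, timeouts, and parse failures surface as exceptions.
        return default
    # Assumption: a missing title may come back as None or an empty string.
    if not isinstance(title, str) or not title.strip():
        return default
    return title.strip()


# Usage with htmltagparse (requires network access):
#   from htmltagparse import titleFromUri
#   print(safe_title("https://github.com/", titleFromUri))
```

Passing the fetch function in as an argument keeps the helper independent of the network, which makes it easy to try out with a stand-in function first.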
Building Pages
Building Pages via URI
from htmltagparse import build
brave = build.fromUri("https://search.brave.com/", timeout=20)
print(brave.tags)  # list of tags found on the specified page
print(brave.searchTag("footer"))  # list of the inner HTML content of the footer tags
print(brave.searchTag("footer", htmlFormat=False)[0])  # output: © Brave Software Brave Search API Summarizer Helpful answers Report a security issue
Building Pages via HTML
from htmltagparse import NewPage
from requests import get
htmlContent = get("https://duckduckgo.com/").text
ddg = NewPage(htmlContent)
print(list(ddg.sources)) #output: script
Searching A Page
Once a page has been built, you can search it directly with its regex function:
from htmltagparse import build
import re
videoId = ""  # set this to the ID of the video to inspect
page = build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
try:
    # NOTE: the regex function already has re's MULTILINE and DOTALL flags in use
    # get the list of tags for the YouTube video via this regex pattern
    videoTags = page.regex(r"\"keywords\":(?P<tags>\[.*?),\"channelId\":").group("tags")
    # convert the captured string into a list
    videoTags = re.findall(r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", videoTags)
except Exception:
    # no match (or an unreadable page) ends up here
    videoTags = "no tags found"
print(videoTags)
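The two-step extraction above can be tried offline with the standard re module. The sketch below runs the same patterns against a made-up fragment (`SAMPLE` is invented for illustration and is not real YouTube output) and applies the MULTILINE and DOTALL flags that the note says the regex function uses. It also assumes the captured text is a valid JSON array, in which case `json.loads` is a sturdier way to turn it into a list; the original findall pattern is kept as a fallback.

```python
import json
import re

# Hypothetical page fragment, invented for this example only.
SAMPLE = '{"title":"Demo","keywords":["python","html parsing","tutorial"],"channelId":"UC123"}'

# Same first pattern as above, with the flags the README says are applied.
KEYWORDS = re.compile(
    r"\"keywords\":(?P<tags>\[.*?),\"channelId\":",
    re.MULTILINE | re.DOTALL,
)


def extract_tags(text):
    """Return the keyword list found in `text`, or an empty list if none is found."""
    match = KEYWORDS.search(text)
    if match is None:
        return []
    raw = match.group("tags")
    try:
        # Assumption: the captured text is a JSON array of strings.
        return json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the quote-splitting pattern from the example above.
        return re.findall(r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", raw)


print(extract_tags(SAMPLE))
print(extract_tags("<html></html>"))
```

Returning an empty list when nothing matches keeps the result type consistent, so callers can loop over it without checking for a placeholder string first.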
Developers
Building To Wheel File
- cd into the root directory of this repository
- run
python3 -m build
Note: build errors may be caused by external package requirements. If this occurs, use
python3 -m build -n
instead.
Contributions
Must not include:
- Major changes
- Breaking code
- Changes to version number