htmltagparse
A tool designed to quickly parse HTML tags and elements.
Prerequisites
- Pip packages:
  - beautifulsoup4==4.*
  - html5lib==1.*
  - requests==2.*
- Optional packages:
  - timeoutcall==1.*
Usage
Reading Page Titles
To view page information alone, without building a full page object, use the following functions:
import htmltagparse
title = htmltagparse.titleFromUri("https://github.com/")
print(title) # output: GitHub: Let’s build from here · GitHub
metadata = htmltagparse.metadataFromUri("https://github.com/") # meta tags from github
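To make the kind of data involved concrete, here is a minimal sketch that pulls a title and meta tags out of an HTML string using only the standard library. It is not how htmltagparse is implemented; the class name `TitleAndMetaParser` and the sample document are hypothetical and exist only for illustration.

```python
from html.parser import HTMLParser


class TitleAndMetaParser(HTMLParser):
    """Collects the <title> text and the attributes of every <meta> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.metadata = []  # one dict of attributes per <meta> tag
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            self.metadata.append(dict(attrs))

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Only text that appears between <title> and </title> is kept.
        if self._in_title:
            self.title += data


# Hypothetical sample document, so no network access is needed.
sample_html = """<html><head>
<title>Example page</title>
<meta name="description" content="A sample page">
<meta charset="utf-8">
</head><body><p>Hello</p></body></html>"""

parser = TitleAndMetaParser()
parser.feed(sample_html)
print(parser.title)
print(parser.metadata)
```

The format of the value returned by `metadataFromUri` may differ from the list of dictionaries used in this sketch.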
Building Pages
Building Pages via URI
from htmltagparse import build
brave = build.fromUri("https://search.brave.com/")
print(brave.response) #output: (200, 'OK')
print(brave.tags) #list of tags found on the specified page
print(brave.elapsed) #the time taken to create the html page class
print(brave.title) #title of the html page
These are not the only attributes available; the page object exposes more values associated with the HTML page.
Building Pages via HTML
from htmltagparse import HtmlPage
from requests import get
htmlContent = get("https://duckduckgo.com/").text
ddg = HtmlPage(htmlContent)
print(list(ddg.sources)) #output: ['script']
Searching A Page
You can search an HTML page you have built directly through its regex function:
from htmltagparse import build
import re
videoId = "" #video id here
page = build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
try:
    #NOTE: the regex function already has re's MULTILINE and DOTALL flags in use
    #get a list of tags to the youtube video via this regex pattern
    videoTags = page.regex(r"\"keywords\":(?P<tags>\[.*?),\"channelId\":").group("tags")
    #converting from string to array
    videoTags = re.findall(r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", videoTags)
except Exception:
    #the pattern did not match or the page could not be searched
    videoTags = "no tags found"
print(videoTags)
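The two regex steps above can be tried offline on a small string. The sketch below uses only the standard `re` and `json` modules; the `sample` text is a hypothetical fragment, not real page content, and `re.search` with explicit flags stands in for the page's regex function.

```python
import json
import re

# Hypothetical fragment of the JSON embedded in a video page.
sample = '{"title":"Demo","keywords":["music","live concert","2024"],"channelId":"UC123"}'

# Step 1: capture the raw array text between "keywords": and ,"channelId":
match = re.search(
    r"\"keywords\":(?P<tags>\[.*?),\"channelId\":",
    sample,
    re.MULTILINE | re.DOTALL,
)
raw_tags = match.group("tags") if match else "[]"

# Step 2: split the array text into individual strings.
video_tags = re.findall(r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", raw_tags)
print(video_tags)

# Alternative for step 2, assuming the captured text is valid JSON.
video_tags_from_json = json.loads(raw_tags)
print(video_tags_from_json)
```

Checking `match` before calling `group` avoids relying on an exception when the pattern is absent. Parsing with `json.loads` also handles escaped quotes inside a tag, which the second regex does not.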
Another way to get the tags of a YouTube video is with the find function, for example:
import htmltagparse
videoId = "" #video id here
yt = htmltagparse.build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
elTagOpening = yt.find("meta", attrs={"name": "keywords"})[0]
videoKeywords = htmltagparse.getElementAttributeValue(elTagOpening, "content").split(", ")
print(videoKeywords) # tags of the youtube video
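For readers who want to see the idea behind an attribute-filtered search, here is a rough standard-library sketch. The names `OpeningTagFinder` and `find_opening_tags` are hypothetical and are not part of htmltagparse, and the sample HTML is made up for illustration.

```python
from html.parser import HTMLParser


class OpeningTagFinder(HTMLParser):
    """Records the attributes of opening tags that match a name and attribute filter."""

    def __init__(self, tag, attrs):
        super().__init__()
        self.wanted_tag = tag
        self.wanted_attrs = attrs
        self.matches = []

    def handle_starttag(self, tag, attrs):
        found = dict(attrs)
        same_tag = tag == self.wanted_tag
        same_attrs = all(found.get(k) == v for k, v in self.wanted_attrs.items())
        if same_tag and same_attrs:
            self.matches.append(found)


def find_opening_tags(html, tag, attrs):
    """Return a list of attribute dicts, one per matching opening tag."""
    finder = OpeningTagFinder(tag, attrs)
    finder.feed(html)
    return finder.matches


# Hypothetical sample markup.
sample_html = (
    '<head><meta name="keywords" content="python, html, parsing">'
    '<meta name="author" content="someone"></head>'
)

first = find_opening_tags(sample_html, "meta", {"name": "keywords"})[0]
keywords = first["content"].split(", ")
print(keywords)
```

As in the example above, the keywords are split on a comma followed by a space, so a page that separates keywords with a bare comma would need a different split.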
Developers
Building to Wheel File
- cd into the root directory of this repository
- run
python3 -m build
[!NOTE] Errors when building this package may be caused by the package's requirements. If this occurs, use
python3 -m build -n
instead.
Contributions
Must not include:
- Major changes
- Breaking code
- Changes to version number