A tool designed to quickly parse html tags and elements.

Intended Audience
- Developers
License
- OSI Approved :: MIT License
Operating System
- POSIX :: Linux
Programming Language
Topic
- Text Processing :: Markup :: HTML
- Utilities

Project description

htmltagparse

Prerequisites

Pip packages:
- timeoutcall==1.*
- beautifulsoup4==4.*
- html5lib==1.*

Usage

Reading Page Titles

To read only a page's title, use the titleFromUri function:

from htmltagparse import titleFromUri

websiteTitle = titleFromUri("https://github.com/")
print(websiteTitle) # output: GitHub: Let’s build from here · GitHub
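
To try title extraction without a network connection, the following is a minimal sketch that pulls the first title out of an HTML string. It is not the package's implementation: it uses only Python's standard library html.parser rather than the beautifulsoup4 and html5lib prerequisites, and the names TitleParser and title_from_html are hypothetical.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Only the first <title> is of interest.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        return "".join(self._parts).strip()


def title_from_html(html):
    """Return the page title, or an empty string if there is none."""
    parser = TitleParser()
    parser.feed(html)
    parser.close()
    return parser.title


# Made-up sample markup, used only for illustration.
sample = "<html><head><title> Example Page </title></head><body><p>Hi</p></body></html>"
print(title_from_html(sample))
```

Returning an empty string for a page without a title is a choice made for this sketch; how titleFromUri itself behaves in that case is not documented here.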

Building Pages

Building Pages via URI

from htmltagparse import build

brave = build.fromUri("https://search.brave.com/", timeout=20)
print(brave.tags) # list of tags found on the specified page
print(brave.searchTag("footer")) # list of the inner HTML of each footer tag
print(brave.searchTag("footer", htmlFormat=False)[0]) # output: © Brave Software Brave Search API Summarizer Helpful answers Report a security issue
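
As a rough illustration of what a tag list and a plain-text tag search involve, here is a standard-library sketch that records the tag names on a page and the text found inside each element. It is an assumption-laden stand-in, not how build.fromUri or searchTag are implemented; TagCollector, collect_tags, and the sample markup are all hypothetical.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Records tag names and the text inside each element (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.tags = []         # unique tag names, in first-seen order
        self.text_by_tag = {}  # tag name -> list of text contents, one per element
        self._open = []        # stack of (tag, text parts) for currently open elements

    def handle_starttag(self, tag, attrs):
        if tag not in self.tags:
            self.tags.append(tag)
        self._open.append((tag, []))

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for _, parts in self._open:
            parts.append(data)

    def handle_endtag(self, tag):
        # Close the nearest matching open element; stray end tags are ignored.
        for i in range(len(self._open) - 1, -1, -1):
            if self._open[i][0] == tag:
                _, parts = self._open[i]
                # Join text chunks with spaces and collapse whitespace.
                text = " ".join(" ".join(parts).split())
                self.text_by_tag.setdefault(tag, []).append(text)
                del self._open[i:]
                break


def collect_tags(html):
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector


# Made-up sample markup, used only for illustration.
sample_html = (
    "<html><body><h1>Title</h1>"
    "<footer><p>Copyright Example</p><a href='#'>Contact</a></footer>"
    "</body></html>"
)
page = collect_tags(sample_html)
print(page.tags)
print(page.text_by_tag["footer"])
```

Joining text chunks with spaces keeps words from neighbouring elements apart, at the cost of inserting a space where an inline tag splits a single word.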

Building Pages via HTML

from htmltagparse import NewPage
from requests import get

htmlContent = get("https://duckduckgo.com/").text
ddg = NewPage(htmlContent)
print(list(ddg.sources)) #output: script

Searching A Page

You can search a page you have built directly with its regex function:

from htmltagparse import build
import re

videoId = ""  # set this to the ID of the YouTube video to inspect
page = build.fromUri("https://www.youtube.com/watch?v=%s" % videoId)
try:
  # NOTE: the regex function already applies re's MULTILINE and DOTALL flags
  # extract the video's list of tags with this regex pattern
  videoTags = page.regex(r"\"keywords\":(?P<tags>\[.*?),\"channelId\":").group("tags")
  # convert the matched string into a list
  videoTags = re.findall(r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", videoTags)
except Exception:
  # reached, for example, when the pattern does not match
  videoTags = "no tags found"

print(videoTags)
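
The two regex steps above can be tried offline on a plain string. The sketch below assumes that page.regex behaves like re.search with the MULTILINE and DOTALL flags applied, which is an inference from the note in the example rather than a documented guarantee. The extract_keywords helper and the sample text are hypothetical.

```python
import re

# Assumption: these mirror the flags the package's regex function applies.
FLAGS = re.MULTILINE | re.DOTALL


def extract_keywords(page_source):
    """Return the keyword list embedded in page_source, or [] if none is found."""
    # Step 1: capture the bracketed list that sits between "keywords" and "channelId".
    match = re.search(
        r"\"keywords\":(?P<tags>\[.*?),\"channelId\":", page_source, FLAGS
    )
    if match is None:
        return []
    # Step 2: split the captured string into the individual quoted items.
    return re.findall(
        r"(?:\"|\')(?P<tag>.*?)(?:\'|\")(?:\,|\])", match.group("tags")
    )


# Made-up sample text shaped like the fragment the pattern expects.
sample = (
    '{"videoDetails":{"keywords":["python","html parsing","tutorial"],'
    '"channelId":"UC123"}}'
)
print(extract_keywords(sample))
```

Checking for a missing match explicitly and returning an empty list keeps the result type consistent, as an alternative to catching the exception and storing a message string.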

Developers

Building To Wheel File

- cd into the root directory of this repository
- run python3 -m build

Note: Errors building this package may be due to external package requirements. If this occurs, use python3 -m build -n instead.

Contributions

Contributions must not include:
- Major changes
- Breaking code
- Changes to version number

Release history

- 3.0: Apr 26, 2024
- 2.0: Apr 10, 2024
- 1.0: Apr 3, 2024 (this version)

Download files

Source Distribution

htmltagparse-1.0.tar.gz (5.2 kB)

Uploaded Apr 3, 2024 Source

Built Distribution

htmltagparse-1.0-py3-none-any.whl (5.9 kB)

Uploaded Apr 3, 2024 Python 3

Hashes for htmltagparse-1.0.tar.gz
Algorithm	Hash digest
SHA256	`67deeeffb52b1d2ab2ad56c721b180a9e9d8140b7e1727fc5c7fa0dd15aff7c3`
MD5	`d39cd0c1e082db9baac8ffd698c35f98`
BLAKE2b-256	`415060646843308e5ea263676301bb98181d2aa4d9f8704426649a6987d4bc24`

Hashes for htmltagparse-1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6be8131599148d6560d2c8ff6a75ca84e09d61d2ec8ea911b08d118325d16afa`
MD5	`5a2fa9db2d6c91507cba3cb0186c8fff`
BLAKE2b-256	`c50c56eac6443fbd0a31d45971a5e21d8712cede15e62a96e8d4cbaef8e92d40`