This code can refactor your html in pretty dict
PrettyHTML
Overview
PrettyHTML is a Python library designed to refactor HTML code into a pretty dictionary format. It allows users to easily parse and structure HTML content, making it more readable and accessible for further processing or analysis. The library utilizes the BeautifulSoup parser to extract HTML elements and their attributes, effectively organizing the code into a user-friendly format.
Features
- Conversion of HTML code into a dictionary format
- Recursive element path extraction for easy navigation through the document structure
- Handling of HTML elements without class attributes
- Customizable element searching with optional class filtering
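The idea behind these features can be illustrated with a minimal sketch. It is not PrettyHTML's implementation: it swaps BeautifulSoup for the standard library's html.parser so it runs without extra packages, and the PathCollector class, the html_to_path_dict function, and the "parent/child" key format are all assumptions made for illustration.

```python
from html.parser import HTMLParser

# Void elements never get a closing tag, so they must not stay on the stack.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class PathCollector(HTMLParser):
    """Hypothetical helper: maps 'parent/child' paths to the text found there."""

    def __init__(self):
        super().__init__()
        self.stack = []    # currently open tags, outermost first
        self.result = {}   # path -> list of text fragments

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop back to the matching open tag, tolerating unclosed children.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        text = data.strip()
        if text and self.stack:
            path = "/".join(self.stack)
            self.result.setdefault(path, []).append(text)


def html_to_path_dict(html):
    """Return a dict that maps each element path to its text fragments."""
    collector = PathCollector()
    collector.feed(html)
    collector.close()
    return collector.result


sample = "<div><h1>Hello, World!</h1><p>This is some text inside a paragraph</p></div>"
paths = html_to_path_dict(sample)
print(paths)
```

The dictionary produced by the real library may use different keys and values; the sketch only shows how a document tree can be flattened into path-keyed entries.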
Structure
The PrettyHTML library consists of three main Python files:
- html_handler.py: Contains the HandlerBlock class that takes HTML code as input and refactors it into a dictionary structure.
- utillites.py: Houses the Finder class, which uses BeautifulSoup to find elements in a given HTML code and provides methods for finding elements with or without class attributes.
- __init__.py: Initializes the PrettyHTML library by importing necessary functions and classes.
Usage
To use PrettyHTML in your project, you need to follow these steps:
- Install PrettyHTML using pip:
    pip install prettyhtml
- Import the required classes in your Python script:
    from handler_html.PrettyHTML import HandlerBlock
    from handler_html.utillites import Finder
- Create an instance of the HandlerBlock class and pass the HTML code as input:
    block_code = """
    <div>
        <h1>Hello, World!</h1>
        <p>This is some text inside a paragraph</p>
    </div>
    """
    hb = HandlerBlock(block_code=block_code)
- Call the Handler method of the HandlerBlock instance to obtain the refactored HTML code in dictionary format:
    output = hb.Handler()
    print(output)
By following these steps, users can easily reshape their HTML code into a more accessible and organized dictionary structure using the PrettyHTML library.
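Because the result is an ordinary dictionary, standard tools can display it. The sketch below is only an illustration: the output dictionary is a stand-in with an assumed shape, not the actual return value of Handler(), and dump_result is a hypothetical helper name.

```python
import json

# Stand-in for the value returned by hb.Handler(). The real keys and values
# are defined by the library; this shape is assumed for illustration only.
output = {
    "div/h1": ["Hello, World!"],
    "div/p": ["This is some text inside a paragraph"],
}


def dump_result(result: dict) -> str:
    """Render a result dictionary as indented JSON text."""
    # default=str converts values that JSON cannot encode (for example
    # parser objects) to their string form instead of raising TypeError.
    return json.dumps(result, indent=2, ensure_ascii=False, default=str)


text = dump_result(output)
print(text)
```

Passing default=str is a defensive choice in case the dictionary holds parsed element objects rather than plain strings.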
html_handler.py
This file provides a class for handling HTML content. The main goal is to parse HTML content and organize the elements into a dictionary structure.
Usage
To use the HandlerBlock class from html_handler.py, follow these steps:
- Import the utillites module, which provides the utility classes, using either an absolute or a relative import:
    try:
        import utillites
    except ImportError:
        from . import utillites
- Create an instance of the HandlerBlock class initialized with the HTML content as the block code:
    block_html = ""
    with open("test.txt", "r") as file:
        block_html = file.read()
    HB = HandlerBlock(block_code=block_html)
- Invoke the Handler() method on the HandlerBlock instance to process the HTML content and get a dictionary containing the organized elements:
    out = HB.Handler()
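The import fallback from the first step can be generalized. The sketch below is a hedged illustration rather than code from the library: import_first_available is a hypothetical helper, and the missing module name is made up for the demonstration. It catches ImportError specifically so that unrelated errors raised inside a module are not silently swallowed.

```python
import importlib
from types import ModuleType


def import_first_available(*names: str) -> ModuleType:
    """Return the first module in `names` that can be imported."""
    for name in names:
        try:
            return importlib.import_module(name)
        except ImportError:
            continue  # try the next candidate
    raise ImportError(f"none of these modules could be imported: {names}")


# "no_such_module_for_demo" is a deliberately missing, made-up name.
module = import_first_available("no_such_module_for_demo", "json")
print(module.__name__)
```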
Methods
__init__(self, block_code: str)
- Description: Initializes the HandlerBlock class with the HTML content as the block_code.
- Parameters:
  - block_code (str): The HTML content to be parsed.
Handler(self) -> dict
- Description: Processes the HTML content using the __handler_element() method and returns a dictionary containing organized elements.
- Parameters: None.
- Returns:
  - dict: A dictionary containing the organized elements.
__handler_element(self) -> dict
- Description: Processes the HTML content, finds the elements without a class, and organizes them into a dictionary.
- Parameters: None.
- Returns:
  - dict[str, list]: A dictionary containing the organized elements.
__get_item_path(self, finder: utillites.Finder, item) -> str
- Description: Recursively traverses the HTML content to determine the element's path and returns it.
- Parameters:
  - finder (utillites.Finder): A Finder object provided for traversing the HTML content.
  - item (object): The current item being processed.
- Returns:
  - str: The item's path based on the parent/child relationships in the given HTML content.
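Recursive path construction of this kind can be sketched as follows. This is an assumption-laden illustration, not the library's __get_item_path: the Node dataclass is a hypothetical stand-in for a parsed element, and the "/" separator is chosen only for readability.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Node:
    """Hypothetical stand-in for a parsed HTML element."""
    name: str
    parent: Optional["Node"] = None


def get_item_path(item: Node) -> str:
    """Build 'root/.../item' by walking up through the parents recursively."""
    if item.parent is None:
        return item.name  # base case: the root element
    return f"{get_item_path(item.parent)}/{item.name}"


div = Node("div")
paragraph = Node("p", parent=div)
span = Node("span", parent=paragraph)
print(get_item_path(span))
```

Each call handles one level and delegates the rest to the parent, so the recursion depth equals the nesting depth of the element.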
Example
The __main__ section in the file demonstrates how to use the HandlerBlock class by reading HTML content from a file (test.txt), processing it, and printing its top-level keys. To use this example, replace "test.txt" with the path to your HTML file containing the content you would like to process.
if __name__ == "__main__":
    block_html = ""
    with open("test.txt", "r") as file:
        block_html = file.read()
    HB = HandlerBlock(block_code=block_html)
    out = HB.Handler()
    print(list(out.keys()))
utillites.py
This file contains a Finder class which is used to search for HTML elements within a provided HTML string. It utilizes the BeautifulSoup library to parse and extract HTML elements based on specified criteria.
Finder Class
The Finder class is initialized with an HTML code string, which is then parsed using the BeautifulSoup library.
Methods
__init__(self, html_code: str) -> None
- Initializes the Finder class with the provided html_code parameter.
- The html_code is parsed using the BeautifulSoup library with the 'html.parser' parser.
find_classes(self, type_item, class_name) -> list
- Searches for HTML elements with the specified type_item and class_name.
- The type_item parameter is a tag name such as 'div', 'span', 'p', or 'button'.
- The class_name parameter is the class name of the desired HTML element.
- Returns a list of matching HTML elements.
find_without_class(self) -> list
- Searches for all HTML elements without a class name.
- Returns a list of all matching HTML elements.
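The two search methods can be approximated with a short sketch. It is not the library's Finder: it replaces BeautifulSoup with the standard library's html.parser so the example runs without extra packages, and the SimpleFinder class and its (tag, attributes) return format are assumptions made for illustration.

```python
from html.parser import HTMLParser


class SimpleFinder(HTMLParser):
    """Hypothetical stdlib stand-in that mirrors the documented method names."""

    def __init__(self, html_code: str) -> None:
        super().__init__()
        self.elements = []  # (tag, attributes dict) in document order
        self.feed(html_code)
        self.close()

    def handle_starttag(self, tag, attrs):
        self.elements.append((tag, dict(attrs)))

    def find_classes(self, type_item: str, class_name: str) -> list:
        """Return elements of the given tag whose class list contains class_name."""
        return [
            (tag, attrs)
            for tag, attrs in self.elements
            if tag == type_item and class_name in (attrs.get("class") or "").split()
        ]

    def find_without_class(self) -> list:
        """Return elements that have no class attribute at all."""
        return [(tag, attrs) for tag, attrs in self.elements if "class" not in attrs]


html = '<div class="card big"><p>Text</p><span class="note">x</span></div>'
finder = SimpleFinder(html)
print(finder.find_classes("div", "card"))
print(finder.find_without_class())
```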
PrettyHTML: __init__.py
This module contains the main class for the PrettyHTML library, which primarily focuses on helping users format their HTML content in a more readable way by automating the process of adding indentation, line breaks, and additional formatting to HTML code. The main class is named PrettyHTML, created with an intention to be both developer-friendly and versatile, supporting a wide range of HTML generation and formatting scenarios.
Usage
To use the PrettyHTML module, you need to follow these simple steps:
- Import the PrettyHTML class from the module.
- Create an instance of the PrettyHTML class.
- Add HTML content using various built-in methods and properties.
- Generate formatted HTML by accessing the formatted property of the class instance.
Here's a brief example:
from PrettyHTML import PrettyHTML
# Create an instance of PrettyHTML class
pretty_html = PrettyHTML()
# Add HTML content using built-in methods and properties
pretty_html.add_head('<meta charset="utf-8">')
pretty_html.add_script('src')
# Add arbitrary HTML content
pretty_html.add_html('<h1>Welcome to PrettyHTML!</h1>')
# Generate formatted HTML content
formatted_html = pretty_html.formatted
In this example, we created an instance of the PrettyHTML class, added content (head, script, and arbitrary HTML) using its methods, and accessed the formatted HTML content through the formatted property.
Methods and Properties
Constructor: __init__(self)
The __init__ constructor initializes a new PrettyHTML instance, setting up the template for HTML content. This method ensures that the class maintains its structure and formatting settings during object creation.
add_head(self, head_content: str)
After creating an instance of the PrettyHTML class, the add_head method enables users to insert content corresponding to the HTML <head> element. The head_content argument should be a valid HTML head content string.
add_script(self, script_content: str)
The add_script method facilitates adding JavaScript <script> content to the HTML head section. The script_content argument should be a valid JS script string.
add_html(self, html_content: str)
The add_html method allows users to add arbitrary HTML content to the class instance. Its html_content argument must be a valid string containing HTML code.
formatted(self) -> str
Finally, the formatted property returns a formatted string containing all added content (head, script, and HTML) in a properly formatted, readable manner.
By accessing the formatted property on the PrettyHTML instance, you obtain the complete, formatted, and styled HTML content, ready for usage in your projects or further manipulation as required.
pyproject.toml
This file describes the project's build requirements and dependencies. It is used by tools like Poetry to manage the project's dependencies and build system.
Usage
To create a new pyproject.toml file with the basic structure, you can use the following command:
touch pyproject.toml
Then, open the pyproject.toml file and add the necessary fields. For example:
[tool.poetry]
name = "prettyhtml"
version = "0.8.5"
description = "This code can refactor your html in pretty dict"
authors = ["dima-on <sinica911@gmail.com>"]
license = "MIT"
readme = "README.md"
[tool.poetry.dependencies]
python = "^3.12"
bs4 = "^0.0.2"
[build-system]
requires = ["poetry-core"]
build-backend = "poetry.core.masonry.api"
Make sure to replace the content with the appropriate information for your project.
Methods
This file does not contain any methods, but it is essential for declaring dependencies and build requirements for your project.
- [tool.poetry]: This section contains general project configuration fields like name, version, description, authors, license, and readme.
- [tool.poetry.dependencies]: This section specifies the dependencies your project requires, listing the package names and their specific versions or version ranges.
- [build-system]: This section describes the build system used for your project, specifying the required packages and the build-backend.
Refer to the official PyPA pyproject.toml documentation for more information about the available fields and configurations for this file.
File details
Details for the file pretty_html-0.9.1.tar.gz.
File metadata
- Download URL: pretty_html-0.9.1.tar.gz
- Upload date:
- Size: 5.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.12.7 Windows/11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 270e5c3d2e3e94668350c4e0d86800298086ea286b671d21b4a8f98a6876254b |
| MD5 | 8e95a9b2e329fdee16e17d208778b23a |
| BLAKE2b-256 | e39d67adb43317e38657d282753df6233c7d77ce11d28543ea427507bdcd08c6 |
File details
Details for the file pretty_html-0.9.1-py3-none-any.whl.
File metadata
- Download URL: pretty_html-0.9.1-py3-none-any.whl
- Upload date:
- Size: 5.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.8.5 CPython/3.12.7 Windows/11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6cbe3b514a3b2caf1171480f9fe63a9be32cc93115d0c1de00cbe0681ff94702 |
| MD5 | a4f807e8e8997b384091cf33f820525b |
| BLAKE2b-256 | 753b56d12f2c9ec2d1c598b6ddf9c9655fb0019dcd13cc8c3cd2d9e6a714dacf |
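A downloaded file can be checked against a published SHA256 digest with the standard library. The sketch below is a minimal illustration: sha256_of_file and matches_published_hash are hypothetical helper names, and the demonstration runs on a temporary file rather than on the actual distribution.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Hash a file in chunks so large downloads need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hexadecimal digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demonstration on a temporary file whose digest is computed independently.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"hello")
    demo_path = tmp.name
try:
    demo_ok = matches_published_hash(demo_path, hashlib.sha256(b"hello").hexdigest())
finally:
    os.remove(demo_path)

print(demo_ok)
```

For a real check, pass the path of the downloaded archive or wheel together with the SHA256 value listed for that file.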