Html Universal Identifier
Html Universal Identifier is an alpha version of an application designed for identifying server-side HTML parsers. The package determines which HTML, SVG, and MathML tags are allowed, based on a user-supplied handler function that submits HTML payloads and returns the processed result.
Features
- Identify allowed HTML, SVG, and MathML tags.
- Use a customizable handler function to process HTML payloads.
- Load and compare results against predefined Parser outputs.
- The class also maintains an INCORRECT_PARSED list, which contains payloads that were incorrectly parsed by the handler. For example, this may include cases where the parser fails to remove nested forms and similar issues.
Installation
To install the package, use pip:
pip install hui
Usage
Here is a basic example of how to use the Identifier class from the package:
from hui.identify import Identifier
import requests
def handler(payload):
    return requests.get("http://localhost:3005/sanitize", params={"html": payload}).text
a = Identifier(handler=handler, buffer_enabled=True, buffer_limit=32, debug_mode=True)
print(a.identify()) # Outputs the identification results
print(a.ALLOWED_TAGS) # Outputs the allowed tags
print(a.INCORRECT_PARSED) # Outputs the payloads that were parsed incorrectly
Identifier Class
The Identifier class is the core of this package. It is responsible for identifying allowed HTML, SVG, and MathML tags based on a handler function that processes HTML payloads.
The class also maintains an INCORRECT_PARSED list, which contains payloads that were incorrectly parsed by the handler. For example, this may include cases where the parser fails to remove nested forms and similar issues.
Current Parsers
The following parsers are currently supported in the project:
- DOMPurify with JSDOM (JS)
- JSDOM (JS)
- sanitize_html (JS)
- htmlparser2 (JS)
- html (Python)
- lxml (Python)
- html_sanitizer (Python)
- net/html (Go)
- bluemonday (Go)
If you believe a new parser/sanitizer should be added, please create an issue, and I will be happy to include it.
Constructor Parameters
- handler: A function that takes a payload and returns an HTML response. Example: lambda payload: requests.get(f"http://localhost:3000?payload={payload}").text
- buffer_enabled (optional, default=False): A boolean flag to enable or disable buffering of payloads before sending them to the handler. By default, buffering is disabled, as it can sometimes lead to incorrect results. For example, some sanitizers may simply remove all input if it contains a dangerous tag. Use buffering only if the server you are interacting with has strict rate limits.
- buffer_delimeter (optional, default="TEXTTEXT"): A string used to delimit buffered payloads when sending them to the handler.
- buffer_limit (optional, default=32): An integer that specifies the maximum number of payloads to buffer before sending them to the handler.
- template_vars (optional, default=None): A dictionary of template variables to use for substitution in payloads.
- debug_mode (optional, default=False): A boolean flag to enable or disable debug logging.
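To illustrate why buffering can give incorrect results, here is a minimal sketch. It assumes that buffering joins several payloads with the delimiter, sends them in one request, and splits the response on the same delimiter; the package's internals may differ. The names strict_sanitizer and send_buffered are hypothetical and not part of hui.

```python
import re

DELIMITER = "TEXTTEXT"  # same value as the documented buffer_delimeter default


def strict_sanitizer(html: str) -> str:
    """Toy sanitizer (hypothetical): drops the whole input if it sees <script."""
    if re.search(r"<\s*script", html, re.IGNORECASE):
        return ""
    return html


def send_buffered(payloads: list[str], handler) -> list[str]:
    """Assumed buffering scheme: join payloads, call the handler once, split the reply."""
    response = handler(DELIMITER.join(payloads))
    return response.split(DELIMITER)


payloads = ["<b>bold</b>", "<script>alert(1)</script>", "<i>italic</i>"]

# One request per payload: only the dangerous payload is lost.
one_by_one = [strict_sanitizer(p) for p in payloads]

# One request for the whole buffer: the dangerous payload wipes out its neighbours.
buffered = send_buffered(payloads, strict_sanitizer)

print(one_by_one)
print(buffered)
```

With this toy sanitizer, the unbuffered run keeps the two harmless payloads, while the buffered run loses all three, so the harmless tags would wrongly look disallowed.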
Methods
- check_allowed_tags(): Checks and populates the ALLOWED_TAGS dictionary with allowed tags.
- call_handler(template_payloads: list[str]): Calls the handler function with a list of template payloads.
- check_namespace(namespace: str): Checks for allowed tags in the specified namespace (SVG or MathML).
- identify(): Identifies the best matching Parser based on generated payloads and returns a list of matches.
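The general idea behind probing for allowed tags can be sketched as follows. This is an illustration only, not the package's implementation: toy_handler, ALLOWED, and probe_tags are hypothetical names, and the probe format is an assumption.

```python
import re

ALLOWED = {"b", "i", "svg"}  # hypothetical allowlist of the toy sanitizer


def toy_handler(payload: str) -> str:
    """Stand-in for a real sanitizer: strips every tag whose name is not in ALLOWED."""
    def keep_or_drop(match: re.Match) -> str:
        return match.group(0) if match.group(1).lower() in ALLOWED else ""
    return re.sub(r"</?([a-zA-Z][a-zA-Z0-9]*)[^>]*>", keep_or_drop, payload)


def probe_tags(handler, candidates: list[str]) -> dict[str, bool]:
    """Send one probe per tag and record whether the opening tag survived."""
    result = {}
    for tag in candidates:
        response = handler(f"<{tag}>probe</{tag}>")
        result[tag] = f"<{tag}" in response.lower()
    return result


allowed = probe_tags(toy_handler, ["b", "i", "script", "svg", "math"])
print(allowed)
```

A real handler would send each probe to the server under test, as in the Usage example, instead of calling a local function.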
identify() Method
The identify() method checks if allowed tags have been determined. If not, it calls check_allowed_tags() to populate the ALLOWED_TAGS. It then loads a list of generated payloads from a JSON file and calls the handler for each payload. Finally, it compares the results against all JSON files in the results_parsers directory to count matches and returns a sorted list of results.
- Returns: A list of tuples, each containing:
  - The match ratio (float)
  - The number of matches (int)
  - The name of the Parser (str)
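The comparison step can be pictured with the sketch below. It assumes each reference file maps a payload to the output a known parser produces, and that results are sorted with the best match first; the actual file layout and sort order in the package may differ. The function rank_parsers and all sample data are made up for illustration.

```python
def rank_parsers(observed: dict[str, str], references: dict[str, dict[str, str]]):
    """Return (match_ratio, matches, parser_name) tuples, best match first."""
    results = []
    for name, expected in references.items():
        # Count payloads whose observed output equals this parser's reference output.
        matches = sum(1 for payload, output in observed.items()
                      if expected.get(payload) == output)
        ratio = matches / len(observed) if observed else 0.0
        results.append((ratio, matches, name))
    return sorted(results, reverse=True)


# Made-up data for illustration only; not real parser output.
observed = {"p1": "<b>x</b>", "p2": "", "p3": "<i>y</i>", "p4": "z"}
references = {
    "parser_a": {"p1": "<b>x</b>", "p2": "", "p3": "<i>y</i>", "p4": "zz"},
    "parser_b": {"p1": "<b>x</b>", "p2": "<p></p>", "p3": "", "p4": "zz"},
}

ranking = rank_parsers(observed, references)
for ratio, matches, name in ranking:
    print(f"{name}: {matches} matches ({ratio:.0%})")
```

The tuples have the same shape as the documented return value, so the same loop can be used to print the output of identify().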