DOM nodes with browser rendering data for web automation

domnode

DOM nodes with browser rendering data for web automation.

domnode is a Python library that provides DOM node types enriched with browser rendering information (computed styles, bounding boxes, CDP metadata). It includes parsers for HTML strings and Chrome DevTools Protocol (CDP) snapshots, plus filtering utilities that reduce a tree to its visible, semantic content.

Features

🌳 Rich DOM nodes: Includes computed styles, bounding boxes, and CDP backend node IDs
📦 Dual parsers: Parse from HTML strings or CDP snapshots
🎯 Smart filtering: Remove hidden elements, non-semantic attributes, and wrapper divs
🔍 Visibility detection: Handle display:none, visibility:hidden, opacity:0, zero-size elements
🏷️ Semantic extraction: Keep only meaningful attributes (role, aria-*, type, href, etc.)
🧹 Tree optimization: Collapse unnecessary wrapper elements
✅ Well-tested: 86 unit tests with comprehensive coverage

Installation

pip install domnode

Quick Start

from domnode import parse_html, filter_visible

# Parse HTML
html = """
<div>
    <script>console.log('hidden')</script>
    <div style="display: none">Hidden content</div>
    <button role="button" class="btn">Click me</button>
</div>
"""

root = parse_html(html)

# Filter to only visible elements
visible = filter_visible(root)

# Result: Only the button remains
for child in visible:
    print(child.tag, child.attrib)
# Output: button {'role': 'button', 'class': 'btn'}

Usage

Parsing HTML

from domnode.parsers import parse_html

html = '<div class="container"><button>Click</button></div>'
root = parse_html(html)

print(root.tag)          # 'div'
print(root.attrib)       # {'class': 'container'}
print(root.children[0])  # Node(tag='button', ...)

Parsing CDP Snapshots

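from domnode.parsers import parse_cdp

# With Playwright's async API (Chromium only); `page` is an already-open Page
cdp = await page.context.new_cdp_session(page)
snapshot = await cdp.send('DOMSnapshot.captureSnapshot', {
    # Computed style properties to capture; an empty list captures none
    'computedStyles': ['display', 'visibility', 'opacity', 'position'],
    'includeDOMRects': True
})

root = parse_cdp(snapshot)
print(root.bounds)  # BoundingBox(x=0, y=0, width=1920, height=1080)
print(root.styles)  # {'display': 'block', 'position': 'static', ...}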
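
For readers unfamiliar with the snapshot format: a `DOMSnapshot.captureSnapshot` response stores every string once in a shared `strings` table and refers to it by index, with node data held in parallel arrays. The sketch below decodes a hand-written miniature snapshot to show that layout. It is not domnode's parser; `decode_snapshot` and the sample data are hypothetical, and a real response carries many more fields.

```python
from __future__ import annotations

from typing import Any


def decode_snapshot(snapshot: dict[str, Any]) -> list[dict[str, Any]]:
    """Resolve string-table indexes for the first document in a snapshot."""
    strings = snapshot["strings"]
    doc = snapshot["documents"][0]
    nodes = doc["nodes"]
    layout = doc["layout"]

    # Map node index -> [x, y, width, height] for nodes that have a layout box.
    bounds_by_node = dict(zip(layout["nodeIndex"], layout["bounds"]))

    decoded = []
    for i, name_idx in enumerate(nodes["nodeName"]):
        flat = nodes["attributes"][i]  # [name_idx, value_idx, name_idx, ...]
        attrib = {
            strings[flat[j]]: strings[flat[j + 1]]
            for j in range(0, len(flat), 2)
        }
        decoded.append({
            "tag": strings[name_idx].lower(),
            "parent": nodes["parentIndex"][i],  # -1 marks the root
            "attrib": attrib,
            "bounds": bounds_by_node.get(i),    # None if the node has no box
        })
    return decoded


# Hand-written miniature snapshot (not real browser output).
snapshot = {
    "strings": ["DIV", "BUTTON", "role", "button"],
    "documents": [{
        "nodes": {
            "parentIndex": [-1, 0],
            "nodeName": [0, 1],
            "attributes": [[], [2, 3]],
        },
        "layout": {
            "nodeIndex": [0, 1],
            "bounds": [[0, 0, 1920, 1080], [10, 20, 100, 50]],
        },
    }],
}

for node in decode_snapshot(snapshot):
    print(node)
```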

Filtering - Visibility

Remove hidden and non-visible elements:

from domnode import parse_html, filter_visible

html = """
<div>
    <script>alert('hidden')</script>
    <style>.hide { display: none; }</style>
    <div style="display: none">Hidden</div>
    <div style="opacity: 0">Invisible</div>
    <button>Visible</button>
</div>
"""

root = parse_html(html)
visible = filter_visible(root)

# Only button remains
assert len(visible.children) == 1
assert visible.children[0].tag == 'button'
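
To make the CSS rules concrete, here is a minimal sketch of how an inline `style` attribute could be checked against the three conditions listed under Features (`display:none`, `visibility:hidden`, `opacity:0`). The helper names `parse_inline_style` and `is_css_hidden` are hypothetical and this is not domnode's implementation, which may also consult computed styles from CDP.

```python
from __future__ import annotations


def parse_inline_style(style: str) -> dict[str, str]:
    """Turn 'display: none; opacity: 0' into {'display': 'none', 'opacity': '0'}."""
    styles = {}
    for declaration in style.split(";"):
        if ":" not in declaration:
            continue  # skip empty or malformed declarations
        prop, value = declaration.split(":", 1)
        styles[prop.strip().lower()] = value.strip().lower()
    return styles


def is_css_hidden(styles: dict[str, str]) -> bool:
    """Return True if display, visibility, or opacity hides the element."""
    if styles.get("display") == "none":
        return True
    if styles.get("visibility") == "hidden":
        return True
    try:
        return float(styles.get("opacity", "1")) == 0.0
    except ValueError:
        return False  # unparseable opacity: treat the element as visible


print(is_css_hidden(parse_inline_style("display: none")))  # True
print(is_css_hidden(parse_inline_style("opacity: 0")))     # True
print(is_css_hidden(parse_inline_style("color: red")))     # False
```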

Filtering - Semantic

Keep only semantic attributes and clean structure:

from domnode import parse_html, filter_semantic

html = """
<div class="wrapper" id="container">
    <div class="inner">
        <button class="btn" role="button" aria-label="Submit">Click</button>
    </div>
</div>
"""

root = parse_html(html)
semantic = filter_semantic(root)

# Wrappers collapsed, only semantic attributes remain
assert semantic.tag == 'button'
assert semantic.attrib == {'role': 'button', 'aria-label': 'Submit'}
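
The wrapper collapse in this example can be pictured as a bottom-up pass: once non-semantic attributes such as `class` and `id` are gone, an element with no attributes and exactly one child adds nothing and can be replaced by that child. The sketch below illustrates the idea on a hypothetical `MiniNode` stand-in. It assumes attributes were already stripped, ignores text nodes, and is not domnode's `collapse_wrappers`.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MiniNode:
    """Hypothetical stand-in for a DOM element; not domnode's Node class."""
    tag: str
    attrib: dict[str, str] = field(default_factory=dict)
    children: list[MiniNode] = field(default_factory=list)


def collapse(node: MiniNode) -> MiniNode:
    """Replace attribute-less single-child wrappers with their only child."""
    # Collapse children first so chains of wrappers unwind from the bottom.
    node.children = [collapse(child) for child in node.children]
    if not node.attrib and len(node.children) == 1:
        return node.children[0]
    return node


# The tree from the example above, after class/id have been stripped.
tree = MiniNode("div", children=[
    MiniNode("div", children=[
        MiniNode("button", {"role": "button", "aria-label": "Submit"}),
    ]),
])

result = collapse(tree)
print(result.tag, result.attrib)
```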

Filtering - All (Visibility + Semantic)

from domnode import parse_html, filter_all

html = """
<html>
    <head>
        <script src="app.js"></script>
    </head>
    <body class="page">
        <div class="wrapper">
            <button class="btn" role="button">Click</button>
        </div>
    </body>
</html>
"""

root = parse_html(html)
clean = filter_all(root)

# Head removed, wrappers collapsed, only semantic attributes
assert clean.tag == 'button'
assert clean.attrib == {'role': 'button'}

Granular Filtering

Use individual filters for fine-grained control:

from domnode.parsers import parse_html
from domnode.filters.visibility import filter_css_hidden, filter_zero_dimensions
from domnode.filters.semantic import filter_attributes, collapse_wrappers

html = '<div class="wrapper"><button class="btn" role="button">Click</button></div>'
root = parse_html(html)

# Apply specific filters, in order
root = filter_css_hidden(root)
root = filter_attributes(root)
root = collapse_wrappers(root)

Working with Nodes

from domnode import Node, Text, BoundingBox

# Create nodes
div = Node(tag='div', attrib={'class': 'container'})
button = Node(
    tag='button',
    attrib={'role': 'button'},
    styles={'display': 'block'},
    bounds=BoundingBox(x=10, y=20, width=100, height=50)
)

# Build tree
div.append(Text('Click here: '))
div.append(button)
button.append(Text('Submit'))

# Navigate
for child in div:
    if isinstance(child, Node):
        print(f"Element: {child.tag}")
    elif isinstance(child, Text):
        print(f"Text: {child.content}")

# Get all text
print(div.get_text())  # "Click here: Submit"

# Check visibility
print(button.is_visible())      # True
print(button.has_zero_size())   # False
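
As an illustration of what a recursive text lookup like `get_text(separator='')` involves, the sketch below walks a small tree depth-first and joins the text nodes it finds. `MiniElement` and `MiniText` are hypothetical stand-ins, not domnode's `Node` and `Text`, and the real method may treat whitespace differently.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class MiniText:
    """Hypothetical stand-in for a text node."""
    content: str


@dataclass
class MiniElement:
    """Hypothetical stand-in for an element node."""
    tag: str
    children: list[MiniElement | MiniText] = field(default_factory=list)


def get_text(node: MiniElement | MiniText, separator: str = "") -> str:
    """Collect text content in document order and join it with `separator`."""
    parts: list[str] = []

    def walk(current: MiniElement | MiniText) -> None:
        if isinstance(current, MiniText):
            parts.append(current.content)
        else:
            for child in current.children:
                walk(child)

    walk(node)
    return separator.join(parts)


# Same shape as the tree built above: text, then a button containing text.
button = MiniElement("button", [MiniText("Submit")])
div = MiniElement("div", [MiniText("Click here: "), button])

print(get_text(div))  # Click here: Submit
```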

API Reference

Types

Node: DOM element with tag, attributes, styles, bounds, metadata, and children
Text: Text node with content
BoundingBox: Element bounding box (x, y, width, height)

Parsers

parse_html(html: str) -> Node: Parse HTML string to Node tree
parse_cdp(snapshot: dict) -> Node: Parse CDP snapshot to Node tree

Filters

Presets (convenience)

filter_visible(node) -> Node | None: Remove all hidden elements
filter_semantic(node) -> Node | None: Keep only semantic content
filter_all(node) -> Node | None: Apply all filters
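
The `Node | None` return type suggests that a filter can remove the root itself, so code that chains filters by hand should stop once a step returns `None`. The sketch below shows one way to do that with a hypothetical `compose` helper and toy dict-based filters. It is not part of domnode, and whether the individual domnode filters return `None` is an assumption to check against the library.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def compose(*filters: Callable[[T], Optional[T]]) -> Callable[[T], Optional[T]]:
    """Chain filters left to right, stopping as soon as one returns None."""
    def run(node: T) -> Optional[T]:
        current: Optional[T] = node
        for apply_filter in filters:
            if current is None:
                return None
            current = apply_filter(current)
        return current
    return run


# Toy filters over plain dicts, standing in for real node filters.
def drop_hidden(node: dict) -> Optional[dict]:
    return None if node.get("hidden") else node


def strip_class(node: dict) -> dict:
    return {key: value for key, value in node.items() if key != "class"}


clean = compose(drop_hidden, strip_class)
print(clean({"tag": "button", "class": "btn"}))  # {'tag': 'button'}
print(clean({"tag": "div", "hidden": True}))     # None
```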

Visibility Filters

filter_non_visible_tags(node): Remove script, style, head, meta, etc.
filter_css_hidden(node): Remove display:none, visibility:hidden, opacity:0
filter_zero_dimensions(node): Remove zero-width/height elements
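
A zero-dimension check only needs the bounding box. The sketch below uses a hypothetical `Box` dataclass with the same four fields as `BoundingBox` and makes two assumptions that domnode may not share: a box counts as zero-sized when either dimension is not positive, and an element without a box is treated as unknown and kept.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Box:
    """Hypothetical stand-in with the (x, y, width, height) fields of BoundingBox."""
    x: float
    y: float
    width: float
    height: float


def has_zero_size(box: Optional[Box]) -> bool:
    """True when either dimension is not positive; a missing box is not zero-sized."""
    if box is None:
        return False  # assumption: no layout data means "unknown", so keep it
    return box.width <= 0 or box.height <= 0


print(has_zero_size(Box(10, 20, 100, 50)))  # False
print(has_zero_size(Box(10, 20, 0, 50)))    # True
print(has_zero_size(None))                  # False
```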

Semantic Filters

filter_attributes(node, keep=SEMANTIC_ATTRIBUTES): Keep only semantic attributes
filter_empty(node): Remove empty nodes (no attributes, no children)
collapse_wrappers(node): Collapse single-child wrapper elements

Node Methods

node.append(child): Add a child node or text
node.remove(child): Remove a child
node.is_visible(): Check if element is visible (based on styles)
node.has_zero_size(): Check if element has zero dimensions
node.get_text(separator=''): Get all text content recursively

Architecture

domnode is the base layer of a three-package stack for web automation; domcontext and natural-selector build on it:

┌─────────────────────┐
│   natural-selector  │  (RAG-based element selection)
│   (embeddings, LLM) │
└──────────┬──────────┘
           │ uses
┌──────────▼──────────┐
│     domcontext      │  (LLM context formatting)
│  (markdown, tokens) │
└──────────┬──────────┘
           │ uses
┌──────────▼──────────┐
│      domnode        │  (Core DOM + filtering)
│  (this package)     │
└─────────────────────┘

Semantic Attributes

By default, filter_attributes keeps these semantic attributes:

SEMANTIC_ATTRIBUTES = {
    "role", "aria-label", "aria-labelledby", "aria-describedby",
    "aria-checked", "aria-selected", "aria-expanded", "aria-hidden",
    "aria-disabled", "type", "name", "placeholder", "value",
    "alt", "title", "href", "disabled", "checked", "selected"
}

You can customize:

from domnode.filters.semantic import filter_attributes

custom_attrs = {"role", "href", "data-test-id"}
filtered = filter_attributes(node, keep=custom_attrs)
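
Note that passing `keep` replaces the default set rather than adding to it. To extend the defaults instead, a set union works. The sketch below copies the default set from above so that it runs without domnode installed; `keep_attributes` is a hypothetical stand-in that operates on a plain attribute dict, not the library's `filter_attributes`.

```python
from __future__ import annotations

# Copy of the default set listed above, so the sketch runs without domnode.
SEMANTIC_ATTRIBUTES = {
    "role", "aria-label", "aria-labelledby", "aria-describedby",
    "aria-checked", "aria-selected", "aria-expanded", "aria-hidden",
    "aria-disabled", "type", "name", "placeholder", "value",
    "alt", "title", "href", "disabled", "checked", "selected",
}


def keep_attributes(
    attrib: dict[str, str],
    keep: set[str] = SEMANTIC_ATTRIBUTES,
) -> dict[str, str]:
    """Return a new dict containing only the attributes named in `keep`."""
    return {name: value for name, value in attrib.items() if name in keep}


attrib = {"class": "btn", "role": "button", "data-test-id": "submit", "href": "/go"}

# Defaults only: class and data-test-id are dropped.
print(keep_attributes(attrib))

# Defaults plus one project-specific attribute.
extended = SEMANTIC_ATTRIBUTES | {"data-test-id"}
print(keep_attributes(attrib, keep=extended))
```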

Use Cases

Web Scraping

Extract only visible, meaningful content from web pages.

Browser Automation

Filter DOM to only interactive elements for AI agents.

LLM Context

Reduce HTML to essential semantic structure for language models.

Accessibility Testing

Analyze semantic attributes and ARIA labels.

Testing

Build and manipulate DOM trees programmatically.

Development

# Clone repository
git clone https://github.com/yourusername/domnode.git
cd domnode

# Create virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

# Run tests
pytest

# Run tests with coverage
pytest --cov=domnode --cov-report=html

Testing

The package includes 86 comprehensive unit tests covering:

Core node types and operations
HTML and CDP parsing
All visibility filters
All semantic filters
Preset filter combinations
Edge cases and error handling

pytest -v

License

MIT

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

Related Projects

domcontext: DOM to LLM context with markdown serialization
natural-selector: Natural language element selection with RAG

Changelog

0.1.0 (2025-01-XX)

Initial release
Core Node, Text, BoundingBox types
HTML and CDP parsers
Visibility and semantic filters
86 unit tests
