Weblite
A text-based web representation that retains meaningful structure information while displaying only human-visible content.
Version 0.2.0 - Major API improvements!
Overview
Weblite transforms complex web pages into a clean, text-based representation optimized for both LLM and human comprehension. By extracting only the visible elements and their hierarchical relationships, it produces a simplified yet structurally accurate view of web content.
Key characteristics:
- LLM-Optimized: Designed for AI agents to understand and interact with web pages efficiently
- Human-Readable: Clear, intuitive format that humans can easily parse and understand
- Structural Integrity: Preserves the DOM hierarchy and element relationships, unlike flat accessibility trees
- Selector Construction: Maintains enough structural context to enable accurate CSS/XPath selector generation
- Visibility-Focused: Filters out hidden elements, displaying only what users actually see on the page
This approach bridges the gap between raw HTML complexity and oversimplified text extraction, providing just the right amount of information for effective web automation and analysis.
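As a rough illustration of visibility filtering combined with preserved hierarchy, the sketch below walks a toy element tree and keeps only visible nodes. It produces `{tag: text}` for leaves and `{tag: [children]}` for containers, the same shape as weblite's Output Example. The `Node` class and `to_visible_dict` function are hypothetical stand-ins, not weblite's API. Real visibility is computed from the rendered page, whereas here it is just a boolean flag, and the sketch assumes text lives only in leaf elements.

```python
from dataclasses import dataclass, field
from typing import List, Optional


# Hypothetical node type for illustration only; not weblite's internal class.
@dataclass
class Node:
    tag: str
    text: str = ""
    visible: bool = True  # stand-in for real, rendering-based visibility
    children: List["Node"] = field(default_factory=list)


def to_visible_dict(node: Node) -> Optional[dict]:
    """Return {tag: text} for a visible leaf, {tag: [children]} for a visible
    container, or None if the node is hidden or contains nothing visible."""
    if not node.visible:
        return None  # a hidden element drops its whole subtree
    kids = [d for d in (to_visible_dict(c) for c in node.children) if d is not None]
    if kids:
        return {node.tag: kids}  # hierarchy is kept, unlike a flat list
    if node.text.strip():
        return {node.tag: node.text.strip()}
    return None  # empty, text-less elements carry no visible content


page = Node("main", children=[
    Node("h1", "Products"),
    Node("div", children=[
        Node("h3", "Laptop"),
        Node("p", "$999"),
        Node("button", "Add to Cart"),
        Node("span", "Out of stock", visible=False),  # filtered out
    ]),
])
print(to_visible_dict(page))
```

The hidden `span` disappears from the result, while the `div` grouping around the product details survives.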
Documentation
See the docs/ folder for detailed documentation:
- Tree Format - Understanding weblite's output structure
- Pruning Rules - How wrapper elements are collapsed
- Selectors - Smart element targeting (coming soon)
- Agent Integration - Using weblite with AI agents
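The actual pruning rules are described in docs/. To illustrate what collapsing a wrapper can mean, the sketch below operates on the nested-dict output format. It removes any element whose tag is in a hypothetical `WRAPPER_TAGS` set and that has exactly one child, promoting that child in its place. Both the tag set and the single-child rule are illustrative assumptions, not weblite's documented behavior.

```python
from typing import Any, Dict, FrozenSet

# Hypothetical: which tags count as pure "wrappers" is an assumption here.
WRAPPER_TAGS: FrozenSet[str] = frozenset({"div", "span", "section"})


def collapse_wrappers(tree: Dict[str, Any]) -> Dict[str, Any]:
    """Replace any wrapper element that has exactly one child with that child."""
    tag, value = next(iter(tree.items()))  # each node is a single-key dict
    if isinstance(value, str):
        return tree  # text leaf: nothing to collapse
    children = [collapse_wrappers(child) for child in value]
    if tag in WRAPPER_TAGS and len(children) == 1:
        return children[0]  # the wrapper adds no structure, so drop it
    return {tag: children}


nested = {"div": [{"div": [{"main": [{"h1": "Products"},
                                     {"div": [{"span": [{"p": "$999"}]}]}]}]}]}
print(collapse_wrappers(nested))
```

Elements with several children (like `main` here) are kept, because they carry grouping information that a selector or an agent may need.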
Installation
Install weblite using pip:
```bash
pip install weblite
```
Or install from source:
```bash
git clone https://github.com/steve-z-wang/weblite.git
cd weblite
pip install -e .
```
Requirements
- Python 3.8 or higher
- Playwright (automatically installed as a dependency)
After installation, you'll need to install Playwright browsers:
```bash
playwright install
```
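Before running the Quick Start, you can check the requirements above with a small preflight script. This is a minimal sketch using only the standard library, and `check_requirements` is a hypothetical helper. `importlib.util.find_spec` only confirms that a package is importable; it cannot tell whether the Playwright browser binaries were downloaded by `playwright install`.

```python
import importlib.util
import sys


def check_requirements() -> dict:
    """Report whether the documented requirements appear to be met."""
    return {
        # weblite requires Python 3.8 or higher
        "python>=3.8": sys.version_info >= (3, 8),
        # find_spec returns None when a top-level package is not installed
        "playwright importable": importlib.util.find_spec("playwright") is not None,
        "weblite importable": importlib.util.find_spec("weblite") is not None,
    }


for name, ok in check_requirements().items():
    print(f"{name}: {'ok' if ok else 'MISSING'}")
```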
Quick Start
```python
import asyncio
from playwright.async_api import async_playwright
from weblite import PlaywrightPage


async def scrape_page():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com")

        # Convert to weblite format
        web_page = PlaywrightPage(page)
        element = await web_page.to_weblite()

        if element:
            # Get the simplified representation
            result = element.to_dict(collapse_wrappers=True)
            print(result)

        await browser.close()


asyncio.run(scrape_page())
```
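If you plan to paste the result into an LLM prompt, an indented text outline can be more compact than the printed dict. The following minimal sketch assumes `to_dict()` returns nested single-key dicts shaped like the Output Example. `to_outline` is a hypothetical helper, not part of weblite.

```python
from typing import Any, Dict, List


def _outline_lines(node: Dict[str, Any], depth: int) -> List[str]:
    lines = []
    pad = "  " * depth
    for tag, value in node.items():
        if isinstance(value, str):
            lines.append(f"{pad}{tag}: {value}")  # leaf: tag plus visible text
        else:
            lines.append(f"{pad}{tag}")  # container: children indented below
            for child in value:
                lines.extend(_outline_lines(child, depth + 1))
    return lines


def to_outline(tree: Dict[str, Any]) -> str:
    """Render a weblite-style nested dict as an indented text outline."""
    return "\n".join(_outline_lines(tree, 0))


example = {"main": [{"h1": "Products"},
                    {"div": [{"h3": "Laptop"}, {"p": "$999"},
                             {"button": "Add to Cart"}]}]}
print(to_outline(example))
```

In the Quick Start you could then print `to_outline(result)` instead of `result`, provided the dict really has this shape.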
Output Example
Input HTML:
```html
<div class="wrapper">
  <h1>Products</h1>
  <div class="card">
    <h3>Laptop</h3>
    <p>$999</p>
    <button>Add to Cart</button>
  </div>
</div>
```
Weblite Output:
```json
{
  "main": [
    {"h1": "Products"},
    {
      "div": [
        {"h3": "Laptop"},
        {"p": "$999"},
        {"button": "Add to Cart"}
      ]
    }
  ]
}
```
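Because the output keeps the element hierarchy, each text leaf can be traced back to a tag path, which is the kind of structural context the Overview says makes selector generation possible. The sketch below derives CSS-style paths from a nested dict like the one above. It adds `:nth-of-type(n)` only where several kept siblings share a tag. `selector_paths` is a hypothetical helper, not weblite's selector feature (which the docs list as coming soon). The paths are only reliable when nothing was pruned at that level: a browser's `:nth-of-type` counts hidden siblings too, and a collapsed wrapper breaks the `>` child combinator.

```python
from collections import Counter
from typing import Any, Dict, List, Optional, Tuple


def _tag_of(node: Dict[str, Any]) -> str:
    return next(iter(node))  # each node is a single-key dict: {tag: value}


def selector_paths(node: Dict[str, Any],
                   selector: Optional[str] = None) -> List[Tuple[str, str]]:
    """Return (css_path, text) pairs for every text leaf under `node`."""
    tag = _tag_of(node)
    value = node[tag]
    selector = selector or tag
    if isinstance(value, str):
        return [(selector, value)]
    totals = Counter(_tag_of(child) for child in value)
    seen = Counter()
    paths = []
    for child in value:
        child_tag = _tag_of(child)
        seen[child_tag] += 1
        # Disambiguate only when more than one kept sibling shares this tag
        if totals[child_tag] > 1:
            step = f"{child_tag}:nth-of-type({seen[child_tag]})"
        else:
            step = child_tag
        paths.extend(selector_paths(child, f"{selector} > {step}"))
    return paths


output = {"main": [{"h1": "Products"},
                   {"div": [{"h3": "Laptop"}, {"p": "$999"},
                            {"button": "Add to Cart"}]}]}
for path, text in selector_paths(output):
    print(f"{path}  ->  {text}")
```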
Download files
File details
Details for the file weblite-0.2.3.tar.gz.
File metadata
- Download URL: weblite-0.2.3.tar.gz
- Size: 13.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 14f3888bd1ba51a85e8026805845563f27abce421943d669eda33764fdbc7192 |
| MD5 | b3986a4207a0a7336709894c7fdca1f6 |
| BLAKE2b-256 | 54673d5fc5a0ed8ca3a35f84da38d21fc812922336280ceef6fec4990d60f332 |
Provenance
The following attestation bundles were made for weblite-0.2.3.tar.gz:
- Publisher: publish-pypi.yml on steve-z-wang/weblite
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: weblite-0.2.3.tar.gz
- Subject digest: 14f3888bd1ba51a85e8026805845563f27abce421943d669eda33764fdbc7192
- Sigstore transparency entry: 525500395
- Permalink: steve-z-wang/weblite@b9f3bfca147fdae10ccb619e67229ad56f08908d
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/steve-z-wang
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@b9f3bfca147fdae10ccb619e67229ad56f08908d
- Trigger Event: release
File details
Details for the file weblite-0.2.3-py3-none-any.whl.
File metadata
- Download URL: weblite-0.2.3-py3-none-any.whl
- Size: 19.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 43b7c5a419f75ac618859f370a817885c17067ac42151dee11678d9eddc7cad9 |
| MD5 | 551cc16b61125ce0362406bfed3942d8 |
| BLAKE2b-256 | f44597dd8c577babf9ba69b3b7e72b370b2ec458bde79b2e852776d72310f4e3 |
Provenance
The following attestation bundles were made for weblite-0.2.3-py3-none-any.whl:
- Publisher: publish-pypi.yml on steve-z-wang/weblite
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: weblite-0.2.3-py3-none-any.whl
- Subject digest: 43b7c5a419f75ac618859f370a817885c17067ac42151dee11678d9eddc7cad9
- Sigstore transparency entry: 525500434
- Permalink: steve-z-wang/weblite@b9f3bfca147fdae10ccb619e67229ad56f08908d
- Branch / Tag: refs/tags/v0.2.3
- Owner: https://github.com/steve-z-wang
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish-pypi.yml@b9f3bfca147fdae10ccb619e67229ad56f08908d
- Trigger Event: release