Weblite

A text-based web representation that retains meaningful structure information while displaying only human-visible content.

Version 0.2.0 - Major API improvements!

Overview

Weblite transforms complex web pages into a clean, text-based representation optimized for both LLMs and human comprehension. By extracting only the visible elements and their hierarchical relationships, it creates a simplified yet structurally accurate view of web content.

Key characteristics:

  • LLM-Optimized: Designed for AI agents to understand and interact with web pages efficiently
  • Human-Readable: Clear, intuitive format that humans can easily parse and understand
  • Structural Integrity: Preserves the DOM hierarchy and element relationships, unlike flat accessibility trees
  • Selector Construction: Maintains enough structural context to enable accurate CSS/XPath selector generation
  • Visibility-Focused: Filters out hidden elements, displaying only what users actually see on the page

This approach bridges the gap between raw HTML complexity and oversimplified text extraction, providing just the right amount of information for effective web automation and analysis.
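The idea of keeping only visible elements while preserving their nesting can be illustrated with a toy tree. This is a minimal sketch, not weblite's implementation: the node shape (`tag`, `visible`, `text`, `children`) and the helper names `prune_hidden` and `render` are hypothetical and chosen only for illustration.

```python
from typing import Optional

# Hypothetical toy node: {"tag": str, "visible": bool, "text": str, "children": [...]}
# This is NOT weblite's internal data model; it only illustrates the concept.


def prune_hidden(node: dict) -> Optional[dict]:
    """Return a copy of the node without hidden subtrees, or None if the node is hidden."""
    if not node.get("visible", True):
        return None
    kept = []
    for child in node.get("children", []):
        pruned = prune_hidden(child)
        if pruned is not None:
            kept.append(pruned)
    return {"tag": node["tag"], "text": node.get("text", ""), "children": kept}


def render(node: dict, depth: int = 0) -> str:
    """Render a pruned tree as indented text, one element per line."""
    label = node["tag"] + (f": {node['text']}" if node["text"] else "")
    lines = ["  " * depth + label]
    for child in node["children"]:
        lines.append(render(child, depth + 1))
    return "\n".join(lines)


page = {
    "tag": "div", "visible": True, "children": [
        {"tag": "h1", "visible": True, "text": "Products"},
        # A hidden container: its whole subtree is dropped, even visible-looking children.
        {"tag": "div", "visible": False, "children": [
            {"tag": "p", "visible": True, "text": "Hidden promo"},
        ]},
        {"tag": "button", "visible": True, "text": "Add to Cart"},
    ],
}

print(render(prune_hidden(page)))
```

In this sketch the hidden `div` and everything beneath it disappear, while the remaining elements keep their parent-child indentation, which is the property that distinguishes a hierarchical view from a flat list of text.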

Documentation

See the docs/ folder for detailed documentation.

Installation

Install weblite using pip:

pip install weblite

Or install from source:

git clone https://github.com/steve-z-wang/weblite.git
cd weblite
pip install -e .

Requirements

  • Python 3.8 or higher
  • Playwright (automatically installed as a dependency)

After installation, you'll need to install Playwright browsers:

playwright install

Quick Start

import asyncio
from playwright.async_api import async_playwright
from weblite import PlaywrightPage

async def scrape_page():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com")

        # Convert to weblite format
        web_page = PlaywrightPage(page)
        element = await web_page.to_weblite()

        if element:
            # Get the simplified representation
            result = element.to_dict(collapse_wrappers=True)
            print(result)

        await browser.close()

asyncio.run(scrape_page())

Output Example

Input HTML:

<div class="wrapper">
  <h1>Products</h1>
  <div class="card">
    <h3>Laptop</h3>
    <p>$999</p>
    <button>Add to Cart</button>
  </div>
</div>

Weblite Output:

{
  "main": [
    {"h1": "Products"},
    {
      "div": [
        {"h3": "Laptop"},
        {"p": "$999"},
        {"button": "Add to Cart"}
      ]
    }
  ]
}
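Because the output keeps the nesting, it can be walked to recover where each piece of text sits. The following is a minimal sketch that assumes every node is a single-key dictionary whose value is either a string or a list of child nodes, as in the example above; real output may contain other shapes. The helper name `iter_text` is hypothetical and not part of weblite.

```python
from typing import Iterator, Tuple

# The example output shown above, copied as a Python literal.
output = {
    "main": [
        {"h1": "Products"},
        {"div": [
            {"h3": "Laptop"},
            {"p": "$999"},
            {"button": "Add to Cart"},
        ]},
    ]
}


def iter_text(node: dict, path: Tuple[str, ...] = ()) -> Iterator[Tuple[str, str]]:
    """Yield (tag path, text) pairs for every text leaf, in document order."""
    for tag, value in node.items():
        here = path + (tag,)
        if isinstance(value, str):
            # Leaf: the value is the element's visible text.
            yield " > ".join(here), value
        elif isinstance(value, list):
            # Container: recurse into each child node.
            for child in value:
                if isinstance(child, dict):
                    yield from iter_text(child, here)


for tag_path, text in iter_text(output):
    print(f"{tag_path}: {text}")
```

The tag paths printed here are for illustration only. They show the kind of structural context the format retains, but they are not guaranteed to be valid selectors for the original page.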
