Weblite

A text-based web representation that retains meaningful structure information while displaying only human-visible content.

Version 0.2.0 - Major API improvements!

Overview

Weblite transforms complex web pages into a clean, text-based representation optimized for both LLMs and human comprehension. By extracting only the visible elements and their hierarchical relationships, it creates a simplified yet structurally accurate view of web content.

Key characteristics:

  • LLM-Optimized: Designed for AI agents to understand and interact with web pages efficiently
  • Human-Readable: Clear, intuitive format that humans can easily parse and understand
  • Structural Integrity: Preserves the DOM hierarchy and element relationships, unlike flat accessibility trees
  • Selector Construction: Maintains enough structural context to enable accurate CSS/XPath selector generation
  • Visibility-Focused: Filters out hidden elements, displaying only what users actually see on the page

This approach bridges the gap between raw HTML complexity and oversimplified text extraction, providing just the right amount of information for effective web automation and analysis.
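
To make the idea concrete, here is a minimal sketch of visibility filtering combined with hierarchy preservation, using only the Python standard library. It is not Weblite's implementation: the names `VisibleTreeBuilder` and `visible_outline` are hypothetical, it assumes well-formed HTML, and it only detects the `hidden` attribute and inline `display: none` / `visibility: hidden` styles, whereas real visibility depends on computed styles in a browser.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, and tags whose content is never rendered.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}
SKIPPED_TAGS = {"script", "style", "template", "noscript"}


def _is_hidden(tag, attrs):
    """Rough visibility check based on the tag name and inline attributes only."""
    attr_map = dict(attrs)
    if tag in SKIPPED_TAGS or "hidden" in attr_map:
        return True
    style = (attr_map.get("style") or "").replace(" ", "").lower()
    return "display:none" in style or "visibility:hidden" in style


class VisibleTreeBuilder(HTMLParser):
    """Hypothetical helper: builds a nested tree of the visible elements."""

    def __init__(self):
        super().__init__()
        self.root = {"tag": "root", "children": []}
        self._stack = [self.root]
        self._hidden_depth = 0  # > 0 while inside a hidden subtree

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._hidden_depth or _is_hidden(tag, attrs):
            self._hidden_depth += 1
            return
        node = {"tag": tag, "children": []}
        self._stack[-1]["children"].append(node)
        self._stack.append(node)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._hidden_depth:
            self._hidden_depth -= 1
        elif len(self._stack) > 1:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text and not self._hidden_depth:
            self._stack[-1]["children"].append(text)


def _simplify(node):
    """Collapse a node to {tag: text} or {tag: [children]}."""
    children = [c if isinstance(c, str) else _simplify(c) for c in node["children"]]
    if len(children) == 1 and isinstance(children[0], str):
        return {node["tag"]: children[0]}
    return {node["tag"]: children}


def visible_outline(html):
    """Return the visible top-level elements of an HTML fragment."""
    builder = VisibleTreeBuilder()
    builder.feed(html)
    builder.close()
    return _simplify(builder.root)["root"]
```

Feeding this sketch a fragment that contains a `<p style="display: none">` and a `<script>` drops both while keeping the remaining elements nested under their parent, which is the property the characteristics above describe.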

Documentation

See the docs/ folder for detailed documentation.

Installation

Install weblite using pip:

pip install weblite

Or install from source:

git clone https://github.com/steve-z-wang/weblite.git
cd weblite
pip install -e .

Requirements

  • Python 3.8 or higher
  • Playwright (automatically installed as a dependency)

After installation, you'll need to install Playwright browsers:

playwright install

Quick Start

import asyncio
from playwright.async_api import async_playwright
from weblite import PlaywrightPage

async def scrape_page():
    async with async_playwright() as p:
        browser = await p.chromium.launch()
        page = await browser.new_page()
        await page.goto("https://example.com")

        # Convert to weblite format
        web_page = PlaywrightPage(page)
        element = await web_page.to_weblite()

        if element:
            # Get the simplified representation
            result = element.to_dict(collapse_wrappers=True)
            print(result)

        await browser.close()

asyncio.run(scrape_page())
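
When the result is passed to an LLM rather than printed for debugging, serializing it as JSON is usually more convenient than the Python `repr` that `print(result)` produces. The helper below is a hypothetical sketch, not part of Weblite; it assumes the dictionary returned by `to_dict()` contains only JSON-serializable values, as the output example suggests, and the `max_chars` limit is an arbitrary illustrative default.

```python
import json


def format_for_prompt(result, max_chars=4000):
    """Serialize a result dict as indented JSON, truncating very long output.

    Hypothetical helper: assumes `result` holds only JSON-serializable values.
    """
    text = json.dumps(result, indent=2, ensure_ascii=False)
    if len(text) > max_chars:
        # Truncated text is no longer valid JSON; it is meant for prompts only.
        text = text[:max_chars] + "\n... (truncated)"
    return text
```

In the Quick Start above, `print(format_for_prompt(result))` could stand in for `print(result)`.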

Output Example

Input HTML:

<div class="wrapper">
  <h1>Products</h1>
  <div class="card">
    <h3>Laptop</h3>
    <p>$999</p>
    <button>Add to Cart</button>
  </div>
</div>

Weblite Output:

{
  "main": [
    {"h1": "Products"},
    {
      "div": [
        {"h3": "Laptop"},
        {"p": "$999"},
        {"button": "Add to Cart"}
      ]
    }
  ]
}
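
Because the output keeps the nesting, a consumer can recover the tag path to any element. The sketch below walks a structure of the shape shown above and lists the path to every element with a given tag. The function names are hypothetical, and it assumes each node is a single-key dictionary mapping a tag to either a text string or a list of children; real output may contain other shapes that this does not handle. The resulting paths are tag chains, not guaranteed-unique CSS selectors.

```python
def iter_leaves(node, path=()):
    """Yield (tag_path, text) for every text-bearing element in the tree."""
    for tag, value in node.items():
        current = path + (tag,)
        if isinstance(value, str):
            yield current, value
        elif isinstance(value, list):
            for child in value:
                if isinstance(child, dict):
                    yield from iter_leaves(child, current)
                elif isinstance(child, str):
                    # Bare text mixed in among child elements.
                    yield current, child


def find_paths(node, tag):
    """Return 'a > b > c' style tag paths for every leaf whose tag matches."""
    return [" > ".join(path) for path, _ in iter_leaves(node) if path[-1] == tag]


example = {
    "main": [
        {"h1": "Products"},
        {"div": [{"h3": "Laptop"}, {"p": "$999"}, {"button": "Add to Cart"}]},
    ]
}

print(find_paths(example, "button"))
```

For the example output above, this prints `['main > div > button']`.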
