Plasmate SOM integration for Browser Use - 10x fewer tokens for AI web agents

plasmate-browser-use

SOM-based content extraction for Browser Use. A drop-in alternative to Browser Use's default DOM serializer that uses Plasmate's Semantic Object Model (SOM) to cut token usage by roughly 10x or more.

Instead of sending the full DOM tree to your LLM, Plasmate compresses web pages into a compact semantic representation. The content an agent needs is preserved in roughly 90% fewer tokens, which lowers cost and speeds up responses.

Install

pip install plasmate-browser-use

Prerequisites

You need the plasmate binary installed:

# Via cargo
cargo install plasmate

# Or via install script
curl -fsSL https://plasmate.app/install.sh | sh

Verify it works:

plasmate --version
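
If you prefer to run this check from Python before constructing an extractor, a minimal sketch follows. The helper name plasmate_version is hypothetical and not part of this package; it only uses the standard library and the --version flag shown above.

```python
import shutil
import subprocess
from typing import Optional


def plasmate_version(binary: str = "plasmate") -> Optional[str]:
    """Return the output of `<binary> --version`, or None if it is not on PATH.

    Hypothetical helper for illustration; not part of plasmate-browser-use.
    """
    path = shutil.which(binary)
    if path is None:
        return None
    # --version is the flag used in the verification step above.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, check=False
    )
    return result.stdout.strip() or result.stderr.strip()


version = plasmate_version()
print(version if version else "plasmate not found on PATH; install it first")
```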

Quick Start

Basic extraction

from plasmate_browser_use import PlasmateExtractor

extractor = PlasmateExtractor()

# Get raw SOM data as a dict
som = extractor.extract("https://news.ycombinator.com")
print(f"Elements: {som['meta']['element_count']}")
print(f"Compression: {som['meta']['html_bytes'] / som['meta']['som_bytes']:.1f}x")
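
The ratio printed above divides by som_bytes, so an empty or failed extraction could raise ZeroDivisionError. A small guard, sketched here against a made-up meta dict (the helper name and the sample values are assumptions; only the key names come from the example above):

```python
def compression_ratio(meta: dict) -> float:
    """Return html_bytes / som_bytes, or 0.0 when the SOM size is missing or zero."""
    html_bytes = meta.get("html_bytes", 0)
    som_bytes = meta.get("som_bytes", 0)
    if som_bytes <= 0:
        return 0.0
    return html_bytes / som_bytes


# Made-up sample values, shaped like som["meta"] in the example above.
sample_meta = {"element_count": 120, "html_bytes": 50_000, "som_bytes": 4_000}
print(f"Compression: {compression_ratio(sample_meta):.1f}x")  # Compression: 12.5x
```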

Get page context for an LLM

The get_page_context() method returns a formatted string optimized for LLM consumption, with interactive elements, links, content, and compression stats:

context = extractor.get_page_context("https://example.com")
print(context)

Output:

# Example Domain
URL: https://example.com
Language: en

## Interactive Elements (1)
  [e1] link "More information..." (click)

## Content
This domain is for use in illustrative examples in documents...

---
Compression: 15.1x (1256 HTML bytes -> 83 SOM bytes)
Elements: 5 (1 interactive)

Markdown extraction

md = extractor.extract_markdown("https://example.com")
print(md)

Async support

All methods have async variants:

import asyncio

from plasmate_browser_use import PlasmateExtractor

async def main():
    extractor = PlasmateExtractor()
    context = await extractor.get_page_context_async("https://example.com")
    som = await extractor.extract_async("https://example.com")
    md = await extractor.extract_markdown_async("https://example.com")

asyncio.run(main())
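
The async variants make it possible to fetch several pages concurrently. The sketch below shows one way to do that with asyncio.gather. The helper gather_contexts and the stand-in fake_fetch are hypothetical; in real use you would pass extractor.get_page_context_async as the fetch argument.

```python
import asyncio
from typing import Awaitable, Callable, Dict, List


async def gather_contexts(
    fetch: Callable[[str], Awaitable[str]], urls: List[str]
) -> Dict[str, str]:
    """Run fetch(url) for every URL concurrently and map each URL to its result."""
    results = await asyncio.gather(*(fetch(url) for url in urls))
    return dict(zip(urls, results))


async def fake_fetch(url: str) -> str:
    # Stand-in for extractor.get_page_context_async so the sketch runs offline.
    await asyncio.sleep(0)
    return f"# context for {url}"


urls = ["https://example.com", "https://example.org"]
contexts = asyncio.run(gather_contexts(fake_fetch, urls))
for url, context in contexts.items():
    print(url, "->", context)
```

asyncio.gather returns results in the order the awaitables were passed, which is why zipping them back onto urls is safe.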

Using with a Browser Use agent

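import asyncio

from browser_use import Agent
from plasmate_browser_use import PlasmateExtractor


async def main():
    extractor = PlasmateExtractor()

    # Get compact page context instead of full DOM
    context = extractor.get_page_context("https://example.com/products")

    # Feed to your Browser Use agent with roughly 10x fewer tokens.
    # Depending on your Browser Use version, Agent may also need an llm argument.
    agent = Agent(task="Find the cheapest product", page_context=context)
    result = await agent.run()
    print(result)


# agent.run() is a coroutine, so it must be awaited inside an async function
asyncio.run(main())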

Token savings comparison

from plasmate_browser_use import PlasmateExtractor, token_count_comparison

extractor = PlasmateExtractor()
som = extractor.extract("https://news.ycombinator.com")
stats = token_count_comparison(som)

print(f"HTML tokens: ~{stats['html_tokens_est']:,}")
print(f"SOM tokens:  ~{stats['som_tokens_est']:,}")
print(f"Savings:     {stats['token_savings_pct']}%")
print(f"Ratio:       {stats['token_ratio']}x fewer tokens")
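
To make the four reported fields concrete, here is a rough sketch of how such estimates could be derived from byte counts. It assumes a common rule of thumb of about 4 bytes per token; the real token_count_comparison may use a different method, and rough_token_comparison is a hypothetical name that only mirrors the key names shown above.

```python
def estimate_tokens(num_bytes: int, bytes_per_token: float = 4.0) -> int:
    """Very rough token estimate: assumes ~4 bytes per token (an assumption)."""
    return max(1, round(num_bytes / bytes_per_token))


def rough_token_comparison(html_bytes: int, som_bytes: int) -> dict:
    """Hypothetical stand-in that mirrors the keys printed in the example above."""
    html_tokens = estimate_tokens(html_bytes)
    som_tokens = estimate_tokens(som_bytes)
    return {
        "html_tokens_est": html_tokens,
        "som_tokens_est": som_tokens,
        "token_savings_pct": round(100 * (1 - som_tokens / html_tokens), 1),
        "token_ratio": round(html_tokens / som_tokens, 1),
    }


# Made-up byte counts chosen for illustration.
stats = rough_token_comparison(html_bytes=88_000, som_bytes=4_800)
print(stats)
```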

Typical token savings

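| Site                  | HTML tokens | SOM tokens | Reduction |
|-----------------------|-------------|------------|-----------|
| Hacker News           | ~22,000     | ~1,200     | 18x       |
| Wikipedia article     | ~85,000     | ~8,500     | 10x       |
| Amazon product page   | ~120,000    | ~6,000     | 20x       |
| Google search results | ~45,000     | ~3,500     | 13x       |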

Numbers vary by page. The more complex the page (ads, trackers, layout noise), the bigger the savings.
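
The Reduction column is simply HTML tokens divided by SOM tokens. The sketch below recomputes it from the approximate figures in the table and also expresses each row as a percentage saved; the function name reduction is introduced here for illustration only.

```python
def reduction(html_tokens: int, som_tokens: int) -> tuple:
    """Return (ratio, percent_saved) for one page."""
    ratio = html_tokens / som_tokens
    percent_saved = 100 * (1 - som_tokens / html_tokens)
    return ratio, percent_saved


# Approximate figures copied from the table above.
rows = [
    ("Hacker News", 22_000, 1_200),
    ("Wikipedia article", 85_000, 8_500),
    ("Amazon product page", 120_000, 6_000),
    ("Google search results", 45_000, 3_500),
]

for site, html_tokens, som_tokens in rows:
    ratio, saved = reduction(html_tokens, som_tokens)
    print(f"{site:<22} {ratio:5.1f}x  {saved:5.1f}% fewer tokens")
```

A 10x reduction corresponds to 90% fewer tokens, and a 20x reduction to 95%.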

How it works

  1. Plasmate fetches the page and parses the HTML
  2. The DOM is compiled into a Semantic Object Model (SOM) that preserves meaning while stripping layout noise
  3. The SOM is serialized into a compact format with tagged interactive elements
  4. Your LLM agent sees the same page information in roughly 10x fewer tokens
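
To give a feel for steps 2 and 3, here is a deliberately tiny sketch built on Python's standard html.parser. It is not Plasmate's algorithm or its SOM format: it drops script and style content, tags links and buttons with ids such as e1, and keeps the remaining visible text. The class and function names are made up for this illustration.

```python
from html.parser import HTMLParser

SKIP_TAGS = {"script", "style", "noscript"}        # layout/tracking noise to drop
INTERACTIVE_TAGS = {"a": "link", "button": "button"}


class ToySemanticExtractor(HTMLParser):
    """Toy illustration only; the real SOM compiler is far more thorough."""

    def __init__(self):
        super().__init__()
        self.interactive = []   # list of (id, role, label)
        self.text = []          # visible non-interactive text chunks
        self._skip_depth = 0    # > 0 while inside a skipped tag
        self._current = None    # role of the interactive element being read
        self._label = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1
        elif tag in INTERACTIVE_TAGS and self._current is None:
            self._current = INTERACTIVE_TAGS[tag]
            self._label = []

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth:
            self._skip_depth -= 1
        elif self._current is not None and INTERACTIVE_TAGS.get(tag) == self._current:
            element_id = f"e{len(self.interactive) + 1}"
            self.interactive.append((element_id, self._current, " ".join(self._label)))
            self._current = None

    def handle_data(self, data):
        chunk = " ".join(data.split())  # collapse whitespace
        if not chunk or self._skip_depth:
            return
        if self._current is not None:
            self._label.append(chunk)
        else:
            self.text.append(chunk)


def compact(html: str) -> str:
    """Serialize tagged interactive elements first, then the visible text."""
    parser = ToySemanticExtractor()
    parser.feed(html)
    parser.close()
    lines = [f'[{eid}] {role} "{label}"' for eid, role, label in parser.interactive]
    lines.append(" ".join(parser.text))
    return "\n".join(lines)


html = """<html><head><style>body{margin:0}</style></head>
<body><div class="wrap"><p>This domain is for examples.</p>
<a href="https://example.org">More information...</a></div>
<script>track()</script></body></html>"""

som_text = compact(html)
print(som_text)
print(f"{len(html)} HTML chars -> {len(som_text)} compact chars")
```

Even this toy version shows where the savings come from: markup, attributes, styles, and scripts account for most of the input, while the text and the actionable elements are a small fraction of it.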

License

Apache-2.0
