Plasmate SOM integration for Browser Use - 10x fewer tokens for AI web agents
plasmate-browser-use
SOM-based content extraction for Browser Use. Drop-in alternative to Browser Use's default DOM serializer that uses Plasmate's Semantic Object Model (SOM) to reduce token costs by 10x or more.
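Instead of sending the full DOM tree to your LLM, Plasmate compresses web pages into a compact semantic representation that keeps the content and interactive elements an agent needs while dropping layout noise. That typically means around 90% fewer tokens (about 10x), which lowers costs and can speed up responses.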
Install
```bash
pip install plasmate-browser-use
```
Prerequisites
You need the plasmate binary installed:
```bash
# Via cargo
cargo install plasmate

# Or via install script
curl -fsSL https://plasmate.app/install.sh | sh
```
Verify it works:
```bash
plasmate --version
```
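If you want your own script to fail fast when the binary is missing, a small standard-library check is enough. This is a sketch, not part of plasmate-browser-use: the helper name `plasmate_version` is hypothetical, and it only assumes that `plasmate --version` behaves as shown above.

```python
import shutil
import subprocess
from typing import Optional


def plasmate_version(binary: str = "plasmate") -> Optional[str]:
    """Return the output of `<binary> --version`, or None if the binary is not on PATH.

    Hypothetical helper; plasmate-browser-use does not define it.
    """
    path = shutil.which(binary)
    if path is None:
        return None
    # check=True raises CalledProcessError if the binary exits with a non-zero status.
    result = subprocess.run([path, "--version"], capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

For example, call `plasmate_version()` at startup and exit with an install hint if it returns `None`.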
Quick Start
Basic extraction
```python
from plasmate_browser_use import PlasmateExtractor

extractor = PlasmateExtractor()

# Get raw SOM data as a dict
som = extractor.extract("https://news.ycombinator.com")
print(f"Elements: {som['meta']['element_count']}")
print(f"Compression: {som['meta']['html_bytes'] / som['meta']['som_bytes']:.1f}x")
```
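The compression line above divides by `som_bytes`, so it would raise `ZeroDivisionError` if that value were ever 0 (whether the SOM can actually report 0 bytes, for example on an empty page, is an assumption). A defensive variant that relies only on the three `meta` keys used above; `summarize_som_meta` is a hypothetical helper, not part of the package:

```python
from typing import Any, Mapping


def summarize_som_meta(som: Mapping[str, Any]) -> str:
    """One-line summary of a SOM dict's meta block (hypothetical helper).

    Assumes only the keys shown in this README: element_count, html_bytes, som_bytes.
    """
    meta = som.get("meta", {})
    html_bytes = meta.get("html_bytes", 0)
    som_bytes = meta.get("som_bytes", 0)
    # Avoid dividing by zero when the SOM is empty.
    ratio = f"{html_bytes / som_bytes:.1f}x" if som_bytes else "n/a"
    return f"{meta.get('element_count', 0)} elements, {ratio} compression ({html_bytes} -> {som_bytes} bytes)"
```

Usage: `print(summarize_som_meta(extractor.extract("https://news.ycombinator.com")))`.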
Get page context for an LLM
The get_page_context() method returns a formatted string optimized for LLM consumption, with interactive elements, links, content, and compression stats:
```python
context = extractor.get_page_context("https://example.com")
print(context)
```
Output:
```text
# Example Domain
URL: https://example.com
Language: en

## Interactive Elements (1)
[e1] link "More information..." (click)

## Content
This domain is for use in illustrative examples in documents...

---
Compression: 15.2x (1256 HTML bytes -> 83 SOM bytes)
Elements: 5 (1 interactive)
```
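The `[e1]`-style references make it possible to map an LLM's choice back to a concrete element. A sketch of pulling those lines out of the context string, assuming the line format matches the sample output above (`[ref] role "label" (action)`); the real format may differ, and `parse_interactive_elements` is a hypothetical helper:

```python
import re
from typing import List, NamedTuple

# Matches lines like: [e1] link "More information..." (click)
# Assumption: format inferred from the sample output, not from a spec.
ELEMENT_LINE = re.compile(
    r'^\[(?P<ref>e\d+)\]\s+(?P<role>\S+)\s+"(?P<label>.*)"\s+\((?P<action>[^)]+)\)$'
)


class InteractiveElement(NamedTuple):
    ref: str
    role: str
    label: str
    action: str


def parse_interactive_elements(context: str) -> List[InteractiveElement]:
    """Collect every interactive-element line from a get_page_context() string."""
    elements = []
    for line in context.splitlines():
        match = ELEMENT_LINE.match(line.strip())
        if match:
            elements.append(InteractiveElement(**match.groupdict()))
    return elements
```

Headings such as `## Interactive Elements (1)` do not match because they don't start with a `[eN]` reference.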
Markdown extraction
```python
md = extractor.extract_markdown("https://example.com")
print(md)
```
Async support
Each method above has an async variant (same name plus `_async`):
```python
import asyncio
from plasmate_browser_use import PlasmateExtractor

async def main():
    extractor = PlasmateExtractor()
    context = await extractor.get_page_context_async("https://example.com")
    som = await extractor.extract_async("https://example.com")
    md = await extractor.extract_markdown_async("https://example.com")

asyncio.run(main())
```
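The async variants make it easy to prepare context for several pages at once. A minimal sketch using `asyncio.gather` with a semaphore to cap concurrency; `gather_contexts` and the default limit of 4 are illustrative choices, and how well the `plasmate` binary handles many concurrent runs is an assumption you should check:

```python
import asyncio
from typing import Awaitable, Callable, Dict, Iterable, Tuple


async def gather_contexts(
    fetch: Callable[[str], Awaitable[str]],
    urls: Iterable[str],
    max_concurrency: int = 4,
) -> Dict[str, str]:
    """Run fetch(url) for every URL with at most max_concurrency calls in flight.

    Hypothetical helper; pass e.g. extractor.get_page_context_async as `fetch`.
    """
    if max_concurrency < 1:
        raise ValueError("max_concurrency must be at least 1")
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> Tuple[str, str]:
        async with semaphore:  # wait for a free slot before starting this fetch
            return url, await fetch(url)

    pairs = await asyncio.gather(*(fetch_one(url) for url in urls))
    return dict(pairs)
```

Usage inside an async function: `contexts = await gather_contexts(extractor.get_page_context_async, ["https://example.com", "https://news.ycombinator.com"])`.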
Using with a Browser Use agent
```python
import asyncio
from browser_use import Agent
from plasmate_browser_use import PlasmateExtractor

async def main():
    extractor = PlasmateExtractor()
    # Get compact page context instead of the full DOM
    context = await extractor.get_page_context_async("https://example.com/products")
    # `page_context` is the keyword this README uses; check that your Browser Use
    # version accepts it (Agent typically also takes an `llm=` argument).
    agent = Agent(task="Find the cheapest product", page_context=context)
    return await agent.run()

result = asyncio.run(main())
```
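If your Browser Use version doesn't accept a `page_context` argument, one alternative is to put the context into the task text itself, with a size cap so a very large page can't exhaust the prompt budget. `build_task_with_context` and its 20,000-character default are illustrative choices, not part of Browser Use or plasmate-browser-use:

```python
def build_task_with_context(task: str, page_context: str, max_chars: int = 20_000) -> str:
    """Append Plasmate page context to an agent task, truncating overly long context.

    Hypothetical helper: neither Browser Use nor plasmate-browser-use defines it.
    """
    if max_chars < 0:
        raise ValueError("max_chars must be non-negative")
    if len(page_context) > max_chars:
        # Keep the start of the context, which holds the title and interactive elements.
        page_context = page_context[:max_chars] + "\n[... page context truncated ...]"
    return f"{task}\n\nCurrent page (Plasmate SOM context):\n{page_context}"
```

Pass the returned string as the agent's `task=` argument.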
Token savings comparison
```python
from plasmate_browser_use import PlasmateExtractor, token_count_comparison

extractor = PlasmateExtractor()
som = extractor.extract("https://news.ycombinator.com")
stats = token_count_comparison(som)

print(f"HTML tokens: ~{stats['html_tokens_est']:,}")
print(f"SOM tokens: ~{stats['som_tokens_est']:,}")
print(f"Savings: {stats['token_savings_pct']}%")
print(f"Ratio: {stats['token_ratio']}x fewer tokens")
```
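The `_est` suffix suggests these are estimates rather than exact tokenizer counts. A sketch of how such figures could be derived from byte counts, assuming the common rough heuristic of about 4 bytes per token; this is not the package's implementation, its estimator may differ, and the dict keys are reused here only for readability:

```python
def estimate_tokens(num_bytes: int, bytes_per_token: float = 4.0) -> int:
    """Rough token estimate from a byte count (assumes ~4 bytes per token)."""
    return round(num_bytes / bytes_per_token)


def token_comparison_sketch(html_bytes: int, som_bytes: int, bytes_per_token: float = 4.0) -> dict:
    """Hypothetical re-creation of the stats above from raw byte counts."""
    html_tokens = estimate_tokens(html_bytes, bytes_per_token)
    som_tokens = max(1, estimate_tokens(som_bytes, bytes_per_token))  # avoid dividing by zero
    savings = 100 * (1 - som_tokens / html_tokens) if html_tokens else 0.0
    return {
        "html_tokens_est": html_tokens,
        "som_tokens_est": som_tokens,
        "token_savings_pct": round(savings, 1),
        "token_ratio": round(html_tokens / som_tokens, 1),
    }
```

For example, 88,000 HTML bytes and 4,800 SOM bytes come out at 22,000 vs 1,200 estimated tokens, a 94.5% saving (18.3x).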
Typical token savings
| Site | HTML tokens | SOM tokens | Reduction |
|---|---|---|---|
| Hacker News | ~22,000 | ~1,200 | 18x |
| Wikipedia article | ~85,000 | ~8,500 | 10x |
| Amazon product page | ~120,000 | ~6,000 | 20x |
| Google search results | ~45,000 | ~3,500 | 13x |
Numbers vary by page. The more complex the page (ads, trackers, layout noise), the bigger the savings.
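The Reduction column is simply HTML tokens divided by SOM tokens, and a ratio of r corresponds to a saving of 100 × (1 − 1/r) percent, so 10x means 90% fewer tokens. A quick check against the table's own numbers:

```python
# (site, HTML tokens, SOM tokens) copied from the table above.
ROWS = [
    ("Hacker News", 22_000, 1_200),
    ("Wikipedia article", 85_000, 8_500),
    ("Amazon product page", 120_000, 6_000),
    ("Google search results", 45_000, 3_500),
]


def reduction(html_tokens: int, som_tokens: int) -> tuple:
    """Return (ratio, percent of tokens saved) for one row."""
    ratio = html_tokens / som_tokens
    savings_pct = 100 * (1 - som_tokens / html_tokens)
    return ratio, savings_pct


for site, html_tokens, som_tokens in ROWS:
    ratio, pct = reduction(html_tokens, som_tokens)
    print(f"{site}: {ratio:.0f}x fewer tokens ({pct:.0f}% saved)")
```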
How it works
- Plasmate fetches the page and parses the HTML
- The DOM is compiled into a Semantic Object Model (SOM) that preserves meaning while stripping layout noise
- The SOM is serialized into a compact format with tagged interactive elements
- Your LLM agent sees the page's content and interactive elements in roughly 10x fewer tokens
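To make the idea concrete, here is a toy illustration of the steps above using only Python's `html.parser`: it drops script/style noise, tags links and buttons with `[eN]` references, and keeps the visible text. This is not Plasmate's SOM compiler (which is far more thorough and follows the SOM spec); the tag-to-role mapping and output layout are assumptions chosen to mirror the sample output earlier:

```python
from html.parser import HTMLParser

# Tags whose contents are treated as layout/tracking noise in this toy sketch.
NOISE_TAGS = {"script", "style", "noscript", "svg"}
# Tiny assumed mapping from HTML tags to roles; Plasmate's real rules differ.
ROLE_FOR_TAG = {"a": "link", "button": "button"}


class ToySomCompiler(HTMLParser):
    def __init__(self):
        super().__init__()
        self.noise_depth = 0   # >0 while inside a noise tag
        self.current = None    # [role, text parts] of the interactive element being read
        self.elements = []     # (ref, role, label) tuples
        self.text = []         # visible non-interactive text chunks

    def handle_starttag(self, tag, attrs):
        if tag in NOISE_TAGS:
            self.noise_depth += 1
        elif tag in ROLE_FOR_TAG and self.noise_depth == 0:
            self.current = [ROLE_FOR_TAG[tag], []]

    def handle_endtag(self, tag):
        if tag in NOISE_TAGS and self.noise_depth > 0:
            self.noise_depth -= 1
        elif tag in ROLE_FOR_TAG and self.current is not None:
            role, parts = self.current
            ref = f"e{len(self.elements) + 1}"
            self.elements.append((ref, role, " ".join(parts)))
            self.current = None

    def handle_data(self, data):
        if self.noise_depth:
            return
        chunk = " ".join(data.split())  # collapse whitespace
        if not chunk:
            return
        if self.current is not None:
            self.current[1].append(chunk)
        else:
            self.text.append(chunk)


def compile_toy_som(html: str) -> str:
    """Return tagged interactive elements followed by the page text."""
    parser = ToySomCompiler()
    parser.feed(html)
    parser.close()
    lines = [f'[{ref}] {role} "{label}" (click)' for ref, role, label in parser.elements]
    lines.append(" ".join(parser.text))
    return "\n".join(lines)
```

Comparing `len(html)` with `len(compile_toy_som(html))` on a real page gives a feel for where the savings come from.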
Links
- Plasmate -- the SOM engine
- SOM Spec -- Semantic Object Model specification
- Browser Use -- AI agent browser framework
- Token cost analysis -- detailed benchmarks
License
Apache-2.0
Download files
File details
Details for the file plasmate_browser_use-0.5.0.tar.gz.
File metadata
- Download URL: plasmate_browser_use-0.5.0.tar.gz
- Upload date:
- Size: 5.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 10446c4f4969ffed94206e999856afa4c2af2ba9eac859f095e4793001448d4f |
| MD5 | 1bee0f73fc2c2db148d1036df6f48e1e |
| BLAKE2b-256 | 69fbc46f9cc4014d2e35b1f8416f07b89b7294ebbff5ef639a48495f7acfb5ad |
File details
Details for the file plasmate_browser_use-0.5.0-py3-none-any.whl.
File metadata
- Download URL: plasmate_browser_use-0.5.0-py3-none-any.whl
- Upload date:
- Size: 6.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.14.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 882edce686b2091b1e048ca9d0f35f7936a9a85133094f40978c985c1d300591 |
| MD5 | a1f390f6c09bed032182dfe1141da90a |
| BLAKE2b-256 | 5721f51cd7adeae5a5ef552700e5a826c0aa0f45c6523cbefb982e2ee6cd2c71 |