Skip to main content

LangChain integration for Plasmate — agent-native headless browser with SOM output

Project description

langchain-plasmate

LangChain integration for Plasmate — an agent-native headless browser that returns SOM (Semantic Object Model) output instead of raw HTML.

SOM compiles web pages into compact, structured representations that preserve interactive elements, content hierarchy, and navigation landmarks while stripping scripts, styles, and layout noise. This typically saves ~10x tokens compared to raw HTML.

Installation

pip install langchain-plasmate

Requires the plasmate binary on your PATH. See the main Plasmate README for installation.

Quick Start

Fetch a page (stateless)

from langchain_plasmate import PlasmateFetchTool

fetch = PlasmateFetchTool()
result = fetch.invoke("https://news.ycombinator.com")
print(result)

Output:

Page: Hacker News
URL: https://news.ycombinator.com

## navigation "Main menu"
  [e_a1b2c3d4e5f6] link "Hacker News" -> /
  [e_b2c3d4e5f6a7] link "new" -> /newest
  [e_c3d4e5f6a7b8] link "past" -> /front
  ...

## main
  h1: Hacker News
  [e_d4e5f6a7b8c9] link "Show HN: Something Cool" -> https://example.com
  [e_e5f6a7b8c9d0] link "42 comments" -> /item?id=12345
  ...

---
87,234 → 4,521 bytes (19.3x) | 156 elements, 89 interactive

Agent with browsing tools

from langchain.agents import AgentExecutor, create_tool_calling_agent
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI
from langchain_plasmate import get_plasmate_tools

tools = get_plasmate_tools()
llm = ChatOpenAI(model="gpt-4o")

prompt = ChatPromptTemplate.from_messages([
    ("system", "You can browse the web using Plasmate tools."),
    ("human", "{input}"),
    ("placeholder", "{agent_scratchpad}"),
])

agent = create_tool_calling_agent(llm, tools, prompt)
executor = AgentExecutor(agent=agent, tools=tools)

result = executor.invoke({
    "input": "Go to Hacker News and tell me the top 3 stories"
})
print(result["output"])

Document loader

from langchain_plasmate import PlasmateSOMLoader

loader = PlasmateSOMLoader([
    "https://example.com",
    "https://news.ycombinator.com",
])
docs = loader.load()

for doc in docs:
    print(f"{doc.metadata['title']}{doc.metadata['element_count']} elements")
    print(doc.page_content[:200])
    print()

Token Comparison

Source Hacker News Example.com Typical News Article
Raw HTML (WebBaseLoader) ~22,000 tokens ~400 tokens ~45,000 tokens
SOM (PlasmateSOMLoader) ~1,500 tokens ~80 tokens ~3,000 tokens
Savings ~15x ~5x ~15x

SOM preserves all interactive elements, headings, and content structure while stripping scripts, styles, hidden elements, and layout-only markup.

Tools Reference

PlasmateFetchTool

Stateless page fetch — returns SOM text for a single URL.

Field Value
Name plasmate_fetch
Input url (string)
Output SOM text with element IDs

PlasmateNavigateTool

Opens a URL in a persistent browser session. Subsequent click/type calls target this page.

Field Value
Name plasmate_navigate
Input url (string)
Output SOM text with element IDs

PlasmateClickTool

Clicks an interactive element by its SOM element ID (e.g., e_a1b2c3d4e5f6).

Field Value
Name plasmate_click
Input element_id (string)
Output Updated SOM text

PlasmateTypeTool

Types text into a form input or textarea by its SOM element ID.

Field Value
Name plasmate_type
Input element_id (string), text (string)
Output Updated SOM text

get_plasmate_tools()

Returns all four tools with a shared Plasmate client and browser session.

from langchain_plasmate import get_plasmate_tools
from plasmate import Plasmate

# Default client
tools = get_plasmate_tools()

# Custom client
client = Plasmate(binary="/path/to/plasmate", timeout=60)
tools = get_plasmate_tools(client=client)

Loader Reference

PlasmateSOMLoader

Loads web pages as LangChain Document objects with SOM text as content.

PlasmateSOMLoader(
    urls=["https://example.com"],
    budget=2000,          # optional token budget per page
    javascript=True,      # enable JS execution (default)
    client=None,          # optional Plasmate instance
)

Document metadata includes: url, title, lang, html_bytes, som_bytes, element_count, interactive_count, and optionally description, open_graph, json_ld.

Configuration

All tools accept an optional Plasmate client instance for custom configuration:

from plasmate import Plasmate
from langchain_plasmate import PlasmateFetchTool

client = Plasmate(binary="/opt/plasmate/bin/plasmate", timeout=60)
fetch = PlasmateFetchTool(client=client)

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

langchain_plasmate-0.5.0.tar.gz (8.4 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

langchain_plasmate-0.5.0-py3-none-any.whl (9.2 kB view details)

Uploaded Python 3

File details

Details for the file langchain_plasmate-0.5.0.tar.gz.

File metadata

  • Download URL: langchain_plasmate-0.5.0.tar.gz
  • Upload date:
  • Size: 8.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.14.3

File hashes

Hashes for langchain_plasmate-0.5.0.tar.gz
Algorithm Hash digest
SHA256 2e902b89c27a39e0557177df3d0652999402d7a665bbcc8401fb263b6d1f5e4a
MD5 0d1f230cc03c71e94e8f00c2fdf28fc0
BLAKE2b-256 3149cace4168aa0b5dccb1c6ea5bf4c18f4cacf4284655000462b9ed0d1cd657

See more details on using hashes here.

File details

Details for the file langchain_plasmate-0.5.0-py3-none-any.whl.

File metadata

File hashes

Hashes for langchain_plasmate-0.5.0-py3-none-any.whl
Algorithm Hash digest
SHA256 f1fe880ec2b1345b4ad874662d2c38a257bb1044d3b6a4562fe1caff2f62311f
MD5 bda7ba79e67a25ce8866ce875fda3b90
BLAKE2b-256 63991bcd8d13f1b1341662aebe5d30036d559071aeb400bf46307100fde42f2a

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page