
Deterministic semantic runtime over live Chromium for LLM planners


Semantic Browser


Version 1.3.1 (Beta) · PyPI · Changelog · License: MIT

Semantic Browser turns live Chromium pages into compact semantic "rooms" for LLM planners. The planner sees a text-adventure description of the page, picks one action ID, and the runtime executes it deterministically.

@ BBC News (bbc.co.uk)
> Home page. Main content: "Top stories". Navigation: News, Sport, Weather.
! Cookie consent banner detected -> dismiss [act-a1b2c3d4-0]
1 open "News" [act-8f2a2d1c-0]
2 open "Sport" [act-c3e119fa-0]
3 fill Search BBC [act-0b9411de-0] *value
+ 28 more [more]

Less confusion, less hallucination, dramatically less cost.
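The room text shown above is line-oriented, so a planner harness can recover the numbered actions with a small parser. The following is a minimal sketch, not part of the library: the `RoomAction` class, the `parse_room_actions` helper, and the reading of `*value` as "this action needs a value" are assumptions inferred from the example room.

```python
import re
from dataclasses import dataclass

# Matches numbered action lines such as:
#   3 fill Search BBC [act-0b9411de-0] *value
# The line grammar is inferred from the example room, not from a published spec.
ACTION_LINE = re.compile(
    r'^(?P<index>\d+)\s+(?P<op>\w+)\s+(?P<label>.*?)\s*\[(?P<id>[^\]]+)\](?P<flags>.*)$'
)


@dataclass
class RoomAction:
    """Hypothetical container for one numbered action line."""
    index: int
    op: str
    label: str
    action_id: str
    needs_value: bool  # assumption: "*value" marks actions that take input


def parse_room_actions(room_text: str) -> list[RoomAction]:
    """Return the numbered actions found in a room description."""
    actions = []
    for line in room_text.splitlines():
        match = ACTION_LINE.match(line.strip())
        if match:
            actions.append(
                RoomAction(
                    index=int(match["index"]),
                    op=match["op"],
                    label=match["label"].strip('"'),
                    action_id=match["id"],
                    needs_value="*value" in match["flags"],
                )
            )
    return actions


ROOM = """@ BBC News (bbc.co.uk)
> Home page. Main content: "Top stories". Navigation: News, Sport, Weather.
! Cookie consent banner detected -> dismiss [act-a1b2c3d4-0]
1 open "News" [act-8f2a2d1c-0]
2 open "Sport" [act-c3e119fa-0]
3 fill Search BBC [act-0b9411de-0] *value
+ 28 more [more]"""

for action in parse_room_actions(ROOM):
    print(action)
```

Lines that do not start with a number (the `@` header, the `>` summary, the `!` blocker notice, and the `+ … [more]` line) are skipped by this sketch; a fuller harness would probably handle the blocker line separately.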

Why Semantic Browser

  • Plain-text room descriptions — prose, not JSON soup.
  • Curated action surface — top 25 actions, with a "more" action for progressive disclosure.
  • Deterministic execution — observe → act → observe delta, every time.
  • Built-in blockers — cookie banners, modals, and anti-bot gates are detected and signaled.
  • Token-efficient — median planner input of ~540 tokens vs ~10,000 for standard tooling.
  • Three interfaces — Python API, CLI, and HTTP service.

Install

pip install "semantic-browser[managed]"
semantic-browser install-browser

For service mode: pip install "semantic-browser[server]"

Quickstart

Interactive portal

semantic-browser portal --url https://example.com --headless

Python

import asyncio
from semantic_browser import ManagedSession
from semantic_browser.models import ActionRequest

async def main() -> None:
    session = await ManagedSession.launch(headful=False)
    runtime = session.runtime

    await runtime.navigate("https://example.com")
    obs = await runtime.observe(mode="summary")
    print(obs.planner.room_text)

    first_link = next((a for a in obs.available_actions if a.op == "open"), None)
    if first_link:
        result = await runtime.act(ActionRequest(action_id=first_link.id))
        print(result.status, result.observation.page.url)

    await session.close()

asyncio.run(main())

LLM Agent Loop (Minimal)

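from semantic_browser import ManagedSession
from semantic_browser.models import ActionRequest

async def agent_loop(url: str, task: str) -> None:
    session = await ManagedSession.launch(headful=False)
    runtime = session.runtime
    try:
        await runtime.navigate(url)
        obs = await runtime.observe(mode="summary")

        for _ in range(25):  # hard cap on planner steps
            # call_your_llm is a placeholder you supply; it returns one action ID or "done"
            action_id = call_your_llm(obs.planner.room_text, task)
            if action_id == "done":
                break
            result = await runtime.act(ActionRequest(action_id=action_id))
            obs = result.observation
    finally:
        await session.close()  # release the browser even if a step raises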

Full worked examples for OpenAI, Anthropic, and more: Integration Examples
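The planner call in the loop is left to the reader. As a rough illustration of what surrounds the model call, the sketch below builds a prompt from the room text and checks that the model's reply is an ID that actually appears in the room. Both helpers (`build_prompt`, `extract_action_id`) and the prompt wording are hypothetical and not part of Semantic Browser; the model call itself is omitted.

```python
import re
from typing import Optional


def build_prompt(room_text: str, task: str) -> str:
    """Assemble a single-turn prompt (wording is illustrative only)."""
    return (
        f"Task: {task}\n\n"
        f"Current page:\n{room_text}\n\n"
        'Reply with exactly one action ID from the brackets above, or "done".'
    )


def extract_action_id(reply: str, room_text: str) -> Optional[str]:
    """Return the reply if it is an ID present in the room (or "done"), else None."""
    valid_ids = set(re.findall(r"\[([^\]]+)\]", room_text))
    valid_ids.add("done")
    # Tolerate surrounding whitespace, brackets, quotes, backticks, and a trailing period.
    candidate = reply.strip().strip("[]\"'`.").strip()
    return candidate if candidate in valid_ids else None


room = '1 open "News" [act-8f2a2d1c-0]\n2 open "Sport" [act-c3e119fa-0]\n+ 28 more [more]'
print(build_prompt(room, "Open the sport section"))
print(extract_action_id("[act-c3e119fa-0]", room))
print(extract_action_id("act-deadbeef-0", room))
```

Rejecting IDs that are not in the current room keeps a hallucinated ID from ever reaching the runtime; what to do on rejection (retry, re-observe, or stop) is a design choice for the harness.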

Documentation

Document | What it covers
Getting Started | Install, first run, interactive portal, Python/CLI/service quickstarts
Planner Contract | The exact interface between Semantic Browser and an LLM planner: what the planner receives, what it should reply, how to handle blockers, failures, and stopping
Integration Examples | End-to-end examples: OpenAI chat, OpenAI function-calling, Anthropic tool use, HTTP service, CDP attach, error handling patterns
API Reference | Every public class, method, model, and field: ManagedSession, SemanticBrowserRuntime, Observation, StepResult, ActionDescriptor, configuration, errors
Runtime Modes | Decision table for ephemeral/persistent/clone/attach/service modes, headful vs headless, ownership semantics
Real Profiles | Using real Chromium profiles for login persistence, SSO, clone mode, safety guarantees, common pitfalls
Benchmark Protocol | How benchmark numbers are produced and validated
Versioning | Version numbering scheme
Publishing | PyPI publish checklist
Changelog | Full release history

How It Works

Live page → extract semantic tree → group into regions → curate actions → render room text
                                                                              ↓
                                                              LLM planner picks action ID
                                                                              ↓
                                                              runtime resolves & executes
                                                                              ↓
                                                              observe delta → repeat
  1. Observe — the runtime extracts the page's semantic structure, groups it into regions, curates the top actions, and renders a text-adventure "room".
  2. Plan — the LLM planner reads the room text and replies with one action ID.
  3. Act — the runtime resolves the action to a DOM element, executes it, waits for the page to settle, and produces a delta observation.
  4. Repeat — the planner sees the delta and picks the next action.
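The four steps above can be exercised without a browser by swapping in an in-memory stand-in. The sketch below is purely illustrative: `FakePage`, `FakeRuntime`, and `keyword_planner` are hypothetical names, the fake returns a full room after each action rather than a delta, and a keyword match stands in for the LLM.

```python
from dataclasses import dataclass


@dataclass
class FakePage:
    title: str
    links: dict[str, str]  # action ID -> key of the destination page


class FakeRuntime:
    """In-memory stand-in for the browser runtime (not the library's API)."""

    def __init__(self, pages: dict[str, FakePage], start: str) -> None:
        self.pages = pages
        self.current = start

    def observe(self) -> str:
        """Render the current page as a tiny room description."""
        page = self.pages[self.current]
        lines = [f"@ {page.title}"]
        for n, (action_id, target) in enumerate(page.links.items(), start=1):
            lines.append(f'{n} open "{self.pages[target].title}" [{action_id}]')
        return "\n".join(lines)

    def act(self, action_id: str) -> str:
        """Follow a link by action ID and return the new room."""
        page = self.pages[self.current]
        if action_id not in page.links:
            raise KeyError(f"unknown action ID: {action_id}")
        self.current = page.links[action_id]
        return self.observe()


def keyword_planner(room_text: str, goal: str) -> str:
    """Toy planner: stop once the goal is in the title, else open a matching link."""
    lines = room_text.splitlines()
    if goal in lines[0]:
        return "done"
    for line in lines[1:]:
        if goal in line:
            return line[line.rindex("[") + 1 : line.rindex("]")]
    return "done"  # nothing relevant on this page; give up


def run(runtime: FakeRuntime, goal: str, max_steps: int = 5) -> list[str]:
    trace = []
    room = runtime.observe()                     # 1. Observe
    for _ in range(max_steps):
        action_id = keyword_planner(room, goal)  # 2. Plan
        if action_id == "done":
            break
        trace.append(action_id)
        room = runtime.act(action_id)            # 3. Act, then 4. Repeat
    return trace


pages = {
    "home": FakePage("Home", {"act-news-0": "news", "act-sport-0": "sport"}),
    "news": FakePage("News", {"act-home-0": "home"}),
    "sport": FakePage("Sport", {"act-home-0": "home"}),
}
runtime = FakeRuntime(pages, start="home")
print(run(runtime, goal="Sport"), runtime.current)
```

A stub like this is mainly useful for unit-testing planner logic and step limits before pointing the same loop at a live session.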

Benchmarks

Cross-method comparison on a shared 25-task pack:

Method                   | Success      | Median planner input (tokens) | Median planner output (tokens) | Indicative cost/request (USD)
Standard browser tooling | 24% (6/25)   | 10,118                        | 74                             | $0.041
OpenClaw browser tooling | 72% (18/25)  | 6,833                         | 66                             | $0.022
Semantic Browser         | 100% (25/25) | 540                           | 14                             | $0.004

At 5 tasks/day over a year: ~$75/year standard vs ~$7/year Semantic Browser.

These are reference harness results, not universal guarantees. Protocol: docs/benchmark_protocol.md. Manifest: benchmarks/manifest.json.
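The yearly figures follow directly from the per-request costs in the table. The short calculation below reproduces them, assuming (as the yearly estimate appears to) that one task corresponds to one billed request and that a year has 365 days; the function and constant names are illustrative.

```python
# Indicative cost per request in USD, copied from the benchmark table.
COST_PER_REQUEST_USD = {
    "Standard browser tooling": 0.041,
    "OpenClaw browser tooling": 0.022,
    "Semantic Browser": 0.004,
}

TASKS_PER_DAY = 5
DAYS_PER_YEAR = 365


def annual_cost(cost_per_request: float) -> float:
    """Yearly cost, assuming one request per task."""
    return cost_per_request * TASKS_PER_DAY * DAYS_PER_YEAR


for method, cost in COST_PER_REQUEST_USD.items():
    print(f"{method}: ~${annual_cost(cost):.0f}/year")
```

Changing `TASKS_PER_DAY` scales every row linearly, so the ratio between methods stays the same at any volume.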

CLI Reference

semantic-browser version                # Show version
semantic-browser doctor                 # Verify installation
semantic-browser install-browser        # Download Chromium
semantic-browser launch --headless      # Start a session
semantic-browser attach --cdp <ws-url>  # Attach to running Chrome
semantic-browser portal --url <url>     # Interactive exploration REPL
semantic-browser observe --session <id> --mode summary
semantic-browser act --session <id> --action <action_id>
semantic-browser inspect --session <id> --target <target_id>
semantic-browser navigate --session <id> --url <url>
semantic-browser back --session <id>
semantic-browser forward --session <id>
semantic-browser reload --session <id>
semantic-browser diagnostics --session <id>
semantic-browser export-trace --session <id> --out trace.json
semantic-browser serve --host 127.0.0.1 --port 8765 --api-token <token>

What's New in v1.3.0

  • Framework-agnostic element discovery — AngularJS, Vue, Alpine.js, and custom elements discovered automatically.
  • Fuzzy structural settle — live-updating pages (odds, tickers, chat) no longer cause timeouts.
  • Stable fingerprints — action IDs use DOM id + CSS selector, not pixel position.
  • Smarter locator resolution — volatile framework classes stripped from selectors.
  • Robust modal detection — three-tier detection with visibility and size checks.
  • Increased budgets — 25 curated actions, 2K room budget, 4K max elements.
  • SPA navigation awareness — URL changes during settle are handled correctly.
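To make the "stable fingerprints" and "volatile classes stripped" items concrete, here is one way such an ID could be derived. This is a sketch under stated assumptions, not the library's actual algorithm: the hash function, the `VOLATILE_CLASS` pattern, the meaning of the trailing number, and both helper names are all hypothetical. Only the `act-xxxxxxxx-0` shape is taken from the example room.

```python
import hashlib
import re

# Hypothetical pattern for build-generated class names such as ".css-1ab2c".
VOLATILE_CLASS = re.compile(r"\.(?:css|sc|jsx|ng-tns)-[\w-]+")


def stable_selector(selector: str) -> str:
    """Drop class names that are likely to change between builds."""
    return VOLATILE_CLASS.sub("", selector)


def action_fingerprint(dom_id: str, selector: str, ordinal: int = 0) -> str:
    """Derive an ID from the DOM id and cleaned selector, never from pixel position."""
    key = f"{dom_id}|{stable_selector(selector)}"
    digest = hashlib.sha1(key.encode("utf-8")).hexdigest()[:8]
    # Assumption: the trailing number disambiguates otherwise identical elements.
    return f"act-{digest}-{ordinal}"


before = action_fingerprint("search", "form.css-1ab2c > input.search")
after = action_fingerprint("search", "form.css-9zz8y > input.search")
print(before, after, before == after)
```

Because neither layout coordinates nor generated class names feed the hash, the same element keeps its ID across re-renders and rebuilds in this sketch, which is the property the release notes describe.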

Full details: CHANGELOG.md

Contributing

See CONTRIBUTING.md for development setup and PR expectations.

License

MIT — see LICENSE.
