Compact, ref-stable, reachability-filtered perception for AI browser agents

These details have not been verified by PyPI

Project description

perceive

A Python library that turns a browser page into a compact, ref-stable, reachability-filtered structured snapshot for AI agents.

AI browser agents that read raw accessibility trees end up trying to click elements that exist in the DOM but cannot actually be interacted with — closed drawers, modal-occluded buttons, inert subtrees, off-screen transforms. perceive filters those out, gives the model compact stable refs, and lets agents diff UI state between actions.

import perceive

with perceive.browser(url="https://example.com") as t:
    state = t.perceive()
    print(state.to_prompt())
    # @e1 link "More information..."

    t.act("click", state.find(name="More information").ref)

Benchmark results

Measured on a 14-page hand-labeled reachability conformance suite (bench/). Same machine, same Chromium build, same 36 ground-truth elements: Playwright MCP surfaces 12 elements an agent cannot actually interact with; perceive surfaces 0.

Adapter	Precision	Recall	F1	False positives	Median tokens / page	Median latency
Raw a11y baseline (no reachability filtering)	0.528	1.000	0.691	17 / 36	21.5	—
Playwright MCP (`@playwright/mcp`)	0.613	1.000	0.760	12 / 36	180.5	6030 ms
`perceive`	1.000	1.000	1.000	0 / 36	8.0	~150 ms
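The precision, recall, and F1 columns follow directly from the false-positive counts. The worked check below assumes something the table implies but does not state: the 36 labeled elements split into 19 reachable and 17 unreachable, and every adapter surfaces all 19 reachable ones (recall 1.000). The `scores` helper is illustrative and not part of the bench.

```python
def scores(true_positives: int, false_positives: int, false_negatives: int = 0) -> tuple[float, float, float]:
    """Return (precision, recall, F1), each rounded to three decimals."""
    precision = true_positives / (true_positives + false_positives)
    recall = true_positives / (true_positives + false_negatives)
    f1 = 2 * precision * recall / (precision + recall)
    return round(precision, 3), round(recall, 3), round(f1, 3)


# Inferred from the table, not stated by the bench: 19 of the 36 labeled
# elements are reachable, and every adapter surfaces all 19 (recall 1.000).
REACHABLE = 19

for adapter, false_positives in [("raw a11y", 17), ("Playwright MCP", 12), ("perceive", 0)]:
    print(adapter, scores(REACHABLE, false_positives))
```

With 19 true positives, 17 false positives give 19/36 = 0.528 precision and 12 give 19/31 = 0.613, matching the table.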

Latency is per-call wall time. Both measured adapters (Playwright MCP and perceive) launch a fresh browser process per page because the bench isolates each call. A long-lived MCP server would amortize subprocess startup across many calls, so in warm, long-running use the latency gap would be smaller than these cold-start numbers suggest. The token and false-positive numbers are unaffected.

Each false positive is an element an AI agent may try to click and fail on — the failure pattern documented in Playwright issue #39955.

Playwright MCP already filters elements that Chromium's accessibility tree excludes (CSS-hidden elements such as display:none / visibility:hidden, plus disabled controls and most non-focusable elements), so it surfaces 5 fewer false positives than the raw baseline (12 vs 17). The remaining 12 fall into patterns the accessibility tree alone cannot resolve: modal occlusion, sticky-header overlap, off-screen transforms, inert subtrees, and aria-hidden cascades. perceive runs an explicit reachability pass over these patterns and eliminates all 12 on this suite.
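One ingredient of such a reachability pass, the ancestor-attribute check for inert subtrees and aria-hidden cascades, can be sketched over a plain tree. This is a hypothetical model, not perceive's implementation: the `Node` type and its `inert` / `aria_hidden` fields are assumptions standing in for the HTML `inert` attribute and `aria-hidden="true"`. A node counts as unreachable if it or any ancestor carries either flag.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Iterator


@dataclass
class Node:
    role: str
    name: str = ""
    inert: bool = False        # stands in for the HTML `inert` attribute
    aria_hidden: bool = False  # stands in for aria-hidden="true"
    children: list[Node] = field(default_factory=list)


def reachable_nodes(node: Node, blocked: bool = False) -> Iterator[Node]:
    """Depth-first walk that skips everything inside an inert or aria-hidden subtree."""
    blocked = blocked or node.inert or node.aria_hidden  # the flag cascades to descendants
    if not blocked:
        yield node
    for child in node.children:
        yield from reachable_nodes(child, blocked)


page = Node("main", children=[
    Node("button", "Save"),
    Node("region", inert=True, children=[Node("button", "Delete")]),
    Node("group", aria_hidden=True, children=[Node("link", "Help")]),
])
print([n.name for n in reachable_nodes(page) if n.role in ("button", "link")])  # ['Save']
```

The cascade is why a per-element check is not enough: "Delete" and "Help" carry no flags themselves, but their ancestors do.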

Determinism: 1.000 mean exact-match rate for perceive across 14 pages × 5 runs each.
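The README does not define the exact-match rate precisely. One plausible reading, sketched here as an assumption: for each page, take the fraction of runs whose serialized snapshot is identical to that page's first run, then average over pages. A score of 1.000 then means every run on every page produced identical output. The `exact_match_rate` helper is hypothetical.

```python
from statistics import mean


def exact_match_rate(runs_by_page: dict[str, list[str]]) -> float:
    """Mean over pages of the fraction of runs identical to that page's first run."""
    per_page = []
    for runs in runs_by_page.values():
        reference = runs[0]
        per_page.append(sum(run == reference for run in runs) / len(runs))
    return mean(per_page)


# Illustrative serialized snapshots: 2 pages x 5 identical runs each.
runs = {
    "drawer": ['@e1 button "Menu"'] * 5,
    "modal": ['@e1 button "OK"\n@e2 button "Cancel"'] * 5,
}
print(exact_match_rate(runs))  # 1.0
```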

Scope of claim. This is a reachability conformance benchmark, not a general claim about Playwright. Playwright remains the underlying execution layer that perceive's browser backend builds on; this benchmark measures the observation layer — what an agent sees before it decides what to do.

Adapters for Chrome DevTools MCP and Vercel agent-browser are still on the roadmap.

Install

pip install perceive
playwright install chromium    # ~100 MB Chromium binary

Three things `perceive` does that a raw accessibility tree does not

1. Filter unreachable elements

import perceive

# A closed drawer is still in the DOM, just translated off-screen.
# A raw a11y tree includes its buttons. perceive does not.
with perceive.browser(url="https://your-app.com") as t:
    state = t.perceive()
    print(len(state.elements))                                    # 4 — the visible buttons
    state_full = t.perceive(include_unreachable=True)
    print(len(state_full.elements))                                # 7 — visible + drawer contents
    for el in state_full.elements:
        if not el.reachable:
            print(f"  filtered: {el.role} {el.name!r}")
    # filtered: button 'Close Drawer'
    # filtered: button 'Submit Form'
    # ... (3 filtered in total; output abbreviated)
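The closed-drawer case comes down to geometry: a drawer translated fully off-screen has a bounding box that does not overlap any part of the page a user can scroll to. A minimal sketch of that test, assuming boxes are (x, y, width, height) in document coordinates as a browser reports them after transforms; the `Box` type, `within_scrollable_area`, and the page size are illustrative, not perceive's internals. Content below the fold still passes, since scrolling reaches it.

```python
from typing import NamedTuple


class Box(NamedTuple):
    """Hypothetical bounding box in document (page) coordinates, CSS pixels."""
    x: float
    y: float
    width: float
    height: float


def within_scrollable_area(box: Box, page_size: tuple[float, float]) -> bool:
    """True if some positive-area part of `box` lies inside the scrollable page."""
    page_width, page_height = page_size
    if box.width <= 0 or box.height <= 0:
        return False  # zero-size boxes offer nothing to click
    return (
        box.x < page_width
        and box.y < page_height
        and box.x + box.width > 0   # a drawer at negative x ends before the page starts
        and box.y + box.height > 0
    )


page = (1280, 3000)  # document width and scroll height, chosen for illustration
print(within_scrollable_area(Box(-300, 0, 300, 800), page))  # closed drawer: False
print(within_scrollable_area(Box(40, 2500, 200, 40), page))  # footer link: True
```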

2. Filter modal-occluded elements

# Buttons behind an open modal are present in the DOM and the a11y tree,
# but a real user cannot click them. perceive returns only the modal's buttons.
with perceive.browser(url="https://your-app.com") as t:
    state = t.perceive()
    for el in state.elements:
        print(el.ref, el.role, repr(el.name))
    # e1 button 'OK'        (in the modal)
    # e2 button 'Cancel'    (in the modal)
    # the two background buttons are filtered out
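Occlusion is usually detected by hit-testing: ask which element is topmost at a point inside the candidate (in a live page, `document.elementFromPoint`) and treat the candidate as occluded if something unrelated wins. The pure-Python sketch below models the page as stacked rectangles; the `Layer` type, the center-point sampling, and the "direct child also counts" rule are assumptions for illustration, and perceive may sample differently.

```python
from typing import NamedTuple, Optional


class Layer(NamedTuple):
    """Hypothetical painted box: position, stacking order, optional parent id."""
    ident: str
    x: float
    y: float
    width: float
    height: float
    z: int
    parent: Optional[str] = None


def contains(layer: Layer, px: float, py: float) -> bool:
    return layer.x <= px < layer.x + layer.width and layer.y <= py < layer.y + layer.height


def topmost_at(layers: list[Layer], px: float, py: float) -> Optional[Layer]:
    """Stand-in for document.elementFromPoint: highest z among boxes under the point."""
    hits = [layer for layer in layers if contains(layer, px, py)]
    return max(hits, key=lambda layer: layer.z, default=None)


def is_occluded(target: Layer, layers: list[Layer]) -> bool:
    """Hit-test the target's center; occluded unless it, or a direct child, is on top."""
    cx, cy = target.x + target.width / 2, target.y + target.height / 2
    top = topmost_at(layers, cx, cy)
    return top is not None and target.ident not in (top.ident, top.parent)


layers = [
    Layer("save", 20, 20, 100, 40, z=1),                   # background button
    Layer("backdrop", 0, 0, 1280, 800, z=10),              # modal backdrop
    Layer("dialog", 440, 300, 400, 200, z=11),
    Layer("ok", 460, 440, 80, 40, z=12, parent="dialog"),  # button inside the modal
]
by_id = {layer.ident: layer for layer in layers}
print(is_occluded(by_id["save"], layers))  # True
print(is_occluded(by_id["ok"], layers))    # False
```

The backdrop never appears in the accessibility tree as an obstacle, which is why this kind of geometric check is needed on top of it.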

3. Stable refs across reflows, including for repeated elements

with perceive.browser(url="https://your-app.com/users") as t:
    state = t.perceive()

    # Repeated buttons with the same label get distinct refs, disambiguated
    # by surrounding context (parent landmark, siblings, stable attributes):
    edits = state.find_all(name="Edit")
    print([e.ref for e in edits])
    # ['e3', 'e5', 'e7']

    # An element's ref is preserved across re-perceives, including after
    # scrolling and other reflows that keep the element in the document:
    sign_in_before = state.find(name="Sign in").ref
    t.act("scroll", direction="down", amount=400)
    sign_in_after = t.perceive().find(name="Sign in").ref
    assert sign_in_before == sign_in_after
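Ref stability of this kind can be sketched as a registry keyed by a fingerprint that leaves geometry out. In the sketch below the fingerprint (role, accessible name, nearest landmark, plus an occurrence index for identical repeats) and the `e1, e2, ...` numbering are assumptions; perceive's actual fingerprint also uses siblings and stable attributes. `RefRegistry` is a hypothetical name.

```python
from collections import Counter
from itertools import count


class RefRegistry:
    """Hypothetical allocator: an element keeps its ref while its fingerprint is unchanged."""

    def __init__(self) -> None:
        self._refs: dict[tuple, str] = {}
        self._counter = count(1)

    def assign(self, elements: list[dict]) -> list[str]:
        occurrences: Counter = Counter()
        refs = []
        for el in elements:
            # Fingerprint: role + accessible name + nearest landmark. Position is
            # deliberately excluded, so scrolling and reflow do not change it.
            base = (el["role"], el["name"], el.get("landmark"))
            key = base + (occurrences[base],)  # separates identical repeated elements
            occurrences[base] += 1
            if key not in self._refs:
                self._refs[key] = f"e{next(self._counter)}"
            refs.append(self._refs[key])
        return refs


page = [{"role": "button", "name": "Sign in", "landmark": "banner", "y": 10}] + [
    {"role": "button", "name": "Edit", "landmark": "main", "y": y} for y in (100, 140, 180)
]
registry = RefRegistry()
before = registry.assign(page)
after = registry.assign([dict(el, y=el["y"] - 400) for el in page])  # simulate scrolling
print(before)           # ['e1', 'e2', 'e3', 'e4']
print(before == after)  # True
```

Because the occurrence index depends on order, inserting a new "Edit" row above the others would shift later refs; richer context such as siblings and stable attributes reduces that sensitivity.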

Why not just use Playwright locators?

Playwright locators are the right tool when you already know what to interact with — you write page.get_by_role("button", name="Sign in") because you, the human author, decided that button is what you want.

perceive is for the part of an agent loop where the model needs to decide what's available. The flow is observe → plan → act → verify, and step 1 is "give the model a compact, reachable, ref-stable action space." perceive does that step; it doesn't replace deterministic Playwright tests for code you've already written.

Integration: feeding `perceive` output to an LLM

import perceive

with perceive.browser(url="https://app.example.com/login") as target:
    state = target.perceive()

    prompt = f"""You are operating a browser. Available actions:
- click(ref)
- type(ref, text)
- scroll(direction)

Current UI:
{state.to_prompt()}

Task: sign in as alice@example.com with password hunter2.
Respond with one action per line."""

    # Send `prompt` to any LLM (Claude, GPT, Gemini, local model).
    # Parse the response into actions, then call act() with the refs the
    # model chose, e.g. for a reply of type(e2, ...) and type(e3, ...):
    target.act("type", "e2", "alice@example.com")
    target.act("type", "e3", "hunter2")

    # Use observe_change to see the result of the click in compact form.
    with target.observe_change() as obs:
        target.act("click", state.find(name="Sign in").ref)
    print(obs.diff.to_prompt())
    # +@e7 dialog "Welcome back, Alice"
    # -@e3 textbox "Password"
    # … 5 unchanged
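The example leaves "parse the response into actions" to the caller. A minimal sketch, assuming the model follows the `click(ref)` / `type(ref, text)` / `scroll(direction)` format given in the prompt; `parse_actions` is a hypothetical helper that returns (verb, args, kwargs) triples shaped like the `target.act(...)` calls above, with `scroll` passed as `direction=` to match the API. Lines that do not match are skipped rather than guessed at.

```python
import re

ACTION = re.compile(r"^\s*(click|type|scroll)\((.*)\)\s*$")


def parse_actions(reply: str) -> list[tuple[str, tuple, dict]]:
    """Turn one-action-per-line LLM output into (verb, args, kwargs) triples."""
    actions = []
    for line in reply.splitlines():
        match = ACTION.match(line)
        if not match:
            continue  # ignore prose or malformed lines
        verb, raw = match.group(1), match.group(2)
        if verb == "click":
            actions.append(("click", (raw.strip(),), {}))
        elif verb == "type":
            ref, _, text = raw.partition(",")  # split on the first comma only
            actions.append(("type", (ref.strip(), text.strip().strip("\"'")), {}))
        else:  # scroll
            actions.append(("scroll", (), {"direction": raw.strip().strip("\"'")}))
    return actions


reply = """type(e2, alice@example.com)
type(e3, "hunter2")
click(e4)"""
for verb, args, kwargs in parse_actions(reply):
    print(verb, args, kwargs)
    # With a live target: target.act(verb, *args, **kwargs)
```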

API

target = perceive.browser(url=None, *, headless=True, viewport=(1280, 800))

# Navigation and lifecycle
target.goto(url)
target.close()                                  # or use as a context manager

# Perception
state = target.perceive(
    region=None,                # CSS selector or (x, y, w, h) bbox to scope
    role=None,                  # filter to a single role (e.g. "button")
    include_text=False,         # reserved; not yet implemented
    include_unreachable=False,  # default: filter unreachable
)

# State
state.elements                  # list[Element]
state.find(ref=..., role=..., name=..., reachable=...)
state.find_all(role=..., name=..., reachable=...)
state.to_prompt(only_reachable=True)
state.diff(previous)            # DiffResult

# Action (shares ref space with the most recent perceive())
target.act("click", ref)
target.act("type", ref, text)
target.act("set_value", ref, text)            # programmatic, for tricky inputs
target.act("scroll", direction="down", amount=400)
target.act("press", key)                       # e.g. "Enter", "Tab"
target.act("goto", url)
target.act("wait", seconds)

# Self-verifying loop
with target.observe_change(settle_ms=200) as obs:
    target.act("click", "e1")
obs.before, obs.after, obs.diff
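The diff rendering in the integration example (`+@e7 …`, `-@e3 …`, `… 5 unchanged`) suggests a comparison of two snapshots keyed by ref. A minimal sketch of that format, assuming each snapshot is a mapping from ref to (role, name); `render_diff` and `Snapshot` are hypothetical and this is not `DiffResult`'s implementation.

```python
Snapshot = dict[str, tuple[str, str]]  # ref -> (role, accessible name)


def render_diff(before: Snapshot, after: Snapshot) -> str:
    """Render added and removed refs in the +@ / -@ style, then an unchanged count."""
    lines = [f'+@{ref} {role} "{name}"' for ref, (role, name) in after.items() if ref not in before]
    lines += [f'-@{ref} {role} "{name}"' for ref, (role, name) in before.items() if ref not in after]
    unchanged = sum(1 for ref, entry in before.items() if after.get(ref) == entry)
    lines.append(f"… {unchanged} unchanged")
    return "\n".join(lines)


before = {
    "e2": ("textbox", "Email"),
    "e3": ("textbox", "Password"),
    "e4": ("button", "Sign in"),
}
after = {
    "e2": ("textbox", "Email"),
    "e4": ("button", "Sign in"),
    "e7": ("dialog", "Welcome back, Alice"),
}
print(render_diff(before, after))
```

A ref present in both snapshots with a different role or name is not reported by this sketch; under exact-fingerprint refs, a renamed element would receive a new ref anyway and appear as a -/+ pair.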

Limitations (v0.1)

This is a deliberately narrow first release. Things perceive does not do yet:

Browser only. The spec includes a macOS backend (perceive.macos()), but it is not in v0.1; it is planned for v0.3.
Chromium only. Playwright supports Firefox and WebKit, but neither is tested against the benchmark suite.
No vision fallback. Canvas-heavy UIs, custom widgets without ARIA, and image-only elements will yield fewer (or zero) elements. A small-VLM fallback is on the v0.4 roadmap.
Cross-origin iframes cannot be introspected (browser security); same-origin iframes work.
Closed shadow roots cannot be traversed ({ mode: 'closed' } is opaque by design). Open shadow roots work.
Ref stability is exact-fingerprint based. A button whose accessible name changes mid-session ("Save" → "Save (1)") will get a new ref. Scored-similarity matching is on the v0.2 roadmap.
Benchmark is 14 pages. Patterns covered: CSS hiding, positioning, occlusion, ancestor attributes, traversal (Shadow DOM + iframe), and non-interactive controls. Patterns not yet covered: virtualized lists, portals, nested modals, cookie banners, and real component libraries (Radix, MUI, Ant Design, Headless UI). The suite will be expanded before any "production-ready" claim.
No direct adapter for Chrome DevTools MCP or Vercel agent-browser in the benchmark yet. Playwright MCP is measured head-to-head in the benchmark table; for the other two tools, comparison is only to the raw-a11y-tree baseline (their documented failure pattern, but not a head-to-head measurement).
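The exact-fingerprint limitation is easy to see in a sketch: any change to the name breaks the match. Scored-similarity matching could, for instance, reuse the old ref for the most similar same-role candidate above a threshold. The sketch uses Python's `difflib.SequenceMatcher` and a 0.6 threshold purely as assumptions; the README does not describe how perceive's v0.2 matching will work, and `exact_match` / `scored_match` are hypothetical names.

```python
from difflib import SequenceMatcher
from typing import Optional

Entry = tuple[str, str]  # (role, accessible name)


def exact_match(old: Entry, candidates: list[Entry]) -> Optional[Entry]:
    """Exact fingerprint: any change to role or name breaks the match."""
    return old if old in candidates else None


def scored_match(old: Entry, candidates: list[Entry], threshold: float = 0.6) -> Optional[Entry]:
    """Best same-role candidate whose name similarity is at least `threshold`."""
    role, name = old
    best, best_score = None, threshold
    for candidate in candidates:
        if candidate[0] != role:
            continue
        score = SequenceMatcher(None, name, candidate[1]).ratio()
        if score >= best_score:
            best, best_score = candidate, score
    return best


old = ("button", "Save")
new_snapshot = [("button", "Save (1)"), ("button", "Cancel")]
print(exact_match(old, new_snapshot))   # None: the ref would be reassigned
print(scored_match(old, new_snapshot))  # ('button', 'Save (1)'): the ref could be kept
```

Here "Save" vs "Save (1)" scores about 0.667 and "Save" vs "Cancel" scores 0.4, so only the renamed button clears the threshold. A real design would also need to break ties and avoid two old elements claiming the same new one.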

Reproducing the benchmarks

The repo includes a bench package. To run it yourself:

git clone <repo-url>
cd perceive
pip install -e ".[bench,dev]"
playwright install chromium

perceive-bench list pages
perceive-bench run --adapter perceive --suite reachability
perceive-bench run --adapter perceive --suite tokens
perceive-bench run --adapter perceive --suite determinism --runs 5

All results are written to results/ as JSON.

Roadmap

The next milestone is strengthening the browser use case (broader bench coverage and head-to-head numbers) before broadening the platform surface.

v0.1.x — Benchmark expansion (Radix, MUI, Ant Design, Headless UI, virtualized lists, portals, cookie banners, nested modals) and direct adapters in bench/ for Playwright MCP, Chrome DevTools MCP, and Vercel agent-browser. Goal: head-to-head reachability and token numbers.
v0.2 — include_text=True body capture, scored-similarity ref matching for elements whose accessible name changes mid-session, and an MCP server adapter so non-Python agents can consume perceive directly.
v0.3 — Experimental desktop perception: macOS (AXUIElement), Windows (UIA), Linux (AT-SPI), all behind the same State / Element shape. Read-only first; desktop act() ships separately.
v0.4 — Vision fallback as a plugin API (target.set_vision_backend(...)), with a first small-VLM backend for canvas-heavy and non-accessible regions.

License

Apache-2.0. See LICENSE and NOTICE.

Project details


Release history

0.3.2 (May 14, 2026)
0.3.1 (May 14, 2026)
0.3.0 (May 13, 2026; this version)
0.2.1 (May 13, 2026)
0.2.0 (May 13, 2026)
0.1.3 (May 13, 2026)
0.1.2 (May 13, 2026)
0.1.1 (May 12, 2026)
0.1.0 (May 12, 2026)

Download files


Source Distribution

perceive-0.3.0.tar.gz (52.4 kB)

Uploaded May 13, 2026 Source

Built Distribution


perceive-0.3.0-py3-none-any.whl (64.4 kB)

Uploaded May 13, 2026 Python 3

File details

Details for the file perceive-0.3.0.tar.gz.

File metadata

Download URL: perceive-0.3.0.tar.gz
Upload date: May 13, 2026
Size: 52.4 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for perceive-0.3.0.tar.gz
Algorithm	Hash digest
SHA256	`b612c03eebe01ed4f0d0872f0c5a1d417425a9b18e9ee58b9d604fd3922836f8`
MD5	`db108948c540eef327ff85560e6215c9`
BLAKE2b-256	`55d2890aea97e74555aad9b26b0f2b0868253be7f0f0069419c5869da48ddbce`


Provenance

The following attestation bundles were made for perceive-0.3.0.tar.gz:

Publisher: publish.yml on gauthierpiarrette/perceive

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: perceive-0.3.0.tar.gz
- Subject digest: b612c03eebe01ed4f0d0872f0c5a1d417425a9b18e9ee58b9d604fd3922836f8
- Sigstore transparency entry: 1524797801
- Sigstore integration time: May 13, 2026
Source repository:
- Permalink: gauthierpiarrette/perceive@b0a1d4b6300f505425dcba107e3f1f21f0420a99
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/gauthierpiarrette
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b0a1d4b6300f505425dcba107e3f1f21f0420a99
- Trigger Event: release

File details

Details for the file perceive-0.3.0-py3-none-any.whl.

File metadata

Download URL: perceive-0.3.0-py3-none-any.whl
Upload date: May 13, 2026
Size: 64.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for perceive-0.3.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8a68b22fb841f87d187fe5c436cf3b5b10ab32c082b0f6d47822a9244a8d0937`
MD5	`7cb9e1219e16a1de1800fc0b132f68b9`
BLAKE2b-256	`04635c9bf43314b3461f744d07905d749704aa91db4137251617f4e280ecd41d`


Provenance

The following attestation bundles were made for perceive-0.3.0-py3-none-any.whl:

Publisher: publish.yml on gauthierpiarrette/perceive

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: perceive-0.3.0-py3-none-any.whl
- Subject digest: 8a68b22fb841f87d187fe5c436cf3b5b10ab32c082b0f6d47822a9244a8d0937
- Sigstore transparency entry: 1524797832
- Sigstore integration time: May 13, 2026
Source repository:
- Permalink: gauthierpiarrette/perceive@b0a1d4b6300f505425dcba107e3f1f21f0420a99
- Branch / Tag: refs/tags/v0.3.0
- Owner: https://github.com/gauthierpiarrette
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b0a1d4b6300f505425dcba107e3f1f21f0420a99
- Trigger Event: release

perceive 0.3.0
