browser-agent-protocol
Python SDK for the Browser Agent Protocol (BAP) - control browsers with AI agents.
Installation
pip install browser-agent-protocol
Quick Start
Async API (recommended)
import asyncio
from browseragentprotocol import BAPClient, role, text, label

async def main():
    async with BAPClient("ws://localhost:9222") as client:
        # Launch browser
        await client.launch(browser="chromium", headless=True)

        # Create page and navigate
        await client.create_page(url="https://example.com")

        # Click using semantic selectors
        await client.click(role("button", "Submit"))

        # Fill form fields
        await client.fill(label("Email"), "user@example.com")

        # Take screenshot
        screenshot = await client.screenshot()
        print(f"Screenshot: {len(screenshot.data)} bytes")

        # Get accessibility tree (ideal for AI agents)
        tree = await client.accessibility()
        print(f"Found {len(tree.tree)} nodes")

asyncio.run(main())
High-Level Session Helper
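import asyncio
from browseragentprotocol.context import bap_session, role

async def main():
    # "async with" is only valid inside a coroutine, so wrap the session in main()
    async with bap_session(
        "ws://localhost:9222",
        start_url="https://example.com",
    ) as client:
        await client.click(role("button", "Accept"))
        content = await client.content()

asyncio.run(main())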
Sync API (for scripts and notebooks)
from browseragentprotocol import BAPClientSync, role

with BAPClientSync("ws://localhost:9222") as client:
    client.launch(browser="chromium", headless=True)
    client.create_page(url="https://example.com")
    client.click(role("button", "Submit"))
    screenshot = client.screenshot()
CLI
# Test connection to a BAP server
bap connect ws://localhost:9222
# Get server info (with JSON output)
bap info ws://localhost:9222 --json
Semantic Selectors
BAP uses semantic selectors instead of brittle CSS selectors:
from browseragentprotocol import role, text, label, css, xpath, test_id, ref
# Recommended: Semantic selectors
role("button", "Submit") # ARIA role + accessible name
text("Sign in") # Visible text content
label("Email address") # Associated label
# Developer-controlled identifiers
test_id("submit-button") # data-testid attribute
# Stable element references
ref("@submitBtn") # Element ref from agent/observe
# Fallback: CSS/XPath
css(".btn-primary")
xpath("//button[@type='submit']")
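The "semantic first, CSS/XPath as fallback" ordering can be applied mechanically by trying selectors in priority order. The following is a minimal sketch, not part of the SDK: `first_match` is a hypothetical helper, the action is passed in as a plain async callable (for example `client.click`), and plain strings stand in for selector objects so the block runs without a BAP server. In real code you would probably narrow `exceptions` to `BAPElementNotFoundError`.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional, Sequence, Tuple, Type


async def first_match(
    action: Callable[[Any], Awaitable[Any]],
    selectors: Sequence[Any],
    exceptions: Tuple[Type[BaseException], ...] = (Exception,),
) -> Any:
    """Run `action` with each selector in order; return the first selector that succeeds."""
    if not selectors:
        raise ValueError("selectors must not be empty")
    last_error: Optional[BaseException] = None
    for selector in selectors:
        try:
            await action(selector)
            return selector
        except exceptions as exc:
            last_error = exc  # remember the failure and try the next selector
    assert last_error is not None
    raise last_error


# Stand-in for client.click: only the CSS fallback "exists" on this fake page.
async def fake_click(selector: str) -> None:
    if selector != "css:.btn-primary":
        raise LookupError(f"no element for {selector}")


used = asyncio.run(
    first_match(fake_click, ["role:button/Submit", "text:Submit", "css:.btn-primary"])
)
print(used)  # css:.btn-primary
```

If every selector fails, the helper re-raises the last error, so the caller still sees why the final fallback did not match.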
AI Agent Methods
BAP provides three composite methods optimized for AI agents:
agent/observe - Get AI-optimized page snapshots
observation = await client.observe(
    include_accessibility=True,
    include_interactive_elements=True,
    include_screenshot=True,
    max_elements=50,
    annotate_screenshot=True,  # Set-of-Marks style annotation
)

# Interactive elements with stable refs
for element in observation.interactive_elements:
    print(f"{element.ref}: {element.role} - {element.name}")
    # @e1: button - Submit
    # @e2: textbox - Email

# Screenshot with numbered badges linking to elements
if observation.annotation_map:
    for annotation in observation.annotation_map:
        print(f"[{annotation.label}] -> {annotation.ref}")
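An agent would typically pass the interactive-element list to a language model as text. The sketch below shows one possible rendering. It assumes only the `ref`, `role` and `name` attributes used above; `Element` is a stand-in dataclass and `format_elements` is a hypothetical helper, neither of which comes from the SDK.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class Element:
    """Stand-in for an observed element; only ref, role and name are assumed."""
    ref: str
    role: str
    name: str


def format_elements(elements: Iterable[Element], limit: int = 50) -> str:
    """Render one 'ref role "name"' line per element, capped at `limit` lines."""
    lines: List[str] = []
    for index, element in enumerate(elements):
        if index >= limit:
            break
        lines.append(f'{element.ref} {element.role} "{element.name}"')
    return "\n".join(lines)


sample = [Element("@e1", "button", "Submit"), Element("@e2", "textbox", "Email")]
print(format_elements(sample))
# @e1 button "Submit"
# @e2 textbox "Email"
```

Keeping the ref at the start of each line makes it easy for the model to answer with a ref that can be passed straight back as a selector.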
agent/act - Execute multi-step sequences atomically
from browseragentprotocol import BAPClient, label, role

result = await client.act([
    BAPClient.step("action/fill", {"selector": label("Email"), "value": "user@example.com"}),
    BAPClient.step("action/fill", {"selector": label("Password"), "value": "secret123"}),
    BAPClient.step("action/click", {"selector": role("button", "Sign In")}),
])
print(f"Completed {result.completed}/{result.total} steps")
print(f"Success: {result.success}")
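When a sequence stops early, the `completed` count can be turned into the name of the step that failed. The sketch below assumes that steps run in order and that `completed` counts a prefix of the list, which the text above does not state explicitly. `first_unfinished` is a hypothetical helper and the step names are illustrative.

```python
from typing import List, Optional


def first_unfinished(step_names: List[str], completed: int) -> Optional[str]:
    """Return the name of the first step that did not complete, or None if all did."""
    if completed < 0:
        raise ValueError("completed must be non-negative")
    if completed >= len(step_names):
        return None
    return step_names[completed]


steps = ["fill email", "fill password", "click sign in"]
print(first_unfinished(steps, 2))  # click sign in
```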
agent/extract - Extract structured data
data = await client.extract(
    instruction="Extract all product names and prices",
    schema={
        "type": "array",
        "items": {
            "type": "object",
            "properties": {
                "name": {"type": "string"},
                "price": {"type": "number"},
            },
        },
    },
)

if data.success:
    for product in data.data:
        print(f"{product['name']}: ${product['price']}")
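The schema above declares no `required` keys, and this README does not say how strictly the server enforces it, so a defensive client may want to check rows before indexing into them. The following is a minimal sketch of that check; `valid_products` is a hypothetical helper and the sample rows are made up for illustration.

```python
from typing import Any, Dict, List


def valid_products(rows: Any) -> List[Dict[str, Any]]:
    """Keep rows shaped like {"name": str, "price": number}; drop everything else."""
    if not isinstance(rows, list):
        return []
    kept: List[Dict[str, Any]] = []
    for row in rows:
        if not isinstance(row, dict):
            continue
        name, price = row.get("name"), row.get("price")
        # bool is a subclass of int in Python, so exclude it explicitly.
        price_ok = isinstance(price, (int, float)) and not isinstance(price, bool)
        if isinstance(name, str) and price_ok:
            kept.append({"name": name, "price": price})
    return kept


raw = [
    {"name": "Widget", "price": 9.99},
    {"name": "Gadget", "price": "12"},  # price came back as a string
    {"name": "Gizmo"},                  # price missing
]
print(valid_products(raw))  # [{'name': 'Widget', 'price': 9.99}]
```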
Multi-Context Support
Create isolated browser contexts with separate cookies/storage:
# Create isolated context
context = await client.create_context(
    context_id="user-session",
    options={
        "viewport": {"width": 1920, "height": 1080},
        "locale": "en-US",
    },
)

# Create page in specific context
page = await client.create_page(
    url="https://example.com",
    context_id=context.context_id,
)

# Clean up
await client.destroy_context(context.context_id)
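If an exception is raised between creating and destroying a context, the clean-up call above is skipped. One way to guard against that is an async context manager. The sketch below is an assumption-laden illustration: `isolated_context` is a hypothetical helper, it relies only on the `create_context` and `destroy_context` calls shown above, it passes an empty `options` dict when none is given, and `FakeClient` is a stand-in so the block runs without a BAP server.

```python
import asyncio
from contextlib import asynccontextmanager
from types import SimpleNamespace


@asynccontextmanager
async def isolated_context(client, context_id, options=None):
    """Create a context, yield it, and destroy it even if the body raises."""
    context = await client.create_context(context_id=context_id, options=options or {})
    try:
        yield context
    finally:
        await client.destroy_context(context.context_id)


class FakeClient:
    """Stand-in that records calls instead of talking to a browser."""

    def __init__(self):
        self.calls = []

    async def create_context(self, context_id, options):
        self.calls.append(("create", context_id))
        return SimpleNamespace(context_id=context_id)

    async def destroy_context(self, context_id):
        self.calls.append(("destroy", context_id))


async def demo():
    client = FakeClient()
    try:
        async with isolated_context(client, "user-session"):
            raise RuntimeError("page crashed")  # simulate a failure mid-session
    except RuntimeError:
        pass
    return client.calls


calls = asyncio.run(demo())
print(calls)  # [('create', 'user-session'), ('destroy', 'user-session')]
```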
Frame Support
Navigate iframes and cross-origin frames:
# List frames
frames = await client.list_frames()
for frame in frames.frames:
    print(f"{frame.frame_id}: {frame.url}")

# Switch to iframe
await client.switch_frame(selector=css("iframe#payment"))

# Interact within frame
await client.fill(label("Card number"), "4242424242424242")

# Return to main frame
await client.main_frame()
Human-in-the-Loop Approval
Handle approval requests for sensitive actions:
def handle_approval(params):
    print(f"Approval needed: {params.rule}")
    print(f"Action: {params.original_request}")
    # In a real app, show UI to user
    return "approve"

client.on_approval_required(handle_approval)

# Respond to approval request
await client.respond_to_approval(
    request_id="...",
    decision="approve",  # or "deny", "approve-session"
    reason="User approved the action",
)
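A handler that always returns "approve" defeats the purpose of the approval step, so a real handler would apply some policy first. The sketch below shows one possible shape. The rule names are hypothetical, `decide` and `ask_user` are illustrative names rather than SDK functions, and `SimpleNamespace` stands in for the `params` object, of which only the `rule` attribute is used.

```python
from types import SimpleNamespace
from typing import Any, Callable

# Hypothetical rule names; substitute whatever rules your BAP server reports.
AUTO_APPROVE = {"navigate-allowed-domain"}
ALWAYS_DENY = {"submit-payment"}


def decide(params: Any, ask_user: Callable[[Any], bool]) -> str:
    """Map an approval request to "approve" or "deny"."""
    if params.rule in ALWAYS_DENY:
        return "deny"
    if params.rule in AUTO_APPROVE:
        return "approve"
    # Anything unrecognised goes to a human; a falsy answer means deny.
    return "approve" if ask_user(params) else "deny"


request = SimpleNamespace(rule="submit-payment")
print(decide(request, ask_user=lambda p: True))  # deny
```

Checking the deny list first means a rule that is accidentally placed in both sets is still refused.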
Error Handling
from browseragentprotocol import (
    BAPError,
    BAPTimeoutError,
    BAPElementNotFoundError,
    BAPApprovalDeniedError,
)

try:
    await client.click(role("button", "Missing"))
except BAPTimeoutError as e:
    print(f"Timeout: {e.message}")
    if e.retryable:
        # Retry the operation
        pass
except BAPElementNotFoundError as e:
    print(f"Element not found: {e.details}")
except BAPApprovalDeniedError as e:
    print(f"Action denied: {e.message}")
except BAPError as e:
    print(f"Error {e.code}: {e.message}")
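The "Retry the operation" branch above can be filled in with a small backoff loop keyed on the `retryable` flag. This is a sketch under stated assumptions, not SDK behaviour: `retry_retryable` is a hypothetical helper, it reads `retryable` with `getattr` because this README only shows that attribute on `BAPTimeoutError`, and `FakeTimeout` stands in for a BAP error so the block runs on its own. In real code you would likely catch `BAPError` rather than `Exception`.

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_retryable(
    operation: Callable[[], Awaitable[T]],
    attempts: int = 3,
    base_delay: float = 0.5,
) -> T:
    """Retry `operation` while it raises errors whose `retryable` attribute is true."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return await operation()
        except Exception as exc:
            is_last = attempt == attempts - 1
            if is_last or not getattr(exc, "retryable", False):
                raise
            # Exponential backoff: base_delay, then 2x, then 4x, ...
            await asyncio.sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")


class FakeTimeout(Exception):
    """Stand-in for a retryable BAP error."""
    retryable = True


attempts_made = 0


async def flaky_click() -> str:
    """Fail twice with a retryable error, then succeed."""
    global attempts_made
    attempts_made += 1
    if attempts_made < 3:
        raise FakeTimeout("timed out")
    return "clicked"


result = asyncio.run(retry_retryable(flaky_click, attempts=3, base_delay=0.01))
print(result, attempts_made)  # clicked 3
```

With a real client the call would look like `await retry_retryable(lambda: client.click(role("button", "Submit")))`.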
Requirements
- Python 3.10+
- aiohttp >= 3.9.0
- pydantic >= 2.0.0
- anyio >= 4.0.0
- httpx >= 0.27.0
- httpx-sse >= 0.4.0
License
Apache-2.0