Skip to main content

Python SDK for Sentience AI Agent Browser Automation

Project description

Sentience Python SDK

📜 License: Apache License 2.0

Python SDK for Sentience AI Agent Browser Automation. Build intelligent web automation agents that can see, understand, and interact with web pages like humans do.

Installation

pip install -e .

# Install Playwright browsers (required)
playwright install chromium

Quick Start

from sentience import SentienceBrowser, snapshot, find, click

# Start browser with extension
with SentienceBrowser(headless=False) as browser:
    browser.goto("https://example.com", wait_until="domcontentloaded")

    # Take snapshot - captures all interactive elements
    snap = snapshot(browser)
    print(f"Found {len(snap.elements)} elements")

    # Find and click a link using semantic selectors
    link = find(snap, "role=link text~'More information'")
    if link:
        result = click(browser, link.id)
        print(f"Click success: {result.success}")

Real-World Example: Amazon Shopping Bot

This example demonstrates navigating Amazon, finding products, and adding items to cart:

from sentience import SentienceBrowser, snapshot, find, click
import time

with SentienceBrowser(headless=False) as browser:
    # Navigate to Amazon Best Sellers
    browser.goto("https://www.amazon.com/gp/bestsellers/", wait_until="domcontentloaded")
    time.sleep(2)  # Wait for dynamic content

    # Take snapshot and find products
    snap = snapshot(browser)
    print(f"Found {len(snap.elements)} elements")

    # Find first product in viewport using spatial filtering
    products = [
        el for el in snap.elements
        if el.role == "link"
        and el.visual_cues.is_clickable
        and el.in_viewport
        and not el.is_occluded
        and el.bbox.y < 600  # First row
    ]

    if products:
        # Sort by position (left to right, top to bottom)
        products.sort(key=lambda e: (e.bbox.y, e.bbox.x))
        first_product = products[0]

        print(f"Clicking: {first_product.text}")
        result = click(browser, first_product.id)

        # Wait for product page
        browser.page.wait_for_load_state("networkidle")
        time.sleep(2)

        # Find and click "Add to Cart" button
        product_snap = snapshot(browser)
        add_to_cart = find(product_snap, "role=button text~'add to cart'")

        if add_to_cart:
            cart_result = click(browser, add_to_cart.id)
            print(f"Added to cart: {cart_result.success}")

See the complete tutorial: Amazon Shopping Guide

Core Features

Browser Control

  • SentienceBrowser - Playwright browser with Sentience extension pre-loaded
  • browser.goto(url) - Navigate with automatic extension readiness checks
  • Automatic bot evasion and stealth mode
  • Configurable headless/headed mode

Snapshot - Intelligent Page Analysis

  • snapshot(browser, screenshot=True) - Capture page state with AI-ranked elements
  • Returns semantic elements with roles, text, importance scores, and bounding boxes
  • Optional screenshot capture (PNG/JPEG)
  • Pydantic models for type safety
  • snapshot.save(filepath) - Export to JSON

Example:

snap = snapshot(browser, screenshot=True)

# Access structured data
print(f"URL: {snap.url}")
print(f"Viewport: {snap.viewport.width}x{snap.viewport.height}")
print(f"Elements: {len(snap.elements)}")

# Iterate over elements
for element in snap.elements:
    print(f"{element.role}: {element.text} (importance: {element.importance})")

Query Engine - Semantic Element Selection

  • query(snapshot, selector) - Find all matching elements
  • find(snapshot, selector) - Find single best match (by importance)
  • Powerful query DSL with multiple operators

Query Examples:

# Find by role and text
button = find(snap, "role=button text='Sign in'")

# Substring match (case-insensitive)
link = find(snap, "role=link text~'more info'")

# Spatial filtering
top_left = find(snap, "bbox.x<=100 bbox.y<=200")

# Multiple conditions (AND logic)
primary_btn = find(snap, "role=button clickable=true visible=true importance>800")

# Prefix/suffix matching
starts_with = find(snap, "text^='Add'")
ends_with = find(snap, "text$='Cart'")

# Numeric comparisons
important = query(snap, "importance>=700")
first_row = query(snap, "bbox.y<600")

📖 Complete Query DSL Guide - All operators, fields, and advanced patterns

Actions - Interact with Elements

  • click(browser, element_id) - Click element by ID
  • click_rect(browser, rect) - Click at center of rectangle (coordinate-based)
  • type_text(browser, element_id, text) - Type into input fields
  • press(browser, key) - Press keyboard keys (Enter, Escape, Tab, etc.)

All actions return ActionResult with success status, timing, and outcome:

result = click(browser, element.id)

print(f"Success: {result.success}")
print(f"Outcome: {result.outcome}")  # "navigated", "dom_updated", "error"
print(f"Duration: {result.duration_ms}ms")
print(f"URL changed: {result.url_changed}")

Coordinate-based clicking:

from sentience import click_rect

# Click at center of rectangle (x, y, width, height)
click_rect(browser, {"x": 100, "y": 200, "w": 50, "h": 30})

# With visual highlight (default: red border for 2 seconds)
click_rect(browser, {"x": 100, "y": 200, "w": 50, "h": 30}, highlight=True, highlight_duration=2.0)

# Using element's bounding box
snap = snapshot(browser)
element = find(snap, "role=button")
if element:
    click_rect(browser, {
        "x": element.bbox.x,
        "y": element.bbox.y,
        "w": element.bbox.width,
        "h": element.bbox.height
    })

Wait & Assertions

  • wait_for(browser, selector, timeout=5.0, interval=None, use_api=None) - Wait for element to appear
  • expect(browser, selector) - Assertion helper with fluent API

Examples:

# Wait for element (auto-detects optimal interval based on API usage)
result = wait_for(browser, "role=button text='Submit'", timeout=10.0)
if result.found:
    print(f"Found after {result.duration_ms}ms")

# Use local extension with fast polling (0.25s interval)
result = wait_for(browser, "role=button", timeout=5.0, use_api=False)

# Use remote API with network-friendly polling (1.5s interval)
result = wait_for(browser, "role=button", timeout=5.0, use_api=True)

# Custom interval override
result = wait_for(browser, "role=button", timeout=5.0, interval=0.5, use_api=False)

# Semantic wait conditions
wait_for(browser, "clickable=true", timeout=5.0)  # Wait for clickable element
wait_for(browser, "importance>100", timeout=5.0)  # Wait for important element
wait_for(browser, "role=link visible=true", timeout=5.0)  # Wait for visible link

# Assertions
expect(browser, "role=button text='Submit'").to_exist(timeout=5.0)
expect(browser, "role=heading").to_be_visible()
expect(browser, "role=button").to_have_text("Submit")
expect(browser, "role=link").to_have_count(10)

Content Reading

  • read(browser, format="text|markdown|raw") - Extract page content
    • format="text" - Plain text extraction
    • format="markdown" - High-quality markdown conversion (uses markdownify)
    • format="raw" - Cleaned HTML (default)

Example:

from sentience import read

# Get markdown content
result = read(browser, format="markdown")
print(result["content"])  # Markdown text

# Get plain text
result = read(browser, format="text")
print(result["content"])  # Plain text

Screenshots

  • screenshot(browser, format="png|jpeg", quality=80) - Standalone screenshot capture
    • Returns base64-encoded data URL
    • PNG or JPEG format
    • Quality control for JPEG (1-100)

Example:

from sentience import screenshot
import base64

# Capture PNG screenshot
data_url = screenshot(browser, format="png")

# Save to file
image_data = base64.b64decode(data_url.split(",")[1])
with open("screenshot.png", "wb") as f:
    f.write(image_data)

# JPEG with quality control (smaller file size)
data_url = screenshot(browser, format="jpeg", quality=85)

Element Properties

Elements returned by snapshot() have the following properties:

element.id              # Unique identifier for interactions
element.role            # ARIA role (button, link, textbox, heading, etc.)
element.text            # Visible text content
element.importance      # AI importance score (0-1000)
element.bbox            # Bounding box (x, y, width, height)
element.visual_cues     # Visual analysis (is_primary, is_clickable, background_color)
element.in_viewport     # Is element visible in current viewport?
element.is_occluded     # Is element covered by other elements?
element.z_index         # CSS stacking order

Query DSL Reference

Basic Operators

Operator Description Example
= Exact match role=button
!= Exclusion role!=link
~ Substring (case-insensitive) text~'sign in'
^= Prefix match text^='Add'
$= Suffix match text$='Cart'
>, >= Greater than importance>500
<, <= Less than bbox.y<600

Supported Fields

  • Role: role=button|link|textbox|heading|...
  • Text: text, text~, text^=, text$=
  • Visibility: clickable=true|false, visible=true|false
  • Importance: importance, importance>=N, importance<N
  • Position: bbox.x, bbox.y, bbox.width, bbox.height
  • Layering: z_index

Examples

See the examples/ directory for complete working examples:

  • hello.py - Extension bridge verification
  • basic_agent.py - Basic snapshot and element inspection
  • query_demo.py - Query engine demonstrations
  • wait_and_click.py - Waiting for elements and performing actions
  • read_markdown.py - Content extraction and markdown conversion

Testing

# Run all tests
pytest tests/

# Run specific test file
pytest tests/test_snapshot.py

# Run with verbose output
pytest -v tests/

Configuration

Viewport Size

Default viewport is 1280x800 pixels. You can customize it using Playwright's API:

with SentienceBrowser(headless=False) as browser:
    # Set custom viewport before navigating
    browser.page.set_viewport_size({"width": 1920, "height": 1080})

    browser.goto("https://example.com")

Headless Mode

# Headed mode (default in dev, shows browser window)
browser = SentienceBrowser(headless=False)

# Headless mode (default in CI environments)
browser = SentienceBrowser(headless=True)

# Auto-detect based on environment
browser = SentienceBrowser()  # headless=True if CI=true, else False

Best Practices

1. Wait for Dynamic Content

browser.goto("https://example.com", wait_until="domcontentloaded")
time.sleep(1)  # Extra buffer for AJAX/animations

2. Use Multiple Strategies for Finding Elements

# Try exact match first
btn = find(snap, "role=button text='Add to Cart'")

# Fallback to fuzzy match
if not btn:
    btn = find(snap, "role=button text~='cart'")

3. Check Element Visibility Before Clicking

if element.in_viewport and not element.is_occluded:
    click(browser, element.id)

4. Handle Navigation

result = click(browser, link_id)
if result.url_changed:
    browser.page.wait_for_load_state("networkidle")

5. Use Screenshots Sparingly

# Fast - no screenshot (only element data)
snap = snapshot(browser)

# Slower - with screenshot (for debugging/verification)
snap = snapshot(browser, screenshot=True)

Troubleshooting

"Extension failed to load"

Solution: Build the extension first:

cd sentience-chrome
./build.sh

"Element not found"

Solutions:

  • Ensure page is loaded: browser.page.wait_for_load_state("networkidle")
  • Use wait_for(): wait_for(browser, "role=button", timeout=10)
  • Debug elements: print([el.text for el in snap.elements])

Button not clickable

Solutions:

  • Check visibility: element.in_viewport and not element.is_occluded
  • Scroll to element: browser.page.evaluate(f"window.sentience_registry[{element.id}].scrollIntoView()")

Documentation

License

📜 License

This SDK is licensed under the Apache License 2.0.

Note: The SDK communicates with proprietary Sentience services and browser components that are not open source. Access to those components is governed by Sentience's Terms of Service.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sentience_python-0.10.5.tar.gz (44.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sentience_python-0.10.5-py3-none-any.whl (30.7 kB view details)

Uploaded Python 3

File details

Details for the file sentience_python-0.10.5.tar.gz.

File metadata

  • Download URL: sentience_python-0.10.5.tar.gz
  • Upload date:
  • Size: 44.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.14

File hashes

Hashes for sentience_python-0.10.5.tar.gz
Algorithm Hash digest
SHA256 c7bda88eb13b568383740e324ec28a17089c4ca475baf9b04be73282088ec229
MD5 992cf9655ae53ca805dcc7f7130d7c01
BLAKE2b-256 41417dfc410d46cee2b1b36ffc131d3085806c397b4c576ad2097613659dde9c

See more details on using hashes here.

File details

Details for the file sentience_python-0.10.5-py3-none-any.whl.

File metadata

File hashes

Hashes for sentience_python-0.10.5-py3-none-any.whl
Algorithm Hash digest
SHA256 f95ef8c2bb35859fdce6c790d806a4edadecb9b183f003932237bbf4322ae6f5
MD5 68462dda010945e51857d176a3a89b98
BLAKE2b-256 a670b6e3547d90b84c603de51247bc9162fdefbf629edde9afa1841a8c54807e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page