Python SDK for Sentience AI Agent Browser Automation

Sentience Python SDK

The SDK is licensed under the Elastic License 2.0 (ELv2); the core semantic geometry and reliability logic runs in Sentience-hosted services.

📦 Installation

# Install from PyPI
pip install sentienceapi

# Install Playwright browsers (required)
playwright install chromium

# For LLM Agent features (optional)
pip install openai  # For OpenAI models
pip install anthropic  # For Claude models
pip install transformers torch  # For local LLMs

For local development:

pip install -e .

🚀 Quick Start: Choose Your Abstraction Level

Sentience SDK offers three abstraction levels - use what fits your needs:

🎯 Level 3: Natural Language (Easiest) - For non-technical users

from sentience import SentienceBrowser, ConversationalAgent
from sentience.llm_provider import OpenAIProvider

browser = SentienceBrowser()
llm = OpenAIProvider(api_key="your-key", model="gpt-4o")
agent = ConversationalAgent(browser, llm)

with browser:
    response = agent.execute("Search for magic mouse on google.com")
    print(response)
    # → "I searched for 'magic mouse' and found several results.
    #    The top result is from amazon.com selling Magic Mouse 2 for $79."

Best for: End users, chatbots, no-code platforms
Code required: 3-5 lines
Technical knowledge: None

⚙️ Level 2: Technical Commands (Recommended) - For AI developers

from sentience import SentienceBrowser, SentienceAgent
from sentience.llm_provider import OpenAIProvider

browser = SentienceBrowser()
llm = OpenAIProvider(api_key="your-key", model="gpt-4o")
agent = SentienceAgent(browser, llm)

with browser:
    browser.page.goto("https://google.com")
    agent.act("Click the search box")
    agent.act("Type 'magic mouse' into the search field")
    agent.act("Press Enter key")

Best for: Building AI agents, automation scripts
Code required: 10-15 lines
Technical knowledge: Medium (Python basics)

🔧 Level 1: Direct SDK (Most Control) - For production automation

from sentience import SentienceBrowser, snapshot, find, click

with SentienceBrowser(headless=False) as browser:
    browser.page.goto("https://example.com")

    # Take snapshot - captures all interactive elements
    snap = snapshot(browser)
    print(f"Found {len(snap.elements)} elements")

    # Find and click a link using semantic selectors
    link = find(snap, "role=link text~'More information'")
    if link:
        result = click(browser, link.id)
        print(f"Click success: {result.success}")

Best for: Maximum control, performance-critical apps
Code required: 20-50 lines
Technical knowledge: High (SDK API, selectors)


💼 Real-World Example: Amazon Shopping Bot

This example demonstrates navigating Amazon, finding products, and adding items to cart:

from sentience import SentienceBrowser, snapshot, find, click
import time

with SentienceBrowser(headless=False) as browser:
    # Navigate to Amazon Best Sellers
    browser.goto("https://www.amazon.com/gp/bestsellers/", wait_until="domcontentloaded")
    time.sleep(2)  # Wait for dynamic content

    # Take snapshot and find products
    snap = snapshot(browser)
    print(f"Found {len(snap.elements)} elements")

    # Find first product in viewport using spatial filtering
    products = [
        el for el in snap.elements
        if el.role == "link"
        and el.visual_cues.is_clickable
        and el.in_viewport
        and not el.is_occluded
        and el.bbox.y < 600  # First row
    ]

    if products:
        # Sort by position (top to bottom, then left to right)
        products.sort(key=lambda e: (e.bbox.y, e.bbox.x))
        first_product = products[0]

        print(f"Clicking: {first_product.text}")
        result = click(browser, first_product.id)

        # Wait for product page
        browser.page.wait_for_load_state("networkidle")
        time.sleep(2)

        # Find and click "Add to Cart" button
        product_snap = snapshot(browser)
        add_to_cart = find(product_snap, "role=button text~'add to cart'")

        if add_to_cart:
            cart_result = click(browser, add_to_cart.id)
            print(f"Added to cart: {cart_result.success}")
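The spatial filtering and sorting step above can be tried without a browser. The following is a minimal sketch only: `FakeElement` and `FakeBBox` are hypothetical stand-ins for snapshot elements (with `is_clickable` flattened onto the element for brevity), and `first_row_links` is an illustrative helper, not part of the SDK.

```python
from dataclasses import dataclass


@dataclass
class FakeBBox:
    x: float
    y: float


@dataclass
class FakeElement:
    # Hypothetical stand-in; real elements come from snapshot().
    id: int
    role: str
    bbox: FakeBBox
    is_clickable: bool = True
    in_viewport: bool = True
    is_occluded: bool = False


def first_row_links(elements, max_y=600):
    """Keep visible, clickable links above max_y, ordered by (y, x)."""
    candidates = [
        el for el in elements
        if el.role == "link"
        and el.is_clickable
        and el.in_viewport
        and not el.is_occluded
        and el.bbox.y < max_y
    ]
    return sorted(candidates, key=lambda el: (el.bbox.y, el.bbox.x))


elements = [
    FakeElement(1, "link", FakeBBox(400, 120)),
    FakeElement(2, "link", FakeBBox(50, 120)),
    FakeElement(3, "button", FakeBBox(10, 10)),           # wrong role
    FakeElement(4, "link", FakeBBox(20, 900)),            # below the first row
    FakeElement(5, "link", FakeBBox(5, 100), is_occluded=True),
]
ordered_ids = [el.id for el in first_row_links(elements)]
print(ordered_ids)  # [2, 1]
```

Sorting on `(y, x)` puts the leftmost element of the topmost row first, which is why element 2 is picked ahead of element 1.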

📖 See the complete tutorial: Amazon Shopping Guide


📚 Core Features

Browser Control

  • SentienceBrowser - Playwright browser with Sentience extension pre-loaded
  • browser.goto(url) - Navigate with automatic extension readiness checks
  • Automatic bot evasion and stealth mode
  • Configurable headless/headed mode

📸 Snapshot - Intelligent Page Analysis

snapshot(browser, screenshot=True, show_overlay=False) - Capture page state with AI-ranked elements

Features:

  • Returns semantic elements with roles, text, importance scores, and bounding boxes
  • Optional screenshot capture (PNG/JPEG)
  • Optional visual overlay to see what elements are detected
  • Pydantic models for type safety
  • snapshot.save(filepath) - Export to JSON

Example:

snap = snapshot(browser, screenshot=True, show_overlay=True)

# Access structured data
print(f"URL: {snap.url}")
print(f"Viewport: {snap.viewport.width}x{snap.viewport.height}")
print(f"Elements: {len(snap.elements)}")

# Iterate over elements
for element in snap.elements:
    print(f"{element.role}: {element.text} (importance: {element.importance})")

Query Engine - Semantic Element Selection

  • query(snapshot, selector) - Find all matching elements
  • find(snapshot, selector) - Find single best match (by importance)
  • Powerful query DSL with multiple operators

Query Examples:

# Find by role and text
button = find(snap, "role=button text='Sign in'")

# Substring match (case-insensitive)
link = find(snap, "role=link text~'more info'")

# Spatial filtering
top_left = find(snap, "bbox.x<=100 bbox.y<=200")

# Multiple conditions (AND logic)
primary_btn = find(snap, "role=button clickable=true visible=true importance>800")

# Prefix/suffix matching
starts_with = find(snap, "text^='Add'")
ends_with = find(snap, "text$='Cart'")

# Numeric comparisons
important = query(snap, "importance>=700")
first_row = query(snap, "bbox.y<600")

📖 Complete Query DSL Guide - All operators, fields, and advanced patterns

👆 Actions - Interact with Elements

  • click(browser, element_id) - Click element by ID
  • click_rect(browser, rect) - Click at center of rectangle (coordinate-based)
  • type_text(browser, element_id, text) - Type into input fields
  • press(browser, key) - Press keyboard keys (Enter, Escape, Tab, etc.)

All actions return ActionResult with success status, timing, and outcome:

result = click(browser, element.id)

print(f"Success: {result.success}")
print(f"Outcome: {result.outcome}")  # "navigated", "dom_updated", "error"
print(f"Duration: {result.duration_ms}ms")
print(f"URL changed: {result.url_changed}")

Coordinate-based clicking:

from sentience import click_rect

# Click at center of rectangle (x, y, width, height)
click_rect(browser, {"x": 100, "y": 200, "w": 50, "h": 30})

# With visual highlight (default: red border for 2 seconds)
click_rect(browser, {"x": 100, "y": 200, "w": 50, "h": 30}, highlight=True, highlight_duration=2.0)

# Using element's bounding box
snap = snapshot(browser)
element = find(snap, "role=button")
if element:
    click_rect(browser, {
        "x": element.bbox.x,
        "y": element.bbox.y,
        "w": element.bbox.width,
        "h": element.bbox.height
    })
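"Center of rectangle" here means the midpoint of the box described by `x`, `y`, `w`, and `h`. A small sketch of that arithmetic follows; `rect_center` is a hypothetical helper written for illustration, not an SDK function.

```python
def rect_center(rect):
    """Return the (x, y) midpoint of a rect given as {"x", "y", "w", "h"}."""
    return (rect["x"] + rect["w"] / 2, rect["y"] + rect["h"] / 2)


center = rect_center({"x": 100, "y": 200, "w": 50, "h": 30})
print(center)  # (125.0, 215.0)
```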

โฑ๏ธ Wait & Assertions

  • wait_for(browser, selector, timeout=5.0, interval=None, use_api=None) - Wait for element to appear
  • expect(browser, selector) - Assertion helper with fluent API

Examples:

# Wait for element (auto-detects optimal interval based on API usage)
result = wait_for(browser, "role=button text='Submit'", timeout=10.0)
if result.found:
    print(f"Found after {result.duration_ms}ms")

# Use local extension with fast polling (0.25s interval)
result = wait_for(browser, "role=button", timeout=5.0, use_api=False)

# Use remote API with network-friendly polling (1.5s interval)
result = wait_for(browser, "role=button", timeout=5.0, use_api=True)

# Custom interval override
result = wait_for(browser, "role=button", timeout=5.0, interval=0.5, use_api=False)

# Semantic wait conditions
wait_for(browser, "clickable=true", timeout=5.0)  # Wait for clickable element
wait_for(browser, "importance>100", timeout=5.0)  # Wait for important element
wait_for(browser, "role=link visible=true", timeout=5.0)  # Wait for visible link

# Assertions
expect(browser, "role=button text='Submit'").to_exist(timeout=5.0)
expect(browser, "role=heading").to_be_visible()
expect(browser, "role=button").to_have_text("Submit")
expect(browser, "role=link").to_have_count(10)
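The polling behaviour described in the comments above (a short interval for the local extension, a longer one for the remote API) can be sketched as a generic loop. This is an illustration under assumptions, not the SDK's implementation: `poll_until` and its return dictionary are hypothetical, and only the 0.25s and 1.5s defaults come from the examples above.

```python
import time


def poll_until(check, timeout=5.0, interval=None, use_api=False):
    """Call check() until it returns a truthy value or the timeout elapses."""
    if interval is None:
        # Defaults mirror the comments above: 0.25s local, 1.5s remote.
        interval = 1.5 if use_api else 0.25
    start = time.monotonic()
    deadline = start + timeout
    while True:
        value = check()
        elapsed_ms = int((time.monotonic() - start) * 1000)
        if value:
            return {"found": True, "duration_ms": elapsed_ms}
        if time.monotonic() + interval > deadline:
            # Another sleep would overshoot the deadline, so give up now.
            return {"found": False, "duration_ms": elapsed_ms}
        time.sleep(interval)


attempts = {"count": 0}


def ready_on_third_call():
    attempts["count"] += 1
    return attempts["count"] >= 3


result = poll_until(ready_on_third_call, timeout=1.0, interval=0.01)
print(result["found"], attempts["count"])  # True 3
```

A longer interval trades detection latency for fewer round trips, which matters more when every check is a network call.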

🎨 Visual Overlay - Debug Element Detection

  • show_overlay(browser, elements, target_element_id=None) - Display visual overlay highlighting elements
  • clear_overlay(browser) - Clear overlay manually

Show color-coded borders around detected elements to debug, validate, and understand what Sentience sees:

from sentience import show_overlay, clear_overlay

# Take snapshot once
snap = snapshot(browser)

# Show overlay anytime without re-snapshotting
show_overlay(browser, snap)  # Auto-clears after 5 seconds

# Highlight specific target element in red
button = find(snap, "role=button text~'Submit'")
if button:  # find() may return no match
    show_overlay(browser, snap, target_element_id=button.id)

# Clear manually before 5 seconds
import time
time.sleep(2)
clear_overlay(browser)

Color Coding:

  • 🔴 Red: Target element
  • 🔵 Blue: Primary elements (is_primary=true)
  • 🟢 Green: Regular interactive elements
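The colour rule in this list reads as a simple priority order: target first, then primary, then everything else. A sketch of that decision, where `overlay_color` is a hypothetical function and the real overlay is drawn by the extension:

```python
def overlay_color(element_id, is_primary, target_element_id=None):
    """Pick a border colour using the priority order listed above."""
    if target_element_id is not None and element_id == target_element_id:
        return "red"
    if is_primary:
        return "blue"
    return "green"


print(overlay_color(7, is_primary=True, target_element_id=7))  # red
print(overlay_color(8, is_primary=True, target_element_id=7))  # blue
print(overlay_color(9, is_primary=False))                      # green
```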

Visual Indicators:

  • Border thickness/opacity scales with importance
  • Semi-transparent fill
  • Importance badges
  • Star icons for primary elements
  • Auto-clear after 5 seconds

📄 Content Reading

read(browser, format="text|markdown|raw") - Extract page content

  • format="text" - Plain text extraction
  • format="markdown" - High-quality markdown conversion (uses markdownify)
  • format="raw" - Cleaned HTML (default)

Example:

from sentience import read

# Get markdown content
result = read(browser, format="markdown")
print(result["content"])  # Markdown text

# Get plain text
result = read(browser, format="text")
print(result["content"])  # Plain text

📷 Screenshots

screenshot(browser, format="png|jpeg", quality=80) - Standalone screenshot capture

  • Returns base64-encoded data URL
  • PNG or JPEG format
  • Quality control for JPEG (1-100)

Example:

from sentience import screenshot
import base64

# Capture PNG screenshot
data_url = screenshot(browser, format="png")

# Save to file
image_data = base64.b64decode(data_url.split(",")[1])
with open("screenshot.png", "wb") as f:
    f.write(image_data)

# JPEG with quality control (smaller file size)
data_url = screenshot(browser, format="jpeg", quality=85)
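When the image format is not known in advance, the data URL header can be parsed instead of splitting on the comma alone. A minimal sketch using only the standard library; `decode_data_url` is a hypothetical helper and the sample URL is built locally rather than taken from `screenshot()`.

```python
import base64


def decode_data_url(data_url):
    """Split a base64 data URL into (mime_type, raw_bytes)."""
    header, _, payload = data_url.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("expected a base64-encoded data URL")
    mime_type = header[len("data:"):-len(";base64")]
    return mime_type, base64.b64decode(payload)


# Build a tiny sample so the example runs without a browser.
sample = "data:image/png;base64," + base64.b64encode(b"\x89PNG").decode("ascii")
mime_type, raw = decode_data_url(sample)
print(mime_type, len(raw))  # image/png 4
```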

📋 Reference

Element Properties

Elements returned by snapshot() have the following properties:

element.id              # Unique identifier for interactions
element.role            # ARIA role (button, link, textbox, heading, etc.)
element.text            # Visible text content
element.importance      # AI importance score (0-1000)
element.bbox            # Bounding box (x, y, width, height)
element.visual_cues     # Visual analysis (is_primary, is_clickable, background_color)
element.in_viewport     # Is element visible in current viewport?
element.is_occluded     # Is element covered by other elements?
element.z_index         # CSS stacking order

Query DSL Reference

Basic Operators

Operator   Description                     Example
=          Exact match                     role=button
!=         Exclusion                       role!=link
~          Substring (case-insensitive)    text~'sign in'
^=         Prefix match                    text^='Add'
$=         Suffix match                    text$='Cart'
>, >=      Greater than                    importance>500
<, <=      Less than                       bbox.y<600

Supported Fields

  • Role: role=button|link|textbox|heading|...
  • Text: text, text~, text^=, text$=
  • Visibility: clickable=true|false, visible=true|false
  • Importance: importance, importance>=N, importance<N
  • Position: bbox.x, bbox.y, bbox.width, bbox.height
  • Layering: z_index
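To make the operator semantics concrete, here is a toy matcher over plain dictionaries. It is a sketch only and does not reflect how the SDK parses or ranks selectors: `toy_query` and `toy_find` are hypothetical names, boolean fields such as `clickable` are left out, and terms are combined with AND as in the examples above.

```python
import re

# field, operator, then a quoted or bare value; two-character operators first
TERM = re.compile(r"([\w.]+)\s*(!=|>=|<=|\^=|\$=|=|~|>|<)\s*('[^']*'|\S+)")


def get_field(element, field):
    """Resolve dotted names such as 'bbox.y' against nested dicts."""
    value = element
    for part in field.split("."):
        value = value[part]
    return value


def matches(element, selector):
    for field, op, raw in TERM.findall(selector):
        actual, expected = get_field(element, field), raw.strip("'")
        if op in (">", ">=", "<", "<="):
            a, b = float(actual), float(expected)
            ok = {">": a > b, ">=": a >= b, "<": a < b, "<=": a <= b}[op]
        elif op == "~":
            ok = expected.lower() in str(actual).lower()
        elif op == "^=":
            ok = str(actual).startswith(expected)
        elif op == "$=":
            ok = str(actual).endswith(expected)
        elif op == "!=":
            ok = str(actual) != expected
        else:
            ok = str(actual) == expected
        if not ok:
            return False
    return True


def toy_query(elements, selector):
    return [el for el in elements if matches(el, selector)]


def toy_find(elements, selector):
    hits = toy_query(elements, selector)
    return max(hits, key=lambda el: el["importance"]) if hits else None


elements = [
    {"id": 1, "role": "button", "text": "Sign in", "importance": 900, "bbox": {"x": 10, "y": 20}},
    {"id": 2, "role": "link", "text": "Add to Cart", "importance": 650, "bbox": {"x": 300, "y": 700}},
    {"id": 3, "role": "button", "text": "Add to Wishlist", "importance": 400, "bbox": {"x": 300, "y": 500}},
]
print(toy_find(elements, "role=button text~'sign in'")["id"])       # 1
print([el["id"] for el in toy_query(elements, "text^='Add'")])      # [2, 3]
print([el["id"] for el in toy_query(elements, "importance>=650 bbox.y<600")])  # [1]
```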

⚙️ Configuration

Viewport Size

Default viewport is 1280x800 pixels. You can customize it using Playwright's API:

with SentienceBrowser(headless=False) as browser:
    # Set custom viewport before navigating
    browser.page.set_viewport_size({"width": 1920, "height": 1080})

    browser.goto("https://example.com")

Headless Mode

# Headed mode (default in dev, shows browser window)
browser = SentienceBrowser(headless=False)

# Headless mode (default in CI environments)
browser = SentienceBrowser(headless=True)

# Auto-detect based on environment
browser = SentienceBrowser()  # headless=True if CI=true, else False
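The auto-detection rule in the last comment can be expressed in a few lines. This sketch assumes the check is a case-insensitive comparison of the `CI` environment variable against `true`; the exact comparison the SDK performs is not stated here, and `default_headless` is a hypothetical helper.

```python
import os


def default_headless(environ=None):
    """Return True when the CI environment variable is set to 'true'."""
    env = os.environ if environ is None else environ
    return env.get("CI", "").lower() == "true"


print(default_headless({"CI": "true"}))  # True
print(default_headless({}))              # False
```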

Residential Proxy Support

Use residential proxies to route traffic and protect your IP address. Supports HTTP, HTTPS, and SOCKS5 with automatic SSL certificate handling:

# Method 1: Direct configuration
browser = SentienceBrowser(proxy="http://user:pass@proxy.example.com:8080")

# Method 2: Environment variable
# export SENTIENCE_PROXY="http://user:pass@proxy.example.com:8080"
browser = SentienceBrowser()

# Works with agents
llm = OpenAIProvider(api_key="your-key", model="gpt-4o")
agent = SentienceAgent(browser, llm)

with browser:
    browser.page.goto("https://example.com")
    agent.act("Search for products")
    # All traffic routed through proxy with WebRTC leak protection

Features:

  • HTTP, HTTPS, SOCKS5 proxy support
  • Username/password authentication
  • Automatic self-signed SSL certificate handling
  • WebRTC IP leak protection (automatic)

See examples/residential_proxy_agent.py for complete examples.
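A proxy URL of the form shown above carries the scheme, host, port, and credentials in one string. The sketch below splits such a URL into the `server` / `username` / `password` dictionary that Playwright's `proxy` launch option accepts. Whether the SDK performs this conversion internally is an assumption, and `proxy_settings` is a hypothetical helper.

```python
from urllib.parse import unquote, urlsplit


def proxy_settings(proxy_url):
    """Split a proxy URL into a Playwright-style proxy dictionary."""
    parts = urlsplit(proxy_url)
    if parts.scheme not in ("http", "https", "socks5"):
        raise ValueError(f"unsupported proxy scheme: {parts.scheme!r}")
    if not parts.hostname or not parts.port:
        raise ValueError("proxy URL needs a host and a port")
    settings = {"server": f"{parts.scheme}://{parts.hostname}:{parts.port}"}
    if parts.username:
        settings["username"] = unquote(parts.username)
    if parts.password:
        settings["password"] = unquote(parts.password)
    return settings


settings = proxy_settings("http://user:pass@proxy.example.com:8080")
print(settings["server"])  # http://proxy.example.com:8080
```

Validating the URL before launching the browser surfaces a malformed `SENTIENCE_PROXY` value early instead of as a connection error mid-run.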

Authentication Session Injection

Inject pre-recorded authentication sessions (cookies + localStorage) to start your agent already logged in, bypassing login screens, 2FA, and CAPTCHAs. This saves tokens and reduces costs by eliminating login steps.

# Workflow 1: Inject pre-recorded session from file
from sentience import SentienceBrowser, save_storage_state

# Save session after manual login
browser = SentienceBrowser()
browser.start()
browser.goto("https://example.com")
# ... log in manually ...
save_storage_state(browser.context, "auth.json")

# Use saved session in future runs
browser = SentienceBrowser(storage_state="auth.json")
browser.start()
# Agent starts already logged in!

# Workflow 2: Persistent sessions (cookies persist across runs)
browser = SentienceBrowser(user_data_dir="./chrome_profile")
browser.start()
# First run: Log in
# Second run: Already logged in (cookies persist automatically)

Benefits:

  • Bypass login screens and CAPTCHAs with valid sessions
  • Save 5-10 agent steps and hundreds of tokens per run
  • Maintain stateful sessions for accessing authenticated pages
  • Act as authenticated users (e.g., "Go to my Orders page")

See examples/auth_injection_agent.py for complete examples.
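A saved session only helps while its cookies are still valid, so it can be worth checking the file before a run. The sketch below assumes `auth.json` follows Playwright's storage-state layout (a `cookies` list whose entries carry `name` and `expires`, with `-1` marking session cookies); that the SDK writes this exact layout is an assumption, and `expired_cookie_names` is a hypothetical helper.

```python
import json
import time


def expired_cookie_names(state, now=None):
    """List cookies whose 'expires' timestamp (Unix seconds) has passed."""
    now = time.time() if now is None else now
    expired = []
    for cookie in state.get("cookies", []):
        expires = cookie.get("expires", -1)
        if expires != -1 and expires < now:  # -1 marks a session cookie
            expired.append(cookie["name"])
    return expired


# Inline sample instead of reading auth.json, so the example is self-contained.
sample_state = json.loads("""
{
  "cookies": [
    {"name": "old", "value": "a", "expires": 500},
    {"name": "fresh", "value": "b", "expires": 2000},
    {"name": "session", "value": "c", "expires": -1}
  ],
  "origins": []
}
""")
stale = expired_cookie_names(sample_state, now=1000)
print(stale)  # ['old']
```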


💡 Best Practices

1. Wait for Dynamic Content

import time

browser.goto("https://example.com", wait_until="domcontentloaded")
time.sleep(1)  # Extra buffer for AJAX/animations

2. Use Multiple Strategies for Finding Elements

# Try exact match first
btn = find(snap, "role=button text='Add to Cart'")

# Fallback to fuzzy match
if not btn:
    btn = find(snap, "role=button text~'cart'")
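With more than two strategies, the fallback chain is easier to maintain as an ordered list of selectors. A small sketch: `find_first` is a hypothetical helper, and `fake_find` stands in for the SDK's `find()` so the example runs without a browser.

```python
def find_first(find_fn, snap, selectors):
    """Try selectors in order; return (selector, element) for the first hit."""
    for selector in selectors:
        element = find_fn(snap, selector)
        if element:
            return selector, element
    return None, None


# Hypothetical stand-in for find(): the "snapshot" maps selectors to elements.
def fake_find(snap, selector):
    return snap.get(selector)


fake_snap = {"role=button text~'cart'": {"id": 7, "text": "Add to cart now"}}
selector, element = find_first(
    fake_find,
    fake_snap,
    ["role=button text='Add to Cart'", "role=button text~'cart'"],
)
print(selector, element["id"])  # role=button text~'cart' 7
```

Returning the selector that matched makes it easy to log which strategy succeeded.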

3. Check Element Visibility Before Clicking

if element.in_viewport and not element.is_occluded:
    click(browser, element.id)

4. Handle Navigation

result = click(browser, link_id)
if result.url_changed:
    browser.page.wait_for_load_state("networkidle")

5. Use Screenshots Sparingly

# Fast - no screenshot (only element data)
snap = snapshot(browser)

# Slower - with screenshot (for debugging/verification)
snap = snapshot(browser, screenshot=True)

🛠️ Troubleshooting

"Extension failed to load"

Solution: Build the extension first:

cd sentience-chrome
./build.sh

"Element not found"

Solutions:

  • Ensure page is loaded: browser.page.wait_for_load_state("networkidle")
  • Use wait_for(): wait_for(browser, "role=button", timeout=10)
  • Debug elements: print([el.text for el in snap.elements])

Button not clickable

Solutions:

  • Check visibility: element.in_viewport and not element.is_occluded
  • Scroll to element: browser.page.evaluate(f"window.sentience_registry[{element.id}].scrollIntoView()")

🔬 Advanced Features (v0.12.0+)

📊 Agent Tracing & Debugging

The SDK now includes built-in tracing infrastructure for debugging and analyzing agent behavior:

from sentience import SentienceBrowser, SentienceAgent
from sentience.llm_provider import OpenAIProvider
from sentience.tracing import Tracer, JsonlTraceSink
from sentience.agent_config import AgentConfig

# Create tracer to record agent execution
tracer = Tracer(
    run_id="my-agent-run-123",
    sink=JsonlTraceSink("trace.jsonl")
)

# Configure agent behavior
config = AgentConfig(
    snapshot_limit=50,
    temperature=0.0,
    max_retries=1,
    capture_screenshots=True
)

browser = SentienceBrowser()
llm = OpenAIProvider(api_key="your-key", model="gpt-4o")

# Pass tracer and config to agent
agent = SentienceAgent(browser, llm, tracer=tracer, config=config)

with browser:
    browser.page.goto("https://example.com")

    # All actions are automatically traced
    agent.act("Click the sign in button")
    agent.act("Type 'user@example.com' into email field")

# Trace events saved to trace.jsonl
# Events: step_start, snapshot, llm_query, action, step_end, error

Trace Events Captured:

  • step_start - Agent begins executing a goal
  • snapshot - Page state captured
  • llm_query - LLM decision made (includes tokens, model, response)
  • action - Action executed (click, type, press)
  • step_end - Step completed successfully
  • error - Error occurred during execution

Use Cases:

  • Debug why agent failed or got stuck
  • Analyze token usage and costs
  • Replay agent sessions
  • Train custom models from successful runs
  • Monitor production agents
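For the debugging and monitoring use cases, a JSONL trace can be summarised with the standard library. The sketch below assumes each line is a JSON object with a `type` field naming the event; that field name is an assumption about the trace schema, and `count_event_types` is a hypothetical helper. The sample file is written locally so the example runs on its own.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_event_types(path, type_field="type"):
    """Count events per type in a JSONL file, skipping blank lines."""
    counts = Counter()
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            if line.strip():
                counts[json.loads(line).get(type_field, "unknown")] += 1
    return counts


sample_events = [
    {"type": "step_start"},
    {"type": "snapshot"},
    {"type": "llm_query"},
    {"type": "action"},
    {"type": "step_end"},
    {"type": "step_start"},
    {"type": "error"},
]
with tempfile.TemporaryDirectory() as tmp:
    trace_path = Path(tmp) / "trace.jsonl"
    lines = "\n".join(json.dumps(event) for event in sample_events) + "\n"
    trace_path.write_text(lines, encoding="utf-8")
    counts = count_event_types(trace_path)

print(counts["step_start"], counts["error"])  # 2 1
```

A run with more `step_start` than `step_end` events is a quick signal that at least one step did not complete.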

🧰 Snapshot Utilities

New utility functions for working with snapshots:

from sentience import snapshot
from sentience.utils import compute_snapshot_digests, canonical_snapshot_strict
from sentience.formatting import format_snapshot_for_llm

snap = snapshot(browser)

# Compute snapshot fingerprints (detect page changes)
digests = compute_snapshot_digests(snap.elements)
print(f"Strict digest: {digests['strict']}")  # Changes when text changes
print(f"Loose digest: {digests['loose']}")   # Only changes when layout changes

# Format snapshot for LLM prompts
llm_context = format_snapshot_for_llm(snap, limit=50)
print(llm_context)
# Output: [1] <button> "Sign In" {PRIMARY,CLICKABLE} @ (100,50) (Imp:10)
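The difference between the two digests can be illustrated with a toy fingerprint: one hash that includes element text and one that covers only role and geometry. This is a sketch of the idea, not the algorithm behind `compute_snapshot_digests`; `sketch_digests` and the flat element dictionaries are hypothetical.

```python
import hashlib
import json


def sketch_digests(elements):
    """Return SHA-256 hex digests: 'strict' includes text, 'loose' does not."""
    loose_rows = sorted(
        (el["role"], el["x"], el["y"], el["w"], el["h"]) for el in elements
    )
    strict_rows = sorted(
        (el["role"], el["x"], el["y"], el["w"], el["h"], el["text"]) for el in elements
    )

    def sha(rows):
        return hashlib.sha256(json.dumps(rows).encode("utf-8")).hexdigest()

    return {"strict": sha(strict_rows), "loose": sha(loose_rows)}


before = [{"role": "button", "text": "Sign In", "x": 100, "y": 50, "w": 80, "h": 30}]
text_changed = [{**before[0], "text": "Log In"}]
moved = [{**before[0], "x": 120}]

base = sketch_digests(before)
print(base["loose"] == sketch_digests(text_changed)["loose"])    # True
print(base["strict"] == sketch_digests(text_changed)["strict"])  # False
print(base["loose"] == sketch_digests(moved)["loose"])           # False
```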

📖 Documentation


💻 Examples & Testing

Examples

See the examples/ directory for complete working examples:

  • hello.py - Extension bridge verification
  • basic_agent.py - Basic snapshot and element inspection
  • query_demo.py - Query engine demonstrations
  • wait_and_click.py - Waiting for elements and performing actions
  • read_markdown.py - Content extraction and markdown conversion

Testing

# Run all tests
pytest tests/

# Run specific test file
pytest tests/test_snapshot.py

# Run with verbose output
pytest -v tests/

📜 License

This SDK is licensed under the Elastic License 2.0 (ELv2).

The Elastic License 2.0 allows you to use, modify, and distribute this SDK for internal, research, and non-competitive purposes. It does not permit offering this SDK or a derivative as a hosted or managed service, nor using it to build a competing product or service.

Important Notes

  • This SDK is a client-side library that communicates with proprietary Sentience services and browser components.

  • The Sentience backend services (including semantic geometry grounding, ranking, visual cues, and trace processing) are not open source and are governed by Sentience's Terms of Service.

  • Use of this SDK does not grant rights to operate, replicate, or reimplement Sentience's hosted services.

For commercial usage, hosted offerings, or enterprise deployments, please contact Sentience to obtain a commercial license.

See the full license text in LICENSE.
