
Owl Browser SDK for Python

AI-first browser automation SDK with on-device vision model, natural language selectors, and comprehensive stealth features.

Features

  • Natural Language Selectors - Click, type, and interact using descriptions like "search button" or "login form"
  • On-Device Vision Model - Built-in Qwen 3 model for page understanding and CAPTCHA solving
  • Stealth Mode - Proxy support with timezone spoofing, WebRTC blocking, and anti-detection
  • Thread-Safe - Designed for concurrent usage with multiple pages
  • Simple API - Minimal code required for common tasks
  • Dual Mode - Connect to local browser binary OR remote HTTP server

Installation

pip install owl-browser

Note: You must also build the Owl Browser binary. See the main project README for build instructions.

Quick Start

from owl_browser import Browser

# Simple usage with context manager
with Browser() as browser:
    page = browser.new_page()
    page.goto("https://example.com")

    # Natural language selectors
    page.click("search button")
    page.type("search input", "hello world")

    # Take screenshot
    page.screenshot("screenshot.png")

Remote Mode (HTTP Server)

The SDK supports connecting to a remote Owl Browser HTTP server, enabling:

  • Cloud deployment - Run browser on remote servers
  • Distributed scraping - Connect multiple clients to one browser
  • Resource optimization - Share browser resources across applications

Basic Remote Usage

from owl_browser import Browser, RemoteConfig

# Connect to remote browser server
browser = Browser(remote=RemoteConfig(
    url="http://192.168.1.100:8080",
    token="your-secret-token"
))
browser.launch()

# API is identical to local mode!
page = browser.new_page()
page.goto("https://example.com")
page.click("search button")
page.type("search input", "hello world")
page.screenshot("screenshot.png")

browser.close()

Remote with Context Manager

from owl_browser import Browser, RemoteConfig

with Browser(remote=RemoteConfig(
    url="http://localhost:8080",
    token="secret-token"
)) as browser:
    page = browser.new_page()
    page.goto("https://example.com")
    page.screenshot("screenshot.png")

Async Remote Usage

import asyncio
from owl_browser import AsyncBrowser, RemoteConfig

async def main():
    async with AsyncBrowser(remote=RemoteConfig(
        url="http://192.168.1.100:8080",
        token="your-secret-token"
    )) as browser:
        page = await browser.new_page()
        await page.goto("https://example.com")
        await page.screenshot("screenshot.png")

asyncio.run(main())

JWT Authentication

For enhanced security, the SDK supports JWT (JSON Web Token) authentication with RSA signing. The SDK can automatically generate and refresh JWT tokens using your private key:

from owl_browser import Browser, RemoteConfig, JWTConfig, AuthMode

# Connect with JWT authentication (auto-generated tokens)
browser = Browser(remote=RemoteConfig(
    url="http://192.168.1.100:8080",
    auth_mode=AuthMode.JWT,
    jwt=JWTConfig(
        private_key="/path/to/private.pem",  # RSA private key
        expires_in=3600,                      # Token validity (1 hour)
        refresh_threshold=300,                # Refresh 5 min before expiry
        issuer="my-app",                      # Optional claims
        subject="user-123"
    )
))
browser.launch()

You can also use the JWT utilities directly:

from owl_browser import generate_jwt, decode_jwt, JWTManager, generate_key_pair

# Generate a single token
token = generate_jwt('/path/to/private.pem', expires_in=7200, issuer='my-app')

# Decode a token (without verification)
decoded = decode_jwt(token)
print(f"Expires at: {decoded['payload']['exp']}")

# Use JWTManager for auto-refresh
jwt_manager = JWTManager('/path/to/private.pem', expires_in=3600, refresh_threshold=300)
token = jwt_manager.get_token()  # Auto-refreshes when needed

# Generate new RSA key pair
private_key, public_key = generate_key_pair()
with open('private.pem', 'w') as f:
    f.write(private_key)
with open('public.pem', 'w') as f:
    f.write(public_key)
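
For orientation, a JWT is three unpadded base64url segments (header.payload.signature). The stdlib-only sketch below shows how the standard claims used above (iss, sub, iat, exp) fit into the first two segments; it is an illustration of the token format, not the SDK's implementation, and the third segment would be the RSA-SHA256 signature produced with the private key:

```python
import base64
import json
import time

def b64url(data: bytes) -> str:
    # JWTs use base64url encoding with the trailing "=" padding stripped
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()

now = int(time.time())
header = {"alg": "RS256", "typ": "JWT"}  # RSA-SHA256 signing
payload = {
    "iss": "my-app",     # issuer claim
    "sub": "user-123",   # subject claim
    "iat": now,          # issued-at
    "exp": now + 3600,   # expires in one hour
}

signing_input = b64url(json.dumps(header).encode()) + "." + b64url(json.dumps(payload).encode())
# A real token appends "." plus the RSA signature over signing_input,
# computed with the private key; the server verifies it with the public key.
print(signing_input)
```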

WebSocket Transport

For lower latency and persistent connections, use WebSocket transport instead of HTTP:

from owl_browser import Browser, RemoteConfig, TransportMode, ReconnectConfig

# WebSocket mode - real-time communication
browser = Browser(remote=RemoteConfig(
    url="http://192.168.1.100:8080",
    token="your-secret-token",
    transport=TransportMode.WEBSOCKET,  # Use WebSocket instead of HTTP
    reconnect=ReconnectConfig(
        enabled=True,
        max_attempts=5,
        initial_delay_ms=1000,
        max_delay_ms=30000
    )
))
browser.launch()

High-Performance Configuration

For high-concurrency workloads, configure retry behavior and concurrency limits:

from owl_browser import Browser, RemoteConfig, RetryConfig, ConcurrencyConfig

# High-performance HTTP configuration
browser = Browser(remote=RemoteConfig(
    url="http://192.168.1.100:8080",
    token="your-secret-token",
    timeout=30000,
    # Retry configuration with exponential backoff
    retry=RetryConfig(
        max_retries=5,
        initial_delay_ms=100,
        max_delay_ms=10000,
        backoff_multiplier=2.0,
        jitter_factor=0.1
    ),
    # Concurrency limiting
    concurrency=ConcurrencyConfig(
        max_concurrent=50
    )
))
browser.launch()

RemoteConfig Options

Option Type Default Description
url str required Base URL of HTTP server (e.g., http://localhost:8080)
token str - Bearer token (required for TOKEN mode)
auth_mode AuthMode TOKEN Authentication mode (TOKEN or JWT)
jwt JWTConfig - JWT configuration (required for JWT mode)
transport TransportMode HTTP Transport mode (HTTP or WEBSOCKET)
timeout int 30000 Request timeout in milliseconds
verify_ssl bool True Verify SSL certificates
retry RetryConfig - Retry configuration for HTTP transport
reconnect ReconnectConfig - Reconnection config for WebSocket
concurrency ConcurrencyConfig - Concurrency limiting config

JWTConfig Options

Option Type Default Description
private_key str required Path to RSA private key or PEM string
expires_in int 3600 Token validity in seconds
refresh_threshold int 300 Seconds before expiry to refresh
issuer str - Issuer claim (iss)
subject str - Subject claim (sub)
audience str - Audience claim (aud)
claims dict - Additional custom claims

RetryConfig Options

Option Type Default Description
max_retries int 3 Maximum number of retry attempts
initial_delay_ms int 100 Initial delay in milliseconds
max_delay_ms int 10000 Maximum delay cap in milliseconds
backoff_multiplier float 2.0 Multiplier for exponential backoff
jitter_factor float 0.1 Random jitter factor (0-1)
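
These options describe a delay schedule of roughly initial_delay_ms × backoff_multiplier^attempt, capped at max_delay_ms, plus random jitter. A minimal stand-alone sketch of that schedule (the SDK's exact formula may differ):

```python
import random

def backoff_delay_ms(attempt, initial_delay_ms=100, max_delay_ms=10000,
                     backoff_multiplier=2.0, jitter_factor=0.1):
    """Delay before retry number `attempt` (0-based), in milliseconds."""
    base = min(initial_delay_ms * backoff_multiplier ** attempt, max_delay_ms)
    # Jitter spreads retries out so many clients don't retry in lockstep.
    jitter = base * jitter_factor * random.random()
    return base + jitter

for attempt in range(5):
    print(f"attempt {attempt}: ~{backoff_delay_ms(attempt):.0f} ms")
```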

ReconnectConfig Options

Option Type Default Description
enabled bool True Whether auto-reconnection is enabled
max_attempts int 5 Maximum reconnection attempts (0 = infinite)
initial_delay_ms int 1000 Initial delay in milliseconds
max_delay_ms int 30000 Maximum delay cap in milliseconds
backoff_multiplier float 2.0 Multiplier for exponential backoff
jitter_factor float 0.1 Random jitter factor (0-1)

ConcurrencyConfig Options

Option Type Default Description
max_concurrent int 10 Maximum concurrent requests
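
Conceptually, max_concurrent acts like a semaphore around each request: once that many requests are in flight, further calls wait. A rough stand-alone sketch of the idea (not the SDK's internals):

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor

limiter = threading.Semaphore(3)  # like ConcurrencyConfig(max_concurrent=3)
in_flight = 0
peak = 0
lock = threading.Lock()

def limited_request(i):
    global in_flight, peak
    with limiter:  # blocks once 3 requests are already in flight
        with lock:
            in_flight += 1
            peak = max(peak, in_flight)
        time.sleep(0.05)  # simulate a request
        with lock:
            in_flight -= 1

with ThreadPoolExecutor(max_workers=10) as ex:
    list(ex.map(limited_request, range(10)))

print(peak)  # never exceeds the semaphore limit of 3
```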

Checking Connection Mode

from owl_browser import Browser, RemoteConfig, ConnectionMode

browser = Browser(remote=RemoteConfig(url="...", token="..."))
browser.launch()

# Check the connection mode
print(f"Mode: {browser.mode}")  # ConnectionMode.REMOTE
print(f"Is Remote: {browser.is_remote}")  # True

browser.close()

Setting Up the HTTP Server

See http-server/README.md for instructions on deploying the Owl Browser HTTP server.

# Start the HTTP server
./owl_browser --http --port 8080 --token "your-secret-token"

Usage Examples

Basic Navigation and Interaction

from owl_browser import Browser

browser = Browser()
browser.launch()

page = browser.new_page()
page.goto("https://example.com")

# Click using various selector types
page.click("#submit")           # CSS selector
page.click("100x200")           # Coordinates
page.click("login button")      # Natural language

# Type into inputs
page.type("#email", "user@example.com")
page.type("password field", "secret123")

# Select from dropdowns
page.pick("country dropdown", "United States")

# Press special keys
from owl_browser import KeyName
page.press_key(KeyName.ENTER)

browser.close()

AI-Powered Features

from owl_browser import Browser

with Browser() as browser:
    page = browser.new_page()
    page.goto("https://news.example.com")

    # Query the page using LLM
    summary = page.query_page("What are the main headlines?")
    print(summary)

    # Get structured page summary
    summary = page.summarize_page()
    print(summary)

    # Execute natural language commands
    page.execute_nla("scroll down and click the first article")

    # Auto-solve CAPTCHAs
    result = page.solve_captcha()
    if result.get("success"):
        print("CAPTCHA solved!")

Concurrent Scraping

from owl_browser import Browser
from concurrent.futures import ThreadPoolExecutor

browser = Browser()
browser.launch()

def scrape_url(url):
    """Scrape a single URL - each call gets its own isolated page."""
    page = browser.new_page()
    try:
        page.goto(url)
        return {
            "url": url,
            "title": page.get_title(),
            "text": page.extract_text("main content")
        }
    finally:
        page.close()

urls = [
    "https://example1.com",
    "https://example2.com",
    "https://example3.com",
]

# Scrape the URLs concurrently, up to 5 at a time
with ThreadPoolExecutor(max_workers=5) as executor:
    results = list(executor.map(scrape_url, urls))

browser.close()

for result in results:
    print(f"{result['title']}: {result['text'][:100]}...")

Proxy with Stealth

from owl_browser import Browser, ProxyConfig, ProxyType

with Browser() as browser:
    # Create page with proxy and timezone spoofing
    page = browser.new_page(proxy=ProxyConfig(
        type=ProxyType.SOCKS5H,  # SOCKS5 with remote DNS
        host="proxy.example.com",
        port=1080,
        username="user",
        password="pass",
        stealth=True,           # Block WebRTC leaks
        timezone_override="America/New_York"  # Match proxy location
    ))

    page.goto("https://whatismyip.com")
    page.screenshot("proxy-test.png")

    # Check proxy status
    status = page.get_proxy_status()
    print(f"Proxy connected: {status.connected}")

Cookie Management

from owl_browser import Browser

with Browser() as browser:
    page = browser.new_page()
    page.goto("https://example.com")

    # Get all cookies
    cookies = page.get_cookies()
    for cookie in cookies:
        print(f"{cookie.name}: {cookie.value}")

    # Set a cookie
    page.set_cookie(
        url="https://example.com",
        name="session",
        value="abc123",
        secure=True,
        http_only=True
    )

    # Delete specific cookie
    page.delete_cookies("https://example.com", "session")

    # Delete all cookies
    page.delete_cookies()

Video Recording

from owl_browser import Browser

with Browser() as browser:
    page = browser.new_page()

    # Start recording
    page.start_video_recording(fps=30)

    page.goto("https://example.com")
    page.click("some button")
    page.type("input field", "test data")

    # Stop and get video path
    video_path = page.stop_video_recording()
    print(f"Video saved to: {video_path}")

Content Extraction

from owl_browser import Browser, CleanLevel, ExtractionTemplate

with Browser() as browser:
    page = browser.new_page()
    page.goto("https://example.com")

    # Extract text
    text = page.extract_text()
    article = page.extract_text("main article")

    # Get as Markdown
    markdown = page.get_markdown(include_links=True, include_images=False)

    # Get clean HTML
    html = page.get_html(CleanLevel.AGGRESSIVE)

    # Extract structured JSON (auto-detects template)
    data = page.extract_json()

    # Use specific template
    data = page.extract_json(ExtractionTemplate.GOOGLE_SEARCH)

Test Execution

Run tests exported from the Developer Playground:

from owl_browser import Browser

with Browser() as browser:
    page = browser.new_page()

    # Run test from JSON file
    result = page.run_test("my-test.json", verbose=True)

    # Or define inline
    result = page.run_test({
        "name": "Login Test",
        "steps": [
            {"type": "navigate", "url": "https://example.com/login"},
            {"type": "type", "selector": "#email", "text": "user@example.com"},
            {"type": "type", "selector": "#password", "text": "password123"},
            {"type": "click", "selector": "button[type='submit']"},
            {"type": "wait", "duration": 2000},
            {"type": "screenshot", "filename": "logged-in.png"}
        ]
    })

    print(f"Test: {result.test_name}")
    print(f"Success: {result.successful_steps}/{result.total_steps}")
    print(f"Time: {result.execution_time}ms")

Quick Utilities

For simple one-off operations:

from owl_browser import quick_screenshot, quick_extract, quick_query

# Take a quick screenshot
quick_screenshot("https://example.com", "example.png")

# Extract text quickly
text = quick_extract("https://example.com", "main content")

# Query a page
answer = quick_query("https://news.com", "What is the top headline?")

Async/Await Usage

For asyncio-based applications:

import asyncio
from owl_browser import AsyncBrowser

async def main():
    async with AsyncBrowser() as browser:
        page = await browser.new_page()
        await page.goto("https://example.com")

        # All methods are async
        await page.click("search button")
        await page.type("search input", "hello world")

        text = await page.extract_text()
        await page.screenshot("screenshot.png")

asyncio.run(main())

Concurrent Async Scraping

import asyncio
from owl_browser import AsyncBrowser, ProxyConfig, ProxyType

async def scrape_url(browser, url):
    """Scrape a single URL."""
    page = await browser.new_page()
    try:
        await page.goto(url)
        return {
            "url": url,
            "title": await page.get_title(),
            "text": await page.extract_text("main")
        }
    finally:
        await page.close()

async def main():
    urls = [
        "https://example1.com",
        "https://example2.com",
        "https://example3.com",
    ]

    async with AsyncBrowser() as browser:
        # Scrape all URLs concurrently
        results = await asyncio.gather(*[
            scrape_url(browser, url) for url in urls
        ])

    for result in results:
        print(f"{result['title']}: {result['text'][:50]}...")

asyncio.run(main())

Quick Async Utilities

import asyncio
from owl_browser import async_screenshot, async_extract, async_query

async def main():
    # Quick screenshot
    await async_screenshot("https://example.com", "example.png")

    # Quick extraction
    text = await async_extract("https://example.com", "main content")

    # Quick LLM query
    answer = await async_query("https://news.com", "What is the top headline?")

asyncio.run(main())

API Reference

Browser

Method Description
launch() Start the browser process
new_page(proxy?, llm?) Create a new page (context)
pages() Get all active pages
get_llm_status() Check LLM availability
get_demographics() Get location, time, weather
close() Close browser and all pages

BrowserContext (Page)

Navigation

Method Description
goto(url) Navigate to URL
reload(ignore_cache?) Reload page
go_back() Navigate back
go_forward() Navigate forward

Interaction

Method Description
click(selector) Click element
type(selector, text) Type into input
pick(selector, value) Select from dropdown
press_key(key) Press special key
submit_form() Submit focused form
highlight(selector) Highlight element

Content

Method Description
extract_text(selector?) Extract text content
get_html(clean_level?) Get HTML
get_markdown(...) Get as Markdown
extract_json(template?) Extract structured JSON
summarize_page() Get LLM page summary

AI Features

Method Description
query_page(query) Ask LLM about page
execute_nla(command) Execute NL command
solve_captcha() Auto-solve CAPTCHA

Screenshot & Video

Method Description
screenshot(path?) Take screenshot
start_video_recording(fps?) Start recording
stop_video_recording() Stop and save video

Cookies & Proxy

Method Description
get_cookies(url?) Get cookies
set_cookie(...) Set a cookie
delete_cookies(...) Delete cookies
set_proxy(config) Configure proxy
get_proxy_status() Get proxy status
connect_proxy() Enable proxy
disconnect_proxy() Disable proxy

Error Handling

The SDK provides specific exception types for different error scenarios:

from owl_browser import (
    Browser, RemoteConfig, JWTConfig, AuthMode,
    AuthenticationError, RateLimitError, IPBlockedError,
    LicenseError, BrowserInitializationError
)

try:
    browser = Browser(remote=RemoteConfig(
        url="http://localhost:8080",
        auth_mode=AuthMode.JWT,
        jwt=JWTConfig(private_key="/path/to/private.pem")
    ))
    browser.launch()

    page = browser.new_page()
    page.goto("https://example.com")

except AuthenticationError as e:
    # 401 - Invalid or expired token
    print(f"Auth failed: {e.message}")
    print(f"Reason: {e.reason}")

except RateLimitError as e:
    # 429 - Too many requests
    print(f"Rate limited. Retry after: {e.retry_after} seconds")

except IPBlockedError as e:
    # 403 - IP not whitelisted
    print(f"IP blocked: {e.ip_address}")

except LicenseError as e:
    # License validation failed
    print(f"License error: {e.status}")

except BrowserInitializationError as e:
    print(f"Failed to start browser: {e}")

finally:
    browser.close()

Exception Types

Exception HTTP Code Description
AuthenticationError 401 Invalid/expired token or JWT signature mismatch
RateLimitError 429 Too many requests, includes retry_after in seconds
IPBlockedError 403 Client IP not in whitelist
LicenseError 503 Browser license validation failed
BrowserInitializationError - Failed to start/connect to browser
CommandTimeoutError - Operation timed out
ElementNotFoundError - Element not found on page
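
Since RateLimitError carries retry_after, a common pattern is to sleep for that long and try again. A hedged sketch of such a wrapper, using a local stand-in exception class so it runs on its own (substitute owl_browser.RateLimitError in real code):

```python
import time

class RateLimitError(Exception):  # stand-in for owl_browser.RateLimitError
    def __init__(self, retry_after):
        self.retry_after = retry_after

def with_rate_limit_retry(fn, max_retries=3):
    """Call fn(), sleeping for retry_after seconds after each rate-limit error."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except RateLimitError as e:
            if attempt == max_retries:
                raise  # out of retries, propagate to the caller
            time.sleep(e.retry_after)

# Demo: fail twice with a tiny retry_after, then succeed.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError(retry_after=0.01)
    return "ok"

print(with_rate_limit_retry(flaky))  # → ok
```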

Thread Safety

The SDK is designed for concurrent usage:

  • Browser instance can be shared across threads
  • Each BrowserContext (page) is isolated
  • Multiple pages can run operations simultaneously
  • IPC communication is thread-safe with proper locking

Best practice for concurrent scraping:

  1. Create one Browser instance
  2. Create separate pages for each concurrent task
  3. Close pages when done to free resources

Requirements

  • Python 3.8+
  • macOS or Linux
  • Built Owl Browser binary

License

MIT License - see the main project for details.
