Surfari: Modular browser automation with LLM
Surfari
Surfari is a modular, LLM-powered browser automation framework built on Playwright.
It enables secure, scriptable, and intelligent interactions with websites, and is well suited to data extraction, automated workflows, and AI-assisted navigation.
✨ Key Features
Automatic Record, Parameterize & Replay
Surfari records both the exact sequence of LLM actions and a generalized, parameterized workflow in the same run.
When running a new task, Surfari plugs in the new values, replays the known workflow, and invokes the LLM only for review or recovery.
🔑 Unique: Replays are fast and stable, while parameterization makes them reusable for new tasks that are structurally similar.
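To make the record/parameterize/replay idea concrete, here is a minimal sketch, assuming a workflow is stored as a list of steps whose values may contain `{placeholder}` fields. The `Step` class and the `parameterize` / `instantiate` helpers are hypothetical illustrations, not Surfari's recording format or API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Step:
    """One recorded UI action (hypothetical structure, not Surfari's format)."""
    action: str      # e.g. "fill" or "click"
    target: str      # semantic text label of the element
    value: str = ""


def parameterize(steps: list[Step], params: dict[str, str]) -> list[Step]:
    """Replace concrete recorded values with {name} placeholders."""
    lookup = {v: k for k, v in params.items()}
    return [
        replace(s, value="{" + lookup[s.value] + "}") if s.value in lookup else s
        for s in steps
    ]


def instantiate(template: list[Step], params: dict[str, str]) -> list[Step]:
    """Fill the placeholders with the values of a new, structurally similar task."""
    return [replace(s, value=s.value.format(**params)) for s in template]


recorded = [
    Step("fill", "Leaving from", "SFO"),
    Step("fill", "Going to", "New York"),
    Step("click", "Search"),
]
template = parameterize(recorded, {"origin": "SFO", "destination": "New York"})
new_run = instantiate(template, {"origin": "SEA", "destination": "Boston"})
print([s.value for s in new_run])  # ['SEA', 'Boston', '']
```

The recorded run is generalized once, and every later task only swaps values, which is why replays need no LLM call unless a step fails.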
Self-Healing Replay
If the recorded path fails because the page layout has drifted, Surfari switches to real-time LLM reasoning for that step, then resumes deterministic replay. This combines the stability of replay with the resilience of live reasoning.
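The fallback logic can be pictured as a replay loop that hands only the failing step to the LLM. A minimal sketch, assuming `execute` raises `LookupError` when a target is missing and `ask_llm` returns a replacement step; both are hypothetical stand-ins, not Surfari functions.

```python
from collections.abc import Callable, Iterable


def replay(
    steps: Iterable[str],
    execute: Callable[[str], None],
    ask_llm: Callable[[str], str],
) -> list[str]:
    """Replay recorded steps; on failure, ask the LLM for a replacement step,
    run it, and continue deterministic replay with the remaining steps."""
    log = []
    for step in steps:
        try:
            execute(step)
            log.append(f"replayed: {step}")
        except LookupError:
            healed = ask_llm(step)  # real-time reasoning for this step only
            execute(healed)
            log.append(f"healed: {step} -> {healed}")
    return log


# Toy page: the "Search" button has been renamed to "Find flights".
page_targets = {"Leaving from", "Going to", "Find flights"}


def execute(step: str) -> None:
    if step not in page_targets:
        raise LookupError(step)


def ask_llm(step: str) -> str:
    # Stand-in for an LLM call that inspects the live page.
    return "Find flights" if step == "Search" else step


for line in replay(["Leaving from", "Going to", "Search"], execute, ask_llm):
    print(line)
```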
Agent Delegation & Collaboration
A Navigation Agent can pause its own run, delegate a subtask to another agent in a separate tab, and resume once the subtask completes.
This enables branching workflows, multi-agent collaboration, and parallel subtasks, like a team of agents cooperating inside one browser.
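The control flow resembles ordinary async delegation: the parent awaits the delegated work, then carries on. A minimal asyncio sketch, in which `navigation_agent` and `sub_agent` are hypothetical stand-ins for agents running in separate tabs, not Surfari's API.

```python
import asyncio


async def sub_agent(subtask: str) -> str:
    """Stand-in for a second agent working in its own tab (hypothetical)."""
    await asyncio.sleep(0.01)  # simulate browsing work
    return f"result of {subtask!r}"


async def navigation_agent(goal: str) -> list[str]:
    """Parent agent: pauses its own run until the delegated subtasks finish."""
    notes = [f"started {goal!r}"]
    # Delegate two independent subtasks and wait for both (parallel branches).
    results = await asyncio.gather(
        sub_agent("check hotel prices"),
        sub_agent("check car rental prices"),
    )
    notes.extend(results)  # gather preserves the order of the awaitables
    notes.append("resumed main workflow")
    return notes


print(asyncio.run(navigation_agent("plan trip")))
```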
Human-in-the-Loop Delegation
When needed, Surfari can hand control back to a human operator.
You complete the missing step in the live browser, and the agent then continues the workflow automatically.
Stable, Text-Based UI Targets
Instead of brittle XPaths or randomly generated IDs, Surfari uses semantic text annotations as selectors.
Because these targets are meaningful and stable, recorded workflows replay reliably.
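Surfari's annotation scheme is not described here; this sketch only shows why a visible text label survives page changes that break auto-generated IDs. The element dictionaries and the `resolve` helper are hypothetical.

```python
from typing import Optional


def normalize(text: str) -> str:
    """Case- and whitespace-insensitive form of a visible label."""
    return " ".join(text.split()).casefold()


def resolve(target: str, elements: list[dict]) -> Optional[str]:
    """Return the current element id whose visible text matches the target."""
    wanted = normalize(target)
    for el in elements:
        if normalize(el["text"]) == wanted:
            return el["id"]
    return None


# The same page on two visits: generated ids and order change, labels do not.
visit_1 = [{"id": "btn-8f3a", "text": "Search"}, {"id": "in-12", "text": "Going to"}]
visit_2 = [{"id": "in-99", "text": "Going  to"}, {"id": "btn-c71e", "text": "search"}]

print(resolve("Search", visit_1), resolve("Search", visit_2))  # btn-8f3a btn-c71e
```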
Visual Decisioning (Action Box Overlay)
Surfari can display the LLM's reasoning and intended action in an on-page action box overlay next to the targeted element, making the agent's decisions transparent, reviewable, and debuggable.
Configurable LLM Models (No Coding Required)
Swap models such as Google Gemini, OpenAI GPT, and Anthropic Claude just by changing the model name in the config; no code changes are needed.
Information Masking
Automatically masks and unmasks account numbers, balances, and any other digit-like strings, so sensitive data stays protected in logs, prompts, and replays.
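Reversible masking can be done by swapping each sensitive value for a numbered token and keeping the mapping locally, so only tokens reach the LLM. A minimal sketch, assuming a regex for digit-like strings and a `[MASK_n]` token format; Surfari's actual patterns and tokens may differ, and `mask` / `unmask` are hypothetical.

```python
import re

# Digit runs, including internal separators such as 1,234.56 or 555-0100.
DIGITS = re.compile(r"\d[\d,.\-]*\d|\d")


def mask(text: str) -> tuple[str, dict[str, str]]:
    """Replace each digit-like string with a token and remember the original."""
    mapping: dict[str, str] = {}

    def _sub(m: re.Match) -> str:
        token = f"[MASK_{len(mapping)}]"
        mapping[token] = m.group(0)
        return token

    return DIGITS.sub(_sub, text), mapping


def unmask(text: str, mapping: dict[str, str]) -> str:
    """Restore the original values in text derived from the masked input."""
    for token, original in mapping.items():
        text = text.replace(token, original)
    return text


masked, secrets = mask("Account 12345678 has a balance of $1,234.56")
print(masked)  # Account [MASK_0] has a balance of $[MASK_1]
print(unmask(masked, secrets))
```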
One or Multiple Actions Per Turn
Choose between one action per turn (more interactive, and safer on dynamic sites) or multiple actions per turn (faster on static sites and predictable workflows).
Custom Value Resolvers (Beyond Tool Calling)
Unknown form values (text inputs, select options, etc.) can be resolved automatically through direct API calls, retrieval-augmented search, or custom resolvers, without routing a tool call through the LLM.
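One way to picture resolvers is as a chain of plain functions that are tried in order until one returns a value, with no LLM round trip involved. A minimal sketch; the `ValueResolverChain` class and both resolver functions are hypothetical and do not reflect Surfari's resolver interface.

```python
from collections.abc import Callable
from typing import Optional

Resolver = Callable[[str], Optional[str]]


class ValueResolverChain:
    """Try each resolver in order; the first non-None answer wins (hypothetical)."""

    def __init__(self) -> None:
        self._resolvers: list[Resolver] = []

    def register(self, resolver: Resolver) -> Resolver:
        self._resolvers.append(resolver)
        return resolver  # returning it allows use as a decorator

    def resolve(self, field_label: str) -> Optional[str]:
        for resolver in self._resolvers:
            value = resolver(field_label)
            if value is not None:
                return value
        return None


chain = ValueResolverChain()


@chain.register
def profile_lookup(label: str) -> Optional[str]:
    # Stand-in for a direct API call, e.g. to a user-profile service.
    return {"First name": "Ada", "ZIP code": "94105"}.get(label)


@chain.register
def default_country(label: str) -> Optional[str]:
    # Stand-in for a custom rule or a retrieval-augmented lookup.
    return "United States" if label == "Country" else None


print(chain.resolve("ZIP code"), chain.resolve("Country"), chain.resolve("Fax"))
# 94105 United States None
```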
Tool Calling Integration
- Python Tools: Easy integration via function calling.
- MCP Tools: Stdio or HTTP servers supported for external integrations.
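Surfari's exact tool-registration API is not shown on this page. As a generic illustration of how a plain Python function can be exposed for function calling, this sketch derives a JSON-Schema-style description from the function signature with `inspect`. The schema shape follows the common function-calling convention; `tool_schema` and `convert_currency` are hypothetical.

```python
import inspect

# Map simple Python annotations to JSON Schema types; anything else -> "string".
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def tool_schema(func) -> dict:
    """Build a JSON-Schema-style tool description from a Python function."""
    params = inspect.signature(func).parameters
    properties = {
        name: {"type": _JSON_TYPES.get(p.annotation, "string")}
        for name, p in params.items()
    }
    # Parameters without a default value are required.
    required = [n for n, p in params.items() if p.default is inspect.Parameter.empty]
    return {
        "name": func.__name__,
        "description": inspect.getdoc(func) or "",
        "parameters": {"type": "object", "properties": properties, "required": required},
    }


def convert_currency(amount: float, to_currency: str = "USD") -> float:
    """Convert an amount into the given currency."""
    return amount  # placeholder body for the example


print(tool_schema(convert_currency))
```

The model sees only the generated description; when it requests the tool, the host code calls the Python function with the returned arguments.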
Screenshots for Grounding
Use screenshots as additional context so the LLM can reason more accurately, at the cost of slightly slower runs. Screenshots can also be saved for later review.
PDF Download Automation
Downloads PDFs from both direct download links and Chrome's embedded PDF viewer.
Batch Execution from CSV
Run or schedule multiple tasks in one batch. Each task can target a different site, goal, or credential set and carry its own settings (e.g., single vs. multi-action per turn, record/replay on/off, masking on/off, screenshots on/off).
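Per-task settings in a batch file map naturally onto CSV columns. A minimal sketch of parsing such a file with the standard `csv` module; the column names, the "MyBank" row, and `load_tasks` are hypothetical, and Surfari's actual batch CSV format may differ.

```python
import csv
import io

# Hypothetical column layout; Surfari's real batch CSV columns may differ.
BATCH_CSV = """site_name,task_goal,multi_action,record_replay,masking,screenshots
Expedia,Find cheapest SFO to JFK flight,false,true,true,false
MyBank,Download latest statement PDF,true,true,true,true
"""

FLAGS = ("multi_action", "record_replay", "masking", "screenshots")


def load_tasks(text: str) -> list[dict]:
    """Parse one task per row, converting on/off columns to booleans."""
    tasks = []
    for row in csv.DictReader(io.StringIO(text)):
        for flag in FLAGS:
            row[flag] = row[flag].strip().lower() in {"true", "yes", "1", "on"}
        tasks.append(row)
    return tasks


for task in load_tasks(BATCH_CSV):
    print(task["site_name"], task["multi_action"], task["screenshots"])
```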
OTP Handling
Handles text-message OTPs automatically: set up SMS forwarding from your phone to your Gmail account, and Surfari auto-fills the codes during login.
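After the forwarded SMS arrives in Gmail, the code still has to be picked out of the message text. A minimal sketch of that extraction step, assuming the OTP is the first standalone 4 to 8 digit number in the body; the pattern and `extract_otp` are hypothetical, and fetching the email is not shown.

```python
import re
from typing import Optional

# A 4-8 digit run that is not part of a longer number.
OTP_PATTERN = re.compile(r"(?<!\d)(\d{4,8})(?!\d)")


def extract_otp(message_body: str) -> Optional[str]:
    """Return the first standalone 4-8 digit code in a forwarded SMS, if any."""
    match = OTP_PATTERN.search(message_body)
    return match.group(1) if match else None


sms = "Your MyBank verification code is 482913. It expires in 10 minutes."
print(extract_otp(sms))  # 482913
```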
Google Tools Integration
Out-of-the-box support for Gmail, Google Sheets, and Google Docs.
Deployment Options
- CLI Binaries: Platform-specific executables — no Python setup required. Just download and run.
- Docker Deployment: Cloud mode with VNC-based browser streaming. Provision a VM and access the remote browser directly from your web browser.
🚀 Quick Start
Surfari can be used in two ways:
- Directly via the prebuilt CLI (no Python setup needed)
- From Python code (full flexibility in your own scripts)
Option 1: Run the CLI
- Download the prebuilt CLI zip for your platform (Linux, Windows, macOS) from the Surfari Releases page.
- Unzip the archive.
- Open a terminal / command prompt and change into the navigation_cli folder.
- Set your API key environment variable (example: Gemini).
  macOS / Linux:
    export GEMINI_API_KEY=your_api_key_here
  Windows CMD:
    set GEMINI_API_KEY=your_api_key_here
  Other supported keys:
  - OPENAI_API_KEY for OpenAI GPT models
  - ANTHROPIC_API_KEY for Anthropic Claude models
- Check the CLI help.
  macOS / Linux:
    ./navigation_cli --help
  Windows:
    navigation_cli.exe --help
- Adjust configuration (optional):
  - Edit _internal/surfari/util/config.json, or
  - Pass overrides as command-line arguments.
Option 2: Run from Python
- Install Surfari:
    pip install surfari
  Optionally, install Chromium. If Chromium is not installed, the system Chrome browser is used.
    python -m playwright install chromium
- Set your API key as above (GEMINI_API_KEY, OPENAI_API_KEY, or ANTHROPIC_API_KEY).
  You can also put it in a .env file and load it with dotenv (the python-dotenv package).
- Write a script (the example below uses Expedia):
    import asyncio

    from dotenv import load_dotenv

    load_dotenv()  # load .env file if present

    from surfari.agents.navigation_agent import NavigationAgent


    async def main():
        site_name = "Expedia"
        task_goal = "Find cheapest direct flight ticket from SFO to New York leaving on first week of Nov 2025, returning 10 days later"
        nav_agent = NavigationAgent(site_name=site_name)
        answer = await nav_agent.run(task_goal=task_goal)
        print("Final answer:", answer)


    asyncio.run(main())
- Run your script:
    python my_script.py
- Switch models (optional):
    nav_agent = NavigationAgent(site_name="Expedia", model="gpt-5-mini")  # uses OPENAI_API_KEY
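The comment above indicates that the model name determines which API key is used. To fail fast when that key is missing, you could check it before creating the agent. A minimal sketch; the prefix-to-key mapping is an assumption inferred from the keys listed in this README, and `required_key` / `check_api_key` are hypothetical helpers, not part of Surfari.

```python
import os
from typing import Optional

# Assumed mapping from model-name prefix to the API key it needs.
KEY_FOR_PREFIX = {
    "gemini": "GEMINI_API_KEY",
    "gpt": "OPENAI_API_KEY",
    "claude": "ANTHROPIC_API_KEY",
}


def required_key(model: str) -> Optional[str]:
    """Return the environment variable a model name is expected to need."""
    for prefix, env_var in KEY_FOR_PREFIX.items():
        if model.lower().startswith(prefix):
            return env_var
    return None


def check_api_key(model: str) -> None:
    """Raise before launching a browser if the matching key is unset."""
    env_var = required_key(model)
    if env_var and not os.environ.get(env_var):
        raise RuntimeError(f"{model!r} needs {env_var} to be set")


print(required_key("gpt-5-mini"))  # OPENAI_API_KEY
```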
🔐 Credential Storage
- Linux: Key stored in ~/.surfari/key_string with permissions rw------- (chmod 600).
- macOS: Key stored in ~/.surfari/key_string or in the system keyring (via the keyring library).
- Windows: Key stored in the system keyring (via the keyring library).
- Database: Encrypted SQLite database in your Surfari environment.
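On Linux, the key file is protected by file permissions alone. A minimal sketch of creating such a file safely with the standard library, assuming a random URL-safe key; Surfari's own key format and code are not shown here, `write_key_file` is hypothetical, and the demo writes to a temporary directory rather than ~/.surfari.

```python
import os
import secrets
import stat
import tempfile
from pathlib import Path


def write_key_file(path: Path) -> str:
    """Create a random key file that only its owner can read and write."""
    path.parent.mkdir(parents=True, exist_ok=True)
    key = secrets.token_urlsafe(32)
    # O_EXCL refuses to overwrite an existing key; mode 0o600 is rw-------.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as f:
        f.write(key)
    return key


demo_path = Path(tempfile.mkdtemp()) / "key_string"  # demo location only
demo_key = write_key_file(demo_path)
print(stat.filemode(demo_path.stat().st_mode))  # -rw------- (on Linux/macOS)
```

Passing the mode to `os.open` sets the permissions at creation time, so there is no window in which the file exists with default permissions, as there would be when writing first and running chmod afterwards.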
🛠 Development
git clone https://github.com/surfari-ai/surfari.git
cd surfari
pip install -e ".[dev]"
python -m playwright install chromium
📂 Project Structure
src/surfari/
├── __init__.py
├── util/config.json
├── security/site_credential_manager.py
├── agents/
│ └── navigation_agent/
├── view/html_to_text.js
└── security/credentials.db
🤝 Contributing
- Fork the repository
- Create a feature branch (git checkout -b feature/new-thing)
- Commit changes (git commit -m "Add new thing")
- Push to the branch (git push origin feature/new-thing)
- Open a Pull Request
📜 License
MIT License — see LICENSE for details.
Download files
File details
Details for the file surfari-0.1.10.tar.gz.
File metadata
- Download URL: surfari-0.1.10.tar.gz
- Upload date:
- Size: 141.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 98a28b74f6fe2a92c3ab6f9f14c45ea046ab576b71abf6943d5051aae7294f5a |
| MD5 | b7b06eeeaf4014232f4fc17ccf7eecdb |
| BLAKE2b-256 | ddf1610a3592b09f9cfabccc9986d55def4c7ca997f7abc5325882ab767d8924 |
File details
Details for the file surfari-0.1.10-py3-none-any.whl.
File metadata
- Download URL: surfari-0.1.10-py3-none-any.whl
- Upload date:
- Size: 148.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.8
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8fb135dab282a17219cb0637ccee0d71a9736c9fec9d7e58517a5d26f6244edf |
| MD5 | e2cf8e226a6664cc24a35eceff3751e6 |
| BLAKE2b-256 | d123bfe47ff3b52a8b21ffa31c265f08e807cdf4037bf4b11264df47bf4dabc6 |
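To check a downloaded file against the published SHA256 digests, you can hash it locally. A minimal sketch using Python's `hashlib`; `sha256_of` and `verify` are illustrative helpers, and the commented-out call assumes the wheel sits in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """Compare against a published hex digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()


# Published SHA256 for surfari-0.1.10-py3-none-any.whl, as listed on this page.
EXPECTED = "8fb135dab282a17219cb0637ccee0d71a9736c9fec9d7e58517a5d26f6244edf"
# verify(Path("surfari-0.1.10-py3-none-any.whl"), EXPECTED)
```

pip can enforce the same check at install time in hash-checking mode: list `surfari==0.1.10 --hash=sha256:<digest>` in a requirements file and install it with `pip install --require-hashes -r requirements.txt`.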