
Clawome — AI browser agent. One command to run any web task.

Clawome

One API call. Any web task. Done.
Give your AI agent a natural language goal — Clawome plans, browses, and returns structured results.

Task Agent API

One POST request. Clawome handles the rest — planning subtasks, controlling the browser, reading pages, and returning results.

curl -X POST http://localhost:5001/api/agent/start \
  -H "Content-Type: application/json" \
  -d '{"description": "Find AI-related graduate programs at NYU Tandon School of Engineering"}'

Poll progress:

curl http://localhost:5001/api/agent/status
{
  "status": "completed",
  "final_result": "NYU Tandon offers these AI-related programs: ...",
  "subtasks": [
    {"step": 1, "goal": "Visit NYU Tandon website", "status": "completed"},
    {"step": 2, "goal": "Extract program list", "status": "completed"}
  ],
  "llm_usage": {"calls": 12, "input_tokens": 25000, "total_tokens": 28000}
}
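The response shape above is easy to consume programmatically. Below is a minimal Python sketch, using only the fields shown in the sample. summarize_status is a hypothetical helper, not part of Clawome. The output_tokens value assumes total_tokens = input tokens + output tokens, which this README does not state explicitly.

```python
import json

# Sample payload copied from the status response above.
SAMPLE_STATUS = """
{
  "status": "completed",
  "final_result": "NYU Tandon offers these AI-related programs: ...",
  "subtasks": [
    {"step": 1, "goal": "Visit NYU Tandon website", "status": "completed"},
    {"step": 2, "goal": "Extract program list", "status": "completed"}
  ],
  "llm_usage": {"calls": 12, "input_tokens": 25000, "total_tokens": 28000}
}
"""


def summarize_status(payload: dict) -> dict:
    """Condense a status payload into the fields a caller usually checks."""
    usage = payload.get("llm_usage", {})
    subtasks = payload.get("subtasks", [])
    done = [s["step"] for s in subtasks if s.get("status") == "completed"]
    return {
        "status": payload.get("status"),
        "completed_steps": done,
        "total_steps": len(subtasks),
        # Assumption: tokens not counted as input were spent on output.
        "output_tokens": usage.get("total_tokens", 0) - usage.get("input_tokens", 0),
        "result": payload.get("final_result"),
    }


summary = summarize_status(json.loads(SAMPLE_STATUS))
print(summary["status"], summary["completed_steps"], summary["output_tokens"])
# prints: completed [1, 2] 3000
```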

Cancel if needed:

curl -X POST http://localhost:5001/api/agent/stop

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/agent/start | Submit a task (natural language) |
| GET | /api/agent/status | Poll progress, subtasks, and results |
| POST | /api/agent/stop | Cancel the running task |

Start parameters:

| Field | Type | Description |
| --- | --- | --- |
| task | string | Task description (required) |
| max_steps | number | Override the step limit for this task (default: 15) |
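The same start request can be built from Python with the standard library. This is a sketch under assumptions. build_start_request is a hypothetical helper. The curl example above sends the goal under "description" while the parameter table lists "task", so the key is a parameter that defaults to the curl example's choice. The request is only constructed here. Sending it requires a running server.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default server address used in this README


def build_start_request(description, max_steps=None, task_field="description"):
    """Build (but do not send) a POST /api/agent/start request.

    task_field is an assumption: "description" matches the curl example,
    "task" matches the parameter table.
    """
    body = {task_field: description}
    if max_steps is not None:
        body["max_steps"] = max_steps  # per-task override of the default limit (15)
    return urllib.request.Request(
        f"{BASE_URL}/api/agent/start",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_start_request("Find top AI news on Hacker News", max_steps=30)
# To submit it against a running server:
# with urllib.request.urlopen(req) as resp:
#     print(json.load(resp))
```

Sending an explicit Content-Type header mirrors the curl example and avoids relying on the server to guess the body format.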

Status values: idle → starting → running → completed / failed / cancelled
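Since completed, failed, and cancelled are the end states, a client can poll GET /api/agent/status until one of them appears. The following is a minimal sketch. wait_for_task, the 2-second interval, and the 300-second timeout are illustrative choices, not documented Clawome behaviour. The fetch and sleep functions are injectable so the loop can be exercised without a server.

```python
import json
import time
import urllib.request

TERMINAL = {"completed", "failed", "cancelled"}


def fetch_status(base_url="http://localhost:5001"):
    """GET /api/agent/status and decode the JSON body."""
    with urllib.request.urlopen(f"{base_url}/api/agent/status") as resp:
        return json.load(resp)


def wait_for_task(fetch=fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Poll until the task reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        payload = fetch()
        if payload.get("status") in TERMINAL:
            return payload
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task still {payload.get('status')!r} after {timeout}s")
        sleep(interval)
```

Note that idle is not treated as terminal. If no task was submitted, the loop simply runs until the timeout, which is why the timeout matters.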

Tips for Writing Tasks

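Bad:  "Open the Shenzhen University website and see what's on it"
Good: "Open the https://www.szu.edu.cn homepage and extract the navigation bar, the latest 3 news items, and the notices and announcements"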
  • Give a URL — avoid letting the agent guess where to go
  • Specify what to extract — "top 5 news" is better than "all news"
  • Complex tasks? Raise the step limit, e.g. "max_steps": 30 for multi-page tasks
  • Or split into smaller tasks — each task focused on one page or one goal

How It Works

Your API call → Task Agent → Plan subtasks → Execute browser actions → Return results
                                  ↑                                        |
                                  └── evaluate & replan if needed ─────────┘

The agent uses a LangGraph state machine internally: perceive page → plan next step → execute action → sense result → repeat until done.
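The loop can be illustrated in plain Python. This sketch is not Clawome's LangGraph graph. AgentState and run_agent are hypothetical names, and perceive, plan, act, and sense are caller-supplied functions standing in for DOM reading, LLM planning, browser actions, and result checking. The default of 15 steps mirrors the documented max_steps default.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AgentState:
    goal: str
    max_steps: int = 15          # hard step limit, as in the default max_steps
    step: int = 0
    observation: str = ""
    history: list = field(default_factory=list)
    result: Optional[str] = None


def run_agent(state, perceive, plan, act, sense):
    """Perceive -> plan -> act -> sense, repeated until done or out of steps."""
    while state.step < state.max_steps:
        state.observation = perceive(state)   # e.g. read the compressed DOM
        action = plan(state)                  # decide the next browser action
        outcome = act(action)                 # execute it in the browser
        state.history.append((action, outcome))
        state.step += 1
        if sense(state, outcome):             # goal reached: stop early
            state.result = outcome
            break
    return state


# Toy run: "navigate" through three fake pages until "detail" is reached.
pages = ["home", "list", "detail"]
s = run_agent(
    AgentState(goal="reach detail", max_steps=5),
    perceive=lambda st: pages[min(st.step, 2)],
    plan=lambda st: f"open {pages[min(st.step + 1, 2)]}",
    act=lambda a: a.split()[1],
    sense=lambda st, out: out == "detail",
)
```

The step counter is what enforces the hard limit. If sense never reports success, the loop stops after max_steps iterations with result left as None.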

Features

  • Natural language tasks — Describe what you want in plain language
  • Multi-step planning — Automatically breaks complex tasks into subtasks
  • Smart execution — Perceive → Plan → Act → Sense loop with retry and anomaly detection
  • Markdown results — Final results formatted in Markdown with structured data
  • 12+ LLM providers — OpenAI, Anthropic, Google, DeepSeek, DashScope, Moonshot, Zhipu, Mistral, Groq, xAI, and more
  • Safety constraints — Browser-only actions, hard step limits

DOM Compression

Under the hood, the Task Agent sees web pages through Clawome's DOM compressor — turning 300K tokens of raw HTML into ~3K tokens of clean, structured trees.

You can also use this directly as a standalone API for your own agents:

# Open a page
curl -X POST http://localhost:5001/api/browser/open \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.google.com"}'

# Read compressed DOM
curl http://localhost:5001/api/browser/dom
[1] form(role="search")
  [1.1] textarea(name="q", placeholder="Search")
  [1.2] button: Google Search
  [1.3] button: I'm Feeling Lucky
[2] a(href): About
[3] a(href): Gmail
  • 100:1 compression ratio on typical web pages
  • Preserves visible text, interactive elements, and semantic structure
  • Hierarchical node IDs (e.g., 1.2.3) for precise element targeting
  • Site-specific optimizers for Google, Wikipedia, Stack Overflow, YouTube, etc.
  • Lite mode for even more aggressive token savings
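Because node IDs encode the hierarchy, a client can rebuild parent/child relations from the text output alone. The following is a minimal sketch that infers the line format from the sample output above: an optional indent, a bracketed dotted ID, then a label. The real output may contain other line shapes. parse_dom, parent_id, and children are hypothetical helpers.

```python
import re

SAMPLE_DOM = """\
[1] form(role="search")
  [1.1] textarea(name="q", placeholder="Search")
  [1.2] button: Google Search
  [1.3] button: I'm Feeling Lucky
[2] a(href): About
[3] a(href): Gmail
"""

# Optional indent, "[1.2]"-style ID, whitespace, then the rest of the line.
LINE = re.compile(r"^\s*\[(?P<id>\d+(?:\.\d+)*)\]\s+(?P<label>.*)$")


def parse_dom(text):
    """Map each hierarchical node ID to its label text."""
    nodes = {}
    for line in text.splitlines():
        m = LINE.match(line)
        if m:
            nodes[m.group("id")] = m.group("label")
    return nodes


def parent_id(node_id):
    """'1.2' -> '1'; top-level IDs such as '2' have no parent."""
    return node_id.rsplit(".", 1)[0] if "." in node_id else None


def children(nodes, node_id):
    """Direct children of node_id, in document order."""
    return [nid for nid in nodes if parent_id(nid) == node_id]


nodes = parse_dom(SAMPLE_DOM)
print(nodes["1.2"])            # button: Google Search
print(children(nodes, "1"))    # ['1.1', '1.2', '1.3']
```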

Dashboard

  • Browser Playground — Interactive DOM viewer and browser control
  • Agent UI — Task input, real-time progress tracking, collapsible step details
  • Settings — LLM provider config, browser options, compression settings
  • API Docs — Built-in documentation with Chinese/English support

Quick Start

Prerequisites: Python 3.10+

Install via pip (Recommended)

pip install clawome         # Install from PyPI
clawome start               # Guided setup + start server

If the clawome command is not found after installation, use:

python -m clawome start     # Alternative way to run

clawome start will:

  1. Walk you through LLM configuration (provider, API key, model)
  2. Install Playwright Chromium browser automatically
  3. Start the backend server with Dashboard
Server & Dashboard:  http://localhost:5001

Then open another terminal and run tasks:

clawome "Find top AI news on Hacker News"    # Submit task & auto-poll
clawome status                               # Check progress
clawome stop                                 # Cancel task
clawome "complex task" --max-steps 30        # Override step limit
clawome setup                                # Reconfigure LLM settings

Configuration is saved to ~/.clawome/.env. You can also configure via Dashboard > Settings.

Install from source

Clone and run with start.sh
git clone https://github.com/CodingLucasLi/Clawome.git
cd Clawome
cp .env.example .env       # Fill in your LLM API key
./start.sh                 # Start backend + frontend
Dashboard:  http://localhost:5173
API:        http://localhost:5001
Manual setup
# Backend
cd backend
python -m venv venv
source venv/bin/activate    # Windows: venv\Scripts\activate
pip install -r requirements.txt
playwright install chromium
python app.py               # http://localhost:5001

# Frontend (in another terminal)
cd frontend
npm install
npm run dev                 # http://localhost:5173

Full API Reference

Browser APIs — Navigation, DOM, Interaction (used internally by Task Agent, also available standalone)

Navigation

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/browser/open | Open URL (launches browser if needed) |
| POST | /api/browser/back | Navigate back |
| POST | /api/browser/forward | Navigate forward |
| POST | /api/browser/refresh | Reload page |

DOM

| Method | Endpoint | Description |
| --- | --- | --- |
| GET/POST | /api/browser/dom | Get compressed DOM tree |
| POST | /api/browser/dom/detail | Get element details (rect, attributes) |
| POST | /api/browser/text | Get plain text content of a node |
| GET | /api/browser/source | Get raw page HTML |

Interaction

| Method | Endpoint | Description |
| --- | --- | --- |
| POST | /api/browser/click | Click element |
| POST | /api/browser/type | Type text (keyboard events) |
| POST | /api/browser/fill | Fill input field |
| POST | /api/browser/select | Select dropdown option |
| POST | /api/browser/check | Toggle checkbox |
| POST | /api/browser/hover | Hover element |
| POST | /api/browser/scroll/down | Scroll down |
| POST | /api/browser/scroll/up | Scroll up |
| POST | /api/browser/keypress | Press key |
| POST | /api/browser/hotkey | Press key combo |

Token Optimization

All action endpoints support optional parameters to reduce response size:

  • refresh_dom: false — Skip DOM refresh after action (saves tokens)
  • fields: ["dom", "stats"] — Return only selected fields
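Both options are ordinary fields in the JSON body of an action request. The following is a minimal sketch that builds, without sending, such a request. build_action_request is hypothetical. The per-action arguments are not documented in this section, so the "node_id" key in the example is a made-up placeholder for whatever element identifier the endpoint actually expects.

```python
import json
import urllib.request


def build_action_request(action, params, refresh_dom=True, fields=None,
                         base_url="http://localhost:5001"):
    """Build (not send) a POST /api/browser/<action> request with token options."""
    body = dict(params)                 # action-specific arguments (assumed names)
    if not refresh_dom:
        body["refresh_dom"] = False     # skip the DOM refresh after the action
    if fields is not None:
        body["fields"] = list(fields)   # return only the selected fields
    return urllib.request.Request(
        f"{base_url}/api/browser/{action}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# "node_id" is a hypothetical parameter name used only for illustration.
req = build_action_request("click", {"node_id": "1.2"}, refresh_dom=False, fields=["stats"])
```

Omitting the options leaves the body untouched, so the server's defaults (DOM refresh on, full response) apply.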

Benchmarks

| Page | Raw HTML | Compressed | Savings | Completeness |
| --- | --- | --- | --- | --- |
| Google Homepage | 51K | 238 | 99.5% | 100% |
| Google Search | 298K | 2,866 | 99.0% | 100% |
| Wikipedia Article | 225K | 40K | 82.1% | 99.7% |
| Baidu Homepage | 192K | 457 | 99.8% | 100% |
| Baidu Search | 390K | 4,960 | 98.7% | 100% |

Completeness = percentage of visible text preserved in the compressed tree.
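The Savings column is consistent with savings = 1 − compressed / raw. The short check below recomputes it, together with the raw-to-compressed ratio, treating "K" as exactly one thousand. Wikipedia is left out because both of its sizes are rounded to the nearest K, which shifts the recomputed figure by about 0.1 points.

```python
def savings(raw, compressed):
    """Fraction of the raw size removed by compression."""
    return 1 - compressed / raw


# Raw and compressed sizes from the table, assuming K = 1,000.
rows = {
    "Google Homepage": (51_000, 238),
    "Google Search": (298_000, 2_866),
    "Baidu Homepage": (192_000, 457),
    "Baidu Search": (390_000, 4_960),
}

for page, (raw, comp) in rows.items():
    print(f"{page}: {savings(raw, comp):.1%} saved, about {raw / comp:.0f}:1")
```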

Supported LLM Providers

| Provider | Model Examples |
| --- | --- |
| DashScope (Qwen) | qwen-plus, qwen-max, qwen3.5-plus |
| OpenAI | gpt-4o, gpt-4o-mini |
| Anthropic | claude-sonnet-4-20250514, claude-haiku |
| Google | gemini-2.0-flash, gemini-pro |
| DeepSeek | deepseek-chat, deepseek-reasoner |
| Mistral | mistral-large-latest |
| Groq | llama-3.1-70b |
| xAI | grok-2 |
| Moonshot | moonshot-v1-8k |
| Zhipu | glm-4 |
| Custom | Any OpenAI-compatible endpoint |

Roadmap

  • DOM compression API with pluggable site-specific scripts
  • Task Agent with multi-step planning and autonomous browsing
  • Multi-provider LLM support (12+ providers)
  • Chinese/English bilingual dashboard
  • MCP (Model Context Protocol) server integration
  • Visual grounding — screenshot-based element location
  • Multi-agent collaboration

Third-Party Libraries

| Library | License | Usage |
| --- | --- | --- |
| Playwright | Apache 2.0 | Browser automation |
| Flask | BSD 3-Clause | REST API server |
| React | MIT | Frontend UI |
| LangGraph | MIT | Agent workflow engine |
| LiteLLM | MIT | Multi-provider LLM routing |
| Pydantic | MIT | Schema validation |

License

Apache License 2.0 - see LICENSE for details.

Download files

Source Distribution

clawome-0.1.7.tar.gz (3.1 MB)

Uploaded Source

Built Distribution

clawome-0.1.7-py3-none-any.whl (3.1 MB)

Uploaded Python 3

File details

Details for the file clawome-0.1.7.tar.gz.

File metadata

  • Download URL: clawome-0.1.7.tar.gz
  • Size: 3.1 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for clawome-0.1.7.tar.gz
Algorithm Hash digest
SHA256 63ffb41ecb2ba3b53d38271b9cbacf7fc7e3ddfb1f410b7d2754b3ce8183116e
MD5 e879778e906feba76526d7b23bea26da
BLAKE2b-256 1ae10843f84fd4042027a0ed538c3c902e01d983b211f03688835cddb8635fa8

File details

Details for the file clawome-0.1.7-py3-none-any.whl.

File metadata

  • Download URL: clawome-0.1.7-py3-none-any.whl
  • Size: 3.1 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2

File hashes

Hashes for clawome-0.1.7-py3-none-any.whl
Algorithm Hash digest
SHA256 ce90251994e478b32175d0ed94bec088742fe448239d016c9d996cf985094d26
MD5 b3c1e2cec16a838b15b7dd081f72602f
BLAKE2b-256 554b8893cded9c2e0698c43bf6a155253469eb03e3a0ef5b7ad2a4db38d36f18
