clawome

Clawome — AI browser agent. One command to run any web task.

One API call. Any web task. Done.
Give your AI agent a natural language goal — Clawome plans, browses, and returns structured results.

Task Agent API

One POST request. Clawome handles the rest — planning subtasks, controlling the browser, reading pages, and returning results.

curl -X POST http://localhost:5001/api/agent/start \
  -H "Content-Type: application/json" \
  -d '{"description": "Find AI-related graduate programs at NYU Tandon School of Engineering"}'

Poll progress:

curl http://localhost:5001/api/agent/status

{
  "status": "completed",
  "final_result": "NYU Tandon offers these AI-related programs: ...",
  "subtasks": [
    {"step": 1, "goal": "Visit NYU Tandon website", "status": "completed"},
    {"step": 2, "goal": "Extract program list", "status": "completed"}
  ],
  "llm_usage": {"calls": 12, "input_tokens": 25000, "total_tokens": 28000}
}

Cancel if needed:

curl -X POST http://localhost:5001/api/agent/stop

Method	Endpoint	Description
POST	`/api/agent/start`	Submit a task (natural language)
GET	`/api/agent/status`	Poll progress, subtasks, and results
POST	`/api/agent/stop`	Cancel running task

Start parameters:

Field	Type	Description
`task`	string	Task description (required)
`max_steps`	number	Override step limit for this task (default: 15)

Status values: idle → starting → running → completed / failed / cancelled
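
The start, poll, and cancel calls above can be combined into a small client. What follows is a minimal sketch using only the Python standard library, not an official SDK: `http_json`, `run_task`, the polling interval, and the timeout handling are illustrative choices, and the request body uses the `description` key from the curl example (switch the key if your server expects a different field name).

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:5001"  # default API address used in this README
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def http_json(method, path, body=None):
    """Send a JSON request to the API and decode the JSON reply."""
    data = json.dumps(body).encode("utf-8") if body is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))


def run_task(description, max_steps=None, poll_seconds=2.0,
             timeout_seconds=600.0, call=http_json, sleep=time.sleep):
    """Start a task, then poll until it reaches a terminal status."""
    # Assumption: the payload key mirrors the curl example above.
    payload = {"description": description}
    if max_steps is not None:
        payload["max_steps"] = max_steps
    call("POST", "/api/agent/start", payload)

    waited = 0.0
    while True:
        status = call("GET", "/api/agent/status")
        if status.get("status") in TERMINAL_STATES:
            return status
        if waited >= timeout_seconds:
            call("POST", "/api/agent/stop")  # give up and cancel the task
            raise TimeoutError("task did not finish in time")
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing `call` and `sleep` as parameters keeps the loop testable without a running server; in normal use, `run_task("Find AI-related graduate programs at NYU Tandon School of Engineering")` returns the final status dictionary.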

Tips for Writing Tasks

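Bad:  "Open the Shenzhen University website and see what's on it"
Good: "Open the https://www.szu.edu.cn homepage and extract the navigation bar, the latest 3 news items, and the notices and announcements"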

Give a URL — avoid letting the agent guess where to go
Specify what to extract — "top 5 news" is better than "all news"
Complex tasks? Increase steps — "max_steps": 30 for multi-page tasks
Or split into smaller tasks — each task focused on one page or one goal
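
As a rough illustration of these tips, the sketch below builds one focused request body per page instead of a single sprawling task. `build_task` is a hypothetical helper, not part of Clawome, and the `description` key follows the curl example earlier in this README.

```python
def build_task(url, extract, max_steps=15):
    """Compose a specific task description plus an explicit step budget."""
    if not url.startswith(("http://", "https://")):
        raise ValueError("give the agent a full URL so it does not have to guess")
    description = f"Open {url} and extract " + ", ".join(extract)
    return {"description": description, "max_steps": max_steps}


# One focused task per page; the second gets a larger step budget.
tasks = [
    build_task("https://www.szu.edu.cn",
               ["the navigation bar", "the latest 3 news items"]),
    build_task("https://news.ycombinator.com",
               ["the top 5 AI-related stories"], max_steps=30),
]
```

Each dictionary in `tasks` can be sent as the JSON body of a separate `POST /api/agent/start` request.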

How It Works

Your API call → Task Agent → Plan subtasks → Execute browser actions → Return results
                                  ↑                                        |
                                  └── evaluate & replan if needed ─────────┘

The agent uses a LangGraph state machine internally: perceive page → plan next step → execute action → sense result → repeat until done.
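
To make the loop concrete, here is a plain-Python sketch of a perceive, plan, act, sense cycle with a hard step limit. It deliberately does not use LangGraph, and it is not Clawome's actual implementation: `AgentState`, `run_loop`, and the stub callables are illustrative names standing in for the browser and the LLM.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    goal: str
    page: str = ""                                # latest page observation
    history: list = field(default_factory=list)   # (step, action, outcome)
    done: bool = False


def run_loop(state, perceive, plan, act, sense, max_steps=15):
    """Repeat perceive -> plan -> act -> sense until done or out of steps."""
    for step in range(1, max_steps + 1):
        state.page = perceive(state)          # read the current page
        action = plan(state)                  # decide the next action
        outcome = act(action)                 # execute it in the browser
        state.done = sense(state, outcome)    # check whether the goal is met
        state.history.append((step, action, outcome))
        if state.done:
            break
    return state


# Stub callables: two canned pages and a trivial planner.
pages = iter(["home page", "program list"])
final = run_loop(
    AgentState(goal="find programs"),
    perceive=lambda s: next(pages),
    plan=lambda s: "click programs" if s.page == "home page" else "extract list",
    act=lambda a: f"did: {a}",
    sense=lambda s, outcome: outcome == "did: extract list",
)
```

With these stubs the loop stops after two steps; the `max_steps` bound plays the role of the hard step limit mentioned under Features.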

Features

Natural language tasks — Describe what you want in plain language
Multi-step planning — Automatically breaks complex tasks into subtasks
Smart execution — Perceive → Plan → Act → Sense loop with retry and anomaly detection
Markdown results — Final results formatted in Markdown with structured data
12+ LLM providers — OpenAI, Anthropic, Google, DeepSeek, DashScope, Moonshot, Zhipu, Mistral, Groq, xAI, and more
Safety constraints — Browser-only actions, hard step limits

DOM Compression

Under the hood, the Task Agent sees web pages through Clawome's DOM compressor — turning 300K tokens of raw HTML into ~3K tokens of clean, structured trees.

You can also use this directly as a standalone API for your own agents:

# Open a page
curl -X POST http://localhost:5001/api/browser/open \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.google.com"}'

# Read compressed DOM
curl http://localhost:5001/api/browser/dom

[1] form(role="search")
  [1.1] textarea(name="q", placeholder="Search")
  [1.2] button: Google Search
  [1.3] button: I'm Feeling Lucky
[2] a(href): About
[3] a(href): Gmail

Roughly 100:1 compression on typical web pages (less on text-heavy pages such as long Wikipedia articles)
Preserves visible text, interactive elements, and semantic structure
Hierarchical node IDs (e.g., 1.2.3) for precise element targeting
Site-specific optimizers for Google, Wikipedia, Stack Overflow, YouTube, etc.
Lite mode for even more aggressive token savings
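
If you consume the compressed tree in your own agent, you will probably want to index it by node ID. The parser below is a sketch that assumes the line format seen in the sample output above (`[id] tag(attrs): label`); the real output may contain shapes this pattern does not cover, and `parse_dom` is an illustrative name rather than a Clawome function.

```python
import re

# Matches lines such as '  [1.2] button: Google Search'.
NODE_LINE = re.compile(
    r"^\s*\[(?P<id>\d+(?:\.\d+)*)\]\s+"
    r"(?P<tag>[\w-]+)"
    r"(?:\((?P<attrs>.*?)\))?"
    r"(?::\s*(?P<label>.*))?$"
)


def parse_dom(text):
    """Map each hierarchical node ID to its tag, attributes, label, and parent."""
    nodes = {}
    for line in text.splitlines():
        match = NODE_LINE.match(line)
        if match is None:
            continue  # skip blank or unrecognised lines
        node_id = match.group("id")
        nodes[node_id] = {
            "tag": match.group("tag"),
            "attrs": match.group("attrs") or "",
            "label": match.group("label") or "",
            # '1.2' -> parent '1'; top-level nodes have no parent
            "parent": node_id.rpartition(".")[0] or None,
        }
    return nodes


sample = """[1] form(role="search")
  [1.1] textarea(name="q", placeholder="Search")
  [1.2] button: Google Search
  [1.3] button: I'm Feeling Lucky
[2] a(href): About
[3] a(href): Gmail"""

nodes = parse_dom(sample)
buttons = [node_id for node_id, node in nodes.items() if node["tag"] == "button"]
```

Here `buttons` holds the IDs of the two search buttons, which is the kind of lookup an agent needs before calling an interaction endpoint.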

Dashboard

Browser Playground — Interactive DOM viewer and browser control
Agent UI — Task input, real-time progress tracking, collapsible step details
Settings — LLM provider config, browser options, compression settings
API Docs — Built-in documentation with Chinese/English support

Quick Start

Prerequisites: Python 3.10+

pip install clawome         # Install from PyPI
clawome start               # Guided setup + start server

If the `clawome` command is not found after install, use:

python -m clawome start     # Alternative way to run

`clawome start` walks you through LLM configuration (provider, API key, model), then starts the backend and installs Playwright's Chromium browser automatically.

Dashboard:  http://localhost:5173
API:        http://localhost:5001

Then run tasks from the terminal:

clawome "Find the latest AI news on Hacker News"  # Submit task & auto-poll
clawome status                                    # Check progress
clawome stop                                      # Cancel task
clawome "complex task" --max-steps 30             # Override step limit
clawome setup                                     # Reconfigure LLM settings

You can also skip CLI setup and configure via Dashboard > Settings.

Start backend or frontend separately

./start-backend.sh         # Only API server → http://localhost:5001
./start-frontend.sh        # Only Dashboard  → http://localhost:5173

Manual setup

# Backend
cd backend
python -m venv venv
source venv/bin/activate    # Windows: venv\Scripts\activate
pip install -r requirements.txt
playwright install chromium
python app.py               # http://localhost:5001

# Frontend (in another terminal)
cd frontend
npm install
npm run dev                 # http://localhost:5173

Full API Reference

Browser APIs — Navigation, DOM, Interaction (used internally by Task Agent, also available standalone)

Navigation

Method	Endpoint	Description
POST	`/api/browser/open`	Open URL (launches browser if needed)
POST	`/api/browser/back`	Navigate back
POST	`/api/browser/forward`	Navigate forward
POST	`/api/browser/refresh`	Reload page

DOM

Method	Endpoint	Description
GET/POST	`/api/browser/dom`	Get compressed DOM tree
POST	`/api/browser/dom/detail`	Get element details (rect, attributes)
POST	`/api/browser/text`	Get plain text content of a node
GET	`/api/browser/source`	Get raw page HTML

Interaction

Method	Endpoint	Description
POST	`/api/browser/click`	Click element
POST	`/api/browser/type`	Type text (keyboard events)
POST	`/api/browser/fill`	Fill input field
POST	`/api/browser/select`	Select dropdown option
POST	`/api/browser/check`	Toggle checkbox
POST	`/api/browser/hover`	Hover element
POST	`/api/browser/scroll/down`	Scroll down
POST	`/api/browser/scroll/up`	Scroll up
POST	`/api/browser/keypress`	Press key
POST	`/api/browser/hotkey`	Press key combo

Token Optimization

All action endpoints support optional parameters to reduce response size:

refresh_dom: false — Skip DOM refresh after action (saves tokens)
fields: ["dom", "stats"] — Return only selected fields
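
A small helper can attach these options to any action request body. This is only a sketch: `with_token_options` is an illustrative name, and `node_id` and `text` are hypothetical parameter names, since this README does not list the body fields of the action endpoints.

```python
def with_token_options(payload, refresh_dom=True, fields=None):
    """Return a copy of an action payload with the optional response-trimming keys."""
    trimmed = dict(payload)
    if not refresh_dom:
        trimmed["refresh_dom"] = False      # skip the DOM refresh after the action
    if fields is not None:
        trimmed["fields"] = list(fields)    # return only the selected fields
    return trimmed


# "node_id" and "text" are hypothetical keys used only for illustration.
click_body = with_token_options({"node_id": "1.2"}, refresh_dom=False)
fill_body = with_token_options({"node_id": "1.1", "text": "clawome"},
                               fields=["dom", "stats"])
```

Skipping the refresh makes sense for intermediate actions in a sequence, with a full DOM read only after the last one.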

Benchmarks

Page	Raw HTML (tokens)	Compressed (tokens)	Savings	Completeness
Google Homepage	51K	238	99.5%	100%
Google Search	298K	2,866	99.0%	100%
Wikipedia Article	225K	40K	82.1%	99.7%
Baidu Homepage	192K	457	99.8%	100%
Baidu Search	390K	4,960	98.7%	100%

Completeness = percentage of visible text preserved in the compressed tree.
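
The Savings column appears to be the share of the raw size that compression removes. The sketch below recomputes it from the rounded figures in the table, assuming savings = 1 - compressed / raw; because the raw sizes are rounded to the nearest thousand, the last digit can differ slightly from the published value for some rows.

```python
def savings_percent(raw, compressed):
    """Percentage of the raw size removed by compression."""
    return (1 - compressed / raw) * 100


def compression_ratio(raw, compressed):
    """How many raw units map onto one compressed unit."""
    return raw / compressed


# Rounded figures taken from the benchmark table.
rows = {
    "Google Homepage": (51_000, 238),
    "Google Search": (298_000, 2_866),
    "Baidu Homepage": (192_000, 457),
    "Baidu Search": (390_000, 4_960),
}

for page, (raw, compressed) in rows.items():
    print(f"{page}: {savings_percent(raw, compressed):.1f}% saved, "
          f"{compression_ratio(raw, compressed):.0f}:1")
```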

Supported LLM Providers

Provider	Model Examples
DashScope (Qwen)	qwen-plus, qwen-max, qwen3.5-plus
OpenAI	gpt-4o, gpt-4o-mini
Anthropic	claude-sonnet-4-20250514, claude-haiku
Google	gemini-2.0-flash, gemini-pro
DeepSeek	deepseek-chat, deepseek-reasoner
Mistral	mistral-large-latest
Groq	llama-3.1-70b
xAI	grok-2
Moonshot	moonshot-v1-8k
Zhipu	glm-4
Custom	Any OpenAI-compatible endpoint

Roadmap

DOM compression API with pluggable site-specific scripts
Task Agent with multi-step planning and autonomous browsing
Multi-provider LLM support (12+ providers)
Chinese/English bilingual dashboard
MCP (Model Context Protocol) server integration
Visual grounding — screenshot-based element location
Multi-agent collaboration

Third-Party Libraries

Library	License	Usage
Playwright	Apache 2.0	Browser automation
Flask	BSD 3-Clause	REST API server
React	MIT	Frontend UI
LangGraph	MIT	Agent workflow engine
LiteLLM	MIT	Multi-provider LLM routing
Pydantic	MIT	Schema validation

License

Apache License 2.0 - see LICENSE for details.

Release history

Version	Release date
0.1.8	Mar 9, 2026
0.1.7	Mar 3, 2026
0.1.6	Feb 26, 2026
0.1.5 (this version)	Feb 26, 2026
0.1.3	Feb 26, 2026
0.1.2	Feb 26, 2026
0.1.1	Feb 26, 2026
0.1.0	Feb 26, 2026
