Clawome — AI browser agent. One command to run any web task.
Clawome
One API call. Any web task. Done.
Give your AI agent a natural language goal — Clawome plans, browses, and returns structured results.
Task Agent API
One POST request. Clawome handles the rest — planning subtasks, controlling the browser, reading pages, and returning results.
curl -X POST http://localhost:5001/api/agent/start \
-H "Content-Type: application/json" \
-d '{"description": "Find AI-related graduate programs at NYU Tandon School of Engineering"}'
Poll progress:
curl http://localhost:5001/api/agent/status
{
  "status": "completed",
  "final_result": "NYU Tandon offers these AI-related programs: ...",
  "subtasks": [
    {"step": 1, "goal": "Visit NYU Tandon website", "status": "completed"},
    {"step": 2, "goal": "Extract program list", "status": "completed"}
  ],
  "llm_usage": {"calls": 12, "input_tokens": 25000, "total_tokens": 28000}
}
Cancel if needed:
curl -X POST http://localhost:5001/api/agent/stop
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/agent/start | Submit a task (natural language) |
| GET | /api/agent/status | Poll progress, subtasks, and results |
| POST | /api/agent/stop | Cancel running task |
Start parameters:
| Field | Type | Description |
|---|---|---|
| task | string | Task description (required) |
| max_steps | number | Override step limit for this task (default: 15) |
Status values: idle → starting → running → completed / failed / cancelled
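The start, poll, and read-result flow above can be sketched as a small Python client using only the standard library. This is a minimal sketch, not an official SDK: the helper names (`build_start_body`, `call_api`, `wait_for_result`, `run_task`) are hypothetical, and the request field is assumed to be `description` as in the curl example, although the parameter table lists it as `task`. Adjust it to whatever your server accepts.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:5001"  # default API address from Quick Start
TERMINAL_STATES = {"completed", "failed", "cancelled"}


def build_start_body(description, max_steps=None):
    """Build the JSON body for POST /api/agent/start."""
    # Assumption: the field name follows the curl example ("description").
    body = {"description": description}
    if max_steps is not None:
        body["max_steps"] = max_steps  # per-task override of the step limit
    return body


def call_api(method, path, payload=None):
    """Send a JSON request to the API and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    request = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def wait_for_result(fetch_status, interval=2.0, timeout=300.0, sleep=time.sleep):
    """Call fetch_status repeatedly until a terminal state or the timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch_status()
        if status.get("status") in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not reach a terminal state in time")
        sleep(interval)


def run_task(description, max_steps=None):
    """Submit a task, then poll /api/agent/status until it finishes."""
    call_api("POST", "/api/agent/start", build_start_body(description, max_steps))
    return wait_for_result(lambda: call_api("GET", "/api/agent/status"))
```

With the server running, `run_task("...", max_steps=30)` would return the final status dictionary. Check its `status` field before reading `final_result`, since `failed` and `cancelled` are also terminal.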
Tips for Writing Tasks
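Bad: "Open the Shenzhen University website and see what's on it"
Good: "Open the https://www.szu.edu.cn homepage and extract the navigation bar, the latest 3 news items, and the announcements"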
- Give a URL — avoid letting the agent guess where to go
- Specify what to extract — "top 5 news" is better than "all news"
- Complex tasks? Increase steps: "max_steps": 30 for multi-page tasks
- Or split into smaller tasks, each focused on one page or one goal
How It Works
Your API call → Task Agent → Plan subtasks → Execute browser actions → Return results
↑ |
└── evaluate & replan if needed ─────────┘
The agent uses a LangGraph state machine internally: perceive page → plan next step → execute action → sense result → repeat until done.
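The perceive, plan, execute, sense cycle can be illustrated with a plain Python loop. This is a toy sketch of the control flow only: it does not use LangGraph and is not Clawome's internal code. The `AgentState` class, the `run_loop` function, and the counter-based stand-ins for the browser are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class AgentState:
    goal: str
    steps_taken: int = 0
    done: bool = False
    history: list = field(default_factory=list)


def run_loop(state, perceive, plan, act, sense, max_steps=15):
    """Repeat perceive -> plan -> act -> sense until done or out of steps."""
    while not state.done and state.steps_taken < max_steps:
        page = perceive()                   # read the current page
        action = plan(state, page)          # decide the next action
        outcome = act(action)               # execute it
        state.done = sense(state, outcome)  # check whether the goal is met
        state.history.append((action, outcome))
        state.steps_taken += 1
    return state


# Toy stand-ins: the "browser" is a counter and the goal is to reach 3.
counter = {"value": 0}


def perceive():
    return counter["value"]


def plan(state, page):
    return "increment"


def act(action):
    counter["value"] += 1
    return counter["value"]


def sense(state, outcome):
    return outcome >= 3


final = run_loop(AgentState(goal="reach 3"), perceive, plan, act, sense)
print(final.steps_taken, final.done)  # prints: 3 True
```

The `max_steps` guard plays the role of the hard step limit: the loop stops even if `sense` never reports success.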
Features
- Natural language tasks — Describe what you want in plain language
- Multi-step planning — Automatically breaks complex tasks into subtasks
- Smart execution — Perceive → Plan → Act → Sense loop with retry and anomaly detection
- Markdown results — Final results formatted in Markdown with structured data
- 12+ LLM providers — OpenAI, Anthropic, Google, DeepSeek, DashScope, Moonshot, Zhipu, Mistral, Groq, xAI, and more
- Safety constraints — Browser-only actions, hard step limits
DOM Compression
Under the hood, the Task Agent sees web pages through Clawome's DOM compressor — turning 300K tokens of raw HTML into ~3K tokens of clean, structured trees.
You can also use this directly as a standalone API for your own agents:
# Open a page
curl -X POST http://localhost:5001/api/browser/open \
-H "Content-Type: application/json" \
-d '{"url": "https://www.google.com"}'
# Read compressed DOM
curl http://localhost:5001/api/browser/dom
[1] form(role="search")
  [1.1] textarea(name="q", placeholder="Search")
  [1.2] button: Google Search
  [1.3] button: I'm Feeling Lucky
[2] a(href): About
[3] a(href): Gmail
- 100:1 compression ratio on typical web pages
- Preserves visible text, interactive elements, and semantic structure
- Hierarchical node IDs (e.g., 1.2.3) for precise element targeting
- Site-specific optimizers for Google, Wikipedia, Stack Overflow, YouTube, etc.
- Lite mode for even more aggressive token savings
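To show how the hierarchical node IDs might be consumed on the client side, here is a small sketch that parses the text output shown above into a dictionary keyed by node ID. The line format is inferred from the sample output only, and the helper names (`parse_dom`, `children_of`) are hypothetical. Real responses may carry extra fields or a different wrapper.

```python
import re

# Matches lines such as: [1.2] button: Google Search
NODE_LINE = re.compile(r"^\[(?P<id>\d+(?:\.\d+)*)\]\s+(?P<body>.*)$")


def parse_dom(text):
    """Map each hierarchical node ID to the rest of its line."""
    nodes = {}
    for line in text.splitlines():
        match = NODE_LINE.match(line.strip())
        if match:
            nodes[match.group("id")] = match.group("body")
    return nodes


def children_of(nodes, parent_id):
    """Return the IDs that sit exactly one level below parent_id."""
    prefix = parent_id + "."
    return [
        node_id
        for node_id in nodes
        if node_id.startswith(prefix) and "." not in node_id[len(prefix):]
    ]


sample = """\
[1] form(role="search")
[1.1] textarea(name="q", placeholder="Search")
[1.2] button: Google Search
[1.3] button: I'm Feeling Lucky
[2] a(href): About
[3] a(href): Gmail
"""

nodes = parse_dom(sample)
print(children_of(nodes, "1"))  # prints: ['1.1', '1.2', '1.3']
print(nodes["1.2"])             # prints: button: Google Search
```

Because an ID encodes its position in the tree, a child lookup is a simple prefix check and needs no separate parent pointers.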
Dashboard
- Browser Playground — Interactive DOM viewer and browser control
- Agent UI — Task input, real-time progress tracking, collapsible step details
- Settings — LLM provider config, browser options, compression settings
- API Docs — Built-in documentation with Chinese/English support
Quick Start
Prerequisites: Python 3.10+
pip install clawome # Install from PyPI
clawome start # Guided setup + start server
If the clawome command is not found after installation, use:
python -m clawome start # Alternative way to run
clawome start will walk you through LLM configuration (provider, API key, model), then start the backend and install Playwright chromium automatically.
Dashboard: http://localhost:5173
API: http://localhost:5001
Then run tasks from the terminal:
clawome "Go to Hacker News and find the latest AI news" # Submit task & auto-poll
clawome status # Check progress
clawome stop # Cancel task
clawome "complex task" --max-steps 30 # Override step limit
clawome setup # Reconfigure LLM settings
You can also skip CLI setup and configure via Dashboard > Settings.
Start backend or frontend separately
./start-backend.sh # Only API server → http://localhost:5001
./start-frontend.sh # Only Dashboard → http://localhost:5173
Manual setup
# Backend
cd backend
python -m venv venv
source venv/bin/activate # Windows: venv\Scripts\activate
pip install -r requirements.txt
playwright install chromium
python app.py # http://localhost:5001
# Frontend (in another terminal)
cd frontend
npm install
npm run dev # http://localhost:5173
Full API Reference
Browser APIs — Navigation, DOM, Interaction (used internally by Task Agent, also available standalone)
Navigation
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/browser/open | Open URL (launches browser if needed) |
| POST | /api/browser/back | Navigate back |
| POST | /api/browser/forward | Navigate forward |
| POST | /api/browser/refresh | Reload page |
DOM
| Method | Endpoint | Description |
|---|---|---|
| GET/POST | /api/browser/dom | Get compressed DOM tree |
| POST | /api/browser/dom/detail | Get element details (rect, attributes) |
| POST | /api/browser/text | Get plain text content of a node |
| GET | /api/browser/source | Get raw page HTML |
Interaction
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/browser/click | Click element |
| POST | /api/browser/type | Type text (keyboard events) |
| POST | /api/browser/fill | Fill input field |
| POST | /api/browser/select | Select dropdown option |
| POST | /api/browser/check | Toggle checkbox |
| POST | /api/browser/hover | Hover element |
| POST | /api/browser/scroll/down | Scroll down |
| POST | /api/browser/scroll/up | Scroll up |
| POST | /api/browser/keypress | Press key |
| POST | /api/browser/hotkey | Press key combo |
Token Optimization
All action endpoints support optional parameters to reduce response size:
- refresh_dom: false skips the DOM refresh after the action (saves tokens)
- fields: ["dom", "stats"] returns only the selected fields
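A request body that combines these options with an action's own parameters might be assembled as below. This is a sketch under assumptions: the `refresh_dom` and `fields` names come from the text above, but the `node_id` and `text` keys are hypothetical placeholders, since the per-endpoint parameters are not listed here. The `action_body` helper is also hypothetical.

```python
def action_body(params, refresh_dom=True, fields=None):
    """Merge endpoint-specific params with the shared token-saving options."""
    body = dict(params)
    if not refresh_dom:
        body["refresh_dom"] = False    # skip the DOM refresh after the action
    if fields is not None:
        body["fields"] = list(fields)  # return only the selected fields
    return body


# Intermediate step: no fresh DOM is needed yet, so skip the refresh.
# "node_id" and "text" are hypothetical parameter names.
fill_step = action_body({"node_id": "1.1", "text": "clawome"}, refresh_dom=False)

# Last step: ask for the DOM and stats only.
click_step = action_body({"node_id": "1.2"}, fields=["dom", "stats"])

print(fill_step)
print(click_step)
```

A common pattern is to disable the refresh on every action in a sequence except the last one, so the compressed DOM is returned once rather than after each step.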
Benchmarks
| Page | Raw HTML | Compressed | Savings | Completeness |
|---|---|---|---|---|
| Google Homepage | 51K | 238 | 99.5% | 100% |
| Google Search | 298K | 2,866 | 99.0% | 100% |
| Wikipedia Article | 225K | 40K | 82.1% | 99.7% |
| Baidu Homepage | 192K | 457 | 99.8% | 100% |
| Baidu Search | 390K | 4,960 | 98.7% | 100% |
Completeness = percentage of visible text preserved in the compressed tree.
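The Savings column can be reproduced from the other two columns as 1 minus compressed size over raw size. The sketch below recomputes it for four rows, reading "K" as thousands. The `savings_percent` helper is hypothetical, and the raw sizes are the rounded figures from the table, so the results are approximate. The Wikipedia row is left out because its rounded inputs (225K and 40K) give 82.2% rather than the listed 82.1%.

```python
def savings_percent(raw, compressed):
    """Share of the raw size removed by compression, as a percentage."""
    if raw <= 0:
        raise ValueError("raw size must be positive")
    return round((1 - compressed / raw) * 100, 1)


# (raw, compressed) pairs taken from the benchmark table.
rows = {
    "Google Homepage": (51_000, 238),
    "Google Search": (298_000, 2_866),
    "Baidu Homepage": (192_000, 457),
    "Baidu Search": (390_000, 4_960),
}

for page, (raw, compressed) in rows.items():
    print(f"{page}: {savings_percent(raw, compressed)}% saved, "
          f"about {raw / compressed:.0f}:1")
```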
Supported LLM Providers
| Provider | Model Examples |
|---|---|
| DashScope (Qwen) | qwen-plus, qwen-max, qwen3.5-plus |
| OpenAI | gpt-4o, gpt-4o-mini |
| Anthropic | claude-sonnet-4-20250514, claude-haiku |
| Google | gemini-2.0-flash, gemini-pro |
| DeepSeek | deepseek-chat, deepseek-reasoner |
| Mistral | mistral-large-latest |
| Groq | llama-3.1-70b |
| xAI | grok-2 |
| Moonshot | moonshot-v1-8k |
| Zhipu | glm-4 |
| Custom | Any OpenAI-compatible endpoint |
Roadmap
- DOM compression API with pluggable site-specific scripts
- Task Agent with multi-step planning and autonomous browsing
- Multi-provider LLM support (12+ providers)
- Chinese/English bilingual dashboard
- MCP (Model Context Protocol) server integration
- Visual grounding — screenshot-based element location
- Multi-agent collaboration
Third-Party Libraries
| Library | License | Usage |
|---|---|---|
| Playwright | Apache 2.0 | Browser automation |
| Flask | BSD 3-Clause | REST API server |
| React | MIT | Frontend UI |
| LangGraph | MIT | Agent workflow engine |
| LiteLLM | MIT | Multi-provider LLM routing |
| Pydantic | MIT | Schema validation |
License
Apache License 2.0 - see LICENSE for details.