Clawome — AI browser agent. One command to run any web task.

Clawome

One API call. Any web task. Done.
Give your AI agent a natural language goal — Clawome plans, browses, and returns structured results.

Task Agent API | Quick Start | DOM Compression | Benchmarks | Roadmap


Task Agent API

One POST request. Clawome handles the rest — planning subtasks, controlling the browser, reading pages, and returning results.

curl -X POST http://localhost:5001/api/agent/start \
  -H "Content-Type: application/json" \
  -d '{"description": "Find AI-related graduate programs at NYU Tandon School of Engineering"}'

Poll progress:

curl http://localhost:5001/api/agent/status
{
  "status": "completed",
  "final_result": "NYU Tandon offers these AI-related programs: ...",
  "subtasks": [
    {"step": 1, "goal": "Visit NYU Tandon website", "status": "completed"},
    {"step": 2, "goal": "Extract program list", "status": "completed"}
  ],
  "llm_usage": {"calls": 12, "input_tokens": 25000, "total_tokens": 28000}
}
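
A status payload like the one above can be condensed in a few lines of Python. This is a minimal sketch: `summarize_status` is a hypothetical helper, not part of Clawome, and it assumes that `total_tokens` is the sum of input and output tokens.

```python
import json

# Sample payload copied from the status response shown above.
SAMPLE_STATUS = """
{
  "status": "completed",
  "final_result": "NYU Tandon offers these AI-related programs: ...",
  "subtasks": [
    {"step": 1, "goal": "Visit NYU Tandon website", "status": "completed"},
    {"step": 2, "goal": "Extract program list", "status": "completed"}
  ],
  "llm_usage": {"calls": 12, "input_tokens": 25000, "total_tokens": 28000}
}
"""


def summarize_status(payload: dict) -> dict:
    """Reduce a status payload to a few headline numbers (hypothetical helper)."""
    subtasks = payload.get("subtasks", [])
    usage = payload.get("llm_usage", {})
    completed = sum(1 for s in subtasks if s.get("status") == "completed")
    return {
        "status": payload.get("status"),
        "subtasks_done": completed,
        "subtasks_total": len(subtasks),
        # Assumption: total_tokens = input_tokens + output tokens.
        "output_tokens": usage.get("total_tokens", 0) - usage.get("input_tokens", 0),
    }


summary = summarize_status(json.loads(SAMPLE_STATUS))
print(summary)
```

Reading fields with `.get()` keeps the helper usable while a task is still running and some keys may be missing.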

Cancel if needed:

curl -X POST http://localhost:5001/api/agent/stop

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/agent/start | Submit a task (natural language) |
| GET | /api/agent/status | Poll progress, subtasks, and results |
| POST | /api/agent/stop | Cancel running task |

Start parameters:

| Field | Type | Description |
|-------|------|-------------|
| task | string | Task description (required) |
| max_steps | number | Override step limit for this task (default: 15) |

Status values: idle → starting → running → completed / failed / cancelled
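
The start/poll cycle can also be scripted. The sketch below uses only the Python standard library; the helper names are illustrative and not part of Clawome. It assumes the server runs at the default address, that `completed`, `failed`, and `cancelled` are the only terminal states, and that the request body uses the `description` key from the curl example (the parameter table lists the field as `task`, so check which one your server version accepts).

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://localhost:5001"  # default API address from this README
TERMINAL_STATUSES = {"completed", "failed", "cancelled"}


def build_start_request(description: str, max_steps: Optional[int] = None) -> urllib.request.Request:
    """Build (but do not send) the POST request for /api/agent/start."""
    # Assumption: the key name follows the curl example ("description").
    body = {"description": description}
    if max_steps is not None:
        body["max_steps"] = max_steps
    return urllib.request.Request(
        f"{BASE_URL}/api/agent/start",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_status() -> dict:
    """GET /api/agent/status and decode the JSON body."""
    with urllib.request.urlopen(f"{BASE_URL}/api/agent/status") as resp:
        return json.load(resp)


def wait_until_done(
    fetch: Callable[[], dict] = fetch_status,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> dict:
    """Poll until the task reaches a terminal status or the timeout expires."""
    deadline = time.monotonic() + timeout
    while True:
        status = fetch()
        if status.get("status") in TERMINAL_STATUSES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not reach a terminal status in time")
        time.sleep(interval)
```

With a server running, `urllib.request.urlopen(build_start_request("...", max_steps=30))` submits the task and `wait_until_done()` blocks until it finishes. Passing the fetch function as a parameter makes the polling logic easy to exercise without a live server.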

Tips for Writing Tasks

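Bad:  "Open the Shenzhen University website and see what's on it"
Good: "Open the https://www.szu.edu.cn homepage and extract the navigation bar, the 3 latest news items, and the announcements"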
  • Give a URL — avoid letting the agent guess where to go
  • Specify what to extract — "top 5 news" is better than "all news"
  • Complex tasks? Raise the step limit, e.g. "max_steps": 30 for multi-page tasks
  • Or split into smaller tasks — each task focused on one page or one goal

How It Works

Your API call → Task Agent → Plan subtasks → Execute browser actions → Return results
                                  ↑                                        |
                                  └── evaluate & replan if needed ─────────┘

The agent uses a LangGraph state machine internally: perceive page → plan next step → execute action → sense result → repeat until done.
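
The loop described above can be illustrated in plain Python. This is a toy sketch of the perceive → plan → act → sense cycle with a hard step limit, not Clawome's actual LangGraph implementation; every name in it (`AgentState`, `run_loop`, the four callbacks) is hypothetical, and the "browser" is just a list of page names.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class AgentState:
    goal: str
    observation: str = ""
    history: List[str] = field(default_factory=list)
    done: bool = False


def run_loop(
    state: AgentState,
    perceive: Callable[[], str],
    plan: Callable[[AgentState], str],
    act: Callable[[str], str],
    sense: Callable[[AgentState, str], bool],
    max_steps: int = 15,  # mirrors the default step limit in this README
) -> AgentState:
    """Repeat perceive -> plan -> act -> sense until done or out of steps."""
    for _ in range(max_steps):
        state.observation = perceive()      # read the current page
        action = plan(state)                # decide the next step
        result = act(action)                # execute it
        state.history.append(f"{action} -> {result}")
        state.done = sense(state, result)   # check whether the goal is met
        if state.done:
            break
    return state


# Toy environment: two "pages" and a cursor standing in for the browser.
pages = ["home", "programs"]
cursor = {"i": 0}


def perceive() -> str:
    return pages[cursor["i"]]


def plan(state: AgentState) -> str:
    return "extract" if state.observation == "programs" else "click_next"


def act(action: str) -> str:
    if action == "click_next":
        cursor["i"] += 1
        return "navigated"
    return "extracted"


def sense(state: AgentState, result: str) -> bool:
    return result == "extracted"


final = run_loop(AgentState(goal="list programs"), perceive, plan, act, sense)
print(final.done, final.history)
```

The `max_steps` bound is what keeps a confused agent from looping forever, which is the role the hard step limit plays in the real system.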

Features

  • Natural language tasks — Describe what you want in plain language
  • Multi-step planning — Automatically breaks complex tasks into subtasks
  • Smart execution — Perceive → Plan → Act → Sense loop with retry and anomaly detection
  • Markdown results — Final results formatted in Markdown with structured data
  • 12+ LLM providers — OpenAI, Anthropic, Google, DeepSeek, DashScope, Moonshot, Zhipu, Mistral, Groq, xAI, and more
  • Safety constraints — Browser-only actions, hard step limits

DOM Compression

Under the hood, the Task Agent sees web pages through Clawome's DOM compressor — turning 300K tokens of raw HTML into ~3K tokens of clean, structured trees.

You can also use this directly as a standalone API for your own agents:

# Open a page
curl -X POST http://localhost:5001/api/browser/open \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.google.com"}'

# Read compressed DOM
curl http://localhost:5001/api/browser/dom
[1] form(role="search")
  [1.1] textarea(name="q", placeholder="Search")
  [1.2] button: Google Search
  [1.3] button: I'm Feeling Lucky
[2] a(href): About
[3] a(href): Gmail
  • About 100:1 compression on typical web pages (less on text-heavy pages such as long Wikipedia articles)
  • Preserves visible text, interactive elements, and semantic structure
  • Hierarchical node IDs (e.g., 1.2.3) for precise element targeting
  • Site-specific optimizers for Google, Wikipedia, Stack Overflow, YouTube, etc.
  • Lite mode for even more aggressive token savings
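
If your own agent consumes the compressed tree as text, the hierarchical node IDs are easy to index. The sketch below parses the sample output shown above; `index_nodes` and `parent_id` are hypothetical helpers, and the line format (`[id] body`) is inferred from that one sample rather than from a formal specification.

```python
import re
from typing import Dict, Optional

# Sample tree copied from the DOM output shown above.
SAMPLE_DOM = """\
[1] form(role="search")
  [1.1] textarea(name="q", placeholder="Search")
  [1.2] button: Google Search
  [1.3] button: I'm Feeling Lucky
[2] a(href): About
[3] a(href): Gmail
"""

# Assumption: each node line is optional indentation, "[dotted.id]", then the node body.
NODE_LINE = re.compile(r"^\s*\[(?P<id>\d+(?:\.\d+)*)\]\s*(?P<body>.*)$")


def index_nodes(tree_text: str) -> Dict[str, str]:
    """Map each hierarchical node ID to the rest of its line."""
    nodes: Dict[str, str] = {}
    for line in tree_text.splitlines():
        match = NODE_LINE.match(line)
        if match:
            nodes[match.group("id")] = match.group("body")
    return nodes


def parent_id(node_id: str) -> Optional[str]:
    """Return the parent's ID, or None for a top-level node."""
    head, _, _ = node_id.rpartition(".")
    return head or None


nodes = index_nodes(SAMPLE_DOM)
print(nodes["1.2"])
print(parent_id("1.2"))
```

Because the ID encodes the path from the root, the parent of any node can be derived from the ID alone, without keeping the tree in memory.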

Dashboard

  • Browser Playground — Interactive DOM viewer and browser control
  • Agent UI — Task input, real-time progress tracking, collapsible step details
  • Settings — LLM provider config, browser options, compression settings
  • API Docs — Built-in documentation with Chinese/English support

Quick Start

Prerequisites: Python 3.10+

pip install clawome         # Install from PyPI
clawome start               # Guided setup + start server

If the clawome command is not found after installation, use:

python -m clawome start     # Alternative way to run

clawome start will walk you through LLM configuration (provider, API key, model), then start the backend and install Playwright chromium automatically.

Dashboard:  http://localhost:5173
API:        http://localhost:5001

Then run tasks from the terminal:

clawome "Find the latest AI news on Hacker News"  # Submit task & auto-poll
clawome status                               # Check progress
clawome stop                                 # Cancel task
clawome "complex task" --max-steps 30        # Override step limit
clawome setup                                # Reconfigure LLM settings

You can also skip CLI setup and configure via Dashboard > Settings.

Start backend or frontend separately

./start-backend.sh         # Only API server → http://localhost:5001
./start-frontend.sh        # Only Dashboard  → http://localhost:5173

Manual setup

# Backend
cd backend
python -m venv venv
source venv/bin/activate    # Windows: venv\Scripts\activate
pip install -r requirements.txt
playwright install chromium
python app.py               # http://localhost:5001

# Frontend (in another terminal)
cd frontend
npm install
npm run dev                 # http://localhost:5173

Full API Reference

Browser APIs — Navigation, DOM, Interaction (used internally by Task Agent, also available standalone)

Navigation

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/browser/open | Open URL (launches browser if needed) |
| POST | /api/browser/back | Navigate back |
| POST | /api/browser/forward | Navigate forward |
| POST | /api/browser/refresh | Reload page |

DOM

| Method | Endpoint | Description |
|--------|----------|-------------|
| GET/POST | /api/browser/dom | Get compressed DOM tree |
| POST | /api/browser/dom/detail | Get element details (rect, attributes) |
| POST | /api/browser/text | Get plain text content of a node |
| GET | /api/browser/source | Get raw page HTML |

Interaction

| Method | Endpoint | Description |
|--------|----------|-------------|
| POST | /api/browser/click | Click element |
| POST | /api/browser/type | Type text (keyboard events) |
| POST | /api/browser/fill | Fill input field |
| POST | /api/browser/select | Select dropdown option |
| POST | /api/browser/check | Toggle checkbox |
| POST | /api/browser/hover | Hover element |
| POST | /api/browser/scroll/down | Scroll down |
| POST | /api/browser/scroll/up | Scroll up |
| POST | /api/browser/keypress | Press key |
| POST | /api/browser/hotkey | Press key combo |

Token Optimization

All action endpoints support optional parameters to reduce response size:

  • refresh_dom: false — Skip DOM refresh after action (saves tokens)
  • fields: ["dom", "stats"] — Return only selected fields
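
As a rough illustration, the two options can be merged into the JSON body of any action request. `with_token_options` is a hypothetical helper, and the `node_id` key is only a placeholder, since this README does not list the parameters of the action endpoints; the sketch also assumes that omitting `refresh_dom` leaves the server's default refresh behavior in place.

```python
import json
from typing import Iterable, Optional


def with_token_options(
    params: dict,
    refresh_dom: bool = True,
    fields: Optional[Iterable[str]] = None,
) -> str:
    """Return a JSON body for an action request with the token-saving options applied."""
    body = dict(params)  # copy so the caller's dict is not mutated
    if not refresh_dom:
        body["refresh_dom"] = False  # skip the DOM refresh after the action
    if fields is not None:
        body["fields"] = list(fields)  # return only the selected fields
    return json.dumps(body)


# "node_id" is a placeholder parameter name, not a documented field.
print(with_token_options({"node_id": "1.2"}, refresh_dom=False))
print(with_token_options({"node_id": "1.2"}, fields=["dom", "stats"]))
```

Skipping the refresh is most useful for a run of actions where only the final page state matters, such as filling several form fields before reading the DOM once.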

Benchmarks

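| Page | Raw HTML | Compressed | Savings | Completeness |
|------|----------|------------|---------|--------------|
| Google Homepage | 51K | 238 | 99.5% | 100% |
| Google Search | 298K | 2,866 | 99.0% | 100% |
| Wikipedia Article | 225K | 40K | 82.1% | 99.7% |
| Baidu Homepage | 192K | 457 | 99.8% | 100% |
| Baidu Search | 390K | 4,960 | 98.7% | 100% |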

Completeness = percentage of visible text preserved in the compressed tree.
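
The Savings column can be reproduced from the two size columns. The sketch below assumes that sizes are token counts, that "K" means 1,000, and that savings is defined as 1 − compressed / raw; none of these is stated explicitly in the table. Because the raw sizes are rounded to the nearest thousand, a result can differ in the last digit: the Wikipedia row works out to 82.2% from the rounded inputs, against 82.1% in the table.

```python
def savings_pct(raw_tokens: int, compressed_tokens: int) -> float:
    """Percentage of tokens removed by compression, rounded to one decimal."""
    return round(100.0 * (1.0 - compressed_tokens / raw_tokens), 1)


def compression_ratio(raw_tokens: int, compressed_tokens: int) -> float:
    """How many raw tokens each compressed token stands for."""
    return raw_tokens / compressed_tokens


# Sizes taken from the benchmark table, with "K" read as 1,000 (an assumption).
rows = {
    "Google Homepage": (51_000, 238),
    "Google Search": (298_000, 2_866),
    "Baidu Homepage": (192_000, 457),
    "Baidu Search": (390_000, 4_960),
}

for page, (raw, compressed) in rows.items():
    pct = savings_pct(raw, compressed)
    ratio = compression_ratio(raw, compressed)
    print(f"{page}: {pct}% saved, about {ratio:.0f}:1")
```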

Supported LLM Providers

| Provider | Model Examples |
|----------|----------------|
| DashScope (Qwen) | qwen-plus, qwen-max, qwen3.5-plus |
| OpenAI | gpt-4o, gpt-4o-mini |
| Anthropic | claude-sonnet-4-20250514, claude-haiku |
| Google | gemini-2.0-flash, gemini-pro |
| DeepSeek | deepseek-chat, deepseek-reasoner |
| Mistral | mistral-large-latest |
| Groq | llama-3.1-70b |
| xAI | grok-2 |
| Moonshot | moonshot-v1-8k |
| Zhipu | glm-4 |
| Custom | Any OpenAI-compatible endpoint |

Roadmap

  • DOM compression API with pluggable site-specific scripts
  • Task Agent with multi-step planning and autonomous browsing
  • Multi-provider LLM support (12+ providers)
  • Chinese/English bilingual dashboard
  • MCP (Model Context Protocol) server integration
  • Visual grounding — screenshot-based element location
  • Multi-agent collaboration

Third-Party Libraries

| Library | License | Usage |
|---------|---------|-------|
| Playwright | Apache 2.0 | Browser automation |
| Flask | BSD 3-Clause | REST API server |
| React | MIT | Frontend UI |
| LangGraph | MIT | Agent workflow engine |
| LiteLLM | MIT | Multi-provider LLM routing |
| Pydantic | MIT | Schema validation |

License

Apache License 2.0 - see LICENSE for details.
