Clawome — AI browser agent. One command to run any web task.
Clawome
Open-source AI browser agent. Tell it what you want — it browses the web and brings back results.
Quick Start • How It Works • Chat API • DOM Compression • Roadmap
What Can It Do?
clawome "Find the top 3 AI stories on Hacker News today"
> Find the top 3 AI stories on Hacker News today
I'll browse Hacker News and find the top AI stories for you.
[task] Opening https://news.ycombinator.com ...
[task] Scanning front page for AI-related stories ...
[task] Extracting titles, scores, and links ...
[result] Here are today's top 3 AI stories on Hacker News:
1. "GPT-5 benchmark results leaked" — 842 points
2. "Open-source vision model beats proprietary ones" — 631 points
3. "Show HN: AI browser agent that actually works" — 529 points
No browser extensions. No complex setup. Just describe what you want in plain language.
Quick Start
Prerequisites: Python 3.10+
Install & Run
pip install clawome
clawome start
This walks you through LLM setup (pick a provider, enter API key), installs Chromium, and starts the server.
Server & Dashboard: http://localhost:5001
Run Tasks from Terminal
clawome "Find AI graduate programs at Stanford"
clawome "Compare iPhone 16 Pro vs Samsung S25 Ultra specs"
clawome "What's the weather in Tokyo this weekend?"
clawome status # Check progress
clawome stop # Cancel
Or Use the Web Dashboard
Open http://localhost:5001 — chat with Beanie, the built-in AI assistant. It understands context, handles follow-ups, and delegates complex browsing tasks automatically.
Multi-turn conversation example:
You: Find the top 3 AI papers on arxiv today
Beanie: Here are today's top 3 AI papers:
1. "Scaling Laws for..." — 45 citations
2. "Efficient Fine-tuning..." — 32 citations
3. "Multi-modal Agents..." — 28 citations
You: Tell me more about the first one
Beanie: "Scaling Laws for Neural Architecture Search"
Authors: ... Abstract: ...
You: What about the second author's other recent work?
Beanie: I'll look up their profile on Google Scholar...
[browses Google Scholar, extracts papers]
Here are their recent publications: ...
Each message builds on previous context — no need to repeat yourself.
Install from source
git clone https://github.com/CodingLucasLi/Clawome.git
cd Clawome
cp .env.example .env # Fill in your LLM API key
./start.sh # Start backend + frontend dev server
Dashboard: http://localhost:5173
API: http://localhost:5001
Or manually:
# Terminal 1: backend
cd backend && python -m venv venv && source venv/bin/activate
pip install -r requirements.txt && playwright install chromium
python app.py # http://localhost:5001
# Terminal 2 (from the repo root): frontend
cd frontend && npm install && npm run dev # http://localhost:5173
How It Works
Clawome uses a two-layer agent architecture:
You ──→ Beanie (Chat Agent) ──→ Runner (Task Engine) ──→ Browser
        │                        │
        │ Understands context    │ Plans subtasks
        │ Calls browser tools    │ Perceive → Plan → Act → Sense
        │ Manages sessions       │ Guard nodes (CAPTCHA, cookies, loops)
        │ Delegates complex      │ Anomaly detection & recovery
        │ tasks to Runner        │ Reports back to Beanie
        │                        │
        └── Watchdog ────────────┘ (monitors progress, intervenes if stuck)
Beanie handles simple questions and browser actions directly. For complex multi-step tasks, it delegates to the Runner — a LangGraph state machine that autonomously plans, browses, and extracts information.
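The Perceive → Plan → Act → Sense loop can be pictured with a small, self-contained sketch. This is not Clawome's LangGraph implementation: `run_task`, its callback parameters, the three-in-a-row loop rule, and the toy page below are all hypothetical names and choices made for illustration.

```python
from typing import Callable

# Hypothetical sketch of a Perceive -> Plan -> Act -> Sense loop with a guard.
# None of these names come from Clawome; they only illustrate the control flow.


def run_task(
    perceive: Callable[[], dict],
    plan: Callable[[dict], str],
    act: Callable[[str], None],
    is_done: Callable[[dict], bool],
    guard: Callable[[dict], bool],
    max_steps: int = 10,
) -> list:
    """Run the loop until is_done() is true, a loop is detected, or steps run out."""
    history = []
    for _ in range(max_steps):
        state = perceive()            # Perceive: read the current page state
        if guard(state):              # Guard: something blocks the page (e.g. a popup)
            history.append("guard:dismiss")
            act("dismiss")
            continue
        if is_done(state):            # Sense: check whether the goal is reached
            break
        action = plan(state)          # Plan: choose the next action
        # Loop prevention: refuse to run the same action a third time in a row
        if history[-2:] == [action, action]:
            history.append("abort:loop")
            break
        act(action)                   # Act: execute the action
        history.append(action)
    return history


# Toy environment: a popup must be dismissed, then two clicks reach the goal.
page = {"popup": True, "clicks": 0}


def toy_act(action: str) -> None:
    if action == "dismiss":
        page["popup"] = False
    elif action == "click":
        page["clicks"] += 1


steps = run_task(
    perceive=lambda: dict(page),
    plan=lambda state: "click",
    act=toy_act,
    is_done=lambda state: state["clicks"] >= 2,
    guard=lambda state: state["popup"],
)
print(steps)  # ['guard:dismiss', 'click', 'click']
```

In this sketch the guard runs before planning, so a blocked page is cleared first and the planner only ever sees a usable state.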
Key Features
| Feature | Description |
|---|---|
| Natural language | Just describe what you want |
| Chat interface | Context-aware conversations with follow-ups |
| Smart execution | Perceive → Plan → Act → Sense loop with retry |
| Guard nodes | Auto-handles CAPTCHAs, cookie popups, blocked pages |
| 100:1 DOM compression | 300K HTML → 3K tokens for efficient LLM processing |
| 12+ LLM providers | OpenAI, Anthropic, Google, DeepSeek, Qwen, and more |
| Bilingual UI | Full Chinese/English support |
| Session persistence | Resume conversations across restarts |
Chat API
Send a message, poll for the response. Beanie decides whether to answer directly or launch a browsing task.
# Send a message
curl -X POST http://localhost:5001/api/chat/send \
-H "Content-Type: application/json" \
-d '{"message": "Find AI graduate programs at NYU Tandon"}'
# Poll for response
curl "http://localhost:5001/api/chat/status?since=0"
# Stop processing
curl -X POST http://localhost:5001/api/chat/stop
# Start fresh
curl -X POST http://localhost:5001/api/chat/reset
Response format:
{
"status": "processing",
"session_id": "session_a1b2c3d4",
"messages": [
{"role": "user", "type": "text", "content": "Find AI programs..."},
{"role": "agent", "type": "result", "content": "I found 5 programs..."}
]
}
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/chat/send | Send a message |
| GET | /api/chat/status?since=N | Poll messages (incremental) |
| POST | /api/chat/stop | Stop current processing |
| POST | /api/chat/reset | Start a new session |
| GET | /api/chat/sessions | List saved sessions |
| POST | /api/chat/sessions/restore | Restore a session |
| POST | /api/chat/sessions/delete | Delete a session |
Status values: processing (agent is working) → ready (waiting for input)
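The send-then-poll flow can be wrapped in a few lines of Python. The following is a minimal sketch using only the standard library. The endpoints and the `status` and `messages` fields come from the examples in this section; the helper names (`http_json`, `poll_until_ready`, `ask`) are hypothetical, and the sketch assumes that `since` is the number of messages already received, which this README does not spell out.

```python
import json
import time
import urllib.request
from typing import Callable, Optional

BASE_URL = "http://localhost:5001"  # default server address from Quick Start


def http_json(method: str, path: str, payload: Optional[dict] = None) -> dict:
    """Send a JSON request to the server and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def poll_until_ready(
    fetch_status: Callable[[int], dict],
    interval: float = 1.0,
    max_polls: int = 120,
) -> list:
    """Collect messages until the status becomes "ready".

    Assumption: `since` is the count of messages already received, so each
    poll returns only newer messages.
    """
    messages = []
    for _ in range(max_polls):
        reply = fetch_status(len(messages))
        messages.extend(reply.get("messages", []))
        if reply.get("status") == "ready":
            return messages
        time.sleep(interval)
    raise TimeoutError("agent did not become ready in time")


def ask(message: str) -> list:
    """Send one message and wait for the reply (needs a running server)."""
    http_json("POST", "/api/chat/send", {"message": message})
    return poll_until_ready(
        lambda since: http_json("GET", f"/api/chat/status?since={since}")
    )
```

With the server running, `ask("Find AI graduate programs at NYU Tandon")` would return the collected message list. Passing the fetch function into `poll_until_ready` keeps the polling logic testable without a live server.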
Tips for Better Results
- Give a URL when possible: "Go to https://example.com and find..." avoids guesswork
- Be specific: "top 5 news headlines" beats "what's on the page"
- Ask follow-ups: Beanie remembers context within a session
DOM Compression
Clawome's DOM compressor turns raw HTML into concise, LLM-friendly trees. Use it standalone for your own agents:
# Open a page
curl -X POST http://localhost:5001/api/browser/open \
-H "Content-Type: application/json" \
-d '{"url": "https://www.google.com"}'
# Read compressed DOM
curl http://localhost:5001/api/browser/dom
[1] form(role="search")
[1.1] textarea(name="q", placeholder="Search")
[1.2] button: Google Search
[1.3] button: I'm Feeling Lucky
[2] a(href): About
[3] a(href): Gmail
| Page | Raw HTML | Compressed | Savings |
|---|---|---|---|
| Google Homepage | 51K | 238 | 99.5% |
| Google Search | 298K | 2,866 | 99.0% |
| Wikipedia Article | 225K | 40K | 82.1% |
| Baidu Homepage | 192K | 457 | 99.8% |
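The Savings column appears to follow from the other two columns as 1 − compressed / raw. The short check below assumes that "K" means thousands and that both size columns use the same unit; the README does not confirm either. `savings_percent` is a hypothetical helper name. The Wikipedia row is left out because both of its sizes are rounded to the nearest thousand.

```python
def savings_percent(raw: float, compressed: float) -> float:
    """Share of the raw size removed by compression, in percent."""
    return (1 - compressed / raw) * 100


# Sizes copied from the table above (assumption: "K" = 1,000, same unit per row)
rows = {
    "Google Homepage": (51_000, 238),
    "Google Search": (298_000, 2_866),
    "Baidu Homepage": (192_000, 457),
}

for name, (raw, compressed) in rows.items():
    saved = savings_percent(raw, compressed)
    print(f"{name}: {saved:.1f}% saved, about {raw / compressed:.0f}:1")
```

Under those assumptions the three rows work out to ratios between roughly 100:1 and 400:1, consistent with the percentages listed.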
Features:
- Roughly 100:1 compression on typical pages; text-heavy pages such as the Wikipedia article above compress less
- Preserves visible text, interactive elements, and semantic structure
- Hierarchical node IDs (1.2.3) for precise element targeting
- Site-specific optimizers (Google, Wikipedia, Stack Overflow, YouTube, etc.)
- Custom compressor scripts via Dashboard
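For agents that consume the compressed tree, the hierarchical node IDs are easy to work with programmatically. The sketch below parses the sample output shown earlier in this section. The line grammar (`[id] tag(attrs): text`) is inferred from that single sample rather than from a documented format, and `Node`, `NODE_RE`, and `parse_dom` are hypothetical names.

```python
import re
from dataclasses import dataclass

# Grammar inferred from the sample output only: [id] tag(attrs): text
NODE_RE = re.compile(
    r"^\[(?P<id>\d+(?:\.\d+)*)\]\s+"   # hierarchical ID such as [1.2]
    r"(?P<tag>[A-Za-z][\w-]*)"         # element tag
    r"(?:\((?P<attrs>.*?)\))?"         # optional attribute list
    r"(?::\s*(?P<text>.*))?$"          # optional visible text
)


@dataclass
class Node:
    node_id: str
    tag: str
    attrs: str
    text: str

    @property
    def parent_id(self) -> str:
        """ID of the enclosing node, or "" for a top-level node."""
        return self.node_id.rpartition(".")[0]


def parse_dom(dump: str) -> list:
    """Turn a compressed DOM dump into Node objects, skipping unparsable lines."""
    nodes = []
    for line in dump.splitlines():
        match = NODE_RE.match(line.strip())
        if match:
            nodes.append(
                Node(match["id"], match["tag"], match["attrs"] or "", match["text"] or "")
            )
    return nodes


SAMPLE = """[1] form(role="search")
[1.1] textarea(name="q", placeholder="Search")
[1.2] button: Google Search
[1.3] button: I'm Feeling Lucky
[2] a(href): About
[3] a(href): Gmail"""

nodes = parse_dom(SAMPLE)
buttons = [n.node_id for n in nodes if n.tag == "button"]
print(buttons)  # ['1.2', '1.3']
```

Because a child ID extends its parent's ID, the parent can be recovered from the ID alone, without walking the tree.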
Full Browser API Reference
Navigation
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/browser/open | Open URL (launches browser if needed) |
| POST | /api/browser/back | Navigate back |
| POST | /api/browser/forward | Navigate forward |
| POST | /api/browser/refresh | Reload page |
DOM
| Method | Endpoint | Description |
|---|---|---|
| GET | /api/browser/dom | Get compressed DOM tree |
| POST | /api/browser/dom/detail | Get element details (rect, attributes) |
| POST | /api/browser/text | Get plain text content of a node |
| GET | /api/browser/source | Get raw page HTML |
Interaction
| Method | Endpoint | Description |
|---|---|---|
| POST | /api/browser/click | Click element |
| POST | /api/browser/type | Type text (keyboard events) |
| POST | /api/browser/fill | Fill input field |
| POST | /api/browser/select | Select dropdown option |
| POST | /api/browser/check | Toggle checkbox |
| POST | /api/browser/hover | Hover element |
| POST | /api/browser/scroll/down | Scroll down |
| POST | /api/browser/scroll/up | Scroll up |
| POST | /api/browser/keypress | Press key |
| POST | /api/browser/hotkey | Press key combo |
Token Optimization
All action endpoints support optional parameters:
- refresh_dom: false (skip the DOM refresh after the action)
- fields: ["dom", "stats"] (return only the selected fields)
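As a rough illustration, the two options could be attached to an action request as follows. This sketch only builds the request and sends nothing. It assumes the options travel in the JSON body alongside the action's own parameters, and the `node_id` key used for the click target is a hypothetical placeholder, since this README does not show the request schema. `build_action_request` is likewise an invented helper name.

```python
import json
import urllib.request

BASE_URL = "http://localhost:5001"  # default server address from Quick Start


def build_action_request(endpoint, params, refresh_dom=True, fields=None):
    """Build (but do not send) a POST request for a browser action endpoint."""
    body = dict(params)
    if not refresh_dom:
        body["refresh_dom"] = False    # skip the DOM refresh after the action
    if fields is not None:
        body["fields"] = list(fields)  # return only the selected fields
    return urllib.request.Request(
        f"{BASE_URL}/api/browser/{endpoint}",
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# "node_id" is a hypothetical parameter name, not taken from the API reference
req = build_action_request(
    "click", {"node_id": "1.2"}, refresh_dom=False, fields=["stats"]
)
print(req.full_url)
print(req.data.decode("utf-8"))
```

Passing the result to `urllib.request.urlopen(req)` would send it to a running server. Skipping the DOM refresh is most useful when several actions run back to back and only the final page state matters.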
Supported LLM Providers
| Provider | Model Examples |
|---|---|
| OpenAI | gpt-4o, gpt-4o-mini |
| Anthropic | claude-sonnet-4-20250514, claude-haiku |
| Google | gemini-2.0-flash, gemini-pro |
| DeepSeek | deepseek-chat, deepseek-reasoner |
| DashScope (Qwen) | qwen-plus, qwen-max, qwen3.5-plus |
| Mistral | mistral-large-latest |
| Groq | llama-3.1-70b |
| xAI | grok-2 |
| Moonshot | moonshot-v1-8k |
| Zhipu | glm-4 |
| Custom | Any OpenAI-compatible endpoint |
Roadmap
- DOM compression with pluggable site-specific scripts
- Chat agent with session persistence and follow-ups
- Autonomous task engine with multi-step planning
- Guard nodes: CAPTCHA detection, cookie dismissal, loop prevention
- Watchdog monitoring with automatic intervention
- 12+ LLM provider support
- Bilingual Chinese/English dashboard
- MCP (Model Context Protocol) server integration
- Visual grounding — screenshot-based element location
- Multi-agent collaboration
Third-Party Libraries
| Library | License | Usage |
|---|---|---|
| Playwright | Apache 2.0 | Browser automation |
| Flask | BSD 3-Clause | REST API server |
| React | MIT | Frontend UI |
| LangGraph | MIT | Agent workflow engine |
| LiteLLM | MIT | Multi-provider LLM routing |
License
Apache License 2.0 — see LICENSE for details.