clawome

Clawome — AI browser agent. One command to run any web task.

Clawome

Open-source AI browser agent. Tell it what you want — it browses the web and brings back results.

Quick Start • How It Works • Chat API • DOM Compression • Roadmap

What Can It Do?

clawome "Find the top 3 AI stories on Hacker News today"

  > Find the top 3 AI stories on Hacker News today

  I'll browse Hacker News and find the top AI stories for you.

  [task] Opening https://news.ycombinator.com ...
  [task] Scanning front page for AI-related stories ...
  [task] Extracting titles, scores, and links ...

  [result] Here are today's top 3 AI stories on Hacker News:
  1. "GPT-5 benchmark results leaked" — 842 points
  2. "Open-source vision model beats proprietary ones" — 631 points
  3. "Show HN: AI browser agent that actually works" — 529 points

No browser extensions. No complex setup. Just describe what you want in plain language.

Quick Start

Prerequisites: Python 3.10+

Install & Run

pip install clawome
clawome start

`clawome start` walks you through LLM setup (pick a provider, enter your API key), installs Chromium, and starts the server.

Server & Dashboard:  http://localhost:5001

Run Tasks from Terminal

clawome "Find AI graduate programs at Stanford"
clawome "Compare iPhone 16 Pro vs Samsung S25 Ultra specs"
clawome "What's the weather in Tokyo this weekend?"
clawome status          # Check progress
clawome stop            # Cancel

Or Use the Web Dashboard

Open http://localhost:5001 — chat with Beanie, the built-in AI assistant. It understands context, handles follow-ups, and delegates complex browsing tasks automatically.

Multi-turn conversation example:

You:    Find the top 3 AI papers on arxiv today
Beanie: Here are today's top 3 AI papers:
        1. "Scaling Laws for..." — 45 citations
        2. "Efficient Fine-tuning..." — 32 citations
        3. "Multi-modal Agents..." — 28 citations

You:    Tell me more about the first one
Beanie: "Scaling Laws for Neural Architecture Search"
        Authors: ... Abstract: ...

You:    What about the second author's other recent work?
Beanie: I'll look up their profile on Google Scholar...
        [browses Google Scholar, extracts papers]
        Here are their recent publications: ...

Each message builds on previous context — no need to repeat yourself.

Install from source

git clone https://github.com/CodingLucasLi/Clawome.git
cd Clawome
cp .env.example .env       # Fill in your LLM API key
./start.sh                 # Start backend + frontend dev server

Dashboard:  http://localhost:5173
API:        http://localhost:5001

Or manually, in two terminals, each starting from the repository root:

# Terminal 1: backend
cd backend && python -m venv venv && source venv/bin/activate
pip install -r requirements.txt && playwright install chromium
python app.py               # http://localhost:5001

# Terminal 2: frontend
cd frontend && npm install && npm run dev   # http://localhost:5173

How It Works

Clawome uses a two-layer agent architecture:

You ──→ Beanie (Chat Agent) ──→ Runner (Task Engine) ──→ Browser
         │                        │
         │ Understands context    │ Plans subtasks
         │ Calls browser tools   │ Perceive → Plan → Act → Sense
         │ Manages sessions      │ Guard nodes (CAPTCHA, cookies, loops)
         │ Delegates complex     │ Anomaly detection & recovery
         │ tasks to Runner       │ Reports back to Beanie
         │                        │
         └── Watchdog ────────────┘ (monitors progress, intervenes if stuck)

Beanie handles simple questions and browser actions directly. For complex multi-step tasks, it delegates to the Runner — a LangGraph state machine that autonomously plans, browses, and extracts information.
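The Perceive → Plan → Act → Sense cycle can be pictured with a toy loop. This is only an illustration of the control flow, not Clawome's Runner (which the text above describes as a LangGraph state machine). `LoopState`, `run_loop`, and the four callables are hypothetical names, and the `max_steps` budget merely stands in for the kind of stuck-task protection the Watchdog provides.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class LoopState:
    """Hypothetical state carried between iterations."""
    goal: str
    observations: List[str] = field(default_factory=list)
    done: bool = False
    steps: int = 0


def run_loop(
    state: LoopState,
    perceive: Callable[[LoopState], str],
    plan: Callable[[LoopState, str], str],
    act: Callable[[str], str],
    sense: Callable[[LoopState, str], bool],
    max_steps: int = 10,
) -> LoopState:
    """Repeat perceive -> plan -> act -> sense until done or out of budget."""
    while not state.done and state.steps < max_steps:
        page = perceive(state)              # read the current page
        action = plan(state, page)          # decide the next action
        outcome = act(action)               # carry the action out
        state.observations.append(outcome)
        state.done = sense(state, outcome)  # check whether the goal is met
        state.steps += 1
    return state


# Stubbed run: three fake pages, goal reached on the third one.
pages = iter(["home", "results", "article"])
final = run_loop(
    LoopState(goal="find article"),
    perceive=lambda s: next(pages),
    plan=lambda s, page: f"click on {page}",
    act=lambda action: f"did: {action}",
    sense=lambda s, outcome: outcome.endswith("article"),
)
print(final.steps, final.done)
```

In a real agent the callables would wrap the browser and the LLM; here they are plain lambdas so the loop can be read and run on its own.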

Key Features

Feature	Description
Natural language	Just describe what you want
Chat interface	Context-aware conversations with follow-ups
Smart execution	Perceive → Plan → Act → Sense loop with retry
Guard nodes	Auto-handles CAPTCHAs, cookie popups, blocked pages
100:1 DOM compression	300K HTML → 3K tokens for efficient LLM processing
12+ LLM providers	OpenAI, Anthropic, Google, DeepSeek, Qwen, and more
Bilingual UI	Full Chinese/English support
Session persistence	Resume conversations across restarts

Chat API

Send a message, poll for the response. Beanie decides whether to answer directly or launch a browsing task.

# Send a message
curl -X POST http://localhost:5001/api/chat/send \
  -H "Content-Type: application/json" \
  -d '{"message": "Find AI graduate programs at NYU Tandon"}'

# Poll for response
curl "http://localhost:5001/api/chat/status?since=0"

# Stop processing
curl -X POST http://localhost:5001/api/chat/stop

# Start fresh
curl -X POST http://localhost:5001/api/chat/reset

Response format:

{
  "status": "processing",
  "session_id": "session_a1b2c3d4",
  "messages": [
    {"role": "user", "type": "text", "content": "Find AI programs..."},
    {"role": "agent", "type": "result", "content": "I found 5 programs..."}
  ]
}

Method	Endpoint	Description
POST	`/api/chat/send`	Send a message
GET	`/api/chat/status?since=N`	Poll messages (incremental)
POST	`/api/chat/stop`	Stop current processing
POST	`/api/chat/reset`	Start a new session
GET	`/api/chat/sessions`	List saved sessions
POST	`/api/chat/sessions/restore`	Restore a session
POST	`/api/chat/sessions/delete`	Delete a session

Status values: processing (agent is working) → ready (waiting for input)
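The send-then-poll flow above could be wrapped in a small Python client along these lines. It is a sketch under assumptions rather than an official client: it assumes `since` is the number of messages already received (the table only says polling is incremental), and `http_json`, `chat`, `interval`, and `max_polls` are names made up for this example.

```python
import json
import time
import urllib.request
from typing import Callable, List, Optional

BASE_URL = "http://localhost:5001"  # default server address from Quick Start


def http_json(method: str, path: str, payload: Optional[dict] = None) -> dict:
    """Send a JSON request to the server and decode the JSON reply."""
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    req = urllib.request.Request(
        BASE_URL + path,
        data=data,
        method=method,
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def chat(
    message: str,
    request: Callable[..., dict] = http_json,
    interval: float = 1.0,
    max_polls: int = 120,
) -> List[dict]:
    """Send one message and collect replies until the status is "ready"."""
    request("POST", "/api/chat/send", {"message": message})
    messages: List[dict] = []
    for _ in range(max_polls):
        # Assumption: `since` is the count of messages already received.
        reply = request("GET", f"/api/chat/status?since={len(messages)}")
        messages.extend(reply.get("messages", []))
        if reply.get("status") == "ready":
            return messages
        time.sleep(interval)
    raise TimeoutError("agent did not become ready in time")
```

With the server running, `chat("Find AI graduate programs at NYU Tandon")` would return the accumulated message list. The `request` parameter exists so the polling logic can be exercised with a fake transport.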

Tips for Better Results

Give a URL when possible — "Go to https://example.com and find..." avoids guesswork
Be specific — "top 5 news headlines" beats "what's on the page"
Ask follow-ups — Beanie remembers context within a session

DOM Compression

Clawome's DOM compressor turns raw HTML into concise, LLM-friendly trees. Use it standalone for your own agents:

# Open a page
curl -X POST http://localhost:5001/api/browser/open \
  -H "Content-Type: application/json" \
  -d '{"url": "https://www.google.com"}'

# Read compressed DOM
curl http://localhost:5001/api/browser/dom

[1] form(role="search")
  [1.1] textarea(name="q", placeholder="Search")
  [1.2] button: Google Search
  [1.3] button: I'm Feeling Lucky
[2] a(href): About
[3] a(href): Gmail

Page	Raw HTML	Compressed	Savings
Google Homepage	51K	238	99.5%
Google Search	298K	2,866	99.0%
Wikipedia Article	225K	40K	82.1%
Baidu Homepage	192K	457	99.8%
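The Savings column appears to be one minus the ratio of compressed size to raw size. The check below reproduces three rows under two assumptions that the table does not state: both columns count the same unit, and "K" means 1,000. The Wikipedia row is left out because both of its figures are rounded to the nearest K, so its last digit cannot be recovered.

```python
def savings(raw: int, compressed: int) -> str:
    """Return the share of the input removed, formatted like the table."""
    return f"{1 - compressed / raw:.1%}"


# (raw size, compressed size) taken from the table, with K read as 1,000.
rows = {
    "Google Homepage": (51_000, 238),
    "Google Search": (298_000, 2_866),
    "Baidu Homepage": (192_000, 457),
}

for page, (raw, compressed) in rows.items():
    print(f"{page}: {savings(raw, compressed)}")
```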

Features:

Roughly 100:1 compression on typical pages (text-heavy pages such as the Wikipedia article above compress less)
Preserves visible text, interactive elements, and semantic structure
Hierarchical node IDs (1.2.3) for precise element targeting
Site-specific optimizers (Google, Wikipedia, Stack Overflow, YouTube, etc.)
Custom compressor scripts via Dashboard
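If you consume the compressed tree in your own agent, a small parser can turn it into nodes keyed by their hierarchical IDs. The sketch below assumes the text layout matches the Google sample shown earlier in this section, which is the only format evidence available here; the endpoint may wrap the tree in a JSON envelope. `Node`, `parse_dom`, and `children` are hypothetical helper names, and attribute values containing a closing parenthesis are not handled.

```python
import re
from dataclasses import dataclass
from typing import Dict, List

LINE_RE = re.compile(
    r"^\s*\[(?P<id>\d+(?:\.\d+)*)\]\s+"  # hierarchical node ID, e.g. [1.2]
    r"(?P<tag>\w+)"                       # tag name
    r"(?:\((?P<attrs>[^)]*)\))?"          # optional (attributes)
    r"(?::\s*(?P<text>.*))?$"             # optional ": visible text"
)


@dataclass
class Node:
    node_id: str
    tag: str
    attrs: str
    text: str


def parse_dom(tree: str) -> Dict[str, Node]:
    """Map each node ID to its parsed line; unrecognised lines are skipped."""
    nodes: Dict[str, Node] = {}
    for line in tree.splitlines():
        m = LINE_RE.match(line)
        if m:
            nodes[m["id"]] = Node(m["id"], m["tag"], m["attrs"] or "", m["text"] or "")
    return nodes


def children(nodes: Dict[str, Node], parent_id: str) -> List[Node]:
    """Return the direct children of a node, found by ID prefix."""
    prefix = parent_id + "."
    return [
        node for node_id, node in nodes.items()
        if node_id.startswith(prefix) and "." not in node_id[len(prefix):]
    ]


sample = """\
[1] form(role="search")
  [1.1] textarea(name="q", placeholder="Search")
  [1.2] button: Google Search
  [1.3] button: I'm Feeling Lucky
[2] a(href): About
[3] a(href): Gmail
"""
nodes = parse_dom(sample)
print([n.node_id for n in children(nodes, "1")])
```

Keying nodes by ID keeps lookups cheap when an agent refers back to an element it saw earlier, such as `1.2` for the search button.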

Full Browser API Reference

Navigation

Method	Endpoint	Description
POST	`/api/browser/open`	Open URL (launches browser if needed)
POST	`/api/browser/back`	Navigate back
POST	`/api/browser/forward`	Navigate forward
POST	`/api/browser/refresh`	Reload page

DOM

Method	Endpoint	Description
GET	`/api/browser/dom`	Get compressed DOM tree
POST	`/api/browser/dom/detail`	Get element details (rect, attributes)
POST	`/api/browser/text`	Get plain text content of a node
GET	`/api/browser/source`	Get raw page HTML

Interaction

Method	Endpoint	Description
POST	`/api/browser/click`	Click element
POST	`/api/browser/type`	Type text (keyboard events)
POST	`/api/browser/fill`	Fill input field
POST	`/api/browser/select`	Select dropdown option
POST	`/api/browser/check`	Toggle checkbox
POST	`/api/browser/hover`	Hover element
POST	`/api/browser/scroll/down`	Scroll down
POST	`/api/browser/scroll/up`	Scroll up
POST	`/api/browser/keypress`	Press key
POST	`/api/browser/hotkey`	Press key combo

Token Optimization

All action endpoints support optional parameters:

refresh_dom: false — Skip DOM refresh after action
fields: ["dom", "stats"] — Return only selected fields
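A request that uses these options might be built as follows. This sketch assumes the options travel in the JSON body next to the action's own parameters, which the text above does not spell out. `action_request` is a hypothetical helper, and `node_id` is a placeholder for whatever parameter the click endpoint actually expects.

```python
import json
import urllib.request
from typing import Any, Dict, Iterable, Optional

BASE_URL = "http://localhost:5001"  # default server address from Quick Start


def action_request(
    path: str,
    params: Dict[str, Any],
    refresh_dom: Optional[bool] = None,
    fields: Optional[Iterable[str]] = None,
) -> urllib.request.Request:
    """Build a POST request for an action endpoint with optional token-saving flags."""
    body = dict(params)
    if refresh_dom is not None:
        body["refresh_dom"] = refresh_dom  # False skips the DOM refresh
    if fields is not None:
        body["fields"] = list(fields)      # return only the selected fields
    return urllib.request.Request(
        BASE_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# `node_id` is a hypothetical parameter name used only for illustration.
req = action_request(
    "/api/browser/click",
    {"node_id": "1.2"},
    refresh_dom=False,
    fields=["stats"],
)
print(req.full_url, req.data.decode("utf-8"))
```

Passing `req` to `urllib.request.urlopen` would send it to a running server. Leaving both options as `None` keeps them out of the body so the server's defaults apply.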

Supported LLM Providers

Provider	Model Examples
OpenAI	gpt-4o, gpt-4o-mini
Anthropic	claude-sonnet-4-20250514, claude-haiku
Google	gemini-2.0-flash, gemini-pro
DeepSeek	deepseek-chat, deepseek-reasoner
DashScope (Qwen)	qwen-plus, qwen-max, qwen3.5-plus
Mistral	mistral-large-latest
Groq	llama-3.1-70b
xAI	grok-2
Moonshot	moonshot-v1-8k
Zhipu	glm-4
Custom	Any OpenAI-compatible endpoint

Roadmap

DOM compression with pluggable site-specific scripts
Chat agent with session persistence and follow-ups
Autonomous task engine with multi-step planning
Guard nodes: CAPTCHA detection, cookie dismissal, loop prevention
Watchdog monitoring with automatic intervention
12+ LLM provider support
Bilingual Chinese/English dashboard
MCP (Model Context Protocol) server integration
Visual grounding — screenshot-based element location
Multi-agent collaboration

Third-Party Libraries

Library	License	Usage
Playwright	Apache 2.0	Browser automation
Flask	BSD 3-Clause	REST API server
React	MIT	Frontend UI
LangGraph	MIT	Agent workflow engine
LiteLLM	MIT	Multi-provider LLM routing

License

Apache License 2.0 — see LICENSE for details.

Release history

Version	Released
0.1.8	Mar 9, 2026
0.1.7	Mar 3, 2026
0.1.6	Feb 26, 2026
0.1.5	Feb 26, 2026
0.1.3	Feb 26, 2026
0.1.2	Feb 26, 2026
0.1.1	Feb 26, 2026
0.1.0	Feb 26, 2026

