Open source, lightweight headless browser for AI agents.
pip install ember-browser
No Docker. No API key to start.
Why ember
Most web tools for agents ship with Chromium (641 MB) or require Docker just to get started. We needed something an agent could use on a VPS, a laptop, or a Raspberry Pi without thinking about it.
ember idles at ~17 MB of RAM. It decides for itself whether a page needs a browser; you just pass it a URL.
| | ember | Crawl4AI |
|---|---|---|
| Import footprint | ~54 MB | 171.8 MB |
| Browser binary | 20 MB (Lightpanda) | 641 MB (Chromium) |
| Scrape success rate | ~85% (trafilatura only) / ~95%+ (with Lightpanda fallback) | 90% |
| Docker required | No | No |
| API key required | No | No |
Quick start
pip install ember-browser
ember # start the interactive session
ember url https://example.com # or run a one-shot command
ember serve # start the REST API
CLI
Interactive session
Running ember with no arguments opens a persistent interactive session. The command list and a guide to saving results are printed on startup, so there is no need to type help first.
v0.1.0 lightweight headless browser for AI agents
url <url> scrape a page to markdown
search <query> web search
crawl <url> crawl a whole website
map <url> discover all URLs on a site
interact <url> control a browser with natural language
extract <url> pull structured data with an LLM
batch <urls.txt> scrape many URLs concurrently
─── saving results ───────────────────────────────────────────
one result url example.com -o page.md
everything output ./research/ then all results auto-save
last result save page.md after any command
ember › url andausman.com
ember › save page.md
ember › output ./research/ # auto-save everything from here
ember/research › search "python asyncio" -n 10
ember/research › crawl docs.example.com
ember/research › output clear # stop auto-saving
ember › quit
One-shot commands
Every command works standalone too:
ember url https://example.com # scrape a page
ember search "AI agents python" -n 10 # web search
ember crawl https://docs.example.com --max-pages 20 # crawl a site
ember map https://example.com # discover all URLs
ember interact https://amazon.com \
--prompt 'find a mechanical keyboard under $100'
ember extract https://example.com/pricing \
--prompt "list all plans and prices as JSON"
Saving results
All commands accept -o to save that run:
ember url https://example.com -o page.md
ember search "python" -o results.json
ember crawl https://docs.example.com -o ./pages/ # one .md per page
ember map https://example.com -o urls.txt
ember extract https://example.com -o data.json
Set a default save directory so you never need -o:
ember config --save-dir ./research/ # persists across sessions
ember config # show current settings
ember config --save-dir "" # clear it
Or set an environment variable. As a prefix it applies to a single command; export it to cover the current shell:
EMBER_SAVE_DIR=./out ember url https://example.com
In a session, the three ways to save:
ember › url example.com -o page.md # save just this run
ember › save page.md # save the last result
ember › output ./research/ # auto-save all results from now on
Async batch scraping
# urls.txt — one URL per line, # = comment
ember batch urls.txt # 5 concurrent by default
ember batch urls.txt -c 20 -o ./pages/ # 20 parallel, save to dir
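If you generate or check urls.txt from Python, the format above (one URL per line, # marks a comment) is easy to parse. A minimal sketch, assuming blank lines are skipped and only whole-line comments count, so URL fragments like #intro survive; parse_url_list and load_url_file are hypothetical helpers, not part of ember, and ember's own parser may differ:

```python
from pathlib import Path


def parse_url_list(text: str) -> list[str]:
    """Parse urls.txt-style text: one URL per line, lines starting with # are comments.

    Assumption: blank lines are ignored and only whole-line comments are
    recognised, so a URL such as https://a.com/#intro is kept intact.
    """
    urls = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        urls.append(line)
    return urls


def load_url_file(path: str) -> list[str]:
    """Read a urls.txt file from disk and return its URLs in file order."""
    return parse_url_list(Path(path).read_text(encoding="utf-8"))
```

For example, load_url_file("urls.txt") should list exactly the URLs you expect ember batch urls.txt to pick up, under the assumptions above.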
Python API
from emb.scrape import scrape_url, scrape_markdown
from emb.search import search
from emb.crawl import crawl
from emb.map import map_url
# Scrape a page → ScrapeResult
result = scrape_url("https://example.com")
print(result.markdown) # full page content as markdown
print(result.title) # page title
print(result.success) # True / False
# Just the markdown text
md = scrape_markdown("https://example.com")
# Crawl a site
result = crawl("https://docs.example.com", max_pages=20, max_depth=3)
for page in result.pages:
    print(page.url, len(page.markdown))
# Discover URLs
result = map_url("https://example.com", max_links=100)
print(result.links) # list[str]
# Search the web
results = search("python asyncio tutorial", limit=5)
for r in results:
    print(r.title, r.url)
# Browser interaction with natural language
from emb.interact import interact
result = interact("https://example.com", prompt="click the login button")
print(result.content) # what the agent did / saw
# LLM-powered structured extraction
from emb.agent import extract
data = extract("https://example.com/pricing", prompt="list all plans and prices")
print(data) # dict
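The CLI's crawl -o ./pages/ writes one .md file per page. From Python you can do something similar with the crawl() result shown above. This sketch relies only on each page having .url and .markdown, as in that example; page_filename and save_pages are hypothetical helpers, and the filename scheme is this sketch's own, not necessarily the one ember uses:

```python
import re
from pathlib import Path
from urllib.parse import urlparse


def page_filename(url: str) -> str:
    """Turn a URL into a filesystem-safe .md name (hypothetical scheme, not ember's).

    Query strings are ignored, so two URLs differing only by query collide.
    """
    parts = urlparse(url)
    path = parts.path.strip("/") or "index"
    # Replace anything other than letters, digits, dot, underscore, hyphen with "_"
    slug = re.sub(r"[^A-Za-z0-9._-]+", "_", f"{parts.netloc}_{path}")
    return slug + ".md"


def save_pages(pages, out_dir: str) -> list[Path]:
    """Write each page's markdown into out_dir; pages need .url and .markdown."""
    target = Path(out_dir)
    target.mkdir(parents=True, exist_ok=True)
    written = []
    for page in pages:
        path = target / page_filename(page.url)
        path.write_text(page.markdown, encoding="utf-8")
        written.append(path)
    return written
```

Usage, assuming the crawl() call from the example: save_pages(crawl("https://docs.example.com", max_pages=20).pages, "./pages/").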
Async
import asyncio
from emb.scrape import scrape_url_async
async def main():
    results = await asyncio.gather(
        scrape_url_async("https://example.com"),
        scrape_url_async("https://httpbin.org/get"),
    )
    for r in results:
        print(r.url, r.success)

asyncio.run(main())
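asyncio.gather as used above starts every scrape at once. For long URL lists you probably want a cap, in the spirit of ember batch -c (5 concurrent by default). A minimal sketch using asyncio.Semaphore; gather_bounded is a hypothetical helper that takes the worker as a parameter so you can pass scrape_url_async, and it illustrates bounded concurrency in general rather than how ember batch is implemented:

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def gather_bounded(
    urls: list[str],
    worker: Callable[[str], Awaitable[T]],
    concurrency: int = 5,
) -> list[T]:
    """Run worker(url) for every URL with at most `concurrency` calls in flight.

    Results are returned in the same order as `urls`, like asyncio.gather.
    """
    semaphore = asyncio.Semaphore(concurrency)

    async def run_one(url: str) -> T:
        # Waits here while `concurrency` workers are already running
        async with semaphore:
            return await worker(url)

    return await asyncio.gather(*(run_one(u) for u in urls))
```

For example: results = asyncio.run(gather_bounded(urls, scrape_url_async, concurrency=20)).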
REST API
ember serve # http://127.0.0.1:51251
ember serve --port 8080 # custom port
EMBER_API_KEY=your-secret ember serve # require auth
curl -X POST http://localhost:51251/scrape \
-H "Content-Type: application/json" \
-H "X-API-Key: your-secret" \
-d '{"url": "https://example.com"}'
curl -X POST http://localhost:51251/search \
-H "Content-Type: application/json" \
-d '{"query": "AI agents", "limit": 5}'
curl -X POST http://localhost:51251/crawl \
-H "Content-Type: application/json" \
-d '{"url": "https://docs.example.com", "max_pages": 10}'
Endpoints: /scrape /search /crawl /map /interact /extract /agent /health
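The same requests can be sent from Python with only the standard library. A sketch, assuming the server runs on the default address from ember serve and that endpoints return JSON; response fields aren't documented here, so ember_post returns the decoded body as-is. build_request and ember_post are hypothetical helper names:

```python
import json
import urllib.request
from typing import Any, Optional

BASE_URL = "http://127.0.0.1:51251"  # default address used by `ember serve`


def build_request(
    endpoint: str, payload: dict, api_key: Optional[str] = None, base_url: str = BASE_URL
) -> urllib.request.Request:
    """Build a JSON POST request for an ember endpoint such as /scrape."""
    headers = {"Content-Type": "application/json"}
    if api_key:  # only needed when the server was started with EMBER_API_KEY
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        base_url.rstrip("/") + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def ember_post(
    endpoint: str, payload: dict, api_key: Optional[str] = None, timeout: float = 60.0
) -> Any:
    """Send the request and decode the response body, assumed to be JSON."""
    req = build_request(endpoint, payload, api_key)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example: ember_post("/scrape", {"url": "https://example.com"}, api_key="your-secret").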
MCP
{
"mcpServers": {
"ember": {
"command": "ember",
"args": ["mcp"]
}
}
}
Works with Claude Code, Cursor, and any MCP-compatible host.
Available tools: scrape, search_web, crawl_site, map_site, batch_scrape, interact_page, extract_data.
How it works
Not every page needs a browser. ember knows the difference.
Tier 1: trafilatura handles ~90% of the web, including blogs, news, documentation, and Wikipedia. It is pure HTTP, with no browser process and no browser memory overhead.
Tier 2: Lightpanda handles JavaScript-heavy pages, SPAs, and interactive content. It's a real browser engine written in Zig, built for machines rather than humans, and only 20 MB in total. ember downloads and caches it automatically on first use, and falls back to it only when tier 1 produces thin content.
Most requests never reach the browser.
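The tier decision amounts to "try the cheap path, escalate only when the result looks thin." A minimal sketch of that control flow: both extractors are injected callables (tier 1 could wrap trafilatura's fetch_url and extract), and the 200-character threshold for "thin" is a hypothetical value, since the README doesn't say how ember measures it. scrape_tiered, is_thin, and TieredResult are illustrative names, not ember's API:

```python
from dataclasses import dataclass
from typing import Callable, Optional

MIN_CHARS = 200  # hypothetical cut-off for "thin" content; ember's heuristic isn't documented


@dataclass
class TieredResult:
    text: str
    tier: int  # 1 = plain HTTP extraction, 2 = headless browser


def is_thin(text: Optional[str], min_chars: int = MIN_CHARS) -> bool:
    """Treat a missing or very short extraction as thin."""
    return text is None or len(text.strip()) < min_chars


def scrape_tiered(
    url: str,
    http_extract: Callable[[str], Optional[str]],
    browser_extract: Callable[[str], Optional[str]],
    min_chars: int = MIN_CHARS,
) -> TieredResult:
    """Try the HTTP-only extractor first; call the browser only if tier 1 is thin."""
    text = http_extract(url)
    if not is_thin(text, min_chars):
        return TieredResult(text, tier=1)  # most requests should stop here
    rendered = browser_extract(url)
    # If the browser also returns nothing, keep whatever tier 1 produced.
    return TieredResult(rendered or text or "", tier=2)
```

Keeping the extractors as parameters means the browser is never started for pages that tier 1 already handles, which is the property the memory table below depends on.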
Memory footprint
| State | RAM |
|---|---|
| Idle | ~17 MB |
| Scraping a static page | ~20 MB |
| Running the browser | ~140 MB |
Firecrawl needs 4–8 GB in Docker. Crawl4AI imports at 171 MB before scraping anything. ember fits where your agent already runs.
Environment variables
| Variable | Default | Description |
|---|---|---|
| EMBER_SAVE_DIR | (none) | Default directory for saved results. Overrides ember config --save-dir for the current shell. |
| EMBER_API_KEY | (none) | Enables API key auth on the REST server (X-API-Key header). |
| EMBER_PORT | 51251 | Default port for ember serve. Overridden by the --port flag. |
| EMBER_INTERACT_PROVIDER | openai | LLM provider for interact (openai, anthropic, ollama, etc.). |
| EMBER_LLM_API_KEY | (none) | API key for LLM-powered extraction. |
| EMBER_LLM_BASE_URL | https://api.openai.com/v1 | LLM API endpoint for extraction. |
| EMBER_LLM_MODEL | gpt-4o-mini | Model used by extract. |
| EMBER_LIGHTPANDA_PATH | (auto) | Path to a custom Lightpanda binary. Skips auto-download if set. |
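The precedence rules in the table (--port beats EMBER_PORT, which beats 51251; EMBER_SAVE_DIR beats the value saved with ember config --save-dir) can be written as a small resolver, for example in a wrapper script. A sketch for illustration only: resolve_port and resolve_save_dir are hypothetical helpers, ember reads its own configuration internally, and treating an empty string as "unset" follows the ember config --save-dir "" convention above:

```python
import os
from typing import Mapping, Optional

DEFAULT_PORT = 51251


def resolve_port(cli_port: Optional[int] = None, env: Mapping[str, str] = os.environ) -> int:
    """--port flag wins, then EMBER_PORT, then the documented default 51251."""
    if cli_port is not None:
        return cli_port
    value = env.get("EMBER_PORT")
    return int(value) if value else DEFAULT_PORT


def resolve_save_dir(
    config_value: Optional[str] = None, env: Mapping[str, str] = os.environ
) -> Optional[str]:
    """EMBER_SAVE_DIR overrides the saved config value; empty strings mean unset."""
    value = env.get("EMBER_SAVE_DIR")
    if value:
        return value
    return config_value or None
```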
License
AGPL-3.0 — open source forever.
Download files
File details
Details for the file ember_browser-0.1.0.tar.gz.
File metadata
- Download URL: ember_browser-0.1.0.tar.gz
- Size: 44.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8389505dd7cc06105dc680f2d63535c6c70014e0927ffc26a407bd5866d70129 |
| MD5 | fb9ceb87e5aebcafd8717f51ddda987b |
| BLAKE2b-256 | ee1908d30123c3eb7448870510a5a04adbbfd2625c9455e55160034eaf3487c5 |
Provenance
The following attestation bundles were made for ember_browser-0.1.0.tar.gz:
- Publisher: release.yml on andalabx/ember
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ember_browser-0.1.0.tar.gz
- Subject digest: 8389505dd7cc06105dc680f2d63535c6c70014e0927ffc26a407bd5866d70129
- Sigstore transparency entry: 1997723136
- Permalink: andalabx/ember@78b7a3bb617b91ab8ee119fb2206e157d6ded759
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/andalabx
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@78b7a3bb617b91ab8ee119fb2206e157d6ded759
- Trigger Event: push
File details
Details for the file ember_browser-0.1.0-py3-none-any.whl.
File metadata
- Download URL: ember_browser-0.1.0-py3-none-any.whl
- Size: 33.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 003bd96c8c4d64c11f96c9175fdfb006169d1e82642b5011a58d1116a522f5f8 |
| MD5 | b35d151aa7ef5f5c2404b8c79ae3cf14 |
| BLAKE2b-256 | 9e855f1243e8ea96de60900749c6ca51100f1a9a63a9b63bfd5fca0f559493fd |
Provenance
The following attestation bundles were made for ember_browser-0.1.0-py3-none-any.whl:
- Publisher: release.yml on andalabx/ember
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ember_browser-0.1.0-py3-none-any.whl
- Subject digest: 003bd96c8c4d64c11f96c9175fdfb006169d1e82642b5011a58d1116a522f5f8
- Sigstore transparency entry: 1997723351
- Permalink: andalabx/ember@78b7a3bb617b91ab8ee119fb2206e157d6ded759
- Branch / Tag: refs/tags/v0.1.0
- Owner: https://github.com/andalabx
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@78b7a3bb617b91ab8ee119fb2206e157d6ded759
- Trigger Event: push
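To check a downloaded file against the digests listed above, Python's hashlib covers all three algorithms; BLAKE2b-256 is BLAKE2b with a 32-byte digest. A minimal sketch: new_hasher, bytes_digest, file_digest, and verify are illustrative helpers, and the two constants are the published SHA256 digests for 0.1.0:

```python
import hashlib
from pathlib import Path


def new_hasher(algorithm: str = "sha256"):
    """Return a hash object for PyPI's names: 'sha256', 'md5', or 'blake2b-256'."""
    if algorithm.lower() == "blake2b-256":
        return hashlib.blake2b(digest_size=32)  # BLAKE2b with a 256-bit digest
    return hashlib.new(algorithm.lower())


def bytes_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Hex digest of an in-memory byte string."""
    h = new_hasher(algorithm)
    h.update(data)
    return h.hexdigest()


def file_digest(path, algorithm: str = "sha256", chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks (1 MiB by default) without loading it all into memory."""
    h = new_hasher(algorithm)
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verify(path, expected_hex: str, algorithm: str = "sha256") -> bool:
    """True if the file matches a published hex digest (case-insensitive)."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Published SHA256 digests for ember-browser 0.1.0
SDIST_SHA256 = "8389505dd7cc06105dc680f2d63535c6c70014e0927ffc26a407bd5866d70129"
WHEEL_SHA256 = "003bd96c8c4d64c11f96c9175fdfb006169d1e82642b5011a58d1116a522f5f8"
```

For example, verify("ember_browser-0.1.0-py3-none-any.whl", WHEEL_SHA256) should return True for an untampered download. pip can enforce the same check at install time with --hash=sha256:... entries in a requirements file and --require-hashes.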