Open source, lightweight headless browser for AI agents. pip install ember-browser.

  ███████╗███╗   ███╗██████╗ ███████╗██████╗ 
  ██╔════╝████╗ ████║██╔══██╗██╔════╝██╔══██╗
  █████╗  ██╔████╔██║██████╔╝█████╗  ██████╔╝
  ██╔══╝  ██║╚██╔╝██║██╔══██╗██╔══╝  ██╔══██╗
  ███████╗██║ ╚═╝ ██║██████╔╝███████╗██║  ██║
  ╚══════╝╚═╝     ╚═╝╚═════╝ ╚══════╝╚═╝  ╚═╝

Open source, lightweight headless browser for AI agents.

License: AGPL v3

pip install ember-browser

No Docker. No API key to start.


Why ember

Most web tools for agents ship with Chromium (641 MB) or require Docker just to get started. We needed something an agent could use on a VPS, a laptop, or a Raspberry Pi without thinking about it.

ember idles at ~17 MB of RAM. It decides whether a page needs a browser; you just pass it a URL.

                      ember                                          Crawl4AI
Import footprint      ~54 MB                                         171.8 MB
Browser binary        20 MB (Lightpanda)                             641 MB (Chromium)
Scrape success rate   ~85% (trafilatura) / ~95%+ (with Lightpanda)   90%
Docker required       No                                             No
API key required      No                                             No

Quick start

pip install ember-browser

ember                          # start the interactive session
ember url https://example.com  # or run a one-shot command
ember serve                    # start the REST API

CLI

Interactive session

ember with no arguments opens a persistent session. Commands and a save guide are shown on startup — no need to type help first.

  ███████╗███╗   ███╗██████╗ ███████╗██████╗
  ...
  ╚══════╝╚═╝     ╚═╝╚═════╝ ╚══════╝╚═╝  ╚═╝

  v0.1.0  lightweight headless browser for AI agents

  url        <url>              scrape a page to markdown
  search     <query>            web search
  crawl      <url>              crawl a whole website
  map        <url>              discover all URLs on a site
  interact   <url>              control a browser with natural language
  extract    <url>              pull structured data with an LLM
  batch      <urls.txt>         scrape many URLs concurrently

  ─── saving results ───────────────────────────────────────────
  one result   url example.com -o page.md
  everything   output ./research/  then all results auto-save
  last result  save page.md        after any command

ember › url andausman.com
ember › save page.md

ember › output ./research/       # auto-save everything from here
ember/research › search "python asyncio" -n 10
ember/research › crawl docs.example.com
ember/research › output clear    # stop auto-saving
ember › quit

One-shot commands

Every command works standalone too:

ember url https://example.com                         # scrape a page
ember search "AI agents python" -n 10                 # web search
ember crawl https://docs.example.com --max-pages 20   # crawl a site
ember map https://example.com                         # discover all URLs
ember interact https://amazon.com \
  --prompt 'find a mechanical keyboard under $100'    # single quotes stop the shell expanding $1
ember extract https://example.com/pricing \
  --prompt "list all plans and prices as JSON"

Saving results

All commands accept -o to save that run:

ember url https://example.com -o page.md
ember search "python" -o results.json
ember crawl https://docs.example.com -o ./pages/   # one .md per page
ember map https://example.com -o urls.txt
ember extract https://example.com -o data.json

Set a default save directory so you never need -o:

ember config --save-dir ./research/    # persists across sessions
ember config                           # show current settings
ember config --save-dir ""             # clear it

Or use an environment variable for the current shell:

EMBER_SAVE_DIR=./out ember url https://example.com
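
The precedence between these options can be sketched in a few lines of Python. This is a minimal illustration, not ember's code: `resolve_save_dir` is a hypothetical helper, and the ordering it uses (an explicit -o value first, then EMBER_SAVE_DIR, then the directory persisted by ember config) is an assumption. The README only states that the environment variable overrides the persisted setting.

```python
import os
from typing import Mapping, Optional


def resolve_save_dir(
    cli_output: Optional[str],
    config_save_dir: Optional[str],
    env: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Pick where results are saved (hypothetical helper, not ember's code).

    Assumed order: explicit -o value, then EMBER_SAVE_DIR, then the
    directory persisted by `ember config --save-dir`. Empty strings count
    as "not set", mirroring `ember config --save-dir ""`.
    """
    for candidate in (cli_output, env.get("EMBER_SAVE_DIR"), config_save_dir):
        if candidate:
            return candidate
    return None


# The environment variable wins over the persisted directory.
print(resolve_save_dir(None, "./research/", {"EMBER_SAVE_DIR": "./out"}))
```

Passing the environment as a plain mapping keeps the function easy to exercise without touching the real shell environment.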

In a session, there are three ways to save:

ember › url example.com -o page.md     # save just this run
ember › save page.md                   # save the last result
ember › output ./research/             # auto-save all results from now on

Async batch scraping

# urls.txt — one URL per line, # = comment
ember batch urls.txt                      # 5 concurrent by default
ember batch urls.txt -c 20 -o ./pages/   # 20 parallel, save to dir
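
For scripts that prepare or validate a batch file before handing it to ember, the file format can be read with a small parser. This is a sketch under one assumption that the README does not spell out: only lines whose first non-blank character is # are treated as comments, so URLs containing a #fragment survive. `parse_url_lines` is a hypothetical name.

```python
from typing import Iterable, List


def parse_url_lines(lines: Iterable[str]) -> List[str]:
    """Return the URLs from a batch file, skipping blanks and comments.

    Assumption: only lines whose first non-blank character is '#' are
    comments, so a URL containing a '#fragment' is kept intact.
    """
    urls = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls


sample = [
    "# docs to scrape\n",
    "https://example.com\n",
    "\n",
    "https://example.com/guide#install\n",
]
print(parse_url_lines(sample))
```

An open file object can be passed directly in place of `sample`, since iterating a text file yields its lines.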

Python API

from emb.scrape import scrape_url, scrape_markdown
from emb.search import search
from emb.crawl import crawl
from emb.map import map_url

# Scrape a page → ScrapeResult
result = scrape_url("https://example.com")
print(result.markdown)   # full page content as markdown
print(result.title)      # page title
print(result.success)    # True / False

# Just the markdown text
md = scrape_markdown("https://example.com")

# Crawl a site
result = crawl("https://docs.example.com", max_pages=20, max_depth=3)
for page in result.pages:
    print(page.url, len(page.markdown))

# Discover URLs
result = map_url("https://example.com", max_links=100)
print(result.links)   # list[str]

# Search the web
results = search("python asyncio tutorial", limit=5)
for r in results:
    print(r.title, r.url)

# Browser interaction with natural language
from emb.interact import interact

result = interact("https://example.com", prompt="click the login button")
print(result.content)   # what the agent did / saw

# LLM-powered structured extraction
from emb.agent import extract

data = extract("https://example.com/pricing", prompt="list all plans and prices")
print(data)   # dict

Async

import asyncio
from emb.scrape import scrape_url_async

async def main():
    results = await asyncio.gather(
        scrape_url_async("https://example.com"),
        scrape_url_async("https://httpbin.org/get"),
    )
    for r in results:
        print(r.url, r.success)

asyncio.run(main())
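
With many URLs, an unbounded gather starts every request at once. One common way to cap that is an asyncio.Semaphore, sketched here with a stand-in coroutine so the example runs without ember installed. `gather_bounded` and `fake_fetch` are hypothetical names; the default limit of 5 simply mirrors the batch command's default, and whether scrape_url_async applies its own limit is not stated here.

```python
import asyncio
from typing import Awaitable, Callable, List


async def gather_bounded(
    urls: List[str],
    fetch: Callable[[str], Awaitable[object]],
    limit: int = 5,
) -> List[object]:
    """Run fetch(url) for every URL with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def worker(url: str) -> object:
        async with semaphore:
            return await fetch(url)

    # gather preserves input order, so results line up with urls.
    return await asyncio.gather(*(worker(u) for u in urls))


async def fake_fetch(url: str) -> str:
    """Stand-in for scrape_url_async so the sketch runs without ember."""
    await asyncio.sleep(0)
    return f"fetched {url}"


results = asyncio.run(
    gather_bounded(["https://example.com", "https://httpbin.org/get"], fake_fetch, limit=2)
)
print(results)
```

In real use, `fake_fetch` would be swapped for scrape_url_async, assuming it accepts a single URL argument as in the example above.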

REST API

ember serve               # http://127.0.0.1:51251
ember serve --port 8080   # custom port

EMBER_API_KEY=your-secret ember serve   # require auth via the X-API-Key header

curl -X POST http://localhost:51251/scrape \
  -H "Content-Type: application/json" \
  -H "X-API-Key: your-secret" \
  -d '{"url": "https://example.com"}'

curl -X POST http://localhost:51251/search \
  -H "Content-Type: application/json" \
  -d '{"query": "AI agents", "limit": 5}'

curl -X POST http://localhost:51251/crawl \
  -H "Content-Type: application/json" \
  -d '{"url": "https://docs.example.com", "max_pages": 10}'

Endpoints: /scrape /search /crawl /map /interact /extract /agent /health
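
The same calls can be made from Python with only the standard library. The sketch below mirrors the curl examples: a JSON POST with an optional X-API-Key header. `build_request` and `call` are hypothetical helpers, and `call` assumes the server answers with a JSON body, which this README does not document.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

BASE_URL = "http://127.0.0.1:51251"  # default address shown for `ember serve`


def build_request(
    endpoint: str,
    payload: Dict[str, Any],
    api_key: Optional[str] = None,
) -> urllib.request.Request:
    """Build a JSON POST request for an ember endpoint such as /scrape."""
    headers = {"Content-Type": "application/json"}
    if api_key:
        # Only needed when the server was started with EMBER_API_KEY set.
        headers["X-API-Key"] = api_key
    return urllib.request.Request(
        BASE_URL + endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


def call(endpoint: str, payload: Dict[str, Any], api_key: Optional[str] = None) -> Any:
    """Send the request; assumes (not documented here) a JSON response body."""
    request = build_request(endpoint, payload, api_key)
    with urllib.request.urlopen(request, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


# Building the request does not touch the network; call() does.
request = build_request("/scrape", {"url": "https://example.com"}, api_key="your-secret")
print(request.get_method(), request.full_url)
```

With a server running, `call("/search", {"query": "AI agents", "limit": 5})` would correspond to the second curl example.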


MCP

{
  "mcpServers": {
    "ember": {
      "command": "ember",
      "args": ["mcp"]
    }
  }
}

Works with Claude Code, Cursor, and any MCP-compatible host.

Available tools: scrape, search_web, crawl_site, map_site, batch_scrape, interact_page, extract_data.
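
If the host already has other MCP servers configured, the ember entry has to be merged in rather than pasted over the file. A minimal sketch of that merge follows; `add_ember_server` is a hypothetical helper, "other-tool" is a made-up existing entry, and where each host keeps its configuration file is left out because it varies by host.

```python
import json
from typing import Any, Dict


def add_ember_server(config: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of an MCP host config with the ember entry added."""
    servers = dict(config.get("mcpServers", {}))
    servers["ember"] = {"command": "ember", "args": ["mcp"]}
    return {**config, "mcpServers": servers}


# "other-tool" is a made-up entry standing in for whatever is already configured.
existing = {"mcpServers": {"other-tool": {"command": "other-tool", "args": []}}}
print(json.dumps(add_ember_server(existing), indent=2))
```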


How it works

Not every page needs a browser. ember knows the difference.

Tier 1 — trafilatura handles ~90% of the web: blogs, news, documentation, Wikipedia. Pure HTTP, no browser process, no memory overhead.

Tier 2 — Lightpanda handles JavaScript-heavy pages, SPAs, and interactive content. It's a real browser engine written in Zig, built for machines rather than humans — 20 MB total. ember downloads and caches it automatically on first use, and only falls back to it when tier 1 produces thin content.

Most requests never reach the browser.
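
The two-tier fallback can be illustrated with stand-in functions. This is a sketch of the general pattern, not ember's implementation: the 200-character cutoff for "thin content" is a hypothetical value, and `scrape_tiered`, `fake_static`, and `fake_rendered` are made-up names that replace trafilatura and Lightpanda so the example runs on its own.

```python
from typing import Callable, Tuple

MIN_CHARS = 200  # hypothetical cutoff for "thin content"; ember's real rule is not documented here


def scrape_tiered(
    url: str,
    fetch_static: Callable[[str], str],
    fetch_rendered: Callable[[str], str],
    min_chars: int = MIN_CHARS,
) -> Tuple[int, str]:
    """Try the cheap HTTP path first and render only when the text looks thin.

    Returns (tier, text): tier 1 is the static path, tier 2 the browser.
    """
    text = fetch_static(url)
    if len(text.strip()) >= min_chars:
        return 1, text
    # Static extraction produced too little text, so pay for the browser.
    return 2, fetch_rendered(url)


def fake_static(url: str) -> str:
    """Stand-in for tier 1: pretend only the blog is server-rendered."""
    return "word " * 100 if "blog" in url else ""


def fake_rendered(url: str) -> str:
    """Stand-in for tier 2: pretend the browser always gets the content."""
    return "rendered " * 100


for url in ("https://example.com/blog", "https://example.com/app"):
    tier, text = scrape_tiered(url, fake_static, fake_rendered)
    print(url, "-> tier", tier, len(text), "chars")
```

The point of the ordering is that the browser process is only started for pages where the cheap path fails, which is what keeps the common case small.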

Memory footprint

State                    RAM
Idle                     ~17 MB
Scraping a static page   ~20 MB
Running the browser      ~140 MB

Firecrawl needs 4–8 GB of RAM in Docker. Crawl4AI takes 171.8 MB at import, before scraping anything. ember fits where your agent already runs.


Environment variables

Variable                  Default                     Description
EMBER_SAVE_DIR            (none)                      Default directory for saved results. Overrides ember config --save-dir for the current shell.
EMBER_API_KEY             (none)                      Enables API key auth on the REST server (X-API-Key header).
EMBER_PORT                51251                       Default port for ember serve. Overridden by --port flag.
EMBER_INTERACT_PROVIDER   openai                      LLM provider for interact (openai, anthropic, ollama, etc.).
EMBER_LLM_API_KEY         (none)                      API key for LLM-powered extraction.
EMBER_LLM_BASE_URL        https://api.openai.com/v1   LLM API endpoint for extraction.
EMBER_LLM_MODEL           gpt-4o-mini                 Model used by extract.
EMBER_LIGHTPANDA_PATH     (auto)                      Path to a custom Lightpanda binary. Skips auto-download if set.
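
A wrapper script that launches ember may want to inspect these variables with the same defaults. The sketch below reads them into a dataclass; `EmberEnv` and `load_env` are hypothetical names, not part of ember, and treating an empty string as unset is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class EmberEnv:
    """Snapshot of the documented variables (hypothetical helper, not ember's code)."""
    save_dir: Optional[str]
    api_key: Optional[str]
    port: int
    interact_provider: str
    llm_api_key: Optional[str]
    llm_base_url: str
    llm_model: str
    lightpanda_path: Optional[str]


def load_env(env: Mapping[str, str] = os.environ) -> EmberEnv:
    """Read each variable, falling back to the default listed in the table."""
    return EmberEnv(
        save_dir=env.get("EMBER_SAVE_DIR") or None,  # assumption: "" means unset
        api_key=env.get("EMBER_API_KEY") or None,
        port=int(env.get("EMBER_PORT", "51251")),
        interact_provider=env.get("EMBER_INTERACT_PROVIDER", "openai"),
        llm_api_key=env.get("EMBER_LLM_API_KEY") or None,
        llm_base_url=env.get("EMBER_LLM_BASE_URL", "https://api.openai.com/v1"),
        llm_model=env.get("EMBER_LLM_MODEL", "gpt-4o-mini"),
        lightpanda_path=env.get("EMBER_LIGHTPANDA_PATH") or None,
    )


# Override only the port; everything else falls back to its default.
settings = load_env({"EMBER_PORT": "8080"})
print(settings.port, settings.llm_model)
```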

License

AGPL-3.0 — open source forever.
