LionScraper bridge daemon, thin MCP stdio, and CLI — local HTTP + WebSocket to Chrome extension (Python)

LionScraper MCP + CLI + API service (Python)

Website: lionscraper.com
PyPI: project lionscraper

What is this?

LionScraper is a browser extension that can collect lists, articles, links, images, and more from web pages. This PyPI package provides the companion bridge to that extension in three ways:

MCP (lionscraper-mcp): connect your AI app (e.g. Cursor) so the model can call scraping tools in chat.
CLI (lionscraper): run daemon, scrape, and ping from a terminal over the same local HTTP/WebSocket port as the extension.
HTTP API: call the same tools over loopback JSON HTTP (/v1/...) when the daemon is running—useful for scripts, services, or any HTTP client without MCP or the CLI front-end.

The real scraping logic runs in the extension; this package connects and forwards.

Before you start

Browser: Chrome or Edge (follow what the extension actually supports).
LionScraper extension: Install and enable it from your browser’s store (the listing title may vary by storefront).
- Chrome: Chrome Web Store — LionScraper
- Microsoft Edge: Edge Add-ons — LionScraper
Python: 3.10 or newer on your machine. If you do not have it yet, download an installer from the Python website and follow the prompts (check the option to add Python to PATH on Windows if the installer offers it).
For MCP: an AI app that supports MCP (e.g. Cursor, Trae).
For the HTTP API: same browser, extension, and daemon as the CLI; use curl, fetch, or any HTTP client against http://127.0.0.1:$PORT (see HTTP API (local REST) below).

This package uses aiohttp for outbound HTTP and WebSocket probes to the daemon. When neither Chrome nor Edge is detected and the extension is not connected, ping and the scrape* tools can fall back to a server-side HTTP fetch, which does not run in-page JavaScript.

Install (pip)

This package is published on PyPI as lionscraper.

pip install -U lionscraper

You may use a virtual environment (recommended) or pip install -U --user lionscraper if you prefer not to install into the system interpreter.

You get two commands; together they support three integration styles (MCP, CLI, HTTP API):

Command	Role
`lionscraper-mcp`	Thin MCP server (stdio) for AI apps
`lionscraper`	CLI: `daemon`, `stop`, `scrape`, `ping`, … (also runs the process that serves the HTTP API)

If lionscraper-mcp is not on your PATH after install, use the full path to the script (e.g. under your venv’s Scripts/ or bin/), or use the python -m lionscraper form described in the MCP section below. Either way, the package must still be installed with pip install -U lionscraper.

MCP (AI applications)

Add MCP in your AI app

Examples assume lionscraper-mcp is on your PATH (UIs differ). In MCP JSON, every env value is a string.

Minimal config (omit env for built-in defaults; PORT defaults to 13808 and must match the extension bridge port):

{
  "mcpServers": {
    "lionscraper": {
      "command": "lionscraper-mcp"
    }
  }
}

Full env example (drop keys you do not need; empty strings behave like “unset” for most of these):

{
  "mcpServers": {
    "lionscraper": {
      "command": "lionscraper-mcp",
      "env": {
        "PORT": "13808",
        "TIMEOUT": "120000",
        "LANG": "en-US",
        "TOKEN": "",
        "DAEMON": ""
      }
    }
  }
}

PORT: HTTP + WebSocket listen port; default 13808; must match the extension bridge port.
TIMEOUT: Milliseconds to wait for a previous instance to release the port before forcing takeover; default 120000; 0 means force quickly.
LANG: Tool descriptions and stderr log language (en-US, zh-CN, or POSIX forms like en_US.UTF-8).
TOKEN: Bearer token shared with the daemon; empty means no Authorization header.
DAEMON: Only 0 disables auto-spawning lionscraper daemon from thin MCP; empty or other values match omitting the key (auto-start allowed).

Restart MCP or the app so the config applies.
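As a rough sketch of the rules above, the config can also be generated from Python so that every env value is guaranteed to be a string. The helper name build_mcp_config is hypothetical and not part of the package; only the keys and defaults come from the list above.

```python
import json

def build_mcp_config(port=13808, timeout_ms=120000, lang="en-US", token="", daemon=""):
    """Return an MCP client config dict; every env value is coerced to a string."""
    env = {
        "PORT": str(port),           # must match the extension bridge port
        "TIMEOUT": str(timeout_ms),  # milliseconds, as documented above
        "LANG": lang,
        "TOKEN": token,              # empty string: no Authorization header
        "DAEMON": daemon,            # only "0" disables daemon auto-spawn
    }
    return {"mcpServers": {"lionscraper": {"command": "lionscraper-mcp", "env": env}}}

config = build_mcp_config(port=13808)
print(json.dumps(config, indent=2))
```

Paste the printed JSON into your AI app's MCP settings; where that file lives depends on the app.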

If `lionscraper-mcp` is not on `PATH`

After pip install -U lionscraper, you can run the same routing with python -m lionscraper: no arguments after the module name selects thin MCP over stdio; any extra argument (including --debug) uses the CLI instead. Example:

{
  "mcpServers": {
    "lionscraper": {
      "command": "python",
      "args": ["-m", "lionscraper"]
    }
  }
}

Use the same python executable you used to install the package (or python3 on some systems).
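The routing rule described above (no arguments selects thin MCP, any argument selects the CLI) can be pictured with a small sketch. This illustrates the documented behavior only; it is not the package's actual entry-point code, and select_mode is a hypothetical name.

```python
def select_mode(argv):
    """Return 'mcp' when nothing follows the program name, otherwise 'cli'."""
    return "mcp" if len(argv) <= 1 else "cli"

print(select_mode(["lionscraper"]))             # no extra arguments
print(select_mode(["lionscraper", "--debug"]))  # any extra argument, even --debug
```

This is why the MCP JSON above passes only "-m" and "lionscraper" in args: adding anything after the module name would switch the process to CLI mode.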

Match the port in the browser extension

Open LionScraper settings / options.
Set bridge port to the same value as PORT (e.g. 13808).
If needed, use Reconnect in the extension, or reload the extension / restart the browser.

Day-to-day use (MCP)

Keep the extension enabled and target pages open as required.
Ask in natural language, e.g. “Check if LionScraper is connected” or “Scrape lists / article / emails / phones / links / images from this page.”
If you see a "not connected" or timeout error, run the connection check again and confirm that PORT matches the extension bridge port.

MCP tools (summary)

The server registers tools that mirror extension capabilities. Names and shapes are what your MCP client shows to the model.

Tool	Purpose (short)
`ping`	Check that the extension is connected and registered
`scrape`	Auto-detect structure (lists, tables, …); supports pagination
`scrape_article`	Article body (e.g. Markdown) and metadata
`scrape_emails`	Email addresses on the page
`scrape_phones`	Phone numbers (structured)
`scrape_urls`	Hyperlink URLs
`scrape_images`	Image URLs and basic metadata

Parameters: A full JSON schema for every field belongs in the tool definitions your client displays and in release-accurate docs; duplicating it here goes stale quickly. Use the tool list / descriptions in the AI app when calling MCP.

MCP Resources / Prompts

The thin MCP process (lionscraper-mcp) exposes Resources and Prompts in addition to Tools:

Resources: static Markdown at stable URIs, e.g. lionscraper://guide/connection (PORT alignment, ping troubleshooting), lionscraper://guide/when-to-use-tools (prefer LionScraper over WebFetch/curl/wget by scenario), lionscraper://guide/cli (terminal CLI), lionscraper://reference/tools, lionscraper://reference/common-params. Clients list/read them into context; they are served inside the stdio process and do not require the daemon HTTP path (works even if the extension is offline). For the loopback HTTP control plane (/v1/...), see HTTP API (local REST) below (not an MCP resource URI).
Prompts: workflow templates (e.g. ping-then-scrape, multi-URL, scrape_article, prefer_lionscraper_scraping, extension troubleshooting). Clients list/get prompts; UI varies by host (Cursor, Trae, …).

Copy follows LANG (e.g. zh-CN), same as tool metadata.

CLI (terminal)

The lionscraper command is the terminal front-end to the same stack as MCP: lionscraper daemon listens on PORT (default 13808) for HTTP (used by the CLI and by the thin lionscraper-mcp process) and WebSocket (used by the extension). Set PORT (and optional TOKEN) to match the extension bridge port and any MCP config. Use the CLI for scripts, CI, or quick one-off runs without opening an AI chat.

The CLI talks to the daemon HTTP API on http://127.0.0.1:$PORT (default port 13808, same as the extension). Unless you pass --api-url, running scrape or ping auto-starts a local daemon when possible.

Common commands:

lionscraper --help
lionscraper daemon              # keep running; HTTP + WebSocket on PORT
lionscraper stop                # stop daemon on configured PORT
lionscraper ping
lionscraper scrape -u https://www.example.com
lionscraper scrape --method article -u https://www.example.com
# Shorthand: lionscraper -u https://www.example.com   → same as scrape

--method selects which tool the daemon runs (default scrape): scrape, article, emails, phones, urls, images. Repeat -u / --url to pass several URLs in one run.

Set PORT (and optional TOKEN) in the environment so the CLI matches the extension and MCP. Use --api-url http://127.0.0.1:PORT if the daemon is not on the default base URL. Run lionscraper --help for every flag the binary accepts.
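For scripts that drive the CLI, one option is to assemble the argument list in Python first. The sketch below only builds and prints the command; it uses the flags documented above (--method, -u, --format), and scrape_argv is a hypothetical helper name.

```python
import shlex

def scrape_argv(urls, method="scrape", extra=()):
    """Build the argument list for one `lionscraper scrape` run (nothing is executed)."""
    argv = ["lionscraper", "scrape"]
    if method != "scrape":   # scrape is the documented default method
        argv += ["--method", method]
    for url in urls:         # -u may be repeated for several URLs
        argv += ["-u", url]
    return argv + list(extra)

argv = scrape_argv(
    ["https://www.example.com", "https://www.example.com/docs"],
    method="urls",
    extra=["--format", "pretty"],
)
print(shlex.join(argv))
```

To run the command, pass argv to subprocess.run with PORT (and TOKEN, if used) set in the environment. That step needs the package installed and the extension connected.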

Scrape: parameters and richer examples

The flags below are forwarded to the extension, except --bridge-timeout-ms, which only caps how long the CLI waits on the bridge. Actual behavior still depends on the extension version.

Output and connection

Flag	Meaning
`--format json` or `pretty`	JSON one-line vs. indented (default `json`)
`--raw`	Print the tool’s text block as returned, without re-formatting
`-o` / `--output <file>`	Write the result to a file instead of stdout

Timing and load

Flag	Meaning
`--delay <ms>`	Wait after page load before scraping (dynamic content)
`--timeout-ms <ms>`	Per-URL timeout on the extension side
`--bridge-timeout-ms <ms>`	Max wait for this CLI → daemon call
`--scrape-interval <ms>`	Delay between starting multiple URLs
`--concurrency <n>`	Concurrency hint for multi-URL runs
`--scroll-speed <px>`	Global scroll speed (extension semantics)
`--max-pages <n>`	Pagination cap for list-style `scrape`

Lazy-loaded / infinite scroll (maps to waitForScroll in the tool payload)

Flag	Meaning
`--wait-scroll-speed <px>`	Pixels per step while scrolling
`--wait-scroll-interval <ms>`	Delay between scroll steps
`--wait-max-scroll-height <px>`	Optional max scroll distance
`--scroll-container <selector>`	Optional scrollable container selector
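As an illustration of how these four flags might translate into the waitForScroll object, the sketch below uses the key names listed in the HTTP API section (scrollSpeed, scrollInterval, maxScrollHeight, scrollContainerSelector). The exact flag-to-key pairing is an assumption inferred from the names, and wait_for_scroll is a hypothetical helper.

```python
def wait_for_scroll(speed_px, interval_ms, max_height_px=None, container=None):
    """Build a waitForScroll object; the two required keys are always present."""
    payload = {
        "scrollSpeed": speed_px,        # assumed to come from --wait-scroll-speed
        "scrollInterval": interval_ms,  # assumed to come from --wait-scroll-interval
    }
    if max_height_px is not None:       # assumed: --wait-max-scroll-height
        payload["maxScrollHeight"] = max_height_px
    if container is not None:           # assumed: --scroll-container
        payload["scrollContainerSelector"] = container
    return payload

print(wait_for_scroll(400, 350))
print(wait_for_scroll(400, 350, max_height_px=8000, container="#feed"))
```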

Locale

Flag	Meaning
`--lang zh-CN` or `en-US`	Language hint for messages / extension UI strings

Extra payload

Flag	Meaning
`--include-html` + `true` / `false`	Ask for full-page HTML in meta when supported
`--include-text` + `true` / `false`	Ask for full-page plain text in meta when supported

Method-specific filters (examples; see --help for the full set)

Emails (--method emails): --email-domain, --email-keyword, --email-limit
Phones (--method phones): --phone-type, --phone-area-code, --phone-keyword, --phone-limit
URLs (--method urls): --url-domain, --url-keyword, --url-pattern, --url-limit
Images (--method images): --img-min-width, --img-min-height, --img-format, --img-keyword, --img-limit

Optional browser automation hints (when the extension supports them): --auto-launch-browser, --no-auto-launch-browser, --post-launch-wait-ms.

Example A — List / table scrape with pagination, delays, and scroll assist

lionscraper scrape \
  -u https://www.example.com/items \
  --max-pages 5 \
  --delay 800 \
  --timeout-ms 90000 \
  --bridge-timeout-ms 180000 \
  --wait-scroll-speed 400 \
  --wait-scroll-interval 350 \
  --lang zh-CN \
  --format pretty \
  -o items.json

--max-pages: stop after at most five “pages” of list accumulation for this job.
--delay: give the page extra time after load before extraction.
--timeout-ms / --bridge-timeout-ms: extension vs. CLI-side waits; raise both for slow sites or heavy pages.
--wait-scroll-*: gentle scrolling so lazy-loaded rows can appear before scraping.
-o / --format pretty: save a human-readable JSON file for inspection.

Example B — Article body with optional HTML snapshot

lionscraper scrape --method article \
  -u https://www.example.com/blog/post-1 \
  --include-html true \
  --timeout-ms 120000 \
  --format json

--method article: maps to the scrape_article tool (Markdown-style body + metadata when the extension provides it).
--include-html true: requests additional HTML in the result meta where supported (larger payload).

Example C — Emails and URLs with filters, multiple pages

lionscraper scrape --method emails \
  -u https://www.example.com/contact \
  --email-domain example.com \
  --email-keyword support \
  --email-limit 30 \
  --format pretty

lionscraper scrape --method urls \
  -u https://www.example.com \
  -u https://www.example.com/docs \
  --url-domain example.com \
  --url-limit 200 \
  --scrape-interval 500

Email flags: narrow to addresses in a domain, matching a keyword, and cap the count.
Two -u values: two URLs processed in one invocation; --scrape-interval spaces out task starts.

Example D — Images with size / format filters

lionscraper scrape --method images \
  -u https://www.example.com/gallery \
  --img-min-width 240 \
  --img-min-height 240 \
  --img-format webp \
  --img-limit 40 \
  -o gallery-images.json

Filters out small thumbnails and non-WebP assets when the extension honors these fields; --img-limit caps how many entries are returned.

HTTP API (local REST)

Base URL: http://127.0.0.1:$PORT (default 13808). Binds 127.0.0.1 only. The extension must be connected on that port and a daemon must be listening (same process model as the rest of this package).

Method	Path	Response (success)
`GET`	`/v1/health`	`{ "ok", "identity", "bridgePort", "sessionCount" }`
`POST`	`/v1/daemon/shutdown`	`{ "ok": true }` then the daemon exits
`POST`	`/v1/tools/call`	Tool result JSON (below)

Anything else → 404 { "ok": false, "error": { "code": "NOT_FOUND", "message": "Not found" } }.

Auth: If env TOKEN is set on the daemon, every request needs Authorization: Bearer <TOKEN>; otherwise omit the header.
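A client can apply the auth rule above with a small helper that adds the header only when a token is configured. This is a minimal sketch; auth_headers is a hypothetical name and the token value is a placeholder.

```python
def auth_headers(token=None):
    """Return request headers; Authorization is added only when a token is set."""
    return {"Authorization": f"Bearer {token}"} if token else {}

print(auth_headers())              # daemon started without TOKEN
print(auth_headers("YOUR_TOKEN"))  # daemon started with TOKEN=YOUR_TOKEN
```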

`POST /v1/tools/call`

Headers: Content-Type: application/json. For streaming, also Accept: application/x-ndjson.
Body:

{ "name": "<tool>", "arguments": { }, "progressToken": "<optional>" }

Field	Meaning
`name`	One of: `ping`, `scrape`, `scrape_article`, `scrape_emails`, `scrape_phones`, `scrape_urls`, `scrape_images`
`arguments`	Tool payload; omit or `{}` if empty
`progressToken`	Any string or number; with `Accept: application/x-ndjson`, the body is NDJSON (lines `type: "progress"`, then a final `type: "result"` or `type: "error"`)
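For the NDJSON case, a client reads the response line by line and treats the first result or error line as final. The sketch below works on any iterable of text lines; only the type field is taken from the table above, and the sample lines are made up for illustration.

```python
import json

def split_ndjson(lines):
    """Separate progress events from the final result/error event."""
    progress, final = [], None
    for raw in lines:
        raw = raw.strip()
        if not raw:              # tolerate blank lines between records
            continue
        event = json.loads(raw)
        if event.get("type") == "progress":
            progress.append(event)
        elif event.get("type") in ("result", "error"):
            final = event        # the stream ends with exactly one of these
            break
    return progress, final

# Hypothetical stream: real events carry more fields than "type".
sample = ['{"type": "progress"}', '{"type": "progress"}', '{"type": "result"}']
progress, final = split_ndjson(sample)
print(len(progress), final["type"])
```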

`arguments` (JSON keys)

ping only: optional lang ("en-US" | "zh-CN"), autoLaunchBrowser (boolean), postLaunchWaitMs (number, 3000–60000).

All scrape-family tools (scrape, scrape_article, scrape_emails, scrape_phones, scrape_urls, scrape_images) share:

Key	Type	Constraint / note
`url`	string \| string[]	Required — one URL or an array of URLs
`lang`	`"en-US"` \| `"zh-CN"`	Optional
`delay`	number	Optional, ≥ 0
`timeoutMs`	number	Optional, ≥ 1000
`bridgeTimeoutMs`	number	Optional, ≥ 1000
`includeHtml`	boolean	Optional
`includeText`	boolean	Optional
`scrapeInterval`	number	Optional
`concurrency`	number	Optional
`scrollSpeed`	number	Optional
`autoLaunchBrowser`	boolean	Optional
`postLaunchWaitMs`	number	Optional, 3000–60000
`waitForScroll`	object	Optional; if set, must include `scrollSpeed` and `scrollInterval`; may include `maxScrollHeight`, `scrollContainerSelector`

Only scrape: optional maxPages (number, ≥ 1).
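Checking the constraints above on the client side can avoid a round trip that ends in 400. The sketch below covers only the limits stated in the table; it is not the daemon's validator, and check_scrape_arguments is a hypothetical helper.

```python
MINIMUMS = {"delay": 0, "timeoutMs": 1000, "bridgeTimeoutMs": 1000}

def check_scrape_arguments(name, args):
    """Return a list of problems found in a scrape-family arguments object."""
    problems = []
    url = args.get("url")
    url_ok = isinstance(url, str) or (
        isinstance(url, list) and all(isinstance(u, str) for u in url)
    )
    if not url_ok:
        problems.append("url must be a string or a list of strings")
    for key, minimum in MINIMUMS.items():
        if key in args and args[key] < minimum:
            problems.append(f"{key} must be >= {minimum}")
    wait = args.get("postLaunchWaitMs")
    if wait is not None and not 3000 <= wait <= 60000:
        problems.append("postLaunchWaitMs must be within 3000-60000")
    scroll = args.get("waitForScroll")
    if scroll is not None:
        for key in ("scrollSpeed", "scrollInterval"):
            if key not in scroll:
                problems.append(f"waitForScroll.{key} is required")
    if "maxPages" in args:
        if name != "scrape":
            problems.append("maxPages applies only to scrape")
        elif args["maxPages"] < 1:
            problems.append("maxPages must be >= 1")
    return problems

print(check_scrape_arguments("scrape", {"url": "https://www.example.com", "maxPages": 3}))
print(check_scrape_arguments("scrape_article", {"timeoutMs": 500}))
```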

Optional filter object (only on the matching tool):

`name`	`filter` properties
`scrape_emails`	`domain`, `keyword`, `limit` (≥ 1)
`scrape_phones`	`type`, `areaCode`, `keyword`, `limit` (≥ 1)
`scrape_urls`	`domain`, `keyword`, `pattern`, `limit` (≥ 1)
`scrape_images`	`minWidth`, `minHeight` (≥ 0), `format`, `keyword`, `limit` (≥ 1)
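A request body with a filter might be assembled as in the sketch below. It assumes the filter object sits under a filter key inside arguments, which is how the table above reads but is not confirmed here; tool_call_body is a hypothetical helper.

```python
import json

def tool_call_body(name, url, filter_props=None, **arguments):
    """Serialize a /v1/tools/call body; filter_props is attached only when given."""
    arguments["url"] = url
    if filter_props:
        arguments["filter"] = filter_props  # assumed location of the filter object
    return json.dumps({"name": name, "arguments": arguments})

body = tool_call_body(
    "scrape_emails",
    "https://www.example.com/contact",
    filter_props={"domain": "example.com", "keyword": "support", "limit": 30},
    timeoutMs=90000,
)
print(body)
```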

Unknown name, bad JSON, or schema violations → 400 { "code": "BAD_REQUEST", ... }. Bad bearer → 401 UNAUTHORIZED. Tool crash → 500 INTERNAL.

200 body: { "content": [ { "type": "text", "text": "..." } ], "isError"?: boolean }.
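Given the 200 body shape above, a client usually wants the joined text plus the error flag. The sketch below treats a missing isError as false, which is an assumption based on the field being optional; the sample body is invented for illustration.

```python
def extract_text(body):
    """Return (text, is_error) from a /v1/tools/call success body."""
    parts = [
        item.get("text", "")
        for item in body.get("content", [])
        if item.get("type") == "text"
    ]
    return "\n".join(parts), bool(body.get("isError", False))

# Hypothetical response body with a single text block.
sample_body = {"content": [{"type": "text", "text": "{\"ok\": true}"}]}
text, is_error = extract_text(sample_body)
print(is_error, text)
```

The text block may itself be JSON, in which case a further json.loads on it is needed; whether it is depends on the tool and the extension version.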

Examples (`PORT=13808`)

curl -sS "http://127.0.0.1:13808/v1/health"
curl -sS -X POST "http://127.0.0.1:13808/v1/tools/call" \
  -H "Content-Type: application/json" \
  -d '{"name":"ping","arguments":{}}'
curl -sS -X POST "http://127.0.0.1:13808/v1/tools/call" \
  -H "Content-Type: application/json" \
  -d '{"name":"scrape_article","arguments":{"url":"https://www.example.com","timeoutMs":120000}}'
# With TOKEN on the daemon:
curl -sS -X POST "http://127.0.0.1:13808/v1/tools/call" \
  -H "Content-Type: application/json" -H "Authorization: Bearer YOUR_TOKEN" \
  -d '{"name":"ping","arguments":{}}'
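The same calls can be made from Python with only the standard library. The sketch below builds the request and prints it without sending anything; call_tool performs the actual round trip and needs a running daemon with the extension connected. Both function names and the 130-second client timeout are choices made for this example.

```python
import json
import urllib.request

def build_tool_call(name, arguments=None, port=13808, token=None):
    """Build a POST /v1/tools/call request object (nothing is sent yet)."""
    headers = {"Content-Type": "application/json"}
    if token:  # only needed when the daemon was started with TOKEN
        headers["Authorization"] = f"Bearer {token}"
    body = json.dumps({"name": name, "arguments": arguments or {}}).encode("utf-8")
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/v1/tools/call",
        data=body,
        headers=headers,
        method="POST",
    )

def call_tool(request, timeout_s=130):
    """Send the request and decode the JSON reply; requires a running daemon."""
    with urllib.request.urlopen(request, timeout=timeout_s) as response:
        return json.loads(response.read().decode("utf-8"))

request = build_tool_call(
    "scrape_article", {"url": "https://www.example.com", "timeoutMs": 120000}
)
print(request.get_method(), request.full_url)
print(request.data.decode("utf-8"))
```

Call call_tool(request) once the daemon is up. A refused connection raises urllib.error.URLError, which usually means the daemon is not listening on that port.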

FAQ (plain language)

Q: Extension not connected or scraping fails?

Extension enabled?
PORT in MCP (or env for CLI) exactly matches the extension bridge port?
Avoid multiple conflicting MCP/CLI setups on one machine.

Q: Many “tools” visible in the AI app—does that mean the extension is connected?

Not necessarily. Tools only confirm AI → MCP server; the extension must still connect on the same port and register.

Q: CLI says it cannot reach the daemon?

Start lionscraper daemon in another terminal, or fix PORT / --api-url.

Q: I want to drive scraping from my own HTTP client?

Keep the daemon and extension on the same PORT, then POST /v1/tools/call with name and arguments as in HTTP API (local REST). GET /v1/health checks that the listener is LionScraper.

License

MIT (same as the lionscraper package on PyPI).

Project details

Release history

1.0.6 (this version): Apr 8, 2026
1.0.5: Apr 8, 2026
1.0.3: Apr 6, 2026
1.0.2: Apr 5, 2026

Download files


Source Distribution

lionscraper-1.0.6.tar.gz (77.0 kB)

Uploaded Apr 8, 2026 Source

Built Distribution


lionscraper-1.0.6-py3-none-any.whl (79.1 kB)

Uploaded Apr 8, 2026 Python 3

File details

Details for the file lionscraper-1.0.6.tar.gz.

File metadata

Download URL: lionscraper-1.0.6.tar.gz
Upload date: Apr 8, 2026
Size: 77.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for lionscraper-1.0.6.tar.gz
Algorithm	Hash digest
SHA256	`827176570ca4c4f2816b058d1d24eb302c0b6262e7011da17f578300bec9864b`
MD5	`72c77f88bcefbf16b46b78c18631bd0d`
BLAKE2b-256	`dc5a0fa39c615727570933fa894419d06812f9634cfab3701bcfeab172409766`


File details

Details for the file lionscraper-1.0.6-py3-none-any.whl.

File metadata

Download URL: lionscraper-1.0.6-py3-none-any.whl
Upload date: Apr 8, 2026
Size: 79.1 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for lionscraper-1.0.6-py3-none-any.whl
Algorithm	Hash digest
SHA256	`feecb0522a047549af3cb3825fdf1f1526a9a7d5fc33e181e98acd65e5a3e94c`
MD5	`6f8ebcedfc5872746f6c9768574ac745`
BLAKE2b-256	`fe1ccfc3cd58666a07c125a2990555435e18f1b0b8391c2f9d6154ece22d5966`

