Advanced Web Crawling Platform with Deep Analysis and MCP Server

Crawlemoon MCP Server

Crawlemoon MCP Server — free, AI-native web crawling for the agent era

python 3.10+ · pypi 1.1.8 · MIT · MCP-native · code style black

A free, open-source MCP server that gives any agent (Claude Code, Cursor, Windsurf, …) 55 production-grade tools for the full web-crawling stack: deep analysis, stealth, API discovery, session recording → runnable crawler, smart extraction. No proprietary API. No per-request fee.

Crawlemoon capabilities — deep analysis, stealth, record→crawler, smart extraction


Quick start

Three install paths — uvx, pipx, pip

The recommended path needs no install — uvx runs straight from PyPI:

{
  "mcpServers": {
    "crawlemoon": {
      "command": "uvx",
      "args": ["crawlemoon"]
    }
  }
}

Requires uv. Install it once with:

curl -LsSf https://astral.sh/uv/install.sh | sh

Alternatively, use pipx run crawlemoon or pip install crawlemoon.

Where to put that JSON: Cursor → Settings → MCP. Claude Code → ~/.config/claude/mcp_settings.json. Windsurf → Settings → MCP Servers.
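
If you would rather script this step than edit the file by hand, here is a minimal sketch. It merges the crawlemoon entry into an existing JSON config without dropping servers that are already registered. The helper name add_crawlemoon_server and the example path are illustrative and not part of Crawlemoon; the sketch assumes your client reads the mcpServers layout shown above.

```python
import json
from pathlib import Path


def add_crawlemoon_server(config_path: Path) -> dict:
    """Merge the crawlemoon entry into an MCP client config file.

    Hypothetical helper: the file location depends on your client.
    """
    # Start from the existing config so other servers are preserved.
    config = {}
    if config_path.exists():
        config = json.loads(config_path.read_text(encoding="utf-8") or "{}")

    # Same entry as the JSON snippet in the quick start.
    servers = config.setdefault("mcpServers", {})
    servers["crawlemoon"] = {"command": "uvx", "args": ["crawlemoon"]}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Call it with the path your client uses, for example add_crawlemoon_server(Path("mcp.json")), then restart the client so it picks up the new server.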


How it works

Agent → Crawlemoon → Browser/HTTP/Proxy → target web

Your agent talks to Crawlemoon over the Model Context Protocol. Crawlemoon owns a hardened browser pool, an HTTP stack with TLS fingerprinting, and a rotating proxy pool. While it fetches pages, it captures network traffic, reads scripts, and introspects schemas — so the agent gets clean structured data, not raw HTML.
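
For orientation, MCP messages are JSON-RPC 2.0, and a tool invocation uses the tools/call method. The sketch below builds the kind of request an MCP client sends on the agent's behalf. In practice your client library does this for you. The url argument name is an assumption (check each tool's schema), and passing _api_key inside the arguments is also an assumption based on the CRAWLEMOON_API_KEY note in the Configuration section.

```python
import json
from itertools import count
from typing import Any, Optional

# JSON-RPC requests carry a unique id per request.
_request_ids = count(1)


def build_tool_call(
    name: str,
    arguments: dict[str, Any],
    api_key: Optional[str] = None,
) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    args = dict(arguments)  # copy so the caller's dict is not mutated
    if api_key is not None:
        # Assumption: the key travels as a regular tool argument.
        args["_api_key"] = api_key
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": args},
    }
    return json.dumps(message)


# "url" is an assumed argument name for illustration only.
print(build_tool_call("extract_article", {"url": "https://example.com/post"}))
```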


What's in the box

A short list — see the source for the full set of 55 tools.

Group              Tools
Deep analysis      deep_analyze, discover_apis, introspect_graphql, analyze_websocket, analyze_auth, detect_protection, detect_technology
Stealth            stealth_request, configure_proxies, configure_rate_limit, add_proxy, test_proxy
Record → crawler   record_session, stop_recording, export_recording, generate_crawler
Extraction         smart_extract, extract_article, extract_tables, extract_links, extract_forms, extract_metadata, convert_to_markdown
Page interaction   take_screenshot, fill_form, wait_and_extract, compare_pages, measure_performance, check_accessibility, get_dom_tree
Sessions & cache   save_session, load_session, get_cookies, get_storage, clear_cache, get_cache_stats
Advanced (opt-in)  execute_js, execute_cdp, deobfuscate_js, extract_from_js, solve_captcha

Smart extraction — bring any LLM, including free ones

smart_extract works without any API key using pattern matching. Plug in any OpenAI-compatible endpoint for higher accuracy — including FREE tiers:

# OpenRouter (free models exist)
CRAWLEMOON_LLM_PROVIDER=openrouter
CRAWLEMOON_LLM_API_KEY=sk-or-v1-xxx
CRAWLEMOON_LLM_MODEL=meta-llama/llama-3.2-3b-instruct:free

# Groq (free, very fast)
CRAWLEMOON_LLM_PROVIDER=groq
CRAWLEMOON_LLM_API_KEY=gsk_xxx

# Local Ollama (no key needed)
CRAWLEMOON_LLM_PROVIDER=ollama
CRAWLEMOON_LLM_MODEL=llama3.2

Together, DeepSeek, Mistral, Fireworks, and standard OpenAI also work via CRAWLEMOON_LLM_BASE_URL.
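
To make the precedence between these variables concrete, here is a rough sketch of how they could be resolved. It is not Crawlemoon's actual loader. The names LLMSettings and load_llm_settings are hypothetical, and the default base URLs are the providers' publicly documented OpenAI-compatible endpoints; whether Crawlemoon maps provider names to them in exactly this way is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Publicly documented OpenAI-compatible endpoints. Using them as per-provider
# defaults is an assumption about Crawlemoon's behaviour.
DEFAULT_BASE_URLS = {
    "openrouter": "https://openrouter.ai/api/v1",
    "groq": "https://api.groq.com/openai/v1",
    "ollama": "http://localhost:11434/v1",
}


@dataclass(frozen=True)
class LLMSettings:
    provider: Optional[str]
    base_url: Optional[str]
    api_key: Optional[str]
    model: Optional[str]

    @property
    def enabled(self) -> bool:
        # With neither a provider nor a base URL, smart_extract would fall
        # back to pattern matching, as described above.
        return self.provider is not None or self.base_url is not None


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read the CRAWLEMOON_LLM_* variables; empty strings count as unset."""
    provider = env.get("CRAWLEMOON_LLM_PROVIDER") or None
    # An explicit base URL wins over the provider default.
    base_url = env.get("CRAWLEMOON_LLM_BASE_URL") or DEFAULT_BASE_URLS.get(provider or "")
    return LLMSettings(
        provider=provider,
        base_url=base_url,
        api_key=env.get("CRAWLEMOON_LLM_API_KEY") or None,
        model=env.get("CRAWLEMOON_LLM_MODEL") or None,
    )
```

Passing the environment in as a mapping keeps the function easy to test: load_llm_settings({"CRAWLEMOON_LLM_PROVIDER": "ollama"}) needs no real environment changes.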


Configuration

Variable                       Default   Notes
CRAWLEMOON_HEADLESS            true      Run browser without UI
CRAWLEMOON_BROWSER             chromium  One of chromium, firefox, webkit
CRAWLEMOON_POOL_SIZE           5         Max concurrent browsers
CRAWLEMOON_NAV_TIMEOUT         30.0      Page-load timeout (seconds)
CRAWLEMOON_API_KEY             unset     If set, every tool call must include a matching _api_key
CRAWLEMOON_ALLOW_DANGEROUS_JS  false     Required for execute_js / execute_cdp / deobfuscate_js
CRAWLEMOON_JS_MAX_LENGTH       50000     Length cap for JS payloads
CRAWLEMOON_JS_EXEC_TIMEOUT     10.0      Per-script timeout (seconds)
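
As an illustration of how these variables and defaults fit together, the sketch below loads them into a typed settings object. The Settings class, load_settings function, and the set of strings accepted as "true" are assumptions made for the example, not Crawlemoon's internal code; only the variable names and default values come from the table above.

```python
import os
from dataclasses import dataclass
from typing import Any, Callable, Mapping, Optional


def _as_bool(value: str) -> bool:
    # Assumption: these spellings count as true; everything else is false.
    return value.strip().lower() in {"1", "true", "yes", "on"}


@dataclass(frozen=True)
class Settings:
    headless: bool = True
    browser: str = "chromium"
    pool_size: int = 5
    nav_timeout: float = 30.0
    api_key: Optional[str] = None
    allow_dangerous_js: bool = False
    js_max_length: int = 50000
    js_exec_timeout: float = 10.0


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    def get(name: str, cast: Callable[[str], Any], default: Any) -> Any:
        # Unset or blank variables fall back to the documented default.
        raw = env.get(f"CRAWLEMOON_{name}", "")
        return cast(raw) if raw.strip() else default

    browser = get("BROWSER", str, "chromium")
    if browser not in {"chromium", "firefox", "webkit"}:
        raise ValueError(f"unsupported browser: {browser!r}")

    return Settings(
        headless=get("HEADLESS", _as_bool, True),
        browser=browser,
        pool_size=get("POOL_SIZE", int, 5),
        nav_timeout=get("NAV_TIMEOUT", float, 30.0),
        api_key=get("API_KEY", str, None),
        allow_dangerous_js=get("ALLOW_DANGEROUS_JS", _as_bool, False),
        js_max_length=get("JS_MAX_LENGTH", int, 50000),
        js_exec_timeout=get("JS_EXEC_TIMEOUT", float, 10.0),
    )
```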

Security

execute_js, execute_cdp, and deobfuscate_js are disabled by default because they execute or operate on arbitrary code in a real browser. Enable them only on trusted networks by setting CRAWLEMOON_ALLOW_DANGEROUS_JS=true. Even then, payloads are length-capped (CRAWLEMOON_JS_MAX_LENGTH) and time-bounded (CRAWLEMOON_JS_EXEC_TIMEOUT), and a denylist rejects eval, new Function, dynamic import(), document.write, importScripts, and WebAssembly.{compile,instantiate}. Set CRAWLEMOON_API_KEY so that MCP clients must present a matching _api_key.

These are mitigations, not a sandbox: do not expose this server to untrusted clients.
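
The layered checks described above can be pictured roughly as follows. This is a sketch under stated assumptions, not Crawlemoon's real guard: the function name check_js_payload and the regular expressions are invented for the example, and only the opt-in flag, the length cap, and the list of rejected constructs come from this section. It also shows why these are mitigations rather than a sandbox, since pattern matching on source text is easy to evade.

```python
import re

JS_MAX_LENGTH = 50_000  # default of CRAWLEMOON_JS_MAX_LENGTH

# Hypothetical patterns for the constructs named in the Security section.
DENYLIST = [
    re.compile(r"\beval\s*\("),
    re.compile(r"\bnew\s+Function\b"),
    re.compile(r"\bimport\s*\("),  # dynamic import()
    re.compile(r"\bdocument\s*\.\s*write\b"),
    re.compile(r"\bimportScripts\b"),
    re.compile(r"\bWebAssembly\s*\.\s*(compile|instantiate)\b"),
]


def check_js_payload(
    source: str,
    allow_dangerous_js: bool = False,
    max_length: int = JS_MAX_LENGTH,
) -> None:
    """Raise if the payload should be refused; return None if it passes."""
    # 1. The feature is opt-in.
    if not allow_dangerous_js:
        raise PermissionError(
            "JS execution is disabled; set CRAWLEMOON_ALLOW_DANGEROUS_JS=true"
        )
    # 2. Length cap.
    if len(source) > max_length:
        raise ValueError(f"payload exceeds {max_length} characters")
    # 3. Denylist scan over the raw source text.
    for pattern in DENYLIST:
        if pattern.search(source):
            raise ValueError(f"payload matches denylisted pattern: {pattern.pattern}")
```

The per-script timeout is not shown here because it has to be enforced by the browser driver at execution time rather than by a static check.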


Develop

git clone https://github.com/razavioo/crawlemoon.git
cd crawlemoon
make dev-install      # editable install + dev/captcha/ocr extras + pre-commit
make test             # pytest
make lint             # ruff + mypy

Releases

This project uses Trusted Publishing (OIDC) via GitHub Actions to automate publishing releases directly to PyPI.

To release a new version:

  1. Bump the version number in pyproject.toml.
  2. Commit the change and create a git tag matching the version (e.g. v1.1.8):
    git add pyproject.toml
    git commit -m "chore: bump version to 1.1.8"
    git tag v1.1.8
    
  3. Push your branch and the tag to GitHub:
    git push origin main --tags
    

GitHub Actions then runs the tests, builds the package, and publishes it to PyPI under the crawlemoon project name.

PRs are welcome, particularly for distributed mode (Redis queue), result sinks (Postgres / S3), and Prometheus metrics. Crawlemoon is released under the MIT License.

Made by emad.dev

Download files

Source Distribution

crawlemoon-1.1.8.tar.gz (163.6 kB)

Uploaded Source

Built Distribution

crawlemoon-1.1.8-py3-none-any.whl (137.1 kB)

Uploaded Python 3

File details

Details for the file crawlemoon-1.1.8.tar.gz.

File metadata

  • Download URL: crawlemoon-1.1.8.tar.gz
  • Upload date:
  • Size: 163.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for crawlemoon-1.1.8.tar.gz
Algorithm    Hash digest
SHA256       d3ae4501977ab3458b555f65a906f412d8fd86842c94c795cf3f17de7624f317
MD5          d11b41c67f2e46a2aa8f47092cfb6177
BLAKE2b-256  2aad4e2712a1f70b5636929c5afd8166438918427eb59da5905710cc8a726528
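
If you download the file manually, you can check it against the SHA256 digest listed above before installing. The following is a small sketch using only the standard library; the function names and the local file path are illustrative.

```python
import hashlib
import hmac
from pathlib import Path

# SHA256 digest listed above for crawlemoon-1.1.8.tar.gz.
EXPECTED_SHA256 = "d3ae4501977ab3458b555f65a906f412d8fd86842c94c795cf3f17de7624f317"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Compare the file's digest with the expected hex string."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```

Usage, assuming the archive sits in the current directory: verify(Path("crawlemoon-1.1.8.tar.gz"), EXPECTED_SHA256).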

Provenance

The following attestation bundles were made for crawlemoon-1.1.8.tar.gz:

Publisher: release.yml on razavioo/crawlemoon

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file crawlemoon-1.1.8-py3-none-any.whl.

File metadata

  • Download URL: crawlemoon-1.1.8-py3-none-any.whl
  • Upload date:
  • Size: 137.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for crawlemoon-1.1.8-py3-none-any.whl
Algorithm    Hash digest
SHA256       a58968a8c818c28eac3c3f4702d709bfea2b016fb667d49edbfa859dabf98d9c
MD5          8f830c5e60d5cd972876f116c8dd3b7b
BLAKE2b-256  bb4282549110dc294782ca839a5e2a512912ad1a0308bb49b4b387a2bd52c639

Provenance

The following attestation bundles were made for crawlemoon-1.1.8-py3-none-any.whl:

Publisher: release.yml on razavioo/crawlemoon

