Advanced Web Crawling Platform with Deep Analysis and MCP Server
Crawlemoon MCP Server
A free, open-source MCP server that gives any agent (Claude Code, Cursor, Windsurf, …) 55 production-grade tools for the full web-crawling stack: deep analysis, stealth, API discovery, session recording → runnable crawler, smart extraction. No proprietary API. No per-request fee.
Quick start
The recommended path needs no install — uvx runs straight from PyPI:
{
  "mcpServers": {
    "crawlemoon": {
      "command": "uvx",
      "args": ["crawlemoon"]
    }
  }
}
Requires uv. Install once: curl -LsSf https://astral.sh/uv/install.sh | sh. Or use pipx run crawlemoon / pip install crawlemoon instead.
Where to put that JSON: Cursor → Settings → MCP. Claude Code → ~/.config/claude/mcp_settings.json. Windsurf → Settings → MCP Servers.
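If a settings file already lists other MCP servers, the entry has to be merged in rather than pasted over the file. The following is a minimal sketch of that merge; the helper names `add_crawlemoon` and `update_settings_file` are illustrative and not part of Crawlemoon, and the path passed in is whichever settings file your client uses.

```python
import json
from pathlib import Path


def add_crawlemoon(config: dict) -> dict:
    """Return a copy of `config` with the crawlemoon server registered."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as the quick-start snippet: run the server through uvx.
    servers["crawlemoon"] = {"command": "uvx", "args": ["crawlemoon"]}
    merged["mcpServers"] = servers
    return merged


def update_settings_file(path: Path) -> None:
    """Merge the entry into a JSON settings file, creating it if missing."""
    existing = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(add_crawlemoon(existing), indent=2) + "\n")
```

Existing servers in the file are preserved; only the `crawlemoon` key is added or overwritten.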
How it works
Your agent talks to Crawlemoon over the Model Context Protocol. Crawlemoon owns a hardened browser pool, an HTTP stack with TLS fingerprinting, and a rotating proxy pool. While it fetches pages, it captures network traffic, reads scripts, and introspects schemas — so the agent gets clean structured data, not raw HTML.
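At the protocol level, a tool invocation is a JSON-RPC 2.0 request with the method `tools/call`. The sketch below only builds that message to show its shape; in practice the MCP client does this for you. The `url` argument name is an assumption, since the tool schemas are not listed here.

```python
import json
from itertools import count

# Monotonic request ids, as JSON-RPC expects a unique id per request.
_request_ids = count(1)


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# `url` is an assumed argument name; the real schema comes from `tools/list`.
print(tool_call_request("deep_analyze", {"url": "https://example.com"}))
```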
What's in the box
A short list — see the source for the full set of 55 tools.
| Group | Tools |
|---|---|
| Deep analysis | deep_analyze, discover_apis, introspect_graphql, analyze_websocket, analyze_auth, detect_protection, detect_technology |
| Stealth | stealth_request, configure_proxies, configure_rate_limit, add_proxy, test_proxy |
| Record → crawler | record_session, stop_recording, export_recording, generate_crawler |
| Extraction | smart_extract, extract_article, extract_tables, extract_links, extract_forms, extract_metadata, convert_to_markdown |
| Page interaction | take_screenshot, fill_form, wait_and_extract, compare_pages, measure_performance, check_accessibility, get_dom_tree |
| Sessions & cache | save_session, load_session, get_cookies, get_storage, clear_cache, get_cache_stats |
| Advanced (opt-in) | execute_js, execute_cdp, deobfuscate_js, extract_from_js, solve_captcha |
Smart extraction — bring any LLM, including free ones
smart_extract works without any API key by using pattern matching. For higher accuracy, plug in any OpenAI-compatible endpoint, including free tiers:
# OpenRouter (free models exist)
CRAWLEMOON_LLM_PROVIDER=openrouter
CRAWLEMOON_LLM_API_KEY=sk-or-v1-xxx
CRAWLEMOON_LLM_MODEL=meta-llama/llama-3.2-3b-instruct:free
# Groq (free, very fast)
CRAWLEMOON_LLM_PROVIDER=groq
CRAWLEMOON_LLM_API_KEY=gsk_xxx
# Local Ollama (no key needed)
CRAWLEMOON_LLM_PROVIDER=ollama
CRAWLEMOON_LLM_MODEL=llama3.2
Together, DeepSeek, Mistral, Fireworks, and standard OpenAI also work via CRAWLEMOON_LLM_BASE_URL.
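One way to picture how these variables combine is a small resolver that turns them into a single settings object. This is a sketch, not Crawlemoon's actual code: `LLMSettings` and `load_llm_settings` are hypothetical names, and the provider-to-URL table is an assumption based on each provider's publicly documented OpenAI-compatible endpoint.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumed defaults: the providers' documented OpenAI-compatible endpoints.
DEFAULT_BASE_URLS = {
    "openrouter": "https://openrouter.ai/api/v1",
    "groq": "https://api.groq.com/openai/v1",
    "ollama": "http://localhost:11434/v1",
}


@dataclass(frozen=True)
class LLMSettings:
    provider: str
    base_url: str
    api_key: Optional[str]
    model: Optional[str]


def load_llm_settings(env: Mapping[str, str] = os.environ) -> Optional[LLMSettings]:
    """Resolve CRAWLEMOON_LLM_* variables; None means pattern matching only."""
    provider = env.get("CRAWLEMOON_LLM_PROVIDER", "").strip().lower()
    if not provider:
        return None
    # An explicit base URL wins, which covers providers not in the table.
    base_url = env.get("CRAWLEMOON_LLM_BASE_URL") or DEFAULT_BASE_URLS.get(provider)
    if base_url is None:
        raise ValueError(f"set CRAWLEMOON_LLM_BASE_URL for provider {provider!r}")
    return LLMSettings(
        provider=provider,
        base_url=base_url,
        api_key=env.get("CRAWLEMOON_LLM_API_KEY"),
        model=env.get("CRAWLEMOON_LLM_MODEL"),
    )
```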
Configuration
| Variable | Default | Notes |
|---|---|---|
| CRAWLEMOON_HEADLESS | true | Run browser without UI |
| CRAWLEMOON_BROWSER | chromium | chromium / firefox / webkit |
| CRAWLEMOON_POOL_SIZE | 5 | Max concurrent browsers |
| CRAWLEMOON_NAV_TIMEOUT | 30.0 | Page-load timeout (s) |
| CRAWLEMOON_API_KEY | unset | If set, every tool call must include matching _api_key |
| CRAWLEMOON_ALLOW_DANGEROUS_JS | false | Required for execute_js / execute_cdp / deobfuscate_js |
| CRAWLEMOON_JS_MAX_LENGTH | 50000 | Length cap for JS payloads |
| CRAWLEMOON_JS_EXEC_TIMEOUT | 10.0 | Per-script timeout (s) |
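To make the defaults and types concrete, here is a sketch of how these variables could be parsed into a typed object. The `Settings` class and `load_settings` function are illustrative, not Crawlemoon's internals, and the set of strings accepted as "true" is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

_TRUTHY = {"1", "true", "yes", "on"}  # assumed spellings of "true"
BROWSERS = {"chromium", "firefox", "webkit"}


@dataclass(frozen=True)
class Settings:
    headless: bool = True
    browser: str = "chromium"
    pool_size: int = 5
    nav_timeout: float = 30.0
    api_key: Optional[str] = None
    allow_dangerous_js: bool = False
    js_max_length: int = 50000
    js_exec_timeout: float = 10.0


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read CRAWLEMOON_* variables, falling back to the documented defaults."""
    def raw(name: str, default: str) -> str:
        return env.get(f"CRAWLEMOON_{name}", default)

    browser = raw("BROWSER", "chromium").lower()
    if browser not in BROWSERS:
        raise ValueError(f"unsupported browser: {browser!r}")
    return Settings(
        headless=raw("HEADLESS", "true").lower() in _TRUTHY,
        browser=browser,
        pool_size=int(raw("POOL_SIZE", "5")),
        nav_timeout=float(raw("NAV_TIMEOUT", "30.0")),
        api_key=env.get("CRAWLEMOON_API_KEY") or None,  # empty string counts as unset
        allow_dangerous_js=raw("ALLOW_DANGEROUS_JS", "false").lower() in _TRUTHY,
        js_max_length=int(raw("JS_MAX_LENGTH", "50000")),
        js_exec_timeout=float(raw("JS_EXEC_TIMEOUT", "10.0")),
    )
```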
Security
execute_js, execute_cdp, and deobfuscate_js are disabled by default because they execute or operate on arbitrary code in a real browser. Enable them on trusted networks with CRAWLEMOON_ALLOW_DANGEROUS_JS=true. Even then, payloads are length-capped and time-bounded, and a denylist rejects eval, new Function, dynamic import(), document.write, importScripts, and WebAssembly.{compile,instantiate}. Set CRAWLEMOON_API_KEY so MCP clients must present a matching _api_key.
These are mitigations, not a sandbox: do not expose this server to untrusted clients.
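The checks described above can be sketched as a gate that runs before any script reaches the browser. This is an illustration of the idea, not the project's implementation: the regular expressions, function names, and error types are assumptions, and the per-script timeout is not shown. A textual denylist like this is easy to bypass, which is why it cannot serve as a sandbox.

```python
import hmac
import re
from typing import Optional

# Assumed patterns for the constructs the denylist is said to reject.
DENYLIST = [
    re.compile(r"\beval\s*\("),
    re.compile(r"\bnew\s+Function\b"),
    re.compile(r"\bimport\s*\("),
    re.compile(r"\bdocument\s*\.\s*write"),
    re.compile(r"\bimportScripts\b"),
    re.compile(r"\bWebAssembly\s*\.\s*(compile|instantiate)"),
]


def check_payload(script: str, *, allow_dangerous_js: bool, max_length: int = 50000) -> None:
    """Raise if the script may not run; return None if it passes every check."""
    if not allow_dangerous_js:
        raise PermissionError("set CRAWLEMOON_ALLOW_DANGEROUS_JS=true to enable")
    if len(script) > max_length:
        raise ValueError(f"payload exceeds {max_length} characters")
    for pattern in DENYLIST:
        if pattern.search(script):
            raise ValueError(f"payload matches denylist pattern {pattern.pattern!r}")


def api_key_matches(expected: Optional[str], presented: Optional[str]) -> bool:
    """Accept every call when no key is configured, else compare in constant time."""
    if expected is None:
        return True
    if presented is None:
        return False
    return hmac.compare_digest(expected.encode(), presented.encode())
```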
Develop
git clone https://github.com/razavioo/crawlemoon.git
cd crawlemoon
make dev-install # editable install + dev/captcha/ocr extras + pre-commit
make test # pytest
make lint # ruff + mypy
Releases
This project uses Trusted Publishing (OIDC) via GitHub Actions to automate publishing releases directly to PyPI.
To release a new version:
- Bump the version number in pyproject.toml.
- Commit the change and create a git tag matching the version (e.g. v1.1.8):
  git add pyproject.toml
  git commit -m "chore: bump version to 1.1.8"
  git tag v1.1.8
- Push your branch and the tag to GitHub:
  git push origin main --tags
GitHub Actions will automatically run tests, build the package, and publish it securely to PyPI under the crawlemoon package space.
PRs welcome. Particularly interested in: distributed mode (Redis queue), result sinks (Postgres / S3), Prometheus metrics. Released under the MIT License.
Made by emad.dev
File details
Details for the file crawlemoon-1.1.8.tar.gz.
File metadata
- Download URL: crawlemoon-1.1.8.tar.gz
- Upload date:
- Size: 163.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d3ae4501977ab3458b555f65a906f412d8fd86842c94c795cf3f17de7624f317 |
| MD5 | d11b41c67f2e46a2aa8f47092cfb6177 |
| BLAKE2b-256 | 2aad4e2712a1f70b5636929c5afd8166438918427eb59da5905710cc8a726528 |
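A downloaded archive can be checked against the published SHA256 digest before installing it. This is a minimal sketch using only the standard library; the function names are illustrative and the local file path is whatever you saved the download as.

```python
import hashlib
import hmac
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Path, expected_hex: str) -> bool:
    """Compare the file's SHA256 with the hex digest listed for it."""
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())
```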
Provenance
The following attestation bundles were made for crawlemoon-1.1.8.tar.gz:
Publisher: release.yml on razavioo/crawlemoon
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: crawlemoon-1.1.8.tar.gz
  - Subject digest: d3ae4501977ab3458b555f65a906f412d8fd86842c94c795cf3f17de7624f317
- Sigstore transparency entry: 1691946180
- Sigstore integration time:
- Permalink: razavioo/crawlemoon@282014eef4d937d67873f8316dbaad6357cdb027
- Branch / Tag: refs/tags/v1.1.8
- Owner: https://github.com/razavioo
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@282014eef4d937d67873f8316dbaad6357cdb027
- Trigger Event: push
File details
Details for the file crawlemoon-1.1.8-py3-none-any.whl.
File metadata
- Download URL: crawlemoon-1.1.8-py3-none-any.whl
- Upload date:
- Size: 137.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a58968a8c818c28eac3c3f4702d709bfea2b016fb667d49edbfa859dabf98d9c |
| MD5 | 8f830c5e60d5cd972876f116c8dd3b7b |
| BLAKE2b-256 | bb4282549110dc294782ca839a5e2a512912ad1a0308bb49b4b387a2bd52c639 |
Provenance
The following attestation bundles were made for crawlemoon-1.1.8-py3-none-any.whl:
Publisher: release.yml on razavioo/crawlemoon
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: crawlemoon-1.1.8-py3-none-any.whl
  - Subject digest: a58968a8c818c28eac3c3f4702d709bfea2b016fb667d49edbfa859dabf98d9c
- Sigstore transparency entry: 1691946400
- Sigstore integration time:
- Permalink: razavioo/crawlemoon@282014eef4d937d67873f8316dbaad6357cdb027
- Branch / Tag: refs/tags/v1.1.8
- Owner: https://github.com/razavioo
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@282014eef4d937d67873f8316dbaad6357cdb027
- Trigger Event: push