MCP server for web scraping via DataImpulse residential proxies with per-request country targeting
DataImpulse Scraper (MCP)
Give your AI agent (Claude Code, Cursor, Codex, Claude Desktop, …) the power to scrape the web from any country through DataImpulse residential proxies, with full JavaScript rendering and output as clean, LLM‑ready markdown.
The only thing you configure is your DataImpulse login + password. No clone, no Python setup, no .env.
🟢 Get your DataImpulse residential proxy plan → — residential proxies from $1/GB, pay‑as‑you‑go, no expiry.
Install
Grab your credentials first: open your DataImpulse dashboard → Residential Proxy → Proxy Access, then copy your Login and Password. Then pick a lane:
⚡ Option 1 — one command (macOS / Linux)
```sh
curl -LsSf https://raw.githubusercontent.com/sonnysangha/dataimpulse-scraper/main/install.sh | sh
```
That's the whole install. The script sets up uv if you don't have it, asks for your DataImpulse login + password, verifies them with a live proxy check, and auto‑configures Claude Code, Cursor, and Claude Desktop — whichever you have. (Non‑interactive? DI_USER=... DI_PASS=... sh install.sh)
🖱️ Option 2 — one click (Cursor)
Click, approve, then replace the two YOUR_DATAIMPULSE_* placeholders with your login and password. Done. (Needs uv installed — see Option 4, step 1.)
📦 Option 3 — double‑click (Claude Desktop)
Download dataimpulse-scraper.mcpb, double‑click it (or drag it onto Claude Desktop), and type your login + password into the form it shows you — the password is stored in your OS keychain, not a file.
🛠️ Option 4 — manual (any client, incl. Windows)
1. Install uv — the tiny, fast runner that launches the server and auto‑downloads everything else:
```
# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh
# Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"
```
2. Add the server to your AI app:
Claude Code — one command:
```sh
claude mcp add dataimpulse-scraper \
  --env DI_USER=YOUR_LOGIN \
  --env DI_PASS=YOUR_PASSWORD \
  -- uvx dataimpulse-scraper
```
Cursor / Claude Desktop / Codex / anything else — paste into your MCP config (e.g. .cursor/mcp.json):
```json
{
  "mcpServers": {
    "dataimpulse-scraper": {
      "command": "uvx",
      "args": ["dataimpulse-scraper"],
      "env": {
        "DI_USER": "your_dataimpulse_login",
        "DI_PASS": "your_dataimpulse_password"
      }
    }
  }
}
```
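If your config file already lists other servers, merge the entry into the existing mcpServers object instead of pasting over the whole file. A minimal Python sketch of that merge, assuming a JSON config shaped like the one above; add_dataimpulse_server and update_config_file are hypothetical helpers, not part of dataimpulse-scraper:

```python
import json
from pathlib import Path


def add_dataimpulse_server(config: dict, user: str, password: str) -> dict:
    """Return a copy of an MCP client config with the dataimpulse-scraper entry added.

    Other servers are preserved; an existing dataimpulse-scraper entry is replaced.
    """
    merged = json.loads(json.dumps(config))  # deep copy of JSON-compatible data
    servers = merged.setdefault("mcpServers", {})
    servers["dataimpulse-scraper"] = {
        "command": "uvx",
        "args": ["dataimpulse-scraper"],
        "env": {"DI_USER": user, "DI_PASS": password},
    }
    return merged


def update_config_file(path: Path, user: str, password: str) -> None:
    """Read the config (or start an empty one), merge the entry, and write it back."""
    config = json.loads(path.read_text()) if path.exists() else {}
    path.parent.mkdir(parents=True, exist_ok=True)
    updated = add_dataimpulse_server(config, user, password)
    path.write_text(json.dumps(updated, indent=2) + "\n")


if __name__ == "__main__":
    existing = {"mcpServers": {"other": {"command": "npx", "args": ["other-server"]}}}
    print(json.dumps(add_dataimpulse_server(existing, "your_login", "your_password"), indent=2))
```

The resulting file holds your proxy password in plain text, so keep it out of version control.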
Then: restart your AI app and ask
MCP servers load at startup (in Claude Code, /mcp should show dataimpulse-scraper connected). Then just ask:
- "Read https://www.zillow.com/homes/for_sale/ and list the first 5 homes."
- "Check our exit IP from Japan."
- "Compare what's trending on Reddit in the US, UK, and Japan."
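Under the hood, the client turns a prompt like "Check our exit IP from Japan." into an MCP tools/call request. A minimal Python sketch of that JSON-RPC 2.0 message, assuming the check_proxy tool and its country argument as listed under "The tools your agent gets"; which tool gets picked for a prompt is up to the model, and tool_call is a hypothetical helper:

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC request ids must not be reused within a session


def tool_call(name: str, arguments: dict) -> str:
    """Serialize an MCP tools/call request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message)


# "Check our exit IP from Japan." might become:
print(tool_call("check_proxy", {"country": "jp"}))
```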
That's it. 🎉 The first read_page call downloads a headless browser (about a minute, one time only, then cached); every later run reuses the cached browser.
📓 More copy‑paste recipes with real output in EXAMPLES.md.
The tools your agent gets
| Tool | What it does | When to use |
|---|---|---|
| read_page(url, country) | Primary. Real browser → clean markdown (Crawl4AI) | Any page, especially JS‑heavy / SPAs |
| read_page_from_regions(url, regions) | The same page from many countries at once → {region: markdown} | Compare prices / stock / content by region |
| fetch_html(url, country) | Raw HTML / JSON, fast (no browser) | Static pages, APIs |
| check_proxy(country) | Exit IP + geolocation | Prove the proxy / geo works |
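read_page_from_regions returns one result per country, keyed by region. A minimal Python sketch of such a fan-out, assuming an async per-region fetch function; read_from_regions and fake_fetch are hypothetical stand-ins rather than the package's internals, and reporting per-region failures as strings is this sketch's own choice:

```python
import asyncio
from collections.abc import Awaitable, Callable

# Hypothetical per-region fetcher signature: (url, country) -> markdown.
Fetcher = Callable[[str, str], Awaitable[str]]


async def read_from_regions(url: str, regions: list[str], fetch: Fetcher) -> dict[str, str]:
    """Fetch one URL from several countries concurrently; return {region: markdown}.

    With return_exceptions=True, a failure in one region comes back as an
    exception object instead of cancelling the other fetches.
    """
    results = await asyncio.gather(
        *(fetch(url, region) for region in regions), return_exceptions=True
    )
    return {
        region: (f"error: {result}" if isinstance(result, BaseException) else result)
        for region, result in zip(regions, results)
    }


async def fake_fetch(url: str, country: str) -> str:
    """Offline stand-in for a real proxied browser fetch."""
    await asyncio.sleep(0)
    if country == "xx":
        raise ValueError("unknown country")
    return f"# {url} as seen from {country}"


if __name__ == "__main__":
    print(asyncio.run(read_from_regions("https://example.com", ["us", "gb", "jp"], fake_fetch)))
```

Because the regions run concurrently, total latency is roughly that of the slowest region rather than the sum of all of them.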
Country codes are 2‑letter ISO 3166‑1 codes (us, de, jp, gb, …). Every request routes through a fresh residential IP in that country. Just ask in plain language ("read this as a German visitor") and the agent picks the right tool.
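Country targeting is chosen per request. A minimal Python sketch of building a per-country proxy URL; the gateway address and the "__cr.<code>" login suffix below are assumptions about DataImpulse's format (verify both in your dashboard), and proxy_url is a hypothetical helper, not this package's API:

```python
import re
from urllib.parse import quote

# ASSUMPTIONS (verify in your DataImpulse dashboard): gateway host:port and
# the login suffix used for country targeting.
GATEWAY = "gw.dataimpulse.com:823"
COUNTRY_SUFFIX = "__cr."

_ISO2 = re.compile(r"[a-z]{2}")


def proxy_url(user: str, password: str, country: str) -> str:
    """Build an HTTP proxy URL that targets one country by its 2-letter code."""
    code = country.strip().lower()
    # Checks the shape only; it does not confirm the code is an assigned country.
    if not _ISO2.fullmatch(code):
        raise ValueError(f"expected a 2-letter country code, got {country!r}")
    login = f"{user}{COUNTRY_SUFFIX}{code}"
    # Percent-encode credentials so characters like "@" or ":" don't break the URL.
    return f"http://{quote(login, safe='')}:{quote(password, safe='')}@{GATEWAY}"


if __name__ == "__main__":
    print(proxy_url("your_login", "your_password", "DE"))
```

An HTTP client such as requests accepts the result as proxies={"http": url, "https": url}.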
What works vs what needs a login
read_page fetches public pages like a real browser. It does not log in or bypass auth walls.
| ✅ Works (public) | ❌ Needs a login |
|---|---|
| Reddit, YouTube, Bluesky, Mastodon, news, e‑commerce, SERPs | X/Twitter, Instagram, Facebook, LinkedIn |
A residential proxy beats IP‑based blocking and geo‑walls — not authentication. For login‑gated platforms, use their official API.
Notes
- Country targeting only — city/state/ZIP filters bill at 2×, so stay country‑level.
- Each call gets a fresh residential IP; change the country argument to target another geo or to dodge IP rate limits.
- If the headless browser ever fails to auto‑install, run: uvx --from dataimpulse-scraper playwright install chromium
- Respect each site's Terms of Service and robots.txt. Scrape public data responsibly.
- Security: credentials live in the env block of your MCP client config. Never commit proxy passwords; if one is exposed, rotate it in the dashboard.
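Checking robots.txt can be automated before each fetch. A minimal sketch using Python's standard-library urllib.robotparser; it evaluates rules you already have as text (fetching robots.txt itself is left out), and allowed is a hypothetical helper name. robots.txt is advisory and does not replace reading a site's Terms of Service.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, url: str, user_agent: str = "*") -> bool:
    """Return True if the robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


ROBOTS = """\
User-agent: *
Disallow: /private/
"""

if __name__ == "__main__":
    print(allowed(ROBOTS, "https://example.com/products"))      # True
    print(allowed(ROBOTS, "https://example.com/private/data"))  # False
```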
Local development (running from this folder)
To work on the code or your own fork, run the server straight from a checkout:
```sh
git clone https://github.com/sonnysangha/dataimpulse-scraper
cd dataimpulse-scraper
uv sync                          # creates .venv, installs deps
cp .env.example .env             # local dev only: add DI_USER / DI_PASS
uv run dataimpulse-scraper --selftest us   # prints a US exit IP → you're live
```
Then point your MCP config at the checkout instead of PyPI:
```json
{
  "mcpServers": {
    "dataimpulse-scraper": {
      "command": "uvx",
      "args": [
        "--from",
        "/absolute/path/to/dataimpulse-scraper",
        "dataimpulse-scraper"
      ],
      "env": { "DI_USER": "your_login", "DI_PASS": "your_password" }
    }
  }
}
```
.env is only for local development (it's gitignored). End users never create one.
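One way a server can support both setups is to read DI_USER and DI_PASS from the process environment, which is what your MCP client's env block populates, and fall back to a .env file only during local development. A minimal Python sketch of that precedence; it is an assumption about how such loading could work, not this package's code, and parse_dotenv and load_credentials are hypothetical names (real projects often use the python-dotenv library instead of a hand-rolled parser):

```python
from collections.abc import Mapping
from typing import Optional

REQUIRED = ("DI_USER", "DI_PASS")


def parse_dotenv(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blanks and # comments.

    Deliberately minimal: surrounding quotes are stripped, but escapes,
    export prefixes, and multi-line values are not supported.
    """
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def load_credentials(env: Mapping[str, str], dotenv_text: Optional[str] = None) -> tuple[str, str]:
    """Return (DI_USER, DI_PASS); variables in env override values from .env text."""
    merged = {**parse_dotenv(dotenv_text or ""), **env}
    missing = [name for name in REQUIRED if not merged.get(name)]
    if missing:
        raise RuntimeError(f"missing credentials: {', '.join(missing)}")
    return merged["DI_USER"], merged["DI_PASS"]


if __name__ == "__main__":
    import os
    from pathlib import Path

    dotenv = Path(".env")
    user, _password = load_credentials(os.environ, dotenv.read_text() if dotenv.exists() else None)
    print(f"credentials loaded for {user}")  # never print the password
```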
Maintainer? Publishing to PyPI (GitHub Action, trusted publishing, release loop) is covered step‑by‑step in PUBLISHING.md.
Built with DataImpulse residential proxies. Get a plan — $1/GB →
Download files
Source Distribution: dataimpulse_scraper-1.0.0.tar.gz
Built Distribution: dataimpulse_scraper-1.0.0-py3-none-any.whl
File details
Details for the file dataimpulse_scraper-1.0.0.tar.gz.
File metadata
- Download URL: dataimpulse_scraper-1.0.0.tar.gz
- Upload date:
- Size: 318.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 451640946c584fc7b144e07295067d5ab8f39454e29e59eaf0c8df34f5214c4a |
| MD5 | d7e88029b110c191e063eb5679be0ee1 |
| BLAKE2b-256 | 4a34d4328ceb5d37d9cf28d1e437094c41e5b38cb35eb4a18ea8dee79e26f771 |
File details
Details for the file dataimpulse_scraper-1.0.0-py3-none-any.whl.
File metadata
- Download URL: dataimpulse_scraper-1.0.0-py3-none-any.whl
- Upload date:
- Size: 10.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.11.19
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9ffc06856402fc56aa7d527e0cc5520fc387f62a5bf41cd113559230e90c8fbe |
| MD5 | 1df91cd82290fb92a2b58d07948a0119 |
| BLAKE2b-256 | bb6b35aedee8429e8be590ed4881ae46681192c27fdfa901635fb31edbc18a76 |