MCP server for web scraping via DataImpulse residential proxies with per-request country targeting

DataImpulse Scraper (MCP)

Give your AI agent — Claude Code, Cursor, Codex, Claude Desktop, … — the power to scrape the web from any country, through DataImpulse residential proxies, with full JavaScript rendering to clean, LLM‑ready markdown.

The only thing you configure is your DataImpulse login + password. No clone, no Python setup, no .env.

🟢 Get your DataImpulse residential proxy plan → — residential proxies from $1/GB, pay‑as‑you‑go, no expiry.


Install

Grab your credentials first: open your DataImpulse dashboard → Residential Proxy → Proxy Access → copy your Login and Password. Then pick a lane:

⚡ Option 1 — one command (macOS / Linux)

curl -LsSf https://raw.githubusercontent.com/sonnysangha/dataimpulse-scraper/main/install.sh | sh

That's the whole install. The script sets up uv if you don't have it, asks for your DataImpulse login + password, verifies them with a live proxy check, and auto‑configures Claude Code, Cursor, and Claude Desktop — whichever you have. (Non‑interactive? DI_USER=... DI_PASS=... sh install.sh)

🖱️ Option 2 — one click (Cursor)

Install MCP Server

Click, approve, then replace the two YOUR_DATAIMPULSE_* placeholders with your login and password. Done. (Needs uv installed — see Option 4, step 1.)

📦 Option 3 — double‑click (Claude Desktop)

Download dataimpulse-scraper.mcpb, double‑click it (or drag it onto Claude Desktop), and type your login + password into the form it shows you — the password is stored in your OS keychain, not a file.

🛠️ Option 4 — manual (any client, incl. Windows)

1. Install uv — the tiny, fast runner that launches the server and auto‑downloads everything else:

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Add the server to your AI app:

Claude Code — one command:

claude mcp add dataimpulse-scraper \
  --env DI_USER=YOUR_LOGIN \
  --env DI_PASS=YOUR_PASSWORD \
  -- uvx dataimpulse-scraper

Cursor / Claude Desktop / Codex / anything else — paste into your MCP config (e.g. .cursor/mcp.json):

{
  "mcpServers": {
    "dataimpulse-scraper": {
      "command": "uvx",
      "args": ["dataimpulse-scraper"],
      "env": {
        "DI_USER": "your_dataimpulse_login",
        "DI_PASS": "your_dataimpulse_password"
      }
    }
  }
}
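
If you would rather script that edit than paste it by hand, the merge can be sketched as below. This is a minimal sketch, not part of the package: the helper name `add_server` is hypothetical, and it assumes your client reads a JSON file with a top-level `mcpServers` object, as in the snippet above. Existing server entries are left untouched.

```python
import json
from pathlib import Path


def add_server(config_path: Path, login: str, password: str) -> dict:
    """Merge the dataimpulse-scraper entry into an MCP config file.

    Hypothetical helper for illustration; the entry mirrors the JSON above.
    """
    # Start from the existing config (if any) so other servers are preserved.
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    servers = config.setdefault("mcpServers", {})
    servers["dataimpulse-scraper"] = {
        "command": "uvx",
        "args": ["dataimpulse-scraper"],
        "env": {"DI_USER": login, "DI_PASS": password},
    }

    # Create the parent folder (e.g. .cursor/) when it does not exist yet.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

Calling `add_server(Path(".cursor/mcp.json"), "your_login", "your_password")` writes the file in place. Because the password lands in that file in plain text, keep it out of version control.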

Then: restart your AI app and ask

MCP servers load at startup (in Claude Code, /mcp should show dataimpulse-scraper connected). Then just ask:

🏠 "Read https://www.zillow.com/homes/for_sale/ and list the first 5 homes."

🌏 "Check our exit IP from Japan."

🆚 "Compare what's trending on Reddit in the US, UK, and Japan."

That's it. 🎉 The first read_page downloads a headless browser (~1 min, one‑time, auto‑cached) — every run after is instant.

📓 More copy‑paste recipes with real output in EXAMPLES.md.


The tools your agent gets

| Tool | What it does | When to use |
| --- | --- | --- |
| read_page(url, country) | Primary. Real browser → clean markdown (Crawl4AI) | Any page, especially JS‑heavy / SPAs |
| read_page_from_regions(url, regions) | The same page from many countries at once → {region: markdown} | Compare prices / stock / content by region |
| fetch_html(url, country) | Raw HTML / JSON, fast (no browser) | Static pages, APIs |
| check_proxy(country) | Exit IP + geolocation | Prove the proxy / geo works |

Country codes are 2‑letter ISO (us, de, jp, gb, …). Every request routes through a fresh residential IP in that country — just ask in plain language ("read this as a German visitor") and the agent picks the right tool.
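
To make the `{region: markdown}` shape concrete, here is a rough sketch of a multi-region fan-out. It is not the server's implementation: `normalize_country` and `read_from_regions` are hypothetical names, and `fetch` is a stand-in for any single-country reader you supply (the real tools are called by your agent over MCP, not imported).

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, Iterable


def normalize_country(code: str) -> str:
    """Lower-case a 2-letter country code and reject anything else."""
    cleaned = code.strip().lower()
    if len(cleaned) != 2 or not (cleaned.isascii() and cleaned.isalpha()):
        raise ValueError(f"expected a 2-letter country code, got {code!r}")
    return cleaned


def read_from_regions(
    url: str,
    regions: Iterable[str],
    fetch: Callable[[str, str], str],
) -> Dict[str, str]:
    """Fetch one URL once per region and return {region: content}.

    `fetch(url, country)` is a caller-supplied, hypothetical reader.
    """
    codes = [normalize_country(region) for region in regions]
    # One worker per region so the requests run side by side.
    with ThreadPoolExecutor(max_workers=max(1, len(codes))) as pool:
        pages = pool.map(lambda code: fetch(url, code), codes)
        return dict(zip(codes, pages))
```

Passing a stub such as `lambda url, country: f"page for {country}"` is enough to try the shape locally without touching the network.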

What works vs what needs a login

read_page fetches public pages like a real browser. It does not log in or bypass auth walls.

| ✅ Works (public) | ❌ Needs a login |
| --- | --- |
| Reddit, YouTube, Bluesky, Mastodon, news, e‑commerce, SERPs | X/Twitter, Instagram, Facebook, LinkedIn |

A residential proxy beats IP‑based blocking and geo‑walls — not authentication. For login‑gated platforms, use their official API.

Notes

  • Country targeting only: city/state/ZIP filters are billed at a higher rate, so stay country‑level.
  • Each call gets a fresh residential IP; change country for geo or to dodge IP rate‑limits.
  • If the headless browser ever fails to auto‑install: uvx --from dataimpulse-scraper playwright install chromium
  • Respect each site's Terms of Service and robots.txt. Scrape public data responsibly.
  • Security: credentials live in your MCP client's env block. Never commit proxy passwords — rotate them in the dashboard if exposed.
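
The security note above boils down to "read credentials from the environment, never from source". A minimal sketch of that pattern follows, assuming only the two variable names used in this README (`DI_USER`, `DI_PASS`); the function name `load_credentials` is hypothetical and this is not the package's own code.

```python
import os
from typing import Mapping, Optional, Tuple


def load_credentials(env: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (login, password) from the environment, or fail loudly.

    Hypothetical helper; `env` defaults to os.environ and can be
    swapped for a plain dict in tests.
    """
    source = os.environ if env is None else env
    # Treat empty strings the same as missing variables.
    missing = [name for name in ("DI_USER", "DI_PASS") if not source.get(name)]
    if missing:
        # Name the missing variables, but never echo any secret value.
        raise RuntimeError("missing environment variable(s): " + ", ".join(missing))
    return source["DI_USER"], source["DI_PASS"]
```

Failing at startup with the variable names (and nothing else) makes a misconfigured `env` block easy to spot without leaking the password into logs.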

Local development (running from this folder)

To hack on your own fork, or to run a version that is not on PyPI yet, run it straight from a checkout:

git clone https://github.com/sonnysangha/dataimpulse-scraper
cd dataimpulse-scraper
uv sync                                   # creates .venv, installs deps
cp .env.example .env                      # local dev only — add DI_USER / DI_PASS
uv run dataimpulse-scraper --selftest us  # prints a US exit IP → you're live

Then point your MCP config at the checkout instead of PyPI:

{
  "mcpServers": {
    "dataimpulse-scraper": {
      "command": "uvx",
      "args": [
        "--from",
        "/absolute/path/to/dataimpulse-scraper",
        "dataimpulse-scraper"
      ],
      "env": { "DI_USER": "your_login", "DI_PASS": "your_password" }
    }
  }
}

.env is only for local development (it's gitignored). End users never create one.

Maintainer? Publishing to PyPI (GitHub Action, trusted publishing, release loop) is covered step‑by‑step in PUBLISHING.md.


Built with DataImpulse residential proxies. Get a plan — $1/GB →
