
MCP server for web scraping via DataImpulse residential proxies with per-request country targeting

Project description

DataImpulse Scraper (MCP)

Give your AI agent — Claude Code, Cursor, Codex, Claude Desktop, … — the power to scrape the web from any country, through DataImpulse residential proxies, with full JavaScript rendering to clean, LLM‑ready markdown.

The only thing you configure is your DataImpulse login + password. No clone, no Python setup, no .env.

🟢 Get your DataImpulse residential proxy plan: residential proxies from $1/GB, pay‑as‑you‑go, no expiry.


Install

Grab your credentials first: open your DataImpulse dashboard → Residential Proxy → Proxy Access → copy your Login and Password. Then pick a lane:

⚡ Option 1 — one command (macOS / Linux)

curl -LsSf https://raw.githubusercontent.com/sonnysangha/dataimpulse-scraper/main/install.sh | sh

That's the whole install. The script sets up uv if you don't have it, asks for your DataImpulse login + password, verifies them with a live proxy check, and auto‑configures Claude Code, Cursor, and Claude Desktop — whichever you have. (Non‑interactive? DI_USER=... DI_PASS=... sh install.sh)

🖱️ Option 2 — one click (Cursor)

Install MCP Server

Click, approve, then replace the two YOUR_DATAIMPULSE_* placeholders with your login and password. Done. (Needs uv installed — see Option 4, step 1.)

📦 Option 3 — double‑click (Claude Desktop)

Download dataimpulse-scraper.mcpb, double‑click it (or drag it onto Claude Desktop), and type your login + password into the form it shows you — the password is stored in your OS keychain, not a file.

🛠️ Option 4 — manual (any client, incl. Windows)

1. Install uv — the tiny, fast runner that launches the server and auto‑downloads everything else:

# macOS / Linux
curl -LsSf https://astral.sh/uv/install.sh | sh

# Windows (PowerShell)
powershell -c "irm https://astral.sh/uv/install.ps1 | iex"

2. Add the server to your AI app:

Claude Code — one command:

claude mcp add dataimpulse-scraper \
  --env DI_USER=YOUR_LOGIN \
  --env DI_PASS=YOUR_PASSWORD \
  -- uvx dataimpulse-scraper

Cursor / Claude Desktop / Codex / anything else — paste into your MCP config (e.g. .cursor/mcp.json):

{
  "mcpServers": {
    "dataimpulse-scraper": {
      "command": "uvx",
      "args": ["dataimpulse-scraper"],
      "env": {
        "DI_USER": "your_dataimpulse_login",
        "DI_PASS": "your_dataimpulse_password"
      }
    }
  }
}
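If your MCP config file already lists other servers, pasting the snippet by hand risks overwriting them. The following is a minimal sketch, not part of the package, of merging the entry above into an existing file. The helper name `add_server` and the choice of path are assumptions for illustration; only the JSON shape comes from the snippet above.

```python
import json
from pathlib import Path


def add_server(config_path: Path, login: str, password: str) -> dict:
    """Merge the dataimpulse-scraper entry into an MCP config file.

    Hypothetical helper: existing servers in the file are preserved.
    """
    # Start from the existing config (if any) so other servers survive.
    text = config_path.read_text() if config_path.exists() else ""
    config = json.loads(text) if text.strip() else {}

    servers = config.setdefault("mcpServers", {})
    servers["dataimpulse-scraper"] = {
        "command": "uvx",
        "args": ["dataimpulse-scraper"],
        "env": {"DI_USER": login, "DI_PASS": password},
    }

    # Create the parent folder (e.g. .cursor/) when it does not exist yet.
    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n")
    return config
```

For example, `add_server(Path(".cursor/mcp.json"), "your_dataimpulse_login", "your_dataimpulse_password")` would write the same entry shown above while keeping any servers already in the file.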

Then: restart your AI app and ask

MCP servers load at startup (in Claude Code, /mcp should show dataimpulse-scraper connected). Then just ask:

"Read https://www.zillow.com/homes/for_sale/ and list the first 5 homes." "Check our exit IP from Japan." "Compare what's trending on Reddit in the US, UK, and Japan."

That's it. 🎉 The first read_page downloads a headless browser (~1 min, one‑time, auto‑cached) — every run after is instant.

📓 More copy‑paste recipes with real output in EXAMPLES.md.


The tools your agent gets

| Tool | What it does | When to use |
| --- | --- | --- |
| read_page(url, country) | Primary. Real browser → clean markdown (Crawl4AI) | Any page, especially JS‑heavy / SPAs |
| read_page_from_regions(url, regions) | The same page from many countries at once → {region: markdown} | Compare prices / stock / content by region |
| fetch_html(url, country) | Raw HTML / JSON, fast (no browser) | Static pages, APIs |
| check_proxy(country) | Exit IP + geolocation | Prove the proxy / geo works |

Country codes are 2‑letter ISO (us, de, jp, gb, …). Every request routes through a fresh residential IP in that country — just ask in plain language ("read this as a German visitor") and the agent picks the right tool.
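To make the `{region: markdown}` result of read_page_from_regions more concrete, here is a rough sketch of how such a mapping could be compared once you have it. The functions `group_by_content` and `region_diff` are hypothetical helpers, not part of the server, and the sample pages in the usage note are invented for illustration.

```python
import difflib


def group_by_content(pages: dict[str, str]) -> list[list[str]]:
    """Group region codes whose markdown is identical (ignoring outer whitespace)."""
    groups: dict[str, list[str]] = {}
    for region, markdown in pages.items():
        groups.setdefault(markdown.strip(), []).append(region)
    return [sorted(regions) for regions in groups.values()]


def region_diff(pages: dict[str, str], a: str, b: str) -> str:
    """Return a unified diff of the markdown seen from region a vs region b."""
    diff = difflib.unified_diff(
        pages[a].splitlines(),
        pages[b].splitlines(),
        fromfile=a,
        tofile=b,
        lineterm="",
    )
    return "\n".join(diff)
```

With an invented mapping such as `{"us": "Price: $10", "de": "Price: 9 EUR", "gb": "Price: $10"}`, `group_by_content` puts `gb` and `us` in one group and `de` in another, and `region_diff(pages, "us", "de")` shows the differing price line.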

What works vs what needs a login

read_page fetches public pages like a real browser. It does not log in or bypass auth walls.

| ✅ Works (public) | ❌ Needs a login |
| --- | --- |
| Reddit, YouTube, Bluesky, Mastodon, news, e‑commerce, SERPs | X/Twitter, Instagram, Facebook, LinkedIn |

A residential proxy beats IP‑based blocking and geo‑walls — not authentication. For login‑gated platforms, use their official API.

Notes

  • Country targeting only: city/state/ZIP filters are billed at a higher rate, so stay country‑level.
  • Each call gets a fresh residential IP; change country for geo or to dodge IP rate‑limits.
  • If the headless browser ever fails to auto‑install: uvx --from dataimpulse-scraper playwright install chromium
  • Respect each site's Terms of Service and robots.txt. Scrape public data responsibly.
  • Security: credentials live in your MCP client's env block. Never commit proxy passwords — rotate them in the dashboard if exposed.

Local development (running from this folder)

Until the package is on PyPI — or when hacking on your own fork — run it straight from a checkout:

git clone https://github.com/sonnysangha/dataimpulse-scraper
cd dataimpulse-scraper
uv sync                                   # creates .venv, installs deps
cp .env.example .env                      # local dev only — add DI_USER / DI_PASS
uv run dataimpulse-scraper --selftest us  # prints a US exit IP → you're live

Then point your MCP config at the checkout instead of PyPI:

{
  "mcpServers": {
    "dataimpulse-scraper": {
      "command": "uvx",
      "args": [
        "--from",
        "/absolute/path/to/dataimpulse-scraper",
        "dataimpulse-scraper"
      ],
      "env": { "DI_USER": "your_login", "DI_PASS": "your_password" }
    }
  }
}

.env is only for local development (it's gitignored). End users never create one.

Maintainer? Publishing to PyPI (GitHub Action, trusted publishing, release loop) is covered step‑by‑step in PUBLISHING.md.


Built with DataImpulse residential proxies. Get a plan from $1/GB.

Download files


Source Distribution

dataimpulse_scraper-1.0.0.tar.gz (318.9 kB view details)

Uploaded Source

Built Distribution


dataimpulse_scraper-1.0.0-py3-none-any.whl (10.2 kB view details)

Uploaded Python 3

File details

Details for the file dataimpulse_scraper-1.0.0.tar.gz.

File metadata

  • Download URL: dataimpulse_scraper-1.0.0.tar.gz
  • Upload date:
  • Size: 318.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.19 {"installer":{"name":"uv","version":"0.11.19","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for dataimpulse_scraper-1.0.0.tar.gz
Algorithm Hash digest
SHA256 451640946c584fc7b144e07295067d5ab8f39454e29e59eaf0c8df34f5214c4a
MD5 d7e88029b110c191e063eb5679be0ee1
BLAKE2b-256 4a34d4328ceb5d37d9cf28d1e437094c41e5b38cb35eb4a18ea8dee79e26f771


File details

Details for the file dataimpulse_scraper-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: dataimpulse_scraper-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 10.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.19 {"installer":{"name":"uv","version":"0.11.19","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for dataimpulse_scraper-1.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 9ffc06856402fc56aa7d527e0cc5520fc387f62a5bf41cd113559230e90c8fbe
MD5 1df91cd82290fb92a2b58d07948a0119
BLAKE2b-256 bb6b35aedee8429e8be590ed4881ae46681192c27fdfa901635fb31edbc18a76
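
If you download one of the files manually, you can check it against the SHA256 digest listed for it. A minimal sketch using only the standard library follows; the helper names `sha256_of` and `verify` are illustrative, and the file path is whatever location you saved the download to.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_hex: str) -> bool:
    """Return True when the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, `verify(Path("dataimpulse_scraper-1.0.0.tar.gz"), "451640946c584fc7b144e07295067d5ab8f39454e29e59eaf0c8df34f5214c4a")` should return True for an unmodified copy of the source distribution.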

