Model Context Protocol server for the Grabba API.

Grabba MCP Server

This repository contains the Grabba Model Context Protocol (MCP) server, designed to expose Grabba API functionalities as a set of callable tools. Built on FastMCP, this server allows AI agents, orchestrators (like LangChain), and other applications to seamlessly interact with the Grabba data extraction and management services.

Recommended: point your MCP client at the hosted instance at https://mcp.grabba.dev/ — no install required. The rest of this README covers self-hosting (PyPI, Docker) for users who need their own deployment.

Table of Contents

- Features
- Connecting to the Hosted Instance
- Self-Hosting
- Configuration
  - Environment Variables
  - Command-Line Arguments
- Available Tools
  - Authentication
  - Tool Details
- Programmatic Clients
  - Python Client (LangChain Example)
    - Streamable HTTP Transport
    - Stdio Transport (for Docker-in-Docker or specific use cases)
- Development Notes
  - Project Structure
  - Running Tests
- Links & Resources
- License

Features

Grabba API Exposure: Exposes key Grabba API functionalities (data extraction, job management, statistics) as accessible tools.
Multiple Transports: Supports stdio, streamable-http, and sse transports, offering flexibility for different deployment and client scenarios.
Dependency Injection: Leverages FastAPI's robust dependency injection for secure and efficient GrabbaService initialization (e.g., handling API keys).
Containerized Deployment: Optimized for Docker for easy packaging and deployment.
Configurable: Allows configuration via environment variables and command-line arguments.

Connecting to the Hosted Instance

The Grabba MCP server is publicly available — most users do not need to install anything.

URL: https://mcp.grabba.dev/
Transports: streamable-http (recommended), sse for legacy clients.
Authentication: include your Grabba API key as the API_KEY HTTP header.

Claude Desktop

Edit ~/Library/Application Support/Claude/claude_desktop_config.json (macOS) or %APPDATA%\Claude\claude_desktop_config.json (Windows):

{
  "mcpServers": {
    "grabba": {
      "type": "streamable-http",
      "url": "https://mcp.grabba.dev/",
      "headers": { "API_KEY": "gk_live_..." }
    }
  }
}

Restart Claude Desktop and Grabba's tools will appear in the MCP palette.
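
If the config file already lists other MCP servers, the Grabba entry can be merged in programmatically instead of pasted by hand. The following is a minimal sketch that assumes the file layout shown above; the helper name `add_grabba_server` is illustrative and not part of any Grabba or Claude tooling.

```python
from typing import Any, Dict


def add_grabba_server(config: Dict[str, Any], api_key: str) -> Dict[str, Any]:
    """Return a copy of a Claude Desktop config with a "grabba" entry added.

    Entries already present under "mcpServers" are preserved.
    """
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    servers["grabba"] = {
        "type": "streamable-http",
        "url": "https://mcp.grabba.dev/",
        "headers": {"API_KEY": api_key},
    }
    merged["mcpServers"] = servers
    return merged
```

Load the existing file with `json.load`, pass the result through the helper, and write it back with `json.dump(..., indent=2)`.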

Older Claude Desktop builds without native HTTP support can connect through the mcp-remote bridge:

{
  "mcpServers": {
    "grabba": {
      "command": "npx",
      "args": [
        "-y", "mcp-remote",
        "https://mcp.grabba.dev/",
        "--header", "API_KEY:gk_live_..."
      ]
    }
  }
}

Cursor

Settings → MCP → User:

{
  "mcpServers": {
    "grabba": {
      "url": "https://mcp.grabba.dev/",
      "headers": { "API_KEY": "gk_live_..." }
    }
  }
}

Smoke test

curl -i \
  -H "API_KEY: gk_live_..." \
  https://mcp.grabba.dev/

A 200 OK (or 406 Not Acceptable from the transport handshake) confirms the endpoint is reachable and your key was accepted at the edge.
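
The same check can be scripted, for example in a CI job. This is a rough sketch using only the Python standard library; the function names are illustrative, and the treatment of 200 and 406 as success simply mirrors the note above.

```python
import urllib.error
import urllib.request

HOSTED_URL = "https://mcp.grabba.dev/"


def build_request(api_key: str, url: str = HOSTED_URL) -> urllib.request.Request:
    """Build a GET request that carries the key in the API_KEY header."""
    return urllib.request.Request(url, headers={"API_KEY": api_key}, method="GET")


def is_reachable(status: int) -> bool:
    """Treat 200 and 406 as proof that the endpoint answered."""
    return status in (200, 406)


def smoke_test(api_key: str, url: str = HOSTED_URL) -> bool:
    """Send the request and report whether the status indicates success."""
    try:
        with urllib.request.urlopen(build_request(api_key, url), timeout=10) as resp:
            return is_reachable(resp.status)
    except urllib.error.HTTPError as err:
        # urllib raises on 4xx/5xx responses, so a 406 is handled here
        return is_reachable(err.code)
    except (urllib.error.URLError, TimeoutError):
        # DNS failure, refused connection, or timeout
        return False
```

Calling `smoke_test("gk_live_...")` returns `True` when the endpoint responds with one of the two accepted status codes.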

Self-Hosting

For air-gapped environments, CI runners, or anyone who'd rather not depend on the public endpoint, the same server is published as a Python package (grabba-mcp) and a Docker image (itsobaa/grabba-mcp).

Prerequisites

Python 3.10+
Docker (for containerized deployment)
A Grabba API Key (you can get one from the Grabba website)

Installation

Via PyPI

The grabba-mcp package is available on PyPI. This is the simplest way to get started.

pip install grabba-mcp

From Source (Development)

If you plan to contribute or modify the server, you'll want to install from source.

Clone the repository:

git clone https://github.com/grabba-dev/grabba-mcp
cd grabba-mcp

Install Poetry: If you don't have Poetry installed, follow the official installation guide or install it with pip:
```
pip install poetry
```
Install project dependencies: Navigate to the apps/mcp directory where pyproject.toml resides, then install:
```
cd apps/mcp
poetry install
```

Running the Server

Locally

After installation (either via pip or from source), you can run the server.

Create a .env file: In the apps/mcp directory (if running from source) or the directory from which you'll execute the grabba-mcp command, create a .env file and add your Grabba API key:

API_KEY="YOUR_API_KEY_HERE"
# Optional: configure the server port
PORT=8283
# Optional: configure the default transport (overridden by CLI)
MCP_SERVER_TRANSPORT="streamable-http"

Execute the server:
- If installed via pip:
```
grabba-mcp
```
  To specify a transport via command line:
```
grabba-mcp streamable-http
```
- If running from source (using Poetry):
```
cd apps/mcp
poetry run python src/server.py
```
  To specify a transport via command line:
```
poetry run python src/server.py stdio
```
You should see output indicating the server is starting and listening on the specified port (e.g., http://0.0.0.0:8283) if using HTTP transports. Note that with the stdio transport the server talks over stdin/stdout to the client process that launched it and exits when that client disconnects, making it unsuitable for persistent services.

Docker Container

A pre-built Docker image is available on Docker Hub, making deployment straightforward.

Pull the image:
```
docker pull itsobaa/grabba-mcp:latest
```

Run the container: For a persistent server, you'll typically use the streamable-http transport and map ports.

docker run -d \
  -p 8283:8283 \
  -e API_KEY="YOUR_API_KEY_HERE" \
  -e MCP_SERVER_TRANSPORT="streamable-http" \
  itsobaa/grabba-mcp:latest

You can also use docker-compose for more complex setups:

# docker-compose.yml
version: '3.8'
services:
  grabba-mcp:
    image: itsobaa/grabba-mcp:latest
    container_name: grabba-mcp
    environment:
      API_KEY: ${API_KEY} # Reads from a .env file next to docker-compose.yml
      MCP_SERVER_TRANSPORT: streamable-http
      PORT: 8283
    ports:
      - "8283:8283"
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:8283/tools/openapi.json || exit 1"]
      interval: 10s
      timeout: 5s
      retries: 5

Create a .env file next to docker-compose.yml containing your key (e.g., API_KEY="YOUR_API_KEY_HERE"), then run:

docker-compose up -d

Configuration

The server can be configured via environment variables and command-line arguments.

Environment Variables

API_KEY (Required): Your Grabba API key. This is critical for authenticating with Grabba services.
PORT (Optional, default: 8283): The port on which the MCP server's HTTP transports (streamable-http, sse) will listen.
MCP_SERVER_TRANSPORT (Optional, default: stdio): The default transport protocol for the MCP server. Can be stdio, streamable-http, or sse.
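
To make the defaults concrete, here is a minimal sketch of how these three variables could be read and validated. It is not the server's actual startup code; `ServerSettings` and `load_settings` are hypothetical names used only for illustration.

```python
import os
from dataclasses import dataclass
from typing import Mapping

VALID_TRANSPORTS = ("stdio", "streamable-http", "sse")


@dataclass(frozen=True)
class ServerSettings:
    api_key: str
    port: int = 8283
    transport: str = "stdio"


def load_settings(env: Mapping[str, str] = os.environ) -> ServerSettings:
    """Read API_KEY, PORT and MCP_SERVER_TRANSPORT, applying the documented defaults."""
    api_key = env.get("API_KEY", "")
    if not api_key:
        raise ValueError("API_KEY is required")
    transport = env.get("MCP_SERVER_TRANSPORT", "stdio")
    if transport not in VALID_TRANSPORTS:
        raise ValueError(f"Unsupported transport: {transport!r}")
    port = int(env.get("PORT", "8283"))
    return ServerSettings(api_key=api_key, port=port, transport=transport)
```

Passing a plain dict instead of `os.environ` makes the behaviour easy to check, e.g. `load_settings({"API_KEY": "k"})` yields port 8283 and the stdio transport.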

Command-Line Arguments

The server also accepts a single positional command-line argument that overrides MCP_SERVER_TRANSPORT:

grabba-mcp [transport_protocol]
# or for source: python src/server.py [transport_protocol]

[transport_protocol]: Can be stdio, streamable-http, or sse.
- Example: grabba-mcp streamable-http
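
The precedence described above (command-line argument first, then MCP_SERVER_TRANSPORT, then the stdio default) can be sketched as follows. This is an illustration under those stated rules, not the server's real argument parser; `resolve_transport` is a hypothetical helper.

```python
from typing import Mapping, Optional, Sequence

VALID_TRANSPORTS = ("stdio", "streamable-http", "sse")


def resolve_transport(argv: Sequence[str], env: Mapping[str, str]) -> str:
    """Pick the transport: CLI argument, then MCP_SERVER_TRANSPORT, then stdio."""
    # argv[0] is the program name, so the positional argument is argv[1]
    cli_value: Optional[str] = argv[1] if len(argv) > 1 else None
    transport = cli_value or env.get("MCP_SERVER_TRANSPORT") or "stdio"
    if transport not in VALID_TRANSPORTS:
        raise ValueError(f"Unsupported transport: {transport!r}")
    return transport
```

In a real entry point this would be called as `resolve_transport(sys.argv, os.environ)`.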

Available Tools

The Grabba MCP Server exposes a suite of tools that wrap the functionality of the Grabba Python SDK.

Authentication

For streamable-http and sse transports, authentication is performed by including an API_KEY HTTP header with your Grabba API Key. Example: API_KEY: YOUR_API_KEY_HERE

For stdio transport, the API_KEY environment variable must be set in the environment where the grabba-mcp command is executed, as there are no HTTP headers in this communication mode.
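
The difference between the two modes can be summarised in a few lines of client-side code. The sketch below builds a generic connection description; the dictionary shape and the name `build_connection` are assumptions for illustration and do not correspond to a specific client library's format.

```python
from typing import Any, Dict, Optional


def build_connection(transport: str, api_key: str, url: Optional[str] = None) -> Dict[str, Any]:
    """Place the API key where the chosen transport expects it."""
    if transport in ("streamable-http", "sse"):
        if not url:
            raise ValueError("HTTP transports need a server URL")
        # HTTP transports: the key travels in the API_KEY request header
        return {"transport": transport, "url": url, "headers": {"API_KEY": api_key}}
    if transport == "stdio":
        # stdio: no headers exist, so the key goes into the subprocess environment
        return {
            "transport": "stdio",
            "command": "grabba-mcp",
            "args": ["stdio"],
            "env": {"API_KEY": api_key},
        }
    raise ValueError(f"Unsupported transport: {transport!r}")
```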

Tool Details

`extract_data`

Description: Schedules a new data extraction job with Grabba. Suitable for web search tasks.
Input: Job object (Pydantic model) detailing the extraction tasks.
Output: tuple[str, Optional[Dict]] - A message and the JobResult as a dictionary.

`schedule_existing_job`

Description: Schedules an existing Grabba job to run immediately.
Input: job_id (string) - The ID of the existing job.
Output: tuple[str, Optional[Dict]] - A message and the JobResult as a dictionary.

`wait_for_job_completion`

Description: Waits for a job to reach a terminal state (completed, failed, or cancelled) using SSE first and polling fallback.
Input: job_id (string), optional timeout_seconds (int, default 240), optional poll_interval_seconds (int, default 5).
Output: tuple[str, Dict] - A message and a status payload:
- job_id
- status
- completed
- timed_out
- source (sse, fetch_specific_job, timeout)
- reason
- optional job_result_id
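
The polling fallback can be pictured with a short sketch. It covers only the polling half (not the SSE path), takes the status lookup as an injected function, and assumes that `completed` is true only when the final status is "completed"; the tool's real implementation may differ on all of these points.

```python
import time
from typing import Any, Callable, Dict

TERMINAL_STATES = {"completed", "failed", "cancelled"}


def poll_until_terminal(
    fetch_status: Callable[[str], str],
    job_id: str,
    timeout_seconds: int = 240,
    poll_interval_seconds: int = 5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll fetch_status(job_id) until a terminal state or the timeout is reached."""
    deadline = clock() + timeout_seconds
    while True:
        status = fetch_status(job_id)
        if status in TERMINAL_STATES:
            return {
                "job_id": job_id,
                "status": status,
                "completed": status == "completed",
                "timed_out": False,
                "source": "fetch_specific_job",
                "reason": f"job reached terminal state '{status}'",
            }
        # Stop if another full interval would overshoot the deadline
        if clock() + poll_interval_seconds > deadline:
            break
        sleep(poll_interval_seconds)
    return {
        "job_id": job_id,
        "status": status,
        "completed": False,
        "timed_out": True,
        "source": "timeout",
        "reason": f"no terminal state within {timeout_seconds}s",
    }
```

The `clock` and `sleep` parameters exist so the loop can be exercised with a fake clock instead of waiting in real time.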

`fetch_all_jobs`

Description: Fetches all Grabba jobs for the current user.
Input: None.
Output: tuple[str, Optional[List[Job]]] - A message and a list of Job objects.

`fetch_specific_job`

Description: Fetches details of a specific Grabba job by its ID.
Input: job_id (string) - The ID of the job.
Output: tuple[str, Optional[Job]] - A message and the Job object.

`delete_job`

Description: Deletes a specific Grabba job.
Input: job_id (string) - The ID of the job to delete.
Output: tuple[str, None] - A success message.

`fetch_job_result`

Description: Fetches results of a completed Grabba job by its result ID.
Input: job_result_id (string) - The ID of the job result.
Output: tuple[str, Optional[Dict]] - A message and the job result data as a dictionary.

`delete_job_result`

Description: Deletes results of a completed Grabba job.
Input: job_result_id (string) - The ID of the job result to delete.
Output: tuple[str, None] - A success message.

`fetch_stats_data`

Description: Fetches usage statistics and current user token balance for Grabba.
Input: None.
Output: tuple[str, Optional[JobStats]] - A message and the JobStats object.

`estimate_job_cost`

Description: Estimates the cost of a Grabba job before creation or scheduling.
Input: Job object (Pydantic model) detailing the extraction tasks.
Output: tuple[str, Optional[Dict]] - A message and the estimated cost details as a dictionary.

`create_job`

Description: Creates a new data extraction job in Grabba without immediately scheduling it for execution.
Input: Job object (Pydantic model) detailing the extraction tasks.
Output: tuple[str, Optional[Job]] - A message and the created Job object.

`fetch_available_regions`

Description: Fetches a list of all available puppet (web agent) regions that can be used for scheduling web data extractions.
Input: None.
Output: tuple[str, Optional[List[PuppetRegion]]] - A message and a list of PuppetRegion objects.
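
Several of these tools are meant to be chained: schedule a job, wait for it, then fetch its result. The sketch below shows one such chain using the tool and argument names listed above. The `call_tool` callable is a hypothetical stand-in for whatever MCP client you use, and it is assumed to return the (message, payload) pair described in each tool's Output line.

```python
from typing import Any, Callable, Dict, Optional, Tuple

# Hypothetical client hook: (tool_name, arguments) -> (message, payload)
ToolCall = Callable[[str, Dict[str, Any]], Tuple[str, Any]]


def run_existing_job(call_tool: ToolCall, job_id: str) -> Optional[Dict[str, Any]]:
    """Schedule a job, wait for it to finish, and return its result (or None)."""
    call_tool("schedule_existing_job", {"job_id": job_id})
    _, status = call_tool("wait_for_job_completion", {"job_id": job_id})
    # job_result_id is optional in the status payload, so check before using it
    if not status.get("completed") or "job_result_id" not in status:
        return None
    _, result = call_tool("fetch_job_result", {"job_result_id": status["job_result_id"]})
    return result
```

Returning `None` when the job times out or fails keeps the caller in charge of retry and error reporting.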

Programmatic Clients

If you're building your own agent runtime (LangChain, custom Python orchestrator, etc.) you can talk to either the hosted instance or your self-hosted server with the same client config — just change the url.

Python Client (LangChain Example)

This example assumes you have the langchain-mcp-adapters package installed (it provides MultiServerMCPClient and is often part of a larger LangChain/agent setup), along with grabba, pydantic, and python-dotenv.

import asyncio
import os
from typing import Dict, List, Optional

from dotenv import load_dotenv # For loading API key from .env
from grabba import Job # Grabba Pydantic model describing an extraction job
from langchain_core.tools import BaseTool, Tool
from langchain_mcp_adapters.client import MultiServerMCPClient
from langchain_mcp_adapters.sessions import StdioConnection, StreamableHttpConnection
from pydantic import BaseModel


# Illustrative config models for this example; they are not part of the MCP SDK.
class McpServer(BaseModel):
    name: str
    transport: str
    url: Optional[str] = None
    command: Optional[str] = None
    args: Optional[List[str]] = None


class McpServerConfig(BaseModel):
    mcp_server: McpServer
    env_variables: Optional[Dict[str, str]] = None

async def connect_and_use_mcp_tools(mcp_server_configs: List[McpServerConfig], api_key: Optional[str] = None) -> List[Tool]:
    """
    Connects to the MCP server(s), discovers its tools, and wraps them as LangChain Tools.
    Handles API key injection for HTTP connections.
    """
    try:
        mcp_client_config = {}
        for config in mcp_server_configs:
            # Pydantic V2 model validation
            mcp_server_model = McpServer.model_validate(config.mcp_server.model_dump())
            
            connection_headers = {}
            if api_key:
                # The Grabba MCP server reads the key from the API_KEY header
                connection_headers["API_KEY"] = api_key

            if mcp_server_model.transport == "streamable_http":
                # HTTP connections take no "env"; the key travels in the headers
                server_params: StreamableHttpConnection = {
                    "transport": "streamable_http",
                    "url": str(mcp_server_model.url),
                    "headers": connection_headers
                }
            elif mcp_server_model.transport == "stdio":
                server_params: StdioConnection = {
                    "transport": "stdio",
                    "command": mcp_server_model.command, 
                    "args": mcp_server_model.args, 
                    "env": config.env_variables # For stdio, env maps to subprocess env vars
                }
            else:
                raise ValueError(f"Unsupported transport: {mcp_server_model.transport}")

            # Log only the name and transport so the API key is never printed
            print(f"Client connecting to {mcp_server_model.name} via {mcp_server_model.transport}")
            mcp_client_config[mcp_server_model.name] = server_params
        
        mcp_client = MultiServerMCPClient(mcp_client_config)
        tools: List[BaseTool] = await mcp_client.get_tools()
        print(f"Successfully loaded {len(tools)} tools.")
        return tools
    except Exception as e:
        print(f"Error connecting to MCP server or loading tools: {e}")
        return []


async def main():
    load_dotenv() # Load API key from a client-side .env file
    API_KEY = os.getenv("API_KEY", "YOUR_API_KEY_HERE_IF_NOT_ENV")

    # --- Configuration for Streamable HTTP Transport (Local or Public Instance) ---
    # For local: url="http://localhost:8283"
    # For public: url="https://mcp.grabba.dev/"
    http_mcp_config = McpServerConfig(
        mcp_server=McpServer(
            name="grabba-agent-http",
            transport="streamable_http",
            url="http://localhost:8283" # Or "https://mcp.grabba.dev/" for public
        )
    )

    print("\n--- Connecting via Streamable HTTP ---")
    http_tools = await connect_and_use_mcp_tools(
        mcp_server_configs=[http_mcp_config],
        api_key=API_KEY
    )

    if http_tools:
        print("\nAvailable HTTP Tools:")
        for tool in http_tools:
            print(f"- {tool.name}: {tool.description.split('.')[0]}.")
        
        # Example: Using the extract_data tool (adjust as per your Job Pydantic model)
        extract_tool = next((t for t in http_tools if t.name == "extract_data"), None)
        if extract_tool:
            print("\n--- Testing extract_data tool via HTTP ---")
            sample_job = Job(
                url="https://example.com/some-page",
                type="markdown", # or "pdf", "html" etc.
                parser="text-content",
                strategy="auto"
                # ... other required fields for Job
            )
            try:
                result_msg, result_data = await extract_tool.ainvoke({"extraction_data": sample_job})
                print(f"Extraction Result (HTTP): {result_msg}")
                if result_data:
                    print(f"Extraction Data (HTTP): {result_data.get('extracted_text', 'No text extracted')[:100]}...") # Print first 100 chars
            except Exception as e:
                print(f"Error calling extract_data via HTTP: {e}")
        else:
            print("extract_data tool not found in HTTP tools.")

        # Example: Using fetch_all_jobs tool
        fetch_jobs_tool = next((t for t in http_tools if t.name == "fetch_all_jobs"), None)
        if fetch_jobs_tool:
            print("\n--- Testing fetch_all_jobs tool via HTTP ---")
            try:
                result_msg, jobs_list = await fetch_jobs_tool.ainvoke({})
                print(f"Fetch Jobs Result (HTTP): {result_msg}")
                if jobs_list:
                    print(f"Fetched {len(jobs_list)} jobs.")
                    for job in jobs_list[:2]: # Print first 2 jobs
                        print(f"  - Job ID: {job.job_id}, URL: {job.url}")
            except Exception as e:
                print(f"Error calling fetch_all_jobs via HTTP: {e}")
        
        # Example: Using fetch_stats_data tool
        fetch_stats_tool = next((t for t in http_tools if t.name == "fetch_stats_data"), None)
        if fetch_stats_tool:
            print("\n--- Testing fetch_stats_data tool via HTTP ---")
            try:
                result_msg, stats_data = await fetch_stats_tool.ainvoke({})
                print(f"Fetch Stats Result (HTTP): {result_msg}")
                if stats_data:
                    print(f"Token Balance (HTTP): {stats_data.token_balance}")
                    print(f"Jobs Run (HTTP): {stats_data.jobs_run_count}")
            except Exception as e:
                print(f"Error calling fetch_stats_data via HTTP: {e}")

    # --- Configuration for Stdio Transport (e.g., to a Docker container running the server) ---
    # This assumes you have the 'itsobaa/grabba-mcp:latest' Docker image available.
    # The client launches a temporary Docker container for each tool call.
    stdio_mcp_config = McpServerConfig(
        mcp_server=McpServer(
            name="grabba-agent-stdio",
            transport="stdio",
            command="docker",
            args=[
                "run",
                "-i",          # Keep STDIN open for interactive communication
                "--rm",        # Remove container after exit
                "-e", "API_KEY", # Forward API_KEY from the docker CLI environment into the container
                "itsobaa/grabba-mcp:latest", # The Docker Hub image for Grabba MCP server
                "grabba-mcp", "stdio" # Command to run the server in stdio mode inside container
            ]
        ),
        env_variables={"API_KEY": API_KEY} # Environment of the docker subprocess (read as config.env_variables)
    )

    print("\n--- Connecting via Stdio (to Docker container as a subprocess) ---")
    stdio_tools = await connect_and_use_mcp_tools(
        mcp_server_configs=[stdio_mcp_config],
        api_key=API_KEY # Client might still pass for internal consistency, though env_variables is primary for stdio
    )

    if stdio_tools:
        print("\nAvailable Stdio Tools:")
        for tool in stdio_tools:
            print(f"- {tool.name}: {tool.description.split('.')[0]}.")
        
        # Example: Using the fetch_available_regions tool via Stdio
        fetch_regions_tool = next((t for t in stdio_tools if t.name == "fetch_available_regions"), None)
        if fetch_regions_tool:
            print("\n--- Testing fetch_available_regions tool via Stdio ---")
            try:
                result_msg, regions_list = await fetch_regions_tool.ainvoke({})
                print(f"Fetch Regions Result (Stdio): {result_msg}")
                if regions_list:
                    print(f"Fetched {len(regions_list)} regions.")
                    for region in regions_list[:3]: # Print first 3 regions
                        print(f"  - {region.display_name} ({region.code})")
            except Exception as e:
                print(f"Error calling fetch_available_regions via Stdio: {e}")
        else:
            print("fetch_available_regions tool not found in Stdio tools.")

if __name__ == "__main__":
    asyncio.run(main())

Development Notes

Project Structure

your_project_root/
├── apps/
│   └── mcp/
│       ├── src/
│       │   └── server.py       # Main FastMCP server application
│       ├── .env                # Environment variables for local development
│       ├── pyproject.toml      # Poetry project configuration
│       └── poetry.lock         # Poetry dependency lock file
├── Dockerfile                  # Docker build instructions for the server
├── docker-compose.yml          # Docker Compose configuration for local development/deployment
├── .dockerignore               # Files to ignore during Docker build
├── .env                        # Example .env for docker-compose (for API_KEY)
├── README.md                   # This documentation file
├── pyproject.toml              # Root pyproject.toml (if using monorepo structure)
├── poetry.lock                 # Root poetry.lock (if using monorepo structure)
├── src/                        # Source code (often for the root project if it's a monorepo)
├── tests/                      # Project tests
└── ... (other project files like dist, docs, tox.ini, project.json etc.)

Running Tests

To run tests (as configured by your pyproject.toml):

poetry run pytest

Links & Resources

Grabba Website: https://www.grabba.dev/
Grabba MCP Server Public Instance: https://mcp.grabba.dev/
GitHub Repository: https://github.com/grabba-dev/grabba-mcp
Docker Hub Image: https://hub.docker.com/r/itsobaa/grabba-mcp
PyPI Package: https://pypi.org/project/grabba-mcp/

License

This project is released under a proprietary license. See the LICENSE file in the repository root for full details.

Release history

- 0.0.5 (this version): May 18, 2026
- 0.0.4: May 5, 2026
- 0.0.3: Jul 16, 2025
- 0.0.2: Jun 13, 2025
- 0.0.1: Jun 11, 2025

Download files

Source Distribution: grabba_mcp-0.0.5.tar.gz (19.7 kB), uploaded May 18, 2026
Built Distribution: grabba_mcp-0.0.5-py3-none-any.whl (14.5 kB, Python 3), uploaded May 18, 2026

Both files were uploaded via poetry/2.1.1 CPython/3.12.3 Linux/6.17.0-23-generic, without Trusted Publishing.

File hashes

Hashes for grabba_mcp-0.0.5.tar.gz
Algorithm	Hash digest
SHA256	`a979cce1aee933cef5154275514af0967c60b760b32ad2a8713d082011d766ec`
MD5	`12197eff534d0c9e2330edcb332786b0`
BLAKE2b-256	`bc6f1d9cb552150ebf6d41effd997362a49d13429e300c3b1dfaad5716b7bc52`

Hashes for grabba_mcp-0.0.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`396287c981216c292dcfe3b7f34c05b30aeaa8ce73a1a7feb4698f3b415286d7`
MD5	`84053cbdb0648933d0982bc91bd305c7`
BLAKE2b-256	`0663c20abedec9a00e664ff942b81a03b67532dac5ba79c1f833c0c6be7299f4`
