OnCrawl MCP Server

MCP server that exposes OnCrawl's API for use with Claude Code and Claude Desktop. Enables Claude to perform deep technical SEO analysis by querying crawl data, Google Search Console metrics, and crawl-over-crawl comparisons.

Features

  • 12 MCP Tools for comprehensive SEO analysis
  • Raw data access: Query pages, links, clusters, structured data with flexible OQL
  • Schema discovery: Claude learns available fields before querying
  • Aggregations: Group/count by any dimension for pattern detection
  • Full exports: No 10k limit for complete datasets
  • Crawl-over-crawl analysis: Track changes between crawls (new pages, status changes, etc.)
  • Google Search Console integration: 600+ GSC fields including clicks, impressions, CTR, position by device, brand/non-brand, and more
  • Google Analytics 4 integration: Session, user, and engagement metrics

What Makes This Powerful

OnCrawl combines crawl data with GSC/GA4 traffic data, enabling analyses like:

  • Orphan pages with traffic: Pages getting clicks from Google but not linked internally
  • Underlinked high-performers: Popular pages with weak internal linking
  • Low CTR opportunities: Pages with high impressions but poor click-through rates
  • 404s still in Google: Broken pages still appearing in search results
  • Deep pages with traffic: Buried content that Google values
  • Mobile vs desktop performance: Traffic breakdowns by device

Prerequisites

  • Python 3.11+
  • OnCrawl account with API access
  • OnCrawl API token (from your OnCrawl settings)

Installation

Option 1: Install from PyPI (Recommended)

pip install oncrawl-mcp-server

Then configure in Claude Desktop/Code (see Configuration).

Option 2: Install from Source

# Clone the repository
git clone https://github.com/Amaculus/oncrawl-mcp-server.git
cd oncrawl-mcp-server

# Create virtual environment
python -m venv .venv
source .venv/bin/activate    # Mac/Linux
# .venv\Scripts\activate     # Windows: run this instead of the line above

# Install dependencies
pip install -e .

Getting Your API Token

  1. Log into OnCrawl
  2. Go to Settings → API
  3. Create a new token with projects:read scope
  4. Copy the token

Configuration

For Claude Desktop

Windows: Edit %APPDATA%\Claude\claude_desktop_config.json

Mac: Edit ~/Library/Application Support/Claude/claude_desktop_config.json

{
  "mcpServers": {
    "oncrawl": {
      "command": "python",
      "args": ["-m", "oncrawl_mcp_server.server"],
      "env": {
        "ONCRAWL_API_TOKEN": "your-api-token-here"
      }
    }
  }
}
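
If the config file already lists other MCP servers, the `oncrawl` entry has to be merged in rather than pasted over the whole file. The following is a minimal sketch of that merge, assuming the config has the shape shown above. `add_oncrawl_server` is a hypothetical helper written for this illustration and is not part of the package, and the `other` server in the example is made up.

```python
import copy
import json


def add_oncrawl_server(config: dict, token: str, python_cmd: str = "python") -> dict:
    """Return a copy of `config` with the `oncrawl` server entry added or replaced."""
    merged = copy.deepcopy(config)  # leave the caller's dict untouched
    servers = merged.setdefault("mcpServers", {})
    servers["oncrawl"] = {
        "command": python_cmd,
        "args": ["-m", "oncrawl_mcp_server.server"],
        "env": {"ONCRAWL_API_TOKEN": token},
    }
    return merged


# Example: an existing config that already declares another (made-up) server.
existing = {"mcpServers": {"other": {"command": "node", "args": ["server.js"]}}}
updated = add_oncrawl_server(existing, "your-api-token-here")
print(json.dumps(updated, indent=2))
```

Working on a copy keeps the original dictionary available in case the merged result needs to be compared or discarded before it is written back to disk.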

For Claude Code (CLI)

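# Register the server with the Claude Code CLI
# (flag placement can differ between CLI versions; check `claude mcp add --help`)
claude mcp add oncrawl --env ONCRAWL_API_TOKEN=your-api-token-here -- python -m oncrawl_mcp_server.server

# Or manually edit your config with: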
{
  "mcpServers": {
    "oncrawl": {
      "command": "python",
      "args": ["-m", "oncrawl_mcp_server.server"],
      "env": {
        "ONCRAWL_API_TOKEN": "your-api-token-here"
      }
    }
  }
}

Important: Restart Claude Desktop/Code after changing the config.

Available Tools

Tool                             Purpose
oncrawl_list_projects            List all projects in a workspace
oncrawl_get_project              Get project details including crawl IDs and COC IDs
oncrawl_get_schema               Call first - discover available fields for a crawl
oncrawl_search_pages             Query pages with OQL filtering, sorting, pagination
oncrawl_search_links             Query internal link graph
oncrawl_aggregate                Group and count by any dimension with range support
oncrawl_export_pages             Full export without 10k limit
oncrawl_search_clusters          Find duplicate content clusters
oncrawl_search_structured_data   Audit schema markup
oncrawl_get_coc_schema           Discover fields for crawl-over-crawl comparison
oncrawl_search_coc               Find what changed between two crawls
oncrawl_aggregate_coc            Aggregate change patterns at scale

Usage Examples

Getting Started

"List my OnCrawl projects in workspace 5c015889451c956baf7ab7a9"

"Get the schema for crawl xyz789 - what fields are available?"

"Show me the first 10 pages from this crawl"

Technical SEO Analysis

"Find all pages at depth > 5 with fewer than 3 inlinks"

"Show me 404 pages that still have internal links pointing to them"

"What's the status code distribution for this crawl?"

"Find pages with missing meta descriptions"

GSC Integration

"Find orphan pages (0 internal links) that are getting clicks from Google"

"Show me pages with high impressions but low CTR (<2%)"

"Which pages buried deep in the site are getting significant traffic?"

"Compare mobile vs desktop traffic for the top landing pages"

Crawl-Over-Crawl Analysis

"Show me pages that changed status code between the last two crawls"

"Find pages that were added in the latest crawl"

"Which pages increased in depth between crawls?"

"Show me the status code distribution changes over time"

Detective Work

"I want you to act as a senior SEO analyst. Investigate crawl xyz789 for issues:
- Site structure problems
- Orphan page clusters
- Broken internal links
- Pages that should be linked better
- Anything else that looks problematic"

"Analyze this site for low-hanging SEO opportunities using both crawl and GSC data"

OQL Query Language

OnCrawl uses OQL (OnCrawl Query Language) for filtering. Here are the key operators:

Basic Operators

// Equals
{"field": ["status_code", "equals", 200]}

// Greater than / Less than
{"field": ["depth", "gt", "3"]}
{"field": ["follow_inlinks", "lt", "5"]}

// Contains
{"field": ["url", "contains", "/blog/"]}

// Starts with
{"field": ["urlpath", "startswith", "/products/"]}

// Has value / No value
{"field": ["canonical", "has_value", ""]}
{"field": ["canonical", "has_no_value", ""]}
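
When filters are assembled in a script rather than typed by hand, a small builder helps catch operator typos early. This is a sketch only: `field_filter` and `KNOWN_OPERATORS` are hypothetical names, and the operator set contains just the operators shown in this README (the OnCrawl API may accept others).

```python
# Operators that appear in this README; the API itself may support more.
KNOWN_OPERATORS = {
    "equals", "gt", "lt", "contains", "startswith", "has_value", "has_no_value",
}


def field_filter(name: str, operator: str, value="") -> dict:
    """Build one OQL field filter of the form {"field": [name, operator, value]}."""
    if operator not in KNOWN_OPERATORS:
        raise ValueError(f"unrecognized operator: {operator!r}")
    return {"field": [name, operator, value]}


print(field_filter("status_code", "equals", 200))
print(field_filter("canonical", "has_no_value"))  # value defaults to ""
```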

Combining Filters

// AND
{
  "and": [
    {"field": ["status_code", "equals", 200]},
    {"field": ["depth", "gt", "3"]},
    {"field": ["follow_inlinks", "lt", "5"]}
  ]
}

// OR
{
  "or": [
    {"field": ["status_code", "equals", 301]},
    {"field": ["status_code", "equals", 404]}
  ]
}

// Nested combinations
{
  "and": [
    {"field": ["status_code", "equals", 200]},
    {
      "or": [
        {"field": ["depth", "gt", "5"]},
        {"field": ["follow_inlinks", "equals", 0]}
      ]
    }
  ]
}
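
The nested example above can also be composed programmatically. The sketch below uses three hypothetical helpers (`field`, `all_of`, `any_of`) that do nothing more than wrap their arguments in the `field`, `and`, and `or` structures shown in this section. Value quoting mirrors the examples above.

```python
import json


def field(name: str, operator: str, value) -> dict:
    """Wrap a single condition in the {"field": [...]} form."""
    return {"field": [name, operator, value]}


def all_of(*filters: dict) -> dict:
    """Every filter must match (OQL "and")."""
    return {"and": list(filters)}


def any_of(*filters: dict) -> dict:
    """At least one filter must match (OQL "or")."""
    return {"or": list(filters)}


# Pages returning 200 that are either deep or have no followed inlinks.
query = all_of(
    field("status_code", "equals", 200),
    any_of(
        field("depth", "gt", "5"),
        field("follow_inlinks", "equals", 0),
    ),
)
print(json.dumps(query, indent=2))
```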

Regex Support

{"field": ["urlpath", "startswith", "/blog/[0-9]{4}/", {"regex": true}]}
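
A malformed pattern would otherwise only surface as an API error, so a local syntax check can save a round trip. The sketch below uses Python's `re` module for that check. It assumes the pattern syntax is close enough to Python's for the check to be useful; OnCrawl's regex dialect may differ, so a pattern that passes here is not guaranteed to be accepted. `regex_filter` is a hypothetical helper.

```python
import re


def regex_filter(name: str, operator: str, pattern: str) -> dict:
    """Build a regex-enabled OQL filter after a local syntax check of `pattern`."""
    try:
        re.compile(pattern)
    except re.error as exc:
        raise ValueError(f"invalid regular expression {pattern!r}: {exc}") from exc
    return {"field": [name, operator, pattern, {"regex": True}]}


print(regex_filter("urlpath", "startswith", "/blog/[0-9]{4}/"))
```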

Common Field Names

Crawl Fields

  • url, urlpath, depth, status_code
  • follow_inlinks, follow_outlinks
  • title, description, h1, canonical
  • content_length, load_time
  • indexability, is_compliant

GSC Fields (600+ available)

  • gsc_clicks, gsc_impressions, gsc_ctr, gsc_position
  • gsc_clicks_device_mobile, gsc_clicks_device_desktop
  • gsc_clicks_brand, gsc_clicks_nonbrand
  • gsc_impressions_device_mobile, etc.

Google Analytics Fields

  • google_analytics_users_seo
  • google_analytics_sessions_seo
  • google_analytics_engaged_sessions_seo
  • google_analytics_engagement_rate_seo

Pro tip: Always call oncrawl_get_schema first to see exactly which fields are available for your specific crawl.
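
Crawl and GSC fields can appear in the same filter. As an illustration, the "orphan pages with traffic" analysis could plausibly be expressed with the `follow_inlinks` and `gsc_clicks` fields listed above. This is a sketch, not the query Claude is guaranteed to generate: the value quoting mirrors the OQL examples in this README, and both field names should be confirmed against the schema of the crawl in question.

```python
import json

# Orphan pages with traffic: no followed inlinks AND at least one Google click.
orphans_with_clicks = {
    "and": [
        {"field": ["follow_inlinks", "equals", 0]},
        {"field": ["gsc_clicks", "gt", "0"]},
    ]
}

# Serialize to the JSON form used in the OQL examples.
print(json.dumps(orphans_with_clicks))
```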

GSC Integration

OnCrawl pulls in Google Search Console data automatically once GSC is connected to your account. GSC fields appear in the schema only while that integration is active.

How it works:

  • If GSC is connected: 600+ GSC fields are available in the schema
  • If GSC is not connected: GSC fields won't appear in the schema
  • Querying GSC fields without integration returns a 400 error
  • Fields with no data return 0 or null (not an error)

Detection: Check the schema first with oncrawl_get_schema to see if GSC fields are present.
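
The detection step amounts to a prefix check on the field names. The sketch below assumes the field names have already been extracted from the schema into a list of strings (the exact shape of the schema response is not shown in this README) and relies on the `gsc_` prefix used by the fields listed above. `has_gsc_fields` is a hypothetical helper.

```python
def has_gsc_fields(field_names) -> bool:
    """Return True if any schema field name carries the gsc_ prefix."""
    return any(name.startswith("gsc_") for name in field_names)


print(has_gsc_fields(["url", "depth", "gsc_clicks"]))  # GSC connected
print(has_gsc_fields(["url", "depth", "status_code"]))  # GSC not connected
```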

Troubleshooting

"ONCRAWL_API_TOKEN environment variable required"

  • Make sure the token is set in the env block of your MCP config
  • Restart Claude Desktop/Code after changing the config

"Unknown field" errors

  • Call oncrawl_get_schema first to see available fields
  • Field names are case-sensitive
  • GSC fields only appear if GSC integration is active
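
Because field names are case-sensitive, many "Unknown field" errors come down to a wrong letter case or a small typo. The sketch below suggests likely intended names using the standard library's `difflib`. It assumes the available field names have been collected from the schema into a list; `suggest_fields` is a hypothetical helper.

```python
import difflib


def suggest_fields(unknown: str, available: list[str], limit: int = 3) -> list[str]:
    """Suggest schema fields that `unknown` was probably meant to be."""
    # A case-only mismatch is the most common slip, so check it first.
    by_lower = {name.lower(): name for name in available}
    if unknown.lower() in by_lower:
        return [by_lower[unknown.lower()]]
    # Otherwise fall back to fuzzy matching on the raw names.
    return difflib.get_close_matches(unknown, available, n=limit, cutoff=0.6)


fields = ["status_code", "follow_inlinks", "follow_outlinks", "depth"]
print(suggest_fields("Status_Code", fields))
print(suggest_fields("follow_inlink", fields))
```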

API rate limits

  • The OnCrawl API enforces rate limits
  • If you receive HTTP 429 responses, reduce your request rate
  • Use exports for large datasets instead of pagination
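
For scripts that call the OnCrawl API directly, "slowing down" is commonly implemented as retry with exponential backoff. The following is a generic sketch of that technique and does not describe how this server handles 429 responses internally. `RateLimited` is a hypothetical exception that the caller's own request code would raise on a 429, and the delays are arbitrary defaults.

```python
import time


class RateLimited(Exception):
    """Hypothetical error raised by the caller's code on an HTTP 429 response."""


def with_backoff(call, max_attempts: int = 5, base_delay: float = 1.0, sleep=time.sleep):
    """Run call(); on RateLimited wait base_delay * 2**attempt, then retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # out of attempts, surface the error
            sleep(base_delay * 2 ** attempt)


# Demo with a fake call that is rate limited twice before succeeding.
attempts = {"count": 0}


def flaky():
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RateLimited()
    return "ok"


delays = []
result = with_backoff(flaky, sleep=delays.append)  # record delays instead of sleeping
print(result, delays)
```

Passing `sleep` as a parameter keeps the demo instant and makes the retry schedule easy to inspect.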

Tool not appearing in Claude

  • Verify that the python command in the config points to the interpreter (or virtual environment) where the package is installed
  • Check that oncrawl-mcp-server is installed
  • Look at Claude's logs for MCP connection errors
  • Restart Claude after config changes
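
A quick way to test the first two points is to run a short check with the same interpreter that the config's `command` refers to. The sketch below prints which interpreter is running and whether the `oncrawl_mcp_server` package (the module name used in the config's `args`) can be found. `is_installed` is a hypothetical helper.

```python
import importlib.util
import sys


def is_installed(module_name: str) -> bool:
    """Return True if a top-level module can be located by this interpreter."""
    return importlib.util.find_spec(module_name) is not None


print("interpreter:", sys.executable)
print("oncrawl_mcp_server importable:", is_installed("oncrawl_mcp_server"))
```

If the second line prints `False`, the package is installed for a different interpreter than the one Claude launches.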

Permission errors

  • Verify API token has projects:read scope
  • Check workspace/project/crawl IDs are correct

Development

Running Tests

# Set your API token
export ONCRAWL_API_TOKEN="your-token"

# Run the server directly
python -m oncrawl_mcp_server.server

# Test with a specific project
python test_full_mcp.py

Building from Source

# Install build tools
pip install build twine

# Build the package
python -m build

# Install locally
pip install -e .

Version History

0.2.0 (Latest)

  • Added 3 crawl-over-crawl (COC) tools
  • Total of 12 MCP tools
  • Full GSC integration documentation

0.1.0

  • Initial release with 9 core tools
  • Basic OnCrawl API integration

Contributing

Contributions welcome! Please feel free to submit a Pull Request.

License

MIT License - see LICENSE file for details.

Author

Antonio - GitHub
