MCP server for Source Cooperative auto-discovery and data exploration

These details have been verified by PyPI


Maintainers

Youssef-Harby

These details have not been verified by PyPI

Project description

Source Cooperative MCP Server

Discover and access 800TB+ of geospatial data through Claude and other AI agents.

An MCP (Model Context Protocol) server providing complete auto-discovery and data exploration for Source Cooperative - a collaborative open data repository hosting datasets from organizations like Maxar, Harvard Library, ESA, and USGS.

Architecture

graph TB
    Client[Claude Desktop / AI Agent]
    Server[Source Cooperative MCP Server<br/>8 Tools + obstore]
    API[HTTP API<br/>source.coop/api/v1<br/>Published Products Only]
    S3[S3 Bucket<br/>us-west-2.opendata.source.coop<br/>All Products + Files]

    Client <-->|JSON-RPC| Server
    Server -->|Rich Metadata| API
    Server -->|Complete Discovery| S3

    style Server fill:#4CAF50,stroke:#2E7D32,stroke-width:3px,color:#fff
    style S3 fill:#FF9800,stroke:#F57C00,stroke-width:2px,color:#fff
    style API fill:#2196F3,stroke:#1976D2,stroke-width:2px,color:#fff
    style Client fill:#9C27B0,stroke:#7B1FA2,stroke-width:2px,color:#fff
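As a rough illustration of the JSON-RPC link in the diagram, the sketch below builds the kind of `tools/call` request an MCP client sends to invoke one of this server's tools. The helper name `tool_call_request` is hypothetical. In practice the MCP client library builds and transports these messages for you.

```python
import json
from itertools import count

_ids = count(1)  # JSON-RPC ids only need to be unique within a session


def tool_call_request(tool: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": tool, "arguments": arguments},
    }
    return json.dumps(message)


# Ask the server to list published products for one account.
print(tool_call_request("list_products", {"account_id": "maxar"}))
```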

Why This Matters

Source Cooperative contains 800TB+ of valuable geospatial datasets, but discovering what's available requires knowing what to look for. This MCP server solves that by:

Auto-discovering all 92+ organizations and their datasets
Finding hidden products not visible in the official API
Providing analysis-ready S3 paths for immediate data access
Requiring no authentication, since all data is public

Quick Install

Option 1: uvx (Recommended)

Install directly for use with Claude Desktop:

# From PyPI
uvx source-coop-mcp

# Or from GitHub
uvx --from git+https://github.com/yharby/source-coop-mcp.git source-coop-mcp

Add to Claude Desktop (on macOS: ~/Library/Application Support/Claude/claude_desktop_config.json):

{
  "mcpServers": {
    "source-coop": {
      "command": "uvx",
      "args": ["source-coop-mcp"],
      "env": {
        "SOURCE_COOP_INCLUDE_README": "true"
      }
    }
  }
}

Environment Variables:

SOURCE_COOP_INCLUDE_README: Set to "true" to automatically include README content in all get_product_details() responses (default: "false")
- When enabled, every product details call will include a readme field with markdown content from the product root
- When disabled (default), README is not included unless explicitly requested via include_readme=True parameter

Restart Claude Desktop and you're ready!
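If you would rather script the config change than edit the file by hand, the following is a minimal sketch that merges the `source-coop` entry into an existing config without touching other servers. The helper names `add_source_coop` and `update_config_file` are hypothetical, and the sketch assumes the file, if present, already contains valid JSON.

```python
import json
from pathlib import Path


def add_source_coop(config: dict, include_readme: bool = True) -> dict:
    """Return a copy of a Claude Desktop config with the source-coop entry set."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))  # copy so the input is not mutated
    servers["source-coop"] = {
        "command": "uvx",
        "args": ["source-coop-mcp"],
        "env": {"SOURCE_COOP_INCLUDE_README": "true" if include_readme else "false"},
    }
    merged["mcpServers"] = servers
    return merged


def update_config_file(path: Path) -> None:
    """Rewrite the config file in place, starting from {} if it does not exist."""
    current = json.loads(path.read_text()) if path.exists() else {}
    path.write_text(json.dumps(add_source_coop(current), indent=2) + "\n")
```

Call `update_config_file` with the path to your `claude_desktop_config.json`, then restart Claude Desktop.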

Option 2: Development Install

For contributing or local development:

git clone https://github.com/yharby/source-coop-mcp.git
cd source-coop-mcp
uv sync

Add to Claude Desktop config:

{
  "mcpServers": {
    "source-coop": {
      "command": "uv",
      "args": [
        "--directory",
        "/absolute/path/to/source-coop-mcp",
        "run",
        "src/source_coop_mcp/server.py"
      ],
      "env": {
        "SOURCE_COOP_INCLUDE_README": "true"
      }
    }
  }
}

What You Can Do

Discover Data

List all organizations in Source Cooperative
→ Returns 92+ accounts including maxar, planet, harvard-lil, etc.

Find all datasets under the "vida" organization
→ Returns published products with titles and descriptions

Show me ALL products for youssef-harby (including unpublished)
→ Returns 5 products (API only shows 3!)

Access Files

List all files in harvard-lil/gov-data
→ Returns file listings with S3 URIs and HTTP URLs

Get metadata for youssef-harby/exiobase-3/goose-agent.yaml
→ Returns size, last modified, ETag without downloading

Show me the README for harvard-lil/gov-data
→ Returns markdown documentation

Search & Filter

Find all datasets about "climate" in the harvard-lil account
→ Fast search within specific account, returns ranked results

Show me all featured/curated datasets
→ Returns only datasets marked as featured by Source Cooperative

Performance Tip: Always provide account_id for fast searches (~500ms). Searching all accounts takes 30-60s.

Features

Complete Discovery

Unlike the Source Cooperative web UI, this server discovers:

✅ Published products (visible in API)
✅ Unpublished products (only in S3)
✅ All 92+ organizations
✅ Complete file inventories

Hybrid Architecture

Uses the best of both worlds:

HTTP API for rich metadata (titles, descriptions, dates)
S3 Direct for complete discovery and file access
Rust-backed obstore for faster S3 operations

Key Capabilities

Capability	Details
Organizations	92+ accounts (Maxar, Planet, Harvard, ESA, USGS, etc.)
Datasets	800TB+ of geospatial data
Performance	Up to 9x throughput vs boto3 for concurrent operations
Authentication	None required - all data is public
Unpublished Data	Discovers products not visible in API

8 Available Tools

Discovery

Tool	Purpose	Typical latency
`list_accounts()`	Find all 92+ organizations	~100ms
`list_products(account_id?, featured_only?)`	List published datasets	200-500ms
`list_products_from_s3(account_id, include_file_count?)`	List ALL datasets (including unpublished)	1-3s

Product Info

Tool	Purpose
`get_product_details(account_id, product_id, include_readme?)`	Get full metadata with optional README content from product root

README Integration:

Set include_readme=True to include README markdown content in the response
Or set SOURCE_COOP_INCLUDE_README=true env var to always include README
README is fetched from product root directory (case-insensitive: README.md, readme.md, etc.)
Returns readme field with: {found, content, size, path, filename, last_modified, url}
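The interaction between the parameter and the environment variable can be sketched as follows. This is not the server's actual code: the function name `should_include_readme` is hypothetical, and treating the variable's value case-insensitively is an assumption.

```python
import os
from typing import Mapping, Optional


def should_include_readme(
    include_readme: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> bool:
    """Decide whether a product details response should carry a readme field."""
    env = os.environ if env is None else env
    # Assumption: "true" is matched case-insensitively; anything else means off.
    always = env.get("SOURCE_COOP_INCLUDE_README", "false").strip().lower() == "true"
    # The env var forces README on; otherwise the per-call parameter decides.
    return always or include_readme
```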

File Operations

Tool	Purpose
`list_product_files(account_id, product_id, prefix?, max_files?)`	List files with S3/HTTP paths
`get_file_metadata(path)`	Get file info without downloading
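To make the "S3/HTTP paths" concrete, here is a small sketch of how the two locations for a file might be derived. The bucket name comes from the architecture diagram and the HTTP host from the README example URL in this document. The `s3://bucket/key` shape, the percent-encoding, and the helper name `file_locations` are assumptions.

```python
from urllib.parse import quote

S3_BUCKET = "us-west-2.opendata.source.coop"  # from the architecture diagram
HTTP_BASE = "https://data.source.coop"        # from the README example URL


def file_locations(account_id: str, product_id: str, relative_path: str) -> dict:
    """Build the object key plus S3 and HTTP locations for one file."""
    key = "/".join([account_id, product_id, relative_path.lstrip("/")])
    return {
        "key": key,
        "s3_uri": f"s3://{S3_BUCKET}/{key}",
        # quote() leaves "/" intact and escapes characters such as spaces.
        "http_url": f"{HTTP_BASE}/{quote(key)}",
    }


print(file_locations("fused", "overture", "README.md")["http_url"])
```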

Search

Tool	Purpose
`search_products(query, account_id?, search_in?)`	Search datasets by keywords
`get_featured_products()`	Get curated/highlighted datasets

Important: Published vs Unpublished

Source Cooperative has a key distinction:

Published Products (API visible):

Have titles, descriptions, metadata
Appear in web UI and API
Example: harvard-lil/gov-data

Unpublished Products (S3 only):

Uploaded to S3 but not registered in database
Files accessible but no API metadata
Not discoverable via HTTP API
Example: youssef-harby/exiobase-3

This is why we provide TWO discovery tools:

# HTTP API - Only published (3 products)
list_products("youssef-harby")

# S3 Direct - Everything (5 products)
list_products_from_s3("youssef-harby")

Always use list_products_from_s3() for complete discovery.
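The gap between the two tools is just a set difference. A minimal sketch, assuming you have already collected product IDs from both calls (the function name and the product IDs other than exiobase-3 are made up for illustration):

```python
def classify_products(api_ids, s3_ids) -> dict:
    """Split product IDs by whether the HTTP API knows about them."""
    api, s3 = set(api_ids), set(s3_ids)
    return {
        "published": sorted(api & s3),    # in both the API and S3
        "unpublished": sorted(s3 - api),  # files in S3, no API metadata
        "api_only": sorted(api - s3),     # registered but nothing found in S3
    }


# Hypothetical IDs: three published products plus two that exist only in S3.
from_api = ["product-a", "product-b", "product-c"]
from_s3 = ["product-a", "product-b", "product-c", "exiobase-3", "product-d"]
print(classify_products(from_api, from_s3)["unpublished"])
```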

Example Usage

Get Product Details with README

Get details for fused/overture with README content

The MCP server will:

Fetch product metadata from API (title, description, account info)
Search for README.md in product root
Include full markdown content in response

Response includes:

{
  "title": "Overture Maps - Fused-partitioned",
  "description": "...",
  "readme": {
    "found": true,
    "content": "# Overture - Fused-partitioned\n\n## Overview...",
    "size": 3448,
    "path": "fused/overture/README.md",
    "url": "https://data.source.coop/fused/overture/README.md"
  }
}
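The README lookup can be approximated as a case-insensitive filename match over the product root. The sketch below is illustrative only: the helper names are hypothetical, it assumes only `.md` variants count as a README, and the `{"found": False}` shape for a missing README is a guess.

```python
from typing import Iterable, Optional


def find_readme(root_filenames: Iterable[str]) -> Optional[str]:
    """Return the first root-level filename that matches README.md in any case."""
    for name in sorted(root_filenames):
        if name.lower() == "readme.md":
            return name
    return None


def readme_field(account_id: str, product_id: str, root_filenames: Iterable[str]) -> dict:
    """Build a partial readme field (content and size need the file itself)."""
    name = find_readme(root_filenames)
    if name is None:
        return {"found": False}  # assumed shape when no README exists
    path = f"{account_id}/{product_id}/{name}"
    return {
        "found": True,
        "filename": name,
        "path": path,
        "url": f"https://data.source.coop/{path}",
    }
```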

Find All Data for an Organization

Show me all products under "maxar" including file counts

The MCP server will:

Scan S3 directly for all products
Count files in each product
Return complete inventory
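The counting step can be sketched over a plain list of object keys. This assumes keys follow the `account/product/path` layout seen elsewhere in this document. The function name and sample keys are hypothetical, and the real server obtains its listing from S3 via obstore rather than from a Python list.

```python
from collections import Counter
from typing import Iterable


def count_files_by_product(keys: Iterable[str], account_id: str) -> dict:
    """Count objects per product for one account, given full object keys."""
    prefix = f"{account_id}/"
    counts = Counter()
    for key in keys:
        if not key.startswith(prefix):
            continue  # belongs to another account
        parts = key[len(prefix):].split("/", 1)
        # Skip objects sitting directly under the account with no product folder.
        if len(parts) == 2 and parts[1]:
            counts[parts[0]] += 1
    return dict(counts)


sample_keys = [
    "maxar/product-1/a.tif",
    "maxar/product-1/tiles/b.tif",
    "maxar/product-2/c.tif",
    "other-account/product-x/d.tif",
]
print(count_files_by_product(sample_keys, "maxar"))
```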

Access Unpublished Product

List files in youssef-harby/exiobase-3

Even though this product returns 404 from the API, the MCP server can still:

Access files via S3 direct
Return full file listings with URLs
Get file metadata

Search for Datasets

Find all datasets about "climate" in the harvard-lil account

The MCP server will:

Query published products from the account
Search titles, descriptions, and product IDs
Return ranked results by relevance score
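One plausible way to rank matches is a weighted substring score across the searched fields. The weights, function names, and sample records below are assumptions for illustration and are not the server's actual scoring.

```python
# Assumed weights: a title hit counts most, then the product ID, then the description.
WEIGHTS = {"title": 3, "product_id": 2, "description": 1}


def relevance(product: dict, query: str, fields=("title", "description", "product_id")) -> int:
    """Sum the weights of every searched field that contains the query."""
    q = query.lower()
    return sum(WEIGHTS[f] for f in fields if q in str(product.get(f, "")).lower())


def rank_products(products: list, query: str, fields=("title", "description", "product_id")) -> list:
    """Return matching products, highest score first (ties keep input order)."""
    scored = [(relevance(p, query, fields), p) for p in products]
    hits = [(s, p) for s, p in scored if s > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in hits]


# Made-up records, not real Source Cooperative metadata.
catalog = [
    {"product_id": "dataset-a", "title": "Archive", "description": "Includes climate records"},
    {"product_id": "climate-atlas", "title": "Climate Atlas", "description": "Maps"},
    {"product_id": "roads", "title": "Roads", "description": "Street network"},
]
print([p["product_id"] for p in rank_products(catalog, "climate")])
```

The `fields` argument plays the role of the `search_in` parameter: narrowing it to `("title",)` would drop matches that appear only in descriptions.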

Performance Note: Always specify account_id for fast searches (~500ms). Searching all accounts takes 30-60s.

Technology Stack

FastMCP 2.12.5+ - MCP server framework with lifecycle management
obstore 0.8.2+ - Rust-backed S3 client (faster than boto3)
httpx 0.28.1+ - Async HTTP client for API calls
Python 3.12+ - Modern Python with full async support

Performance

Operation	Latency	Notes
List accounts	~100ms	Full S3 bucket scan
List products (HTTP API)	200-500ms	Single account
List products (S3 direct)	1-3s	With file counts
Search products	200-500ms	With account_id specified
Search products (all)	30-60s	Without account_id (all 92+ accounts)
List files (1000 files)	500ms-2s	obstore performance
File metadata	~150ms	S3 head operation
README fetch	+200-300ms	Added to product details when enabled

Why so fast?

obstore runs its S3 operations in Rust and can return listings in Apache Arrow format:

9x throughput vs boto3 for concurrent operations
40% memory reduction
Zero-copy data structures

Development

Run Tests

# Test S3 discovery
uv run python tests/test_s3_discovery.py

# Compare API vs S3 results
uv run python tests/test_api_vs_obstore.py

# Test README integration
uv run python tests/test_simplified_readme_integration.py

# Test search functionality
uv run python tests/test_search.py

# Test obstore basics
uv run python tests/test_obstore.py

Code Quality

# Format code
uv run ruff format .

# Lint
uv run ruff check .

# Auto-fix issues
uv run ruff check --fix .

Test with MCP Inspector

npx @modelcontextprotocol/inspector uv run src/source_coop_mcp/server.py

Opens a web interface to test all tools interactively.

Documentation

ARCHITECTURE.md - System architecture, data flow diagrams, use cases
SOURCE_COOP_API.md - Complete API reference with curl examples and schemas
PUBLISHING.md - Guide for publishing to PyPI using Trusted Publishing
RELEASE_CHECKLIST.md - Quick release checklist
.github/workflows/README.md - CI/CD workflow documentation with diagrams

Troubleshooting

Server Not Appearing in Claude Desktop

Check config syntax is valid JSON
Restart Claude Desktop completely
Check logs in Claude Desktop developer tools
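For the first check, a short script can tell you whether the config parses and contains the expected entry. This is a sketch with a hypothetical helper name (`check_config`). It only checks the keys used in the examples in this document.

```python
import json


def check_config(text: str) -> list:
    """Return a list of problems found in a Claude Desktop config (empty if none)."""
    try:
        config = json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON at line {exc.lineno}, column {exc.colno}: {exc.msg}"]

    servers = config.get("mcpServers") if isinstance(config, dict) else None
    if not isinstance(servers, dict):
        return ['missing "mcpServers" object']
    entry = servers.get("source-coop")
    if entry is None:
        return ['no "source-coop" entry under "mcpServers"']
    if not isinstance(entry, dict) or "command" not in entry:
        return ['"source-coop" entry has no "command"']
    return []


# A trailing comma is a common mistake that strict JSON rejects.
print(check_config('{"mcpServers": {"source-coop": {"command": "uvx",}}}'))
```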

Products Not Found

Some products may be unpublished. Try:

Use list_products_from_s3 instead of list_products

This scans S3 directly and finds ALL products.

Slow Performance

For faster results:

# Instead of listing all accounts
list_products()  # 30-60s for all 92+ accounts

# Filter by account
list_products(account_id="maxar")  # 200-500ms

Requirements

Python: 3.12 or higher
uv: Modern Python package manager (provides the uvx command)
Claude Desktop: For MCP integration (optional)

License

MIT

Contributing

Contributions welcome! Please:

Read docs/ARCHITECTURE.md first
Run tests and linting before submitting
Keep documentation updated

Support

Issues: GitHub Issues
Docs: See docs/ directory
Source Cooperative: source.coop

Credits

Built with:

FastMCP - MCP server framework
obstore - Object storage client
Source Cooperative - Open geospatial data repository

Discover. Access. Analyze. Start exploring 800TB+ of open geospatial data through AI.


Release history

Version	Released
0.2.6	Nov 9, 2025
0.2.5	Nov 9, 2025
0.2.4	Nov 9, 2025
0.2.3	Nov 9, 2025
0.2.1	Oct 22, 2025
0.2.0	Oct 22, 2025
0.1.6	Oct 22, 2025
0.1.5	Oct 22, 2025
0.1.4	Oct 22, 2025
0.1.3	Oct 20, 2025
0.1.2 (this version)	Oct 19, 2025
0.1.1	Oct 18, 2025
0.1.0	Oct 18, 2025

Download files

Source Distribution

source_coop_mcp-0.1.2.tar.gz (94.1 kB), uploaded Oct 19, 2025, Source

Built Distribution

source_coop_mcp-0.1.2-py3-none-any.whl (13.2 kB), uploaded Oct 19, 2025, Python 3

File details

Details for the file source_coop_mcp-0.1.2.tar.gz.

File metadata

Download URL: source_coop_mcp-0.1.2.tar.gz
Upload date: Oct 19, 2025
Size: 94.1 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for source_coop_mcp-0.1.2.tar.gz
Algorithm	Hash digest
SHA256	`2ebc531104480480c1219e870170624859b25ea1be3adb30c3c08ddc90b2a9e3`
MD5	`30b0c2e102490620e7c57ad78e45b732`
BLAKE2b-256	`d872bfd70b7ffb21bb8c03cefb360c7393544f9682fb5a2e4582ec8c1d133e91`
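To check a downloaded file against the SHA256 digest listed above, something like the following sketch works with the standard library alone. The function names are hypothetical.

```python
import hashlib


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str, expected_hex: str) -> bool:
    """Compare a file's digest with a published one, ignoring case and whitespace."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, call `matches_published_digest("source_coop_mcp-0.1.2.tar.gz", "<SHA256 from the table>")` and expect `True` for an intact download.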


Provenance

The following attestation bundles were made for source_coop_mcp-0.1.2.tar.gz:

Publisher: publish.yml on yharby/source-coop-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: source_coop_mcp-0.1.2.tar.gz
- Subject digest: 2ebc531104480480c1219e870170624859b25ea1be3adb30c3c08ddc90b2a9e3
- Sigstore transparency entry: 622527120
- Sigstore integration time: Oct 19, 2025
Source repository:
- Permalink: yharby/source-coop-mcp@8fd4c813e58e74993b2cbe24697c53ed4f4da8fd
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/yharby
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8fd4c813e58e74993b2cbe24697c53ed4f4da8fd
- Trigger Event: release

File details

Details for the file source_coop_mcp-0.1.2-py3-none-any.whl.

File metadata

Download URL: source_coop_mcp-0.1.2-py3-none-any.whl
Upload date: Oct 19, 2025
Size: 13.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for source_coop_mcp-0.1.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`580deee2e94dcbba88e06aa8f8e97dfa1603193ca8a7e1255603a2eda2ff6756`
MD5	`0a6c78d91f8c09e22412c5b090a7fc3f`
BLAKE2b-256	`9b8d41c46d8d73eae19046bf1fe7056247f43aa25274389ff6c983b0cbee689d`


Provenance

The following attestation bundles were made for source_coop_mcp-0.1.2-py3-none-any.whl:

Publisher: publish.yml on yharby/source-coop-mcp

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: source_coop_mcp-0.1.2-py3-none-any.whl
- Subject digest: 580deee2e94dcbba88e06aa8f8e97dfa1603193ca8a7e1255603a2eda2ff6756
- Sigstore transparency entry: 622527125
- Sigstore integration time: Oct 19, 2025
Source repository:
- Permalink: yharby/source-coop-mcp@8fd4c813e58e74993b2cbe24697c53ed4f4da8fd
- Branch / Tag: refs/tags/v0.1.2
- Owner: https://github.com/yharby
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@8fd4c813e58e74993b2cbe24697c53ed4f4da8fd
- Trigger Event: release

source-coop-mcp 0.1.2
