MCP server for Source Cooperative auto-discovery and data exploration
Source Cooperative MCP Server
Discover and access 800TB+ of geospatial data through AI agents.
An MCP (Model Context Protocol) server for Source Cooperative - a collaborative repository with datasets from Maxar, Harvard, ESA, USGS, and 90+ organizations.
🏗️ Architecture Overview
graph TB
subgraph "AI Clients"
A1[Claude Desktop]
A2[Claude Code]
A3[Cursor]
A4[Cline]
A5[Zed]
A6[Continue.dev]
end
subgraph "MCP Server"
MCP[Source Cooperative MCP<br/>FastMCP + obstore]
end
subgraph "6 Available Tools"
T1[list_accounts<br/>94+ orgs]
T2[list_products<br/>hybrid S3+API]
T3[get_product_details<br/>+ README]
T4[list_product_files<br/>tree mode]
T5[get_file_metadata<br/>no download]
T6[search<br/>hybrid fuzzy]
end
subgraph "Data Sources"
S1[HTTP API<br/>source.coop/api]
S2[S3 Direct<br/>opendata.source.coop]
end
A1 -->|JSON-RPC| MCP
A2 -->|JSON-RPC| MCP
A3 -->|JSON-RPC| MCP
A4 -->|JSON-RPC| MCP
A5 -->|JSON-RPC| MCP
A6 -->|JSON-RPC| MCP
MCP --> T1
MCP --> T2
MCP --> T3
MCP --> T4
MCP --> T5
MCP --> T6
T1 --> S2
T2 --> S1
T2 --> S2
T3 --> S1
T3 --> S2
T4 --> S2
T5 --> S2
T6 --> S1
style MCP fill:#4CAF50,stroke:#2E7D32,stroke-width:3px,color:#fff
style S1 fill:#2196F3,stroke:#1976D2,stroke-width:2px,color:#fff
style S2 fill:#2196F3,stroke:#1976D2,stroke-width:2px,color:#fff
Key Features:
- ✅ Token Optimized - 72% reduction for large datasets
- ✅ Smart Partitions - Auto-detects Hive-style patterns
- ✅ Fuzzy Search - Handles typos and partial matches
- ✅ No Auth - All 800TB+ is public
🚀 Quick Start
Install
uvx source-coop-mcp
Configure Your AI Client
Claude Desktop / Claude Code / Cursor / Cline
Add to config file:
- Claude Desktop: ~/Library/Application Support/Claude/claude_desktop_config.json (macOS)
- Claude Code: VS Code settings.json
- Cursor: Cursor settings
- Cline: Cline MCP settings
{
"mcpServers": {
"source-coop": {
"command": "uvx",
"args": ["source-coop-mcp"]
}
}
}
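If the config file already lists other MCP servers, the entry above has to be merged in rather than pasted over them. The following is a minimal sketch of that merge; the helper name `add_source_coop_server` is hypothetical and not part of this package, and it assumes the client stores its servers under the `mcpServers` key shown above.

```python
import json
from pathlib import Path


def add_source_coop_server(config_path: Path) -> dict:
    """Merge the source-coop entry into a client config, keeping other servers.

    Hypothetical helper for illustration; not shipped with source-coop-mcp.
    """
    config = {}
    if config_path.exists():
        text = config_path.read_text(encoding="utf-8")
        if text.strip():
            config = json.loads(text)

    # Reuse the existing "mcpServers" mapping if there is one.
    servers = config.setdefault("mcpServers", {})
    servers["source-coop"] = {"command": "uvx", "args": ["source-coop-mcp"]}

    config_path.parent.mkdir(parents=True, exist_ok=True)
    config_path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Calling it with the path to your client's config file leaves existing servers untouched and adds or refreshes only the `source-coop` entry.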
Zed
Add to Zed settings:
{
"context_servers": {
"source-coop": {
"command": "uvx",
"args": ["source-coop-mcp"]
}
}
}
Continue.dev
Add to Continue config (~/.continue/config.json):
{
"experimental": {
"modelContextProtocolServers": [
{
"transport": {
"type": "stdio",
"command": "uvx",
"args": ["source-coop-mcp"]
}
}
]
}
}
Restart your AI client and start exploring!
🛠️ Available Tools
| Tool | Purpose | Performance |
|---|---|---|
| list_accounts() | Find all 94+ organizations | ~850ms |
| list_products() | Hybrid: S3 mode (default) for ALL datasets + file counts | ~240ms |
| list_products(include_unpublished=False) | API mode for published datasets with rich metadata | ~500ms |
| get_product_details() | Get metadata + README automatically | ~650ms |
| list_product_files() | List files with S3/HTTP paths | ~240ms |
| list_product_files(show_tree=True) | Tree view (72% token savings) | ~980ms |
| get_file_metadata() | Get file info without downloading | ~230ms |
| search(query) | Hybrid: Search accounts + products (published + unpublished), top 5 results | ~5-10s |
💡 What You Can Do
Discover Data
"List all organizations in Source Cooperative"
→ Returns 94+ organizations: maxar, planet, harvard, etc.
"Find all datasets for harvard-lil"
→ Discovers published + unpublished products
"Search for climate datasets"
→ Smart fuzzy search handles typos and partial matches
Access Files
"List files in harvard-lil/gov-data"
→ Returns S3 paths and HTTP URLs ready for analysis
"Show me the file tree with partition detection"
→ Smart visualization: year={2020,2021,...+5 more}/ [partitioned]
"Get file metadata without downloading"
→ Size, last modified, ETag
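To make the partition summary concrete, here is a small sketch of how Hive-style directories such as `year=2020/` could be collapsed into the one-line form shown above. It is an illustration under assumptions, not the server's actual implementation: the function name `summarize_partitions` and the `max_shown` parameter are made up for this example.

```python
import re
from collections import defaultdict

# Matches a single Hive-style path segment such as "year=2020".
PARTITION = re.compile(r"^([^=/]+)=([^/]+)$")


def summarize_partitions(dir_names: list[str], max_shown: int = 2) -> list[str]:
    """Collapse key=value directories into one summary line per key."""
    groups: dict[str, list[str]] = defaultdict(list)
    others: list[str] = []
    for name in dir_names:
        match = PARTITION.match(name.rstrip("/"))
        if match:
            groups[match.group(1)].append(match.group(2))
        else:
            others.append(name)  # not a partition, keep as-is

    lines = []
    for key, values in groups.items():
        values = sorted(values)
        shown = ",".join(values[:max_shown])
        extra = len(values) - max_shown
        if len(values) == 1:
            lines.append(f"{key}={values[0]}/")
        elif extra > 0:
            lines.append(f"{key}={{{shown},...+{extra} more}}/ [partitioned]")
        else:
            lines.append(f"{key}={{{shown}}}/ [partitioned]")
    return lines + others


if __name__ == "__main__":
    dirs = [f"year={y}/" for y in range(2020, 2027)] + ["README.md"]
    for line in summarize_partitions(dirs):
        print(line)
```

Seven `year=` directories become a single line, which is where the extra token savings for partitioned datasets come from.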
Smart Search
"Search for climte" (typo)
→ Finds "climate" datasets (fuzzy matching)
"Search for geo" (partial)
→ Finds "geospatial", "geocoding", etc.
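The two behaviours above (typo tolerance and partial matches) can be approximated with the standard library. The sketch below combines a substring check with `difflib.get_close_matches`; this is an assumption for illustration only, since the README does not say which matching algorithm the server uses, and `fuzzy_search` is a hypothetical name.

```python
from difflib import get_close_matches


def fuzzy_search(query: str, names: list[str], limit: int = 5, cutoff: float = 0.6) -> list[str]:
    """Return up to `limit` names that contain the query or closely resemble it."""
    q = query.lower()
    lowered = {name.lower(): name for name in names}

    # Partial matches: the query appears inside the name ("geo" in "geospatial").
    results = [original for low, original in lowered.items() if q in low]

    # Typo tolerance: similarity ratio above the cutoff ("climte" vs "climate").
    for low in get_close_matches(q, list(lowered), n=limit, cutoff=cutoff):
        if lowered[low] not in results:
            results.append(lowered[low])

    return results[:limit]


if __name__ == "__main__":
    catalog = ["climate", "geospatial", "geocoding", "buildings"]
    print(fuzzy_search("climte", catalog))
    print(fuzzy_search("geo", catalog))
```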
⚡ Features
| Feature | Description |
|---|---|
| Complete Discovery | Finds unpublished products the official API doesn't show |
| No Authentication | All 800TB+ data is public |
| Fast Performance | Rust-backed S3 client (9x faster than boto3) |
| Token Optimized | Tree mode: 72% token reduction for large datasets |
| Smart Partitions | Auto-detects patterns: year={2020,2021,...} |
| Fuzzy Search | Handles typos and partial matches |
| README Integration | Documentation automatically included |
| 800TB+ Data | 94+ organizations, geospatial datasets |
📋 Example Workflow
1. "List all organizations"
→ Get 94+ account names
2. "Show me all datasets from maxar"
→ Discover published + unpublished products
3. "Search for climate data"
→ Smart fuzzy search finds relevant datasets
4. "Get details for harvard-lil/gov-data"
→ Full metadata + README content
5. "List files in this dataset with tree view"
→ Token-optimized tree with partition detection
🎯 Why This Server?
Problem
Source Cooperative has 800TB+ of valuable data, but:
- Official API only shows published products
- No auto-discovery of organizations
- Requires knowing what you're looking for
Solution
This MCP server provides:
- ✅ Complete auto-discovery (published + unpublished)
- ✅ Smart search with fuzzy matching
- ✅ Direct S3 access for all files
- ✅ Token-optimized outputs (72% reduction)
- ✅ Smart partition detection (10-88% additional savings)
- ✅ README documentation included automatically
- ✅ No authentication required
📊 Performance
Most operations complete in under 1 second; search is the exception because it runs a recursive S3 scan:
list_accounts(): ~850ms (94+ organizations)
list_products(): ~240ms (S3 mode - ALL datasets + file counts)
list_products(include_unpublished=False): ~500ms (API mode - published with metadata)
list_product_files(): ~240ms (simple list)
list_product_files(show_tree=True): ~980ms (72% token savings)
get_file_metadata(): ~230ms (HEAD only)
search(query): ~5-10s (hybrid search - 1 recursive S3 scan, top 5 enriched)
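The "HEAD only" note for get_file_metadata() means the file body is never transferred. As a hedged sketch of the same idea outside the server, the snippet below issues an HTTP HEAD request with the standard library and reads size, last-modified time, and ETag from the response headers. The function names are hypothetical, and you would pass an HTTP URL returned by list_product_files().

```python
from urllib.request import Request, urlopen


def build_head_request(url: str) -> Request:
    """Create a request that asks for headers only, not the file body."""
    return Request(url, method="HEAD")


def file_metadata(url: str, timeout: float = 10.0) -> dict:
    """Fetch size, last-modified time, and ETag without downloading the file."""
    with urlopen(build_head_request(url), timeout=timeout) as response:
        headers = response.headers
        size = headers.get("Content-Length")
        return {
            "size": int(size) if size is not None else None,
            "last_modified": headers.get("Last-Modified"),
            "etag": headers.get("ETag"),
        }
```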
Token Optimization Impact
| Dataset Size | Without Tree | With Tree | Saved |
|---|---|---|---|
| 10 files | 1,500 tokens | 415 tokens | 72.3% |
| 100 files | 15,000 tokens | 4,150 tokens | 72.3% |
| 1,000 files | 150,000 tokens | 41,500 tokens | 72.3% |
With partition detection (1,000 partitions): 88% total savings!
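The "Saved" column follows directly from the two token counts in each row. A quick check, using a helper name (`token_savings`) introduced here only for illustration:

```python
def token_savings(without_tree: int, with_tree: int) -> float:
    """Percentage of tokens saved by tree mode, rounded to one decimal place."""
    return round(100 * (1 - with_tree / without_tree), 1)


if __name__ == "__main__":
    # Rows taken from the table above: (files, without tree, with tree).
    for files, flat, tree in [(10, 1_500, 415), (100, 15_000, 4_150), (1_000, 150_000, 41_500)]:
        print(f"{files} files: {token_savings(flat, tree)}% saved")
```

The table works out to about 150 tokens per file in the flat listing versus 41.5 per file in tree mode, which is why the percentage is the same in every row.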
🔧 Requirements
- Python: 3.11 or higher
- Package Manager: uv (installed automatically by uvx)
- Operating Systems: macOS, Linux, Windows
🤝 Development
See DEVELOPMENT.md for:
- Architecture details
- Testing instructions
- Contributing guidelines
- Performance benchmarks
- Token optimization details
📝 Support
- Issues: GitHub Issues
📄 License
MIT License - see LICENSE for details.