GeoSpark

The Open-Source Geospatial Intelligence Protocol & Engine

Give any AI model a spatial mind. Open source. Run anywhere.


Current LLMs fail at spatial reasoning: in our benchmarks across five model families, four of the five score 0% on geodesic distance computation, and the mean on topological reasoning is ~48% (random chance on binary questions). GeoSpark fixes this.

The Problem

Ask any LLM: "Is the Louvre inside the 7th arrondissement of Paris?"

It will confidently guess — and get it wrong most of the time. LLMs have no geometric engine, no coordinate system awareness, and no way to verify spatial claims. They hallucinate distances, confuse containment with proximity, and silently swap lat/lon.

The Solution

GeoSpark gives AI models ground-truth spatial reasoning through a standardized protocol:

from geospark import Engine
from geospark.protocol import SpatialQuery, SpatialOperation

engine = Engine(tools=["geocoder", "terrain"])

# Geocode a location (not guessing — real coordinates)
result = engine.execute(SpatialQuery(
    operation=SpatialOperation.GEOCODE,
    metadata={"query": "Eiffel Tower, Paris"}
))

# Check spatial relationships (100% accurate, not LLM guessing)
from geospark.engine.spatial_reasoner import SpatialReasoner

park = {"type": "Polygon", "coordinates": [[[2.29, 48.85], [2.30, 48.85], [2.30, 48.86], [2.29, 48.86], [2.29, 48.85]]]}
point = {"type": "Point", "coordinates": [2.295, 48.855]}

SpatialReasoner.check_relationship(park, point, "contains")  # True — ground truth
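To make concrete what a "contains" check computes, here is a minimal pure-Python sketch of the classic ray-casting test. It is not GeoSpark's implementation; `point_in_ring` is a hypothetical helper, and it treats lon/lat as planar coordinates, which is only reasonable for small polygons like the one above.

```python
def point_in_ring(point, ring):
    """Ray-casting test: is (x, y) inside the closed ring of [x, y] vertices?"""
    x, y = point
    inside = False
    # Walk each edge (x1, y1) -> (x2, y2) of the closed ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Only edges that straddle the horizontal line through the point matter.
        if (y1 > y) != (y2 > y):
            # x position where the edge crosses that horizontal line.
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


park = {"type": "Polygon", "coordinates": [[[2.29, 48.85], [2.30, 48.85], [2.30, 48.86], [2.29, 48.86], [2.29, 48.85]]]}
point = {"type": "Point", "coordinates": [2.295, 48.855]}

inside = point_in_ring(point["coordinates"], park["coordinates"][0])
# A point just east of the park is close to it but not contained in it.
nearby = point_in_ring([2.305, 48.855], park["coordinates"][0])
print(inside, nearby)
```

The second call shows the containment-versus-proximity distinction from The Problem: the point is only a few hundred meters from the park, yet the predicate is false.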

Key Features

  • Autonomous Spatial Agents — Give a goal, get a complete analysis. No manual step-by-step. Three built-in agents:
    • GeoAgent — Multi-step spatial analysis from natural language ("Find hospitals near the Eiffel Tower")
    • SpatialReport — One-command location intelligence dossier (amenities, accessibility, elevation, narrative)
    • SiteSelector — Optimal location finding with multi-criteria scoring ("Best pharmacy spot in Zurich near hospitals and schools")
  • Spatial Reasoning Engine — Topology, geodesic distance, CRS transforms, buffering, area. All geometrically correct, not LLM-guessed.
  • MCP Server — 6 tools for Claude Desktop and any MCP-compatible AI assistant. pip install geospark-ai[mcp] && geospark-mcp
  • GeoSpark Bench — 535 benchmark questions across 5 benchmarks, evaluated on 5 LLM families. Without tools, four of five models score 0% on distance; with GeoSpark tools, Qwen 2.5 7B reaches 70%.
  • GeoSpark Protocol (GSP) — Standardized JSON protocol for spatial queries and results.
  • Pluggable Tools — Geocoding, satellite imagery (STAC), terrain/elevation with vertical datum awareness (NAVD88/EGM96/WGS84), routing, spectral indices.
  • GeoSpark Flows — DAG-based workflow automation with CLI and REST API.
  • Spatial Knowledge Graph — Entity-relation graph with OSM admin boundaries, BFS traversal, and natural language queries.
  • Zero-Cost Stack — Local Ollama (primary, no limits) + OpenRouter free tier (fallback, ~200 req/day) + Supabase free tier (500MB). When rate-limited, GeoSpark returns clear error messages with fallback suggestions.
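As an illustration of the DAG-based workflow idea behind GeoSpark Flows, the sketch below orders and runs dependent steps with the standard library's `graphlib`. The step names, the `flow` mapping, and `run_flow` are all hypothetical and do not reflect the Flows API.

```python
from graphlib import TopologicalSorter

# Hypothetical flow: each step maps to the set of steps it depends on.
flow = {
    "geocode": set(),
    "fetch_elevation": {"geocode"},
    "find_hospitals": {"geocode"},
    "build_report": {"fetch_elevation", "find_hospitals"},
}


def run_flow(flow, handlers):
    """Run every step after its dependencies, passing their results in."""
    results = {}
    for step in TopologicalSorter(flow).static_order():
        inputs = {dep: results[dep] for dep in flow[step]}
        results[step] = handlers[step](inputs)
    return results


executed = []


def make_handler(name):
    # Stand-in handler that records the call order and names its inputs.
    def handler(inputs):
        executed.append(name)
        return f"{name}({', '.join(sorted(inputs))})"
    return handler


handlers = {name: make_handler(name) for name in flow}
results = run_flow(flow, handlers)
print(executed)
print(results["build_report"])
```

A topological order guarantees that `geocode` runs before anything that needs coordinates, and that the report step only runs once both of its inputs exist.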

Quick Start

pip install geospark-ai

Autonomous Agents (the fastest way to use GeoSpark)

from geospark.agents import GeoAgent, SpatialReport, SiteSelector

# Autonomous spatial analysis — plans and executes multi-step workflows
agent = GeoAgent()
result = agent.run("Find all hospitals within 2km of the Eiffel Tower")
print(result.summary)  # "Within 2km of the Eiffel Tower, there are 3 hospitals..."

# Location intelligence dossier — one command, complete analysis
reporter = SpatialReport()
report = reporter.analyze("Federation Square, Melbourne")
print(report.accessibility)  # Nearest hospital, school, pharmacy with distances

# Optimal site selection — multi-criteria spatial scoring
selector = SiteSelector()
result = selector.find(within="Zurich", near=["hospital", "school"], facility_type="pharmacy")
print(result.best)  # Best-scoring location with explanation

As a Python library

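from geospark.engine.spatial_reasoner import SpatialReasoner

# Distance calculation (geodesic, not Euclidean)
SpatialReasoner.calculate_distance(
    {"type": "Point", "coordinates": [2.2945, 48.8584]},   # Eiffel Tower
    {"type": "Point", "coordinates": [2.3376, 48.8606]},   # Louvre
)
# Returns roughly 3,170 meters (geodesic distance)

# Two overlapping example polygons in GeoJSON [lon, lat] order
polygon_a = {"type": "Polygon", "coordinates": [[[2.29, 48.85], [2.31, 48.85], [2.31, 48.87], [2.29, 48.87], [2.29, 48.85]]]}
polygon_b = {"type": "Polygon", "coordinates": [[[2.30, 48.86], [2.32, 48.86], [2.32, 48.88], [2.30, 48.88], [2.30, 48.86]]]}

# Spatial relationship check (ground truth, not LLM-guessed)
SpatialReasoner.check_relationship(polygon_a, polygon_b, "intersects")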
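For readers who want to sanity-check a distance by hand, the haversine formula gives a spherical approximation. This is a sketch under that assumption, not GeoSpark's method: an ellipsoidal geodesic differs from it by up to about half a percent, and `haversine_m` is a hypothetical helper name.

```python
import math

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius; spherical approximation


def haversine_m(point_a, point_b):
    """Great-circle distance in meters between two GeoJSON points ([lon, lat])."""
    lon1, lat1 = point_a["coordinates"]
    lon2, lat2 = point_b["coordinates"]
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


eiffel_tower = {"type": "Point", "coordinates": [2.2945, 48.8584]}
louvre = {"type": "Point", "coordinates": [2.3376, 48.8606]}

distance_m = haversine_m(eiffel_tower, louvre)
print(f"{distance_m:.0f} m")
```

For these two points the result comes out a little under 3.2 km. Note that the coordinates are unpacked as `[lon, lat]`, matching GeoJSON order.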

As an MCP Server (for Claude Desktop)

pip install "geospark-ai[mcp]"  # quotes keep zsh from globbing the brackets
geospark-mcp  # Starts stdio MCP server with 6 spatial tools

Add the server to your Claude Desktop config file, claude_desktop_config.json (on macOS it lives in ~/Library/Application Support/Claude/, on Windows in %APPDATA%\Claude\):

{
  "mcpServers": {
    "geospark": { "command": "geospark-mcp" }
  }
}
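If the config file already lists other MCP servers, the entry has to be merged rather than pasted over them. The sketch below shows one way to do that with the standard library; `add_mcp_server` is a hypothetical helper, and the demo writes to a temporary directory instead of a real Claude Desktop config.

```python
import json
import tempfile
from pathlib import Path


def add_mcp_server(config_path, name, command):
    """Add or update one entry under "mcpServers", keeping existing entries."""
    path = Path(config_path)
    config = json.loads(path.read_text()) if path.exists() else {}
    config.setdefault("mcpServers", {})[name] = {"command": command}
    path.write_text(json.dumps(config, indent=2))
    return config


# Demo against a throwaway file that already contains another server.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "claude_desktop_config.json"
    demo_path.write_text(json.dumps({"mcpServers": {"other": {"command": "other-mcp"}}}))
    merged = add_mcp_server(demo_path, "geospark", "geospark-mcp")

print(json.dumps(merged, indent=2))
```

The helper assumes the file, if present, contains valid JSON; an empty file would need to be handled separately.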

Natural language spatial questions

from geospark import Engine

engine = Engine(tools=["geocoder", "terrain"])
result = engine.ask("How far is the Eiffel Tower from Big Ben?")
print(result.spatial_context.summary)
# Automatically geocodes both locations + computes geodesic distance

The engine tries local Ollama first (free, fast), then falls back to OpenRouter.
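The local-first, hosted-fallback behaviour can be pictured as a simple provider chain. This is a generic sketch of the pattern, not GeoSpark's internals: `ProviderUnavailable`, `ask_with_fallback`, and the two stand-in provider functions are hypothetical.

```python
class ProviderUnavailable(Exception):
    """Raised by a provider that is offline or rate-limited."""


def ask_with_fallback(prompt, providers):
    """Try each (name, callable) in order; return the first successful answer."""
    errors = []
    for name, call in providers:
        try:
            return name, call(prompt)
        except ProviderUnavailable as exc:
            errors.append(f"{name}: {exc}")
    # Surface every failure so the caller sees why nothing answered.
    raise RuntimeError("All providers failed: " + "; ".join(errors))


def local_ollama(prompt):
    # Stand-in for a local model that is not running.
    raise ProviderUnavailable("connection refused")


def openrouter(prompt):
    # Stand-in for a hosted fallback.
    return f"answer to: {prompt}"


used, answer = ask_with_fallback(
    "How far is the Eiffel Tower from Big Ben?",
    [("ollama", local_ollama), ("openrouter", openrouter)],
)
print(used, "->", answer)
```

Collecting the per-provider errors mirrors the README's point about returning clear messages when a provider is rate-limited.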

CLI

# Autonomous agents
geospark agent "Find all parks within 2km of Big Ben"
geospark report "Federation Square, Melbourne"
geospark site-select --within "Paris" --near "metro,schools" --facility restaurant

# Spatial tools
geospark geocode "Tokyo Tower, Japan"
geospark elevation 35.6586 139.7454
geospark distance 48.8566 2.3522 51.5074 -- -0.1278  # Paris → London
geospark ask "Is Tokyo closer to Seoul or Beijing?"
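Note that the CLI examples above take latitude first, while the GeoJSON points used in the Python examples are ordered [lon, lat]. A small guard like the following hypothetical helper (not part of GeoSpark) makes the conversion explicit and catches some swapped pairs:

```python
def to_geojson_point(lat, lon):
    """Convert a (lat, lon) pair to a GeoJSON Point, which stores [lon, lat]."""
    if not -90.0 <= lat <= 90.0:
        raise ValueError(f"latitude {lat} is outside [-90, 90]; were lat and lon swapped?")
    if not -180.0 <= lon <= 180.0:
        raise ValueError(f"longitude {lon} is outside [-180, 180]")
    return {"type": "Point", "coordinates": [lon, lat]}


tokyo_tower = to_geojson_point(35.6586, 139.7454)
print(tokyo_tower)
```

The range check only detects a swap when the longitude is larger than 90 in magnitude, as it is for Tokyo. For places such as Paris both orders pass, so the check is a safety net and not a proof.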

Try the Live API (no install needed)

Explore all 28 endpoints interactively at geospark.terrascout.app/docs

# Quick test
curl -X POST https://geospark.terrascout.app/api/v1/distance \
  -H "Content-Type: application/json" \
  -d '{"lat_a": 48.8566, "lon_a": 2.3522, "lat_b": 51.5074, "lon_b": -0.1278}'
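The same request can be built from Python with only the standard library. This sketch mirrors the curl call above; the function names are hypothetical, and because the response schema is not described here, `send` just returns the parsed JSON as-is.

```python
import json
import urllib.request

API_URL = "https://geospark.terrascout.app/api/v1/distance"


def build_distance_request(lat_a, lon_a, lat_b, lon_b):
    """Build the POST request that the curl example sends."""
    payload = {"lat_a": lat_a, "lon_a": lon_a, "lat_b": lat_b, "lon_b": lon_b}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10):
    """Send the request and return the decoded JSON body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


# Paris -> London, same values as the curl example.
request = build_distance_request(48.8566, 2.3522, 51.5074, -0.1278)
# print(send(request))  # uncomment to call the live API
```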

Run the Benchmark

# Run GeoSpark Bench on topological reasoning
python -m geospark.bench run --benchmark geotopo

# Run all benchmarks
python -m geospark.bench run

# List available benchmarks
python -m geospark.bench list

Architecture

┌─────────────────────────────────────────────────┐
│                   User / LLM                    │
│         (Claude, ChatGPT, Ollama, ...)          │
└──────────┬──────────────────────┬───────────────┘
           │ MCP                  │ REST API
           v                      v
┌──────────────────────────────────────────────────┐
│              GeoSpark Protocol (GSP)             │
│         Standardized JSON query/result           │
└──────────┬───────────────────────────────────────┘
           │
           v
┌──────────────────────────────────────────────────┐
│             Spatial Reasoning Engine             │
│  Topology · Distance · CRS · Buffer · Centroid   │
│  Planner · Cache · Temporal · Aggregator         │
└──────────┬───────────────────────────────────────┘
           │
    ┌──────┴──────┬──────────┬──────────┬──────────┐
    v             v          v          v          v
┌────────┐ ┌──────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Geocoder│ │Satellite │ │Terrain │ │Routing │ │Change  │
│        │ │(STAC,    │ │(Elev.) │ │(OSRM)  │ │Detect. │
│        │ │NDVI, EVI)│ │        │ │        │ │        │
└────────┘ └──────────┘ └────────┘ └────────┘ └────────┘
           │
    ┌──────┴──────┬──────────┬──────────┐
    v             v          v          v
┌────────┐ ┌──────────┐ ┌────────┐ ┌────────┐
│ Flows  │ │Knowledge │ │Plugins │ │Spatial │
│(DAG    │ │Graph     │ │(Commu- │ │RAG     │
│Runner) │ │(BFS,NL)  │ │nity)   │ │        │
└────────┘ └──────────┘ └────────┘ └────────┘
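The Knowledge Graph box mentions BFS traversal. As a rough picture of what that means for administrative containment, here is a breadth-first search over a tiny hand-written graph. The `contains` edges and the `containment_path` helper are illustrative assumptions, not GeoSpark's data model or API.

```python
from collections import deque

# Hypothetical "contains" edges between administrative areas and landmarks.
contains = {
    "France": ["Île-de-France"],
    "Île-de-France": ["Paris"],
    "Paris": ["1st arrondissement", "7th arrondissement"],
    "1st arrondissement": ["Louvre"],
    "7th arrondissement": ["Eiffel Tower"],
}


def containment_path(graph, start, target):
    """Breadth-first search; returns the chain from start to target, or None."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == target:
            return path
        for child in graph.get(node, []):
            if child not in visited:
                visited.add(child)
                queue.append(path + [child])
    return None


print(containment_path(contains, "France", "Louvre"))
print(containment_path(contains, "7th arrondissement", "Louvre"))
```

The second query answers the question from The Problem: there is no containment path from the 7th arrondissement to the Louvre, so the search returns `None`.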

Benchmark Results

GeoSpark Bench v1.0 — 535 questions across 5 benchmarks, evaluated on 5 LLM families (Qwen, Llama, Gemma, Mistral, Phi) via Ollama.

Baseline: LLM Alone (No Tools)

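| Benchmark | Qwen 2.5 7B | Llama 3.1 8B | Gemma 2 9B | Mistral 7B | Phi-3.5 3.8B | Mean |
|---|---|---|---|---|---|---|
| GeoDistance | 0% | 0% | 30% | 0% | 0% | 6% |
| GeoTopo | 45% | 50% | 50% | 50% | 45% | 48% |
| GeoChange | 90% | 65% | 80% | 85% | 75% | 79% |
| GeoReason | 85% | 65% | 90% | 75% | 70% | 77% |
| GeoMultimodal | 30% | 35% | 30% | 35% | 35% | 33% |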

With GeoSpark Tool Augmentation

| Benchmark | Qwen 2.5 7B | Llama 3.1 8B | Mistral 7B | Best improvement over baseline (percentage points) |
|---|---|---|---|---|
| GeoDistance | 70% | 10% | 0% | +70 |
| GeoReason | 100% | 65% | 80% | +15 |
| GeoTopo | 50% | 50% | 50% | +5 |

Key findings:

  • 0% on distance across 4/5 models — LLMs cannot compute geodesic distances from coordinates
  • 48% on topology — random chance on binary questions, confirming no spatial predicate capability
  • 79% on change detection — knowledge-based spatial reasoning works; the deficit is strictly computational
  • 70% with tools (Qwen 2.5 7B) — tool augmentation closes most of the distance gap for Qwen 2.5 7B, while Llama 3.1 8B (10%) and Mistral 7B (0%) gain little
  • 100% on reasoning (Qwen 2.5 7B) — structured prompting solves multi-step spatial chains
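The Mean and improvement figures can be reproduced directly from the two tables. The snippet below only re-does that arithmetic on the published numbers; it does not run the benchmark.

```python
MODELS = ["Qwen 2.5 7B", "Llama 3.1 8B", "Gemma 2 9B", "Mistral 7B", "Phi-3.5 3.8B"]

# Baseline accuracy (%) per benchmark, in MODELS order.
baseline = {
    "GeoDistance": [0, 0, 30, 0, 0],
    "GeoTopo": [45, 50, 50, 50, 45],
    "GeoChange": [90, 65, 80, 85, 75],
    "GeoReason": [85, 65, 90, 75, 70],
    "GeoMultimodal": [30, 35, 30, 35, 35],
}

# Accuracy (%) with GeoSpark tools, for the three models that were re-run.
with_tools = {
    "GeoDistance": {"Qwen 2.5 7B": 70, "Llama 3.1 8B": 10, "Mistral 7B": 0},
    "GeoReason": {"Qwen 2.5 7B": 100, "Llama 3.1 8B": 65, "Mistral 7B": 80},
    "GeoTopo": {"Qwen 2.5 7B": 50, "Llama 3.1 8B": 50, "Mistral 7B": 50},
}

# Mean column: plain average over the five models.
means = {bench: sum(scores) / len(scores) for bench, scores in baseline.items()}

# Best improvement: largest per-model gain in percentage points.
best_gain = {}
for bench, scores in with_tools.items():
    base = dict(zip(MODELS, baseline[bench]))
    best_gain[bench] = max(score - base[model] for model, score in scores.items())

print(means)
print(best_gain)
```

In all three tool-augmented benchmarks the best gain comes from Qwen 2.5 7B, which is why the key findings single that model out.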

Full results: Benchmark Report | Run your own: python -m geospark.bench run

Why GeoSpark?

| Problem | Without GeoSpark | With GeoSpark |
|---|---|---|
| "Is point A inside region B?" | LLM guesses (30% accuracy) | Ground-truth topology check (100%) |
| "How far is A from B?" | LLM can't compute (0% accuracy) | Geodesic calculation in meters (100%) |
| "What changed here since 2020?" | LLM hallucinates | Real satellite change detection |
| CRS confusion | Silent errors | Automatic detection & transformation |
| "Which landmark is closest?" | LLM guesses wrong (0%) | Exact nearest-neighbor computation (100%) |

Project Status

| Phase | Status | Tests | Description |
|---|---|---|---|
| Phase 0-3 — Foundation to Platform | Complete | 446 | Protocol, engine, tools, MCP, Bench, Flows, Knowledge Graph, Plugins |
| Phase 4 — Deployment | Complete | 446 | Live API, Docker, PyPI, Ollama, API auth, 5-model benchmarks |
| Phase 5 — Autonomous Agents | Complete | 446 | GeoAgent, SpatialReport, SiteSelector — CLI + REST API |

See CONTRIBUTING.md for development guidelines.

Development

# Clone and setup
git clone https://github.com/Maz2580/geospark.git
cd geospark
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Lint & format
ruff check geospark/ tests/
ruff format geospark/ tests/

# Type check
mypy geospark/

Contributing

See CONTRIBUTING.md for guidelines.

Live API

GeoSpark is deployed and accessible at geospark.terrascout.app — 28 endpoints with interactive Swagger documentation.

Author

Created by Mazdak Ghasemi Tootkaboni (University of Melbourne)

License

Apache 2.0 — Copyright 2024-2026 Mazdak Ghasemi Tootkaboni
