Ground-truth spatial reasoning and tool augmentation for LLMs. Open-source Geospatial Intelligence Protocol & Engine.

GeoSpark

The Open-Source Geospatial Intelligence Protocol & Engine

Give any AI model a spatial mind. Open source. Run anywhere.

Current LLMs fail at spatial reasoning: in our benchmarks across five model families, four of five models scored 0% on geodesic distance computation (mean 6%), and the mean on topological reasoning was ~48%, which is random chance on binary questions. GeoSpark fixes this.

The Problem

Ask any LLM: "Is the Louvre inside the 7th arrondissement of Paris?"

It will confidently guess — and get it wrong most of the time. LLMs have no geometric engine, no coordinate system awareness, and no way to verify spatial claims. They hallucinate distances, confuse containment with proximity, and silently swap lat/lon.

The Solution

GeoSpark gives AI models ground-truth spatial reasoning through a standardized protocol:

from geospark import Engine
from geospark.protocol import SpatialQuery, SpatialOperation

engine = Engine(tools=["geocoder", "terrain"])

# Geocode a location (not guessing — real coordinates)
result = engine.execute(SpatialQuery(
    operation=SpatialOperation.GEOCODE,
    metadata={"query": "Eiffel Tower, Paris"}
))

# Check spatial relationships (100% accurate, not LLM guessing)
from geospark.engine.spatial_reasoner import SpatialReasoner

park = {"type": "Polygon", "coordinates": [[[2.29, 48.85], [2.30, 48.85], [2.30, 48.86], [2.29, 48.86], [2.29, 48.85]]]}
point = {"type": "Point", "coordinates": [2.295, 48.855]}

SpatialReasoner.check_relationship(park, point, "contains")  # True — ground truth
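
To make the "contains" predicate concrete, here is a minimal, library-independent sketch of a ray-casting point-in-polygon test on GeoJSON-style coordinates. It illustrates the kind of check a geometry engine performs and is not GeoSpark's implementation: the function name `point_in_ring` is hypothetical, coordinates are treated as planar, holes are ignored, and points exactly on the boundary are not handled specially.

```python
def point_in_ring(point, ring):
    """Ray-casting test: is (x, y) inside a closed linear ring?"""
    x, y = point
    inside = False
    # Walk each edge (x1, y1) -> (x2, y2) of the closed ring.
    for (x1, y1), (x2, y2) in zip(ring, ring[1:]):
        # Does a horizontal ray from the point pass through this edge's y-span?
        if (y1 > y) != (y2 > y):
            # x-coordinate where the edge crosses the ray's height
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside  # each crossing to the right flips the state
    return inside


park = {"type": "Polygon", "coordinates": [[[2.29, 48.85], [2.30, 48.85], [2.30, 48.86], [2.29, 48.86], [2.29, 48.85]]]}
point = {"type": "Point", "coordinates": [2.295, 48.855]}

exterior = park["coordinates"][0]  # the first ring is the outer boundary
print(point_in_ring(point["coordinates"], exterior))  # True
print(point_in_ring([2.31, 48.855], exterior))        # False
```

A production engine would additionally handle holes, boundary cases, and coordinate reference systems, which is the reason to delegate the check rather than re-implement it.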

Key Features

Multi-Agent Coordinator — One command, right specialist. Intent classification routes your goal to the best agent automatically, with streaming progress and A2A messaging under the hood.
Autonomous Spatial Agents — Give a goal, get a complete analysis. No manual step-by-step. Three built-in agents:
- GeoAgent — Multi-step spatial analysis from natural language ("Find hospitals near the Eiffel Tower")
- SpatialReport — One-command location intelligence dossier (amenities, accessibility, elevation, narrative)
- SiteSelector — Optimal location finding with multi-criteria scoring ("Best pharmacy spot in Zurich near hospitals and schools")
Spatial Intelligence Memory — Dual memory system that learns across sessions:
- SpatialFact — time-agnostic truths ("Eiffel Tower is at 48.86 N, 2.29 E")
- SpatialEpisode — timestamped observations ("PM2.5 was 120 in Delhi on 2026-04-09")
- Vector-based recall with FAISS acceleration or numpy fallback
- Automatic contradiction detection between conflicting facts
- Auto-linking of related memories at cosine similarity > 0.6
Geospatial Context Database — Tiered storage for missions, datasets, and analysis history:
- L0/L1/L2 lazy loading: abstracts in the prompt, full data on demand
- Hotness scoring: sigmoid(log1p(access)) * exp(-decay * age) balances frequency + recency
- Hierarchical URIs like geospark://missions/melbourne_flood/analysis/2026-04
- Recursive parent-child score propagation for context retrieval
- Spatial bbox + temporal range filters, cold-context archival
Chat-to-Flow Builder — Turn a natural-language goal into a validated Flow DAG. ChatFlowSession drives an LLM through incremental builder tool calls (add_step, add_route, set_trigger, finish_flow); invalid calls surface to the LLM as errors and are corrected before the Flow is emitted. Available via geospark flow build "..." or POST /api/v1/flows/build.
Enterprise Middleware — Production hardening that shipped in Phase 8A: sliding-window rate limiting (per-IP and per-API-key with X-RateLimit-* headers), structured JSON-Lines audit logging with daily rotation, per-endpoint usage tracking with persisted counters, and transparent LRU+TTL caching for data channels.
Spatial Reasoning Engine — Topology, geodesic distance, CRS transforms, buffering, area. All geometrically correct, not LLM-guessed.
MCP Server — 6 tools for Claude Desktop and any MCP-compatible AI assistant. pip install geospark-ai[mcp] && geospark-mcp
GeoSpark Bench: 535 benchmark questions, 5 LLM families evaluated. Without tools, four of five models score 0% on distance; with GeoSpark tools, Qwen 2.5 7B reaches 70%. See Benchmark Results.
GeoSpark Protocol (GSP) — Standardized JSON protocol for spatial queries and results.
Live Data Channels — Pluggable real-time data sources:
- Weather (Open-Meteo) — current conditions + forecast for any location
- Air Quality (OpenAQ) — PM2.5, NO2, O3 from government stations with WHO health categories
- Active Fires (NASA FIRMS) — near-real-time satellite fire detections
Pluggable Tools — Geocoding, satellite imagery (STAC), terrain/elevation with vertical datum awareness (NAVD88/EGM96/WGS84), routing, spectral indices.
GeoSpark Flows — DAG-based workflow automation with persistence (Supabase), CLI, and REST API.
Spatial Knowledge Graph — Entity-relation graph with OSM admin boundaries, BFS traversal, and natural language queries.
Zero-Cost Stack — Local Ollama (primary, no limits) + OpenRouter free tier (fallback) + Supabase free tier. All inference on your hardware.
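
The sliding-window rate limiting mentioned under Enterprise Middleware can be sketched in a few lines. This is a minimal illustration of the general technique, not GeoSpark's middleware: the class name `SlidingWindowLimiter`, its parameters, and the injectable clock are assumptions made for the example.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` requests per `window_s` seconds for each key."""

    def __init__(self, limit, window_s, clock=time.monotonic):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock  # injectable so the example can use a fake clock
        self._hits = defaultdict(deque)  # key -> timestamps of accepted requests

    def allow(self, key):
        now = self.clock()
        hits = self._hits[key]
        # Drop timestamps that have slid out of the window.
        while hits and now - hits[0] >= self.window_s:
            hits.popleft()
        if len(hits) >= self.limit:
            return False
        hits.append(now)
        return True

    def remaining(self, key):
        """Requests left in the current window (what a remaining-quota header could report)."""
        now = self.clock()
        active = sum(1 for t in self._hits[key] if now - t < self.window_s)
        return self.limit - active


fake_now = [0.0]
limiter = SlidingWindowLimiter(limit=2, window_s=60, clock=lambda: fake_now[0])
results = [limiter.allow("203.0.113.7") for _ in range(3)]
print(results)                       # [True, True, False]
fake_now[0] = 61.0                   # advance past the window
print(limiter.allow("203.0.113.7"))  # True
```

The key can be an IP address or an API key, which is how one limiter class could serve both per-IP and per-API-key limits.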

Quick Start

pip install geospark-ai

Autonomous Agents (the fastest way to use GeoSpark)

from geospark.agents import GeoAgent, SpatialReport, SiteSelector

# Autonomous spatial analysis — plans and executes multi-step workflows
agent = GeoAgent()
result = agent.run("Find all hospitals within 2km of the Eiffel Tower")
print(result.summary)  # "Within 2km of the Eiffel Tower, there are 3 hospitals..."

# Location intelligence dossier — one command, complete analysis
reporter = SpatialReport()
report = reporter.analyze("Federation Square, Melbourne")
print(report.accessibility)  # Nearest hospital, school, pharmacy with distances

# Optimal site selection — multi-criteria spatial scoring
selector = SiteSelector()
result = selector.find(within="Zurich", near=["hospital", "school"], facility_type="pharmacy")
print(result.best)  # Best-scoring location with explanation

Multi-agent coordinator (Phase 7C)

from geospark.agents import AgentCoordinator

coord = AgentCoordinator()  # Auto-registers GeoAgent, SiteSelector, SpatialReport

# One entry point for any spatial goal — coordinator picks the right specialist
result = coord.run("Find the best location for a cafe in Melbourne CBD near schools")
print(f"Routed to: {result.agent_used}")   # site_selector (matched pattern)
print(result.summary)
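
The "matched pattern" comment above suggests pattern-based intent routing. As a rough sketch of how such routing could work (the patterns, the `route` function, and the agent names other than `site_selector` are hypothetical, not taken from GeoSpark's source):

```python
import re

# Hypothetical routing table: the first matching pattern wins.
ROUTES = [
    (re.compile(r"\b(best|optimal)\b.*\b(location|spot|site)\b", re.I), "site_selector"),
    (re.compile(r"\b(analy[sz]e|report|dossier)\b", re.I), "spatial_report"),
]
DEFAULT_AGENT = "geo_agent"  # general-purpose fallback


def route(goal):
    """Return the name of the agent whose pattern first matches the goal."""
    for pattern, agent in ROUTES:
        if pattern.search(goal):
            return agent
    return DEFAULT_AGENT


print(route("Find the best location for a cafe in Melbourne CBD near schools"))  # site_selector
print(route("Analyze Federation Square"))                                        # spatial_report
print(route("Find all parks within 2km of Big Ben"))                             # geo_agent
```

Ordered patterns with a general-purpose fallback keep routing predictable; a real classifier may combine this with an LLM call for goals that match nothing.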

Chat-to-flow builder (Phase 8B)

from geospark.flows import ChatFlowSession, FlowRunner, make_ollama_chat_fn

# LLM incrementally builds a validated Flow DAG via tool calls
session = ChatFlowSession(llm_fn=make_ollama_chat_fn("qwen2.5:7b"))
result = session.run("Geocode Valencia, Spain then check its elevation")

if result.flow is not None:
    print(f"Built flow with {len(result.flow.steps)} steps in {result.turns} turns")
    FlowRunner().run(result.flow)  # Execute the generated DAG
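
What "validated" might mean for a Flow DAG can be sketched with the standard library: every route must reference a known step, and the graph must be acyclic. The `validate_flow` helper and its arguments are assumptions for illustration, not GeoSpark's validator.

```python
from graphlib import CycleError, TopologicalSorter


def validate_flow(steps, routes):
    """Return an execution order, or raise ValueError describing the problem.

    steps:  iterable of step names
    routes: iterable of (source, target) pairs meaning "source runs before target"
    """
    steps = list(steps)
    known = set(steps)
    graph = {name: set() for name in steps}  # step -> set of predecessors
    for source, target in routes:
        for name in (source, target):
            if name not in known:
                raise ValueError(f"route references unknown step: {name!r}")
        graph[target].add(source)
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError as exc:
        # args[1] holds the detected cycle as a list of nodes
        raise ValueError(f"flow contains a cycle: {exc.args[1]}") from exc


print(validate_flow(["geocode", "elevation"], [("geocode", "elevation")]))
# ['geocode', 'elevation']
```

Error messages of this kind are what a builder loop could hand back to the LLM so that it can correct an invalid tool call on the next turn.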

Spatial intelligence memory (Phase 7A)

from geospark.memory import SpatialIntelligence

intel = SpatialIntelligence()

# Remember timeless facts and timestamped episodes
intel.remember_fact("Eiffel Tower is at 48.8584 N, 2.2945 E", source="user")
intel.remember_episode("PM2.5 was 120 in Delhi", importance=0.8, source="tool:air_quality")

# Vector-based recall with automatic scoring (similarity + recency + importance)
results = intel.recall("Paris landmarks", limit=5)

# Detect contradicting facts automatically
for c in intel.find_contradictions():
    print(f"Conflict: {c.fact_a_content}  vs.  {c.fact_b_content}")
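
Auto-linking at cosine similarity > 0.6 (see Key Features) can be illustrated without FAISS or numpy. The sketch below uses toy 3-dimensional vectors; the `auto_link` helper and the memory ids are hypothetical, and real embeddings would come from a text-embedding model.

```python
import math


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def auto_link(embeddings, threshold=0.6):
    """Return pairs of memory ids whose embeddings exceed the similarity threshold."""
    ids = sorted(embeddings)
    links = []
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            if cosine_similarity(embeddings[a], embeddings[b]) > threshold:
                links.append((a, b))
    return links


memories = {
    "eiffel_location": [0.9, 0.1, 0.0],
    "paris_landmarks": [0.8, 0.3, 0.1],
    "delhi_pm25": [0.0, 0.2, 0.9],
}
print(auto_link(memories))  # [('eiffel_location', 'paris_landmarks')]
```

The pairwise loop is quadratic in the number of memories, which is where an index such as FAISS becomes worthwhile.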

Geospatial context database (Phase 7B)

from geospark.context import ContextStore, ContextRetriever, GeoContext

store = ContextStore()
retriever = ContextRetriever(store)

# Save a mission with tiered content
store.save(GeoContext(
    uri="geospark://missions/melbourne_flood_2024",
    category="missions",
    name="Melbourne Flood 2024",
    abstract="Major flooding event in Melbourne CBD",  # L0 — always in prompt
    overview={"severity": "high", "area_km2": 42},      # L1 — loaded for context
    full_data={"affected_population": 15000},           # L2 — loaded on demand
    bounds_wgs84=[144.9, -37.9, 145.1, -37.7],
    tags=["flood", "melbourne", "disaster"],
))

# Retrieve with hierarchical scoring (semantic + hotness + parent propagation)
results, stats = retriever.retrieve(query="flood melbourne", limit=5)
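
The hotness term in that score follows the formula listed under Key Features, sigmoid(log1p(access)) * exp(-decay * age). A direct transcription is below; the function name, the choice of days as the age unit, and the default decay of 0.1 are assumptions for illustration.

```python
import math


def hotness(access_count, age_days, decay=0.1):
    """sigmoid(log1p(access_count)) * exp(-decay * age_days)."""
    frequency = 1.0 / (1.0 + math.exp(-math.log1p(access_count)))  # in [0.5, 1)
    recency = math.exp(-decay * age_days)                           # in (0, 1]
    return frequency * recency


print(hotness(0, 0))                                  # 0.5: never accessed, brand new
print(round(hotness(100, 30), 3))                     # frequently used but a month old
print(round(hotness(5, 1), 3))                        # lightly used but fresh
```

Because sigmoid(log1p(n)) simplifies to (1 + n) / (2 + n), the frequency term saturates quickly, so recency dominates for contexts that have not been touched in a while.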

As a Python library

from geospark.engine.spatial_reasoner import SpatialReasoner

# Distance calculation (geodesic on WGS84 ellipsoid, not Euclidean)
SpatialReasoner.calculate_distance(
    {"type": "Point", "coordinates": [2.2945, 48.8584]},   # Eiffel Tower
    {"type": "Point", "coordinates": [2.3376, 48.8606]},   # Louvre
)
# Returns: ~3,170 meters (geodesic distance for these coordinates)

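# Spatial relationship check (ground-truth, not LLM-guessed)
polygon_a = {"type": "Polygon", "coordinates": [[[0, 0], [2, 0], [2, 2], [0, 2], [0, 0]]]}
polygon_b = {"type": "Polygon", "coordinates": [[[1, 1], [3, 1], [3, 3], [1, 3], [1, 1]]]}  # overlaps polygon_a
SpatialReasoner.check_relationship(polygon_a, polygon_b, "intersects")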
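
For intuition about what a coordinate-based distance computation involves, here is a haversine sketch. Note the swap: haversine assumes a sphere, whereas the library call above is described as geodesic on the WGS84 ellipsoid, so results can differ by up to roughly half a percent. The function name `haversine_m` is hypothetical.

```python
import math

EARTH_MEAN_RADIUS_M = 6_371_008.8  # mean Earth radius (sphere, not the WGS84 ellipsoid)


def haversine_m(point_a, point_b):
    """Great-circle distance in meters between two GeoJSON Points ([lon, lat])."""
    lon1, lat1 = map(math.radians, point_a["coordinates"])
    lon2, lat2 = map(math.radians, point_b["coordinates"])
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_MEAN_RADIUS_M * math.asin(math.sqrt(a))


eiffel = {"type": "Point", "coordinates": [2.2945, 48.8584]}
louvre = {"type": "Point", "coordinates": [2.3376, 48.8606]}
print(f"{haversine_m(eiffel, louvre):.0f} m")
```

GeoJSON orders coordinates as longitude first, then latitude; unpacking them in the wrong order is exactly the silent lat/lon swap described earlier.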

Vertical Datum Awareness

from geospark.tools.terrain.vertical_datum import infer_datum, format_elevation_warning

# Infer vertical datum from elevation source
info = infer_datum("3dep")  # USGS 3DEP → NAVD88
print(info.datum, info.height_type)  # "NAVD88", "orthometric"

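# Warn when mixing datums (NAVD88 vs EGM96 differ on the order of a meter in the US;
# the ~17 m offset in Colorado is between these orthometric heights and WGS84 ellipsoidal heights)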
warning = format_elevation_warning("3dep", "srtm", "Denver, CO")
# "WARNING: Comparing elevations from '3dep' (NAVD88) and 'srtm' (EGM96)..."
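
Heights only compare cleanly when they share a reference surface. The standard relation between an ellipsoidal height h, an orthometric height H, and the geoid height N is H = h - N. The sketch below applies it with illustrative numbers; the function name and the sample values are assumptions, and a real workflow would look N up from a geoid model.

```python
def ellipsoidal_to_orthometric(h_ellipsoidal_m, geoid_height_m):
    """H = h - N: subtract the geoid height N from the ellipsoidal height h."""
    return h_ellipsoidal_m - geoid_height_m


# Illustrative numbers only: N is negative where the geoid lies below the ellipsoid.
h = 1592.0  # hypothetical GNSS (ellipsoidal) height in meters
N = -17.0   # hypothetical geoid height in meters
print(ellipsoidal_to_orthometric(h, N))  # 1609.0
```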

As an MCP Server (for Claude Desktop)

pip install geospark-ai[mcp]
geospark-mcp  # Starts stdio MCP server with 6 spatial tools

Add to your Claude Desktop config (~/Library/Application Support/Claude/claude_desktop_config.json on macOS, %APPDATA%\Claude\claude_desktop_config.json on Windows):

{
  "mcpServers": {
    "geospark": { "command": "geospark-mcp" }
  }
}

Natural language spatial questions

from geospark import Engine

engine = Engine(tools=["geocoder", "terrain"])
result = engine.ask("How far is the Eiffel Tower from Big Ben?")
print(result.spatial_context.summary)
# Automatically geocodes both locations + computes geodesic distance

The engine tries local Ollama first (free, fast) and falls back to OpenRouter.

CLI

# Multi-agent coordinator (routes to the right specialist automatically)
geospark multi-agent "Find the best cafe spot in Melbourne near schools"
geospark multi-agent "Analyze Federation Square" --stream   # Live progress
geospark agents                                              # List registered agents

# Autonomous agents (direct access)
geospark agent "Find all parks within 2km of Big Ben"
geospark report "Federation Square, Melbourne"
geospark site-select --within "Paris" --near "metro,schools" --facility restaurant

# Spatial intelligence memory
geospark memory recall "flood risk Melbourne"    # Vector-based recall
geospark memory contradictions                   # Find conflicting facts
geospark memory stats                            # FAISS + count info
geospark memory compact                          # Archive old episodes

# Geospatial context database
geospark context list                            # All stored contexts
geospark context show geospark://missions/flood  # View at L0/L1/L2
geospark context query "flood melbourne"         # Hierarchical retrieval
geospark context stats                           # Hottest contexts
geospark context archive-cold                    # Move cold to _archive/

# Spatial tools
geospark geocode "Tokyo Tower, Japan"
geospark elevation 35.6586 139.7454
geospark distance 48.8566 2.3522 51.5074 -- -0.1278  # Paris → London
geospark ask "Is Tokyo closer to Seoul or Beijing?"

# Live data channels
geospark data weather "Melbourne, Australia"     # Weather + forecast
geospark data air-quality "Delhi"                # PM2.5, NO2, O3
geospark data fires "Amazon Rainforest"          # Active fire detections
geospark data status                             # Check all channels

# Flow workflows
geospark flow list                     # List templates
geospark flow run distance_analysis    # Run a template
geospark flow build "Monitor NDVI in Valencia; alert if it drops below 0.3" --run

Try the Live API (no install needed)

Explore all 62+ endpoints interactively at geospark.terrascout.app/docs.

# Quick distance check
curl -X POST https://geospark.terrascout.app/api/v1/distance \
  -H "Content-Type: application/json" \
  -d '{"lat_a": 48.8566, "lon_a": 2.3522, "lat_b": 51.5074, "lon_b": -0.1278}'

# Build a flow from a natural-language goal (Phase 8B)
curl -X POST https://geospark.terrascout.app/api/v1/flows/build \
  -H "Content-Type: application/json" \
  -d '{"goal": "Geocode Valencia then check its elevation", "max_turns": 10}'
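
The distance call above can also be issued from Python with only the standard library. This sketch builds the same POST request as the first curl command; the helper name `build_distance_request` is hypothetical, and because the response schema is not documented here, the sending step is left commented out and would print the raw JSON.

```python
import json
import urllib.request

API_URL = "https://geospark.terrascout.app/api/v1/distance"


def build_distance_request(lat_a, lon_a, lat_b, lon_b):
    """Build (but do not send) the JSON POST request for the distance endpoint."""
    payload = {"lat_a": lat_a, "lon_a": lon_a, "lat_b": lat_b, "lon_b": lon_b}
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_distance_request(48.8566, 2.3522, 51.5074, -0.1278)  # Paris, London
print(request.get_method(), request.full_url)

# Uncomment to send the request over the network:
# with urllib.request.urlopen(request, timeout=10) as response:
#     print(json.loads(response.read()))
```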

Run the Benchmark

# Run GeoSpark Bench on topological reasoning
python -m geospark.bench run --benchmark geotopo

# Run all benchmarks
python -m geospark.bench run

# List available benchmarks
python -m geospark.bench list

Architecture

┌─────────────────────────────────────────────────┐
│                   User / LLM                    │
│         (Claude, ChatGPT, Ollama, ...)          │
└──────────┬──────────────────────┬───────────────┘
           │ MCP                  │ REST API (62+ endpoints)
           v                      v
┌──────────────────────────────────────────────────┐
│         Multi-Agent Coordinator (Phase 7C)       │
│   Intent classification · A2A msg · Streaming    │
└──────────┬───────────────────────────────────────┘
           │
           v
┌──────────────────────────────────────────────────┐
│           Autonomous Agents Layer                │
│  GeoAgent · SpatialReport · SiteSelector         │
└──────────┬───────────────────────────────────────┘
           │
           v
┌──────────────────────────────────────────────────┐
│         Spatial Intelligence (Phase 7A/B)        │
│  Facts + Episodes + Contradictions (VectorStore) │
│  Tiered Context DB · Hotness · Hierarchy         │
└──────────┬───────────────────────────────────────┘
           │
           v
┌──────────────────────────────────────────────────┐
│              GeoSpark Protocol (GSP)             │
│         Standardized JSON query/result           │
└──────────┬───────────────────────────────────────┘
           │
           v
┌──────────────────────────────────────────────────┐
│             Spatial Reasoning Engine             │
│  Topology · Distance · CRS · Buffer · Centroid   │
│  Planner · Cache · Temporal · Aggregator         │
└──────────┬───────────────────────────────────────┘
           │
    ┌──────┴──────┬──────────┬──────────┬──────────┐
    v             v          v          v          v
┌────────┐ ┌──────────┐ ┌────────┐ ┌────────┐ ┌────────┐
│Geocoder│ │Satellite │ │Terrain │ │Routing │ │Spectral│
│        │ │(STAC)    │ │+ Datum │ │(OSRM)  │ │Indices │
│        │ │          │ │Aware.  │ │        │ │        │
└────────┘ └──────────┘ └────────┘ └────────┘ └────────┘
           │
    ┌──────┴──────┬──────────┬──────────┐
    v             v          v          v
┌────────┐ ┌──────────┐ ┌────────┐ ┌────────┐
│ Flows  │ │Knowledge │ │Plugins │ │Spatial │
│+Persist│ │Graph+OSM │ │Plugins │ │Spatial │
│(Supa.) │ │(Admin)   │ │        │ │(Embed) │
└────────┘ └──────────┘ └────────┘ └────────┘

Benchmark Results

GeoSpark Bench v1.0 — 535 questions across 5 benchmarks, evaluated on 5 LLM families (Qwen, Llama, Gemma, Mistral, Phi) via Ollama.

Baseline: LLM Alone (No Tools)

Benchmark	Qwen 2.5 7B	Llama 3.1 8B	Gemma 2 9B	Mistral 7B	Phi-3.5 3.8B	Mean
GeoDistance	0%	0%	30%	0%	0%	6%
GeoTopo	45%	50%	50%	50%	45%	48%
GeoChange	90%	65%	80%	85%	75%	79%
GeoReason	85%	65%	90%	75%	70%	77%
GeoMultimodal	30%	35%	30%	35%	35%	33%
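
The Mean column can be recomputed from the per-model scores. A small check, using only the values from the table above:

```python
from statistics import mean

# Scores in column order: Qwen 2.5 7B, Llama 3.1 8B, Gemma 2 9B, Mistral 7B, Phi-3.5 3.8B
baseline = {
    "GeoDistance": [0, 0, 30, 0, 0],
    "GeoTopo": [45, 50, 50, 50, 45],
    "GeoChange": [90, 65, 80, 85, 75],
    "GeoReason": [85, 65, 90, 75, 70],
    "GeoMultimodal": [30, 35, 30, 35, 35],
}

means = {name: mean(scores) for name, scores in baseline.items()}
for name, value in means.items():
    print(f"{name}: {value:.0f}%")
```

The 6% mean on GeoDistance comes entirely from one model scoring 30%; the other four score 0%.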

With GeoSpark Tool Augmentation

Benchmark	Qwen 2.5 7B	Llama 3.1 8B	Mistral 7B	Improvement (best, percentage points)
GeoDistance	70%	10%	0%	+70
GeoReason	100%	65%	80%	+15
GeoTopo	50%	50%	50%	+5

Key findings:

0% on distance across 4/5 models — LLMs cannot compute geodesic distances from coordinates
48% on topology — random chance on binary questions, confirming no spatial predicate capability
79% on change detection — knowledge-based spatial reasoning works; the deficit is strictly computational
70% with tools (Qwen 2.5 7B): tool augmentation can close the computational gap, but the gain is model-dependent (Llama 3.1 8B reached 10%, Mistral 7B stayed at 0%)
100% on reasoning (Qwen 2.5 7B) — structured prompting solves multi-step spatial chains

Full results: Benchmark Report | Run your own: python -m geospark.bench run

Why GeoSpark?

Problem	Without GeoSpark	With GeoSpark
"Is point A inside region B?"	LLM guesses (~48% on GeoTopo, chance level)	Ground-truth topology check (100%)
"How far is A from B?"	LLM can't compute (0% for 4 of 5 models)	Geodesic calculation in meters (100%)
"What changed here since 2020?"	LLM hallucinates	Real satellite change detection
CRS confusion	Silent errors	Automatic detection & transformation
"Which landmark is closest?"	LLM guesses wrong (0%)	Exact nearest-neighbor computation (100%)

Project Status

Phase	Status	Tests	Description
Phase 0-3 — Foundation to Platform	Complete	441	Protocol, engine, tools, MCP, Bench, Flows, Knowledge Graph, Plugins
Phase 4 — Deployment	Complete	446	Live API, Docker, PyPI, Ollama, API auth, 5-model benchmarks
Phase 5 — Autonomous Agents	Complete	446	GeoAgent, SpatialReport, SiteSelector
Phase 6 — Data Channels	Complete	474	Weather, Air Quality, NASA Fires — free, real-time
Phase 7A — Spatial Memory	Complete	540	Facts + Episodes, VectorStore (FAISS), contradictions, auto-linking
Phase 7B — Context Database	Complete	589	Tiered L0/L1/L2 loading, hotness scoring, hierarchical retrieval
Phase 7C — Multi-Agent Coordination	Complete	657	Toolkit, A2A messaging, coordinator with streaming
Phase 7 UI — Guide & Pages	Complete	679	Onboarding guide, Memory/Context UI pages, Coordinator tab
Phase 8A — Enterprise Hardening	Complete	754	Rate limiting, audit logging, usage tracking, channel cache
Phase 8B — Chat-to-Flow Builder	Complete	776	Natural-language goal → validated Flow DAG via LLM tool calling

See CONTRIBUTING.md for development guidelines.

Development

# Clone and setup
git clone https://github.com/Maz2580/geospark.git
cd geospark
python -m venv .venv
source .venv/bin/activate  # or .venv\Scripts\activate on Windows
pip install -e ".[dev]"

# Run tests
pytest tests/ -v

# Lint & format
ruff check geospark/ tests/
ruff format geospark/ tests/

# Type check
mypy geospark/

Contributing

See CONTRIBUTING.md for guidelines.

Live API

GeoSpark is deployed and accessible at geospark.terrascout.app — 62+ endpoints with interactive Swagger documentation.

Author

Created by Mazdak Ghasemi Tootkaboni (University of Melbourne)

ORCID: 0000-0001-8084-5270
GitHub: @Maz2580

License

Release history

Version	Released
0.5.1 (this version)	Apr 22, 2026
0.5.0	Apr 4, 2026
0.4.0	Apr 4, 2026
0.3.0	Mar 31, 2026
0.2.0	Mar 30, 2026
0.1.0	Mar 23, 2026

