
Project description

research

Reusable SSAL, graph, network-loading, and route-evaluation utilities for Spatial Ninjas.

Package release: v0.1.0
Python package: spatial-ninjas-research
Import package: research

This repository contains the research and evaluation side of the LLM spatial-routing project. It provides the shared Python utilities used to convert routing-network data into SSAL, build an SSAL-derived graph, compute Dijkstra ground truth, and evaluate model-generated route responses.

The broader project compares GPT-family and Gemini-family models on route-generation tasks over an OpenStreetMap-derived Southern Helsinki network.

Relationship to llm-compare-dashboard

The project is split across two repositories:

  • llm-compare-dashboard: run prompts, compare OpenAI and Gemini outputs, store/export history, and inspect route results visually
  • research: provide reusable SSAL, graph, network-loading, route-evaluation, and offline history-evaluation utilities

For deployment, the dashboard should depend on the published spatial-ninjas-research package instead of requiring a sibling checkout of this repository. For local development, the dashboard can still install this repository as an editable package.

Project scope

The project studies how LLMs handle route-generation tasks when given a compact graph-like representation of a real street network instead of a standard map UI. The current reference map is a selected area of Southern Helsinki derived from OpenStreetMap. The current evaluation focuses on GPT and Gemini models.

Main evaluation concerns:

  • structured output correctness
  • candidate node-sequence validity
  • distance estimation quality
  • shortest-path agreement against SSAL-derived Dijkstra ground truth
  • robustness as route difficulty increases

Current workflow

GeoPackage
   ↓
SSAL text
   ↓
SSAL-derived graph
   ↓
Dijkstra ground truth
   ↓
shared route evaluator
   ↓
offline history evaluation helpers / CLI

Typical workflow:

  1. Prepare an OSM-derived routing network.
  2. Build the SSAL artifact from the GeoPackage input.
  3. Run routing prompts in llm-compare-dashboard.
  4. Export the dashboard history as JSON.
  5. Store the export in this repository.
  6. Evaluate the results with the reusable helpers and CLI here.
  7. Record summaries and notes for later review.

Installation

Install the released package

After the package is published, install it with:

python -m pip install spatial-ninjas-research==0.1.0

The PyPI distribution name is spatial-ninjas-research, while the Python import path remains research:

from research.ssal import gpkg_to_ssal
from research.graph import dijkstra_shortest_path
from research.evaluation import evaluate_route_response
from research.network_loader import load_network_bundle_from_gpkg

Verify the installed package with:

python - <<'PY'
import research
from research.ssal import gpkg_to_ssal
from research.history_evaluation import evaluate_entry_file

print(research.__version__)
print('research package ok')
PY

Develop this repository locally

Create and activate a virtual environment first.

macOS / Linux:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"

Runtime dependencies are declared in pyproject.toml. A pinned requirements.txt is generated from pyproject.toml with pip-compile; install from it when a locked dependency set is preferred:

python -m pip install -r requirements.txt

Development dependencies are exposed through the dev extra and mirrored by requirements-dev.txt:

python -m pip install -r requirements-dev.txt

Local editable install from the dashboard repo

The dashboard can use the published package for deployment, but local editable development is still useful when changing the research utilities and dashboard together.

Expected local folder layout:

spatial-ninjas/
  research/
  llm-compare-dashboard/

From inside llm-compare-dashboard:

source .venv/bin/activate
pip install -e ../research

This editable install should be treated as a development override. Deployment should use the versioned package from PyPI.

Environment configuration

Create a repo-root .env file if you want to override the default network inputs:

GPKG_PATH=data/raw/routing_networks/osm_southern_helsinki_slimmed_cropped.gpkg
EDGES_LAYER=slimmed_cropped_edges
NODES_LAYER=slimmed_cropped_nodes

See scripts/README.md for script-specific usage details.

Core components

SSAL conversion

The OSM road network is converted into a simplified semantic adjacency list to reduce token usage while keeping the routing structure that matters.

The reusable conversion logic lives in:

The CLI entry point for regenerating the versioned SSAL artifact is:

Stable generated SSAL text artifacts are intentionally versioned in this repo.
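To give a feel for what a semantic adjacency list looks like, here is a small parsing sketch. The `node: neighbor=length` line format below is invented for illustration only; the real SSAL syntax is defined by the package's conversion module.

```python
# Hypothetical SSAL-like adjacency text. The real format is defined by
# the package's conversion module; this is only an illustration.
ssal_text = """\
25291564: 25291567=120 25291570=85
25291567: 25291564=120
25291570: 25291564=85
"""

def parse_adjacency(text):
    """Parse 'node: neighbor=length ...' lines into a nested dict."""
    graph = {}
    for line in text.strip().splitlines():
        node, _, rest = line.partition(":")
        edges = {}
        for token in rest.split():
            neighbor, _, length = token.partition("=")
            edges[neighbor] = float(length)
        graph[node.strip()] = edges
    return graph

graph = parse_adjacency(ssal_text)
print(graph["25291564"])  # {'25291567': 120.0, '25291570': 85.0}
```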

SSAL graph and Dijkstra baseline

The SSAL-native graph layer lives in:

It provides:

  • immutable Edge objects
  • directed Graph objects
  • SSAL-to-graph parsing
  • target-only node preservation
  • path-length calculation
  • deterministic Dijkstra shortest-path search

This graph layer is the baseline used by the evaluator. The default evaluation path no longer compares against an external routing service.
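The determinism property is the important part: the same graph and endpoints must always yield the same ground-truth path. A minimal sketch of one way to achieve that (the toy graph and tie-breaking scheme are assumptions, not the package's implementation):

```python
import heapq

def dijkstra(graph, origin, destination):
    """Deterministic Dijkstra: ties on distance are broken by node id,
    so repeated runs over the same graph return the same path."""
    dist = {origin: 0.0}
    prev = {}
    heap = [(0.0, origin)]  # (distance, node); node id breaks ties
    seen = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in seen:
            continue
        seen.add(node)
        if node == destination:
            break
        # Iterate neighbors in sorted order for reproducibility.
        for neighbor, length in sorted(graph.get(node, {}).items()):
            nd = d + length
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if destination not in dist:
        return None, float("inf")
    path = [destination]
    while path[-1] != origin:
        path.append(prev[path[-1]])
    return path[::-1], dist[destination]

toy = {"A": {"B": 1.0, "C": 4.0}, "B": {"C": 2.0}, "C": {}}
path, length = dijkstra(toy, "A", "C")
print(path, length)  # ['A', 'B', 'C'] 3.0
```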

Shared route-response evaluation

The shared route evaluator lives in:

It handles:

  • JSON recovery from model responses
  • route path extraction
  • model-declared route-length extraction
  • candidate path validation against the SSAL-derived graph
  • Dijkstra ground-truth comparison
  • node and directed-edge overlap metrics
  • candidate-length and declared-length error metrics

This module contains the core route-evaluation logic. Offline history evaluation and future dashboard integration should call into this shared evaluator instead of duplicating parsing or metric code.
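The evaluator's main moves can be sketched in a few standalone functions. These are illustrations of the techniques named above, not the module's actual API; field names like `path` and `length_m` are assumptions.

```python
import json
import re

def recover_json(response_text):
    """Pull the first JSON object out of a free-form model response."""
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    return json.loads(match.group(0)) if match else None

def validate_path(graph, path):
    """Check every consecutive pair is a directed edge in the graph."""
    return all(b in graph.get(a, {}) for a, b in zip(path, path[1:]))

def edge_overlap(candidate, truth):
    """Fraction of ground-truth directed edges present in the candidate."""
    cand = set(zip(candidate, candidate[1:]))
    gold = set(zip(truth, truth[1:]))
    return len(cand & gold) / len(gold) if gold else 0.0

graph = {"A": {"B": 1.0}, "B": {"C": 2.0}, "C": {}}
reply = 'Here is the route: {"path": ["A", "B", "C"], "length_m": 3.1}'
data = recover_json(reply)  # also yields the model-declared length
print(validate_path(graph, data["path"]))           # True
print(edge_overlap(data["path"], ["A", "B", "C"]))  # 1.0
```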

Network loading

Network loading lives in:

It loads the GeoPackage network, generates SSAL text, computes the SSAL hash, and builds the graph used for evaluation.

The main container is NetworkBundle, which stores:

  • GeoPackage path
  • generated SSAL text
  • SSAL hash
  • SSAL-derived graph
  • edge layer name
  • node layer name

The graph used for evaluation is built from the generated SSAL text, not directly from the GeoPackage. This keeps the evaluated graph aligned with the representation shown to the model.
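A minimal sketch of a bundle along these lines, assuming the fields listed above and SHA-256 for the SSAL hash (the real class and hash algorithm may differ):

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class NetworkBundle:
    """Sketch of the bundle described above; field names follow the
    README, the package's real class may differ."""
    gpkg_path: str
    ssal_text: str
    ssal_hash: str
    graph: dict
    edges_layer: str
    nodes_layer: str

def ssal_sha256(ssal_text: str) -> str:
    # Hashing the generated SSAL text ties the evaluated graph to the
    # exact representation shown to the model.
    return hashlib.sha256(ssal_text.encode("utf-8")).hexdigest()

text = "A: B=1\nB: A=1\n"
bundle = NetworkBundle("net.gpkg", text, ssal_sha256(text), {}, "edges", "nodes")
print(bundle.ssal_hash[:8])
```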

Offline history evaluation

Dashboard/export history evaluation is handled by:

This module adapts exported route-history rows into calls to the shared SSAL-native evaluator. It also provides:

  • one-entry JSON loading
  • bulk history JSON loading
  • metadata preservation
  • explicit origin/destination route-context extraction
  • file-based one-entry and bulk evaluation wrappers
  • bulk summary aggregation
  • JSON output formatting and writing
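Bulk summary aggregation, for example, can be sketched as a fold over per-entry results. The `status`/`path_valid` field names below are illustrative, not the module's actual schema:

```python
def summarize(results):
    """Aggregate per-entry evaluation results into a bulk summary
    (field names are illustrative)."""
    evaluated = [r for r in results if r.get("status") == "evaluated"]
    skipped = [r for r in results if r.get("status") == "skipped"]
    valid = [r for r in evaluated if r.get("path_valid")]
    return {
        "total": len(results),
        "evaluated": len(evaluated),
        "skipped": len(skipped),
        "valid_path_rate": len(valid) / len(evaluated) if evaluated else 0.0,
    }

rows = [
    {"status": "evaluated", "path_valid": True},
    {"status": "evaluated", "path_valid": False},
    {"status": "skipped", "reason": "missing_route_context"},
]
print(summarize(rows))
# {'total': 3, 'evaluated': 2, 'skipped': 1, 'valid_path_rate': 0.5}
```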

The CLI entry point is intentionally thin:

Evaluation components

The current evaluation workflow is split between reusable helpers and CLI scripts:

  • research/history_evaluation.py adapts dashboard/export history rows into the shared evaluator, preserves metadata, evaluates one-entry or bulk history JSON files, summarizes results, and formats/writes JSON output
  • research/evaluation.py contains the shared route-response evaluator, including JSON recovery, path extraction, path validation, Dijkstra ground truth, and route metrics
  • research/network_loader.py loads the GeoPackage network, generates SSAL text, computes the SSAL hash, and builds the graph used for evaluation
  • research/graph.py contains the SSAL graph model and deterministic Dijkstra baseline
  • scripts/build_ssal.py builds the compact SSAL text artifact from the GeoPackage road-network input
  • scripts/evaluate_history.py is the thin CLI wrapper around research.history_evaluation
  • scripts/README.md documents script dependencies, configuration, and usage

LLM routing prototype

An earlier prototype script feeds SSAL data and a routing prompt to an LLM and expects a route in JSON format. It is kept for historical reference and is not treated as the main current workflow.

Current location:

Comparison interface and history

The project also uses a comparison interface in the separate dashboard repo for side-by-side model testing and persisted history. That history is later exported and analyzed here.

Repository layout

  • data/raw/routing_networks/ — OSM-derived GeoPackage inputs
  • data/derived/ssal/ — stable generated SSAL text artifacts
  • data/raw/llm_history_exports/ — exported dashboard history JSONs
  • research/ — reusable Python logic for SSAL conversion, graph utilities, network loading, route evaluation, and history-evaluation adapters
  • scripts/ — executable SSAL generation and evaluation scripts
  • results/summaries/ — experiment notes and summaries
  • archive/prototypes/ — older prototype scripts
  • tests/ — regression tests for graph, evaluation, network loading, history helpers, and CLI wiring

Common commands

Build the default SSAL artifact:

python scripts/build_ssal.py

Evaluate a dashboard history export:

python scripts/evaluate_history.py \
  --history-json data/raw/llm_history_exports/llm_compare_history_2026-04-20.json \
  --output results/summaries/history_evaluation.json

Evaluate one exported history row:

python scripts/evaluate_history.py \
  --entry-json data/raw/llm_history_exports/entry_example.json \
  --output results/summaries/entry_evaluation.json

Show script options:

python scripts/build_ssal.py --help
python scripts/evaluate_history.py --help

Testing

Install development dependencies:

python -m pip install -e ".[dev]"

Run the core regression tests:

python -m pytest \
  tests/test_history_evaluation.py \
  tests/test_evaluate_history_script.py \
  tests/test_network_loader.py \
  tests/test_evaluation.py \
  tests/test_graph.py \
  -v

The tests cover:

  • SSAL graph parsing and Dijkstra baseline behavior
  • route-response JSON extraction and path validation
  • route comparison metrics
  • GeoPackage-to-SSAL network bundle loading
  • offline history-entry and bulk-history evaluation helpers
  • CLI argument parsing and output wiring

Dashboard export requirements

Offline history evaluation expects exported dashboard route-history rows to include explicit top-level route metadata:

{
  "origin": "25291564",
  "destination": "25291567"
}

The fallback field names are also supported:

{
  "route_origin": "25291564",
  "route_destination": "25291567"
}

The evaluator intentionally does not parse origin/destination from prompt text. Rows without explicit route metadata are skipped with:

{
  "status": "skipped",
  "reason": "missing_route_context"
}
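The extraction rule above can be sketched as a small helper: primary fields first, fallback fields second, and an explicit skip when neither is present. The function name is illustrative, not the package's API.

```python
def extract_route_context(row):
    """Read origin/destination from top-level fields, falling back to
    route_origin/route_destination; prompt text is never parsed."""
    origin = row.get("origin") or row.get("route_origin")
    destination = row.get("destination") or row.get("route_destination")
    if origin is None or destination is None:
        return {"status": "skipped", "reason": "missing_route_context"}
    return {"origin": str(origin), "destination": str(destination)}

print(extract_route_context({"origin": "25291564", "destination": "25291567"}))
print(extract_route_context({"route_origin": "25291564",
                             "route_destination": "25291567"}))
print(extract_route_context({"prompt": "route from A to B"}))
```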

Current status

This repo reflects an evolving research workflow, not a finished software product.

Early experiment notes indicate:

  • GPT-family models sometimes produced partially correct routes and distance estimates
  • Gemini 2.5 Flash often failed to return the expected JSON format in earlier experiments
  • performance worsened on more difficult routes
  • output-format reliability was itself a major issue

The current evaluator is stricter than the early exploratory evaluator. It validates node paths against the SSAL-derived graph, recomputes candidate route length from graph edges, and compares the candidate route against Dijkstra ground truth.

Detailed chronology and test-by-test notes are kept in the supporting docs, summaries, and changelog rather than in the README.

Evaluation note

The current route evaluator is SSAL-native and, as described above, validates candidate paths directly against the SSAL-derived graph. The default offline evaluator no longer requires:

  • ORS_API_KEY
  • routingpy
  • OpenRouteService

See also

Download files

  • Source distribution: spatial_ninjas_research-0.1.0.tar.gz (40.4 kB)
  • Built distribution: spatial_ninjas_research-0.1.0-py3-none-any.whl (19.8 kB)

File details

spatial_ninjas_research-0.1.0.tar.gz

  • Size: 40.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2
  • SHA256: db4ab83f66e7b2b50e8ad9d9401420dfebd85b87023c2c1a8025a36ee3207738
  • MD5: 0372d5d47c7014e30ffd635e9abb4b4a
  • BLAKE2b-256: 48842942f3e1b84cbb1b7d0896e2bf3b8ed41b203fb62f7276a800289fe5ed3a

spatial_ninjas_research-0.1.0-py3-none-any.whl

  • Size: 19.8 kB
  • SHA256: 146fcba40adef546f7e28d825772d52f1b5564a88a2babcae4be5e4da23fe0f5
  • MD5: cada2659c0c7f8a94068cb6e3b0bd2ec
  • BLAKE2b-256: 2921fee229b276cb707c29391ba431d3c446eadb6528a5cf08cecedf70d56358
