Reusable SSAL, graph, and route-evaluation utilities for Spatial Ninjas
research
Reusable SSAL, graph, network-loading, and route-evaluation utilities for Spatial Ninjas.
Package release: v0.1.0
Python package: spatial-ninjas-research
Import package: research
This repository contains the research and evaluation side of the LLM spatial-routing project. It provides the shared Python utilities used to convert routing-network data into SSAL, build an SSAL-derived graph, compute Dijkstra ground truth, and evaluate model-generated route responses.
The broader project compares GPT-family and Gemini-family models on route-generation tasks over an OpenStreetMap-derived Southern Helsinki network.
Relationship to llm-compare-dashboard
The project is split across two repositories:
- llm-compare-dashboard: run prompts, compare OpenAI and Gemini outputs, store/export history, and inspect route results visually
- research: provide reusable SSAL, graph, network-loading, route-evaluation, and offline history-evaluation utilities
For deployment, the dashboard should depend on the published spatial-ninjas-research package instead of requiring a sibling checkout of this repository. For local development, the dashboard can still install this repository as an editable package.
Project scope
The project studies how LLMs handle route-generation tasks when given a compact graph-like representation of a real street network instead of a standard map UI. The current reference map is a selected area of Southern Helsinki derived from OpenStreetMap. The current evaluation focuses on GPT and Gemini models.
Main evaluation concerns:
- structured output correctness
- candidate node-sequence validity
- distance estimation quality
- shortest-path agreement against SSAL-derived Dijkstra ground truth
- robustness as route difficulty increases
Current workflow
GeoPackage
↓
SSAL text
↓
SSAL-derived graph
↓
Dijkstra ground truth
↓
shared route evaluator
↓
offline history evaluation helpers / CLI
Typical workflow:
- Prepare an OSM-derived routing network.
- Build the SSAL artifact from the GeoPackage input.
- Run routing prompts in llm-compare-dashboard.
- Export the dashboard history as JSON.
- Store the export in this repository.
- Evaluate the results with the reusable helpers and CLI here.
- Record summaries and notes for later review.
Installation
Install the released package
After the package is published, install it with:
python -m pip install spatial-ninjas-research==0.1.0
The PyPI distribution name is spatial-ninjas-research, while the Python import path remains research:
from research.ssal import gpkg_to_ssal
from research.graph import dijkstra_shortest_path
from research.evaluation import evaluate_route_response
from research.network_loader import load_network_bundle_from_gpkg
Verify the installed package with:
python - <<'PY'
import research
from research.ssal import gpkg_to_ssal
from research.history_evaluation import evaluate_entry_file
print(research.__version__)
print('research package ok')
PY
Develop this repository locally
Create and activate a virtual environment first.
macOS / Linux:
python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"
Runtime dependencies are declared in pyproject.toml. The pinned requirements.txt is generated from pyproject.toml with pip-compile and can be used when a locked dependency set is preferred:
python -m pip install -r requirements.txt
Development dependencies are exposed through the dev extra and mirrored by requirements-dev.txt:
python -m pip install -r requirements-dev.txt
Local editable install from the dashboard repo
The dashboard can use the published package for deployment, but local editable development is still useful when changing the research utilities and dashboard together.
Expected local folder layout:
spatial-ninjas/
  research/
  llm-compare-dashboard/
From inside llm-compare-dashboard:
source .venv/bin/activate
pip install -e ../research
This editable install should be treated as a development override. Deployment should use the versioned package from PyPI.
Environment configuration
Create a repo-root .env file if you want to override the default network inputs:
GPKG_PATH=data/raw/routing_networks/osm_southern_helsinki_slimmed_cropped.gpkg
EDGES_LAYER=slimmed_cropped_edges
NODES_LAYER=slimmed_cropped_nodes
See scripts/README.md for script-specific usage details.
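As a minimal sketch of how these overrides could be resolved, the snippet below reads the three variables from a mapping and falls back to defaults. The names `NetworkSettings` and `load_network_settings` are hypothetical, and the defaults simply mirror the example values above; the package's actual configuration code may differ. It also does not parse the .env file itself, so the file would need to be loaded into the environment first.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

# Assumption: these defaults mirror the example values shown in this README.
DEFAULTS = {
    "GPKG_PATH": "data/raw/routing_networks/osm_southern_helsinki_slimmed_cropped.gpkg",
    "EDGES_LAYER": "slimmed_cropped_edges",
    "NODES_LAYER": "slimmed_cropped_nodes",
}


@dataclass(frozen=True)
class NetworkSettings:  # hypothetical name, not part of the package API
    gpkg_path: str
    edges_layer: str
    nodes_layer: str


def load_network_settings(env: Optional[Mapping[str, str]] = None) -> NetworkSettings:
    """Resolve network inputs from a mapping, falling back to DEFAULTS."""
    source = os.environ if env is None else env

    def pick(key: str) -> str:
        # Treat unset and blank values the same way: use the default.
        value = source.get(key, "").strip()
        return value or DEFAULTS[key]

    return NetworkSettings(
        gpkg_path=pick("GPKG_PATH"),
        edges_layer=pick("EDGES_LAYER"),
        nodes_layer=pick("NODES_LAYER"),
    )
```

Passing an explicit mapping, for example `load_network_settings({"EDGES_LAYER": "my_edges"})`, keeps the lookup easy to test without touching the real environment.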
Core components
SSAL conversion
The OSM road network is converted into a simplified semantic adjacency list to reduce token usage while keeping the routing structure that matters.
The reusable conversion logic lives in the research.ssal module.
The CLI entry point for regenerating the versioned SSAL artifact is scripts/build_ssal.py.
Stable generated SSAL text artifacts are intentionally versioned in this repo.
SSAL graph and Dijkstra baseline
The SSAL-native graph layer lives in research/graph.py.
It provides:
- immutable Edge objects
- directed Graph objects
- SSAL-to-graph parsing
- target-only node preservation
- path-length calculation
- deterministic Dijkstra shortest-path search
This graph layer is the baseline used by the evaluator. The default evaluation path no longer compares against an external routing service.
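To illustrate the ideas listed above, here is a self-contained sketch of an immutable edge type, a directed graph that keeps target-only nodes, and a Dijkstra search with a deterministic tie-break. It is not the code in research/graph.py: the class and function names, the field names, and the tie-breaking rule (lexicographic order of the node path) are all assumptions made for this example.

```python
import heapq
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional, Tuple


@dataclass(frozen=True)
class Edge:
    """Immutable directed edge (field names are illustrative)."""
    source: str
    target: str
    length: float


class Graph:
    """Directed graph stored as an adjacency mapping."""

    def __init__(self, edges: Iterable[Edge]) -> None:
        self.adjacency: Dict[str, List[Edge]] = {}
        for edge in edges:
            self.adjacency.setdefault(edge.source, []).append(edge)
            # Keep nodes that only ever appear as a target.
            self.adjacency.setdefault(edge.target, [])


def dijkstra(graph: Graph, origin: str, destination: str) -> Optional[Tuple[float, List[str]]]:
    """Return (distance, path), or None when no route exists.

    Heap entries are (distance, path), so equal distances are ordered by
    the path itself, which makes the result reproducible.
    """
    if origin not in graph.adjacency or destination not in graph.adjacency:
        return None
    heap: List[Tuple[float, List[str]]] = [(0.0, [origin])]
    settled = set()
    while heap:
        distance, path = heapq.heappop(heap)
        node = path[-1]
        if node in settled:
            continue
        settled.add(node)
        if node == destination:
            return distance, path
        for edge in graph.adjacency[node]:
            if edge.target not in settled:
                heapq.heappush(heap, (distance + edge.length, path + [edge.target]))
    return None


demo = Graph([Edge("A", "B", 1.0), Edge("B", "C", 2.0), Edge("A", "C", 5.0)])
print(dijkstra(demo, "A", "C"))  # (3.0, ['A', 'B', 'C'])
```

Storing the whole path in each heap entry is wasteful on large networks, but it keeps the sketch short and gives the tie-break for free.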
Shared route-response evaluation
The shared route evaluator lives in research/evaluation.py.
It handles:
- JSON recovery from model responses
- route path extraction
- model-declared route-length extraction
- candidate path validation against the SSAL-derived graph
- Dijkstra ground-truth comparison
- node and directed-edge overlap metrics
- candidate-length and declared-length error metrics
This module contains the core route-evaluation logic. Offline history evaluation and future dashboard integration should call into this shared evaluator instead of duplicating parsing or metric code.
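The first steps in that list, JSON recovery and path/length extraction, can be sketched with the standard library alone. This is an illustration rather than the package's implementation: the function names are hypothetical, and the response keys `path` and `length_m` are placeholders, since this README does not specify the expected response schema.

```python
import json
import re
from typing import Any, Dict, List, Optional, Tuple


def recover_json_object(text: str) -> Optional[Dict[str, Any]]:
    """Return the first decodable JSON object embedded in free text."""
    decoder = json.JSONDecoder()
    for match in re.finditer(r"\{", text):
        try:
            # raw_decode parses one JSON value and ignores trailing text.
            value, _end = decoder.raw_decode(text[match.start():])
        except json.JSONDecodeError:
            continue
        if isinstance(value, dict):
            return value
    return None


def extract_route(
    payload: Dict[str, Any],
    path_key: str = "path",        # hypothetical key name
    length_key: str = "length_m",  # hypothetical key name
) -> Tuple[Optional[List[str]], Optional[float]]:
    """Pull the node sequence and the model-declared length out of a payload."""
    raw_path = payload.get(path_key)
    # Node IDs are normalised to strings so 25291564 and "25291564" match.
    path = [str(node) for node in raw_path] if isinstance(raw_path, list) else None
    try:
        length = float(payload[length_key])
    except (KeyError, TypeError, ValueError):
        length = None
    return path, length
```

Scanning from each opening brace lets the sketch tolerate chatty preambles and trailing remarks around the JSON, which is one of the output-format problems a recovery step has to absorb.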
Network loading
Network loading lives in research/network_loader.py.
It loads the GeoPackage network, generates SSAL text, computes the SSAL hash, and builds the graph used for evaluation.
The main container is NetworkBundle, which stores:
- GeoPackage path
- generated SSAL text
- SSAL hash
- SSAL-derived graph
- edge layer name
- node layer name
The graph used for evaluation is built from the generated SSAL text, not directly from the GeoPackage. This keeps the evaluated graph aligned with the representation shown to the model.
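A rough sketch of such a container is shown below. It is not the real NetworkBundle: the field names are guesses based on the list above, the choice of SHA-256 for the SSAL hash is an assumption, and the SSAL parser is passed in as a callable because the SSAL text format is not described in this README.

```python
import hashlib
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple

# Illustrative graph shape: node -> list of (neighbour, length) pairs.
Adjacency = Dict[str, List[Tuple[str, float]]]


@dataclass(frozen=True)
class NetworkBundleSketch:  # stand-in for NetworkBundle, field names assumed
    gpkg_path: str
    ssal_text: str
    ssal_hash: str
    graph: Adjacency
    edges_layer: str
    nodes_layer: str


def ssal_fingerprint(ssal_text: str) -> str:
    """Hash the SSAL text (assumption: SHA-256 over its UTF-8 bytes)."""
    return hashlib.sha256(ssal_text.encode("utf-8")).hexdigest()


def bundle_from_ssal(
    gpkg_path: str,
    ssal_text: str,
    parse_graph: Callable[[str], Adjacency],  # hypothetical SSAL parser
    edges_layer: str,
    nodes_layer: str,
) -> NetworkBundleSketch:
    """Derive both the hash and the graph from the same SSAL text."""
    return NetworkBundleSketch(
        gpkg_path=gpkg_path,
        ssal_text=ssal_text,
        ssal_hash=ssal_fingerprint(ssal_text),
        graph=parse_graph(ssal_text),
        edges_layer=edges_layer,
        nodes_layer=nodes_layer,
    )
```

Deriving the hash and the graph from one string means an evaluation result can be tied back to the exact SSAL text that produced it.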
Offline history evaluation
Dashboard/export history evaluation is handled by research/history_evaluation.py.
This module adapts exported route-history rows into calls to the shared SSAL-native evaluator. It also provides:
- one-entry JSON loading
- bulk history JSON loading
- metadata preservation
- explicit origin/destination route-context extraction
- file-based one-entry and bulk evaluation wrappers
- bulk summary aggregation
- JSON output formatting and writing
The CLI entry point is intentionally thin: scripts/evaluate_history.py.
Evaluation components
The current evaluation workflow is split between reusable helpers and CLI scripts:
- research/history_evaluation.py adapts dashboard/export history rows into the shared evaluator, preserves metadata, evaluates one-entry or bulk history JSON files, summarizes results, and formats/writes JSON output
- research/evaluation.py contains the shared route-response evaluator, including JSON recovery, path extraction, path validation, Dijkstra ground truth, and route metrics
- research/network_loader.py loads the GeoPackage network, generates SSAL text, computes the SSAL hash, and builds the graph used for evaluation
- research/graph.py contains the SSAL graph model and deterministic Dijkstra baseline
- scripts/build_ssal.py builds the compact SSAL text artifact from the GeoPackage road-network input
- scripts/evaluate_history.py is the thin CLI wrapper around research.history_evaluation
- scripts/README.md documents script dependencies, configuration, and usage
LLM routing prototype
An earlier prototype script feeds SSAL data and a routing prompt to an LLM and expects a route in JSON format. It is kept for historical reference and is not treated as the main current workflow.
Current location:
Comparison interface and history
The project also uses a comparison interface in the separate dashboard repo for side-by-side model testing and persisted history. That history is later exported and analyzed here.
Repository layout
- data/raw/routing_networks/: OSM-derived GeoPackage inputs
- data/derived/ssal/: stable generated SSAL text artifacts
- data/raw/llm_history_exports/: exported dashboard history JSONs
- research/: reusable Python logic for SSAL conversion, graph utilities, network loading, route evaluation, and history-evaluation adapters
- scripts/: executable SSAL generation and evaluation scripts
- results/summaries/: experiment notes and summaries
- archive/prototypes/: older prototype scripts
- tests/: regression tests for graph, evaluation, network loading, history helpers, and CLI wiring
Common commands
Build the default SSAL artifact:
python scripts/build_ssal.py
Evaluate a dashboard history export:
python scripts/evaluate_history.py \
--history-json data/raw/llm_history_exports/llm_compare_history_2026-04-20.json \
--output results/summaries/history_evaluation.json
Evaluate one exported history row:
python scripts/evaluate_history.py \
--entry-json data/raw/llm_history_exports/entry_example.json \
--output results/summaries/entry_evaluation.json
Show script options:
python scripts/build_ssal.py --help
python scripts/evaluate_history.py --help
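For readers curious what a thin wrapper around these options might look like, the sketch below parses the three flags used in the commands above and decides which evaluation mode to run. It is not the contents of scripts/evaluate_history.py: treating --history-json and --entry-json as mutually exclusive, leaving --output optional, and the helper name `plan_run` are all assumptions.

```python
import argparse
from typing import Dict, List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(
        description="Sketch of a thin history-evaluation CLI."
    )
    # Assumption: exactly one input mode is chosen per run.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--history-json", help="bulk dashboard history export")
    source.add_argument("--entry-json", help="single exported history row")
    # Assumption: without --output the result would go to stdout.
    parser.add_argument("--output", help="path for the JSON result")
    return parser


def plan_run(argv: Optional[List[str]] = None) -> Dict[str, Optional[str]]:
    """Parse arguments and describe the work, leaving evaluation to a library call."""
    args = build_parser().parse_args(argv)
    mode = "bulk" if args.history_json else "entry"
    return {
        "mode": mode,
        "input": args.history_json or args.entry_json,
        "output": args.output,
    }
```

Keeping the script down to argument parsing plus a single call into the library is what lets the tests cover CLI wiring separately from the evaluation logic.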
Testing
Install development dependencies:
python -m pip install -e ".[dev]"
Run the core regression tests:
python -m pytest \
tests/test_history_evaluation.py \
tests/test_evaluate_history_script.py \
tests/test_network_loader.py \
tests/test_evaluation.py \
tests/test_graph.py \
-v
The tests cover:
- SSAL graph parsing and Dijkstra baseline behavior
- route-response JSON extraction and path validation
- route comparison metrics
- GeoPackage-to-SSAL network bundle loading
- offline history-entry and bulk-history evaluation helpers
- CLI argument parsing and output wiring
Dashboard export requirements
Offline history evaluation expects exported dashboard route-history rows to include explicit top-level route metadata:
{
  "origin": "25291564",
  "destination": "25291567"
}
The fallback field names are also supported:
{
  "route_origin": "25291564",
  "route_destination": "25291567"
}
The evaluator intentionally does not parse origin/destination from prompt text. Rows without explicit route metadata are skipped with:
{
  "status": "skipped",
  "reason": "missing_route_context"
}
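The lookup order and the skip behaviour described above can be sketched as follows. The function names are hypothetical, the evaluator is passed in as a callable, and requiring both fields to come from the same naming scheme is an assumption; the real adapter may handle mixed rows differently.

```python
from collections import Counter
from typing import Any, Callable, Dict, Iterable, Optional, Tuple

Row = Dict[str, Any]

# Primary field names first, then the documented fallbacks.
FIELD_PAIRS = (("origin", "destination"), ("route_origin", "route_destination"))


def extract_route_context(row: Row) -> Optional[Tuple[str, str]]:
    """Return (origin, destination) from explicit metadata only."""
    for origin_key, destination_key in FIELD_PAIRS:
        origin, destination = row.get(origin_key), row.get(destination_key)
        if origin not in (None, "") and destination not in (None, ""):
            return str(origin), str(destination)
    return None  # never fall back to parsing the prompt text


def evaluate_row(row: Row, evaluate: Callable[[Row, str, str], Row]) -> Row:
    """Skip rows without route context, otherwise delegate to the evaluator."""
    context = extract_route_context(row)
    if context is None:
        return {"status": "skipped", "reason": "missing_route_context"}
    origin, destination = context
    return evaluate(row, origin, destination)


def summarize(results: Iterable[Row]) -> Dict[str, int]:
    """Count results per status for a bulk summary."""
    return dict(Counter(result.get("status", "unknown") for result in results))
```

A bulk run would then map `evaluate_row` over the exported rows and pass the results to `summarize`, so skipped rows show up in the summary instead of silently disappearing.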
Current status
This repo reflects an evolving research workflow, not a finished software product.
Early experiment notes indicate:
- GPT-family models sometimes produced partially correct routes and distance estimates
- Gemini 2.5 Flash often failed to return the expected JSON format in earlier experiments
- performance worsened on more difficult routes
- output-format reliability was itself a major issue
The current evaluator is stricter than the early exploratory evaluator. It validates node paths against the SSAL-derived graph, recomputes candidate route length from graph edges, and compares the candidate route against Dijkstra ground truth.
Detailed chronology and test-by-test notes are kept in the supporting docs, summaries, and changelog rather than in the README.
Evaluation note
The current route evaluator is SSAL-native. It validates LLM-produced node paths against the SSAL-derived graph, recomputes candidate route length from graph edges, and compares the candidate route against Dijkstra ground truth.
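The three checks in that description can be illustrated with a small, self-contained sketch. The metric definitions here are assumptions chosen for the example (Jaccard overlap for nodes and directed edges, signed length differences), as are the function names and result keys; the evaluator in research/evaluation.py may define them differently.

```python
from typing import Any, Dict, List, Optional, Set, Tuple

EdgeLengths = Dict[Tuple[str, str], float]  # (source, target) -> length


def candidate_length(edges: EdgeLengths, path: List[str]) -> Optional[float]:
    """Recompute a path's length from graph edges; None if any step is missing."""
    if not path:
        return None
    total = 0.0
    for step in zip(path, path[1:]):
        if step not in edges:
            return None
        total += edges[step]
    return total


def _jaccard(a: Set[Any], b: Set[Any]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def compare_to_ground_truth(
    edges: EdgeLengths,
    candidate: List[str],
    truth: List[str],
    declared_length: Optional[float] = None,
) -> Dict[str, Any]:
    """Compare a model path with a ground-truth path on the same graph."""
    cand_len = candidate_length(edges, candidate)
    truth_len = candidate_length(edges, truth)
    same_endpoints = (
        bool(candidate) and bool(truth)
        and candidate[0] == truth[0] and candidate[-1] == truth[-1]
    )
    valid = cand_len is not None and same_endpoints
    comparable = valid and truth_len is not None
    has_declared = declared_length is not None and cand_len is not None
    return {
        "valid": valid,
        "candidate_length": cand_len,
        "optimal_length": truth_len,
        # Extra distance travelled compared with the shortest path.
        "length_error": cand_len - truth_len if comparable else None,
        # How far the model's own estimate is from its actual path length.
        "declared_length_error": declared_length - cand_len if has_declared else None,
        "node_overlap": _jaccard(set(candidate), set(truth)),
        "edge_overlap": _jaccard(set(zip(candidate, candidate[1:])), set(zip(truth, truth[1:]))),
    }
```

Recomputing the candidate length from graph edges, instead of trusting the model's declared figure, is what separates a path error from a distance-estimation error.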
The default offline evaluator no longer requires:
- ORS_API_KEY
- routingpy
- OpenRouteService