
Project description

research

Reusable SSAL, graph, network-loading, and route-evaluation utilities for Spatial Ninjas.

Package release: v0.1.0
Python package: spatial-ninjas-research
Import package: research

This repository contains the research and evaluation side of the LLM spatial-routing project. It provides the shared Python utilities used to convert routing-network data into SSAL, build an SSAL-derived graph, compute Dijkstra ground truth, and evaluate model-generated route responses.

The broader project compares GPT-family and Gemini-family models on route-generation tasks over an OpenStreetMap-derived Southern Helsinki network.

Relationship to llm-compare-dashboard

The project is split across two repositories:

  • llm-compare-dashboard: run prompts, compare OpenAI and Gemini outputs, store/export history, and inspect route results visually
  • research: provide reusable SSAL, graph, network-loading, route-evaluation, and offline history-evaluation utilities

For deployment, the dashboard should depend on the published spatial-ninjas-research package instead of requiring a sibling checkout of this repository. For local development, the dashboard can still install this repository as an editable package.

Project scope

The project studies how LLMs handle route-generation tasks when given a compact graph-like representation of a real street network instead of a standard map UI. The current reference map is a selected area of Southern Helsinki derived from OpenStreetMap. The current evaluation focuses on GPT and Gemini models.

Main evaluation concerns:

  • structured output correctness
  • candidate node-sequence validity
  • distance estimation quality
  • shortest-path agreement against SSAL-derived Dijkstra ground truth
  • robustness as route difficulty increases

Current workflow

GeoPackage
   ↓
SSAL text
   ↓
SSAL-derived graph
   ↓
Dijkstra ground truth
   ↓
shared route evaluator
   ↓
offline history evaluation helpers / CLI

Typical workflow:

  1. Prepare an OSM-derived routing network.
  2. Build the SSAL artifact from the GeoPackage input.
  3. Run routing prompts in llm-compare-dashboard.
  4. Export the dashboard history as JSON.
  5. Store the export in this repository.
  6. Evaluate the results with the reusable helpers and CLI here.
  7. Record summaries and notes for later review.

Installation

Install the released package

After the package is published, install it with:

python -m pip install spatial-ninjas-research==0.1.0

The PyPI distribution name is spatial-ninjas-research, while the Python import path remains research:

from research.ssal import gpkg_to_ssal
from research.graph import dijkstra_shortest_path
from research.evaluation import evaluate_route_response
from research.network_loader import load_network_bundle_from_gpkg

Verify the installed package with:

python - <<'PY'
import research
from research.ssal import gpkg_to_ssal
from research.history_evaluation import evaluate_entry_file

print(research.__version__)
print('research package ok')
PY

Develop this repository locally

Create and activate a virtual environment first.

macOS / Linux:

python3 -m venv .venv
source .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install -e ".[dev]"

Runtime dependencies are declared in pyproject.toml. A pinned requirements.txt is generated from pyproject.toml with pip-compile; install from it when a locked dependency set is preferred:

python -m pip install -r requirements.txt

Development dependencies are exposed through the dev extra and mirrored by requirements-dev.txt:

python -m pip install -r requirements-dev.txt

Local editable install from the dashboard repo

The dashboard can use the published package for deployment, but local editable development is still useful when changing the research utilities and dashboard together.

Expected local folder layout:

spatial-ninjas/
  research/
  llm-compare-dashboard/

From inside llm-compare-dashboard:

source .venv/bin/activate
pip install -e ../research

This editable install should be treated as a development override. Deployment should use the versioned package from PyPI.

Environment configuration

Create a repo-root .env file if you want to override the default network inputs:

GPKG_PATH=data/raw/routing_networks/osm_southern_helsinki_slimmed_cropped.gpkg
EDGES_LAYER=slimmed_cropped_edges
NODES_LAYER=slimmed_cropped_nodes

See scripts/README.md for script-specific usage details.

Core components

SSAL conversion

The OSM road network is converted into a simplified semantic adjacency list to reduce token usage while keeping the routing structure that matters.

The reusable conversion logic lives in:

The CLI entry point for regenerating the versioned SSAL artifact is:

Stable generated SSAL text artifacts are intentionally versioned in this repo.
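To give a feel for what a semantic adjacency list looks like, here is a small parsing sketch. The `node: neighbor=length` line format below is invented for illustration only; the real SSAL syntax is defined by the package's conversion module.

```python
# Hypothetical SSAL-like adjacency text. The real format is defined by
# the package's conversion module; this is only an illustration.
ssal_text = """\
25291564: 25291567=120 25291570=85
25291567: 25291564=120
25291570: 25291564=85
"""

def parse_adjacency(text):
    """Parse 'node: neighbor=length ...' lines into a nested dict."""
    graph = {}
    for line in text.strip().splitlines():
        node, _, rest = line.partition(":")
        edges = {}
        for token in rest.split():
            neighbor, _, length = token.partition("=")
            edges[neighbor] = float(length)
        graph[node.strip()] = edges
    return graph

graph = parse_adjacency(ssal_text)
print(graph["25291564"])  # {'25291567': 120.0, '25291570': 85.0}
```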

SSAL graph and Dijkstra baseline

The SSAL-native graph layer lives in:

It provides:

  • immutable Edge objects
  • directed Graph objects
  • SSAL-to-graph parsing
  • target-only node preservation
  • path-length calculation
  • deterministic Dijkstra shortest-path search

This graph layer is the baseline used by the evaluator. The default evaluation path no longer compares against an external routing service.
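The determinism property is the important part: the same graph and endpoints must always yield the same ground-truth path. A minimal sketch of one way to achieve that (the toy graph and tie-breaking scheme are assumptions, not the package's implementation):

```python
import heapq

def dijkstra(graph, origin, destination):
    """Deterministic Dijkstra: ties on distance are broken by node id,
    so repeated runs over the same graph return the same path."""
    dist = {origin: 0.0}
    prev = {}
    heap = [(0.0, origin)]  # (distance, node); node id breaks ties
    seen = set()
    while heap:
        d, node = heapq.heappop(heap)
        if node in seen:
            continue
        seen.add(node)
        if node == destination:
            break
        # Iterate neighbors in sorted order for reproducibility.
        for neighbor, length in sorted(graph.get(node, {}).items()):
            nd = d + length
            if nd < dist.get(neighbor, float("inf")):
                dist[neighbor] = nd
                prev[neighbor] = node
                heapq.heappush(heap, (nd, neighbor))
    if destination not in dist:
        return None, float("inf")
    path = [destination]
    while path[-1] != origin:
        path.append(prev[path[-1]])
    return path[::-1], dist[destination]

toy = {"A": {"B": 1.0, "C": 4.0}, "B": {"C": 2.0}, "C": {}}
path, length = dijkstra(toy, "A", "C")
print(path, length)  # ['A', 'B', 'C'] 3.0
```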

Shared route-response evaluation

The shared route evaluator lives in:

It handles:

  • JSON recovery from model responses
  • route path extraction
  • model-declared route-length extraction
  • candidate path validation against the SSAL-derived graph
  • Dijkstra ground-truth comparison
  • node and directed-edge overlap metrics
  • candidate-length and declared-length error metrics

This module contains the core route-evaluation logic. Offline history evaluation and future dashboard integration should call into this shared evaluator instead of duplicating parsing or metric code.
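The evaluator's main moves can be sketched in a few standalone functions. These are illustrations of the techniques named above, not the module's actual API; field names like `path` and `length_m` are assumptions.

```python
import json
import re

def recover_json(response_text):
    """Pull the first JSON object out of a free-form model response."""
    match = re.search(r"\{.*\}", response_text, re.DOTALL)
    return json.loads(match.group(0)) if match else None

def validate_path(graph, path):
    """Check every consecutive pair is a directed edge in the graph."""
    return all(b in graph.get(a, {}) for a, b in zip(path, path[1:]))

def edge_overlap(candidate, truth):
    """Fraction of ground-truth directed edges present in the candidate."""
    cand = set(zip(candidate, candidate[1:]))
    gold = set(zip(truth, truth[1:]))
    return len(cand & gold) / len(gold) if gold else 0.0

graph = {"A": {"B": 1.0}, "B": {"C": 2.0}, "C": {}}
reply = 'Here is the route: {"path": ["A", "B", "C"], "length_m": 3.1}'
data = recover_json(reply)  # also yields the model-declared length
print(validate_path(graph, data["path"]))           # True
print(edge_overlap(data["path"], ["A", "B", "C"]))  # 1.0
```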

Network loading

Network loading lives in:

It loads the GeoPackage network, generates SSAL text, computes the SSAL hash, and builds the graph used for evaluation.

The main container is NetworkBundle, which stores:

  • GeoPackage path
  • generated SSAL text
  • SSAL hash
  • SSAL-derived graph
  • edge layer name
  • node layer name

The graph used for evaluation is built from the generated SSAL text, not directly from the GeoPackage. This keeps the evaluated graph aligned with the representation shown to the model.
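A minimal sketch of a bundle along these lines, assuming the fields listed above and SHA-256 for the SSAL hash (the real class and hash algorithm may differ):

```python
import hashlib
from dataclasses import dataclass

@dataclass(frozen=True)
class NetworkBundle:
    """Sketch of the bundle described above; field names follow the
    README, the package's real class may differ."""
    gpkg_path: str
    ssal_text: str
    ssal_hash: str
    graph: dict
    edges_layer: str
    nodes_layer: str

def ssal_sha256(ssal_text: str) -> str:
    # Hashing the generated SSAL text ties the evaluated graph to the
    # exact representation shown to the model.
    return hashlib.sha256(ssal_text.encode("utf-8")).hexdigest()

text = "A: B=1\nB: A=1\n"
bundle = NetworkBundle("net.gpkg", text, ssal_sha256(text), {}, "edges", "nodes")
print(bundle.ssal_hash[:8])
```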

Offline history evaluation

Dashboard/export history evaluation is handled by:

This module adapts exported route-history rows into calls to the shared SSAL-native evaluator. It also provides:

  • one-entry JSON loading
  • bulk history JSON loading
  • metadata preservation
  • explicit origin/destination route-context extraction
  • file-based one-entry and bulk evaluation wrappers
  • bulk summary aggregation
  • JSON output formatting and writing
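Bulk summary aggregation, for example, can be sketched as a fold over per-entry results. The `status`/`path_valid` field names below are illustrative, not the module's actual schema:

```python
def summarize(results):
    """Aggregate per-entry evaluation results into a bulk summary
    (field names are illustrative)."""
    evaluated = [r for r in results if r.get("status") == "evaluated"]
    skipped = [r for r in results if r.get("status") == "skipped"]
    valid = [r for r in evaluated if r.get("path_valid")]
    return {
        "total": len(results),
        "evaluated": len(evaluated),
        "skipped": len(skipped),
        "valid_path_rate": len(valid) / len(evaluated) if evaluated else 0.0,
    }

rows = [
    {"status": "evaluated", "path_valid": True},
    {"status": "evaluated", "path_valid": False},
    {"status": "skipped", "reason": "missing_route_context"},
]
print(summarize(rows))
# {'total': 3, 'evaluated': 2, 'skipped': 1, 'valid_path_rate': 0.5}
```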

The CLI entry point is intentionally thin:

Evaluation components

The current evaluation workflow is split between reusable helpers and CLI scripts:

  • research/history_evaluation.py adapts dashboard/export history rows into the shared evaluator, preserves metadata, evaluates one-entry or bulk history JSON files, summarizes results, and formats/writes JSON output
  • research/evaluation.py contains the shared route-response evaluator, including JSON recovery, path extraction, path validation, Dijkstra ground truth, and route metrics
  • research/network_loader.py loads the GeoPackage network, generates SSAL text, computes the SSAL hash, and builds the graph used for evaluation
  • research/graph.py contains the SSAL graph model and deterministic Dijkstra baseline
  • scripts/build_ssal.py builds the compact SSAL text artifact from the GeoPackage road-network input
  • scripts/evaluate_history.py is the thin CLI wrapper around research.history_evaluation
  • scripts/README.md documents script dependencies, configuration, and usage

LLM routing prototype

An earlier prototype script feeds SSAL data and a routing prompt to an LLM and expects a route in JSON format. It is kept for historical reference and is not treated as the main current workflow.

Current location:

Comparison interface and history

The project also uses a comparison interface in the separate dashboard repo for side-by-side model testing and persisted history. That history is later exported and analyzed here.

Repository layout

  • data/raw/routing_networks/ — OSM-derived GeoPackage inputs
  • data/derived/ssal/ — stable generated SSAL text artifacts
  • data/raw/llm_history_exports/ — exported dashboard history JSONs
  • research/ — reusable Python logic for SSAL conversion, graph utilities, network loading, route evaluation, and history-evaluation adapters
  • scripts/ — executable SSAL generation and evaluation scripts
  • results/summaries/ — experiment notes and summaries
  • archive/prototypes/ — older prototype scripts
  • tests/ — regression tests for graph, evaluation, network loading, history helpers, and CLI wiring

Common commands

Build the default SSAL artifact:

python scripts/build_ssal.py

Evaluate a dashboard history export:

python scripts/evaluate_history.py \
  --history-json data/raw/llm_history_exports/llm_compare_history_2026-04-20.json \
  --output results/summaries/history_evaluation.json

Evaluate one exported history row:

python scripts/evaluate_history.py \
  --entry-json data/raw/llm_history_exports/entry_example.json \
  --output results/summaries/entry_evaluation.json

Show script options:

python scripts/build_ssal.py --help
python scripts/evaluate_history.py --help

Testing

Install development dependencies:

python -m pip install -e ".[dev]"

Run the core regression tests:

python -m pytest \
  tests/test_history_evaluation.py \
  tests/test_evaluate_history_script.py \
  tests/test_network_loader.py \
  tests/test_evaluation.py \
  tests/test_graph.py \
  -v

The tests cover:

  • SSAL graph parsing and Dijkstra baseline behavior
  • route-response JSON extraction and path validation
  • route comparison metrics
  • GeoPackage-to-SSAL network bundle loading
  • offline history-entry and bulk-history evaluation helpers
  • CLI argument parsing and output wiring

Dashboard export requirements

Offline history evaluation expects exported dashboard route-history rows to include explicit top-level route metadata:

{
  "origin": "25291564",
  "destination": "25291567"
}

The fallback field names are also supported:

{
  "route_origin": "25291564",
  "route_destination": "25291567"
}

The evaluator intentionally does not parse origin/destination from prompt text. Rows without explicit route metadata are skipped with:

{
  "status": "skipped",
  "reason": "missing_route_context"
}
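The extraction rule above can be sketched as a small helper: primary fields first, fallback fields second, and an explicit skip when neither is present. The function name is illustrative, not the package's API.

```python
def extract_route_context(row):
    """Read origin/destination from top-level fields, falling back to
    route_origin/route_destination; prompt text is never parsed."""
    origin = row.get("origin") or row.get("route_origin")
    destination = row.get("destination") or row.get("route_destination")
    if origin is None or destination is None:
        return {"status": "skipped", "reason": "missing_route_context"}
    return {"origin": str(origin), "destination": str(destination)}

print(extract_route_context({"origin": "25291564", "destination": "25291567"}))
print(extract_route_context({"route_origin": "25291564",
                             "route_destination": "25291567"}))
print(extract_route_context({"prompt": "route from A to B"}))
```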

Current status

This repo reflects an evolving research workflow, not a finished software product.

Early experiment notes indicate:

  • GPT-family models sometimes produced partially correct routes and distance estimates
  • Gemini 2.5 Flash often failed to return the expected JSON format in earlier experiments
  • performance worsened on more difficult routes
  • output-format reliability was itself a major issue

The current evaluator is stricter than the early exploratory evaluator. It validates node paths against the SSAL-derived graph, recomputes candidate route length from graph edges, and compares the candidate route against Dijkstra ground truth.

Detailed chronology and test-by-test notes are kept in the supporting docs, summaries, and changelog rather than in the README.

Evaluation note

The current route evaluator is SSAL-native and, as described above, validates candidate paths directly against the SSAL-derived graph. The default offline evaluator no longer requires:

  • ORS_API_KEY
  • routingpy
  • OpenRouteService

See also

Download files

  • Source distribution: spatial_ninjas_research-0.1.0.tar.gz (40.4 kB)
  • Built distribution: spatial_ninjas_research-0.1.0-py3-none-any.whl (19.8 kB)

File details

spatial_ninjas_research-0.1.0.tar.gz

  • Size: 40.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.2
  • SHA256: db4ab83f66e7b2b50e8ad9d9401420dfebd85b87023c2c1a8025a36ee3207738
  • MD5: 0372d5d47c7014e30ffd635e9abb4b4a
  • BLAKE2b-256: 48842942f3e1b84cbb1b7d0896e2bf3b8ed41b203fb62f7276a800289fe5ed3a

spatial_ninjas_research-0.1.0-py3-none-any.whl

  • Size: 19.8 kB
  • SHA256: 146fcba40adef546f7e28d825772d52f1b5564a88a2babcae4be5e4da23fe0f5
  • MD5: cada2659c0c7f8a94068cb6e3b0bd2ec
  • BLAKE2b-256: 2921fee229b276cb707c29391ba431d3c446eadb6528a5cf08cecedf70d56358
