4D spatiotemporal vector database middleware for AI world models

LOCI

A 4D spatiotemporal vector database for AI world models.

Python 3.11+ · License: Apache 2.0


The Problem

Modern world models such as V-JEPA 2, DreamerV3, GAIA-1, and UniSim produce embeddings where every vector has an implicit 4D spatiotemporal address (x, y, z, t). Existing vector databases (Qdrant, Milvus, Weaviate) treat all embedding dimensions equally: a spatial query requires three or more float-range payload filters evaluated independently, time-based retrieval has no native sharding, and there is no concept of "predict the future, then find what's nearby."

The Solution

LOCI is a middleware layer on top of Qdrant that makes spatiotemporal structure first-class through three novel primitives:

1. Multi-Resolution Hilbert Bucketing

Encode (x, y, z, t) at multiple Hilbert resolutions (p=4, 8, 12). Spatial bounding-box queries use a Hilbert integer pre-filter with overlap, then apply an exact payload post-filter as the authoritative geometric check. By default queries start at the coarsest indexed resolution; with adaptive=True, dense regions can be promoted to finer Hilbert resolutions at query time.

         Naive Qdrant               LOCI
    ┌──────────────────┐     ┌──────────────────┐
    │ x_min ≤ x ≤ x_max│     │                  │
    │ y_min ≤ y ≤ y_max│ →   │ hilbert_r4 ∈ {…} │
    │ z_min ≤ z ≤ z_max│     │ (single filter)  │
    └──────────────────┘     └──────────────────┘
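
To make the bucketing idea concrete, here is a minimal sketch of computing one Hilbert bucket per resolution for a single (x, y, z, t) point. It is not LOCI's implementation. It uses Skilling's published coordinate-to-Hilbert transform and assumes all four coordinates (including time) are already normalized to [0, 1]. The helper names `hilbert_index`, `quantize`, and `hilbert_payload` are hypothetical, and the payload keys `hilbert_r8` and `hilbert_r12` are extrapolated from the `hilbert_r4` key in the diagram above.

```python
from typing import Sequence


def hilbert_index(cell: Sequence[int], bits: int) -> int:
    """Map integer grid coordinates to a Hilbert index (Skilling's transform)."""
    x = list(cell)
    n = len(x)
    top = 1 << (bits - 1)

    # Undo the per-level reflections and axis swaps, highest bit first.
    q = top
    while q > 1:
        p = q - 1
        for i in range(n):
            if x[i] & q:
                x[0] ^= p                  # reflect the low bits of axis 0
            else:
                t = (x[0] ^ x[i]) & p      # swap low bits of axis 0 and axis i
                x[0] ^= t
                x[i] ^= t
        q >>= 1

    # Gray-encode across axes.
    for i in range(1, n):
        x[i] ^= x[i - 1]
    t = 0
    q = top
    while q > 1:
        if x[n - 1] & q:
            t ^= q - 1
        q >>= 1
    x = [v ^ t for v in x]

    # Interleave bits, most significant first, into a single integer.
    index = 0
    for b in range(bits - 1, -1, -1):
        for v in x:
            index = (index << 1) | ((v >> b) & 1)
    return index


def quantize(value: float, bits: int) -> int:
    """Quantize a coordinate normalized to [0, 1] onto a 2**bits grid."""
    cells = 1 << bits
    return min(max(int(value * cells), 0), cells - 1)


def hilbert_payload(x, y, z, t, resolutions=(4, 8, 12)) -> dict:
    """One Hilbert bucket per resolution; key names are illustrative only."""
    return {
        f"hilbert_r{p}": hilbert_index([quantize(v, p) for v in (x, y, z, t)], p)
        for p in resolutions
    }


print(hilbert_payload(0.5, 0.3, 0.8, 0.25))
```

Because consecutive Hilbert indices always map to neighboring grid cells, a bounding box tends to be covered by a modest number of index values, which is what lets a single integer-membership pre-filter stand in for several independent range filters. The exact payload check remains the authoritative test.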

2. Temporal Sharding

Automatic routing of vectors to time-partitioned Qdrant collections (loci_{epoch_id}), with a configurable epoch size. Queries fan out only to epochs that overlap the requested time window. With the async client, those shards are searched concurrently via asyncio.gather.
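
The routing rule can be sketched in a few lines. This is an illustration rather than LOCI's code: it assumes the epoch id is the timestamp floor-divided by the epoch size and that the time window is inclusive at both ends. The function names are hypothetical; only the `loci_{epoch_id}` naming pattern comes from the description above.

```python
def epoch_id(timestamp_ms: int, epoch_size_ms: int) -> int:
    """Assumed rule: epochs are fixed-width buckets starting at t = 0."""
    return timestamp_ms // epoch_size_ms


def collection_name(timestamp_ms: int, epoch_size_ms: int) -> str:
    """Name of the collection a vector with this timestamp would be routed to."""
    return f"loci_{epoch_id(timestamp_ms, epoch_size_ms)}"


def overlapping_collections(time_window_ms: tuple, epoch_size_ms: int) -> list:
    """Collections whose epoch overlaps the inclusive window (start_ms, end_ms)."""
    start_ms, end_ms = time_window_ms
    if end_ms < start_ms:
        raise ValueError("time window end precedes start")
    first = epoch_id(start_ms, epoch_size_ms)
    last = epoch_id(end_ms, epoch_size_ms)
    return [f"loci_{e}" for e in range(first, last + 1)]


print(collection_name(1_700_000_000_000, 5000))
print(overlapping_collections((4000, 11000), 5000))
```

Under these assumptions a query never touches collections outside the window, so cost scales with the window length rather than with the total history stored.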

3. Predict-then-Retrieve with Novelty Detection

An atomic API call that composes a user-supplied world model with vector search, returning both results and a novelty score:

result = client.predict_and_retrieve(
    context_vector=current_embedding,
    predictor_fn=my_world_model,
    future_horizon_ms=2000,
    current_position=(0.5, 0.3, 0.8),
)
print(f"Novelty: {result.prediction_novelty:.2f}")
# 0.0 = "I've seen this before"
# 1.0 = "This is new territory"
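
The source does not spell out how the novelty score is computed. As one plausible reading, the sketch below predicts a future embedding, ranks stored vectors by cosine similarity, and reports novelty as one minus the best similarity, clamped to [0, 1]. The function `predict_and_score` and the brute-force search are hypothetical stand-ins, not LOCI's API.

```python
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    dot = sum(p * q for p, q in zip(a, b))
    norm = math.sqrt(sum(p * p for p in a)) * math.sqrt(sum(q * q for q in b))
    return dot / norm if norm else 0.0


def predict_and_score(
    context_vector: Vector,
    predictor_fn: Callable[[Vector], Vector],
    stored_vectors: Sequence[Vector],
    limit: int = 10,
):
    """Return ((similarity, index) hits, novelty) for the predicted embedding."""
    predicted = predictor_fn(context_vector)
    hits = sorted(
        ((cosine_similarity(predicted, v), i) for i, v in enumerate(stored_vectors)),
        reverse=True,
    )[:limit]
    best = hits[0][0] if hits else 0.0
    # Assumed definition: nothing similar in memory means high novelty.
    novelty = min(1.0, max(0.0, 1.0 - best))
    return hits, novelty


memory = [[1.0, 0.0], [0.0, 1.0]]
hits, novelty = predict_and_score([1.0, 0.0], lambda v: v, memory)
print(f"Novelty: {novelty:.2f}")
```

With an empty memory, or one that holds only vectors orthogonal to the prediction, this definition yields a novelty of 1.0.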

Quick Start

No Docker? No problem — in-memory mode

Try LOCI instantly with zero infrastructure using LocalLociClient:

pip install loci-stdb          # or: pip install -e ".[dev]"

from loci import LocalLociClient, WorldState

client = LocalLociClient(vector_size=512)

# Insert a world state
state = WorldState(
    x=0.5, y=0.3, z=0.8,
    timestamp_ms=1000,
    vector=[0.1] * 512,
    scene_id="my_scene",
)
state_id = client.insert(state)

# Query by vector similarity + spatial bounds + time window
results = client.query(
    vector=[0.1] * 512,
    spatial_bounds={"x_min": 0.0, "x_max": 1.0,
                    "y_min": 0.0, "y_max": 1.0,
                    "z_min": 0.0, "z_max": 1.0},
    time_window_ms=(0, 5000),
    limit=10,
)

With Qdrant (production)

pip install loci-stdb
docker run -p 6333:6333 qdrant/qdrant

from loci import LociClient, WorldState

client = LociClient(
    "http://localhost:6333",
    vector_size=512,
    epoch_size_ms=5000,
    distance="cosine",
)

# Insert world states
state = WorldState(
    x=0.5, y=0.3, z=0.8,
    timestamp_ms=1700000000000,
    vector=[0.1] * 512,
    scene_id="warehouse_sim",
    scale_level="patch",
)
state_id = client.insert(state)

# Batch insert (truly batched: one Qdrant call per epoch)
# `states` is a collection of WorldState objects built like the one above
ids = client.insert_batch(states)

# Spatiotemporal query with overlap factor
# (query_embedding, start_ms, and end_ms are supplied by the caller)
results = client.query(
    vector=query_embedding,
    spatial_bounds={"x_min": 0.2, "x_max": 0.8,
                    "y_min": 0.0, "y_max": 1.0,
                    "z_min": 0.0, "z_max": 1.0},
    time_window_ms=(start_ms, end_ms),
    limit=10,
    overlap_factor=1.2,  # 20% expanded search for boundary recall
)

# Predict-then-retrieve with novelty scoring
result = client.predict_and_retrieve(
    context_vector=current_embedding,
    predictor_fn=my_world_model,
    future_horizon_ms=2000,
    current_position=(0.5, 0.3, 0.8),
)

# Trajectory reconstruction via scroll API
trajectory = client.get_trajectory(state_id, steps_back=20, steps_forward=20)

# Episodic context window
context = client.get_causal_context(state_id, window_ms=5000)
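
The `overlap_factor=1.2` argument in the query above is described only as a 20% expanded search. One way to picture it, purely as an assumption, is to scale each side of the bounding box about its center before the coarse pre-filter while leaving the exact post-filter on the original box. `expand_bounds` is a hypothetical helper, not part of LOCI.

```python
def expand_bounds(bounds: dict, overlap_factor: float) -> dict:
    """Scale each axis extent about its center by overlap_factor (assumed semantics)."""
    expanded = {}
    for axis in ("x", "y", "z"):
        lo, hi = bounds[f"{axis}_min"], bounds[f"{axis}_max"]
        center = (lo + hi) / 2
        half = (hi - lo) / 2 * overlap_factor
        expanded[f"{axis}_min"] = center - half
        expanded[f"{axis}_max"] = center + half
    return expanded


query_box = {"x_min": 0.2, "x_max": 0.8,
             "y_min": 0.0, "y_max": 1.0,
             "z_min": 0.0, "z_max": 1.0}
print(expand_bounds(query_box, 1.2))
```

A wider pre-filter admits points whose coarse bucket straddles the box edge, trading a few extra candidates for better recall at the boundary.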

Async API (parallel shard fan-out)

from loci import AsyncLociClient

async with AsyncLociClient(
    "http://localhost:6333",
    vector_size=512,
    distance="cosine",
) as client:
    await client.insert(state)
    results = await client.query(vector=query_embedding, limit=10)
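
The parallel fan-out can be pictured with plain asyncio. The sketch below is not the AsyncLociClient internals: `search_shard` is a stand-in coroutine that fabricates scores instead of calling Qdrant, and `fan_out` is a hypothetical helper that shows only the gather-then-merge pattern.

```python
import asyncio


async def search_shard(collection: str, limit: int) -> list:
    """Stand-in for one per-collection search; returns (score, point_id) pairs."""
    await asyncio.sleep(0)  # yield control, as real network I/O would
    return [(1.0 / (rank + 1), f"{collection}:{rank}") for rank in range(limit)]


async def fan_out(collections: list, limit: int) -> list:
    """Search every shard concurrently, then keep the global top `limit` hits."""
    per_shard = await asyncio.gather(
        *(search_shard(name, limit) for name in collections)
    )
    merged = [hit for hits in per_shard for hit in hits]
    merged.sort(key=lambda hit: hit[0], reverse=True)
    return merged[:limit]


hits = asyncio.run(fan_out(["loci_0", "loci_1", "loci_2"], limit=3))
print(hits)
```

Each shard must return up to `limit` hits of its own, since the global top results could all come from a single epoch.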

World Model Adapters

from loci.adapters.vjepa2 import VJEPA2Adapter
from loci.adapters.dreamer import DreamerV3Adapter
from loci.adapters.generic import GenericAdapter

# V-JEPA 2
adapter = VJEPA2Adapter()
states = adapter.batch_clip_to_states(clip_output, ts, scene_id)

# DreamerV3
adapter = DreamerV3Adapter()
ws = adapter.rssm_to_world_state(h_t, z_t, position, ts, scene_id)

# Generic numpy/torch
adapter = GenericAdapter(expected_dim=512)
ws = adapter.from_numpy(embedding, position, ts, scene_id)

Performance

Run the publication benchmark to generate numbers for your hardware:

# In-memory (no Qdrant server needed):
python benchmarks/vs_naive_qdrant.py

# Against a live Qdrant server:
QDRANT_URL=http://localhost:6333 python benchmarks/vs_naive_qdrant.py

Results are written to benchmarks/results/latest.json and printed as a markdown table. The benchmark includes both historical fixed-r4 baselines and a "LOCI current" arm that mirrors the shipped query path more closely.

For the local in-memory backend, run python benchmarks/local_benchmark.py for insert/query throughput.

Why not SpatCode?

SpatCode (WWW 2026, arXiv 2601.09530) encodes coordinates into the embedding space for soft/fuzzy retrieval via RoPE-style positional encoding. LOCI uses Hilbert bucketing for exact geometric range queries with deterministic behavior.

Use SpatCode when semantic proximity matters (e.g., "find images taken near this location").

Use LOCI when physical boundaries matter (e.g., "find all observations within this 3D bounding box in the last 5 seconds").
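
The bounding-box example above reduces to a deterministic predicate. The sketch below shows that exact check on plain Python objects. The `Observation` class and `in_box_and_window` function are hypothetical names used only for illustration; the bounds keys mirror the `spatial_bounds` dictionaries in the Quick Start.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    x: float
    y: float
    z: float
    timestamp_ms: int


def in_box_and_window(obs: Observation, bounds: dict, time_window_ms: tuple) -> bool:
    """Exact, inclusive test: inside the 3D box and inside the time window."""
    start_ms, end_ms = time_window_ms
    return (
        bounds["x_min"] <= obs.x <= bounds["x_max"]
        and bounds["y_min"] <= obs.y <= bounds["y_max"]
        and bounds["z_min"] <= obs.z <= bounds["z_max"]
        and start_ms <= obs.timestamp_ms <= end_ms
    )


box = {"x_min": 0.2, "x_max": 0.8,
       "y_min": 0.0, "y_max": 1.0,
       "z_min": 0.0, "z_max": 1.0}
now_ms = 10_000
last_five_seconds = (now_ms - 5000, now_ms)

observations = [
    Observation(0.5, 0.3, 0.8, 9_000),   # inside box, inside window
    Observation(0.9, 0.3, 0.8, 9_000),   # outside box on x
    Observation(0.5, 0.3, 0.8, 1_000),   # inside box, too old
]
matches = [o for o in observations if in_box_and_window(o, box, last_five_seconds)]
print(matches)
```

A point is either in the result or not; there is no similarity threshold to tune, which is the distinction from soft, embedding-based proximity.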

Why not TANNS?

TANNS (ICDE 2025) builds a single graph managing all timestamps internally with a Timestamp Graph structure. LOCI uses collection-level sharding with storage tiering.

Use TANNS for single-session temporal ANN where all data fits in one graph.

Use LOCI when you need cross-session persistence, multi-agent memory sharing, hot/warm/cold storage tiering, or predict-then-retrieve.

Architecture

┌───────────────────────────────────────────────┐
│              Application Layer                │
│ LociClient / AsyncLociClient / LocalLociClient│
│  insert · query · predict_and_retrieve        │
├───────────────────────────────────────────────┤
│              Retrieval Layer                  │
│  predict.py — predict-then-retrieve + novelty │
│  funnel.py  — multi-scale coarse→fine search  │
├───────────────────────────────────────────────┤
│           Indexing & Routing Layer            │
│  spatial/  — multi-res Hilbert + overlap      │
│  temporal/ — epoch sharding + decay scoring   │
├───────────────────────────────────────────────┤
│              Adapters Layer                   │
│  V-JEPA 2 · DreamerV3 · Generic numpy/torch   │
├───────────────────────────────────────────────┤
│              Storage Layer                    │
│  Qdrant (one collection per temporal epoch)   │
│  MemoryStore (in-process, no infra needed)    │
└───────────────────────────────────────────────┘

See ARCHITECTURE.md for the full design document.

Documentation

Development

git clone https://github.com/zd87pl/loci-db.git
cd loci-db
pip install -e ".[dev]"
pytest tests/ -v

# Linting & formatting (must pass in CI)
ruff check loci/ tests/
ruff format --check loci/ tests/
mypy loci/

Roadmap

See ROADMAP.md for the v0.1 → v1.0 plan.

Citation

@misc{loci2026,
  title={LOCI: A 4D Spatiotemporal Vector Database for AI World Models},
  author={Dyras, Zygmunt},
  year={2026},
  url={https://github.com/zd87pl/loci-db}
}

License

Apache 2.0
