Geospatial query engine with dynamic index selection
A spatial query layer for Polars. Rust core, Python API.
Background
Polars has no native spatial query support. Getting bounding-box filters, k-nearest neighbours, or point-in-polygon tests on a Polars DataFrame typically means converting to GeoPandas, managing an index manually, or scanning every row in Python.
GeoPandas applies linear scans by default for containment and range tests; its STRtree requires explicit opt-in via .sindex and is the only available index type regardless of data distribution. KNN has no built-in path at all and requires a separate library.
PyCanopy adds a declarative lazy query layer directly on Polars DataFrames. You describe the spatial operations you want, and PyCanopy decides which index to build, in what order to run each operation, and what to hand off to Polars to execute.
Installation
pip install pycanopy
Pre-built wheels for Linux, macOS, and Windows. No Rust toolchain required.
Usage
Point dataset: range and KNN
import polars as pl
from pycanopy import SpatialFrame
df = pl.read_parquet("cities.parquet")
sf = SpatialFrame(df, x_col="lon", y_col="lat")
# Bounding-box filter combined with a scalar predicate.
# Optimizer places the scalar filter first, then runs the range query
# on the reduced row set.
result = (
    sf.lazy()
    .filter(pl.col("population") > 100_000)
    .range_query(min_x=-10.0, min_y=35.0, max_x=40.0, max_y=70.0)
    .collect()
)
# k-nearest neighbours
nearest = sf.lazy().knn(x=2.35, y=48.85, k=5).collect()
Chaining multiple spatial predicates
# Two range predicates are fused into a single index build on large datasets.
result = (
    sf.lazy()
    .range_query(0.0, 0.0, 50.0, 50.0)
    .range_query(10.0, 10.0, 40.0, 40.0)
    .collect()
)
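One way to picture what fusing two range predicates could mean is to intersect their bounding boxes so that a single query covers both. This is only a sketch: the mechanism PyCanopy uses is not described here, and `fuse_boxes` is a hypothetical helper, not part of the library.

```python
from typing import Optional, Tuple

Box = Tuple[float, float, float, float]  # (min_x, min_y, max_x, max_y)


def fuse_boxes(a: Box, b: Box) -> Optional[Box]:
    """Intersect two axis-aligned boxes; None means no row can satisfy both."""
    min_x, min_y = max(a[0], b[0]), max(a[1], b[1])
    max_x, max_y = min(a[2], b[2]), min(a[3], b[3])
    if min_x > max_x or min_y > max_y:
        return None  # the boxes are disjoint
    return (min_x, min_y, max_x, max_y)


# The two boxes from the chained example above.
fused = fuse_boxes((0.0, 0.0, 50.0, 50.0), (10.0, 10.0, 40.0, 40.0))
print(fused)  # (10.0, 10.0, 40.0, 40.0)
```

Under this assumption, the chain above behaves like one `range_query(10.0, 10.0, 40.0, 40.0)`.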
KNN join
query_df = pl.DataFrame({"qx": [2.35, 13.4], "qy": [48.85, 52.5]})
# For each row in query_df, find the 3 nearest rows in sf.
result = sf.lazy().knn_join(query_df, x_col="qx", y_col="qy", k=3).collect()
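To make the join semantics concrete, here is a brute-force reference in plain Python. It assumes planar Euclidean distance on the raw coordinates, which may not match the metric PyCanopy uses; `knn_join_bruteforce` and its output shape are illustrative and not the library's API.

```python
import heapq
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def knn_join_bruteforce(
    data: Sequence[Point], queries: Sequence[Point], k: int
) -> List[Tuple[int, int, float]]:
    """Return (query_idx, data_idx, squared_distance) for the k nearest data points per query."""
    out: List[Tuple[int, int, float]] = []
    for qi, (qx, qy) in enumerate(queries):
        # Score every data point against this query: O(N) work per query row.
        scored = (((x - qx) ** 2 + (y - qy) ** 2, di) for di, (x, y) in enumerate(data))
        for d2, di in heapq.nsmallest(k, scored):
            out.append((qi, di, d2))
    return out


cities = [(2.35, 48.85), (13.4, 52.5), (-0.13, 51.5), (4.9, 52.37)]
pairs = knn_join_bruteforce(cities, [(2.35, 48.85), (13.4, 52.5)], k=2)
print([(qi, di) for qi, di, _ in pairs])  # [(0, 0), (0, 2), (1, 1), (1, 3)]
```

The loop costs O(Q × N) distance evaluations, which is the work an index is meant to avoid.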
Polygon dataset: contains and range
from shapely.geometry import box
from pycanopy import SpatialFrame
polygons = [box(i, 0, i + 0.9, 0.9) for i in range(100_000)]
df = pl.DataFrame({"id": list(range(100_000)), "geom": polygons})
sf = SpatialFrame.from_polygons(df, geometry_col="geom")
# Which polygons contain this point?
containing = sf.lazy().contains(x=5.5, y=0.5).collect()
# Which polygon MBRs intersect this bbox?
intersecting = sf.lazy().range_query(0.0, 0.0, 10.0, 1.0).collect()
Polygon holes
from shapely.geometry import Polygon
# Interior rings (holes) are fully supported.
outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (8, 2), (8, 8), (2, 8)]
donut = Polygon(outer, [hole])
sf = SpatialFrame.from_polygons(pl.DataFrame({"id": [0], "geom": [donut]}), geometry_col="geom")
# Point inside the hole is NOT contained.
sf.lazy().contains(x=5.0, y=5.0).collect() # empty
# Point outside the hole but inside the outer ring IS contained.
sf.lazy().contains(x=1.0, y=1.0).collect() # returns the polygon row
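The hole behaviour above follows from the usual definition of polygon containment: inside the outer ring and outside every interior ring. A minimal even-odd ray-casting sketch in plain Python illustrates it. This is not PyCanopy's implementation, and points exactly on a boundary are not treated specially.

```python
from typing import Sequence, Tuple

Ring = Sequence[Tuple[float, float]]


def in_ring(x: float, y: float, ring: Ring) -> bool:
    """Even-odd rule: toggle on each edge crossed by a ray cast in the +x direction."""
    inside = False
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]
        if (y1 > y) != (y2 > y):  # the edge straddles the horizontal line through y
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def polygon_contains(x: float, y: float, outer: Ring, holes: Sequence[Ring] = ()) -> bool:
    """Inside the outer ring and not inside any hole."""
    return in_ring(x, y, outer) and not any(in_ring(x, y, h) for h in holes)


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]
hole = [(2, 2), (8, 2), (8, 8), (2, 8)]
print(polygon_contains(5.0, 5.0, outer, [hole]))  # False
print(polygon_contains(1.0, 1.0, outer, [hole]))  # True
```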
Within join
# For each query point, find which polygons in sf contain it.
query_df = pl.DataFrame({"qx": [5.5, 12.3], "qy": [0.5, 0.5]})
result = sf.lazy().within_join(query_df, x_col="qx", y_col="qy").collect()
Within-distance join
# For each query point, find all sf points within 50 km.
query_df = pl.DataFrame({"qx": [2.35, 13.4], "qy": [48.85, 52.5]})
result = sf.lazy().within_distance_join(query_df, x_col="qx", y_col="qy", distance=50.0).collect()
Branching from a shared base
from pycanopy import SpatialFrame, SpatialLazyFrame
# Expensive filter applied once; two queries branch from the result.
base = sf.lazy().filter(pl.col("population") > 100_000).range_query(-10.0, 35.0, 40.0, 70.0)
major = base.filter(pl.col("population") > 1_000_000)
minor = base.filter(pl.col("population") <= 1_000_000)
# collect_all detects the shared prefix, caches it in Polars,
# and executes both branches in a single pass.
results = SpatialLazyFrame.collect_all([major, minor])
df_major, df_minor = results
Live updates via delta buffer
# Append new points — visible to queries immediately, no index rebuild yet.
import numpy as np
sf.engine.append_delta(np.array([2.5]), np.array([48.9]))
# Queries probe the main index and scan the delta in parallel.
result = sf.lazy().range_query(-10.0, 35.0, 40.0, 70.0).collect()
# The buffer flushes automatically when accumulated query cost exceeds
# the estimated index rebuild cost, or when it exceeds 10% of N.
# Force a flush manually if needed.
sf.engine.flush()
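The flush rule described in the comments above can be sketched as a small policy object. Everything here is an assumption made for illustration: the class name, the cost units, and the idea that each query adds a cost proportional to the current delta size. Only the two triggers (accumulated query cost above rebuild cost, or delta above 10% of N) come from the description.

```python
from dataclasses import dataclass


@dataclass
class DeltaFlushPolicy:
    n_indexed: int                # N: rows covered by the main index
    rebuild_cost: float           # estimated cost of rebuilding the index (arbitrary units)
    max_fraction: float = 0.10    # delta size limit as a fraction of N
    n_delta: int = 0              # rows waiting in the delta buffer
    scan_cost: float = 0.0        # accumulated cost of scanning the delta

    def append(self, count: int = 1) -> None:
        self.n_delta += count

    def record_query(self, cost_per_row: float = 1.0) -> None:
        # Assumption: each query scans the whole delta linearly.
        self.scan_cost += cost_per_row * self.n_delta

    def should_flush(self) -> bool:
        too_costly = self.scan_cost > self.rebuild_cost
        too_large = self.n_delta > self.max_fraction * self.n_indexed
        return too_costly or too_large

    def flush(self) -> None:
        # Merge the delta into the main index and reset the counters.
        self.n_indexed += self.n_delta
        self.n_delta = 0
        self.scan_cost = 0.0


policy = DeltaFlushPolicy(n_indexed=1000, rebuild_cost=500.0)
policy.append(50)
for _ in range(10):
    policy.record_query()
print(policy.should_flush())  # False: scan cost 500.0 has not exceeded 500.0
policy.record_query()
print(policy.should_flush())  # True: scan cost 550.0 exceeds the rebuild cost
```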
Accepted input formats
| Format | Example |
|---|---|
| numpy (N, 2) array | np.array([[x, y], ...]) |
| GeoArrow PyArrow array | pa.StructArray or FixedSizeList<2> |
| geopandas GeoSeries | gdf.geometry |
| list of shapely Points or Polygons | [Point(x, y), ...] |
| list of (x, y) tuples | [(x, y), ...] |
| Separate coordinate sequences | Engine.from_coords(xs, ys) |
Benchmarks
All measurements on Apple M-series, uniform random data. Warm = second call with cached index. Index build = cold minus warm (one-time cost amortised across queries). Naive baseline is GeoPandas.
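The cold/warm methodology can be reproduced with a small harness. The sketch below is not the benchmark code behind these tables; `cold_warm` and `SortedRangeIndex` are hypothetical names, and the stand-in index is a sorted list rather than a spatial structure. It only shows how "index build = cold minus warm" is derived from two timed calls.

```python
import time
from bisect import bisect_left, bisect_right
from typing import Callable, List, Optional, Tuple


def cold_warm(run_query: Callable[[], object]) -> Tuple[float, float, float]:
    """Time two identical calls; return (cold, warm, estimated build) in seconds."""
    start = time.perf_counter()
    run_query()  # first call pays for the index build
    cold = time.perf_counter() - start
    start = time.perf_counter()
    run_query()  # second call reuses the cached index
    warm = time.perf_counter() - start
    return cold, warm, max(cold - warm, 0.0)


class SortedRangeIndex:
    """Stand-in for a lazily built, cached index."""

    def __init__(self, values: List[float]) -> None:
        self.values = values
        self._sorted: Optional[List[float]] = None

    def range_query(self, lo: float, hi: float) -> List[float]:
        if self._sorted is None:
            self._sorted = sorted(self.values)  # one-time build, then cached
        i = bisect_left(self._sorted, lo)
        j = bisect_right(self._sorted, hi)
        return self._sorted[i:j]


index = SortedRangeIndex([float((i * 7919) % 100_003) for i in range(100_000)])
cold, warm, build = cold_warm(lambda: index.range_query(10.0, 20.0))
print(f"cold={cold * 1e3:.3f} ms  warm={warm * 1e6:.1f} µs  build≈{build * 1e3:.3f} ms")
```

A single pair of calls is noisy; repeating the warm call and taking a median gives steadier numbers.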
Single-query ops (N=100,000)
| Operation | Index build | Warm | GeoPandas | Speedup |
|---|---|---|---|---|
| Range query | 9 ms | 177 µs | 5.68 ms | 32× |
| kNN k=10 | 73 ms | 22 µs | 6.35 ms | 289× |
| Polygon contains | 127 ms | 20 µs | 6.54 ms | 326× |
| Polygon range | 129 ms | 333 µs | 4.31 ms | 13× |
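Because the index build is a one-time cost, a useful derived figure is how many queries it takes before build plus warm queries beats the unindexed baseline. The sketch below computes that break-even count from the figures in the table above; `break_even_queries` is an illustrative helper, and the result assumes every query costs the same as the measured one.

```python
import math


def break_even_queries(build_ms: float, warm_ms: float, baseline_ms: float) -> float:
    """Smallest q with build + q * warm <= q * baseline (inf if warm is not faster)."""
    saved_per_query = baseline_ms - warm_ms
    if saved_per_query <= 0:
        return math.inf
    return math.ceil(build_ms / saved_per_query)


# (index build ms, warm ms, GeoPandas ms) taken from the table above.
rows = {
    "Range query": (9.0, 0.177, 5.68),
    "kNN k=10": (73.0, 0.022, 6.35),
    "Polygon contains": (127.0, 0.020, 6.54),
    "Polygon range": (129.0, 0.333, 4.31),
}
for name, (build, warm, baseline) in rows.items():
    print(f"{name}: {break_even_queries(build, warm, baseline)} queries")
# Range query: 2 queries
# kNN k=10: 12 queries
# Polygon contains: 20 queries
# Polygon range: 33 queries
```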
Batch joins (N=Q=10,000)
| Operation | Index build | Warm | Naive loop | Speedup |
|---|---|---|---|---|
| kNN join k=5 | 17 ms | 16.7 ms | 6.11 s | 366× |
| Within-distance join | 2 ms | 67.8 ms | 1.60 s | 24× |
| Within join (polygon) | 19 ms | 10.1 ms | 4.68 s | 463× |
Sample chained lazy queries (N=100,000)
Each row is a multi-predicate chain run through the optimizer. GeoPandas applies all predicates manually with no lazy planning.
| Chain | Index build | Warm | GeoPandas | Speedup |
|---|---|---|---|---|
| circ_scalar → range³ | 19 ms | 1.03 ms | 9.31 ms | 9× |
| 3× scalar → range² → scalar | 8 ms | 0.70 ms | 5.74 ms | 8× |
| range² → 3× scalar (reordered) | 7 ms | 0.56 ms | 5.71 ms | 10× |
| circ_scalar → range → scalar → range² | 7 ms | 0.78 ms | 8.20 ms | 11× |
How It Works
Query Flow
sf.lazy().filter(...).range_query(...).knn_join(...).collect()
             │
┌────────────▼────────────┐
│    SpatialOptimizer     │
│  • reorder ops by cost  │
│  • fuse spatial preds   │
│  • select index type    │
│  • spatial join order   │
└────────────┬────────────┘
             │
┌────────────▼────────────┐
│     Polars executes     │
│  scalar filters first   │
│  then spatial queries   │
└────────────┬────────────┘
             │
       pl.DataFrame
Implementation Details
Optimizer decisions
- Predicate Pushdown: scalar predicates are placed before spatial ones. They cost nothing extra and shrink the row count before any index is touched.
- Fusion: consecutive spatial predicates on large datasets are merged into a single index build and one pass over the data.
- Index type: selected per query based on geometry type, data distribution, and selectivity (see Index Management below).
- Spatial Join Order: for symmetric joins (within_join, within_distance_join), the optimizer indexes the smaller side when it is less than half the size of the other, minimizing index build cost. knn_join is asymmetric and always indexes the engine side.
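Two of the decisions above, predicate pushdown and join-side selection, can be sketched in a few lines. This is an illustration under stated assumptions, not the SpatialOptimizer itself: the op representation and function names are hypothetical, the reorder applies only to filter-like predicates (which commute), and falling back to the engine side when neither side is less than half the other is a guess.

```python
from typing import List, Tuple

Op = Tuple[str, str]  # (kind, label), where kind is "scalar" or "spatial"


def push_down_scalars(ops: List[Op]) -> List[Op]:
    """Stable reorder: scalar predicates first, spatial predicates after."""
    # sorted() is stable, so relative order within each group is preserved.
    return sorted(ops, key=lambda op: op[0] != "scalar")


def side_to_index(join: str, n_engine: int, n_query: int) -> str:
    """Pick which side of a join gets the index."""
    if join == "knn_join":
        return "engine"  # asymmetric: always the engine side
    if join in ("within_join", "within_distance_join"):
        # Index the query side only when it is less than half the engine side.
        return "query" if 2 * n_query < n_engine else "engine"
    raise ValueError(f"unknown join: {join}")


plan = push_down_scalars([
    ("spatial", "range_query(-10, 35, 40, 70)"),
    ("scalar", "population > 100_000"),
    ("spatial", "range_query(0, 40, 20, 60)"),
])
print([kind for kind, _ in plan])                      # ['scalar', 'spatial', 'spatial']
print(side_to_index("within_join", 10_000, 4_000))     # query
print(side_to_index("within_join", 10_000, 6_000))     # engine
print(side_to_index("knn_join", 10_000, 100))          # engine
```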
Index Management
Indexes are built lazily. Nothing is constructed at load time; stats (extent, point distribution, a 32x32 histogram) are computed eagerly and drive selection at the first query. The selected index is then cached for all subsequent queries.
| Condition | Index |
|---|---|
| N < 500, selectivity > 50%, or k/N > 10% | Brute force |
| Point range, uniform distribution | Uniform grid |
| Point range, clustered distribution | KD-tree |
| Point KNN or contains | KD-tree |
| Polygons, any query | R-tree |
All index types share the same underlying coordinate arrays with no duplication.
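The selection table can be read as a small decision function. The sketch below treats the rows as rules checked top to bottom, which is an assumption about precedence. The clustering test (coefficient of variation over a 32x32 histogram, with a threshold of 2.0) is likewise a hypothetical stand-in, since the statistic PyCanopy derives from its histogram is not described here.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def histogram_2d(points: Sequence[Point], bins: int = 32) -> List[List[int]]:
    """Count points per cell of a bins x bins grid over the data extent."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    min_x, min_y = min(xs), min(ys)
    span_x = (max(xs) - min_x) or 1.0  # avoid dividing by zero on degenerate extents
    span_y = (max(ys) - min_y) or 1.0
    counts = [[0] * bins for _ in range(bins)]
    for x, y in points:
        i = min(int((x - min_x) / span_x * bins), bins - 1)
        j = min(int((y - min_y) / span_y * bins), bins - 1)
        counts[j][i] += 1
    return counts


def looks_clustered(counts: List[List[int]], threshold: float = 2.0) -> bool:
    """Hypothetical test: high spread of cell counts relative to their mean."""
    flat = [c for row in counts for c in row]
    mean = sum(flat) / len(flat)
    if mean == 0:
        return False
    variance = sum((c - mean) ** 2 for c in flat) / len(flat)
    return variance ** 0.5 / mean > threshold


def choose_index(geometry: str, query: str, n: int,
                 selectivity: float = 0.0, k: int = 0, clustered: bool = False) -> str:
    """Apply the table's rows in order; the first matching row wins (assumed)."""
    if n < 500 or selectivity > 0.5 or k / n > 0.10:
        return "brute_force"
    if geometry == "polygon":
        return "r_tree"
    if query == "range":
        return "kd_tree" if clustered else "uniform_grid"
    return "kd_tree"  # point KNN or contains


lattice = [(float(i), float(j)) for i in range(64) for j in range(64)]
blob = [(0.0, 0.0), (1.0, 1.0)] + [(0.5, 0.5)] * 5000
print(looks_clustered(histogram_2d(lattice)))  # False
print(looks_clustered(histogram_2d(blob)))     # True
print(choose_index("point", "range", 100_000, selectivity=0.01))  # uniform_grid
```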
Why Rust
The hot paths need packed immutable index structures, zero-copy array slices at the Python boundary, and loop-level parallelism. C++ would require a separate FFI layer and loses the native Polars plugin integration that PyO3/Maturin provides for free.
License
MIT