High-performance Polars plugin for GPS trajectory indexing using 3D Hilbert curves
Project description
polars-gps-hilbert
A high-performance Polars plugin for GPS trajectory indexing using 3D Hilbert curves. This plugin enables efficient spatial-temporal indexing of GPS data for applications like trajectory analysis, route deviation detection, and spatial queries.
Features
- 3D Hilbert Curve Indexing: Combines latitude, longitude, and timestamp into a single index
- Lazy Evaluation: Supports out-of-core processing for datasets larger than memory
- Automatic Bounds Detection: Automatically computes spatial and temporal bounds or accepts pre-computed values
- High Performance: Written in Rust for maximum performance (10-15M points/second)
- Polars Integration: Seamless integration with Polars' expression API and lazy evaluation
Installation
From Source
# Clone the repository
git clone https://github.com/nullbutt/polars-gps-hilbert.git
cd polars-gps-hilbert
# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Build and install the plugin
./build_and_test.sh
Requirements
- Python >= 3.8
- Rust toolchain (for building from source)
- Polars >= 0.20.0
- UV (for Python project management)
Quick Start
import polars as pl
import polars_gps_hilbert as gps
# Load GPS trajectory data
df = pl.read_parquet("gps_data.parquet")
# Add Hilbert index with automatic bounds computation
df = df.with_columns(
gps.hilbert_index("latitude", "longitude", "timestamp").alias("hilbert_idx")
)
# Use for spatial queries
nearby_points = df.sort("hilbert_idx").slice(1000, 100)
Usage Examples
Basic Indexing
import polars as pl
import polars_gps_hilbert as gps
df = pl.DataFrame({
"latitude": [37.7749, 34.0522, 40.7128],
"longitude": [-122.4194, -118.2437, -74.0060],
"timestamp": [1640995200, 1640998800, 1641002400]
})
result = df.with_columns(
gps.hilbert_index("latitude", "longitude", "timestamp").alias("hilbert_idx")
)
With Pre-computed Bounds (Better Performance)
bounds = {
"lat_min": -90.0, "lat_max": 90.0,
"lon_min": -180.0, "lon_max": 180.0,
"ts_min": 1640995200, "ts_max": 1767225600
}
df = df.with_columns(
gps.hilbert_index("latitude", "longitude", "timestamp", bounds=bounds).alias("hilbert_idx")
)
Lazy Evaluation for Large Datasets
# Perfect for datasets larger than memory
lf = pl.scan_parquet("huge_gps_dataset.parquet")
result = lf.with_columns(
gps.hilbert_index("latitude", "longitude", "timestamp").alias("hilbert_idx")
).filter(
pl.col("hilbert_idx") < 1000000
).collect()
Spatial Range Queries
# Sort by Hilbert index for efficient spatial queries
df_sorted = df.sort("hilbert_idx")
# Find nearby points using index ranges
reference_idx = df_sorted["hilbert_idx"][1000]
range_size = 1000000
nearby = df_sorted.filter(
(pl.col("hilbert_idx") >= reference_idx - range_size) &
(pl.col("hilbert_idx") <= reference_idx + range_size)
)
Performance
Benchmarked on Apple M1 MacBook Pro:
- Throughput: 10-15 million points/second
- Memory overhead: 8 bytes per index (21.8% increase)
- Spatial locality: Excellent for nearby GPS points
- Scales linearly: Tested up to 200M points
Dataset Generation
Generate test datasets for benchmarking:
# Generate 1M point test dataset
cd examples
python generate_sample_data.py
# Generate full 200M point dataset (requires significant time/space)
python generate_sample_data.py --full
Test datasets include:
- GPS points across 50+ countries
- 5,000+ unique vehicles
- Realistic trajectory simulation
- Timestamps from 2022-2025
Testing
# Run basic functionality tests
pytest tests/test_basic.py
# Run performance tests
pytest tests/test_performance.py
# Run all tests
pytest tests/
# Run examples
python examples/basic_usage.py
# Benchmark 200M dataset
python examples/benchmark_200M.py
Project Structure
polars-gps-hilbert/
├── src/ # Rust source code
│ ├── lib.rs # Library entry point
│ ├── expressions.rs # Polars expression functions
│ ├── gps_hilbert.rs # Core Hilbert indexing logic
│ └── utils.rs # Utility functions
├── python/ # Python bindings
│ └── polars_gps_hilbert/
│ └── __init__.py # Python API
├── examples/ # Usage examples and benchmarks
│ ├── basic_usage.py # Basic usage examples
│ ├── generate_sample_data.py # Dataset generation
│ └── benchmark_200M.py # Large-scale benchmarks
├── tests/ # Test suite
│ ├── test_basic.py # Basic functionality tests
│ ├── test_performance.py # Performance tests
│ └── test_gps_hilbert.py # Legacy comprehensive tests
└── Cargo.toml # Rust dependencies
Technical Details
- Hilbert Curves: Uses the
lindelcrate for efficient 3D Hilbert curve computation - Precision: Configurable bits per dimension (default: 21 bits, 63 total)
- Bounds: Auto-computed or pre-specified for optimal performance
- Null Handling: Graceful handling of missing GPS coordinates
- Validation: Input validation for GPS coordinate ranges
Use Cases
- Vehicle Trajectory Analysis: Index millions of GPS points for efficient trajectory reconstruction
- Route Deviation Detection: Quickly identify when vehicles deviate from expected routes
- Spatial-Temporal Clustering: Group nearby points in space and time
- Geofencing: Efficient point-in-region queries
- Traffic Pattern Analysis: Analyze movement patterns across time and space
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
License
MIT License - see LICENSE file for details.
Acknowledgments
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distributions
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file polars_gps_hilbert-0.1.0.tar.gz.
File metadata
- Download URL: polars_gps_hilbert-0.1.0.tar.gz
- Upload date:
- Size: 36.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9bd89f5092e070d2e155aae161096c642b689f286dcabe8961a2a9accf27f0e3
|
|
| MD5 |
b7022ffe6418c84f7a310d468564c621
|
|
| BLAKE2b-256 |
b677800b25708af8720b8792d270f7e88741ef9e7c4a3eb2f459acce287b1bfa
|
File details
Details for the file polars_gps_hilbert-0.1.0-cp38-abi3-win_amd64.whl.
File metadata
- Download URL: polars_gps_hilbert-0.1.0-cp38-abi3-win_amd64.whl
- Upload date:
- Size: 4.3 MB
- Tags: CPython 3.8+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
ee98e4e3caae9f831e1cdb1c3b5a82783aec270cb142556d3fccb82335659d5f
|
|
| MD5 |
7db8e7d7a9bc5c803111880dac274f71
|
|
| BLAKE2b-256 |
d12274b53436b21e74d8037a76b2e4a00d8bd6e4d6d55a2f6167a94619f5a493
|
File details
Details for the file polars_gps_hilbert-0.1.0-cp38-abi3-manylinux_2_34_x86_64.whl.
File metadata
- Download URL: polars_gps_hilbert-0.1.0-cp38-abi3-manylinux_2_34_x86_64.whl
- Upload date:
- Size: 4.9 MB
- Tags: CPython 3.8+, manylinux: glibc 2.34+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
972f7b82b131abb7b25c5f2325945a286c278b06087ce1e849a639898fa73140
|
|
| MD5 |
a3d9cc52482e5c6936eb0c907f52d03d
|
|
| BLAKE2b-256 |
db0fe060fa19fd159cca3ffae05d36b0a5f89380fa74c4ab1c7721efbe80919a
|
File details
Details for the file polars_gps_hilbert-0.1.0-cp38-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: polars_gps_hilbert-0.1.0-cp38-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 4.1 MB
- Tags: CPython 3.8+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
d88c66498467cbd28d841285a6586fd6028575a03ce2b64c2c46eef5e060fa84
|
|
| MD5 |
3763577e7f9aad92b5ff953e1b744370
|
|
| BLAKE2b-256 |
055d8702dfe44eb1df82ddb0a8370082f1f432cb2df6d9f9de202f56bdd559b6
|