Skip to main content

High-performance Polars plugin for GPS trajectory indexing using 3D Hilbert curves

Project description

polars-gps-hilbert

A high-performance Polars plugin for GPS trajectory indexing using 3D Hilbert curves. This plugin enables efficient spatial-temporal indexing of GPS data for applications like trajectory analysis, route deviation detection, and spatial queries.

Features

  • 3D Hilbert Curve Indexing: Combines latitude, longitude, and timestamp into a single index
  • Lazy Evaluation: Supports out-of-core processing for datasets larger than memory
  • Automatic Bounds Detection: Automatically computes spatial and temporal bounds or accepts pre-computed values
  • High Performance: Written in Rust for maximum performance (10-15M points/second)
  • Polars Integration: Seamless integration with Polars' expression API and lazy evaluation

Installation

From Source

# Clone the repository
git clone https://github.com/nullbutt/polars-gps-hilbert.git
cd polars-gps-hilbert

# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Build and install the plugin
./build_and_test.sh

Requirements

  • Python >= 3.8
  • Rust toolchain (for building from source)
  • Polars >= 0.20.0
  • UV (for Python project management)

Quick Start

import polars as pl
import polars_gps_hilbert as gps

# Load GPS trajectory data
df = pl.read_parquet("gps_data.parquet")

# Add Hilbert index with automatic bounds computation
df = df.with_columns(
    gps.hilbert_index("latitude", "longitude", "timestamp").alias("hilbert_idx")
)

# Use for spatial queries
nearby_points = df.sort("hilbert_idx").slice(1000, 100)

Usage Examples

Basic Indexing

import polars as pl
import polars_gps_hilbert as gps

df = pl.DataFrame({
    "latitude": [37.7749, 34.0522, 40.7128],
    "longitude": [-122.4194, -118.2437, -74.0060],
    "timestamp": [1640995200, 1640998800, 1641002400]
})

result = df.with_columns(
    gps.hilbert_index("latitude", "longitude", "timestamp").alias("hilbert_idx")
)

With Pre-computed Bounds (Better Performance)

bounds = {
    "lat_min": -90.0, "lat_max": 90.0,
    "lon_min": -180.0, "lon_max": 180.0,
    "ts_min": 1640995200, "ts_max": 1767225600
}

df = df.with_columns(
    gps.hilbert_index("latitude", "longitude", "timestamp", bounds=bounds).alias("hilbert_idx")
)

Lazy Evaluation for Large Datasets

# Perfect for datasets larger than memory
lf = pl.scan_parquet("huge_gps_dataset.parquet")

result = lf.with_columns(
    gps.hilbert_index("latitude", "longitude", "timestamp").alias("hilbert_idx")
).filter(
    pl.col("hilbert_idx") < 1000000
).collect()

Spatial Range Queries

# Sort by Hilbert index for efficient spatial queries
df_sorted = df.sort("hilbert_idx")

# Find nearby points using index ranges
reference_idx = df_sorted["hilbert_idx"][1000]
range_size = 1000000

nearby = df_sorted.filter(
    (pl.col("hilbert_idx") >= reference_idx - range_size) &
    (pl.col("hilbert_idx") <= reference_idx + range_size)
)

Performance

Benchmarked on Apple M1 MacBook Pro:

  • Throughput: 10-15 million points/second
  • Memory overhead: 8 bytes per index (21.8% increase)
  • Spatial locality: Excellent for nearby GPS points
  • Scales linearly: Tested up to 200M points

Dataset Generation

Generate test datasets for benchmarking:

# Generate 1M point test dataset
cd examples
python generate_sample_data.py

# Generate full 200M point dataset (requires significant time/space)
python generate_sample_data.py --full

Test datasets include:

  • GPS points across 50+ countries
  • 5,000+ unique vehicles
  • Realistic trajectory simulation
  • Timestamps from 2022-2025

Testing

# Run basic functionality tests
pytest tests/test_basic.py

# Run performance tests
pytest tests/test_performance.py

# Run all tests
pytest tests/

# Run examples
python examples/basic_usage.py

# Benchmark 200M dataset
python examples/benchmark_200M.py

Project Structure

polars-gps-hilbert/
├── src/                     # Rust source code
│   ├── lib.rs              # Library entry point
│   ├── expressions.rs      # Polars expression functions
│   ├── gps_hilbert.rs     # Core Hilbert indexing logic
│   └── utils.rs           # Utility functions
├── python/                 # Python bindings
│   └── polars_gps_hilbert/
│       └── __init__.py    # Python API
├── examples/               # Usage examples and benchmarks
│   ├── basic_usage.py     # Basic usage examples
│   ├── generate_sample_data.py  # Dataset generation
│   └── benchmark_200M.py  # Large-scale benchmarks
├── tests/                  # Test suite
│   ├── test_basic.py      # Basic functionality tests
│   ├── test_performance.py # Performance tests
│   └── test_gps_hilbert.py # Legacy comprehensive tests
└── Cargo.toml             # Rust dependencies

Technical Details

  • Hilbert Curves: Uses the lindel crate for efficient 3D Hilbert curve computation
  • Precision: Configurable bits per dimension (default: 21 bits, 63 total)
  • Bounds: Auto-computed or pre-specified for optimal performance
  • Null Handling: Graceful handling of missing GPS coordinates
  • Validation: Input validation for GPS coordinate ranges

Use Cases

  1. Vehicle Trajectory Analysis: Index millions of GPS points for efficient trajectory reconstruction
  2. Route Deviation Detection: Quickly identify when vehicles deviate from expected routes
  3. Spatial-Temporal Clustering: Group nearby points in space and time
  4. Geofencing: Efficient point-in-region queries
  5. Traffic Pattern Analysis: Analyze movement patterns across time and space

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Built on Polars for high-performance data processing
  • Uses lindel for Hilbert curve implementations
  • Inspired by spatial indexing techniques in geographic information systems

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polars_gps_hilbert-0.1.0.tar.gz (36.6 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

polars_gps_hilbert-0.1.0-cp38-abi3-win_amd64.whl (4.3 MB view details)

Uploaded CPython 3.8+Windows x86-64

polars_gps_hilbert-0.1.0-cp38-abi3-manylinux_2_34_x86_64.whl (4.9 MB view details)

Uploaded CPython 3.8+manylinux: glibc 2.34+ x86-64

polars_gps_hilbert-0.1.0-cp38-abi3-macosx_11_0_arm64.whl (4.1 MB view details)

Uploaded CPython 3.8+macOS 11.0+ ARM64

File details

Details for the file polars_gps_hilbert-0.1.0.tar.gz.

File metadata

  • Download URL: polars_gps_hilbert-0.1.0.tar.gz
  • Upload date:
  • Size: 36.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for polars_gps_hilbert-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9bd89f5092e070d2e155aae161096c642b689f286dcabe8961a2a9accf27f0e3
MD5 b7022ffe6418c84f7a310d468564c621
BLAKE2b-256 b677800b25708af8720b8792d270f7e88741ef9e7c4a3eb2f459acce287b1bfa

See more details on using hashes here.

File details

Details for the file polars_gps_hilbert-0.1.0-cp38-abi3-win_amd64.whl.

File metadata

File hashes

Hashes for polars_gps_hilbert-0.1.0-cp38-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 ee98e4e3caae9f831e1cdb1c3b5a82783aec270cb142556d3fccb82335659d5f
MD5 7db8e7d7a9bc5c803111880dac274f71
BLAKE2b-256 d12274b53436b21e74d8037a76b2e4a00d8bd6e4d6d55a2f6167a94619f5a493

See more details on using hashes here.

File details

Details for the file polars_gps_hilbert-0.1.0-cp38-abi3-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for polars_gps_hilbert-0.1.0-cp38-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 972f7b82b131abb7b25c5f2325945a286c278b06087ce1e849a639898fa73140
MD5 a3d9cc52482e5c6936eb0c907f52d03d
BLAKE2b-256 db0fe060fa19fd159cca3ffae05d36b0a5f89380fa74c4ab1c7721efbe80919a

See more details on using hashes here.

File details

Details for the file polars_gps_hilbert-0.1.0-cp38-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for polars_gps_hilbert-0.1.0-cp38-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 d88c66498467cbd28d841285a6586fd6028575a03ce2b64c2c46eef5e060fa84
MD5 3763577e7f9aad92b5ff953e1b744370
BLAKE2b-256 055d8702dfe44eb1df82ddb0a8370082f1f432cb2df6d9f9de202f56bdd559b6

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page