Skip to main content

⚠️ EXPERIMENTAL: High-performance Polars plugin for GPS trajectory indexing using 3D Hilbert curves. Not for production use.

Project description

polars-gps-hilbert

⚠️ EXPERIMENTAL SOFTWARE: This plugin is in active development and should not be used in production environments. APIs may change without notice.

A high-performance Polars plugin for GPS trajectory indexing using 3D Hilbert curves. This plugin enables efficient spatial-temporal indexing of GPS data for applications like trajectory analysis, route deviation detection, and spatial queries.

Features

  • 3D Hilbert Curve Indexing: Combines latitude, longitude, and timestamp into a single index
  • Lazy Evaluation: Supports out-of-core processing for datasets larger than memory
  • Automatic Bounds Detection: Automatically computes spatial and temporal bounds or accepts pre-computed values
  • High Performance: Written in Rust for maximum performance (10-15M points/second)
  • Polars Integration: Seamless integration with Polars' expression API and lazy evaluation

Installation

From Source

# Clone the repository
git clone https://github.com/nullbutt/polars-gps-hilbert.git
cd polars-gps-hilbert

# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Build and install the plugin
./build_and_test.sh

Requirements

  • Python >= 3.8
  • Rust toolchain (for building from source)
  • Polars >= 0.20.0
  • UV (for Python project management)

Quick Start

import polars as pl
import polars_gps_hilbert as gps

# Load GPS trajectory data
df = pl.read_parquet("gps_data.parquet")

# Add Hilbert index with automatic bounds computation
df = df.with_columns(
    gps.hilbert_index("latitude", "longitude", "timestamp").alias("hilbert_idx")
)

# Use for spatial queries
nearby_points = df.sort("hilbert_idx").slice(1000, 100)

Usage Examples

Basic Indexing

import polars as pl
import polars_gps_hilbert as gps

df = pl.DataFrame({
    "latitude": [37.7749, 34.0522, 40.7128],
    "longitude": [-122.4194, -118.2437, -74.0060],
    "timestamp": [1640995200, 1640998800, 1641002400]
})

result = df.with_columns(
    gps.hilbert_index("latitude", "longitude", "timestamp").alias("hilbert_idx")
)

With Pre-computed Bounds (Better Performance)

bounds = {
    "lat_min": -90.0, "lat_max": 90.0,
    "lon_min": -180.0, "lon_max": 180.0,
    "ts_min": 1640995200, "ts_max": 1767225600
}

df = df.with_columns(
    gps.hilbert_index("latitude", "longitude", "timestamp", bounds=bounds).alias("hilbert_idx")
)

Lazy Evaluation for Large Datasets

# Perfect for datasets larger than memory
lf = pl.scan_parquet("huge_gps_dataset.parquet")

result = lf.with_columns(
    gps.hilbert_index("latitude", "longitude", "timestamp").alias("hilbert_idx")
).filter(
    pl.col("hilbert_idx") < 1000000
).collect()

Spatial Range Queries

# Sort by Hilbert index for efficient spatial queries
df_sorted = df.sort("hilbert_idx")

# Find nearby points using index ranges
reference_idx = df_sorted["hilbert_idx"][1000]
range_size = 1000000

nearby = df_sorted.filter(
    (pl.col("hilbert_idx") >= reference_idx - range_size) &
    (pl.col("hilbert_idx") <= reference_idx + range_size)
)

Performance

Benchmarked on Apple M1 MacBook Pro:

  • Throughput: 10-15 million points/second
  • Memory overhead: 8 bytes per index (21.8% increase)
  • Spatial locality: Excellent for nearby GPS points
  • Scales linearly: Tested up to 200M points

Dataset Generation

Generate test datasets for benchmarking:

# Generate 1M point test dataset
cd examples
python generate_sample_data.py

# Generate full 200M point dataset (requires significant time/space)
python generate_sample_data.py --full

Test datasets include:

  • GPS points across 50+ countries
  • 5,000+ unique vehicles
  • Realistic trajectory simulation
  • Timestamps from 2022-2025

Testing

# Run basic functionality tests
pytest tests/test_basic.py

# Run performance tests
pytest tests/test_performance.py

# Run all tests
pytest tests/

# Run examples
python examples/basic_usage.py

# Benchmark 200M dataset
python examples/benchmark_200M.py

Project Structure

polars-gps-hilbert/
├── src/                     # Rust source code
│   ├── lib.rs              # Library entry point
│   ├── expressions.rs      # Polars expression functions
│   ├── gps_hilbert.rs     # Core Hilbert indexing logic
│   └── utils.rs           # Utility functions
├── python/                 # Python bindings
│   └── polars_gps_hilbert/
│       └── __init__.py    # Python API
├── examples/               # Usage examples and benchmarks
│   ├── basic_usage.py     # Basic usage examples
│   ├── generate_sample_data.py  # Dataset generation
│   └── benchmark_200M.py  # Large-scale benchmarks
├── tests/                  # Test suite
│   ├── test_basic.py      # Basic functionality tests
│   ├── test_performance.py # Performance tests
│   └── test_gps_hilbert.py # Legacy comprehensive tests
└── Cargo.toml             # Rust dependencies

Technical Details

  • Hilbert Curves: Uses the lindel crate for efficient 3D Hilbert curve computation
  • Precision: Configurable bits per dimension (default: 21 bits, 63 total)
  • Bounds: Auto-computed or pre-specified for optimal performance
  • Null Handling: Graceful handling of missing GPS coordinates
  • Validation: Input validation for GPS coordinate ranges

Use Cases

  1. Vehicle Trajectory Analysis: Index millions of GPS points for efficient trajectory reconstruction
  2. Route Deviation Detection: Quickly identify when vehicles deviate from expected routes
  3. Spatial-Temporal Clustering: Group nearby points in space and time
  4. Geofencing: Efficient point-in-region queries
  5. Traffic Pattern Analysis: Analyze movement patterns across time and space

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

MIT License - see LICENSE file for details.

Acknowledgments

  • Built on Polars for high-performance data processing
  • Uses lindel for Hilbert curve implementations
  • Inspired by spatial indexing techniques in geographic information systems

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

polars_gps_hilbert-0.1.1.tar.gz (44.5 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

polars_gps_hilbert-0.1.1-cp39-abi3-win_amd64.whl (5.1 MB view details)

Uploaded CPython 3.9+Windows x86-64

polars_gps_hilbert-0.1.1-cp39-abi3-manylinux_2_34_x86_64.whl (5.4 MB view details)

Uploaded CPython 3.9+manylinux: glibc 2.34+ x86-64

polars_gps_hilbert-0.1.1-cp39-abi3-macosx_11_0_arm64.whl (4.6 MB view details)

Uploaded CPython 3.9+macOS 11.0+ ARM64

File details

Details for the file polars_gps_hilbert-0.1.1.tar.gz.

File metadata

  • Download URL: polars_gps_hilbert-0.1.1.tar.gz
  • Upload date:
  • Size: 44.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for polars_gps_hilbert-0.1.1.tar.gz
Algorithm Hash digest
SHA256 5e65ea7b42b3405f4671e96594525ac169a9a225ee9012debdfd8312a5ea4551
MD5 36a28c17fb57075de07ef571d72a95c0
BLAKE2b-256 f70f94ffe7624d3b0f3f7c83e9e4fb1a70068ece9f35618d50bed86e5a154074

See more details on using hashes here.

File details

Details for the file polars_gps_hilbert-0.1.1-cp39-abi3-win_amd64.whl.

File metadata

File hashes

Hashes for polars_gps_hilbert-0.1.1-cp39-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 b013b7bd46e014ba2e24c88b69a260b0c1f23911ef674bf9d106d6c5bd9c5818
MD5 2fbb701b4e709368e8f3d2b145fb618d
BLAKE2b-256 57c8d4023525c1e764ce56ea5d8e66b7f7e2bc59473bf074857a7d2d9c14df3d

See more details on using hashes here.

File details

Details for the file polars_gps_hilbert-0.1.1-cp39-abi3-manylinux_2_34_x86_64.whl.

File metadata

File hashes

Hashes for polars_gps_hilbert-0.1.1-cp39-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 451eb906fb78520168d778148a7f2919bda4144fe766d95cdc90f69f005fbc21
MD5 df956e2a886f46642357216cc03f2830
BLAKE2b-256 d2d4d7bee61e62c1937591d670424a191e905c09b190c91afbfd0776d631bb7e

See more details on using hashes here.

File details

Details for the file polars_gps_hilbert-0.1.1-cp39-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for polars_gps_hilbert-0.1.1-cp39-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 408fed747f84506920bd9154048541b7a3aa1b138c9b13ee9247916cb76b8c29
MD5 df30141c1f36c0cef508076832fc339b
BLAKE2b-256 f9c6e6cc8b7bcb6289994f23e3b72b3caf753651291cf1d18ca197410d6ed42f

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page