Fast I/O and transformation tools for GeoParquet files

geoparquet-io

Fast I/O and transformation tools for GeoParquet files using PyArrow and DuckDB.

Features

  • Fast: Built on PyArrow and DuckDB for high-performance operations
  • Comprehensive: Sort, partition, enhance, and validate GeoParquet files
  • Spatial Indexing: Add bbox, H3 hexagonal cells, KD-tree partitions, and hierarchical admin divisions
  • Best Practices: Automatic optimization following the GeoParquet 1.1 spec
  • Flexible: CLI and Python API for any workflow
  • Tested: Extensive test suite across Python 3.9-3.13 and all platforms

Installation

# With uv (recommended)
uv pip install geoparquet-io

# Or with pip
pip install geoparquet-io

# From source
git clone https://github.com/cholmes/geoparquet-io.git
cd geoparquet-io
uv sync --all-extras

For the full development setup, see the getting started instructions.

Requirements

  • Python 3.9 or higher
  • PyArrow 12.0.0+
  • DuckDB 1.1.3+
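
To confirm that an environment meets these minimums before installing, a small check along the following lines may help. This is a sketch, not part of geoparquet-io: the helper names `parse_version` and `check_requirements` are hypothetical, and the version parsing is deliberately simple (it keeps only the leading numeric components).

```python
import sys
from importlib import metadata

# Minimum versions copied from the Requirements list above.
MIN_PYTHON = (3, 9)
MINIMUMS = {"pyarrow": (12, 0, 0), "duckdb": (1, 1, 3)}


def parse_version(text):
    """Turn '1.1.3' or '18.0.0rc1' into a 3-tuple of leading integers."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    # Pad so that '12.0' compares equal to (12, 0, 0).
    while len(parts) < 3:
        parts.append(0)
    return tuple(parts[:3])


def check_requirements():
    """Return a list of problems; an empty list means the minimums are met."""
    problems = []
    if sys.version_info[:2] < MIN_PYTHON:
        problems.append("Python 3.9 or higher is required")
    for name, minimum in MINIMUMS.items():
        try:
            installed = parse_version(metadata.version(name))
        except metadata.PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if installed < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(f"{name} is older than {wanted}")
    return problems


if __name__ == "__main__":
    for problem in check_requirements():
        print(problem)
```

Installing with uv or pip normally resolves these dependencies automatically, so a check like this is mostly useful when debugging an existing environment.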

Quick Start

# Inspect file structure and metadata
gpio inspect myfile.parquet

# Check file quality and best practices
gpio check all myfile.parquet

# Add bounding box column for faster queries
gpio add bbox input.parquet output.parquet

# Sort using Hilbert curve for spatial locality
gpio sort hilbert input.parquet output_sorted.parquet

# Partition by admin boundaries
gpio partition admin buildings.parquet output_dir/ --dataset gaul --levels continent,country
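
To illustrate what "spatial locality" means for the Hilbert sort above, the sketch below maps grid cells to positions along a Hilbert curve using the standard textbook conversion, then sorts points by that position. It is an illustration only and is not claimed to be how gpio implements `sort hilbert`; the function names, the `bounds` argument, and the default grid `order` are assumptions.

```python
def hilbert_index(order, x, y):
    """Position of grid cell (x, y) along a Hilbert curve on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve is oriented consistently.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def sort_points_by_hilbert(points, bounds, order=16):
    """Sort (x, y) points by the Hilbert index of their grid cell.

    `bounds` is (xmin, ymin, xmax, ymax); points are assumed to lie inside it.
    """
    xmin, ymin, xmax, ymax = bounds
    cells = (1 << order) - 1
    width = (xmax - xmin) or 1.0   # guard against a zero-width extent
    height = (ymax - ymin) or 1.0

    def key(point):
        gx = int((point[0] - xmin) / width * cells)
        gy = int((point[1] - ymin) / height * cells)
        return hilbert_index(order, gx, gy)

    return sorted(points, key=key)
```

Consecutive positions along the curve are always neighboring grid cells, which is why rows sorted this way tend to keep nearby features in the same row groups.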

Documentation

Full documentation is available at: https://cholmes.github.io/geoparquet-io/

Usage Examples

Inspect and Validate

# Quick metadata inspection
gpio inspect data.parquet

# Preview first 10 rows
gpio inspect data.parquet --head 10

# Check against best practices
gpio check all data.parquet
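
For context on what inspection looks at: a GeoParquet file carries its spatial metadata as a JSON document under the `geo` key of the Parquet file metadata. The sketch below summarizes that entry from a plain key-value mapping. It is not the implementation of `gpio inspect`, and the helper name `summarize_geo_metadata` is hypothetical.

```python
import json


def summarize_geo_metadata(file_metadata):
    """Summarize the GeoParquet 'geo' entry of Parquet key-value metadata.

    `file_metadata` maps bytes keys to bytes values (the shape PyArrow
    exposes as `schema.metadata`). Returns None when no 'geo' key exists.
    """
    raw = (file_metadata or {}).get(b"geo")
    if raw is None:
        return None
    geo = json.loads(raw)
    primary = geo.get("primary_column")
    column = geo.get("columns", {}).get(primary, {})
    return {
        "version": geo.get("version"),
        "primary_column": primary,
        "encoding": column.get("encoding"),
        # GeoParquet 1.1 records a bbox covering in the column metadata.
        "has_bbox_covering": "bbox" in column.get("covering", {}),
    }
```

With PyArrow installed, the mapping can be obtained from `pyarrow.parquet.read_schema("data.parquet").metadata` and passed straight to this helper.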

Enhance with Spatial Indices

# Add bounding boxes
gpio add bbox input.parquet output.parquet

# Add H3 hexagonal cell IDs
gpio add h3 input.parquet output.parquet --resolution 9

# Add KD-tree partition IDs (auto-balanced)
gpio add kdtree input.parquet output.parquet

# Add country codes via spatial join (default dataset)
gpio add admin-divisions buildings.parquet output.parquet

# Add GAUL hierarchical admin divisions (continent, country, department)
gpio add admin-divisions buildings.parquet output.parquet --dataset gaul
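
As a rough illustration of why a bounding box column speeds up queries, the sketch below computes a box with the `xmin`, `ymin`, `xmax`, `ymax` fields used by the GeoParquet 1.1 bbox covering and uses it as a cheap pre-filter. It works on plain coordinate pairs and is not gpio's implementation; `bbox_of` and `bboxes_intersect` are hypothetical names.

```python
def bbox_of(coords):
    """Return the bounding box of an iterable of (x, y) pairs."""
    xs, ys = zip(*coords)
    return {"xmin": min(xs), "ymin": min(ys), "xmax": max(xs), "ymax": max(ys)}


def bboxes_intersect(a, b):
    """True when two boxes overlap or touch; far cheaper than a geometry test."""
    return (
        a["xmin"] <= b["xmax"]
        and b["xmin"] <= a["xmax"]
        and a["ymin"] <= b["ymax"]
        and b["ymin"] <= a["ymax"]
    )


# A query window and two features, each reduced to its bounding box.
query = {"xmin": 0.0, "ymin": 0.0, "xmax": 10.0, "ymax": 10.0}
inside = bbox_of([(2.0, 3.0), (4.0, 1.0), (3.0, 5.0)])
outside = bbox_of([(20.0, 20.0), (25.0, 22.0)])

# Only features whose box intersects the window need a full geometry check.
candidates = [b for b in (inside, outside) if bboxes_intersect(query, b)]
```

Because the four numbers are stored as ordinary columns, a reader can skip rows and row groups on them without decoding any geometry.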

Optimize and Partition

# Sort by Hilbert curve
gpio sort hilbert input.parquet sorted.parquet

# Partition by H3 cells
gpio partition h3 large.parquet output_dir/ --resolution 7

# Partition by admin boundaries with spatial extent filtering
gpio partition admin buildings.parquet by_admin/ --dataset gaul --levels continent,country

# Multi-level Hive-style partitioning (continent=Africa/country=Kenya/...)
gpio partition admin buildings.parquet by_admin/ --dataset gaul --levels continent,country,department --hive
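
The Hive-style layout mentioned above encodes each partition level as a `key=value` directory. A minimal sketch of how rows map to such directories, assuming each row is a dict with one value per level, might look like this. The helper names are hypothetical, and the file names gpio writes inside each directory are not shown here.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def hive_path(root, levels, row):
    """Build root/level1=value1/level2=value2/... for one row."""
    parts = [f"{level}={row[level]}" for level in levels]
    return PurePosixPath(root, *parts)


def group_rows_by_partition(root, levels, rows):
    """Group rows by the Hive-style directory they would be written to."""
    groups = defaultdict(list)
    for row in rows:
        groups[str(hive_path(root, levels, row))].append(row)
    return dict(groups)


rows = [
    {"continent": "Africa", "country": "Kenya", "id": 1},
    {"continent": "Africa", "country": "Kenya", "id": 2},
    {"continent": "Europe", "country": "France", "id": 3},
]
groups = group_rows_by_partition("by_admin", ["continent", "country"], rows)
```

Query engines that understand Hive partitioning can then read only the directories whose key values match a filter.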

Python API

from geoparquet_io.core.add_bbox_column import add_bbox_column
from geoparquet_io.core.hilbert_order import hilbert_order

# Add bounding box
add_bbox_column("input.parquet", "output.parquet", verbose=True)

# Sort by Hilbert curve
hilbert_order("input.parquet", "sorted.parquet", add_bbox=True)
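
The same functions can be applied to a whole directory of files. The sketch below pairs each input file with an output path and then calls `add_bbox_column` with the argument shape shown above (input path, output path, `verbose`). The helpers `plan_outputs` and `add_bbox_to_all` and the `_bbox` suffix are hypothetical and not part of the geoparquet-io API.

```python
from pathlib import Path


def plan_outputs(input_dir, output_dir, suffix="_bbox"):
    """Pair each .parquet file in input_dir with an output path."""
    input_dir, output_dir = Path(input_dir), Path(output_dir)
    return [
        (src, output_dir / f"{src.stem}{suffix}.parquet")
        for src in sorted(input_dir.glob("*.parquet"))
    ]


def add_bbox_to_all(input_dir, output_dir):
    """Run add_bbox_column over every .parquet file in input_dir."""
    # Imported lazily so that planning works without geoparquet-io installed.
    from geoparquet_io.core.add_bbox_column import add_bbox_column

    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for src, dst in plan_outputs(input_dir, output_dir):
        # Same call shape as the example above.
        add_bbox_column(str(src), str(dst), verbose=True)
```

Keeping the planning step separate makes it easy to preview which files would be written before any processing starts.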

Contributing

Contributions are welcome! See our Contributing Guide for details.

Development

# Clone repository
git clone https://github.com/cholmes/geoparquet-io.git
cd geoparquet-io

# Install with all development dependencies
uv sync --all-extras

# Run tests
uv run pytest

# Run linting
uv run ruff check .

# Build docs locally
uv run mkdocs serve

License

Apache 2.0 - See LICENSE for details.
