Fast I/O and transformation tools for GeoParquet files

geoparquet-io

Fast I/O and transformation tools for GeoParquet files using PyArrow and DuckDB.

📚 Full Documentation | Quick Start Tutorial

Features

  • Fast: Built on PyArrow and DuckDB for high-performance operations
  • Pipeable: Chain commands with Unix pipes using Arrow IPC streaming - no intermediate files
  • Comprehensive: Sort, extract, partition, enhance, validate, and upload GeoParquet files
  • Cloud-Native: Read from and write to S3, GCS, Azure, and HTTPS sources
  • Spatial Indexing: Add bbox, H3 hexagonal cells, KD-tree partitions, and admin divisions
  • Best Practices: Automatic optimization following GeoParquet 1.1 and 2.0 specs
  • Parquet Geo Types support: Read and write Parquet geometry and geography types
  • Flexible: CLI and Python API for any workflow
  • Tested: Extensive test suite across Python 3.10-3.13 and all platforms

Installation

pipx install geoparquet-io     # CLI tool
pip install geoparquet-io      # Python library

See the Installation Guide for more options, including installing with uv tool or from source, and for requirements.

Quick Start

# Inspect file structure and metadata
gpio inspect myfile.parquet

# Check file quality and best practices
gpio check all myfile.parquet

# Add bounding box column for faster queries
gpio add bbox input.parquet output.parquet

# Sort using Hilbert curve for spatial locality
gpio sort hilbert input.parquet output_sorted.parquet

# Partition by admin boundaries
gpio partition admin buildings.parquet output_dir/ --dataset gaul --levels continent,country

# Remote-to-remote processing (S3, GCS, Azure, HTTPS)
gpio add bbox s3://bucket/input.parquet s3://bucket/output.parquet --profile my-aws
gpio partition h3 gs://bucket/data.parquet gs://bucket/partitions/ --resolution 9
gpio sort hilbert https://example.com/data.parquet s3://bucket/sorted.parquet

# Chain commands with Unix pipes - no intermediate files needed
gpio extract --bbox "-122.5,37.5,-122.0,38.0" input.parquet | gpio add bbox - | gpio sort hilbert - output.parquet

For more examples and detailed usage, see the Quick Start Tutorial and User Guide.
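The pipe chain above can also be driven from a script. The following is a minimal sketch using only the Python standard library; the three `gpio` command lines are copied from the Quick Start example, while `STAGES` and `run_pipeline` are hypothetical names introduced here for illustration and are not part of gpio. It assumes `gpio` is on your PATH and that `input.parquet` exists.

```python
import shutil
import subprocess

# Command lines copied from the Quick Start pipe example.
# "-" tells a stage to read the previous stage's output from stdin.
STAGES = [
    ["gpio", "extract", "--bbox", "-122.5,37.5,-122.0,38.0", "input.parquet"],
    ["gpio", "add", "bbox", "-"],
    ["gpio", "sort", "hilbert", "-", "output.parquet"],
]


def run_pipeline(stages):
    """Run each stage with its stdout connected to the next stage's stdin.

    Returns the list of exit codes, one per stage. (Hypothetical helper.)
    """
    procs = []
    upstream = None
    for i, cmd in enumerate(stages):
        is_last = i == len(stages) - 1
        proc = subprocess.Popen(
            cmd,
            stdin=upstream,
            stdout=None if is_last else subprocess.PIPE,
        )
        if upstream is not None:
            # Close our copy so the earlier stage sees a broken pipe
            # if this stage exits early.
            upstream.close()
        upstream = proc.stdout
        procs.append(proc)
    return [p.wait() for p in procs]


if __name__ == "__main__":
    if shutil.which("gpio") is None:
        print("gpio not found on PATH; install it first")
    else:
        print("exit codes:", run_pipeline(STAGES))
```

A non-zero entry in the returned list identifies the stage that failed, which a plain shell pipeline would hide unless `pipefail` is set.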

Python API

Use gpio programmatically for the best performance:

import geoparquet_io as gpio

# Read, transform, and write in a fluent chain
gpio.read('input.parquet') \
    .add_bbox() \
    .sort_hilbert() \
    .write('output.parquet')

# Convert from other formats (Shapefile, GeoJSON, GeoPackage, CSV)
gpio.convert('data.gpkg') \
    .add_h3(resolution=9) \
    .partition_by_h3('output/', resolution=5)

# Upload to cloud storage
gpio.read('data.parquet') \
    .extract(bbox=(-122.5, 37.5, -122.0, 38.0)) \
    .add_bbox() \
    .upload('s3://bucket/filtered.parquet')

The Python API keeps data in memory as Arrow tables, providing up to 5x better performance than CLI operations. See the Python API documentation for full details.
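To give some intuition for what the `sort_hilbert()` step in the chain above is for, here is a small conceptual sketch of Hilbert-curve ordering in pure Python. It is not gpio's implementation, and `hilbert_index` and `hilbert_sort` are hypothetical names used only for this illustration. The idea is to map each point to a cell on a 2^order by 2^order grid and then sort by the cell's distance along the Hilbert curve, so that rows that are close in space tend to end up close together in the file.

```python
def hilbert_index(order, x, y):
    """Distance along the Hilbert curve of cell (x, y) on a 2**order grid."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if (x & s) else 0
        ry = 1 if (y & s) else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the right orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_sort(points, bounds, order=16):
    """Sort (x, y) points by Hilbert index within bounds (minx, miny, maxx, maxy)."""
    minx, miny, maxx, maxy = bounds
    max_cell = (1 << order) - 1

    def key(p):
        # Scale each coordinate to an integer grid cell in [0, max_cell].
        gx = int((p[0] - minx) / (maxx - minx) * max_cell)
        gy = int((p[1] - miny) / (maxy - miny) * max_cell)
        return hilbert_index(order, gx, gy)

    return sorted(points, key=key)


if __name__ == "__main__":
    corners = [(1, 0), (0, 0), (1, 1), (0, 1)]
    print(hilbert_sort(corners, bounds=(0, 0, 1, 1), order=1))
```

Consecutive cells along the curve are always grid neighbours, which is why this ordering gives good spatial locality for row groups.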

Plugins

gpio supports plugins that add specialized format support. Plugins are installed alongside the main tool:

# Install gpio with PMTiles support
uv tool install geoparquet-io --with gpio-pmtiles
pipx install geoparquet-io --preinstall gpio-pmtiles

# Or add to an existing installation
uv tool install --with gpio-pmtiles geoparquet-io
pipx inject geoparquet-io gpio-pmtiles

Available Plugins

  • gpio-pmtiles - Convert between GeoParquet and PMTiles format for efficient web map tiles

Claude Code Integration

Use gpio with Claude Code for AI-assisted spatial data workflows.

Install the skill from skills/geoparquet/ or download it from:

https://github.com/geoparquet/geoparquet-io/tree/main/skills/geoparquet

The skill teaches Claude how to help you convert spatial data to optimized GeoParquet, validate files, recommend partitioning strategies, and publish to cloud storage.

Contributing

Contributions are welcome! See CONTRIBUTING.md for development setup, coding standards, and how to submit changes.

License

Apache 2.0 - See LICENSE for details.
