geoparquet-io
Fast I/O and transformation tools for GeoParquet files using PyArrow and DuckDB.
Features
- Fast: Built on PyArrow and DuckDB for high-performance operations
- Comprehensive: Sort, partition, enhance, and validate GeoParquet files
- Spatial Indexing: Add bbox, H3 hexagonal cells, KD-tree partitions, and admin divisions
- Best Practices: Automatic optimization following the GeoParquet 1.1 specification
- Flexible: Use the gpio command-line tool or the Python API
- Tested: Extensive test suite across Python 3.9-3.13 and all platforms
Installation
# With uv (recommended)
uv pip install geoparquet-io
# Or with pip
pip install geoparquet-io
# From source
git clone https://github.com/cholmes/geoparquet-io.git
cd geoparquet-io
uv sync --all-extras
For a full development setup, see the Getting Started instructions.
Requirements
- Python 3.9 or higher
- PyArrow 12.0.0+
- DuckDB 1.1.3+
Quick Start
# Inspect file structure and metadata
gpio inspect myfile.parquet
# Check file quality and best practices
gpio check all myfile.parquet
# Add bounding box column for faster queries
gpio add bbox input.parquet output.parquet
# Sort using Hilbert curve for spatial locality
gpio sort hilbert input.parquet output_sorted.parquet
# Partition into separate files by country
gpio partition admin buildings.parquet output_dir/
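Why a bounding box column makes queries faster: each row carries its own xmin, ymin, xmax and ymax, so a reader can discard rows (and, through Parquet column statistics, whole row groups) whose box does not overlap the query window without decoding any geometry. GeoParquet 1.1 describes such a column as a struct with these four fields. The sketch below illustrates only the filtering idea in plain Python on vertex lists; it is not geoparquet-io's implementation, and the helpers compute_bbox, intersects and bbox_filter are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple

Coord = Tuple[float, float]
BBox = Dict[str, float]


def compute_bbox(coords: Sequence[Coord]) -> BBox:
    """Return the axis-aligned bounding box of a geometry's vertices."""
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    return {"xmin": min(xs), "ymin": min(ys), "xmax": max(xs), "ymax": max(ys)}


def intersects(a: BBox, b: BBox) -> bool:
    """True if two boxes overlap (touching edges count as overlap)."""
    return (
        a["xmin"] <= b["xmax"]
        and a["xmax"] >= b["xmin"]
        and a["ymin"] <= b["ymax"]
        and a["ymax"] >= b["ymin"]
    )


def bbox_filter(bboxes: List[BBox], query: BBox) -> List[int]:
    """Return indices of rows whose bbox overlaps the query window.

    Only the four bbox numbers are compared; geometries are never decoded.
    Surviving rows are candidates that may still need an exact geometry test.
    """
    return [i for i, box in enumerate(bboxes) if intersects(box, query)]


# Three toy geometries given as (x, y) vertex lists (hypothetical data).
geometries = [
    [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0)],   # bbox (0, 0)-(1, 1)
    [(10.0, 10.0), (12.0, 11.0)],           # bbox (10, 10)-(12, 11), far away
    [(0.5, 0.5), (2.0, 3.0), (1.5, -1.0)],  # bbox (0.5, -1)-(2, 3)
]
bbox_column = [compute_bbox(g) for g in geometries]
query_window = {"xmin": 0.8, "ymin": 0.8, "xmax": 2.0, "ymax": 2.0}
print(bbox_filter(bbox_column, query_window))  # [0, 2]
```

The second geometry is rejected by four float comparisons, which is the work a bbox column saves compared with parsing every geometry.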
Documentation
Full documentation is available at: https://cholmes.github.io/geoparquet-io/
- Getting Started - Installation and quick start guide
- User Guide - Detailed documentation for all features
- CLI Reference - Complete command reference
- Python API - Python API documentation
- Examples - Real-world usage patterns
Usage Examples
Inspect and Validate
# Quick metadata inspection
gpio inspect data.parquet
# Preview first 10 rows
gpio inspect data.parquet --head 10
# Check against best practices
gpio check all data.parquet
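GeoParquet keeps its metadata as a JSON string under the geo key of the Parquet file's key-value metadata (with PyArrow, for example, pyarrow.parquet.read_schema(path).metadata[b"geo"]). The sketch below shows the kind of structural checks a validator can run on that JSON once it has been read. The function check_geo_metadata and its messages are hypothetical, cover only a small subset of the spec, and do not reproduce what gpio check all actually reports; treating a missing bbox covering as a problem is an assumed best-practice rule.

```python
import json
from typing import List

REQUIRED_TOP_LEVEL = ("version", "primary_column", "columns")
REQUIRED_PER_COLUMN = ("encoding", "geometry_types")


def check_geo_metadata(geo_json: str) -> List[str]:
    """Return a list of problems found in a GeoParquet `geo` metadata string.

    An empty list means the basic structure looks valid. This is an
    illustrative subset of checks, not a full spec validator.
    """
    problems: List[str] = []
    meta = json.loads(geo_json)

    for key in REQUIRED_TOP_LEVEL:
        if key not in meta:
            problems.append(f"missing top-level key: {key}")
    columns = meta.get("columns", {})

    # The primary geometry column must be described in `columns`.
    primary = meta.get("primary_column")
    if primary is not None and primary not in columns:
        problems.append(f"primary_column {primary!r} not described in columns")

    for name, col in columns.items():
        for key in REQUIRED_PER_COLUMN:
            if key not in col:
                problems.append(f"column {name!r} missing {key}")
        # Assumption: flag a missing GeoParquet 1.1 bbox covering as a
        # best-practice issue rather than a hard error.
        if "covering" not in col:
            problems.append(f"column {name!r} has no bbox covering")
    return problems


good = json.dumps({
    "version": "1.1.0",
    "primary_column": "geometry",
    "columns": {
        "geometry": {
            "encoding": "WKB",
            "geometry_types": ["Polygon"],
            "covering": {"bbox": {
                "xmin": ["bbox", "xmin"], "ymin": ["bbox", "ymin"],
                "xmax": ["bbox", "xmax"], "ymax": ["bbox", "ymax"],
            }},
        }
    },
})
bad = json.dumps({
    "version": "1.1.0",
    "primary_column": "geom",
    "columns": {"geometry": {"encoding": "WKB"}},
})
print(check_geo_metadata(good))  # []
print(check_geo_metadata(bad))
```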
Enhance with Spatial Indices
# Add bounding boxes
gpio add bbox input.parquet output.parquet
# Add H3 hexagonal cell IDs
gpio add h3 input.parquet output.parquet --resolution 9
# Add KD-tree partition IDs (auto-balanced)
gpio add kdtree input.parquet output.parquet
# Add country codes via spatial join
gpio add admin-divisions buildings.parquet output.parquet
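One common way to get balanced KD-tree partitions is to split the points recursively at the median, alternating between the x and y axes, so every leaf ends up with roughly the same number of features. A minimal sketch of that idea on point coordinates (for example geometry centroids) follows. The helper kdtree_partition_ids is hypothetical, and the binary-string IDs are an assumption, not necessarily the format gpio add kdtree writes.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def kdtree_partition_ids(points: Sequence[Point], depth: int) -> List[str]:
    """Assign each point a KD-tree leaf ID after `depth` median splits.

    Splits alternate between x (even levels) and y (odd levels). Each split
    sends the lower half of the sorted points to '0' and the upper half to
    '1', so the 2**depth leaves differ in size by at most one point.
    """
    ids = [""] * len(points)

    def split(indices: List[int], level: int) -> None:
        if level == depth:
            return
        axis = level % 2  # 0 = x, 1 = y
        ordered = sorted(indices, key=lambda i: points[i][axis])
        mid = len(ordered) // 2  # split by count, which keeps leaves balanced
        for i in ordered[:mid]:
            ids[i] += "0"
        for i in ordered[mid:]:
            ids[i] += "1"
        split(ordered[:mid], level + 1)
        split(ordered[mid:], level + 1)

    split(list(range(len(points))), 0)
    return ids


pts = [(0, 0), (1, 5), (2, 1), (3, 7), (4, 2), (5, 6), (6, 3), (7, 4)]
print(kdtree_partition_ids(pts, 2))
# ['00', '01', '00', '01', '10', '11', '10', '11']
```

Splitting by count rather than by coordinate midpoint is what keeps the partitions balanced even when features are heavily clustered.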
Optimize and Partition
# Sort by Hilbert curve
gpio sort hilbert input.parquet sorted.parquet
# Partition by H3 cells
gpio partition h3 large.parquet output_dir/ --resolution 7
# Partition by country
gpio partition admin buildings.parquet by_country/
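Sorting by a Hilbert curve keeps features that are close in space close together in the file, which tends to make each row group spatially compact and its bbox statistics tighter. The sketch below maps each feature's center onto a 2**order by 2**order grid over the data extent, computes its distance along the Hilbert curve with the classic iterative xy-to-distance algorithm, and sorts by that key. It is an illustration only: the grid order, the use of feature centers, and the helper names hilbert_index and hilbert_sort are assumptions, not how gpio sort hilbert is implemented.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def hilbert_index(x: int, y: int, order: int) -> int:
    """Distance of grid cell (x, y) along a Hilbert curve of the given order."""
    n = 1 << order
    d = 0
    s = n >> 1
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        # Rotate/flip the quadrant so the sub-curve has the canonical orientation.
        if ry == 0:
            if rx == 1:
                x = n - 1 - x
                y = n - 1 - y
            x, y = y, x
        s >>= 1
    return d


def hilbert_sort(centers: Sequence[Point], order: int = 16) -> List[int]:
    """Return row indices ordered along a Hilbert curve over the data extent."""
    if not centers:
        return []
    xs = [x for x, _ in centers]
    ys = [y for _, y in centers]
    min_x, max_x, min_y, max_y = min(xs), max(xs), min(ys), max(ys)
    n = 1 << order

    def to_cell(v: float, lo: float, hi: float) -> int:
        # Scale v from [lo, hi] to an integer grid cell in [0, n - 1].
        if hi == lo:
            return 0
        return min(n - 1, int((v - lo) / (hi - lo) * n))

    keys = [
        hilbert_index(to_cell(x, min_x, max_x), to_cell(y, min_y, max_y), order)
        for x, y in centers
    ]
    return sorted(range(len(centers)), key=lambda i: keys[i])


centers = [(3.0, 0.0), (0.0, 0.0), (3.0, 3.0), (0.0, 3.0)]
print(hilbert_sort(centers, order=1))  # [1, 3, 2, 0]
```

Writing rows in the returned order visits the four corners along the curve (0,0), (0,3), (3,3), (3,0), so neighbouring rows are always neighbouring cells.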
Python API
from geoparquet_io.core.add_bbox_column import add_bbox_column
from geoparquet_io.core.hilbert_order import hilbert_order
# Add bounding box
add_bbox_column("input.parquet", "output.parquet", verbose=True)
# Sort by Hilbert curve
hilbert_order("input.parquet", "sorted.parquet", add_bbox=True)
Contributing
Contributions are welcome! See our Contributing Guide for details.
Development
# Clone repository
git clone https://github.com/cholmes/geoparquet-io.git
cd geoparquet-io
# Install with all development dependencies
uv sync --all-extras
# Run tests
uv run pytest
# Run linting
uv run ruff check .
# Build docs locally
uv run mkdocs serve
License
Apache 2.0 - See LICENSE for details.
Links
- Documentation: https://cholmes.github.io/geoparquet-io/
- PyPI: https://pypi.org/project/geoparquet-io/
- Issues: https://github.com/cholmes/geoparquet-io/issues
- Source: https://github.com/cholmes/geoparquet-io
Download files
Source Distribution
- geoparquet_io-0.2.0.tar.gz (262.6 kB)
Built Distribution
- geoparquet_io-0.2.0-py3-none-any.whl (62.2 kB)
File details
Details for the file geoparquet_io-0.2.0.tar.gz.
File metadata
- Download URL: geoparquet_io-0.2.0.tar.gz
- Upload date:
- Size: 262.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c533dc75fe36e19d32b9068ec1d95d051dee445dfbe3944a976bc5262349ff6f |
| MD5 | b631117535b1b40be6b959e032032603 |
| BLAKE2b-256 | ea69abb216e34705569ca6cae2a206439d4a1503f6490b3114931840b8d85843 |
Provenance
The following attestation bundles were made for geoparquet_io-0.2.0.tar.gz:
Publisher: publish.yml on cholmes/geoparquet-io
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: geoparquet_io-0.2.0.tar.gz
- Subject digest: c533dc75fe36e19d32b9068ec1d95d051dee445dfbe3944a976bc5262349ff6f
- Sigstore transparency entry: 637523760
- Sigstore integration time:
- Permalink: cholmes/geoparquet-io@9ac0c88ff4014018f10fea4eb861e008760d199c
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/cholmes
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9ac0c88ff4014018f10fea4eb861e008760d199c
- Trigger Event: release
File details
Details for the file geoparquet_io-0.2.0-py3-none-any.whl.
File metadata
- Download URL: geoparquet_io-0.2.0-py3-none-any.whl
- Upload date:
- Size: 62.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 95feedb41efddeb5aa74c9b462ef9946240bdb42a651ff6379066b7ae567bf00 |
| MD5 | 838dcdad3d40903317f33d192ea7115f |
| BLAKE2b-256 | 5049a05e85110da21fa30b466a1d9fc9c20d49575a800d5b3d76a99392aff5e0 |
Provenance
The following attestation bundles were made for geoparquet_io-0.2.0-py3-none-any.whl:
Publisher: publish.yml on cholmes/geoparquet-io
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: geoparquet_io-0.2.0-py3-none-any.whl
- Subject digest: 95feedb41efddeb5aa74c9b462ef9946240bdb42a651ff6379066b7ae567bf00
- Sigstore transparency entry: 637523778
- Sigstore integration time:
- Permalink: cholmes/geoparquet-io@9ac0c88ff4014018f10fea4eb861e008760d199c
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/cholmes
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@9ac0c88ff4014018f10fea4eb861e008760d199c
- Trigger Event: release