Fast I/O and transformation tools for GeoParquet files
geoparquet-io
Fast I/O and transformation tools for GeoParquet files using PyArrow and DuckDB.
📚 Full Documentation | Quick Start Tutorial
Features
- Fast: Built on PyArrow and DuckDB for high-performance operations
- Comprehensive: Sort, extract, partition, enhance, validate, and upload GeoParquet files
- Cloud-Native: Read from and write to S3, GCS, Azure, and HTTPS sources
- Spatial Indexing: Add bbox, H3 hexagonal cells, KD-tree partitions, and admin divisions
- Best Practices: Automatic optimization following GeoParquet 1.1 and 2.0 specs
- Parquet Geo Types support: Read and write Parquet geometry and geography types
- Flexible: CLI and Python API for any workflow
- Tested: Extensive test suite across Python 3.10-3.13 and all platforms
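The bbox feature rests on a simple idea: store each row's bounding box so a reader can discard rows (and, through Parquet column statistics, typically whole row groups) whose box misses the query window without decoding any geometry. The plain-Python sketch below illustrates only that filtering step. It is not gpio's code. The names `BBox`, `add_bbox_column`, and `filter_by_bbox` are hypothetical. `BBox` mirrors the `xmin`/`ymin`/`xmax`/`ymax` fields that GeoParquet 1.1 uses for its bbox covering column.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple

Coord = Tuple[float, float]


@dataclass(frozen=True)
class BBox:
    """Mirrors the xmin/ymin/xmax/ymax fields of a GeoParquet 1.1 bbox covering column."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float

    def intersects(self, other: "BBox") -> bool:
        # Boxes overlap (or touch) unless one lies entirely beside, above, or below the other.
        return not (
            self.xmax < other.xmin
            or self.xmin > other.xmax
            or self.ymax < other.ymin
            or self.ymin > other.ymax
        )


def add_bbox_column(geometries: Sequence[Sequence[Coord]]) -> List[BBox]:
    """Hypothetical helper: compute one bounding box per row from its vertex coordinates."""
    boxes = []
    for coords in geometries:
        xs = [x for x, _ in coords]
        ys = [y for _, y in coords]
        boxes.append(BBox(min(xs), min(ys), max(xs), max(ys)))
    return boxes


def filter_by_bbox(boxes: Sequence[BBox], window: BBox) -> List[int]:
    """Return indices of rows whose bbox intersects the window; all other rows can be skipped."""
    return [i for i, box in enumerate(boxes) if box.intersects(window)]


rows = [
    [(0.0, 0.0), (1.0, 1.0)],
    [(10.0, 10.0), (11.0, 12.0)],
    [(0.5, 2.0), (2.0, 3.0)],
]
print(filter_by_bbox(add_bbox_column(rows), BBox(0.0, 0.0, 2.0, 2.0)))  # [0, 2]
```

The bbox test only yields candidates. Row 2 merely touches the window edge, so an exact geometry intersection test would still be needed to confirm each match.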
Installation
```bash
pip install geoparquet-io
```
See the Installation Guide for other options (uv, from source) and requirements.
Quick Start
```bash
# Inspect file structure and metadata
gpio inspect myfile.parquet

# Check file quality and best practices
gpio check all myfile.parquet

# Add bounding box column for faster queries
gpio add bbox input.parquet output.parquet

# Sort using Hilbert curve for spatial locality
gpio sort hilbert input.parquet output_sorted.parquet

# Partition by admin boundaries
gpio partition admin buildings.parquet output_dir/ --dataset gaul --levels continent,country

# Remote-to-remote processing (S3, GCS, Azure, HTTPS)
gpio add bbox s3://bucket/input.parquet s3://bucket/output.parquet --profile my-aws
gpio partition h3 gs://bucket/data.parquet gs://bucket/partitions/ --resolution 9
gpio sort hilbert https://example.com/data.parquet s3://bucket/sorted.parquet
```
For more examples and detailed usage, see the Quick Start Tutorial and User Guide.
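Hilbert sorting maps each feature to a position along a space-filling curve and orders rows by that position, so features that are close on the map tend to land in the same row groups. The sketch below shows the general technique under stated assumptions. Points are scaled onto a 2^order by 2^order grid over a given extent and sorted by their Hilbert index. `xy2d` is the standard Hilbert index algorithm. `hilbert_sort` and its `extent` and `order` parameters are hypothetical. gpio's own implementation (which may delegate to DuckDB) can differ in grid size, in the key it uses (for example, bbox centers), and in tie handling.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def _rotate(n: int, x: int, y: int, rx: int, ry: int) -> Tuple[int, int]:
    """Rotate/flip a quadrant so the sub-curve inside it has the correct orientation."""
    if ry == 0:
        if rx == 1:
            x = n - 1 - x
            y = n - 1 - y
        x, y = y, x
    return x, y


def xy2d(n: int, x: int, y: int) -> int:
    """Distance along the Hilbert curve of cell (x, y) on an n-by-n grid (n a power of two)."""
    d = 0
    s = n // 2
    while s > 0:
        rx = 1 if x & s else 0
        ry = 1 if y & s else 0
        d += s * s * ((3 * rx) ^ ry)
        x, y = _rotate(n, x, y, rx, ry)
        s //= 2
    return d


def hilbert_sort(
    points: Sequence[Point],
    extent: Tuple[float, float, float, float],
    order: int = 16,
) -> List[Point]:
    """Hypothetical helper: sort points by the Hilbert index of their grid cell."""
    n = 1 << order
    xmin, ymin, xmax, ymax = extent

    def to_cell(v: float, lo: float, hi: float) -> int:
        # Scale a coordinate into [0, n - 1], clamping values outside the extent.
        if hi <= lo:
            return 0
        cell = int((v - lo) / (hi - lo) * (n - 1))
        return max(0, min(n - 1, cell))

    def key(p: Point) -> int:
        return xy2d(n, to_cell(p[0], xmin, xmax), to_cell(p[1], ymin, ymax))

    return sorted(points, key=key)


print(hilbert_sort([(1, 0), (0, 0), (1, 1), (0, 1)], extent=(0, 0, 1, 1), order=1))
```

On a 2-by-2 grid the curve visits (0, 0), (0, 1), (1, 1), (1, 0) in that order, which is the ordering the example prints. For real data, a larger `order` gives finer cells, and each row would typically be keyed by its geometry's centroid or bbox center.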
Contributing
Contributions are welcome! See CONTRIBUTING.md for development setup, coding standards, and how to submit changes.
Links
- Documentation: https://geoparquet.org/geoparquet-io/
- PyPI: https://pypi.org/project/geoparquet-io/
- Issues: https://github.com/cholmes/geoparquet-io/issues
License
Apache 2.0 - See LICENSE for details.
Download files
File details
Details for the file geoparquet_io-0.7.0.tar.gz.
File metadata
- Download URL: geoparquet_io-0.7.0.tar.gz
- Upload date:
- Size: 886.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5321bc2908fac3bc5e65ed084849a411b538acded9e84718ba2bb8eb41e2a5c6 |
| MD5 | c2e6461b98a16843aebdfd8d59c92ea2 |
| BLAKE2b-256 | 451e7a59ed9e9cf59b186b3d6a8cfd5bc92741a30e391afeac900132451cdfcc |
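A downloaded archive can be checked against the published SHA256 digest before it is installed. The sketch below uses only the standard library `hashlib` module. `sha256_of`, `verify`, and `EXPECTED_SHA256` are illustrative names, and the expected value is the SHA256 digest listed for geoparquet_io-0.7.0.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for geoparquet_io-0.7.0.tar.gz
EXPECTED_SHA256 = "5321bc2908fac3bc5e65ed084849a411b538acded9e84718ba2bb8eb41e2a5c6"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in fixed-size chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Return True if the file's SHA256 matches the expected hex digest (case-insensitive)."""
    return sha256_of(path) == expected.strip().lower()
```

For example, `verify(Path("geoparquet_io-0.7.0.tar.gz"))` returns `True` only for an intact copy of the source distribution. pip's hash-checking mode (`--require-hashes`) performs the same comparison at install time, but it needs the digest of whichever file pip actually selects, which is usually the wheel.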
Provenance
The following attestation bundles were made for geoparquet_io-0.7.0.tar.gz:
Publisher: publish.yml on geoparquet/geoparquet-io
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: geoparquet_io-0.7.0.tar.gz
- Subject digest: 5321bc2908fac3bc5e65ed084849a411b538acded9e84718ba2bb8eb41e2a5c6
- Sigstore transparency entry: 780834168
- Sigstore integration time:
- Permalink: geoparquet/geoparquet-io@b84675074d24d5236e5ea3b30ad2688ab2db69bc
- Branch / Tag: refs/tags/v0.7.0
- Owner: https://github.com/geoparquet
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b84675074d24d5236e5ea3b30ad2688ab2db69bc
- Trigger Event: release
File details
Details for the file geoparquet_io-0.7.0-py3-none-any.whl.
File metadata
- Download URL: geoparquet_io-0.7.0-py3-none-any.whl
- Upload date:
- Size: 204.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 95d9c89eeaac7173e723368f920054d407e115a5d87278c6b957a02032417353 |
| MD5 | a8cb340483c52a399902c56ac2869426 |
| BLAKE2b-256 | 20ba9acdc9893c27fb50c49e6113705a75b6a1430aee872c039e5f96389d03dd |
Provenance
The following attestation bundles were made for geoparquet_io-0.7.0-py3-none-any.whl:
Publisher: publish.yml on geoparquet/geoparquet-io
Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: geoparquet_io-0.7.0-py3-none-any.whl
- Subject digest: 95d9c89eeaac7173e723368f920054d407e115a5d87278c6b957a02032417353
- Sigstore transparency entry: 780834169
- Sigstore integration time:
- Permalink: geoparquet/geoparquet-io@b84675074d24d5236e5ea3b30ad2688ab2db69bc
- Branch / Tag: refs/tags/v0.7.0
- Owner: https://github.com/geoparquet
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@b84675074d24d5236e5ea3b30ad2688ab2db69bc
- Trigger Event: release