Query geospatial sources, analyze at scale, and publish results with a consistent API.
GeoFabric
GeoFabric is a pragmatic geospatial toolkit for ETL, analytics, and publishing—built around Parquet, DuckDB Spatial, and PMTiles.
- ETL: Pull/normalize subsets into (Geo)Parquet
- Analytics: Scalable spatial SQL via DuckDB + DuckDB Spatial
- Viz / Publishing: Quick notebook maps + PMTiles generation via tippecanoe
Features
- Unified Query API - Chainable, lazy query builder for geospatial data
- Multiple Format Support - Parquet, GeoJSON, GeoPackage, FlatGeoBuf, Shapefile, CSV
- 17+ Spatial Operations - Buffer, simplify, transform, clip, erase, boundary, densify, dissolve, centroid, convex hull, and more
- Geometry Measurements - Area, length, perimeter, bounds, distance calculations
- Coordinate Extraction - Extract X/Y coordinates from geometries
- Spatial Joins - Efficient joins with 6 predicates (intersects, within, contains, touches, crosses, overlaps)
- K-Nearest Neighbors - Find nearest features with optional distance filtering
- Cloud Support - Read directly from S3, GCS, and Azure Blob Storage
- Programmatic Configuration - Configure credentials for all platforms via API
- PostGIS Integration - Query PostGIS databases directly
- Overture Maps - Built-in helper for downloading Overture data
- PMTiles Export - Generate vector tiles via tippecanoe
- Validation - Geometry validation and repair utilities
- CLI Tool - 16 commands for common operations
- Auto-generated Docs - API documentation deployed to GitHub Pages
Installation
From PyPI
pip install geofabric
From Source
git clone https://github.com/marcostfermin/GeoFabric.git
cd GeoFabric
pip install -e "."
With Optional Dependencies
# Visualization support (geopandas + lonboard)
pip install -e ".[viz]"
# STAC catalog support
pip install -e ".[stac]"
# All optional dependencies
pip install -e ".[all]"
# Development dependencies
pip install -e ".[dev,all]"
Quick Start
import geofabric as gf
# Open a dataset
ds = gf.open("file:///path/to/data.parquet")
# Define a region of interest
roi = gf.roi.bbox(-74.10, 40.60, -73.70, 40.90)
# Build a query
q = ds.within(roi).select(["*"]).limit(1000)
# Export to various formats
q.to_parquet("out.parquet")
q.to_geojson("out.geojson")
q.to_geopackage("out.gpkg")
# Run analytics
print(q.aggregate({"count": "*"}))
# Visualize in notebooks (requires viz extras)
q.show()
# Generate vector tiles (requires tippecanoe)
q.to_pmtiles("out.pmtiles", layer="features", maxzoom=14)
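The chainable, lazy style used in the Quick Start can be illustrated with a toy builder. This is only a sketch of the general pattern, not GeoFabric's implementation: the class `LazyQuery`, the table name `data`, and the column name `geometry` are hypothetical, and the SQL simply uses the DuckDB Spatial functions `ST_Within` and `ST_MakeEnvelope` as one plausible rendering of a bounding-box filter.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class LazyQuery:
    """Hypothetical immutable query: each method returns a new object, nothing runs."""
    table: str
    columns: Tuple[str, ...] = ("*",)
    bbox: Optional[Tuple[float, float, float, float]] = None
    max_rows: Optional[int] = None

    def within(self, minx, miny, maxx, maxy):
        # Record the bounding box; no data is touched here.
        return replace(self, bbox=(minx, miny, maxx, maxy))

    def select(self, columns):
        return replace(self, columns=tuple(columns))

    def limit(self, n):
        return replace(self, max_rows=n)

    def sql(self):
        """Render the accumulated state as a single SQL string."""
        parts = [f"SELECT {', '.join(self.columns)} FROM {self.table}"]
        if self.bbox is not None:
            minx, miny, maxx, maxy = self.bbox
            parts.append(
                "WHERE ST_Within(geometry, "
                f"ST_MakeEnvelope({minx}, {miny}, {maxx}, {maxy}))"
            )
        if self.max_rows is not None:
            parts.append(f"LIMIT {self.max_rows}")
        return " ".join(parts)


q = LazyQuery("data").within(-74.10, 40.60, -73.70, 40.90).select(["*"]).limit(1000)
print(q.sql())
```

Because every step returns a new frozen object, intermediate queries can be reused and extended without affecting each other, which is the main appeal of the lazy style.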
Spatial Operations
import geofabric as gf
ds = gf.open("file:///data.parquet")
q = ds.query()
# Geometry transformations
q.buffer(distance=100, unit="meters") # Buffer with unit conversion
q.simplify(tolerance=0.001) # Simplify geometries
q.transform(to_srid=3857) # Transform CRS
q.boundary() # Extract boundaries
q.centroid() # Get centroids
q.convex_hull() # Convex hulls
q.envelope() # Bounding boxes
q.densify(max_distance=100) # Add vertices
q.make_valid() # Repair invalid geometries
q.dissolve(by="category") # Merge geometries by attribute
q.collect() # Gather into MultiGeometry
q.explode() # Split multi-geometries
# Add computed columns
q.with_area() # Add area column
q.with_length() # Add length/perimeter
q.with_bounds() # Add minx, miny, maxx, maxy
q.with_distance_to("POINT(0 0)") # Distance to reference
q.with_coordinates() # Add X, Y columns
q.with_geometry_type() # Add geometry type
q.with_num_points() # Add vertex count
q.with_is_valid() # Add validity check
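To make the computed columns more concrete, here is a plain-Python sketch of what an area and a bounds calculation mean for a single polygon ring. It is not how GeoFabric computes them (that presumably happens inside DuckDB Spatial); the helper names `ring_area` and `ring_bounds` are hypothetical, and the area uses the standard shoelace formula on planar coordinates.

```python
from typing import List, Tuple

Ring = List[Tuple[float, float]]


def ring_area(ring: Ring) -> float:
    """Planar area of a ring via the shoelace formula (open or closed ring)."""
    total = 0.0
    n = len(ring)
    for i in range(n):
        x1, y1 = ring[i]
        x2, y2 = ring[(i + 1) % n]  # wrap around to the first vertex
        total += x1 * y2 - x2 * y1
    return abs(total) / 2.0


def ring_bounds(ring: Ring) -> Tuple[float, float, float, float]:
    """Return (minx, miny, maxx, maxy) for the ring."""
    xs = [x for x, _ in ring]
    ys = [y for _, y in ring]
    return (min(xs), min(ys), max(xs), max(ys))


square = [(0.0, 0.0), (2.0, 0.0), (2.0, 2.0), (0.0, 2.0)]
print(ring_area(square))
print(ring_bounds(square))
```

The result is in squared coordinate units, so for data stored in longitude/latitude it is usually worth transforming to a projected CRS before measuring.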
Spatial Joins
import geofabric as gf
buildings = gf.open("file:///buildings.parquet")
parcels = gf.open("file:///parcels.parquet")
# Spatial join
joined = buildings.query().sjoin(
    parcels.query(),
    predicate="intersects",
    how="inner"
)
# K-nearest neighbors
nearest = buildings.query().nearest(
    parcels.query(),
    k=3,
    max_distance=1000
)
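The semantics of a k-nearest query with a distance cutoff can be sketched with a brute-force search over points. This is an illustration only: the function `k_nearest` is hypothetical, it works on planar point coordinates rather than arbitrary geometries, and a real engine would typically avoid scanning every candidate.

```python
import heapq
import math
from typing import List, Optional, Tuple

Point = Tuple[float, float]


def k_nearest(origin: Point, candidates: List[Point], k: int,
              max_distance: Optional[float] = None) -> List[Tuple[float, Point]]:
    """Return up to k (distance, point) pairs, closest first."""
    scored = []
    for p in candidates:
        d = math.dist(origin, p)  # planar Euclidean distance
        if max_distance is None or d <= max_distance:
            scored.append((d, p))
    return heapq.nsmallest(k, scored)


parcels = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0), (10.0, 10.0)]
for distance, point in k_nearest((0.0, 0.0), parcels, k=3, max_distance=4.0):
    print(point, round(distance, 3))
```

Note that the cutoff can return fewer than `k` results, which is worth keeping in mind when post-processing the output of a nearest-neighbor query.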
Overture Maps Integration
from geofabric.sources.overture import Overture
import geofabric as gf
# Download Overture data (requires AWS CLI)
ov = Overture(release="2025-12-17.0", theme="base", type_="infrastructure")
local_dir = ov.download("./data/overture_infra")
# Query the downloaded data
ds = gf.open(local_dir)
sample = ds.query().limit(10000)
sample.to_parquet("overture_sample.parquet")
CLI Usage
# Show help
gf --help
# Run SQL queries
gf sql file:///tmp/x.parquet "SELECT COUNT(*) FROM data"
# Pull subset of data
gf pull file:///data.parquet out.parquet --where "type='building'" --limit 1000
# Show dataset info
gf info file:///data.parquet
# Validate geometries
gf validate file:///data.parquet
# Show first N rows
gf head file:///data.parquet --n 20
# Sample random rows
gf sample file:///data.parquet sample.parquet --n 1000
# Show statistics
gf stats file:///data.parquet
# Buffer geometries
gf buffer file:///data.parquet buffered.parquet --distance 100 --unit meters
# Simplify geometries
gf simplify file:///data.parquet simplified.parquet --tolerance 0.001
# Transform CRS
gf transform file:///data.parquet transformed.parquet --to-srid 3857
# Compute centroids
gf centroid file:///data.parquet centroids.parquet
# Compute convex hulls
gf convex-hull file:///data.parquet hulls.parquet
# Dissolve geometries by attribute
gf dissolve file:///data.parquet dissolved.parquet --by category
# Add area column
gf add-area file:///data.parquet with_area.parquet --column-name area_sqm
# Add length/perimeter column
gf add-length file:///data.parquet with_length.parquet --column-name perimeter
# Download Overture data
gf overture download --release 2025-12-17.0 --theme base --type infrastructure --dest ./data
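When the CLI is driven from a script, building the argument list in Python avoids shell-quoting problems with values such as `type='building'`. The sketch below assembles a `gf pull` invocation using only the flags shown above; the helper name `pull_command` is hypothetical, and the command is printed rather than executed.

```python
import shlex
from typing import List, Optional


def pull_command(src: str, dest: str, where: Optional[str] = None,
                 limit: Optional[int] = None) -> List[str]:
    """Build an argv list for `gf pull` without going through a shell."""
    cmd = ["gf", "pull", src, dest]
    if where is not None:
        cmd += ["--where", where]
    if limit is not None:
        cmd += ["--limit", str(limit)]
    return cmd


cmd = pull_command("file:///data.parquet", "out.parquet",
                   where="type='building'", limit=1000)
# shlex.join produces a copy-pasteable, correctly quoted shell line.
print(shlex.join(cmd))
```

Passing the same list to `subprocess.run(cmd, check=True)` would run it, assuming `gf` is installed and on `PATH`.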
Supported Data Sources
| Source | URI Format | Example |
|---|---|---|
| Local Files | file:///path | file:///data/buildings.parquet |
| S3 | s3://bucket/key | s3://my-bucket/data.parquet |
| GCS | gs://bucket/key | gs://my-bucket/data.parquet |
| Azure | az://container/path | az://mycontainer/data.parquet |
| PostGIS | postgresql://... | postgresql://user:pass@host/db?table=schema.table |
| STAC | stac://... | stac://catalog-url/collection |
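The URI formats above all follow the usual scheme/host/path/query layout, so they can be inspected with the standard library. The sketch below only shows how the pieces separate; how GeoFabric itself parses and dispatches URIs is not documented here, and the helper `describe_uri` is hypothetical.

```python
from urllib.parse import parse_qs, urlsplit


def describe_uri(uri: str) -> dict:
    """Split a dataset URI into scheme, host (or bucket), path, and options."""
    parts = urlsplit(uri)
    return {
        "scheme": parts.scheme,
        # hostname drops any user:password prefix, so credentials are not echoed
        "host": parts.hostname or "",
        "path": parts.path,
        "options": {k: v[0] for k, v in parse_qs(parts.query).items()},
    }


for uri in (
    "file:///data/buildings.parquet",
    "s3://my-bucket/data.parquet",
    "postgresql://user:pass@host/db?table=schema.table",
):
    print(describe_uri(uri))
```

In the PostGIS example the target table travels as the `table` query parameter, which is why it appears under `options` rather than in the path.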
Credential Configuration
GeoFabric supports both programmatic configuration and environment variables:
import geofabric as gf
# Programmatic configuration (takes precedence)
gf.configure_s3(
    access_key_id="AKIA...",
    secret_access_key="...",
    region="us-east-1"
)
gf.configure_postgis(host="db.example.com", user="user", password="pass")
gf.configure_azure(account_name="...", account_key="...")
# Now use gf.open() with configured credentials
ds = gf.open("s3://my-bucket/data.parquet?anonymous=false")
See Authentication Guide and API Reference for details.
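The precedence rule stated above (programmatic values win over environment variables) can be sketched as a small resolver. This is an illustration of the rule, not GeoFabric's code: the function `resolve_setting` is hypothetical, and the use of the standard AWS variable names `AWS_ACCESS_KEY_ID` and `AWS_DEFAULT_REGION` is an assumption about which variables are consulted.

```python
import os
from typing import Mapping, Optional


def resolve_setting(explicit: Optional[str], env_name: str,
                    env: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return the explicit value if given, otherwise fall back to the environment."""
    if explicit is not None:
        return explicit
    return env.get(env_name)


# A fake environment keeps the example independent of the real process state.
fake_env = {"AWS_ACCESS_KEY_ID": "from-env", "AWS_DEFAULT_REGION": "eu-west-1"}

print(resolve_setting("from-code", "AWS_ACCESS_KEY_ID", fake_env))
print(resolve_setting(None, "AWS_DEFAULT_REGION", fake_env))
```

Keeping secrets in the environment and reserving programmatic configuration for overrides tends to keep credentials out of source files.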
Plugin System
GeoFabric supports plugins via Python entry points:
- geofabric.sources - Data source plugins
- geofabric.engines - Query engine plugins
- geofabric.sinks - Output sink plugins
See src/geofabric/registry.py for implementation details.
Development
# Clone and install
git clone https://github.com/marcostfermin/GeoFabric.git
cd GeoFabric
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,all]"
# Install pre-commit hooks
pre-commit install
# Run tests
pytest
# Run tests with coverage
pytest --cov=geofabric --cov-report=term-missing
# Run linting
ruff check src/
# Run type checking
mypy src/geofabric
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Changelog
See CHANGELOG.md for a list of changes.
Download files
File details
Details for the file geofabric-1.0.0.tar.gz.
File metadata
- Download URL: geofabric-1.0.0.tar.gz
- Upload date:
- Size: 189.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8db617140fefb2e5da7b9fb8a66a2a3a8e0b455de1df60eb62ce2c666bd53158 |
| MD5 | b98cb5fbf75d8885ccc1c34b55a20cea |
| BLAKE2b-256 | 42bf1c25d2ee2dd530a673cbc00ae778b8b16872784b262ac02813dd8a58ce3f |
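A downloaded archive can be checked against the SHA256 digest listed above with a few lines of standard-library Python. This is a minimal sketch: the helper `sha256_of` is hypothetical, and the file is assumed to sit in the current directory under its published name.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large archives do not need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "8db617140fefb2e5da7b9fb8a66a2a3a8e0b455de1df60eb62ce2c666bd53158"
archive = Path("geofabric-1.0.0.tar.gz")  # assumed download location

if archive.exists():
    print("match" if sha256_of(archive) == EXPECTED else "MISMATCH")
else:
    print(f"{archive} not found; download it first")
```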
Provenance
The following attestation bundles were made for geofabric-1.0.0.tar.gz:
Publisher: ci.yml on marcostfermin/GeoFabric
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: geofabric-1.0.0.tar.gz
  - Subject digest: 8db617140fefb2e5da7b9fb8a66a2a3a8e0b455de1df60eb62ce2c666bd53158
- Sigstore transparency entry: 808991954
- Sigstore integration time:
- Permalink: marcostfermin/GeoFabric@15c1a7bc818c6e9a57a341356d3cb7bf607c02b6
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/marcostfermin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@15c1a7bc818c6e9a57a341356d3cb7bf607c02b6
- Trigger Event: release
File details
Details for the file geofabric-1.0.0-py3-none-any.whl.
File metadata
- Download URL: geofabric-1.0.0-py3-none-any.whl
- Upload date:
- Size: 70.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 44582189a17cf643d506cc08adc60a2597bbeacf853597e4bda324893b749e05 |
| MD5 | e88a0ccd5a01b49960c2773d724ac9cf |
| BLAKE2b-256 | f0754cdab8d37146323171b3d53dcdc46ccc831662575ddc04f3fed701e02372 |
Provenance
The following attestation bundles were made for geofabric-1.0.0-py3-none-any.whl:
Publisher: ci.yml on marcostfermin/GeoFabric
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: geofabric-1.0.0-py3-none-any.whl
  - Subject digest: 44582189a17cf643d506cc08adc60a2597bbeacf853597e4bda324893b749e05
- Sigstore transparency entry: 808991961
- Sigstore integration time:
- Permalink: marcostfermin/GeoFabric@15c1a7bc818c6e9a57a341356d3cb7bf607c02b6
- Branch / Tag: refs/tags/v1.0.0
- Owner: https://github.com/marcostfermin
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@15c1a7bc818c6e9a57a341356d3cb7bf607c02b6
- Trigger Event: release