Query geospatial sources, analyze at scale, and publish results with a consistent API.

Reason this release was yanked: wrong version number was released.

GeoFabric

Python 3.10+ · License: MIT

GeoFabric is a pragmatic geospatial toolkit for ETL, analytics, and publishing, built around Parquet, DuckDB Spatial, and PMTiles.

  • ETL: Pull/normalize subsets into (Geo)Parquet
  • Analytics: Scalable spatial SQL via DuckDB + DuckDB Spatial
  • Viz / Publishing: Quick notebook maps + PMTiles generation via tippecanoe

Features

  • Unified Query API - Chainable, lazy query builder for geospatial data
  • Multiple Format Support - Parquet, GeoJSON, GeoPackage, FlatGeoBuf, Shapefile, CSV
  • 17+ Spatial Operations - Buffer, simplify, transform, clip, erase, boundary, densify, dissolve, centroid, convex hull, and more
  • Geometry Measurements - Area, length, perimeter, bounds, distance calculations
  • Coordinate Extraction - Extract X/Y coordinates from geometries
  • Spatial Joins - Efficient joins with 6 predicates (intersects, within, contains, touches, crosses, overlaps)
  • K-Nearest Neighbors - Find nearest features with optional distance filtering
  • Cloud Support - Read directly from S3, GCS, and Azure Blob Storage
  • Programmatic Configuration - Configure credentials for all platforms via API
  • PostGIS Integration - Query PostGIS databases directly
  • Overture Maps - Built-in helper for downloading Overture data
  • PMTiles Export - Generate vector tiles via tippecanoe
  • Validation - Geometry validation and repair utilities
  • CLI Tool - 16 commands for common operations
  • Auto-generated Docs - API documentation deployed to GitHub Pages

Architecture

flowchart TB
  %% =========================
  %% USER ENTRY POINTS
  %% =========================
  subgraph U["User entry points"]
    direction LR
    U1["Python API"]
    U2["CLI"]
    U3["Notebook viz"]
  end

  %% =========================
  %% THIN CORE (ORCHESTRATION)
  %% =========================
  C["GeoFabric core<br/>lazy query orchestration"]

  %% =========================
  %% EXECUTION BACKEND
  %% =========================
  subgraph E["Execution backend"]
    direction TB
    E1["Query planning / SQL generation"]
    E2["DuckDB + DuckDB Spatial"]
  end

  %% =========================
  %% DATA SOURCES
  %% =========================
  subgraph S["Data sources"]
    direction TB
    S1["Local files"]
    S2["S3 / GCS / Azure"]
    S3["PostGIS"]
    S4["STAC / Overture"]
  end

  %% =========================
  %% OUTPUTS
  %% =========================
  subgraph O["Outputs"]
    direction TB
    O1["Parquet (GeoParquet)"]
    O2["Vector formats<br/>GeoJSON · GPKG · FlatGeoBuf · CSV"]
    O3["PMTiles (tippecanoe)"]
  end

  %% =========================
  %% SUPPORT SERVICES
  %% =========================
  subgraph P["Support services"]
    direction LR
    P1["Validation / repair"]
    P2["Caching"]
    P3["Configuration"]
    P4["Plugin registry"]
  end

  %% =========================
  %% TRUE EXECUTION FLOW
  %% =========================
  U --> C
  S --> C
  C --> E1
  E1 --> E2
  E2 --> O

  P -.-> C

Installation

From PyPI (coming soon)

pip install geofabric

From Source

git clone https://github.com/marcostfermin/GeoFabric.git
cd GeoFabric
pip install -e "."

With Optional Dependencies

# Visualization support (geopandas + lonboard)
pip install -e ".[viz]"

# STAC catalog support
pip install -e ".[stac]"

# All optional dependencies
pip install -e ".[all]"

# Development dependencies
pip install -e ".[dev,all]"

Quick Start

import geofabric as gf

# Open a dataset
ds = gf.open("file:///path/to/data.parquet")

# Define a region of interest
roi = gf.roi.bbox(-74.10, 40.60, -73.70, 40.90)

# Build a query
q = ds.within(roi).select(["*"]).limit(1000)

# Export to various formats
q.to_parquet("out.parquet")
q.to_geojson("out.geojson")
q.to_geopackage("out.gpkg")

# Run analytics
print(q.aggregate({"count": "*"}))

# Visualize in notebooks (requires viz extras)
q.show()

# Generate vector tiles (requires tippecanoe)
q.to_pmtiles("out.pmtiles", layer="features", maxzoom=14)
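
For readers unfamiliar with bounding-box regions, the sketch below shows how four corner values map to a closed WKT polygon that a spatial filter could use. It assumes the argument order in the Quick Start is (minx, miny, maxx, maxy), which the longitude-first values suggest but the README does not state; `bbox_to_wkt` is a hypothetical helper, not part of GeoFabric.

```python
def bbox_to_wkt(minx: float, miny: float, maxx: float, maxy: float) -> str:
    """Return a closed rectangular POLYGON in WKT for the given bounds."""
    if minx > maxx or miny > maxy:
        raise ValueError("expected minx <= maxx and miny <= maxy")
    # A WKT ring must end on the vertex it started from.
    ring = [(minx, miny), (maxx, miny), (maxx, maxy), (minx, maxy), (minx, miny)]
    coords = ", ".join(f"{x} {y}" for x, y in ring)
    return f"POLYGON(({coords}))"


print(bbox_to_wkt(-74.10, 40.60, -73.70, 40.90))
# POLYGON((-74.1 40.6, -73.7 40.6, -73.7 40.9, -74.1 40.9, -74.1 40.6))
```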

Spatial Operations

import geofabric as gf

ds = gf.open("file:///data.parquet")
q = ds.query()

# Geometry transformations
q.buffer(distance=100, unit="meters")  # Buffer with unit conversion
q.simplify(tolerance=0.001)            # Simplify geometries
q.transform(to_srid=3857)              # Transform CRS
q.boundary()                           # Extract boundaries
q.centroid()                           # Get centroids
q.convex_hull()                        # Convex hulls
q.envelope()                           # Bounding boxes
q.densify(max_distance=100)            # Add vertices
q.make_valid()                         # Repair invalid geometries
q.dissolve(by="category")              # Merge geometries by attribute
q.collect()                            # Gather into MultiGeometry
q.explode()                            # Split multi-geometries

# Add computed columns
q.with_area()                          # Add area column
q.with_length()                        # Add length/perimeter
q.with_bounds()                        # Add minx, miny, maxx, maxy
q.with_distance_to("POINT(0 0)")       # Distance to reference
q.with_coordinates()                   # Add X, Y columns
q.with_geometry_type()                 # Add geometry type
q.with_num_points()                    # Add vertex count
q.with_is_valid()                      # Add validity check
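
The "unit conversion" mentioned for `buffer` can be thought of as normalizing the distance to meters before the geometry operation runs. The sketch below illustrates that idea only: `to_meters` and the set of unit names are assumptions (the README shows only `"meters"`), and it does not reproduce how GeoFabric handles units or coordinate systems internally.

```python
# Exact conversion factors to meters (international foot and mile).
_METERS_PER_UNIT = {
    "meters": 1.0,
    "kilometers": 1000.0,
    "feet": 0.3048,
    "miles": 1609.344,
}


def to_meters(distance: float, unit: str = "meters") -> float:
    """Convert a distance in a named unit to meters (hypothetical helper)."""
    try:
        factor = _METERS_PER_UNIT[unit]
    except KeyError:
        raise ValueError(f"unsupported unit: {unit!r}") from None
    return distance * factor


print(to_meters(2, "kilometers"))
```

A planar buffer measures distance in the units of the geometry's coordinate system, so a metric distance is only meaningful once the data is in a projected CRS whose unit is meters.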

Spatial Joins

import geofabric as gf

buildings = gf.open("file:///buildings.parquet")
parcels = gf.open("file:///parcels.parquet")

# Spatial join
joined = buildings.query().sjoin(
    parcels.query(),
    predicate="intersects",
    how="inner"
)

# K-nearest neighbors
nearest = buildings.query().nearest(
    parcels.query(),
    k=3,
    max_distance=1000
)
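
To make the meaning of `k` and `max_distance` concrete, here is a brute-force nearest-neighbor sketch over plain points. It illustrates the semantics only (filter by distance, sort, keep the first `k`); `k_nearest` is a hypothetical function, and a real engine would typically use a spatial index rather than scanning every candidate.

```python
import math


def k_nearest(origin, candidates, k, max_distance=None):
    """Return up to k (name, distance) pairs closest to origin, nearest first."""
    scored = []
    for name, (x, y) in candidates.items():
        d = math.hypot(x - origin[0], y - origin[1])
        # Candidates beyond max_distance are dropped before ranking.
        if max_distance is None or d <= max_distance:
            scored.append((d, name))
    scored.sort()
    return [(name, d) for d, name in scored[:k]]


parcels = {"a": (0.0, 3.0), "b": (4.0, 0.0), "c": (30.0, 40.0), "d": (1.0, 0.0)}
print(k_nearest((0.0, 0.0), parcels, k=3, max_distance=10.0))
# [('d', 1.0), ('a', 3.0), ('b', 4.0)]
```

With a tighter `max_distance`, fewer than `k` results can come back, which is the expected behavior of a distance-limited search.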

Overture Maps Integration

from geofabric.sources.overture import Overture
import geofabric as gf

# Download Overture data (requires AWS CLI)
ov = Overture(release="2025-12-17.0", theme="base", type_="infrastructure")
local_dir = ov.download("./data/overture_infra")

# Query the downloaded data
ds = gf.open(local_dir)
sample = ds.query().limit(10000)
sample.to_parquet("overture_sample.parquet")
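
Since the download step requires the AWS CLI, it may help to see what such a download typically looks like. The sketch below builds the public Overture S3 prefix and an equivalent `aws s3 cp` argument list. The bucket layout follows Overture's published convention; whether `Overture.download` issues exactly this command is an assumption, and both function names are hypothetical.

```python
def overture_s3_prefix(release: str, theme: str, type_: str) -> str:
    """Build the S3 prefix for one Overture theme/type in a given release."""
    return (
        "s3://overturemaps-us-west-2/release/"
        f"{release}/theme={theme}/type={type_}/"
    )


def overture_download_argv(release: str, theme: str, type_: str, dest: str) -> list[str]:
    """Argument list for an anonymous, recursive copy with the AWS CLI."""
    return [
        "aws", "s3", "cp",
        "--no-sign-request",   # the bucket is public, no credentials needed
        "--recursive",
        overture_s3_prefix(release, theme, type_),
        dest,
    ]


argv = overture_download_argv("2025-12-17.0", "base", "infrastructure", "./data/overture_infra")
print(" ".join(argv))
```

Passing the list to `subprocess.run(argv, check=True)` would run the copy without involving a shell.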

CLI Usage

# Show help
gf --help

# Run SQL queries
gf sql file:///tmp/x.parquet "SELECT COUNT(*) FROM data"

# Pull subset of data
gf pull file:///data.parquet out.parquet --where "type='building'" --limit 1000

# Show dataset info
gf info file:///data.parquet

# Validate geometries
gf validate file:///data.parquet

# Show first N rows
gf head file:///data.parquet --n 20

# Sample random rows
gf sample file:///data.parquet sample.parquet --n 1000

# Show statistics
gf stats file:///data.parquet

# Buffer geometries
gf buffer file:///data.parquet buffered.parquet --distance 100 --unit meters

# Simplify geometries
gf simplify file:///data.parquet simplified.parquet --tolerance 0.001

# Transform CRS
gf transform file:///data.parquet transformed.parquet --to-srid 3857

# Compute centroids
gf centroid file:///data.parquet centroids.parquet

# Compute convex hulls
gf convex-hull file:///data.parquet hulls.parquet

# Dissolve geometries by attribute
gf dissolve file:///data.parquet dissolved.parquet --by category

# Add area column
gf add-area file:///data.parquet with_area.parquet --column-name area_sqm

# Add length/perimeter column
gf add-length file:///data.parquet with_length.parquet --column-name perimeter

# Download Overture data
gf overture download --release 2025-12-17.0 --theme base --type infrastructure --dest ./data

Supported Data Sources

Source        URI Format            Example
Local Files   file:///path          file:///data/buildings.parquet
S3            s3://bucket/key       s3://my-bucket/data.parquet
GCS           gs://bucket/key       gs://my-bucket/data.parquet
Azure         az://container/path   az://mycontainer/data.parquet
PostGIS       postgresql://...      postgresql://user:pass@host/db?table=schema.table
STAC          stac://...            stac://catalog-url/collection
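
The URI formats above suggest that the scheme selects the data source. A minimal sketch of that dispatch, using only the standard library, is shown here; `describe_source` and the returned dictionary shape are hypothetical and do not mirror GeoFabric's internals.

```python
from urllib.parse import parse_qs, urlparse

# Scheme-to-source mapping taken from the table of supported sources.
_SCHEMES = {
    "file": "local",
    "s3": "s3",
    "gs": "gcs",
    "az": "azure",
    "postgresql": "postgis",
    "stac": "stac",
}


def describe_source(uri: str) -> dict:
    """Classify a dataset URI by scheme and extract its location."""
    parts = urlparse(uri)
    kind = _SCHEMES.get(parts.scheme)
    if kind is None:
        raise ValueError(f"unsupported URI scheme: {parts.scheme!r}")
    # Drop any "user:pass@" prefix so credentials never end up in the location.
    host = parts.netloc.rpartition("@")[2]
    info = {"kind": kind, "location": host + parts.path}
    if kind == "postgis":
        # The table name travels in the query string, e.g. ?table=schema.table
        info["table"] = parse_qs(parts.query).get("table", [None])[0]
    return info


print(describe_source("postgresql://user:pass@host/db?table=schema.table"))
# {'kind': 'postgis', 'location': 'host/db', 'table': 'schema.table'}
```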

Credential Configuration

GeoFabric supports both programmatic configuration and environment variables:

import geofabric as gf

# Programmatic configuration (takes precedence)
gf.configure_s3(
    access_key_id="AKIA...",
    secret_access_key="...",
    region="us-east-1"
)
gf.configure_postgis(host="db.example.com", user="user", password="pass")
gf.configure_azure(account_name="...", account_key="...")

# Now use gf.open() with configured credentials
ds = gf.open("s3://my-bucket/data.parquet?anonymous=false")

See Authentication Guide and API Reference for details.

Plugin System

GeoFabric supports plugins via Python entry points:

  • geofabric.sources - Data source plugins
  • geofabric.engines - Query engine plugins
  • geofabric.sinks - Output sink plugins

See src/geofabric/registry.py for implementation details.

Development

# Clone and install
git clone https://github.com/marcostfermin/GeoFabric.git
cd GeoFabric
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,all]"

# Install pre-commit hooks
pre-commit install

# Run tests
pytest

# Run tests with coverage
pytest --cov=geofabric --cov-report=term-missing

# Run linting
ruff check src/

# Run type checking
mypy src/geofabric

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for guidelines.

License

This project is licensed under the MIT License - see the LICENSE file for details.

Changelog

See CHANGELOG.md for a list of changes.

