Project description

Cog Worker

Scalable geospatial analysis on Cloud Optimized GeoTIFFs.

cog_worker is a simple library to help write scripts to conduct scalable analysis of gridded data. It's intended to be useful for moderate- to large-scale GIS, remote sensing, and machine learning applications.

Installation

pip install cog_worker

Examples

See docs/examples for Jupyter notebook examples.

Quick start

  1. A simple cog_worker script
from rasterio.plot import show
from cog_worker import Manager

def my_analysis(worker):
    arr = worker.read('roads_cog.tif')
    return arr

manager = Manager(proj='wgs84', scale=0.083333)
arr, bbox = manager.preview(my_analysis)
show(arr)
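
preview returns both the rendered array and its bounding box in the output projection. As a minimal sketch, the result can be georeferenced and written to disk with rasterio (this assumes arr is a single 2D band and bbox is a (left, bottom, right, top) tuple in the Manager's projection; the output filename is illustrative):

import rasterio as rio
from rasterio.transform import from_bounds

height, width = arr.shape[-2:]
with rio.open(
    'preview.tif', 'w', driver='GTiff',            # illustrative output path
    width=width, height=height, count=1,
    dtype=str(arr.dtype),
    crs='EPSG:4326',                               # the Manager's 'wgs84' projection
    transform=from_bounds(*bbox, width, height),
) as dst:
    dst.write(arr, 1)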
  2. Define an analysis function that receives a cog_worker.Worker as the first parameter.
from cog_worker import Worker, Manager
import numpy as np

# Define an analysis function to read and process COG data sources
def MyAnalysis(worker: Worker) -> np.ndarray:

    # 1. Read a COG (reprojecting, resampling and clipping as necessary)
    array: np.ndarray = worker.read('roads_cog.tif')

    # 2. Work on the array
    # ...

    # 3. Return (or post to blob storage etc.)
    return array
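
Any NumPy operation can stand in for step 2. As an illustrative variant (the threshold and uint8 output are examples, not part of the library), an analysis function that derives a binary road mask:

def road_mask(worker: Worker) -> np.ndarray:
    # Read the COG at the worker's scale and projection
    array: np.ndarray = worker.read('roads_cog.tif')
    # Keep cells with any road density (threshold is illustrative)
    return (array > 0).astype(np.uint8)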
  3. Run your analysis at different scales and in different projections
import rasterio as rio
import rasterio.plot  # rio.plot requires an explicit import

# Run your analysis using a cog_worker.Manager which handles chunking
manager = Manager(
    proj = 'wgs84',       # any pyproj string
    scale = 0.083333,  # in projection units (degrees or meters)
    bounds = (-180, -90, 180, 90),
    buffer = 128          # buffer pixels when chunking analysis
)

# preview analysis
arr, bbox = manager.preview(MyAnalysis, max_size=1024)
rio.plot.show(arr)

# preview analysis chunks
for bbox in manager.chunks(chunksize=1500):
    print(bbox)

# execute analysis chunks sequentially
for arr, bbox in manager.chunk_execute(MyAnalysis, chunksize=1500):
    rio.plot.show(arr)

# generate job execution parameters
for params in manager.chunk_params(chunksize=1500):
    print(params)
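
Because the chunks are laid out on the grid defined by the Manager's bounds and scale, the results yielded by chunk_execute can also be assembled back into a single array in memory. A rough sketch, assuming each arr is a single 2D band already trimmed of its buffer and each bbox is (left, bottom, right, top) in the output projection:

import numpy as np

scale = 0.083333
left, bottom, right, top = -180, -90, 180, 90
mosaic = np.zeros((round((top - bottom) / scale), round((right - left) / scale)))

for arr, bbox in manager.chunk_execute(MyAnalysis, chunksize=1500):
    row = round((top - bbox[3]) / scale)   # row offset from the top edge
    col = round((bbox[0] - left) / scale)  # column offset from the left edge
    h, w = arr.shape[-2:]
    mosaic[row:row + h, col:col + w] = arr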
  4. Write scale-dependent functions
import scipy.ndimage

def focal_mean(
    worker: Worker,
    kernel_radius: float = 1000 # radius in projection units (meters)
) -> np.ndarray:

    array: np.ndarray = worker.read('sample-geotiff.tif')

    # worker.scale gives the pixel size in projection units
    kernel_size = int(kernel_radius * 2 / worker.scale)
    array = scipy.ndimage.uniform_filter(array, kernel_size)

    return array
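
Because the kernel size is derived from worker.scale, the same function applies a comparable smoothing radius at any resolution. Extra parameters such as kernel_radius can be bound before handing the function to the Manager, e.g. with functools.partial (a sketch that only assumes the Manager calls the function with a Worker as its first argument; the radius is in the Manager's projection units, degrees here):

import functools

smooth_half_degree = functools.partial(focal_mean, kernel_radius=0.5)
arr, bbox = manager.preview(smooth_half_degree, max_size=1024)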
  5. Chunk your analysis and run it in a Dask cluster
from cog_worker.distributed import DaskManager
from dask.distributed import LocalCluster, Client

# Set up a DaskManager that connects to a Dask cluster
cluster = LocalCluster()
client = Client(cluster)
distributed_manager = DaskManager(
    client,
    proj = 'wgs84',
    scale = 0.083333,
    bounds = (-180, -90, 180, 90),
    buffer = 128
)

# Execute in the Dask worker pool and save chunks to disk as they complete.
distributed_manager.chunk_save('output.tif', MyAnalysis, chunksize=2048)
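
Once chunk_save returns, the saved raster can be inspected like any other GeoTIFF, and the local Dask cluster can be shut down (a sketch; output.tif is the path passed above):

import rasterio as rio
import rasterio.plot  # rio.plot is not imported automatically

with rio.open('output.tif') as src:
    rio.plot.show(src)

client.close()
cluster.close()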

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

cog_worker-0.3.0.tar.gz (17.5 kB), uploaded as Source

Built Distribution

cog_worker-0.3.0-py3-none-any.whl (17.1 kB), uploaded for Python 3

File details

Details for the file cog_worker-0.3.0.tar.gz.

File metadata

  • Download URL: cog_worker-0.3.0.tar.gz
  • Upload date:
  • Size: 17.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for cog_worker-0.3.0.tar.gz

  • SHA256: 897b06c8cae9fcb38e67775ad33d3a500b0fc8ad321687dd47fd5ed0407ab9cb
  • MD5: 7ea5d8a463ada5ab2637b333d74f0fa0
  • BLAKE2b-256: 48ec90cf862f7d2532c28328c5584760d36ae717c5b4aeeba6b4f19c74a00532


File details

Details for the file cog_worker-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: cog_worker-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 17.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.12.4

File hashes

Hashes for cog_worker-0.3.0-py3-none-any.whl

  • SHA256: eff959ec4f0d19a9a0bffc48b12c868e9d675768c70495b6e4a445e959f7c69f
  • MD5: 757417aea9496ae65161e9e2805ed6dc
  • BLAKE2b-256: c4172567c14d882aaa92d06cf55a887948f002f7fb76eb3d933a0019cf8c85c1

