Project description
Cog Worker
Scalable geospatial analysis on Cloud Optimized GeoTIFFs.
- Documentation: https://vizzuality.github.io/cog_worker
- PyPI: https://pypi.org/project/cog-worker
cog_worker is a simple library to help you write scripts that conduct scalable analysis of gridded data. It's intended to be useful for moderate- to large-scale GIS, remote sensing, and machine learning applications.
Installation
pip install cog_worker
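The distributed examples below import dask.distributed; if that isn't already pulled in by your environment (check the project's dependencies), it can be installed alongside:
pip install cog_worker "dask[distributed]"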
Examples
See docs/examples for Jupyter notebook examples.
Quick start
- A simple cog_worker script
from rasterio.plot import show
from cog_worker import Manager
def my_analysis(worker):
    arr = worker.read('roads_cog.tif')
    return arr
manager = Manager(proj='wgs84', scale=0.083333)
arr, bbox = manager.preview(my_analysis)
show(arr)
- Define an analysis function that receives a cog_worker.Worker as the first parameter.
from cog_worker import Worker, Manager
import numpy as np
# Define an analysis function to read and process COG data sources
def MyAnalysis(worker: Worker) -> np.ndarray:
    # 1. Read a COG (reprojecting, resampling and clipping as necessary)
    array: np.ndarray = worker.read('roads_cog.tif')

    # 2. Work on the array
    # ...

    # 3. Return (or post to blob storage etc.)
    return array
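The step 2 placeholder above can be any array operation. As a minimal sketch (assuming roads_cog.tif holds numeric road values, which isn't stated above), a variant that returns a binary road-presence mask might look like this:

import numpy as np
from cog_worker import Worker

def my_masked_analysis(worker: Worker) -> np.ndarray:
    # Read the same sample COG for this worker's window
    array: np.ndarray = worker.read('roads_cog.tif')
    # Illustrative "work on the array" step: flag pixels with any road value
    return (array > 0).astype(np.uint8)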
- Run your analysis at different scales and in different projections
import rasterio as rio
import rasterio.plot

# Run your analysis using a cog_worker.Manager, which handles chunking
manager = Manager(
    proj = 'wgs84',                 # any pyproj string
    scale = 0.083333,               # in projection units (degrees or meters)
    bounds = (-180, -90, 180, 90),
    buffer = 128                    # buffer pixels when chunking analysis
)
# preview analysis
arr, bbox = manager.preview(MyAnalysis, max_size=1024)
rio.plot.show(arr)
# preview analysis chunks
for bbox in manager.chunks(chunksize=1500):
    print(bbox)

# execute analysis chunks sequentially
for arr, bbox in manager.chunk_execute(MyAnalysis, chunksize=1500):
    rio.plot.show(arr)

# generate job execution parameters
for params in manager.chunk_params(chunksize=1500):
    print(params)
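The parameters generated by chunk_params are meant to be handed off to separate jobs. A minimal hand-off sketch, assuming each params object is safe to stringify (default=str is just a fallback for values that aren't natively JSON-serializable):

import json

# Serialize each chunk's execution parameters for an external job queue (illustrative only)
jobs = [json.dumps(params, default=str) for params in manager.chunk_params(chunksize=1500)]
print(f"queued {len(jobs)} chunk jobs")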
- Write scale-dependent functions
import scipy.ndimage

def focal_mean(
    worker: Worker,
    kernel_radius: float = 1000  # radius in projection units (meters)
) -> np.ndarray:
    array: np.ndarray = worker.read('sample-geotiff.tif')

    # Access the pixel size at worker.scale
    kernel_size = int(kernel_radius * 2 / worker.scale)
    array = scipy.ndimage.uniform_filter(array, kernel_size)
    return array
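Managers only need a callable that accepts a Worker, so extra arguments such as kernel_radius can be bound up front, for example with functools.partial. The radius below is purely illustrative and must be in the same units as the manager's projection (degrees for the wgs84 manager above); the snippet reuses the manager and rasterio import from the previous section:

from functools import partial

# Preview the focal mean with an example radius of 0.5 projection units
arr, bbox = manager.preview(partial(focal_mean, kernel_radius=0.5), max_size=1024)
rio.plot.show(arr)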
- Chunk your analysis and run it on a Dask cluster
from cog_worker.distributed import DaskManager
from dask.distributed import LocalCluster, Client
# Set up a DaskManager that connects to a Dask cluster
cluster = LocalCluster()
client = Client(cluster)
distributed_manager = DaskManager(
    client,
    proj = 'wgs84',
    scale = 0.083333,
    bounds = (-180, -90, 180, 90),
    buffer = 128
)
# Execute the analysis in the worker pool and save chunks to disk as they complete.
distributed_manager.chunk_save('output.tif', MyAnalysis, chunksize=2048)
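Once chunk_save has finished, the local Dask resources can be released with the standard dask.distributed calls:

# Shut down the worker pool and the local cluster when the run is finished
client.close()
cluster.close()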
Download files
Source Distribution
- cog_worker-0.3.0.tar.gz (17.5 kB)
Built Distribution
- cog_worker-0.3.0-py3-none-any.whl (17.1 kB)
File details
Details for the file cog_worker-0.3.0.tar.gz.
File metadata
- Download URL: cog_worker-0.3.0.tar.gz
- Upload date:
- Size: 17.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 897b06c8cae9fcb38e67775ad33d3a500b0fc8ad321687dd47fd5ed0407ab9cb
MD5 | 7ea5d8a463ada5ab2637b333d74f0fa0
BLAKE2b-256 | 48ec90cf862f7d2532c28328c5584760d36ae717c5b4aeeba6b4f19c74a00532
File details
Details for the file cog_worker-0.3.0-py3-none-any.whl.
File metadata
- Download URL: cog_worker-0.3.0-py3-none-any.whl
- Upload date:
- Size: 17.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | eff959ec4f0d19a9a0bffc48b12c868e9d675768c70495b6e4a445e959f7c69f
MD5 | 757417aea9496ae65161e9e2805ed6dc
BLAKE2b-256 | c4172567c14d882aaa92d06cf55a887948f002f7fb76eb3d933a0019cf8c85c1