Marine Extremes Detection and Tracking

Marine Extremes Python Package

Efficient & scalable Marine Extremes detection, identification, & tracking for Exascale Climate Data.

Features

Data Pre-processing

Detrending & Anomaly Detection:

  • Removes trend and seasonal cycle using a 6+ coefficient model (mean, annual & semi-annual harmonics, and arbitrary polynomial trends).
  • (Optional) Normalises anomalies using a 30-day rolling standard deviation.
  • Identifies extreme events based on a global-in-time percentile threshold.
  • Utilises dask for efficient parallel computation and scaling to very large spatio-temporal datasets.
  • Performance/Scaling Test: 100 years of daily 0.25° resolution data with 128 cores...
    • Takes ~5 wall-minutes per century
    • Requires only 1 GB memory per core (when dask-backed data has chunks of 25 days)
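
For orientation, the detrending model described above can be sketched in plain NumPy (a minimal hypothetical illustration, not the marEx implementation): a least-squares fit of a mean, a linear trend, and annual & semi-annual harmonics, whose residual is the anomaly field.

```python
import numpy as np

# Synthetic daily "SST" series: mean + weak trend + seasonal harmonics + noise.
t = np.arange(3650.0)                      # 10 years of daily time steps
rng = np.random.default_rng(0)
omega = 2 * np.pi * t / 365.25
sst = (15.0 + 0.001 * t + 2.0 * np.sin(omega)
       + 0.5 * np.cos(2 * omega) + rng.normal(0.0, 0.3, t.size))

# 6-coefficient design matrix: mean, linear trend, annual & semi-annual harmonics.
X = np.column_stack([np.ones_like(t), t,
                     np.sin(omega), np.cos(omega),
                     np.sin(2 * omega), np.cos(2 * omega)])
coeffs, *_ = np.linalg.lstsq(X, sst, rcond=None)
anomaly = sst - X @ coeffs                 # detrended, deseasonalised residual
```

Higher-order polynomial trends simply add `t**2`, `t**3`, … columns to the design matrix, which is what the `detrend_orders` option (below) controls.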

Object Detection & Tracking

Object Detection:

  • Implements efficient algorithms for object detection in 2D geographical data.
  • Fully-parallelised workflow built on dask for extremely fast & larger-than-memory computation.
  • Uses morphological opening & closing to fill small holes and gaps in binary features.
  • Filters out small objects based on area thresholds.
  • Identifies and labels connected regions in binary data representing arbitrary events (e.g. SST or SSS extrema, tracer presence, eddies, etc...).
  • Performance/Scaling Test: 100 years of daily 0.25° resolution binary data with 64 cores...
    • Takes ~5 wall-minutes per century
    • Requires only 1 GB memory per core (with dask chunks of 25 days)

Object Tracking:

  • Implements strict event-tracking conditions to avoid the degenerate outcome of a handful of very large objects absorbing everything.
  • Permits temporal gaps (of T_fill days) between objects, to allow more continuous event tracking.
  • Requires objects to overlap by at least an overlap_threshold fraction of the smaller object's area to be considered the same event and continue tracking with the same ID.
  • Accounts for & keeps a history of object splitting & merging events, ensuring objects are more coherent and retain their previous identities & histories.
  • Improves upon the splitting & merging logic of Sun et al. (2023):
    • In this New Version: each cell of the child object is partitioned according to the parent owning the nearest parent cell (not the nearest parent centroid).
  • Provides much more accessible and usable tracking outputs:
    • Tracked object properties (such as area, centroid, and any other user-defined properties) are mapped into ID-time space
    • Details & Properties of all Merging/Splitting events are recorded.
    • Provides other useful information that may be difficult to extract from the large object ID field, such as:
      • Event presence in time
      • Event start/end times and duration
      • etc...
  • Performance/Scaling Test: 100 years of daily 0.25° resolution binary data with 64 cores...
    • Takes ~8 wall-minutes per decade (cf. the Old Method, i.e. without merge-split tracking, time-gap filling, or overlap thresholding, which, updated here to leverage dask, takes ~1 wall-minute per decade)
    • Requires only ~2 GB memory per core (with dask chunks of 25 days)

Visualisation

Plotting:

  • Provides a few helper functions to create pretty plots, wrapped subplots, and animations (e.g. below).

cf. Old Method vs. New Tracking Algorithms (Centroidal & Nearest-Neighbour Partitioning):

https://github.com/user-attachments/assets/9936fb43-f7a3-461e-aa82-9beb2826619c

https://github.com/user-attachments/assets/462acafd-b2af-467c-b2c5-efec28ed24f1

Usage

1. Pre-process SST Data: cf. 01_preprocess_extremes.ipynb

import xarray as xr
import dask
import marEx

# Load SST data & rechunk for optimal processing
file_name = 'path/to/sst/data'
sst = xr.open_dataset(file_name, chunks={'time':500}).sst

# Process Data
extremes_ds = marEx.preprocess_data(sst, threshold_percentile=95)

The resulting xarray dataset extremes_ds will have the following structure & entries:

xarray.Dataset
Dimensions:     (lat, lon, time)
Coordinates:
    lat         (lat)
    lon         (lon)
    time        (time)
Data variables:
    dat_detrend     (time, lat, lon)  float64   dask.array
    mask            (lat, lon)        bool      dask.array
    extreme_events  (time, lat, lon)  bool      dask.array

where dat_detrend is the detrended SST data, mask is the deduced land-sea mask, and extreme_events is the binary field locating extreme events. Additionally, if std_normalise=True is set in preprocess_data(), the STD-normalised anomalies (dat_stn) and the corresponding binary extremes field (extreme_events_stn) will also be output.

Optional arguments for marEx.preprocess_data() include:

  • std_normalise: Whether to normalise the anomalies using a 30-day rolling standard deviation. Default is False.
  • threshold_percentile: The percentile threshold for extreme event detection. Default is 95.
  • detrend_orders: List of polynomial orders for detrending. Default is [1], i.e. a 1st-order (linear) detrend; [1, 2], for example, would apply a linear + quadratic detrend.
  • dimensions: The names of the time, latitude, and longitude dimensions in the data array. Default is ('time', 'lat', 'lon').
  • dask_chunks: The chunk size for the output dataset. Default is {'time': 25}.
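
As a rough illustration of the global-in-time percentile threshold (plain NumPy, not the marEx API): each grid point gets one threshold computed over its full time record, and a day is flagged "extreme" wherever the anomaly exceeds that point's threshold.

```python
import numpy as np

rng = np.random.default_rng(1)
anom = rng.normal(size=(1000, 4, 5))      # (time, lat, lon) anomaly field
thresh = np.percentile(anom, 95, axis=0)  # one global-in-time threshold per point
extreme_events = anom > thresh            # binary extreme-event field
```

By construction, roughly 5% of time steps at each grid point are flagged when threshold_percentile=95.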

See, e.g. ./examples/unstructured data/01_preprocess_extremes.ipynb for a detailed example of pre-processing on an unstructured grid.


2. Identify & Track Marine Heatwaves: cf. 02_id_track_events.ipynb

import xarray as xr
import dask
import marEx


# Load Pre-processed Data
file_name = 'path/to/binary/extreme/data'
chunk_size = {'time': 25, 'lat': -1, 'lon': -1}
ds_hot = xr.open_dataset(file_name, chunks=chunk_size)

# Extract Extreme Binary Features and Modify Mask
extreme_bin = ds_hot.extreme_events_stn
mask = ds_hot.mask.where((ds_hot.lat < 85) & (ds_hot.lat > -90), other=False)

# ID, Track, & Merge
tracker = marEx.tracker(
    extreme_bin, mask,
    area_filter_quartile=0.5,
    R_fill=8,
    T_fill=2,
    allow_merging=True,
    overlap_threshold=0.5,
    nn_partitioning=True,
)
extreme_events_ds, merges_ds = tracker.run(return_merges=True)

The resulting xarray dataset extreme_events_ds will have the following structure & entries:

xarray.Dataset 
Dimensions: (lat, lon, time, ID, component, sibling_ID) 
Coordinates:
    lat         (lat)
    lon         (lon)
    time        (time)
    ID          (ID)
Data variables:
    ID_field              (time, lat, lon)        int32       dask.array
    global_ID             (time, ID)              int32       ndarray
    area                  (time, ID)              float32     ndarray
    centroid              (component, time, ID)   float32     ndarray
    presence              (time, ID)              bool        ndarray
    time_start            (ID)                    datetime64  ndarray
    time_end              (ID)                    datetime64  ndarray
    merge_ledger          (time, ID, sibling_ID)  int32       ndarray

where

  • ID_field is the integer field labelling tracked events in space & time,
  • global_ID is the unique ID of each object. global_ID.sel(ID=10) tells you the corresponding mapped original_id of event ID 10 at every time,
  • area is the area of each event as a function of time,
  • centroid is the (x,y) centroid of each event as a function of time,
  • presence indicates the presence of each event at each time (anywhere in space),
  • time_start and time_end are the start and end times of each event,
  • merge_ledger gives the sibling IDs (matching ID_field) of each merging event. Values of -1 indicate no merging event occurred.
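
The mapping from a labelled ID field into ID-time space can be illustrated with a toy example in plain NumPy (invented values, not marEx output): from a small ID_field we derive per-ID presence, cell-count area, and start/end times.

```python
import numpy as np

# Toy (time, lat, lon) ID field: 0 = background, 1 & 2 are tracked events.
ID_field = np.array([[[1, 1, 0],
                      [0, 0, 0]],
                     [[1, 0, 0],
                      [0, 2, 2]],
                     [[0, 0, 0],
                      [0, 2, 2]]])

ids = np.array([1, 2])
# Presence of each ID at each time (anywhere in space), mapped to (time, ID).
presence = np.array([[(ID_field[t] == i).any() for i in ids]
                     for t in range(ID_field.shape[0])])
# Cell-count area of each ID at each time.
area = np.array([[(ID_field[t] == i).sum() for i in ids]
                 for t in range(ID_field.shape[0])])
time_start = presence.argmax(axis=0)                              # first presence
time_end = presence.shape[0] - 1 - presence[::-1].argmax(axis=0)  # last presence
```

marEx records real areas and centroids alongside these, but the (time, ID) layout of the outputs is the same idea.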

Additionally, if running with return_merges=True, the resulting xarray dataset merges_ds will have the following structure & entries:

xarray.Dataset 
Dimensions: (merge_ID, parent_idx, child_idx) 
Data variables:
    parent_IDs      (merge_ID, parent_idx)  int32       ndarray
    child_IDs       (merge_ID, child_idx)   int32       ndarray
    overlap_areas   (merge_ID, parent_idx)  int32       ndarray
    merge_time      (merge_ID)              datetime64  ndarray
    n_parents       (merge_ID)              int8        ndarray
    n_children      (merge_ID)              int8        ndarray

where

  • parent_IDs and child_IDs are the original parent and child IDs of each merging event,
  • overlap_areas is the area of overlap between the parent and child objects in each merging event,
  • merge_time is the time of each merging event,
  • n_parents and n_children are the number of parent and child objects participating in each merging event.
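
A hypothetical sketch of querying such a merge record, e.g. finding the dominant parent of each merge event by overlap area (the values below are invented for illustration, and padding with -1 follows the merge_ledger convention above):

```python
import numpy as np

# Invented merge record: two merge events, padded with -1 where a merge
# has fewer parents than the parent_idx dimension allows.
parent_IDs = np.array([[3, 7],
                       [12, -1]])          # (merge_ID, parent_idx)
overlap_areas = np.array([[120, 45],
                          [60, 0]])        # overlap cells per parent
# Dominant parent of each merge = the parent contributing the largest overlap.
dominant_parent = parent_IDs[np.arange(parent_IDs.shape[0]),
                             overlap_areas.argmax(axis=1)]
```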

Arguments for marEx.tracker() include:

  • data_bin: The binary field of events to group & label. Must represent an underlying dask array.
  • mask: The land-sea mask to apply to the binary field, indicating points to keep.
  • area_filter_quartile: The fraction of the smallest objects to discard, i.e. the quantile defining the smallest area object retained.
  • R_fill: The size of the structuring element used in morphological opening & closing, relating to the largest hole that can be filled. In units of pixels.
  • T_fill: The permissible temporal gap between objects for tracking continuity to be maintained. Default is 2 days.
  • allow_merging:
    • True: (Default) Apply splitting & merging criteria, track merge events, and maintain original identities of merged objects across time.
    • False: Classical ndmeasure.label with simple time connectivity, i.e. Scannell et al.
  • nn_partitioning:
    • True: (Default) Implement a better partitioning of merged child objects based on closest parent cell.
    • False: Use the parent centroids to determine partitioning between new child objects, i.e. Sun et al. (2023). N.B.: This has major problems with small merging objects suddenly obtaining unrealistically-large (and often disjoint) fractions of the larger object.
  • overlap_threshold: The fraction of the smaller object's area that must overlap with the larger object's area to be considered the same event and continue tracking with the same ID. Default is 0.5.
  • timedim, xdim, ydim: The names of the time, latitude, and longitude dimensions in the data array. Default is ('time', 'lat', 'lon').
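
The overlap_threshold criterion can be sketched as follows (a minimal NumPy illustration, not marEx internals; same_event is a hypothetical helper): two objects at consecutive times are linked when their overlap covers at least the threshold fraction of the smaller object's area.

```python
import numpy as np

def same_event(obj_t0, obj_t1, overlap_threshold=0.5):
    """Link two binary objects at consecutive times if their overlap covers
    at least `overlap_threshold` of the smaller object's area."""
    overlap = np.logical_and(obj_t0, obj_t1).sum()
    smaller = min(obj_t0.sum(), obj_t1.sum())
    return bool(smaller > 0 and overlap / smaller >= overlap_threshold)

a = np.zeros((4, 4), dtype=bool); a[0:2, 0:2] = True  # 4-cell object
b = np.zeros((4, 4), dtype=bool); b[1:3, 0:2] = True  # same object, shifted one row
```

Here the overlap is 2 cells against a smaller area of 4, so the pair is linked at the default threshold of 0.5 but not at a stricter 0.75.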

See, e.g. ./examples/unstructured data/02_id_track_events.ipynb for a detailed example of identification, tracking, & merging on an unstructured grid.
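
To make the nn_partitioning option concrete, here is a toy NumPy sketch of nearest-neighbour partitioning (hypothetical data, not the marEx implementation): each cell of a merged child object inherits the ID of the parent owning the nearest parent cell, rather than the nearest parent centroid.

```python
import numpy as np

parent_cells = np.array([[0.0, 0.0], [0.0, 1.0],   # cells of parent 1
                         [5.0, 5.0]])              # cell of parent 2
parent_ids = np.array([1, 1, 2])

child_cells = np.array([[0.0, 2.0], [4.0, 5.0]])   # cells of the merged child
# Squared distance from every child cell to every parent cell, then inherit
# the ID of the nearest parent cell.
d2 = ((child_cells[:, None, :] - parent_cells[None, :, :]) ** 2).sum(axis=-1)
assigned = parent_ids[d2.argmin(axis=1)]
```

Partitioning by cells rather than centroids prevents a small parent whose centroid happens to sit nearby from claiming large, disjoint pieces of the child.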

Installation

PyPI

To install the full package, run: pip install marEx[full]
Note: JAX may be difficult to install on some systems. If you encounter issues, you can install without JAX support by running: pip install marEx

GitHub

  1. Clone marEx: git clone https://github.com/wienkers/marEx.git
  2. Change to the parent directory of marEx
  3. Install marEx with pip install -e ./marEx
    This will allow changes you make locally to be reflected when you import the package in Python

Please contact Aaron Wienkers with any questions, comments, issues, or bugs.
