Causal Time Series Modeling of Supraglacial Lake Evolution in Greenland under Distribution Shift

RIC-TSC: Causal Time Series Modeling of Supraglacial Lake Evolution in Greenland under Distribution Shift

This repository provides the implementation for "Causal Time Series Modeling of Supraglacial Lake Evolution in Greenland under Distribution Shift", a paper accepted at ICMLA 2025. We introduce a regionally-informed causal framework that discovers lagged environmental drivers of supraglacial lake (SGL) evolution across Greenland and uses these causal signals for robust sequence modeling under spatial distribution shift.


Introduction

Supraglacial lakes (SGLs) exhibit complex spatiotemporal behaviors such as rapid drainage, slow drainage, refreezing, and burial. Accurate classification of lake evolution is critical to understanding meltwater runoff and ice sheet stability.

This repository presents a causally-informed modeling framework that identifies invariant environmental drivers across Greenland using Joint PCMCI+ (J-PCMCI+), and also captures region-specific causal mechanisms in individual basins. These causal predictors are then used in downstream sequence modeling to improve robustness and generalization under distribution shifts. We assess performance in global, in-distribution (ID), and out-of-distribution (OOD) settings.


Methodology

We construct daily multivariate time series from satellite and reanalysis sources:

  • Sentinel-1 SAR (HV backscatter anomaly)
  • Sentinel-2 and Landsat-8 optical imagery (NDWI-based water fraction, solar zenith)
  • CARRA-West reanalysis (temperature, humidity, pressure, SST, etc.)
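As a rough illustration of the optical feature listed above, the sketch below computes an NDWI-based water fraction for a single lake footprint. It assumes the common green/NIR form NDWI = (Green - NIR) / (Green + NIR) and a hypothetical threshold of 0.25. The function names, the band choice, and the threshold are illustrative assumptions and are not taken from this repository.

```python
from typing import Sequence


def ndwi(green: float, nir: float) -> float:
    """NDWI for one pixel; returns 0.0 when both bands are zero."""
    total = green + nir
    return (green - nir) / total if total != 0 else 0.0


def water_fraction(
    green: Sequence[float],
    nir: Sequence[float],
    threshold: float = 0.25,  # hypothetical threshold, not from the paper
) -> float:
    """Fraction of pixels whose NDWI exceeds the threshold."""
    if len(green) != len(nir):
        raise ValueError("green and nir must have the same length")
    if len(green) == 0:
        return 0.0
    wet = sum(1 for g, n in zip(green, nir) if ndwi(g, n) > threshold)
    return wet / len(green)


# Toy reflectances for four pixels: two water-like, two ice-like.
green = [0.30, 0.28, 0.10, 0.12]
nir = [0.05, 0.06, 0.11, 0.12]
fraction = water_fraction(green, nir)
print(f"water fraction: {fraction:.2f}")
```

Repeating this per lake and per acquisition date would give one water-fraction value per day, which is the shape of input the daily time series needs.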

J-PCMCI+ is applied globally and per region to identify lagged causal parents of HV_anom (horizontally transmitted, vertically received backscatter anomaly), a proxy for lake water presence. These causal features are then used for lake evolution classification.
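Once causal discovery returns a set of (variable, lag) parents, those pairs have to be turned into model inputs. The sketch below shows one way to do that in plain Python. The helper `build_lagged_features` and the example parent list are hypothetical; the actual parents come from J-PCMCI+ and the repository may build features differently.

```python
from typing import Dict, List, Tuple

Series = Dict[str, List[float]]


def build_lagged_features(
    series: Series, parents: List[Tuple[str, int]]
) -> List[Dict[str, float]]:
    """Return one row per time step t for which every lagged parent exists.

    Each row maps '<var>_lag<k>' to series[var][t - k].
    """
    if not parents:
        return []
    max_lag = max(lag for _, lag in parents)
    n_steps = min(len(series[var]) for var, _ in parents)
    rows = []
    for t in range(max_lag, n_steps):
        rows.append({f"{var}_lag{lag}": series[var][t - lag] for var, lag in parents})
    return rows


# Toy daily series for one lake (values are made up).
series = {
    "HV_anom": [0.1, 0.2, 0.3, 0.4],
    "S2_water": [0.00, 0.05, 0.10, 0.20],
}
# Hypothetical causal parents of HV_anom: (variable, lag in days).
parents = [("HV_anom", 1), ("S2_water", 2)]

rows = build_lagged_features(series, parents)
for row in rows:
    print(row)
```

The first `max_lag` time steps are dropped because their lagged values are not available.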

[Figure: RIC-TSC methodology]


Installation

Install the package in editable mode for development:

git clone https://github.com/ehfahad/RIC-TSC.git
cd RIC-TSC
pip install -e .

Directory Structure

RIC-TSC/
├── src/rictsc/                        # Core package logic
│   ├── utils/                         # Refactored helper functions
│   ├── preprocessing.py               # Preprocessing module
│   ├── causality.py                   # Causal feature module
│   └── classification.py              # RICTSCClassifier API
├── causality/                         # J-PCMCI+ causal discovery notebooks
├── data/                              # Raw, processed, and causal datasets
├── figures/                           # Methodology diagrams and experiment visualizations
├── results/                           # Output metrics, confusion matrices, GMM plots
├── tests/                             # Package sanity tests
├── pyproject.toml                     # Package metadata and dependencies
├── run_global_classification.py       # Global pooled classification script
└── run_regionwise_classification.py   # Region-wise ID and OOD classification script

Quickstart

1. Command Line Interface

Run the pipeline directly from the terminal using the installed entry points:

# Step 1: Preprocess time series for all lakes
rictsc-preprocess

# Step 2: Extract region-specific causal datasets
rictsc-causal

2. Python API

Integrate the RIC-TSC classifier into your own scripts:

from rictsc import RICTSCClassifier
import pandas as pd

# Initialize the classifier
model = RICTSCClassifier(seed=42)

# Load data and fit model on causal features
df = pd.read_csv("data/region_causal_datasets/CW_causal_timeseries.csv")
model.fit(df, feature_cols=["HV_anom_lag1", "S2_water", "r2"], label_col="label")

# Predict on new sequences (test_df is a DataFrame you supply with the same feature columns)
predictions = model.predict(test_df, feature_cols=["HV_anom_lag1", "S2_water", "r2"])

Experiments

We evaluate RIC-TSC under three experimental settings:

  • Global: Train/test on pooled lake data from all six regions using an 80/20 split stratified by region.
  • In-Distribution (ID): For each region, an 80/20 train/test split is applied to that region’s lakes.
  • Out-of-Distribution (OOD): Train on a single region and test on the remaining five, assessing generalization beyond the training domain.
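The three settings above differ only in how lakes are assigned to train and test sets. The following sketch shows one possible lake-level implementation using the standard library. The region codes `R1` to `R6`, the lake identifiers, and both function names are placeholders, and the experiment scripts in this repository may split differently.

```python
import random
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple


def stratified_split(
    lake_regions: Dict[str, str], test_frac: float = 0.2, seed: int = 42
) -> Tuple[List[str], List[str]]:
    """Split lakes into train/test so each region contributes ~test_frac to test."""
    rng = random.Random(seed)
    by_region = defaultdict(list)
    for lake, region in sorted(lake_regions.items()):
        by_region[region].append(lake)
    train, test = [], []
    for region in sorted(by_region):
        lakes = by_region[region]
        rng.shuffle(lakes)
        n_test = max(1, round(len(lakes) * test_frac))
        test.extend(lakes[:n_test])
        train.extend(lakes[n_test:])
    return train, test


def ood_splits(
    lake_regions: Dict[str, str],
) -> Iterator[Tuple[str, List[str], List[str]]]:
    """Yield (train_region, train_lakes, test_lakes), one split per region."""
    for train_region in sorted(set(lake_regions.values())):
        train = sorted(l for l, r in lake_regions.items() if r == train_region)
        test = sorted(l for l, r in lake_regions.items() if r != train_region)
        yield train_region, train, test


# Placeholder data: six regions with five lakes each.
regions = ["R1", "R2", "R3", "R4", "R5", "R6"]
lake_regions = {f"lake_{r}_{i}": r for r in regions for i in range(5)}

# Global setting: pooled lakes, 80/20 split stratified by region.
global_train, global_test = stratified_split(lake_regions)

# ID setting: the same split applied to one region's lakes only.
r1_lakes = {l: r for l, r in lake_regions.items() if r == "R1"}
id_train, id_test = stratified_split(r1_lakes)

# OOD setting: train on one region, test on the remaining five.
ood = list(ood_splits(lake_regions))
```

Splitting by lake rather than by individual time step keeps all observations of a lake on the same side of the split, which avoids leakage between train and test.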

Each setting compares two models:

  • Causal Model: Trained only on the lagged causal parents discovered by J-PCMCI+ for each region.
  • Baseline Model: Trained using all available features, with no causal feature selection or temporal lag filtering.

Performance is reported using overall accuracy, macro-averaged F1, precision, and recall.
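For reference, the metrics named above can be computed as in the sketch below. It assumes that precision and recall are macro-averaged in the same way as F1, which the text does not state explicitly. The function name `macro_metrics` and the toy labels are illustrative only.

```python
from typing import Dict, Sequence


def macro_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, float]:
    """Overall accuracy plus macro-averaged precision, recall, and F1."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("y_true and y_pred must be non-empty and equally long")
    labels = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for label in labels:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return {
        "accuracy": correct / len(y_true),
        "macro_precision": sum(precisions) / n,
        "macro_recall": sum(recalls) / n,
        "macro_f1": sum(f1s) / n,
    }


# Toy labels (class names are illustrative).
y_true = ["rapid", "rapid", "slow", "refreeze"]
y_pred = ["rapid", "slow", "slow", "refreeze"]
report = macro_metrics(y_true, y_pred)
for name, value in report.items():
    print(f"{name}: {value:.3f}")
```

Macro averaging weights every class equally, so a rare class counts as much as a common one.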


Reproducing Results

To reproduce the experiments from the paper:

  1. Clone the repository: git clone https://github.com/ehfahad/RIC-TSC.git
  2. Run the experiment scripts:
    python run_global_classification.py
    python run_regionwise_classification.py
    
Output Structure

results/
├── global_classification/
│   └── global_classification_results.csv  # Metrics for global experiment comparing causal vs. baseline models
│
└── region_specific_classification/
    ├── id_results.csv                     # Region-wise ID results comparing causal vs. baseline models
    └── ood_results.csv                    # OOD results where models are trained on one region and tested on the other five

Citation

This paper has been accepted to ICMLA 2025. Please cite it as:

@inproceedings{hossain2025rictsc,
  title={Causal Time Series Modeling of Supraglacial Lake Evolution in Greenland under Distribution Shift},
  author={Emam Hossain and Muhammad Hasan Ferdous and Devon Dunmire and Aneesh Subramanian and Md Osman Gani},
  booktitle={2025 International Conference on Machine Learning and Applications (ICMLA)},
  year={2025},
  organization={IEEE}
}
