Solar phenomena prediction models

SDOFMv2: A Multi-Instrument Foundation Model for the Solar Dynamics Observatory with Transferable Downstream Applications

SDOFMv2 is an advanced multi-instrument foundation model for analyzing Solar Dynamics Observatory (SDO) data, designed to drive large-scale, data-driven heliophysics research. Building on the original SDOFM framework, this version improves spatial coherence and global consistency by addressing limitations in temporal coverage and reconstruction artifacts.

Model architecture

SDOFMv2 is pretrained as a Masked Autoencoder (MAE) built on a Vision Transformer (ViT) backbone. During training, a% of the image patches are masked, and the encoder processes only the remaining (100 - a)%. The decoder then reconstructs all patches, and the model is optimized with a customized loss function.
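The masking step can be illustrated with a minimal sketch. This is not the project's implementation: the function name, the seed handling, and the example patch count (a 512x512 image cut into 16x16 patches) are assumptions chosen only to show how a% of patches are hidden while the rest go to the encoder.

```python
import random


def split_patches(num_patches: int, mask_percent: float, seed: int = 0):
    """Randomly split patch indices into (visible, masked) lists.

    mask_percent plays the role of `a` in the text: a% of patches are
    hidden and the remaining (100 - a)% are passed to the encoder.
    """
    if not 0 <= mask_percent <= 100:
        raise ValueError("mask_percent must be between 0 and 100")
    num_masked = round(num_patches * mask_percent / 100)
    rng = random.Random(seed)
    masked = sorted(rng.sample(range(num_patches), num_masked))
    masked_set = set(masked)
    visible = [i for i in range(num_patches) if i not in masked_set]
    return visible, masked


# Hypothetical example: 32 * 32 = 1024 patches, half of them masked.
visible, masked = split_patches(num_patches=1024, mask_percent=50)
print(len(visible), len(masked))  # 512 512
```

Only the `visible` indices would be fed to the encoder; the decoder is then asked to reconstruct every patch, masked or not.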

Table of Contents

Getting Started
Repository Structure
Data Preparation
Training & Evaluation
Results & Visualizations
Citation
Contributing
Acknowledgments

Getting Started

For full documentation, please visit sdofmv2.readthedocs.io.

Prerequisites

Python 3.11+
NVIDIA GPU + CUDA toolkit (recommended for training)

Installation

We use mamba (or conda) for fast dependency resolution.

Hardware Note: sdofmv2_environment.yml is configured for CUDA 12.8 by default. If your system requires a different CUDA version (e.g., 11.8), edit the pip section in sdofmv2_environment.yml before running setup — change cu128 to the appropriate tag (e.g., cu118).

# Clone the repository
git clone https://github.com/Joaggi/sdofmv2.git
cd sdofmv2

# Create and activate the environment
# (installs PyTorch and the local package automatically)
mamba env create -f sdofmv2_environment.yml
mamba activate sdofmv2
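If you need a different CUDA build, the tag change described in the hardware note can be scripted before creating the environment. The sketch below is an illustration only: the helper name is hypothetical, and the sample text is an assumed excerpt, since the actual contents of the pip section in sdofmv2_environment.yml may differ.

```python
import re


def retag_cuda(env_text: str, new_tag: str, old_tag: str = "cu128") -> str:
    """Return env_text with every whole-word occurrence of old_tag replaced."""
    if not re.fullmatch(r"cu\d+", new_tag):
        raise ValueError(f"unexpected CUDA tag: {new_tag!r}")
    return re.sub(rf"\b{re.escape(old_tag)}\b", new_tag, env_text)


# Hypothetical excerpt; the real pip section may look different.
sample = "  - pip:\n    - --extra-index-url https://download.pytorch.org/whl/cu128\n"
print(retag_cuda(sample, "cu118"))
```

To apply it to the real file, read sdofmv2_environment.yml as text, pass it through `retag_cuda`, and write the result back before running `mamba env create`.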

Pretrained Weights

Pretrained model checkpoints are available on Hugging Face:

# Using the Hugging Face Hub
mamba install huggingface_hub -c conda-forge
huggingface-cli download joseph-gallego/sdofmv2
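The same download can be done from Python with the `huggingface_hub` library. This is a minimal sketch: the wrapper function and the `checkpoints/sdofmv2` target folder are assumptions, and the file names inside the repository are not listed here.

```python
def download_weights(local_dir: str = "checkpoints/sdofmv2") -> str:
    """Download the checkpoint repository and return the local folder path."""
    # Imported lazily so this module loads even if huggingface_hub is absent.
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id="joseph-gallego/sdofmv2", local_dir=local_dir)


# Usage (requires network access):
# path = download_weights()
# print(path)
```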

Repository Structure

.
├── assets/                     # Output images, model results, and test artifacts
├── configs/                    # YAML configurations for experiments
│   ├── downstream/             # Configs for downstream tasks (F10.7, solar wind)
│   ├── pretrain/               # Configs for MAE pretraining (AIA, HMI)
│   └── test_run/               # Configs for testing and quick validation
├── docs/                       # Sphinx documentation source files
├── notebooks/                  # Jupyter notebooks for analysis and visualization
│   ├── analysis/               # Attention maps, PCA, and masking analysis
│   └── downstream_apps/        # Downstream application demos (F10.7, missing data)
├── scripts/                    # Executable scripts for pipeline tasks
│   ├── analysis/               # Scripts for evaluating and plotting results
│   ├── data/                   # Data acquisition, conversion, and preprocessing
│   ├── evaluation/             # Model evaluation and inference scripts
│   ├── finetuning/             # Scripts for downstream finetuning
│   └── training/               # Pretraining scripts
├── src/
│   └── sdofmv2/
│       ├── core/               # Base model architectures and modules
│       ├── tasks/              # PyTorch Lightning modules for downstream tasks
│       └── utils/              # Helper functions, physical constants, and metrics
├── tests/                      # Unit tests (pytest)
├── pyproject.toml              # Project metadata and build dependencies
└── sdofmv2_environment.yml     # Mamba environment definition

Data Preparation

SDOFMv2 uses the SDOMLv2 dataset — a curated, multi-instrument dataset for the Solar Dynamics Observatory, hosted on NASA's HDRL S3 bucket. Data is streamed via s3fs and stored in the Zarr format.

Dataset Components

Component	Instrument	Data Type	Approx. Size	Description
`aia`	AIA	EUV Images	~7.2 TB	9 extreme ultraviolet channels capturing the solar atmosphere
`hmi`	HMI	Magnetograms	~713 GB	3-component vector magnetic field (Bx, By, Bz) for the solar photosphere
`all`	AIA + HMI	EUV Images & Magnetograms	~7.9 TB	13 channels combining AIA and HMI modalities

Storage: Zarr datasets require significant local disk space. Verify your target drive has sufficient capacity before downloading.
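A quick capacity check can be written with the standard library. This is a sketch under stated assumptions: the sizes are the approximate figures from the table above (in decimal units), and the 10% headroom is an arbitrary safety margin, not a project requirement.

```python
import shutil

# Approximate sizes from the table above, in bytes (decimal units).
COMPONENT_BYTES = {
    "aia": int(7.2e12),
    "hmi": int(713e9),
    "all": int(7.9e12),
}


def has_capacity(target_dir: str, component: str, headroom: float = 1.1) -> bool:
    """Return True if target_dir's drive has room for the component plus headroom."""
    required = int(COMPONENT_BYTES[component] * headroom)
    free = shutil.disk_usage(target_dir).free
    return free >= required


# Check the drive that holds the current directory.
print(has_capacity(".", "hmi"))
```

Point `target_dir` at the same existing directory you plan to pass as `--target`.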

Downloading the Data

The download script is resumable — it checks for existing local files and only fetches what's missing.

# Download AIA only
python scripts/data/download_sdomlv2.py --target /path/to/your/storage --component aia

# Download HMI only
python scripts/data/download_sdomlv2.py --target /path/to/your/storage --component hmi

# Download the full dataset
python scripts/data/download_sdomlv2.py --target /path/to/your/storage --component both
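The resumable behavior amounts to comparing a remote listing against what is already on disk. The sketch below illustrates that idea only; how download_sdomlv2.py decides this internally is not shown here, and both the function name and the rule that zero-byte files count as missing are assumptions.

```python
import tempfile
from pathlib import Path


def files_to_fetch(remote_keys, local_root):
    """Return the remote keys that have no non-empty local counterpart."""
    root = Path(local_root)
    missing = []
    for key in remote_keys:
        local = root / key
        # Treat absent or zero-byte files as not yet downloaded.
        if not local.is_file() or local.stat().st_size == 0:
            missing.append(key)
    return missing


# Demo with a temporary directory standing in for the download target.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "2010").mkdir()
    (Path(tmp) / "2010" / "a.bin").write_bytes(b"x")
    todo = files_to_fetch(["2010/a.bin", "2010/b.bin"], tmp)
print(todo)  # ['2010/b.bin']
```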

Zarr Directory Layout

After download, the data is organized as follows:

data/
├── sdomlv2.zarr/                # AIA multi-channel dataset
│   ├── .zgroup                  # Group hierarchy metadata
│   ├── 2010/
│   │   ├── 131A/                # EUV channel (131 Å)
│   │   ├── 1600A/               # UV channel (1600 Å)
│   │   └── ...                  # Other AIA channels (193, 211, 304, etc.)
│   └── ...
└── sdomlv2_hmi.zarr/            # HMI magnetic field dataset
    ├── .zgroup
    └── 2010/
        ├── Bx/                  # Magnetic field component
        └── By/                  # Magnetic field component

Unlike monolithic file formats (e.g., .fits), the chunked Zarr layout enables high-speed random access — data loaders can read specific time slices or channels without loading the full multi-terabyte dataset into memory.
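To see why chunking makes random access cheap, it helps to count how many chunks a given read touches. The sketch below does that arithmetic in plain Python. The array shape and chunk shape are hypothetical; the actual chunking of the SDOMLv2 stores is not stated here, and the helper supports only non-negative integers and unit-step slices.

```python
from itertools import product


def chunks_for_index(shape, chunks, index):
    """List the chunk-grid coordinates touched by one int or slice per axis."""
    per_axis = []
    for size, chunk, idx in zip(shape, chunks, index):
        if isinstance(idx, int):
            start, stop = idx, idx + 1
        else:
            start, stop, step = idx.indices(size)
            if step != 1:
                raise ValueError("only unit-step slices are supported")
        if stop <= start:
            return []  # an empty selection touches no chunks
        per_axis.append(range(start // chunk, (stop - 1) // chunk + 1))
    return list(product(*per_axis))


# Hypothetical layout: 1000 time steps of 512x512 images, one image per chunk.
shape = (1000, 512, 512)
chunks = (1, 512, 512)
touched = chunks_for_index(shape, chunks, (42, slice(None), slice(None)))
print(touched)  # [(42, 0, 0)]
```

With this layout, reading a single time step fetches one chunk out of 1000, which is the property data loaders rely on.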

Preprocessing

Before training or evaluation, you must compute temporal alignments and dataset statistics (such as normalization values and masks). This step writes an index file that significantly speeds up data loading.

# Preprocess data for AIA (default)
python scripts/data/preprocess.py --config-name pretrain_mae_AIA.yaml

# Preprocess data for HMI
python scripts/data/preprocess.py --config-name pretrain_mae_HMI.yaml

Note: The preprocessing script writes the index files to the directory specified in your configuration file.
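The two ideas behind this step, temporal alignment and normalization statistics, can be sketched as follows. This is an illustration rather than the logic of scripts/data/preprocess.py: the function names, the one-minute tolerance, and the sample timestamps are all assumptions.

```python
import math
from datetime import datetime, timedelta


def align_timestamps(reference, other, tolerance=timedelta(minutes=1)):
    """Pair each reference time with the nearest `other` time within tolerance.

    Returns (ref_index, other_index) tuples; unmatched reference times are
    dropped. Both inputs must be sorted in ascending order.
    """
    pairs = []
    j = 0
    for i, t in enumerate(reference):
        # Advance j while the next candidate is at least as close to t.
        while j + 1 < len(other) and abs(other[j + 1] - t) <= abs(other[j] - t):
            j += 1
        if other and abs(other[j] - t) <= tolerance:
            pairs.append((i, j))
    return pairs


def mean_std(values):
    """Population mean and standard deviation, as used for z-score scaling."""
    n = len(values)
    mean = sum(values) / n
    var = sum((v - mean) ** 2 for v in values) / n
    return mean, math.sqrt(var)


# Hypothetical observation times for two instruments.
aia_times = [datetime(2010, 5, 13, 0, 0), datetime(2010, 5, 13, 0, 12),
             datetime(2010, 5, 13, 0, 24)]
hmi_times = [datetime(2010, 5, 13, 0, 0, 30), datetime(2010, 5, 13, 0, 24, 10)]
pairs = align_timestamps(aia_times, hmi_times)
print(pairs)  # [(0, 0), (2, 1)]
print(mean_std([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]))  # (5.0, 2.0)
```

Storing the matched index pairs and the per-channel statistics once is what lets later data loading skip this work.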

Training & Evaluation

Pretraining

python scripts/training/pretrain.py --config-name pretrain_mae_AIA.yaml

Evaluation

python scripts/evaluation/test.py --config-name pretrain_mae_AIA.yaml

Downstream Finetuning

# Example: solar wind forecasting
python scripts/finetuning/run_solarwind.py --config-name solarwind_sdofmv2_ALL.yaml

Configuration files for all tasks are in configs/downstream/. Notebook-based walkthroughs are available in notebooks/downstream_apps/.

Results & Visualizations

Our MAE trained on AIA data successfully reconstructs SDO solar images at high quality.

Sample Visualization

Row 1: Ground-truth images. Row 2: Reconstructions at 0% masking ratio. Row 3: Reconstructions at 50% masking ratio.

Citation

If SDOFMv2 is useful in your research, please cite:

@misc{sdofmv2,
  author    = {Hong, Jinsu and Martin, Daniela and Gallego, Joseph},
  title     = {SDOFMv2: A Multi-Instrument Foundation Model for the Solar Dynamics Observatory with Transferable Downstream Applications},
  year      = {2026},
  publisher = {GitHub},
  journal   = {GitHub repository},
  howpublished = {\url{https://github.com/Joaggi/sdofmv2}},
  note      = {Jinsu Hong, Daniela Martin, and Joseph Gallego contributed equally to this work}
}

Contributing

Contributions, bug reports, and feature requests are welcome! Please check the issues page or open a pull request.

Acknowledgments

This work builds on the SDOFM framework developed by Trillium Technologies Inc. We thank the creators of SDOMLv2 for providing the curated multi-wavelength training data, and the NASA Solar Dynamics Observatory mission for open data access.

Project details

Maintainers

jinsuhong90

Release history

0.2.0 (this version): Jun 26, 2026
0.1.1: Mar 10, 2026
0.1.0: Mar 10, 2026

Download files

Source Distribution

sdofmv2-0.2.0.tar.gz (97.0 kB)

Uploaded Jun 26, 2026 Source

Built Distribution

sdofmv2-0.2.0-py3-none-any.whl (111.5 kB)

Uploaded Jun 26, 2026 Python 3

File details

Details for the file sdofmv2-0.2.0.tar.gz.

File metadata

Download URL: sdofmv2-0.2.0.tar.gz
Upload date: Jun 26, 2026
Size: 97.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sdofmv2-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`bf5d360bc0274fb8dcc14e7693806a34bb7217900f754c15364353ceeb71d985`
MD5	`32f6da5cca97e1eb6a9a45d205a0b6cc`
BLAKE2b-256	`13106cf58e35c77bddab91e9d72cf0aa05fb6a8bdf7833114128b123b9f8949b`
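A downloaded file can be checked against the SHA256 digest listed above using Python's standard `hashlib` module. This is a minimal sketch: the helper name and block size are arbitrary choices, and the usage lines assume the sdist was saved to the working directory.

```python
import hashlib


def sha256_of(path: str, block_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in fixed-size blocks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


# SHA256 digest listed above for sdofmv2-0.2.0.tar.gz.
EXPECTED = "bf5d360bc0274fb8dcc14e7693806a34bb7217900f754c15364353ceeb71d985"

# Usage, after downloading the sdist into the working directory:
# assert sha256_of("sdofmv2-0.2.0.tar.gz") == EXPECTED
```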

Provenance

The following attestation bundles were made for sdofmv2-0.2.0.tar.gz:

Publisher: publish.yml on Joaggi/sdofmv2

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sdofmv2-0.2.0.tar.gz
- Subject digest: bf5d360bc0274fb8dcc14e7693806a34bb7217900f754c15364353ceeb71d985
- Sigstore transparency entry: 1961107968
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: Joaggi/sdofmv2@4d1abf7211da6bc2661de0fc2347201309ff4753
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/Joaggi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4d1abf7211da6bc2661de0fc2347201309ff4753
- Trigger Event: release

File details

Details for the file sdofmv2-0.2.0-py3-none-any.whl.

File metadata

Download URL: sdofmv2-0.2.0-py3-none-any.whl
Upload date: Jun 26, 2026
Size: 111.5 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for sdofmv2-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`83a8cc20935d7b223372d4565266530d48670dacbeae3a9d81ed3960287377c7`
MD5	`08fe322feec018f45434e62f30758964`
BLAKE2b-256	`2d4cb98f825da308c7d8ee146ec4e91bc74458173eb3b35c359260b05f483eb3`

Provenance

The following attestation bundles were made for sdofmv2-0.2.0-py3-none-any.whl:

Publisher: publish.yml on Joaggi/sdofmv2

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: sdofmv2-0.2.0-py3-none-any.whl
- Subject digest: 83a8cc20935d7b223372d4565266530d48670dacbeae3a9d81ed3960287377c7
- Sigstore transparency entry: 1961108070
- Sigstore integration time: Jun 26, 2026
Source repository:
- Permalink: Joaggi/sdofmv2@4d1abf7211da6bc2661de0fc2347201309ff4753
- Branch / Tag: refs/tags/v0.2.0
- Owner: https://github.com/Joaggi
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@4d1abf7211da6bc2661de0fc2347201309ff4753
- Trigger Event: release
