DDPM pipeline for generating correlated CIB and tSZ extragalactic CMB foregrounds

Maintainers

AlexBM173

Learning Correlated Astrophysical Foregrounds with Denoising Diffusion Probabilistic Models

Overview

This repository implements a denoising diffusion probabilistic model (DDPM) pipeline for generating realistic synthetic maps of extragalactic cosmic microwave background (CMB) foregrounds. The model learns to generate correlated pairs of Cosmic Infrared Background (CIB) and thermal Sunyaev–Zeldovich (tSZ) maps from AGORA cosmological simulations, reproducing the statistical properties—power spectra, higher-order moments, and morphology—of the training data while preserving physically important cross-channel correlations.

The DDPM can be deployed as a differentiable prior in Bayesian inference pipelines (e.g., CMB lensing or kSZ analyses), as a tool for forecasting survey noise properties and component separation fidelity, or as a data augmentation pipeline for testing downstream analysis codes. The model is trained on 6°×6° flat-sky patches at 256×256 pixel resolution and includes options for fast sampling via DDIM acceleration.
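
As a quick check on the patch geometry quoted above, the sketch below works out the pixel scale and the multipole range that a 6° patch sampled at 256 pixels per side can represent. It uses only the standard flat-sky relation ℓ = 2π × (spatial frequency in cycles per radian); the variable names are illustrative and not part of the package.

```python
import math

PATCH_DEG = 6.0   # patch side length, from the text
N_PIX = 256       # pixels per side, from the text

pixel_arcmin = PATCH_DEG * 60.0 / N_PIX
side_rad = math.radians(PATCH_DEG)
pixel_rad = side_rad / N_PIX

ell_fundamental = 2.0 * math.pi / side_rad   # spacing between Fourier modes
ell_nyquist = math.pi / pixel_rad            # highest multipole along one axis

print(f"pixel size: {pixel_arcmin:.5f} arcmin")
print(f"fundamental ell: {ell_fundamental:.0f}")
print(f"Nyquist ell: {ell_nyquist:.0f}")
```

The pixel scale comes out at about 1.41 arcmin and the single-axis Nyquist multipole at 7680, so a cut at ℓ = 7000 sits just below the resolution limit of the grid.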

This work is part of the MPhil in Data Intensive Science programme at the University of Cambridge.

Architecture

The pipeline consists of three stages:

Data Preparation: Raw HEALPix maps from the AGORA BAHAMAS simulation (hosted on Globus) are masked at point-source and cluster thresholds, cut into 6°×6° flat-sky patches, low-pass filtered with a cutoff at ℓ = 7000 (modes above the cutoff are removed), and normalised before being saved as training-ready .npy arrays.
Training: Paired CIB and tSZ patches are stacked into 2-channel tensors of shape (N, 2, 256, 256), augmented with 4 rotations × horizontal flip (8× total), and used to train a U-Net-based DDPM via the denoising-diffusion-pytorch library. The U-Net architecture has dim=64, dim_mults=(1,2,4,8), and flash attention is enabled for efficiency. The diffusion schedule uses 1000 timesteps with a sigmoid noise schedule.
Sampling: A trained checkpoint generates batches of correlated CIB–tSZ map pairs. Standard sampling uses full DDPM (1000 reverse steps); DDIM sampling with fewer timesteps (e.g., 250 steps) is ~4× faster with minimal quality loss.
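
The 8× augmentation described in the Training stage can be sketched with NumPy alone. This is a minimal illustration, not the code in train.py: the function name `augment_dihedral` is hypothetical, and it assumes square patches stored as (N, 2, H, W). The same rotation and flip is applied to both channels at once, so the CIB–tSZ pairing within each sample is left intact.

```python
import numpy as np

def augment_dihedral(patches):
    """Return the 8 rotation/flip copies of an (N, C, H, W) stack of square patches."""
    copies = []
    for k in range(4):
        # Rotate by k * 90 degrees in the image plane (last two axes).
        rotated = np.rot90(patches, k=k, axes=(-2, -1))
        copies.append(rotated)
        # Horizontal flip of the rotated copy.
        copies.append(np.flip(rotated, axis=-1))
    return np.concatenate(copies, axis=0)

rng = np.random.default_rng(0)
pairs = rng.normal(size=(3, 2, 16, 16))   # synthetic stand-in for CIB/tSZ pairs
augmented = augment_dihedral(pairs)
print(augmented.shape)  # (24, 2, 16, 16)
```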

Package Modules

The foregrounds_diffusion/ package provides the following modules:

Module	Responsibility
`flatmaps.py`	Flat-sky Fourier utilities: power-spectrum conversion (`map2cl`, `cl2map`), map generation (`make_gaussian_realisation`), radial profiling, polarisation E/B↔Q/U conversion.
`preprocessing.py`	Data normalisation (`apply_maxmin_normalization`, `apply_stdnorm`), HEALPix patch extraction (`FlatCutter`, `get_patch_centers`), Fourier filtering (`get_lpf_hpf`, `bandpass_filter`, `wiener_filter`), and dataset splitting.
`statistics.py`	2D Gaussian fitting (`gaussian`, `moments`, `fitgaussian`) and summary statistics (`stats`).
`moments.py`	Power-spectrum summaries (`mean_cls`, `mean_cross_cls`) and higher-order moments (`compute_summed_moments`, `compute_cross_moments`).
`morphology.py`	Minkowski functionals (`compute_mfs`) and Minkowski tensors (`compute_minkowski_tensors`).
`stacking.py`	tSZ cluster stacking utilities (`select_snr_pixels`, `extract_cutouts`).
`masking.py`	Flat-sky peak masks (`get_peak_masks`, `inpaint_masked_regions`) and AGORA MDPL2 cluster/point-source masks (`get_point_source_mask_in_healpix`, `get_apodised_mdpl2_cluster_mask`, etc.).
`peak_counts.py`	Peak and minima counting statistics following Sabyr et al. (2024): `smooth_map`, `find_peaks`, `count_peaks_binned`, `compute_peak_minima_counts`. Requires only numpy/scipy.
`scattering_stats.py`	Scattering transform statistics: `compute_scattering_coefficients` (S1, S2), `compute_scattering_covariance` (C11), `scattering_summary`. Supports Cheng et al. or kymatio backends.
`train.py`	Training entry point (run via `accelerate launch train.py`). CLI: `--run-name`, `--steps`, `--batch-size`, `--lr`, `--wandb`.
`sample.py`	Sampling entry point (run via `accelerate launch sample.py`). CLI: `--checkpoint`, `--batches`, `--batch-size`, `--output`, `--sampling-timesteps` (DDIM), `--wandb`.
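
To illustrate the kind of flat-sky power-spectrum estimate that the Fourier utilities in the table are responsible for, here is a self-contained NumPy sketch. It is not the package's `map2cl`, whose signature is not documented here; the function `binned_power_spectrum`, its arguments, and the linear binning are assumptions made for this example.

```python
import numpy as np

def binned_power_spectrum(patch, patch_deg=6.0, n_bins=20):
    """Azimuthally averaged flat-sky power spectrum of one square patch."""
    n = patch.shape[-1]
    side_rad = np.deg2rad(patch_deg)
    pix_rad = side_rad / n
    # Multipole of each Fourier mode: ell = 2*pi * (cycles per radian).
    ell_1d = 2.0 * np.pi * np.fft.fftfreq(n, d=pix_rad)
    ell = np.hypot(*np.meshgrid(ell_1d, ell_1d, indexing="ij"))
    # Continuum-normalised 2D power: |pixel-area-weighted FFT|^2 / patch area.
    power_2d = np.abs(np.fft.fft2(patch) * pix_rad**2) ** 2 / side_rad**2
    # Average the 2D power in linear ell bins; modes beyond the last edge are dropped.
    edges = np.linspace(0.0, ell_1d.max(), n_bins + 1)
    counts, _ = np.histogram(ell, bins=edges)
    sums, _ = np.histogram(ell, bins=edges, weights=power_2d)
    centres = 0.5 * (edges[:-1] + edges[1:])
    return centres, sums / np.maximum(counts, 1)

rng = np.random.default_rng(0)
white = rng.normal(size=(256, 256))          # unit-variance white noise
ell_centres, cl = binned_power_spectrum(white)
pix_rad = np.deg2rad(6.0) / 256
ratio = cl[5:].mean() / pix_rad**2           # expected to be close to 1
print(ratio)
```

For unit-variance white noise the expected spectrum is flat and equal to the pixel solid angle, which gives a simple sanity check on the normalisation.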

Installation

From PyPI

pip install foregrounds_diffusion

Optional Extras

The package includes optional dependencies for additional functionality:

# Development and testing
pip install foregrounds_diffusion[dev]

# Acceleration via Numba and quantimpy (Minkowski functionals)
pip install foregrounds_diffusion[fast]

# Building Sphinx documentation locally
pip install foregrounds_diffusion[docs]

# All of the above
pip install foregrounds_diffusion[dev,fast,docs]

From Source

Clone the repository and install in editable mode:

git clone https://github.com/AlexBM173/cmb_foregrounds_diffusion.git
cd cmb_foregrounds_diffusion
pip install -e ".[dev]"

Data

Globus Collections

The raw simulation files are distributed across two Globus collections. You will need a Globus account and the Globus Connect Personal client to transfer them.

Collection: Agora — full-sky HEALPix simulation maps (NSIDE=8192):

File	Globus path	Units
`agora_len_mag_cibmap_act_150ghz.fits`	`/components/cib/len/act/nocc/`	Jy/sr
`agora_len_mag_cibmap_act_150ghz.fits`	`/components/cib/len/act/uK/`	µK
`agora_ltszNG_bahamas80_bnd_unb_1.0e+12_1.0e+18_lensed.fits`	`/components/tsz/len/`	Compton-y

The preprocessing pipeline uses the Jy/sr CIB map and the Compton-y tSZ map. The µK CIB variant is provided for reference.

Collection: agora — halo catalogue slices:

Files	Globus path
`haloslc_rot_*.npz`	`halolc/`

The catalogue slices are concatenated and filtered by docs/tutorials/01_halo_catalogue.ipynb to produce data/halo_catalogue/halo_catalogue_m500gt3e14.npz, which is used by the cluster masking step.

Preprocessing

The full preprocessing pipeline runs across the first three tutorial notebooks:

01_halo_catalogue.ipynb: concatenates halo catalogue slices, filters to M_500c ≥ 3×10¹⁴ M☉, saves data/halo_catalogue/halo_catalogue_m500gt3e14.npz
02_masking.ipynb: loads raw FITS maps, applies 2 mJy point-source masking and apodised cluster masks, saves data/cib_150_masked.fits and data/tsz_150_masked.fits
03_patch_extraction.ipynb: extracts 6°×6° flat-sky patches at 256×256 resolution, low-pass filters at ℓ = 7000, z-score normalises both the CIB and tSZ patches, saves training-ready .npy arrays
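
The filtering and normalisation in the patch-extraction step can be sketched as follows. This is a simplified stand-in under stated assumptions: it uses a hard top-hat cut in ℓ and a per-patch z-score, whereas the notebook may use a smoother filter (for example via `get_lpf_hpf`) and dataset-wide statistics. The helper names `ell_grid` and `lowpass_and_zscore` are hypothetical.

```python
import numpy as np

def ell_grid(n_pix, patch_deg):
    """Multipole |ell| of every 2D FFT mode for a square flat-sky patch."""
    pix_rad = np.deg2rad(patch_deg) / n_pix
    ell_1d = 2.0 * np.pi * np.fft.fftfreq(n_pix, d=pix_rad)
    return np.hypot(*np.meshgrid(ell_1d, ell_1d, indexing="ij"))

def lowpass_and_zscore(patch, patch_deg=6.0, ell_max=7000.0):
    """Remove modes above ell_max, then rescale to zero mean and unit variance."""
    keep = ell_grid(patch.shape[-1], patch_deg) <= ell_max
    filtered = np.fft.ifft2(np.fft.fft2(patch) * keep).real
    mean, std = filtered.mean(), filtered.std()
    # Return the statistics as well so the normalisation can be undone later.
    return (filtered - mean) / std, mean, std

rng = np.random.default_rng(0)
raw = rng.normal(loc=3.0, scale=2.0, size=(256, 256))   # synthetic patch
normed, mean, std = lowpass_and_zscore(raw)
print(normed.mean(), normed.std())
```

Keeping the mean and standard deviation alongside the normalised array is what allows generated samples to be mapped back to physical units.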

Expected local data layout after preprocessing:

data/
├── agora_len_mag_cibmap_act_150ghz.fits         # raw CIB map (from Globus)
├── agora_ltszNG_bahamas80_...lensed.fits         # raw tSZ map (from Globus)
├── cib_150_masked.fits                           # after 02_masking
├── tsz_150_masked.fits                           # after 02_masking
├── halo_catalogue/
│   └── halo_catalogue_m500gt3e14.npz             # after 01_halo_catalogue
└── low_pass/
    └── 2mJy/
        ├── CIB_map_150GHz_256_st6_zscore_2mJy_lp.npy   # training-ready CIB
        ├── tSZ3_map_150GHz_256_st6_zscore_2mJy_lp.npy  # training-ready tSZ
        ├── gaussian_cib_tsz_2mJy_lp.npy                # Gaussian baseline
        └── norm_params_2mJy.npy                         # normalisation stats

Quick Start

Training

Train a new model with the default configuration:

accelerate launch foregrounds_diffusion/train.py --run-name my_run_v1

To enable Weights & Biases logging (see the Weights & Biases section for setup):

accelerate launch foregrounds_diffusion/train.py --run-name my_run_v1 --wandb

Checkpoints are saved to results/my_run_v1/model-{step}.pt every 5 steps (configurable via --checkpoint-freq).

Sampling with Full DDPM (1000 steps)

Generate samples from a trained checkpoint:

accelerate launch foregrounds_diffusion/sample.py \
  --checkpoint results/my_run_v1/model-20.pt \
  --batches 10 \
  --batch-size 16 \
  --output data/low_pass/2mJy/samples.npy

This generates 10 × 16 = 160 correlated CIB–tSZ patch pairs and saves them as a single .npy file with shape (160, 2, 256, 256).
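
Once a samples file has been written, a first check is whether the two channels of each sample are actually correlated. The sketch below computes a per-sample Pearson coefficient between the channels. It runs on a synthetic array so that it is self-contained; in practice the array would come from `np.load` on the output file. The channel order (index 0 for CIB, index 1 for tSZ) is an assumption, since the text only states that the array has two channels.

```python
import numpy as np

def channel_correlation(samples):
    """Pearson correlation between channel 0 and channel 1 of each (2, H, W) sample."""
    a = samples[:, 0].reshape(len(samples), -1)
    b = samples[:, 1].reshape(len(samples), -1)
    a = a - a.mean(axis=1, keepdims=True)
    b = b - b.mean(axis=1, keepdims=True)
    return (a * b).sum(axis=1) / np.sqrt((a**2).sum(axis=1) * (b**2).sum(axis=1))

# Synthetic stand-in: two channels sharing a common component plus independent noise.
rng = np.random.default_rng(1)
shared = rng.normal(size=(4, 32, 32))
samples = np.stack(
    [shared + 0.5 * rng.normal(size=shared.shape),
     shared + 0.5 * rng.normal(size=shared.shape)],
    axis=1,
)
r = channel_correlation(samples)
print(samples.shape, r)
```

Comparing the distribution of this coefficient between generated samples and held-out training patches gives a quick, if coarse, view of whether the cross-channel correlation has been learned.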

Sampling with DDIM (250 steps, ~4× faster)

Use DDIM for faster sampling with minimal quality loss:

accelerate launch foregrounds_diffusion/sample.py \
  --checkpoint results/my_run_v1/model-20.pt \
  --batches 10 \
  --batch-size 16 \
  --output data/low_pass/2mJy/samples_ddim250.npy \
  --sampling-timesteps 250

The --sampling-timesteps argument accepts any positive integer below 1000. Typical choices are 50 (very fast, ~1s/patch), 100 (fast, ~2s/patch), or 250 (good quality/speed trade-off, ~5s/patch).
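
The speed-up comes from visiting only a subsequence of the 1000 training timesteps, with one network evaluation per visited step. The sketch below builds an evenly spaced subsequence, which is one common choice for DDIM; whether sample.py uses exactly this spacing is an assumption, and the function name `ddim_time_pairs` is hypothetical.

```python
import numpy as np

def ddim_time_pairs(total_timesteps=1000, sampling_timesteps=250):
    """Evenly spaced (t, t_next) pairs from the noisiest step down to -1 (the final image)."""
    times = np.round(
        np.linspace(-1, total_timesteps - 1, sampling_timesteps + 1)
    ).astype(int)
    times = times[::-1]
    return list(zip(times[:-1].tolist(), times[1:].tolist()))

pairs = ddim_time_pairs()
print(len(pairs), pairs[0], pairs[-1])  # 250 (999, 995) (3, -1)
```

With 250 pairs instead of 1000 reverse steps, the number of network evaluations drops by a factor of four, which is where the quoted ~4× figure comes from.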

Weights & Biases

Weights & Biases (WandB) integration is optional and off by default. Both training and sampling can log to WandB with the --wandb flag.

Setup

Set your WandB API key before running:

export WANDB_API_KEY=<your_key>

To persist the key across sessions, add the line to your ~/.bashrc or ~/.zshrc:

echo 'export WANDB_API_KEY=<your_key>' >> ~/.bashrc

Logging

When enabled with the --wandb flag:

Training:

Logs train/loss per step
Logs CIB and tSZ sample image grids at each checkpoint milestone
Project name: cmb_foregrounds_diffusion

Sampling:

Logs sample image grids (visualisation of generated CIB and tSZ patches)
Saves the output .npy file as a WandB artifact for lineage tracking

Example with WandB

export WANDB_API_KEY=<your_key>
accelerate launch foregrounds_diffusion/train.py --run-name my_run_v1 --wandb

SLURM and HPC Clusters

For users with access to HPC clusters running SLURM, two shell scripts are provided to streamline job submission.

Training on a Single GPU

Edit train_slurm.sh to configure your run, then submit:

# Edit the variables at the top of the file
vim train_slurm.sh

# Submit the job
sbatch train_slurm.sh

Configuration Variables in train_slurm.sh:

Variable	Default	Purpose
`RUN_NAME`	`run_v1`	Run label; checkpoints saved to `results/<RUN_NAME>/`
`USE_WANDB`	`false`	Set to `true` to enable Weights & Biases logging

The script allocates:

1 GPU (Ampere, A100)
8 CPU cores
128 GB RAM
1–12 hour wall time

Sampling on Four GPUs

Edit sample_slurm.sh to configure your sampling run, then submit:

# Edit the variables at the top of the file
vim sample_slurm.sh

# Submit the job
sbatch sample_slurm.sh

Configuration Variables in sample_slurm.sh:

Variable	Default	Purpose
`CHECKPOINT`	`results/run_v1/model-20.pt`	Path to trained checkpoint
`OUTPUT`	`data/low_pass/2mJy/samples.npy`	Output `.npy` file path
`BATCHES`	`10`	Number of sampling batches
`BATCH_SIZE`	`16`	Samples per batch per GPU; total samples = `BATCHES × BATCH_SIZE × 4`
`SAMPLING_TIMESTEPS`	(empty)	Leave empty for full DDPM (1000 steps); set to an integer (e.g., `250`) for DDIM
`USE_WANDB`	`false`	Set to `true` to enable Weights & Biases logging

The script allocates:

4 GPUs (Ampere, A100)
8 CPU cores per GPU
128 GB RAM
2 hour wall time

Multi-GPU DDIM Sampling Example

To sample 640 CIB–tSZ patches with DDIM in 250 steps on the cluster:

# Edit sample_slurm.sh:
# BATCHES=10
# BATCH_SIZE=16
# SAMPLING_TIMESTEPS=250

sbatch sample_slurm.sh
# Total samples generated: 10 × 16 × 4 GPUs = 640 patches
# Expected wall time: ~30 minutes for 250-step DDIM sampling
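
When planning a larger run, it can help to estimate the sample count and output size before submitting. The sketch below reproduces the arithmetic above and adds a rough size for the output array. The float32 dtype is an assumption (the text does not state how samples are stored), and `sampling_budget` is a hypothetical helper, not part of the package.

```python
import numpy as np

def sampling_budget(batches, batch_size, n_gpus, channels=2, n_pix=256, dtype=np.float32):
    """Total number of samples and the in-memory size of the stacked output array."""
    n_samples = batches * batch_size * n_gpus
    n_bytes = n_samples * channels * n_pix * n_pix * np.dtype(dtype).itemsize
    return n_samples, n_bytes

n_samples, n_bytes = sampling_budget(batches=10, batch_size=16, n_gpus=4)
print(n_samples, "samples,", n_bytes / 2**20, "MiB")
```

Under the float32 assumption, the 640-sample run above corresponds to 320 MiB of map data.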

Development

Running Tests

Install development dependencies and run the test suite:

pip install -e ".[dev]"
pytest tests/ -v

Pre-commit Hooks

Install pre-commit hooks to lint and format code before each commit:

pre-commit install

The hooks run ruff for linting and formatting, plus checks for trailing whitespace, YAML/TOML validity, and merge conflicts.

Building Documentation Locally

Install documentation dependencies and build the Sphinx docs:

pip install -e ".[docs]"
sphinx-build docs/ docs/_build/html

The built HTML documentation will be in docs/_build/html/index.html. Alternatively, use:

make -C docs html

Documentation is automatically deployed to https://cmb-foregrounds-diffusion.readthedocs.io/ on each push to the main branch.

Citation

If you use this code in your research, please cite:

@thesis{BlakeMartin2026,
  author    = {Alex Blake Martin},
  title     = {Learning Correlated Astrophysical Foregrounds with Denoising Diffusion Probabilistic Models},
  year      = {2026},
  school    = {University of Cambridge},
  type      = {MPhil thesis},
}

License

This project is licensed under the MIT License. See the LICENSE file for details.

Release history

This version

0.1.1

Jul 1, 2026

0.1.0

Jul 1, 2026

Download files

Source Distribution

foregrounds_diffusion-0.1.1.tar.gz (55.8 kB)

Uploaded Jul 1, 2026 Source

Built Distribution

foregrounds_diffusion-0.1.1-py3-none-any.whl (41.3 kB)

Uploaded Jul 1, 2026 Python 3

File details

Details for the file foregrounds_diffusion-0.1.1.tar.gz.

File metadata

Download URL: foregrounds_diffusion-0.1.1.tar.gz
Upload date: Jul 1, 2026
Size: 55.8 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for foregrounds_diffusion-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`3d4bccbe267f32dff6c05ee099ffed5727490ea5097ad40dea14a3b830f9fc09`
MD5	`8c261519c5d9f9abbf82eda0c6c965f2`
BLAKE2b-256	`ea44332fc34ffdc3aa409c5c1f7ffffacac369047b8dc1dbcb0cc3a147edb79a`

Provenance

The following attestation bundles were made for foregrounds_diffusion-0.1.1.tar.gz:

Publisher: publish.yml on AlexBM173/cmb_foregrounds_diffusion

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: foregrounds_diffusion-0.1.1.tar.gz
- Subject digest: 3d4bccbe267f32dff6c05ee099ffed5727490ea5097ad40dea14a3b830f9fc09
- Sigstore transparency entry: 2040959797
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: AlexBM173/cmb_foregrounds_diffusion@480c6d885f33783a2c95ba7b220c5f84979577f9
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/AlexBM173
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@480c6d885f33783a2c95ba7b220c5f84979577f9
- Trigger Event: push

File details

Details for the file foregrounds_diffusion-0.1.1-py3-none-any.whl.

File metadata

Download URL: foregrounds_diffusion-0.1.1-py3-none-any.whl
Upload date: Jul 1, 2026
Size: 41.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.8

File hashes

Hashes for foregrounds_diffusion-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`520fa057a78485e22df96bb356f7fa88d4b60b63185f2b527fc1403ac2db5bca`
MD5	`e696d954faf8c325de736327f54d0896`
BLAKE2b-256	`9e8a31b20b7b3cf686ac0caa97bd34b90b40749a8eb2b69037accc7977b2d180`

Provenance

The following attestation bundles were made for foregrounds_diffusion-0.1.1-py3-none-any.whl:

Publisher: publish.yml on AlexBM173/cmb_foregrounds_diffusion

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: foregrounds_diffusion-0.1.1-py3-none-any.whl
- Subject digest: 520fa057a78485e22df96bb356f7fa88d4b60b63185f2b527fc1403ac2db5bca
- Sigstore transparency entry: 2040960034
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: AlexBM173/cmb_foregrounds_diffusion@480c6d885f33783a2c95ba7b220c5f84979577f9
- Branch / Tag: refs/tags/v0.1.1
- Owner: https://github.com/AlexBM173
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@480c6d885f33783a2c95ba7b220c5f84979577f9
- Trigger Event: push
