Skip to main content

Joint inference of multiplicative and additive systematics in galaxy clustering

Project description

sys_mapping

Joint inference of multiplicative and additive systematics in galaxy clustering. Implements six decontamination methods with a shared contamination model, template normalisation, noise debiasing, and two-point function correction infrastructure.

Core method: Berlfein et al. 2024 (MNRAS 531, 4954).


Six decontamination methods

Method Model Parameter estimation
OLS Additive Ordinary least-squares pixel regression
ElasticNet Additive ℓ₁+ℓ₂-regularised regression, cross-validated
ISD-1 Additive–multiplicative Iterative reweighted OLS, poly order 1
ISD-3 Additive–multiplicative Iterative reweighted OLS, poly order 3
MCMC-add Additive Bayesian posterior sampling (emcee), b=0
MCMC-comb Combined Bayesian posterior sampling (emcee), free a, b

All methods produce per-galaxy weights (WEIGHT_SYS / WEIGHT_COMB) for use downstream in two-point function estimators.


Pipeline overview

Raw catalogs / randoms
        │
        ▼
scripts/build_systematic_maps.py   ← build HEALPix template maps (GAIA DR2, LS10)
        │
        ▼
    Template FITS files  (LS10_EBV_NSIDE_0064.fits, GAIA_nstar_faint_NSIDE_00064.fits, …)
        │
        ├──► scripts/run_ls10_analysis.py              ← LS10 BGS per-method weights + w(θ)
        │         (or scripts/run_all_methods_sequential.sh for phased multi-method run)
        ├──► scripts/run_mock_analysis.py              ← parameter recovery on mocks
        ├──► scripts/run_validation.py                 ← all methods × all scenarios
        ├──► scripts/run_systematic_tests.py           ← all methods × template configs
        └──► scripts/run_paper_validation.py           ← reproduce Berlfein 2024 Figs 2–8
                │
                ▼
        scripts/compute_sys_weights.py    ← batch per-galaxy weights → WEIGHT_SYS, WEIGHT_ADD, WEIGHT_COMB
                │
                ▼
        data/sys_weights/  ← consumed by sum_stat package

Method

Contamination model (Eq. 11–13)

The observed galaxy overdensity in pixel p is modelled as the linear combination of three nested models:

Model Equation Free params
Additive δ̂_g,p = δ_g,p + Σ_i a_i δ_{ti,p} a
Multiplicative δ̂_g,p = δ_g,p (1 + Σ_i a_i δ_{ti,p}) a (b=a)
Combined δ̂_g,p = δ_g,p (1 + Σ_i b_i δ_{ti,p}) + Σ_i a_i δ_{ti,p} a, b

Likelihoods (Eq. 17–18)

Integrating out the true δ_g (assumed Gaussian with dispersion σ) gives:

ln L = −(N/2) ln(2πσ²) − Σ_p ln|1 + Σ_i b_i δ_{ti,p}|
       − (1/2σ²) Σ_p [(δ̂_g,p − Σ_i a_i δ_{ti,p}) / (1 + Σ_i b_i δ_{ti,p})]²

An optional skew-normal extension (Eq. 18) adds N ln 2 + Σ log Φ(γ δ_g,p / σ).

Template rotation (Appendix A)

When templates are correlated, the PCA-rotated basis δ'_t = Vᵀ δ_t (where V diagonalises the template covariance C = VDVᵀ) makes each contamination parameter independently identifiable. Parameters are transformed back to the original basis after inference.

Noise debiasing (Eq. 21)

Because E[â²] = a² + Var[â], the squared MLE is noise-inflated. The debiased estimate is:

ã²_i = max(â²_i − Var[â_i], 0)

Two-point correction (Eq. 15–16)

After debiasing, the observed angular correlation function is corrected:

ŵ_corr(θ) = [ŵ(θ) − Σ_i ã²_i C_{ti}(θ)] / [1 + Σ_i b̃²_i C_{ti}(θ)]

Model selection (Eq. 19)

Nested models are compared with the likelihood ratio test:

λ_LR = 2 [ln L(Θ̂) − ln L(Θ̂₀)] ~ χ²(r)

where r is the number of additional free parameters.


Scripts

scripts/run_all_methods_sequential.sh

Main orchestration script: runs all six methods on LS10 BGS in phases (fast methods first), writing incremental Sphinx documentation after each phase. Environment variables control behaviour:

# Full run (nohup so the terminal can be closed)
nohup bash scripts/run_all_methods_sequential.sh > logs/run_all.log 2>&1 &

# OLS-only quick preview (seconds)
METHODS="OLS" bash scripts/run_all_methods_sequential.sh

# Resume from MCMC-add (OLS and ElasticNet already done)
METHODS="MCMC-add MCMC-comb" bash scripts/run_all_methods_sequential.sh

# Force re-run all bins
FORCE=1 METHODS="OLS" bash scripts/run_all_methods_sequential.sh

Key environment variables: BINS, DEVICE (cpu/gpu), METHODS, FORCE, SKIP_LS10, CATALOG_DIR, LS10_NSIDE.

scripts/build_systematic_maps.py

Build HEALPix systematic template maps from GAIA DR2 or LS10 BGS randoms.

# LS10 maps (EBV, GALDEPTH, PSFSIZE, NOBS)
python scripts/build_systematic_maps.py --source ls10 --nside 32 64 128 256

# GAIA DR2 maps (G/BP/RP flux, star counts by magnitude)
python scripts/build_systematic_maps.py --source gaia --nside 64

scripts/compute_sys_weights.py

Batch pipeline: processes every *_DATA.fits / *_RAND.fits pair, runs all methods, writes per-galaxy systematic weights for the sum_stat package.

# All samples, default settings (NSIDE=64)
python scripts/compute_sys_weights.py

# GPU — vectorised walker evaluation
python scripts/compute_sys_weights.py --device gpu

# Single sample
python scripts/compute_sys_weights.py \
    --sample LS10_VLIM_ANY_10.5_Mstar_12.0_0.05_z_0.26_N_3263228

Weighting scheme

Column Model Formula
WEIGHT_ADD Additive 1 / max(1 + Σ_i a_i · t_i(p), 0.01)
WEIGHT_COMB Combined 1 / max(1 + Σ_i b_i · t_i(p), 0.01)
WEIGHT_SYS Combined (recommended) identical to WEIGHT_COMB

scripts/run_ls10_analysis.py

Full pipeline on LS10 BGS catalogs: overdensity → all methods → correction → w(θ) plots.

python scripts/run_ls10_analysis.py \
    --catalog-dir /path/to/BGS_VLIM_Mstar \
    --template-dir /path/to/systematics/ \
    --nside 64 --output-dir results/ls10/

scripts/run_validation.py

Multi-method, multi-scenario validation on synthetic mocks (none / additive / multiplicative / combined contamination). Applies all six methods and compares against known ground truth.

python scripts/run_validation.py --nside 64 --n-mocks 50 \
    --output-dir results/validation/

scripts/run_systematic_tests.py

Systematic test matrix: all methods × all template configurations (7 synthetic templates, tiers of single and combined contamination types).

python scripts/run_systematic_tests.py --nside 64

scripts/run_mock_analysis.py

Parameter recovery on mock galaxy catalogs (LRT statistics).

# Synthetic mocks (no data files required)
python scripts/run_mock_analysis.py --synthetic --n-mocks 20 --nside 32

# Real mocks
python scripts/run_mock_analysis.py \
    --mock-dir /path/to/mocks --rand-file /path/to/randoms.fits \
    --n-mocks 100 --output-dir results/mock_analysis/

scripts/run_mock_analysis_diagnostic.py

Deep-dive diagnostic figures for one synthetic mock (sky maps, template maps, weight histograms, per-template S/N).

python scripts/run_mock_analysis_diagnostic.py \
    --nside 64 --n-sys 5 --seed 0 \
    --output-dir results/mock_analysis_diagnostic/

scripts/run_mock_analysis_progressive.py

Progressive template contamination study: varies the number of contaminated templates (k=1,2,3) and contamination mode across mocks.

scripts/run_mock_analysis_real_templates.py

Validation with real GAIA stellar and LS10 depth templates on the actual LS10 south footprint (~22 000 pixels at NSIDE=64).

scripts/run_paper_validation.py

Reproduce Berlfein 2024 Figures 2–8 and Tables 2–3 on lognormal synthetic mocks.

# Quick test (NSIDE=32, 2 realisations)
python scripts/run_paper_validation.py --nside 32 --n-real 2 --output-dir /tmp/val

# Full paper settings (use HPC)
python scripts/run_paper_validation.py --nside 512 --n-real 119 \
    --n-walkers 250 --n-steps 1500 --n-burn 300

Documentation-generation scripts

Script Output
scripts/generate_results_ls10_summary.py docs/results_ls10.rst from *_params.json files
scripts/generate_sample_pages.py Per-sample RST pages in docs/
scripts/generate_recommendations.py docs/results_ls10_recommendations.rst
scripts/plot_ls10_wtheta_corrected.py docs/_static/results_ls10/wtheta_corrected_nside64.png
scripts/plot_runtime_scaling.py docs/_static/runtime_scaling.png
scripts/patch_params_from_partials.py Patches *_params.json from per-method partial JSON files

scripts/benchmark_corrfunc_vs_treecorr.py

Performance comparison between Corrfunc and TreeCorr for the angular two-point function estimator.


Server execution (bash/ — background, all cores)

bash/ provides scripts that launch each pipeline in the background with nohup, writing timestamped logs to logs/ and a .pid file for process tracking. Each script sets OMP_NUM_THREADS=$(nproc) and JAX XLA flags.

# LS10 analysis
bash/ls10_analysis.sh
CATALOG_DIR=/path/to/BGS bash/ls10_analysis.sh --sample LS10_VLIM_ANY_10.5_...

# Paper validation
bash/paper_validation.sh
bash/paper_validation.sh --nside 512 --n-real 119   # full paper settings

# Mock analysis
bash/mock_analysis.sh --synthetic --n-mocks 20
bash/mock_analysis.sh --mock-dir /path/to/mocks --rand-file /path/to/randoms.fits

# Build template maps
bash/build_maps.sh --source ls10
bash/build_maps.sh --source gaia --nside 64 128

Monitoring:

tail -f logs/ls10_analysis_20260505_143200.log
kill -0 $(cat logs/ls10_analysis_20260505_143200.pid) && echo running || echo done
kill $(cat logs/ls10_analysis_20260505_143200.pid)

GPU notes

--device gpu lets JAX auto-detect the GPU backend. To enable GPU:

pip install 'jax[cuda12_pip]' \
    -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Installation

cd ~/sys_mapping
mamba env create -f environment.yml
mamba activate sys_map
pip install -e .

Tests

pytest tests/ -v

Test modules: test_contamination, test_correction, test_likelihood, test_maps, test_inference, test_model_selection, test_bootstrap, test_power_spectrum, test_regression, test_diagnostics, test_mocks, test_real_templates, test_accuracy, test_benchmarks, test_timing.


Documentation

Full Sphinx documentation is in docs/, covering: overview, installation, quickstart, methods with paper references, validation results, LS10 BGS analysis results (9 samples × 4 NSIDEs), and full API reference.

cd docs && make html
# open docs/_build/html/index.html

Package layout

Module Public symbols
contamination apply_contamination, invert_contamination, compute_two_point_correction, pack_params, unpack_params, n_free_params
likelihood make_log_likelihood — factory returning a @jax.jit log-likelihood (Gaussian or skew-normal)
maps systematic_power_spectrum, generate_systematic_map, generate_systematic_maps, load_real_template, load_real_templates, pixelize_catalog, compute_overdensity, assign_template_values
inference make_log_prob, run_mcmc, get_mle_params, get_param_variance_from_chain, get_param_covariance_from_chain
correction debias_params (Eq. 21), rotate_templates (App. A), transform_params_from_rotated, correct_two_point_function, correct_power_spectrum_harmonic
model_selection likelihood_ratio_testLikelihoodRatioResult (Eq. 19)
bootstrap block_bootstrap_variance — spatial block bootstrap via HEALPix coarsening (Sec. 6.2); jackknife_covariance
regression elasticnet_contamination_fit, iterative_systematics_decontamination, method_comparison, run_decontamination
diagnostics null_test_cross_correlations, snr_template_ranking, footprint_mask_diagnostics
mocks generate_lognormal_field, make_galactic_mask, make_mock_catalog, make_mock_suite, MockCatalog
power_spectrum measure_pseudo_cl, subtract_template_cl, harmonic_bias, mode_projection_bias
utils compute_covariance_matrix (Eq. 24), compute_amplitude_bias (Eq. 25-26), measure_two_point_function (Landy-Szalay via TreeCorr), measure_two_point_function_corrfunc, measure_kk_correlation_treecorr, measure_kk_correlation_corrfunc
plotting METHOD_COLORS, METHOD_LINESTYLES, METHOD_MARKERS, METHOD_LABELS, METHOD_ORDER — shared colorblind-safe palette constants

All symbols above are importable directly from sys_mapping. LikelihoodRatioResult must be imported as from sys_mapping.model_selection import LikelihoodRatioResult.


Berlfein 2024 equation cross-reference

Equation Description Implementation
Eq. 11 Additive model contamination.py:apply_contamination (model='additive')
Eq. 12 Multiplicative model contamination.py:apply_contamination (model='multiplicative')
Eq. 13 Combined model contamination.py:apply_contamination (model='combined')
Eq. 15-16 2PCF correction contamination.py:compute_two_point_correction; correction.py:correct_two_point_function
Eq. 17 Gaussian log-likelihood + Jacobian likelihood.py:make_log_likelihood (skew=False)
Eq. 18 Skew-normal log-likelihood likelihood.py:make_log_likelihood (skew=True)
Eq. 19 Likelihood ratio test model_selection.py:likelihood_ratio_test
Eq. 21 Noise debiasing correction.py:debias_params
Eq. 24 Template covariance matrix utils.py:compute_covariance_matrix
Eq. 25-26 Amplitude bias ΔĀ utils.py:compute_amplitude_bias
App. A PCA template rotation C = VDVᵀ correction.py:rotate_templates + transform_params_from_rotated

Data files

Generated outputs (data/sys_weights/ — git-ignored)

Produced by scripts/compute_sys_weights.py or scripts/run_ls10_analysis.py:

File Contents
{sample_id}_NSIDE{NNNN}_WEIGHTS.fits Per-galaxy systematic weights (WEIGHT_SYS, WEIGHT_ADD, WEIGHT_COMB)
{sample_id}_NSIDE{NNNN}_params.json MAP parameters and chain diagnostics for all methods
{sample_id}_NSIDE{NNNN}_partial_*.json Partial results from fast-method phases

Paper tables and simulation configs (data/ — tracked)

File Contents
data/paper_tables/table1_mock_properties.csv KiDS-HOD mock survey properties
data/paper_tables/table2_lrt_results.csv LRT model-selection fractions
data/paper_tables/table3_cosmological_impact.csv Amplitude bias ΔĀ per model
data/simulation_config/contamination_params_25sys.csv 25 contamination parameters used in the paper
data/simulation_config/angular_bins.csv Log-spaced angular bins (0.5–60 arcmin)

Equations vs. implementation — notes

  • Combined model: the code implements the exact paper form δ̂_g = δ_g(1+b·t) + a·t (Eq. 13). The full product (δ_g + a·t)(1+b·t) differs by the second-order term a·t·b·t, which is negligible for |a|, |b| ≪ 1.
  • Jacobian sign: the Gaussian likelihood subtracts Σ ln|1+b·t| (positive Jacobian of the forward map), consistent with Eq. 17.
  • Skew-normal: the CDF term is Σ log Φ(γ·r/σ) = Σ log_ndtr(z), which includes the N·ln 2 from the 2/σ prefactor. A previous version used Σ log_ndtr(√2·z) (off by √2 in the erf argument); this is corrected in commit 10c01d0.
  • Variance propagation through PCA: Var[a_orig] = Vᵀ · diag(Var[a_rot]) · V is correct when the rotated parameters are approximately uncorrelated.

Reference

Berlfein et al. 2024, Joint inference of multiplicative and additive systematics in galaxy clustering, MNRAS 531, 4954, arXiv:2401.12293.

Project details


Release history Release notifications | RSS feed

This version

0.9

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sys_mapping-0.9.tar.gz (80.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

sys_mapping-0.9-py3-none-any.whl (56.7 kB view details)

Uploaded Python 3

File details

Details for the file sys_mapping-0.9.tar.gz.

File metadata

  • Download URL: sys_mapping-0.9.tar.gz
  • Upload date:
  • Size: 80.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for sys_mapping-0.9.tar.gz
Algorithm Hash digest
SHA256 cf118ba96b7da3ef892d1f46aead7c3d449932c3eb098edafa0b462245522dda
MD5 a357641c54e0f614e0bdb20e65d5ff86
BLAKE2b-256 23f9453f288c0d982ca8a618f0ec66836c285caa7dd1e301a8c8843d24415fe0

See more details on using hashes here.

File details

Details for the file sys_mapping-0.9-py3-none-any.whl.

File metadata

  • Download URL: sys_mapping-0.9-py3-none-any.whl
  • Upload date:
  • Size: 56.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for sys_mapping-0.9-py3-none-any.whl
Algorithm Hash digest
SHA256 a678af85abaf4224cc19416869a175506f8da4fde1946fef190284d2ebe213b2
MD5 fe2bd20da207b01bcf92eb9008e95dc4
BLAKE2b-256 c5cb6390e73854ce271fb8f6f845d5e45320a64f69b1903099bbf738627a8b48

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page