Joint inference of multiplicative and additive systematics in galaxy clustering
Project description
sys_mapping
Joint inference of multiplicative and additive systematics in galaxy clustering. Implements six decontamination methods with a shared contamination model, template normalisation, noise debiasing, and two-point function correction infrastructure.
Core method: Berlfein et al. 2024 (MNRAS 531, 4954).
Six decontamination methods
| Method | Model | Parameter estimation |
|---|---|---|
| OLS | Additive | Ordinary least-squares pixel regression |
| ElasticNet | Additive | ℓ₁+ℓ₂-regularised regression, cross-validated |
| ISD-1 | Additive–multiplicative | Iterative reweighted OLS, poly order 1 |
| ISD-3 | Additive–multiplicative | Iterative reweighted OLS, poly order 3 |
| MCMC-add | Additive | Bayesian posterior sampling (emcee), b=0 |
| MCMC-comb | Combined | Bayesian posterior sampling (emcee), free a, b |
All methods produce per-galaxy weights (WEIGHT_SYS / WEIGHT_COMB) for use
downstream in two-point function estimators.
Pipeline overview
Raw catalogs / randoms
│
▼
scripts/build_systematic_maps.py ← build HEALPix template maps (GAIA DR2, LS10)
│
▼
Template FITS files (LS10_EBV_NSIDE_0064.fits, GAIA_nstar_faint_NSIDE_00064.fits, …)
│
├──► scripts/run_ls10_analysis.py ← LS10 BGS per-method weights + w(θ)
│ (or scripts/run_all_methods_sequential.sh for phased multi-method run)
├──► scripts/run_mock_analysis.py ← parameter recovery on mocks
├──► scripts/run_validation.py ← all methods × all scenarios
├──► scripts/run_systematic_tests.py ← all methods × template configs
└──► scripts/run_paper_validation.py ← reproduce Berlfein 2024 Figs 2–8
│
▼
scripts/compute_sys_weights.py ← batch per-galaxy weights → WEIGHT_SYS, WEIGHT_ADD, WEIGHT_COMB
│
▼
data/sys_weights/ ← consumed by sum_stat package
Method
Contamination model (Eq. 11–13)
The observed galaxy overdensity in pixel p is modelled as the linear combination of three nested models:
| Model | Equation | Free params |
|---|---|---|
| Additive | δ̂_g,p = δ_g,p + Σ_i a_i δ_{ti,p} |
a |
| Multiplicative | δ̂_g,p = δ_g,p (1 + Σ_i a_i δ_{ti,p}) |
a (b=a) |
| Combined | δ̂_g,p = δ_g,p (1 + Σ_i b_i δ_{ti,p}) + Σ_i a_i δ_{ti,p} |
a, b |
Likelihoods (Eq. 17–18)
Integrating out the true δ_g (assumed Gaussian with dispersion σ) gives:
ln L = −(N/2) ln(2πσ²) − Σ_p ln|1 + Σ_i b_i δ_{ti,p}|
− (1/2σ²) Σ_p [(δ̂_g,p − Σ_i a_i δ_{ti,p}) / (1 + Σ_i b_i δ_{ti,p})]²
An optional skew-normal extension (Eq. 18) adds N ln 2 + Σ log Φ(γ δ_g,p / σ).
Template rotation (Appendix A)
When templates are correlated, the PCA-rotated basis δ'_t = Vᵀ δ_t (where V diagonalises the template covariance C = VDVᵀ) makes each contamination parameter independently identifiable. Parameters are transformed back to the original basis after inference.
Noise debiasing (Eq. 21)
Because E[â²] = a² + Var[â], the squared MLE is noise-inflated. The debiased estimate is:
ã²_i = max(â²_i − Var[â_i], 0)
Two-point correction (Eq. 15–16)
After debiasing, the observed angular correlation function is corrected:
ŵ_corr(θ) = [ŵ(θ) − Σ_i ã²_i C_{ti}(θ)] / [1 + Σ_i b̃²_i C_{ti}(θ)]
Model selection (Eq. 19)
Nested models are compared with the likelihood ratio test:
λ_LR = 2 [ln L(Θ̂) − ln L(Θ̂₀)] ~ χ²(r)
where r is the number of additional free parameters.
Scripts
scripts/run_all_methods_sequential.sh
Main orchestration script: runs all six methods on LS10 BGS in phases (fast methods first), writing incremental Sphinx documentation after each phase. Environment variables control behaviour:
# Full run (nohup so the terminal can be closed)
nohup bash scripts/run_all_methods_sequential.sh > logs/run_all.log 2>&1 &
# OLS-only quick preview (seconds)
METHODS="OLS" bash scripts/run_all_methods_sequential.sh
# Resume from MCMC-add (OLS and ElasticNet already done)
METHODS="MCMC-add MCMC-comb" bash scripts/run_all_methods_sequential.sh
# Force re-run all bins
FORCE=1 METHODS="OLS" bash scripts/run_all_methods_sequential.sh
Key environment variables: BINS, DEVICE (cpu/gpu), METHODS,
FORCE, SKIP_LS10, CATALOG_DIR, LS10_NSIDE.
scripts/build_systematic_maps.py
Build HEALPix systematic template maps from GAIA DR2 or LS10 BGS randoms.
# LS10 maps (EBV, GALDEPTH, PSFSIZE, NOBS)
python scripts/build_systematic_maps.py --source ls10 --nside 32 64 128 256
# GAIA DR2 maps (G/BP/RP flux, star counts by magnitude)
python scripts/build_systematic_maps.py --source gaia --nside 64
scripts/compute_sys_weights.py
Batch pipeline: processes every *_DATA.fits / *_RAND.fits pair, runs
all methods, writes per-galaxy systematic weights for the sum_stat package.
# All samples, default settings (NSIDE=64)
python scripts/compute_sys_weights.py
# GPU — vectorised walker evaluation
python scripts/compute_sys_weights.py --device gpu
# Single sample
python scripts/compute_sys_weights.py \
--sample LS10_VLIM_ANY_10.5_Mstar_12.0_0.05_z_0.26_N_3263228
Weighting scheme
| Column | Model | Formula |
|---|---|---|
WEIGHT_ADD |
Additive | 1 / max(1 + Σ_i a_i · t_i(p), 0.01) |
WEIGHT_COMB |
Combined | 1 / max(1 + Σ_i b_i · t_i(p), 0.01) |
WEIGHT_SYS |
Combined (recommended) | identical to WEIGHT_COMB |
scripts/run_ls10_analysis.py
Full pipeline on LS10 BGS catalogs: overdensity → all methods → correction → w(θ) plots.
python scripts/run_ls10_analysis.py \
--catalog-dir /path/to/BGS_VLIM_Mstar \
--template-dir /path/to/systematics/ \
--nside 64 --output-dir results/ls10/
scripts/run_validation.py
Multi-method, multi-scenario validation on synthetic mocks (none / additive / multiplicative / combined contamination). Applies all six methods and compares against known ground truth.
python scripts/run_validation.py --nside 64 --n-mocks 50 \
--output-dir results/validation/
scripts/run_systematic_tests.py
Systematic test matrix: all methods × all template configurations (7 synthetic templates, tiers of single and combined contamination types).
python scripts/run_systematic_tests.py --nside 64
scripts/run_mock_analysis.py
Parameter recovery on mock galaxy catalogs (LRT statistics).
# Synthetic mocks (no data files required)
python scripts/run_mock_analysis.py --synthetic --n-mocks 20 --nside 32
# Real mocks
python scripts/run_mock_analysis.py \
--mock-dir /path/to/mocks --rand-file /path/to/randoms.fits \
--n-mocks 100 --output-dir results/mock_analysis/
scripts/run_mock_analysis_diagnostic.py
Deep-dive diagnostic figures for one synthetic mock (sky maps, template maps, weight histograms, per-template S/N).
python scripts/run_mock_analysis_diagnostic.py \
--nside 64 --n-sys 5 --seed 0 \
--output-dir results/mock_analysis_diagnostic/
scripts/run_mock_analysis_progressive.py
Progressive template contamination study: varies the number of contaminated templates (k=1,2,3) and contamination mode across mocks.
scripts/run_mock_analysis_real_templates.py
Validation with real GAIA stellar and LS10 depth templates on the actual LS10 south footprint (~22 000 pixels at NSIDE=64).
scripts/run_paper_validation.py
Reproduce Berlfein 2024 Figures 2–8 and Tables 2–3 on lognormal synthetic mocks.
# Quick test (NSIDE=32, 2 realisations)
python scripts/run_paper_validation.py --nside 32 --n-real 2 --output-dir /tmp/val
# Full paper settings (use HPC)
python scripts/run_paper_validation.py --nside 512 --n-real 119 \
--n-walkers 250 --n-steps 1500 --n-burn 300
Documentation-generation scripts
| Script | Output |
|---|---|
scripts/generate_results_ls10_summary.py |
docs/results_ls10.rst from *_params.json files |
scripts/generate_sample_pages.py |
Per-sample RST pages in docs/ |
scripts/generate_recommendations.py |
docs/results_ls10_recommendations.rst |
scripts/plot_ls10_wtheta_corrected.py |
docs/_static/results_ls10/wtheta_corrected_nside64.png |
scripts/plot_runtime_scaling.py |
docs/_static/runtime_scaling.png |
scripts/patch_params_from_partials.py |
Patches *_params.json from per-method partial JSON files |
scripts/benchmark_corrfunc_vs_treecorr.py
Performance comparison between Corrfunc and TreeCorr for the angular two-point function estimator.
Server execution (bash/ — background, all cores)
bash/ provides scripts that launch each pipeline in the background with
nohup, writing timestamped logs to logs/ and a .pid file for process
tracking. Each script sets OMP_NUM_THREADS=$(nproc) and JAX XLA flags.
# LS10 analysis
bash/ls10_analysis.sh
CATALOG_DIR=/path/to/BGS bash/ls10_analysis.sh --sample LS10_VLIM_ANY_10.5_...
# Paper validation
bash/paper_validation.sh
bash/paper_validation.sh --nside 512 --n-real 119 # full paper settings
# Mock analysis
bash/mock_analysis.sh --synthetic --n-mocks 20
bash/mock_analysis.sh --mock-dir /path/to/mocks --rand-file /path/to/randoms.fits
# Build template maps
bash/build_maps.sh --source ls10
bash/build_maps.sh --source gaia --nside 64 128
Monitoring:
tail -f logs/ls10_analysis_20260505_143200.log
kill -0 $(cat logs/ls10_analysis_20260505_143200.pid) && echo running || echo done
kill $(cat logs/ls10_analysis_20260505_143200.pid)
GPU notes
--device gpu lets JAX auto-detect the GPU backend. To enable GPU:
pip install 'jax[cuda12_pip]' \
-f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html
Installation
cd ~/sys_mapping
mamba env create -f environment.yml
mamba activate sys_map
pip install -e .
Tests
pytest tests/ -v
Test modules: test_contamination, test_correction, test_likelihood,
test_maps, test_inference, test_model_selection, test_bootstrap,
test_power_spectrum, test_regression, test_diagnostics, test_mocks,
test_real_templates, test_accuracy, test_benchmarks, test_timing.
Documentation
Full Sphinx documentation is in docs/, covering: overview, installation,
quickstart, methods with paper references, validation results, LS10 BGS
analysis results (9 samples × 4 NSIDEs), and full API reference.
cd docs && make html
# open docs/_build/html/index.html
Package layout
| Module | Public symbols |
|---|---|
contamination |
apply_contamination, invert_contamination, compute_two_point_correction, pack_params, unpack_params, n_free_params |
likelihood |
make_log_likelihood — factory returning a @jax.jit log-likelihood (Gaussian or skew-normal) |
maps |
systematic_power_spectrum, generate_systematic_map, generate_systematic_maps, load_real_template, load_real_templates, pixelize_catalog, compute_overdensity, assign_template_values |
inference |
make_log_prob, run_mcmc, get_mle_params, get_param_variance_from_chain, get_param_covariance_from_chain |
correction |
debias_params (Eq. 21), rotate_templates (App. A), transform_params_from_rotated, correct_two_point_function, correct_power_spectrum_harmonic |
model_selection |
likelihood_ratio_test → LikelihoodRatioResult (Eq. 19) |
bootstrap |
block_bootstrap_variance — spatial block bootstrap via HEALPix coarsening (Sec. 6.2); jackknife_covariance |
regression |
elasticnet_contamination_fit, iterative_systematics_decontamination, method_comparison, run_decontamination |
diagnostics |
null_test_cross_correlations, snr_template_ranking, footprint_mask_diagnostics |
mocks |
generate_lognormal_field, make_galactic_mask, make_mock_catalog, make_mock_suite, MockCatalog |
power_spectrum |
measure_pseudo_cl, subtract_template_cl, harmonic_bias, mode_projection_bias |
utils |
compute_covariance_matrix (Eq. 24), compute_amplitude_bias (Eq. 25-26), measure_two_point_function (Landy-Szalay via TreeCorr), measure_two_point_function_corrfunc, measure_kk_correlation_treecorr, measure_kk_correlation_corrfunc |
plotting |
METHOD_COLORS, METHOD_LINESTYLES, METHOD_MARKERS, METHOD_LABELS, METHOD_ORDER — shared colorblind-safe palette constants |
All symbols above are importable directly from sys_mapping.
LikelihoodRatioResult must be imported as
from sys_mapping.model_selection import LikelihoodRatioResult.
Berlfein 2024 equation cross-reference
| Equation | Description | Implementation |
|---|---|---|
| Eq. 11 | Additive model | contamination.py:apply_contamination (model='additive') |
| Eq. 12 | Multiplicative model | contamination.py:apply_contamination (model='multiplicative') |
| Eq. 13 | Combined model | contamination.py:apply_contamination (model='combined') |
| Eq. 15-16 | 2PCF correction | contamination.py:compute_two_point_correction; correction.py:correct_two_point_function |
| Eq. 17 | Gaussian log-likelihood + Jacobian | likelihood.py:make_log_likelihood (skew=False) |
| Eq. 18 | Skew-normal log-likelihood | likelihood.py:make_log_likelihood (skew=True) |
| Eq. 19 | Likelihood ratio test | model_selection.py:likelihood_ratio_test |
| Eq. 21 | Noise debiasing | correction.py:debias_params |
| Eq. 24 | Template covariance matrix | utils.py:compute_covariance_matrix |
| Eq. 25-26 | Amplitude bias ΔĀ | utils.py:compute_amplitude_bias |
| App. A | PCA template rotation C = VDVᵀ | correction.py:rotate_templates + transform_params_from_rotated |
Data files
Generated outputs (data/sys_weights/ — git-ignored)
Produced by scripts/compute_sys_weights.py or scripts/run_ls10_analysis.py:
| File | Contents |
|---|---|
{sample_id}_NSIDE{NNNN}_WEIGHTS.fits |
Per-galaxy systematic weights (WEIGHT_SYS, WEIGHT_ADD, WEIGHT_COMB) |
{sample_id}_NSIDE{NNNN}_params.json |
MAP parameters and chain diagnostics for all methods |
{sample_id}_NSIDE{NNNN}_partial_*.json |
Partial results from fast-method phases |
Paper tables and simulation configs (data/ — tracked)
| File | Contents |
|---|---|
data/paper_tables/table1_mock_properties.csv |
KiDS-HOD mock survey properties |
data/paper_tables/table2_lrt_results.csv |
LRT model-selection fractions |
data/paper_tables/table3_cosmological_impact.csv |
Amplitude bias ΔĀ per model |
data/simulation_config/contamination_params_25sys.csv |
25 contamination parameters used in the paper |
data/simulation_config/angular_bins.csv |
Log-spaced angular bins (0.5–60 arcmin) |
Equations vs. implementation — notes
- Combined model: the code implements the exact paper form
δ̂_g = δ_g(1+b·t) + a·t(Eq. 13). The full product(δ_g + a·t)(1+b·t)differs by the second-order term a·t·b·t, which is negligible for |a|, |b| ≪ 1. - Jacobian sign: the Gaussian likelihood subtracts
Σ ln|1+b·t|(positive Jacobian of the forward map), consistent with Eq. 17. - Skew-normal: the CDF term is
Σ log Φ(γ·r/σ) = Σ log_ndtr(z), which includes theN·ln 2from the2/σprefactor. A previous version usedΣ log_ndtr(√2·z)(off by √2 in the erf argument); this is corrected in commit10c01d0. - Variance propagation through PCA:
Var[a_orig] = Vᵀ · diag(Var[a_rot]) · Vis correct when the rotated parameters are approximately uncorrelated.
Reference
Berlfein et al. 2024, Joint inference of multiplicative and additive systematics in galaxy clustering, MNRAS 531, 4954, arXiv:2401.12293.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file sys_mapping-0.9.tar.gz.
File metadata
- Download URL: sys_mapping-0.9.tar.gz
- Upload date:
- Size: 80.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
cf118ba96b7da3ef892d1f46aead7c3d449932c3eb098edafa0b462245522dda
|
|
| MD5 |
a357641c54e0f614e0bdb20e65d5ff86
|
|
| BLAKE2b-256 |
23f9453f288c0d982ca8a618f0ec66836c285caa7dd1e301a8c8843d24415fe0
|
File details
Details for the file sys_mapping-0.9-py3-none-any.whl.
File metadata
- Download URL: sys_mapping-0.9-py3-none-any.whl
- Upload date:
- Size: 56.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
a678af85abaf4224cc19416869a175506f8da4fde1946fef190284d2ebe213b2
|
|
| MD5 |
fe2bd20da207b01bcf92eb9008e95dc4
|
|
| BLAKE2b-256 |
c5cb6390e73854ce271fb8f6f845d5e45320a64f69b1903099bbf738627a8b48
|