Joint inference of multiplicative and additive systematics in galaxy clustering

These details have not been verified by PyPI

Project links

Project description

sys_mapping

Joint inference of multiplicative and additive systematics in galaxy clustering. Implements six decontamination methods with a shared contamination model, template normalisation, noise debiasing, and two-point function correction infrastructure.

Core method: Berlfein et al. 2024 (MNRAS 531, 4954).

Six decontamination methods

Method	Model	Parameter estimation
OLS	Additive	Ordinary least-squares pixel regression
ElasticNet	Additive	ℓ₁+ℓ₂-regularised regression, cross-validated
ISD-1	Additive–multiplicative	Iterative reweighted OLS, poly order 1
ISD-3	Additive–multiplicative	Iterative reweighted OLS, poly order 3
MCMC-add	Additive	Bayesian posterior sampling (emcee), b=0
MCMC-comb	Combined	Bayesian posterior sampling (emcee), free a, b

All methods produce per-galaxy weights (WEIGHT_SYS / WEIGHT_COMB) for use downstream in two-point function estimators.

Pipeline overview

Raw catalogs / randoms
        │
        ▼
scripts/build_systematic_maps.py   ← build HEALPix template maps (GAIA DR2, LS10)
        │
        ▼
    Template FITS files  (LS10_EBV_NSIDE_0064.fits, GAIA_nstar_faint_NSIDE_00064.fits, …)
        │
        ├──► scripts/run_ls10_analysis.py              ← LS10 BGS per-method weights + w(θ)
        │         (or scripts/run_all_methods_sequential.sh for phased multi-method run)
        ├──► scripts/run_mock_analysis.py              ← parameter recovery on mocks
        ├──► scripts/run_validation.py                 ← all methods × all scenarios
        ├──► scripts/run_systematic_tests.py           ← all methods × template configs
        └──► scripts/run_paper_validation.py           ← reproduce Berlfein 2024 Figs 2–8
                │
                ▼
        scripts/compute_sys_weights.py    ← batch per-galaxy weights → WEIGHT_SYS, WEIGHT_ADD, WEIGHT_COMB
                │
                ▼
        data/sys_weights/  ← consumed by sum_stat package

Method

Contamination model (Eq. 11–13)

The observed galaxy overdensity in pixel p is modelled as the linear combination of three nested models:

Model	Equation	Free params
Additive	`δ̂_g,p = δ_g,p + Σ_i a_i δ_{ti,p}`	a
Multiplicative	`δ̂_g,p = δ_g,p (1 + Σ_i a_i δ_{ti,p})`	a (b=a)
Combined	`δ̂_g,p = δ_g,p (1 + Σ_i b_i δ_{ti,p}) + Σ_i a_i δ_{ti,p}`	a, b

Likelihoods (Eq. 17–18)

Integrating out the true δ_g (assumed Gaussian with dispersion σ) gives:

ln L = −(N/2) ln(2πσ²) − Σ_p ln|1 + Σ_i b_i δ_{ti,p}|
       − (1/2σ²) Σ_p [(δ̂_g,p − Σ_i a_i δ_{ti,p}) / (1 + Σ_i b_i δ_{ti,p})]²

An optional skew-normal extension (Eq. 18) adds N ln 2 + Σ log Φ(γ δ_g,p / σ).

Template rotation (Appendix A)

When templates are correlated, the PCA-rotated basis δ'_t = Vᵀ δ_t (where V diagonalises the template covariance C = VDVᵀ) makes each contamination parameter independently identifiable. Parameters are transformed back to the original basis after inference.

Noise debiasing (Eq. 21)

Because E[â²] = a² + Var[â], the squared MLE is noise-inflated. The debiased estimate is:

ã²_i = max(â²_i − Var[â_i], 0)

Two-point correction (Eq. 15–16)

After debiasing, the observed angular correlation function is corrected:

ŵ_corr(θ) = [ŵ(θ) − Σ_i ã²_i C_{ti}(θ)] / [1 + Σ_i b̃²_i C_{ti}(θ)]

Model selection (Eq. 19)

Nested models are compared with the likelihood ratio test:

λ_LR = 2 [ln L(Θ̂) − ln L(Θ̂₀)] ~ χ²(r)

where r is the number of additional free parameters.

Scripts

`scripts/run_all_methods_sequential.sh`

Main orchestration script: runs all six methods on LS10 BGS in phases (fast methods first), writing incremental Sphinx documentation after each phase. Environment variables control behaviour:

# Full run (nohup so the terminal can be closed)
nohup bash scripts/run_all_methods_sequential.sh > logs/run_all.log 2>&1 &

# OLS-only quick preview (seconds)
METHODS="OLS" bash scripts/run_all_methods_sequential.sh

# Resume from MCMC-add (OLS and ElasticNet already done)
METHODS="MCMC-add MCMC-comb" bash scripts/run_all_methods_sequential.sh

# Force re-run all bins
FORCE=1 METHODS="OLS" bash scripts/run_all_methods_sequential.sh

Key environment variables: BINS, DEVICE (cpu/gpu), METHODS, FORCE, SKIP_LS10, CATALOG_DIR, LS10_NSIDE.

`scripts/build_systematic_maps.py`

Build HEALPix systematic template maps from GAIA DR2 or LS10 BGS randoms.

# LS10 maps (EBV, GALDEPTH, PSFSIZE, NOBS)
python scripts/build_systematic_maps.py --source ls10 --nside 32 64 128 256

# GAIA DR2 maps (G/BP/RP flux, star counts by magnitude)
python scripts/build_systematic_maps.py --source gaia --nside 64

`scripts/compute_sys_weights.py`

Batch pipeline: processes every *_DATA.fits / *_RAND.fits pair, runs all methods, writes per-galaxy systematic weights for the sum_stat package.

# All samples, default settings (NSIDE=64)
python scripts/compute_sys_weights.py

# GPU — vectorised walker evaluation
python scripts/compute_sys_weights.py --device gpu

# Single sample
python scripts/compute_sys_weights.py \
    --sample LS10_VLIM_ANY_10.5_Mstar_12.0_0.05_z_0.26_N_3263228

Weighting scheme

Column	Model	Formula
`WEIGHT_ADD`	Additive	`1 / max(1 + Σ_i a_i · t_i(p), 0.01)`
`WEIGHT_COMB`	Combined	`1 / max(1 + Σ_i b_i · t_i(p), 0.01)`
`WEIGHT_SYS`	Combined (recommended)	identical to `WEIGHT_COMB`

`scripts/run_ls10_analysis.py`

Full pipeline on LS10 BGS catalogs: overdensity → all methods → correction → w(θ) plots.

python scripts/run_ls10_analysis.py \
    --catalog-dir /path/to/BGS_VLIM_Mstar \
    --template-dir /path/to/systematics/ \
    --nside 64 --output-dir results/ls10/

`scripts/run_validation.py`

Multi-method, multi-scenario validation on synthetic mocks (none / additive / multiplicative / combined contamination). Applies all six methods and compares against known ground truth.

python scripts/run_validation.py --nside 64 --n-mocks 50 \
    --output-dir results/validation/

`scripts/run_systematic_tests.py`

Systematic test matrix: all methods × all template configurations (7 synthetic templates, tiers of single and combined contamination types).

python scripts/run_systematic_tests.py --nside 64

`scripts/run_mock_analysis.py`

Parameter recovery on mock galaxy catalogs (LRT statistics).

# Synthetic mocks (no data files required)
python scripts/run_mock_analysis.py --synthetic --n-mocks 20 --nside 32

# Real mocks
python scripts/run_mock_analysis.py \
    --mock-dir /path/to/mocks --rand-file /path/to/randoms.fits \
    --n-mocks 100 --output-dir results/mock_analysis/

`scripts/run_mock_analysis_diagnostic.py`

Deep-dive diagnostic figures for one synthetic mock (sky maps, template maps, weight histograms, per-template S/N).

python scripts/run_mock_analysis_diagnostic.py \
    --nside 64 --n-sys 5 --seed 0 \
    --output-dir results/mock_analysis_diagnostic/

`scripts/run_mock_analysis_progressive.py`

Progressive template contamination study: varies the number of contaminated templates (k=1,2,3) and contamination mode across mocks.

`scripts/run_mock_analysis_real_templates.py`

Validation with real GAIA stellar and LS10 depth templates on the actual LS10 south footprint (~22 000 pixels at NSIDE=64).

`scripts/run_paper_validation.py`

Reproduce Berlfein 2024 Figures 2–8 and Tables 2–3 on lognormal synthetic mocks.

# Quick test (NSIDE=32, 2 realisations)
python scripts/run_paper_validation.py --nside 32 --n-real 2 --output-dir /tmp/val

# Full paper settings (use HPC)
python scripts/run_paper_validation.py --nside 512 --n-real 119 \
    --n-walkers 250 --n-steps 1500 --n-burn 300

Documentation-generation scripts

Script	Output
`scripts/generate_results_ls10_summary.py`	`docs/results_ls10.rst` from `*_params.json` files
`scripts/generate_sample_pages.py`	Per-sample RST pages in `docs/`
`scripts/generate_recommendations.py`	`docs/results_ls10_recommendations.rst`
`scripts/plot_ls10_wtheta_corrected.py`	`docs/_static/results_ls10/wtheta_corrected_nside64.png`
`scripts/plot_runtime_scaling.py`	`docs/_static/runtime_scaling.png`
`scripts/patch_params_from_partials.py`	Patches `*_params.json` from per-method partial JSON files

`scripts/benchmark_corrfunc_vs_treecorr.py`

Performance comparison between Corrfunc and TreeCorr for the angular two-point function estimator.

Server execution (bash/ — background, all cores)

bash/ provides scripts that launch each pipeline in the background with nohup, writing timestamped logs to logs/ and a .pid file for process tracking. Each script sets OMP_NUM_THREADS=$(nproc) and JAX XLA flags.

# LS10 analysis
bash/ls10_analysis.sh
CATALOG_DIR=/path/to/BGS bash/ls10_analysis.sh --sample LS10_VLIM_ANY_10.5_...

# Paper validation
bash/paper_validation.sh
bash/paper_validation.sh --nside 512 --n-real 119   # full paper settings

# Mock analysis
bash/mock_analysis.sh --synthetic --n-mocks 20
bash/mock_analysis.sh --mock-dir /path/to/mocks --rand-file /path/to/randoms.fits

# Build template maps
bash/build_maps.sh --source ls10
bash/build_maps.sh --source gaia --nside 64 128

Monitoring:

tail -f logs/ls10_analysis_20260505_143200.log
kill -0 $(cat logs/ls10_analysis_20260505_143200.pid) && echo running || echo done
kill $(cat logs/ls10_analysis_20260505_143200.pid)

GPU notes

--device gpu lets JAX auto-detect the GPU backend. To enable GPU:

pip install 'jax[cuda12_pip]' \
    -f https://storage.googleapis.com/jax-releases/jax_cuda_releases.html

Installation

cd ~/sys_mapping
mamba env create -f environment.yml
mamba activate sys_map
pip install -e .

Tests

pytest tests/ -v

Test modules: test_contamination, test_correction, test_likelihood, test_maps, test_inference, test_model_selection, test_bootstrap, test_power_spectrum, test_regression, test_diagnostics, test_mocks, test_real_templates, test_accuracy, test_benchmarks, test_timing.

Documentation

Full Sphinx documentation is in docs/, covering: overview, installation, quickstart, methods with paper references, validation results, LS10 BGS analysis results (9 samples × 4 NSIDEs), and full API reference.

cd docs && make html
# open docs/_build/html/index.html

Package layout

Module	Public symbols
`contamination`	`apply_contamination`, `invert_contamination`, `compute_two_point_correction`, `pack_params`, `unpack_params`, `n_free_params`
`likelihood`	`make_log_likelihood` — factory returning a `@jax.jit` log-likelihood (Gaussian or skew-normal)
`maps`	`systematic_power_spectrum`, `generate_systematic_map`, `generate_systematic_maps`, `load_real_template`, `load_real_templates`, `pixelize_catalog`, `compute_overdensity`, `assign_template_values`
`inference`	`make_log_prob`, `run_mcmc`, `get_mle_params`, `get_param_variance_from_chain`, `get_param_covariance_from_chain`
`correction`	`debias_params` (Eq. 21), `rotate_templates` (App. A), `transform_params_from_rotated`, `correct_two_point_function`, `correct_power_spectrum_harmonic`
`model_selection`	`likelihood_ratio_test` → `LikelihoodRatioResult` (Eq. 19)
`bootstrap`	`block_bootstrap_variance` — spatial block bootstrap via HEALPix coarsening (Sec. 6.2); `jackknife_covariance`
`regression`	`elasticnet_contamination_fit`, `iterative_systematics_decontamination`, `method_comparison`, `run_decontamination`
`diagnostics`	`null_test_cross_correlations`, `snr_template_ranking`, `footprint_mask_diagnostics`
`mocks`	`generate_lognormal_field`, `make_galactic_mask`, `make_mock_catalog`, `make_mock_suite`, `MockCatalog`
`power_spectrum`	`measure_pseudo_cl`, `subtract_template_cl`, `harmonic_bias`, `mode_projection_bias`
`utils`	`compute_covariance_matrix` (Eq. 24), `compute_amplitude_bias` (Eq. 25-26), `measure_two_point_function` (Landy-Szalay via TreeCorr), `measure_two_point_function_corrfunc`, `measure_kk_correlation_treecorr`, `measure_kk_correlation_corrfunc`
`plotting`	`METHOD_COLORS`, `METHOD_LINESTYLES`, `METHOD_MARKERS`, `METHOD_LABELS`, `METHOD_ORDER` — shared colorblind-safe palette constants

All symbols above are importable directly from sys_mapping. LikelihoodRatioResult must be imported as from sys_mapping.model_selection import LikelihoodRatioResult.

Berlfein 2024 equation cross-reference

Equation	Description	Implementation
Eq. 11	Additive model	`contamination.py:apply_contamination` (model='additive')
Eq. 12	Multiplicative model	`contamination.py:apply_contamination` (model='multiplicative')
Eq. 13	Combined model	`contamination.py:apply_contamination` (model='combined')
Eq. 15-16	2PCF correction	`contamination.py:compute_two_point_correction`; `correction.py:correct_two_point_function`
Eq. 17	Gaussian log-likelihood + Jacobian	`likelihood.py:make_log_likelihood` (skew=False)
Eq. 18	Skew-normal log-likelihood	`likelihood.py:make_log_likelihood` (skew=True)
Eq. 19	Likelihood ratio test	`model_selection.py:likelihood_ratio_test`
Eq. 21	Noise debiasing	`correction.py:debias_params`
Eq. 24	Template covariance matrix	`utils.py:compute_covariance_matrix`
Eq. 25-26	Amplitude bias ΔĀ	`utils.py:compute_amplitude_bias`
App. A	PCA template rotation C = VDVᵀ	`correction.py:rotate_templates` + `transform_params_from_rotated`

Data files

Generated outputs (`data/sys_weights/` — git-ignored)

Produced by scripts/compute_sys_weights.py or scripts/run_ls10_analysis.py:

File	Contents
`{sample_id}_NSIDE{NNNN}_WEIGHTS.fits`	Per-galaxy systematic weights (`WEIGHT_SYS`, `WEIGHT_ADD`, `WEIGHT_COMB`)
`{sample_id}_NSIDE{NNNN}_params.json`	MAP parameters and chain diagnostics for all methods
`{sample_id}_NSIDE{NNNN}_partial_*.json`	Partial results from fast-method phases

Paper tables and simulation configs (`data/` — tracked)

File	Contents
`data/paper_tables/table1_mock_properties.csv`	KiDS-HOD mock survey properties
`data/paper_tables/table2_lrt_results.csv`	LRT model-selection fractions
`data/paper_tables/table3_cosmological_impact.csv`	Amplitude bias ΔĀ per model
`data/simulation_config/contamination_params_25sys.csv`	25 contamination parameters used in the paper
`data/simulation_config/angular_bins.csv`	Log-spaced angular bins (0.5–60 arcmin)

Equations vs. implementation — notes

Combined model: the code implements the exact paper form δ̂_g = δ_g(1+b·t) + a·t (Eq. 13). The full product (δ_g + a·t)(1+b·t) differs by the second-order term a·t·b·t, which is negligible for |a|, |b| ≪ 1.
Jacobian sign: the Gaussian likelihood subtracts Σ ln|1+b·t| (positive Jacobian of the forward map), consistent with Eq. 17.
Skew-normal: the CDF term is Σ log Φ(γ·r/σ) = Σ log_ndtr(z), which includes the N·ln 2 from the 2/σ prefactor. A previous version used Σ log_ndtr(√2·z) (off by √2 in the erf argument); this is corrected in commit 10c01d0.
Variance propagation through PCA: Var[a_orig] = Vᵀ · diag(Var[a_rot]) · V is correct when the rotated parameters are approximately uncorrelated.

Reference

Berlfein et al. 2024, Joint inference of multiplicative and additive systematics in galaxy clustering, MNRAS 531, 4954, arXiv:2401.12293.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

0.9

May 11, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

sys_mapping-0.9.tar.gz (80.6 kB view details)

Uploaded May 11, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

sys_mapping-0.9-py3-none-any.whl (56.7 kB view details)

Uploaded May 11, 2026 Python 3

File details

Details for the file sys_mapping-0.9.tar.gz.

File metadata

Download URL: sys_mapping-0.9.tar.gz
Upload date: May 11, 2026
Size: 80.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for sys_mapping-0.9.tar.gz
Algorithm	Hash digest
SHA256	`cf118ba96b7da3ef892d1f46aead7c3d449932c3eb098edafa0b462245522dda`
MD5	`a357641c54e0f614e0bdb20e65d5ff86`
BLAKE2b-256	`23f9453f288c0d982ca8a618f0ec66836c285caa7dd1e301a8c8843d24415fe0`

See more details on using hashes here.

File details

Details for the file sys_mapping-0.9-py3-none-any.whl.

File metadata

Download URL: sys_mapping-0.9-py3-none-any.whl
Upload date: May 11, 2026
Size: 56.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for sys_mapping-0.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`a678af85abaf4224cc19416869a175506f8da4fde1946fef190284d2ebe213b2`
MD5	`fe2bd20da207b01bcf92eb9008e95dc4`
BLAKE2b-256	`c5cb6390e73854ce271fb8f6f845d5e45320a64f69b1903099bbf738627a8b48`

See more details on using hashes here.

sys-mapping 0.9

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

sys_mapping

Six decontamination methods

Pipeline overview

Method

Contamination model (Eq. 11–13)

Likelihoods (Eq. 17–18)

Template rotation (Appendix A)

Noise debiasing (Eq. 21)

Two-point correction (Eq. 15–16)

Model selection (Eq. 19)

Scripts

scripts/run_all_methods_sequential.sh

scripts/build_systematic_maps.py

scripts/compute_sys_weights.py

scripts/run_ls10_analysis.py

scripts/run_validation.py

scripts/run_systematic_tests.py

scripts/run_mock_analysis.py

scripts/run_mock_analysis_diagnostic.py

scripts/run_mock_analysis_progressive.py

scripts/run_mock_analysis_real_templates.py

scripts/run_paper_validation.py

Documentation-generation scripts

scripts/benchmark_corrfunc_vs_treecorr.py

Server execution (bash/ — background, all cores)

Installation

Tests

Documentation

Package layout

Berlfein 2024 equation cross-reference

Data files

Generated outputs (data/sys_weights/ — git-ignored)

Paper tables and simulation configs (data/ — tracked)

Equations vs. implementation — notes

Reference

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`scripts/run_all_methods_sequential.sh`

`scripts/build_systematic_maps.py`

`scripts/compute_sys_weights.py`

`scripts/run_ls10_analysis.py`

`scripts/run_validation.py`

`scripts/run_systematic_tests.py`

`scripts/run_mock_analysis.py`

`scripts/run_mock_analysis_diagnostic.py`

`scripts/run_mock_analysis_progressive.py`

`scripts/run_mock_analysis_real_templates.py`

`scripts/run_paper_validation.py`

`scripts/benchmark_corrfunc_vs_treecorr.py`

Generated outputs (`data/sys_weights/` — git-ignored)

Paper tables and simulation configs (`data/` — tracked)