Cluster Completeness Pipeline

Pipeline for synthetic cluster injection, detection, matching, 5-filter photometry with CI cut, and neural-network completeness learning. Used to measure and model detection completeness as a function of magnitude, mass, and age.

Overview

  1. Pipeline (stages 1–5): Inject synthetic clusters on white-light and 5-filter images → run SExtractor → match injected vs detected positions → run IRAF aperture photometry → apply concentration-index (CI) cut → write detection labels and catalogue.
  2. Build ML inputs: Assemble 3D detection array and property .npz from pipeline outputs (CFR order).
  3. NN training: Train an MLP to predict completeness from physical and photometric features; save best model and diagnostics.
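The matching stage in step 1 pairs injected with detected positions. A minimal, purely illustrative sketch of tolerance-based nearest-neighbour matching (the positions and the 1.5 px tolerance here are made up; the pipeline's actual matcher lives in cluster_pipeline/):

```python
import numpy as np

# Hypothetical injected and detected (x, y) positions, in pixels.
injected = np.array([[100.0, 200.0], [300.0, 400.0], [50.0, 60.0]])
detected = np.array([[100.4, 199.7], [310.0, 390.0]])

def match_positions(injected, detected, tol=1.5):
    """Label each injected source as recovered if its nearest
    detected source lies within `tol` pixels."""
    if len(detected) == 0:
        return np.zeros(len(injected), dtype=bool)
    # Pairwise separations: shape (n_injected, n_detected).
    sep = np.linalg.norm(injected[:, None, :] - detected[None, :, :], axis=2)
    return sep.min(axis=1) <= tol

recovered = match_positions(injected, detected)
# Only the first injected source has a detection within tolerance.
```

The boolean array maps directly onto the per-cluster detection labels the pipeline writes out.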

Requirements

  • Python 3.10+
  • External binaries: SExtractor, IRAF/PyRAF (for aperture photometry), BAOlab (for injection). Install separately; paths are configurable (e.g. BAOlab under .deps/local/bin).
  • Data (not in repo): Galaxy FITS, galaxy_filter_dict.npy, readme with zeropoints/CI, SLUG library, PSF files. See docs/RUNNING.md for the complete required-files list.

Installation

git clone <your-repo-url>
cd cluster-completeness-pipeline
python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install -e ".[api]"

For the full pipeline you also need IRAF/PyRAF, SExtractor, and BAOlab; see your institution’s setup or docs/DEPLOY_FOR_PAPER.md.

Quick start

See docs/RUNNING.md for the full list of required files and step-by-step run instructions.

1. Run the pipeline

Entry point: scripts/run_pipeline.py. It runs cleanup (optional), Phase A (white injection), Phase B (detection, matching, optional 5-filter inject + photometry + catalogue), and optional completeness plots.

python scripts/run_pipeline.py --cleanup --nframe 2 --reff_list "1,3,6,10" --run_photometry
  • --cleanup: Remove previous pipeline outputs before running.
  • --nframe: Number of frames.
  • --reff_list: Comma-separated effective radii (e.g. "1,3,6,10").
  • --run_photometry: Run 5-filter injection, photometry, and CI cut (otherwise only detection + matching).
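The CI cut mentioned above compares magnitudes in two apertures. A hedged sketch of the idea; the aperture sizes, the 0.5 mag threshold, and the cut direction here are illustrative assumptions (the actual CI definition and threshold come from the galaxy readme file listed under Requirements):

```python
import numpy as np

# Hypothetical instrumental magnitudes in a small and a large aperture.
mag_small = np.array([20.8, 21.5, 22.0])
mag_large = np.array([19.5, 21.2, 20.1])

# Concentration index: small-aperture minus large-aperture magnitude.
# Extended, cluster-like profiles have a larger CI than point sources.
ci = mag_small - mag_large

ci_min = 0.5  # illustrative threshold only
passes_cut = ci > ci_min
```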

2. Build ML inputs

From pipeline outputs, build det_3d.npy and allprop.npz for the NN:

python scripts/build_ml_inputs.py --main-dir . --galaxy ngc628-c --outname test \
  --nframe 2 --reff-list 1 3 6 10 \
  --out-det det_3d.npy --out-npz allprop.npz

Use --use-white-match to use white-match detection labels (raw detection rate) instead of post-CI-cut labels. The script prints the exact scripts/perform_ml_to_learn_completeness.py command to run next.
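The two outputs load with plain numpy. A self-contained sketch using stand-in files with the same names; the shapes and the property keys ("mass", "age") are assumptions for illustration only:

```python
import os
import tempfile

import numpy as np

tmp = tempfile.mkdtemp()
det_path = os.path.join(tmp, "det_3d.npy")
npz_path = os.path.join(tmp, "allprop.npz")

# Stand-ins: 500 clusters x 2 frames x 4 effective radii (illustrative).
np.save(det_path, np.zeros((500, 2, 4), dtype=np.int8))       # detection labels
np.savez(npz_path, mass=np.ones(500), age=np.full(500, 1e7))  # cluster properties

det = np.load(det_path)
props = np.load(npz_path)
# det.shape -> (500, 2, 4); props.files -> the stored property keys
```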

3. Train the NN

python scripts/perform_ml_to_learn_completeness.py \
  --det-path det_3d.npy \
  --npz-path allprop.npz \
  --out-dir ./nn_sweep_out \
  --clusters-per-frame 500 \
  --nframes 2 \
  --nreff 4 \
  --prop-flatten-order CFR \
  --save-best

Outputs: best model, scalers, and plots under --out-dir. Dependencies: torch, numpy, scikit-learn, joblib, matplotlib (no IRAF/BAOlab needed for this step).
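The --prop-flatten-order CFR flag pins down how the per-(cluster, frame, reff) grid is laid out in 1D. A numpy sketch of one such layout, assuming CFR means cluster-major, then frame, then reff (an assumption; check the script's help text for the authoritative definition):

```python
import numpy as np

n_clusters, n_frames, n_reff = 3, 2, 4

# Axes ordered (cluster, frame, reff); a C-order ravel then varies
# reff fastest and cluster slowest.
grid = np.arange(n_clusters * n_frames * n_reff).reshape(n_clusters, n_frames, n_reff)
flat = grid.ravel()  # C order

# Recover the 3D index of a flat position.
c, f, r = np.unravel_index(9, (n_clusters, n_frames, n_reff))
```

Keeping the flatten order consistent between build_ml_inputs and the training script is what lets detection labels and properties line up row by row.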

4. Deploy as Python package + API (no ML run)

If you have the four checkpoint files (or put them on GitHub), you can skip training and run the API directly:

  1. Place the four files in one directory (e.g. checkpoints/ in the repo):

    • best_model_phys_model0.pt, best_model_phot_model0.pt
    • scaler_phys_model0.pkl, scaler_phot_model0.pkl

    See checkpoints/README.md. You can commit these to GitHub so others don’t need to run the ML step.
  2. Install and start the API (from the repo root):

    pip install -e ".[api]"
    deploy-completeness
    

    Or use a custom directory and one-shot install:

    python scripts/deploy.py --model-dir /path/to/your/checkpoints --install
    

Repository layout (code only)

  • scripts/ – All runnable scripts; see scripts/README.md and docs/SCRIPTS.md. Entry points: run_pipeline, deploy-completeness, serve-completeness-api; scripts include run_pipeline.py, deploy.py, generate_white_clusters.py, …
  • completeness_nn_api/ – Completeness NN API: from completeness_nn_api import ngc628_completeness_predict, plus the HTTP server and deploy helpers.
  • checkpoints/ – Optional: put the four NN checkpoint files here and run deploy-completeness (see checkpoints/README.md).
  • cluster_pipeline/ – Config, data loaders, detection, matching, pipeline driver, photometry, catalogue, and utilities.
  • docs/ – RUNNING, PIPELINE_FILES, SCRIPTS (script index + inputs/outputs), FILES_FOR_GIT, DEPLOY, ARCHITECTURE, INSTALL_IRAF, COMPLETENESS_FIGURE.
  • tests/ – Unit, integration, and end-to-end tests.

Data (FITS, SLUG library, PSF, etc.) and large outputs are not in the repo; see docs/DEPLOY_FOR_PAPER.md and docs/FILES_FOR_GIT.md. Input/output files per stage: docs/PIPELINE_FILES.md.

Tests and lint

# Lint
ruff check .

# Tests (unit + integration + e2e smoke)
pytest

CI runs on push/PR: ruff check and pytest (see .github/workflows/ci.yml).

Documentation

  • docs/RUNNING.md – How to run the pipeline: required files and directories, step-by-step run commands, and environment variables.
  • docs/PIPELINE_FILES.md – Input/output files per pipeline stage.
  • docs/SCRIPTS.md – Script index with inputs/outputs and locations.
  • docs/DEPLOY_FOR_PAPER.md – What to include for paper/GitHub: pipeline modules, ML step, optional reference script, exclude list.
  • docs/FILES_FOR_GIT.md – Explicit list of files to commit for a pipeline-only push.
  • docs/PUBLISH.md – Publish the package to PyPI and host the API (Docker, Railway, Render, etc.).
  • docs/ARCHITECTURE.md – Pipeline architecture and refactor design.
  • docs/INSTALL_IRAF.md – Local IRAF install for 5-filter photometry.
  • docs/COMPLETENESS_FIGURE.md – Completeness workflow, scripts, and assumptions.
  • scripts/README.md – Script quick reference and pointers to docs.
  • tests/README.md – How to run tests and the completeness visualisation script.

License

See repository or paper for license terms.
