# Cluster Completeness Pipeline

Pipeline for synthetic cluster injection, detection, matching, 5-filter photometry with a concentration-index (CI) cut, and neural-network completeness learning. Used to measure and model detection completeness as a function of magnitude, mass, and age.
## Overview
- Pipeline (stages 1–5): Inject synthetic clusters on white-light and 5-filter images → run SExtractor → match injected vs detected positions → run IRAF aperture photometry → apply concentration-index (CI) cut → write detection labels and catalogue.
- Build ML inputs: Assemble the 3D detection array and property `.npz` from pipeline outputs (CFR order).
- NN training: Train an MLP to predict completeness from physical and photometric features; save the best model and diagnostics.
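The matching stage pairs injected with detected positions. A minimal sketch of one common approach (greedy nearest-neighbour within a pixel tolerance; the pipeline's actual matcher lives in `cluster_pipeline` and may differ):

```python
import numpy as np

def match_positions(injected, detected, tol=2.0):
    """Greedy nearest-neighbour match within `tol` pixels.

    Illustrative only; returns (injected_index, detected_index) pairs.
    """
    matched = []
    used = set()
    for i, p in enumerate(injected):
        if len(detected) == 0:
            break
        d = np.linalg.norm(detected - p, axis=1)
        j = int(np.argmin(d))
        if d[j] <= tol and j not in used:
            matched.append((i, j))
            used.add(j)
    return matched

inj = np.array([[10.0, 10.0], [50.0, 50.0]])
det = np.array([[10.5, 9.8], [120.0, 3.0]])
print(match_positions(inj, det))  # only the first injected source matches: [(0, 0)]
```

A real matcher would typically also resolve conflicts when two injected sources share the same nearest detection; the greedy pass above just takes them in order.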
## Requirements
- Python 3.10+
- External binaries: SExtractor, IRAF/PyRAF (for aperture photometry), BAOlab (for injection). Install separately; paths are configurable (e.g. BAOlab under `.deps/local/bin`).
- Data (not in repo): galaxy FITS files, `galaxy_filter_dict.npy`, readme with zeropoints/CI, SLUG library, PSF files. See `docs/RUNNING.md` for the complete required-files list.
## Installation
```bash
git clone <your-repo-url>
cd cluster-completeness-pipeline
python -m venv .venv
source .venv/bin/activate   # or .venv\Scripts\activate on Windows
pip install -e ".[api]"
```
For the full pipeline you also need IRAF/PyRAF, SExtractor, and BAOlab; see your institution's setup notes or `docs/DEPLOY_FOR_PAPER.md`.
## Quick start
See `docs/RUNNING.md` for the full list of required files and step-by-step run instructions.
### 1. Run the pipeline
Entry point: `scripts/run_pipeline.py`. It runs cleanup (optional), Phase A (white injection), Phase B (detection, matching, optional 5-filter injection + photometry + catalogue), and optional completeness plots.
```bash
python scripts/run_pipeline.py --cleanup --nframe 2 --reff_list "1,3,6,10" --run_photometry
```

- `--cleanup`: Remove previous pipeline outputs before running.
- `--nframe`: Number of frames.
- `--reff_list`: Comma-separated effective radii (e.g. `"1,3,6,10"`).
- `--run_photometry`: Run 5-filter injection, photometry, and CI cut (otherwise only detection + matching).
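The CI cut can be illustrated with a short sketch. The magnitudes, aperture sizes, and threshold below are illustrative; the real zeropoints and CI limits come from the galaxy readme files:

```python
import numpy as np

# Concentration index (CI) = magnitude in a small aperture minus magnitude
# in a larger one. Extended sources (clusters) lose more light in the small
# aperture, so they show a larger CI than point sources.
mag_small = np.array([20.1, 21.3, 19.8])  # e.g. 1-pixel-radius aperture
mag_large = np.array([19.0, 20.9, 18.2])  # e.g. 3-pixel-radius aperture
ci = mag_small - mag_large                # ~[1.1, 0.4, 1.6]
ci_min = 1.0                              # illustrative threshold
passes_ci = ci >= ci_min                  # [True, False, True]
```

Sources failing the cut are treated as non-detections in the post-CI labels.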
### 2. Build ML inputs
From pipeline outputs, build `det_3d.npy` and `allprop.npz` for the NN:
```bash
python scripts/build_ml_inputs.py --main-dir . --galaxy ngc628-c --outname test \
    --nframe 2 --reff-list 1 3 6 10 \
    --out-det det_3d.npy --out-npz allprop.npz
```
Use `--use-white-match` to use white-match detection labels (detection rate) instead of post-CI labels. The script prints the exact `scripts/perform_ml_to_learn_completeness.py` command to run next.
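The CFR flatten order matters when the detection array is reshaped for training. A quick numpy sketch, with shapes taken from the example flags (the axis naming is my assumption):

```python
import numpy as np

# Assumed axes: (cluster, frame, reff), matching --prop-flatten-order CFR.
n_clusters, n_frames, n_reff = 500, 2, 4
det_3d = np.arange(n_clusters * n_frames * n_reff).reshape(
    n_clusters, n_frames, n_reff
)

# C-order (row-major) flattening of a (C, F, R) array gives CFR order:
# the cluster index varies slowest, the reff index fastest.
flat = det_3d.reshape(-1)
assert flat[1] == det_3d[0, 0, 1]       # reff advances first
assert flat[n_reff] == det_3d[0, 1, 0]  # then frame
```

If the property arrays were flattened in a different order, labels and features would silently misalign, which is why the training script takes the order as an explicit flag.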
### 3. Train the NN
```bash
python scripts/perform_ml_to_learn_completeness.py \
    --det-path det_3d.npy \
    --npz-path allprop.npz \
    --out-dir ./nn_sweep_out \
    --clusters-per-frame 500 \
    --nframes 2 \
    --nreff 4 \
    --prop-flatten-order CFR \
    --save-best
```
Outputs: best model, scalers, and plots under `--out-dir`. Dependencies: `torch`, `numpy`, `scikit-learn`, `joblib`, `matplotlib` (no IRAF/BAOlab needed for this step).
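Training saves feature scalers alongside the model. A sketch of the standard z-scoring typically applied before an MLP (the feature names and log-spacing here are assumptions, not the script's exact recipe):

```python
import numpy as np

# Hypothetical physical features per cluster: mass [Msun], age [yr], reff [pc].
X = np.array([[1.0e4, 1.0e7, 1.0],
              [5.0e4, 1.0e8, 3.0],
              [2.0e5, 5.0e8, 6.0]])

# Mass and age span decades, so take log10 before scaling; reff stays linear.
feats = np.column_stack([np.log10(X[:, 0]), np.log10(X[:, 1]), X[:, 2]])
mean, std = feats.mean(axis=0), feats.std(axis=0)
X_scaled = (feats - mean) / std  # zero mean, unit variance per feature
```

The saved `.pkl` scalers let the deployed API apply the identical transform at prediction time, which is why they ship with the checkpoints.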
### 4. Deploy as Python package + API (no ML run)
If you have the four checkpoint files (or put them on GitHub), you can skip training and run the API directly:
1. Put the four files in one directory (e.g. the repo's `checkpoints/`): `best_model_phys_model0.pt`, `best_model_phot_model0.pt`, `scaler_phys_model0.pkl`, `scaler_phot_model0.pkl`. See `checkpoints/README.md`. You can commit these to GitHub so others don't need to run ML.
2. Install and start the API (from the repo root):

   ```bash
   pip install -e ".[api]"
   deploy-completeness
   ```

   Or use a custom directory and one-shot install:

   ```bash
   python scripts/deploy.py --model-dir /path/to/your/checkpoints --install
   ```

3. Endpoints:
   - API docs: http://localhost:8000/docs
   - Health: http://localhost:8000/health
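Once the server is running, the health endpoint can be queried from Python. A minimal stdlib client (endpoint paths from above; the response schema is not documented here, so no JSON keys are assumed):

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"

def endpoint(path, base=BASE_URL):
    """Build a full URL for an API path."""
    return f"{base}{path}"

def check_health(base=BASE_URL):
    """Return the parsed JSON body of /health; requires the API to be up."""
    with urllib.request.urlopen(endpoint("/health", base), timeout=5) as resp:
        return json.load(resp)
```

The interactive docs at `/docs` describe the prediction endpoints and their request schemas.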
## Repository layout (code only)
| Path | Description |
|---|---|
| `scripts/` | All runnable scripts; see `scripts/README.md` and `docs/SCRIPTS.md`. Entry points: `run_pipeline`, `deploy-completeness`, `serve-completeness-api`; scripts: `run_pipeline.py`, `deploy.py`, `generate_white_clusters.py`, … |
| `completeness_nn_api/` | Completeness NN API: `from completeness_nn_api import ngc628_completeness_predict`, HTTP server, deploy. |
| `checkpoints/` | Optional: put the four NN checkpoint files here and run `deploy-completeness` (see `checkpoints/README.md`). |
| `cluster_pipeline/` | Config, data loaders, detection, matching, pipeline, photometry, catalogue, utils. |
| `docs/` | RUNNING, PIPELINE_FILES, SCRIPTS (script index + inputs/outputs), FILES_FOR_GIT, DEPLOY, ARCHITECTURE, INSTALL_IRAF, COMPLETENESS_FIGURE. |
| `tests/` | Unit, integration, and E2E tests. |
Data (FITS, SLUG library, PSF, etc.) and large outputs are not in the repo; see `docs/DEPLOY_FOR_PAPER.md` and `docs/FILES_FOR_GIT.md`. Input/output files per stage: `docs/PIPELINE_FILES.md`.
## Tests and lint
```bash
# Lint
ruff check .

# Tests (unit + integration + e2e smoke)
pytest
```
CI runs on push/PR: `ruff check` and `pytest` (see `.github/workflows/ci.yml`).
## Documentation
- `docs/RUNNING.md` – How to run the pipeline: required files and directories, step-by-step run commands, environment variables.
- `docs/PIPELINE_FILES.md` – Input/output files per pipeline stage.
- `docs/SCRIPTS.md` – Script index with inputs/outputs and locations.
- `docs/DEPLOY_FOR_PAPER.md` – What to include for paper/GitHub: pipeline modules, ML step, optional reference script, exclude list.
- `docs/FILES_FOR_GIT.md` – Explicit list of files to commit for a pipeline-only push.
- `docs/PUBLISH.md` – Publish the package to PyPI and host the API (Docker, Railway, Render, etc.).
- `docs/ARCHITECTURE.md` – Pipeline architecture and refactor design.
- `docs/INSTALL_IRAF.md` – Local IRAF install for 5-filter photometry.
- `docs/COMPLETENESS_FIGURE.md` – Completeness workflow, scripts, and assumptions.
- `scripts/README.md` – Script quick reference and pointers to docs.
- `tests/README.md` – How to run tests and the completeness visualisation script.
## License
See repository or paper for license terms.