
Machine-learning INtegrated Analysis with photometric Astronomical Surveys


MINAS — Machine-learning INtegrated Analysis with photometric Astronomical Surveys


MINAS is a Python package that covers the complete machine-learning workflow for photometric astronomical surveys. It integrates every stage, from preprocessing to final model application, in a single modular interface.

Fun fact: Minas is also the name of a Brazilian state (Minas Gerais), the home state of the package creator, Icaro Meidem. He is a proud mineiro, so the name reflects both the astronomical focus and personal heritage.


Installation

pip install minas

Quick Start — Full ML Workflow

import minas as mg

# 1. Load catalog
catalog = mg.read_csv('my_catalog.csv')

# 2. Assemble feature DataFrame (magnitudes + pairwise colors)
work_df = mg.preprocess.assemble_work_df(
    df=catalog,
    filters=mg.FILTERS['JPLUS'],
    correction_pairs=dict(zip(mg.FILTERS['JPLUS'], mg.CORRECTIONS['JPLUS'])),
    add_colors=True,
)

# 3. Select most important features
features, df_importance = mg.evaluation.get_important_features(
    X=work_df,
    y=catalog['Teff'],
    n_features_to_save=20,
)
work_df = work_df[features]

# 4. Tune hyperparameters
param_dist = {
    'selectkbest__k'                         : [10, 15, 20],
    'randomforestregressor__n_estimators'    : [100, 300, 500],
    'randomforestregressor__min_samples_leaf': [1, 5, 10],
    'randomforestregressor__max_features'    : ['sqrt', 'log2'],
    'randomforestregressor__bootstrap'       : [True, False],
}
best_pipeline, search = mg.hyperparameter_search(
    X=work_df,
    Y=catalog['Teff'],
    model_type='RF',
    param_dist=param_dist,
    tuning_id='teff_rf',
    n_iter=30,
    save_dir='pipelines/',
)

# 5. Apply model with Monte Carlo error propagation
predictor = mg.models.Predictor(
    id_col='ID',
    mag_cols=mg.FILTERS['JPLUS'],
    err_cols=mg.ERRORS['JPLUS'],
    dist_col=None,
    correction_pairs=dict(zip(mg.FILTERS['JPLUS'], mg.CORRECTIONS['JPLUS'])),
    models={'Teff': best_pipeline},
    mc_reps=100,
    batch_partitions=10,
)
predictor.predict_parameters((catalog, 'results/teff_predictions.csv', ['ID'], 'w', True))
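
Step 5 above propagates photometric errors by Monte Carlo. The sketch below illustrates the general idea only and is not MINAS's implementation: each magnitude is redrawn `mc_reps` times from a Gaussian whose width is the reported error, the model is evaluated on every draw, and the median and standard deviation of the predictions summarize the result. The `toy_model` function, the `mc_predict` helper, and the magnitudes are all hypothetical.

```python
import random
import statistics

def toy_model(mags):
    """Hypothetical stand-in for a trained pipeline: maps (g, r) to Teff in K."""
    g, r = mags
    return 5000.0 + 3000.0 * (0.6 - (g - r))  # made-up color-temperature relation

def mc_predict(mags, errs, model, mc_reps=100, seed=0):
    """Return (median, std) of the model predictions over noisy magnitude draws."""
    rng = random.Random(seed)
    draws = []
    for _ in range(mc_reps):
        # Perturb every magnitude by Gaussian noise with its own reported error.
        noisy = [rng.gauss(m, e) for m, e in zip(mags, errs)]
        draws.append(model(noisy))
    return statistics.median(draws), statistics.stdev(draws)

teff, teff_err = mc_predict(mags=[15.20, 14.70], errs=[0.02, 0.02], model=toy_model)
print(f"Teff = {teff:.0f} +/- {teff_err:.0f} K")
```

Raising `mc_reps` reduces the sampling noise in both the median and the spread, at the cost of one extra model evaluation per repetition.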

Bolometric Correction

MINAS includes pre-trained models for bolometric correction (BC) based on Jordi et al. (2010), trained on Gaia-observed stars using Teff, log g, and [Fe/H].

Model Performance

| Model | R² | MAD | Std Deviation |
|---|---|---|---|
| XGBoost | 0.9983 | 0.0062 mag | 0.0430 mag |
| Random Forest | 0.9970 | 0.0067 mag | 0.0573 mag |

Figure: Performance of XGBoost (left) and Random Forest (right) for bolometric correction prediction.
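
The table above reports both a MAD and a standard deviation in magnitudes. As a rough illustration of why the two can differ so much, the sketch below computes both from prediction residuals, assuming MAD means the median absolute deviation of the residuals (the README does not define it). The `residual_stats` helper and the sample values are hypothetical.

```python
import statistics

def residual_stats(y_true, y_pred):
    """Return (MAD, std) of the residuals y_pred - y_true."""
    residuals = [p - t for t, p in zip(y_true, y_pred)]
    center = statistics.median(residuals)
    # Median absolute deviation: robust to a few large outliers.
    mad = statistics.median(abs(r - center) for r in residuals)
    # Sample standard deviation: inflated by the same outliers.
    std = statistics.stdev(residuals)
    return mad, std

# Made-up bolometric corrections (mag); the last prediction is an outlier.
y_true = [0.10, -0.25, 0.05, -0.40, 0.00]
y_pred = [0.11, -0.24, 0.02, -0.41, 0.30]

mad, std = residual_stats(y_true, y_pred)
print(f"MAD = {mad:.4f} mag, std = {std:.4f} mag")
```

A single poorly predicted star barely moves the MAD but dominates the standard deviation, which is consistent with the MAD values in the table being several times smaller than the standard deviations.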

Usage

import minas as mg

df = mg.bolometric.apply_bc(
    data='catalog.csv',
    teff_col='Teff',
    logg_col='logg',
    feh_col='[M/H]',
    model_type='XGB',        # 'XGB' or 'RF'
    sigma_multiplier=3.0,    # uncertainty = multiplier x STD
    output_file='catalog_bc.csv',
)

print(df[['Teff', 'BC_pred', 'err_BC_pred']].head())
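
The comment on `sigma_multiplier` says the reported uncertainty is the multiplier times a standard deviation. A minimal sketch of that arithmetic follows, assuming the standard deviation in question is the per-model value from the Model Performance table and that it is applied uniformly to every star. Both points are assumptions, and `bc_uncertainty` is a hypothetical helper, not part of MINAS.

```python
# Standard deviations copied from the Model Performance table (mag).
MODEL_STD = {"XGB": 0.0430, "RF": 0.0573}

def bc_uncertainty(model_type, sigma_multiplier=3.0):
    """Uncertainty on the predicted BC: multiplier x model standard deviation."""
    return sigma_multiplier * MODEL_STD[model_type]

for model_type in MODEL_STD:
    print(model_type, round(bc_uncertainty(model_type), 4), "mag")
```

Under these assumptions the default `sigma_multiplier=3.0` corresponds to an uncertainty of about 0.129 mag for the XGBoost model.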

Reference

Jordi, C. et al. (2010). Gaia broad band photometry. A&A 523, A48. DOI: 10.1051/0004-6361/200913234


Supported Surveys and Filters

MINAS provides built-in filter definitions for the following photometric surveys. All filter lists are accessible via mg.FILTERS, mg.ERRORS, and mg.CORRECTIONS.

| Survey | Filters | mg.FILTERS key |
|---|---|---|
| J-PLUS | uJAVA, J0378, J0395, J0410, J0430, gSDSS, J0515, rSDSS, J0660, iSDSS, J0861, zSDSS | 'JPLUS' |
| S-PLUS | uJAVA, J0378, J0395, J0410, J0430, gSDSS, J0515, rSDSS, J0660, iSDSS, J0861, zSDSS | 'SPLUS' |
| J-PAS | uJAVA + 56 narrow bands (J0378-J1007) + iSDSS | 'JPAS' |
| WISE | W1, W2, J, H, K | 'WISE' |
| GALEX | NUVmag | 'GALEX' |
| Gaia | G, BP, RP | 'GAIA' |

import minas as mg

print(mg.FILTERS['JPLUS'])      # magnitude column names
print(mg.ERRORS['JPLUS'])       # photometric error column names
print(mg.CORRECTIONS['JPLUS'])  # extinction correction column names
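
The Quick Start builds `correction_pairs` with `dict(zip(mg.FILTERS[...], mg.CORRECTIONS[...]))`, which pairs each magnitude column with its extinction correction column by position. The sketch below shows what such a mapping looks like and how pairwise colors could be derived from it. The column names, the sample values, and the convention "corrected = observed - correction" are assumptions made for illustration, not MINAS's documented behavior.

```python
from itertools import combinations

# Hypothetical column names; the real ones come from mg.FILTERS / mg.CORRECTIONS.
filters = ["gSDSS", "rSDSS", "iSDSS"]
corrections = ["Ax_gSDSS", "Ax_rSDSS", "Ax_iSDSS"]
correction_pairs = dict(zip(filters, corrections))  # magnitude column -> correction column

# One made-up catalog row.
row = {"gSDSS": 15.30, "rSDSS": 14.80, "iSDSS": 14.60,
       "Ax_gSDSS": 0.10, "Ax_rSDSS": 0.07, "Ax_iSDSS": 0.05}

# Assumed convention: corrected magnitude = observed magnitude - extinction term.
corrected = {band: row[band] - row[corr] for band, corr in correction_pairs.items()}

# One color per unordered filter pair, keeping the order of the filter list.
colors = {f"{a}-{b}": corrected[a] - corrected[b] for a, b in combinations(filters, 2)}
print(colors)
```

If every unordered pair is used, as in this sketch, a 12-filter survey such as J-PLUS would yield 66 colors in addition to the 12 magnitudes.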

Model Comparison — RF vs XGBoost

| Feature | Random Forest | XGBoost |
|---|---|---|
| Pipeline steps | Imputer → SelectKBest → RF | SelectKBest → XGB |
| Missing value handling | Built-in (median imputation) | Must be handled externally |
| Training speed | Moderate | Fast |
| Typical accuracy | Good | Excellent |
| Model key | 'RF-REG' / 'RF-CLA' | 'XGB-REG' / 'XGB-CLA' |
| Saved format | .sav (joblib) | .json |

import minas as mg

# Default models
rf_model  = mg.models.create_model('RF-REG')
xgb_model = mg.models.create_model('XGB-REG')

# With tuned hyperparameters
hp = (0.8, 0.05, 6, 500, 0.8, 0.1)  # colsample, lr, depth, n_est, subsample, gamma
xgb_tuned = mg.models.create_model('XGB-REG', hp_combination=hp)
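
The comparison table describes the Random Forest pipeline as Imputer → SelectKBest → RF, and the Quick Start tunes keys such as `selectkbest__k`. The sketch below builds a comparable pipeline directly with scikit-learn to show where those key names come from: `make_pipeline` names each step after its lowercased class name. This is an illustration, not the code behind `create_model('RF-REG')`, and the `f_regression` score function is a guess.

```python
from sklearn.ensemble import RandomForestRegressor
from sklearn.feature_selection import SelectKBest, f_regression
from sklearn.impute import SimpleImputer
from sklearn.pipeline import make_pipeline

rf_pipeline = make_pipeline(
    SimpleImputer(strategy="median"),            # median imputation of missing values
    SelectKBest(score_func=f_regression, k=10),  # keep the k highest-scoring features
    RandomForestRegressor(n_estimators=100, random_state=0),
)

# Step names are derived from the class names, which gives the "<step>__<param>" keys.
step_names = [name for name, _ in rf_pipeline.steps]
print(step_names)

# The same keys used in a search space can be set directly on the pipeline.
rf_pipeline.set_params(selectkbest__k=15, randomforestregressor__min_samples_leaf=5)
```

Because the imputer is part of the pipeline, it is refit inside each cross-validation fold during tuning, which avoids leaking statistics from the validation split.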

Package Structure

minas/
├── preprocess/     magnitude correction, color creation, work DataFrame assembly
├── models/         ML pipeline factory (RF, XGB) and Monte Carlo predictor
├── tuning/         hyperparameter search with RandomizedSearchCV
├── evaluation/     metrics (MAD, R2), plots, feature importance
└── bolometric/     bolometric correction with pre-trained models
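
The tuning/ module is described as a hyperparameter search with RandomizedSearchCV. To make the role of `n_iter` concrete, the sketch below counts the combinations in the Quick Start search space and draws 30 of them without replacement, which is how scikit-learn samples when every entry is a list. It only mimics the sampling step in plain Python; no model is trained, and `sample_candidates` is a hypothetical helper.

```python
import random
from itertools import product
from math import prod

# Search space copied from the Quick Start.
param_dist = {
    "selectkbest__k": [10, 15, 20],
    "randomforestregressor__n_estimators": [100, 300, 500],
    "randomforestregressor__min_samples_leaf": [1, 5, 10],
    "randomforestregressor__max_features": ["sqrt", "log2"],
    "randomforestregressor__bootstrap": [True, False],
}

# Size of the full grid an exhaustive search would have to evaluate.
n_total = prod(len(values) for values in param_dist.values())

def sample_candidates(param_dist, n_iter, seed=0):
    """Draw n_iter distinct parameter combinations from the full grid."""
    grid = [dict(zip(param_dist, combo)) for combo in product(*param_dist.values())]
    return random.Random(seed).sample(grid, n_iter)

candidates = sample_candidates(param_dist, n_iter=30)
print(f"{len(candidates)} of {n_total} combinations sampled")
```

With this search space, `n_iter=30` evaluates 30 of the 108 possible combinations, trading some coverage for a shorter search.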

Key Functions

| Function | Description |
|---|---|
| mg.preprocess.assemble_work_df() | Build feature DataFrame from magnitudes |
| mg.preprocess.correct_magnitudes() | Apply extinction corrections |
| mg.preprocess.calculate_abs_mag() | Convert apparent to absolute magnitudes |
| mg.models.create_model() | Create RF or XGBoost pipeline |
| mg.models.Predictor | Monte Carlo predictor with uncertainty estimation |
| mg.hyperparameter_search() | RandomizedSearchCV for RF or XGB |
| mg.evaluation.get_important_features() | Impurity-based feature importance (RF) |
| mg.evaluation.get_permutation_importance_rf() | Permutation importance (RF) |
| mg.evaluation.get_permutation_importance_xgb() | Permutation importance (XGB) |
| mg.evaluation.calculate_mad() | MAD per bin |
| mg.evaluation.plot_test_graphs() | Scatter + KDE error plot |
| mg.evaluation.plot_comparison_graph() | Bar chart comparison across models |
| mg.bolometric.apply_bc() | Apply pre-trained bolometric correction model |
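
Among the functions listed, the conversion from apparent to absolute magnitude rests on the standard distance modulus, M = m - 5 log10(d / 10 pc). The sketch below applies that formula to a single star, assuming the distance is given in parsecs. The `absolute_magnitude` helper is hypothetical; the inputs that `calculate_abs_mag()` actually expects are not documented here.

```python
import math

def absolute_magnitude(apparent_mag, distance_pc):
    """Distance modulus: M = m - 5 * log10(d / 10 pc), with d in parsecs."""
    if distance_pc <= 0:
        raise ValueError("distance must be positive")
    return apparent_mag - 5.0 * math.log10(distance_pc / 10.0)

# A star at the 10 pc reference distance keeps its apparent magnitude.
print(absolute_magnitude(5.0, 10.0))
# Every factor of 10 in distance makes the absolute magnitude 5 mag brighter.
print(absolute_magnitude(12.0, 100.0))
```

Extinction is not handled in this sketch; in practice the apparent magnitude would be corrected for it before the conversion.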

Examples

The examples/ folder contains complete Jupyter notebooks covering the full workflow:

| Folder | Contents |
|---|---|
| data/ | Catalog creation and preprocessing |
| tuning/ | Hyperparameter search and feature importance |
| training/ | Model training, evaluation, and visualization |
| apply/ | Model application with Monte Carlo error propagation |

Citation

If you use MINAS in your research, please cite:

@software{minas,
  author  = {Meidem, Icaro},
  title   = {{MINAS}: Machine-learning INtegrated Analysis with photometric Astronomical Surveys},
  year    = {2025},
  url     = {https://github.com/icaromeidem/minas},
}

Bolometric correction reference: Jordi, C. et al. (2010). Gaia broad band photometry. A&A 523, A48.


License

MIT © Icaro Meidem

