Exoplanet candidate classification SDK V2 with stacking ensemble model (LightGBM + XGBoost + CatBoost)

Exso-SDK Documentation

This document provides a comprehensive guide to the Exso-SDK, an exoplanet candidate classification toolkit with preprocessing, feature engineering, model training and serving capabilities.

Project Overview

The Exso-SDK classifies exoplanet candidates into three classes: False Positive, Candidate, and Positive.
It offers modules for data loading, validation, preprocessing, feature computation, model training, prediction, explanation, and a Flask-based REST API.

Main Features

  • Data ingestion from CSV or public URLs
  • Data validation and cleaning
  • Lightcurve preprocessing
  • Domain and statistical feature engineering
  • Neural network model training and evaluation
  • Prediction with probability outputs
  • Gradient-based saliency explanations
  • Flask API for batch prediction
  • Utility functions for logging and error handling

Dependencies & Requirements

  • Python ≥ 3.11
  • pandas ≥ 1.0
  • numpy ≥ 1.18
  • scikit-learn ≥ 0.22
  • torch ≥ 1.7
  • flask ≥ 1.1
  • requests
  • llvmlite ≥ 0.44.0

Dependency declarations appear in pyproject.toml and requirements.txt. The uv.lock file locks exact versions.


Installation Guide

Follow these steps to set up Exso-SDK locally.

1. Clone the Repository

git clone https://github.com/yourname/exso-sdk.git
cd exso-sdk

2. Create Virtual Environment

python -m venv .venv
source .venv/bin/activate  # Linux/Mac
.venv\Scripts\activate     # Windows

3. Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

Alternatively, install via pyproject.toml:

pip install .

4. Configuration

  • MODEL_PATH: Set the EXSO_MODEL_PATH environment variable to override the default model file location.
  • API: No API key required; runs locally on port 5000 by default.

Module and Function Documentation

This section details every module, class, and function in the project.


Configuration (config.py)

Defines global constants for model path and required input schema.

  • _PACKAGE_DIR: Absolute path to the package directory
  • MODEL_PATH: Path to the trained model file (.pth); can be overridden with the EXSO_MODEL_PATH environment variable
  • REQUIRED_COLUMNS: List of numeric feature column names required as model input
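The EXSO_MODEL_PATH override can be sketched as follows. This is a minimal illustration, not the contents of config.py: the default file name exoplanet_model.pth is borrowed from the save_model example, and resolve_model_path is a hypothetical helper.

```python
import os
from pathlib import Path

# Directory holding this module, standing in for _PACKAGE_DIR
# (falls back to the working directory when there is no __file__, e.g. in a REPL)
_PACKAGE_DIR = Path(__file__).resolve().parent if "__file__" in globals() else Path.cwd()

# Hypothetical default file name, borrowed from the save_model example
_DEFAULT_MODEL_FILE = "exoplanet_model.pth"


def resolve_model_path(env=None) -> str:
    """Return EXSO_MODEL_PATH when it is set and non-empty, else the packaged default."""
    env = os.environ if env is None else env
    override = env.get("EXSO_MODEL_PATH")
    if override:
        return str(Path(override).expanduser())
    return str(_PACKAGE_DIR / _DEFAULT_MODEL_FILE)


MODEL_PATH = resolve_model_path()
```

Passing the environment mapping explicitly keeps the lookup testable without mutating os.environ.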

Data Management (data.py)

Handles dataset fetching, loading, validation, merging, and splitting.

fetch_datasets()

Download example mission CSVs and return as DataFrames.

  • Returns: List[pd.DataFrame]
  • Raises: requests.HTTPError on download failure
  • Example:
    from exso_sdk.data import fetch_datasets
    dfs = fetch_datasets()
    

load_csv(path)

Load a local CSV file.

  • Parameters:
    • path (str): File path to CSV
  • Returns: pd.DataFrame
  • Raises: FileNotFoundError if file missing
  • Example:
    df = load_csv("data/exoplanets.csv")
    

validate_dataset(df)

Ensure DataFrame has all required numeric columns and plausible ranges.

  • Parameters:
    • df (pd.DataFrame): Input data
  • Returns: True if valid
  • Raises:
    • ValueError if columns missing or invalid values
    • TypeError if column dtype is non-numeric
  • Example:
    validate_dataset(df)
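A minimal sketch of the checks validate_dataset performs, assuming a hypothetical three-column schema (period, duration, depth) and an assumed plausibility rule (observed values must be finite and non-negative). The real REQUIRED_COLUMNS and range limits may differ. Missing values are skipped here and left to clean_missing.

```python
import numpy as np
import pandas as pd

# Hypothetical schema; the real list lives in config.py
REQUIRED_COLUMNS = ["period", "duration", "depth"]


def validate_dataset(df: pd.DataFrame, required=REQUIRED_COLUMNS) -> bool:
    # 1. Every required column must be present
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValueError(f"Missing required columns: {missing}")
    for col in required:
        # 2. Each required column must have a numeric dtype
        if not pd.api.types.is_numeric_dtype(df[col]):
            raise TypeError(f"Column {col!r} is not numeric")
        # 3. Assumed plausibility rule: observed values are finite and non-negative
        values = df[col].dropna().to_numpy(dtype=float)
        if not np.isfinite(values).all() or (values < 0).any():
            raise ValueError(f"Column {col!r} has implausible values")
    return True
```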
    

merge_datasets(list_of_dfs)

Concatenate multiple mission DataFrames, aligning to REQUIRED_COLUMNS.

  • Parameters:
    • list_of_dfs (List[pd.DataFrame]): DataFrames to merge
  • Returns: pd.DataFrame merged dataset
  • Example:
    all_df = merge_datasets([df1, df2, df3])
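One way merge_datasets could align missions to a shared schema, sketched under the assumption that "aligning" means keeping only REQUIRED_COLUMNS in a fixed order (the column names below are hypothetical):

```python
import pandas as pd

REQUIRED_COLUMNS = ["period", "duration", "depth"]  # hypothetical schema


def merge_datasets(list_of_dfs, columns=REQUIRED_COLUMNS) -> pd.DataFrame:
    # reindex keeps only the required columns, in a fixed order;
    # columns a mission lacks are created and filled with NaN
    aligned = [df.reindex(columns=columns) for df in list_of_dfs]
    # ignore_index gives the merged frame a fresh 0..n-1 index
    return pd.concat(aligned, ignore_index=True)
```

Columns that one mission lacks become NaN in the merged frame, which is one reason the cleaning step follows merging.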
    

split_train_val_test(df, ratios=(0.7,0.15,0.15), random_state=42)

Split dataset into train/val/test.

  • Parameters:
    • df (pd.DataFrame): Full dataset
    • ratios (tuple): Fractions summing to 1.0
    • random_state (int): Seed for reproducibility
  • Returns: (train_df, val_df, test_df)
  • Raises: ValueError if ratios sum ≠ 1
  • Example:
    train, val, test = split_train_val_test(all_df)
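The split can be sketched as a seeded shuffle followed by cuts at the ratio boundaries. This is an illustrative implementation, not necessarily how data.py does it. The ratio check uses a tolerance because fractions that sum to 1 on paper can miss 1.0 slightly in floating point.

```python
import math

import numpy as np
import pandas as pd


def split_train_val_test(df: pd.DataFrame, ratios=(0.7, 0.15, 0.15), random_state=42):
    if len(ratios) != 3 or not math.isclose(sum(ratios), 1.0):
        raise ValueError("ratios must be three fractions summing to 1")
    # Shuffle row positions reproducibly
    rng = np.random.default_rng(random_state)
    order = rng.permutation(len(df))
    # Sizes of the train and validation blocks; the test block gets the remainder
    n_train = round(ratios[0] * len(df))
    n_val = round(ratios[1] * len(df))
    train = df.iloc[order[:n_train]]
    val = df.iloc[order[n_train:n_train + n_val]]
    test = df.iloc[order[n_train + n_val:]]
    return train, val, test
```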
    

Data Preprocessing (preprocessing.py)

Cleans and scales raw data; handles missing values and encoding.

clean_missing(df, strategy='drop')

Fill or drop missing values.

  • Parameters:
    • df (pd.DataFrame)
    • strategy ('drop'|'fill')
  • Returns: pd.DataFrame cleaned
  • Raises: ValueError if strategy invalid
  • Example:
    df_clean = clean_missing(df, strategy='fill')
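A sketch of the two strategies. The actual fill value is not documented, so the 'fill' rule below (per-column median for numeric columns) is an assumption.

```python
import pandas as pd


def clean_missing(df: pd.DataFrame, strategy: str = "drop") -> pd.DataFrame:
    if strategy == "drop":
        # Remove any row that has at least one missing value
        return df.dropna().reset_index(drop=True)
    if strategy == "fill":
        # Assumed fill rule: replace NaNs in numeric columns with that column's median
        numeric = df.select_dtypes(include="number").columns
        return df.fillna(df[numeric].median())
    raise ValueError(f"Unknown strategy: {strategy!r}")
```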
    

normalize_scale(df, cols, method='standard')

Scale numeric columns by z-score or min-max.

  • Parameters:
    • df (pd.DataFrame)
    • cols (List[str]): Columns to scale
    • method ('standard'|'minmax')
  • Returns: (df_scaled, scaler_object)
  • Raises: ValueError if method invalid
  • Example:
    df_scaled, scaler = normalize_scale(df, REQUIRED_COLUMNS)
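A pandas-only sketch of the two scaling methods. The documented function returns a scaler object (possibly from scikit-learn, which is a dependency); here a plain dict of per-column center and scale values stands in for it, and constant columns are guarded against division by zero.

```python
import pandas as pd


def normalize_scale(df: pd.DataFrame, cols, method: str = "standard"):
    out = df.copy()
    x = out[cols].astype(float)
    if method == "standard":
        # z-score: subtract the column mean, divide by the population std
        center, scale = x.mean(), x.std(ddof=0)
    elif method == "minmax":
        # min-max: map each column onto [0, 1]
        center, scale = x.min(), x.max() - x.min()
    else:
        raise ValueError(f"Unknown method: {method!r}")
    # Constant columns would divide by zero; scale them by 1 instead
    scale = scale.replace(0, 1.0)
    out[cols] = (x - center) / scale
    # A plain dict stands in for the fitted scaler object
    return out, {"center": center, "scale": scale}
```

Fit the scaling on the training split only and reuse the returned values on validation and test data, e.g. (val_df[cols] - params["center"]) / params["scale"], so no statistics leak from held-out rows.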
    

encode_categorical(df)

One-hot encode string columns.

  • Parameters:
    • df (pd.DataFrame)
  • Returns: pd.DataFrame encoded
  • Example:
    df_enc = encode_categorical(df)
    

preprocess_lightcurve(lc)

Detrend and resample a lightcurve.

  • Parameters:
    • lc (pd.DataFrame): must contain time and flux columns
  • Returns: resampled pd.DataFrame with a flux_detrended column
  • Example:
    lc_processed = preprocess_lightcurve(lightcurve_df)
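The detrending and resampling steps can be sketched as below. The method is an assumption (a centered rolling-median baseline divided out of the flux, then linear interpolation onto an evenly spaced time grid), and window and n_points are hypothetical parameters not in the documented signature.

```python
import numpy as np
import pandas as pd


def preprocess_lightcurve(lc: pd.DataFrame, window: int = 101, n_points: int = 1000) -> pd.DataFrame:
    # Drop samples with missing time or flux, then order by time
    lc = lc.dropna(subset=["time", "flux"]).sort_values("time")
    # Detrend: divide out a centered rolling-median baseline (assumed method)
    baseline = lc["flux"].rolling(window, center=True, min_periods=1).median()
    detrended = (lc["flux"] / baseline).to_numpy()
    # Resample onto an evenly spaced time grid by linear interpolation
    t = lc["time"].to_numpy(dtype=float)
    grid = np.linspace(t.min(), t.max(), n_points)
    return pd.DataFrame({"time": grid, "flux_detrended": np.interp(grid, t, detrended)})
```

A rolling median is robust to short transit dips, so the dips survive in flux_detrended while slow stellar or instrumental trends are removed.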
    

Feature Engineering (features.py)

Compute domain-specific and statistical features.

compute_period_features(df)

Add period harmonics and simple folded statistics.

  • Parameters:
    • df (pd.DataFrame)
  • Returns: pd.DataFrame extended
  • Example:
    df_feat = compute_period_features(df)
    

compute_statistical_features(df)

Compute skewness and kurtosis for all numeric columns.

  • Parameters:
    • df (pd.DataFrame)
  • Returns: pd.DataFrame extended
  • Example:
    df_stats = compute_statistical_features(df)
    

compute_domain_features(df)

Compute the transit signal-to-noise ratio (SNR) and a vetting flag.

  • Parameters:
    • df (pd.DataFrame)
  • Returns: pd.DataFrame extended
  • Example:
    df_domain = compute_domain_features(df)
    

Evaluation Metrics (metrics.py)

Evaluate classification performance.

compute_metrics(y_true, y_pred)

Return common metrics (accuracy, precision, recall, f1, auc).

  • Parameters:
    • y_true (array-like)
    • y_pred (array-like)
  • Returns: dict of metric values
  • Example:
    stats = compute_metrics(y_true, y_pred)
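A NumPy sketch of the label-based metrics with macro averaging over the three classes. The averaging scheme used by metrics.py is not documented, so macro averaging is an assumption; AUC is omitted because it needs predicted probabilities rather than hard labels.

```python
import numpy as np


def compute_metrics(y_true, y_pred, num_classes: int = 3) -> dict:
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    # Confusion matrix: rows = true class, columns = predicted class
    cm = np.zeros((num_classes, num_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    tp = np.diag(cm).astype(float)
    # Per-class scores; 0/0 (class never predicted or never present) becomes 0
    with np.errstate(divide="ignore", invalid="ignore"):
        precision = np.nan_to_num(tp / cm.sum(axis=0))
        recall = np.nan_to_num(tp / cm.sum(axis=1))
        f1 = np.nan_to_num(2 * precision * recall / (precision + recall))
    # Macro averages weight all classes equally
    return {
        "accuracy": tp.sum() / len(y_true),
        "precision": precision.mean(),
        "recall": recall.mean(),
        "f1": f1.mean(),
    }
```

Macro averaging keeps a rare class (often Positive in candidate catalogs) from being drowned out by the majority class.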
    

Modeling (model.py)

Defines dataset wrapper, neural network, training, evaluation, and inference.

Class: ExoplanetDataset(Dataset)

Wrap pandas DataFrame for PyTorch.

  • Constructor:
    • df (pd.DataFrame)
    • feature_cols (List[str])
    • target_col (str or None)
  • Methods:
    • __len__() → int
    • __getitem__(idx) → (X[idx], y[idx]), or X[idx] when target_col is None

Example:

from exso_sdk.model import ExoplanetDataset
dataset = ExoplanetDataset(df, REQUIRED_COLUMNS, target_col='label')
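Any object with __len__ and __getitem__ can serve as a map-style dataset for torch.utils.data.DataLoader, so the wrapper's behavior can be sketched without PyTorch. ExoplanetDatasetSketch is a hypothetical stand-in, and the float32/int64 dtypes are assumptions chosen to match what a classifier and cross-entropy loss typically expect.

```python
import numpy as np
import pandas as pd


class ExoplanetDatasetSketch:
    """Map-style dataset: indexable and sized, so a DataLoader can batch it."""

    def __init__(self, df: pd.DataFrame, feature_cols, target_col=None):
        # Materialize features once as a float32 matrix
        self.X = df[list(feature_cols)].to_numpy(dtype=np.float32)
        # Labels are optional, so the same class serves inference
        self.y = None if target_col is None else df[target_col].to_numpy(dtype=np.int64)

    def __len__(self) -> int:
        return len(self.X)

    def __getitem__(self, idx):
        if self.y is None:
            return self.X[idx]
        return self.X[idx], self.y[idx]
```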

Class: SimpleNN(nn.Module)

Feed-forward network with two hidden layers.

  • Constructor:
    • input_dim (int)
    • hidden_dim (int, default=64)
    • num_classes (int, default=3)
  • Method:
    • forward(x) → logits

build_model(input_dim, config=None)

Instantiate SimpleNN.

  • Parameters:
    • input_dim (int)
    • config (dict with keys hidden_dim, num_classes)
  • Returns: SimpleNN model
  • Example:
    model = build_model(len(REQUIRED_COLUMNS), config={'hidden_dim':128})
    

train_model(model, train_loader, val_loader, config)

Train the model, saving a checkpoint of the best weights.

  • Parameters:
    • model (nn.Module)
    • train_loader (DataLoader)
    • val_loader (DataLoader)
    • config (dict with lr, epochs)
  • Returns: None (saves best model to MODEL_PATH)
  • Example:
    train_model(model, train_loader, val_loader, {'lr':1e-3, 'epochs':5})
    

evaluate_model(model, data_loader)

Compute accuracy, precision, recall, f1, confusion matrix.

  • Parameters:
    • model (nn.Module)
    • data_loader (DataLoader)
  • Returns: dict with metrics and confusion_matrix
  • Example:
    results = evaluate_model(model, test_loader)
    

predict(model, sample)

Predict a single sample.

  • Parameters:
    • model (nn.Module)
    • sample (np.ndarray or torch.Tensor)
  • Returns:
    • pred_class (int), probs (np.ndarray)
  • Example:
    cls, probs = predict(model, sample_vector)
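Turning a sample's logits into the (pred_class, probs) pair that predict returns can be sketched with a numerically stable softmax. The index-to-name mapping in CLASS_NAMES is an assumption; check the label encoding used during training.

```python
import numpy as np

CLASS_NAMES = ["False Positive", "Candidate", "Positive"]  # assumed index order


def softmax(logits: np.ndarray) -> np.ndarray:
    # Subtract the max logit first so exp() cannot overflow
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()


def predict_from_logits(logits):
    # Mirror predict()'s return shape: (pred_class, probs)
    probs = softmax(np.asarray(logits, dtype=float))
    return int(probs.argmax()), probs
```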
    

save_model(model, path)

Save model state_dict.

  • Parameters:
    • model (nn.Module)
    • path (str)
  • Example:
    save_model(model, "exoplanet_model.pth")
    

load_model(input_dim, path=MODEL_PATH, config=None)

Load model weights into new instance.

  • Parameters:
    • input_dim (int)
    • path (str)
    • config (dict or None)
  • Returns: SimpleNN in eval mode
  • Example:
    model = load_model(len(REQUIRED_COLUMNS))
    

Model Explanations (explain.py)

Gradient-based saliency explanations, implemented without external explanation libraries.

explain_prediction(model, sample, target_class_index=None)

Compute the absolute gradient of the target class logit with respect to each input feature.

  • Parameters:
    • model (nn.Module)
    • sample (np.ndarray or torch.Tensor)
    • target_class_index (int or None)
  • Returns: np.ndarray saliency map
  • Example:
    sal = explain_prediction(model, sample_vec)
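To show what the saliency map measures without depending on PyTorch, the sketch below backpropagates by hand through a stand-in two-hidden-layer ReLU network with random weights. init_mlp, forward, and the default of explaining the predicted class are assumptions. In PyTorch the same quantity would typically come from autograd: enable requires_grad on the input, call backward on the chosen logit, and take the absolute value of the input's gradient.

```python
import numpy as np


def init_mlp(input_dim, hidden_dim=64, num_classes=3, seed=0):
    # Random weights for a two-hidden-layer ReLU network (stand-in for SimpleNN)
    rng = np.random.default_rng(seed)
    dims = [input_dim, hidden_dim, hidden_dim, num_classes]
    return [(rng.normal(0, 1 / np.sqrt(i), (o, i)), rng.normal(0, 0.1, o))
            for i, o in zip(dims[:-1], dims[1:])]


def forward(params, x):
    # Return logits plus the ReLU masks needed for the backward pass
    masks, h = [], x
    for W, b in params[:-1]:
        z = W @ h + b
        masks.append(z > 0)
        h = np.maximum(z, 0)
    W, b = params[-1]
    return W @ h + b, masks


def explain_prediction(params, sample, target_class_index=None):
    x = np.asarray(sample, dtype=float)
    logits, masks = forward(params, x)
    # Assumed default: explain the predicted (highest-logit) class
    k = int(logits.argmax()) if target_class_index is None else target_class_index
    # d logit_k / d h2 is row k of the output weights
    grad = params[-1][0][k]
    # Chain rule back through each hidden layer: gate by ReLU mask, multiply by W^T
    for (W, _), mask in zip(reversed(params[:-1]), reversed(masks)):
        grad = W.T @ (grad * mask)
    # Saliency = magnitude of the gradient per input feature
    return np.abs(grad)
```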
    

REST API (api.py)

Provides a Flask app to serve predictions.

  • App Initialization: loads model on startup using load_model.

Endpoint: GET /

Render HTML form for CSV upload.

{
  "title": "Home Page",
  "description": "Render CSV upload form",
  "method": "GET",
  "baseUrl": "http://localhost:5000",
  "endpoint": "/",
  "headers": [],
  "pathParams": [],
  "queryParams": [],
  "bodyType": "none",
  "responses": {
    "200": {
      "description": "HTML form page",
      "body": "<h1>Exoplanet Predictor</h1>…"
    }
  }
}

Endpoint: POST /predict

Process uploaded CSV and return predictions.

{
  "title": "Batch Prediction",
  "description": "Upload CSV and receive predictions",
  "method": "POST",
  "baseUrl": "http://localhost:5000",
  "endpoint": "/predict",
  "headers": [
    {"key":"Content-Type","value":"multipart/form-data","required":true}
  ],
  "bodyType":"form",
  "formData":[
    {"key":"file","value":"CSV file with required columns","required":true}
  ],
  "responses":{
    "200":{"description":"Success","body":"{\"results\":[…]}"},
    "400":{"description":"Bad Request","body":"{\"error\":\"No file part\"}"},
    "500":{"description":"Server Error","body":"{\"error\":\"...\"}"}
  }
}

Integration: the handler calls validate_dataset, clean_missing, and normalize_scale on the uploaded data, then calls predict for each row.
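A client-side sketch of a /predict call using requests. build_predict_request is a hypothetical helper that prepares the request without sending it, so the multipart encoding can be inspected. The upload goes in the form field named file, matching the 400 "No file part" error described in the FAQ.

```python
import requests


def build_predict_request(csv_bytes: bytes, base_url: str = "http://localhost:5000"):
    # Form field must be named "file"; requests builds the multipart body
    # and the matching Content-Type header (including the boundary) itself
    files = {"file": ("candidates.csv", csv_bytes, "text/csv")}
    return requests.Request("POST", f"{base_url}/predict", files=files).prepare()


# To actually send it (requires the API server to be running):
# with requests.Session() as session:
#     response = session.send(build_predict_request(b"...csv contents..."))
#     print(response.json()["results"])
```

Letting the client generate the Content-Type header matters: the multipart boundary in the header must match the body, so a hand-written "multipart/form-data" header without a boundary can make the server fail to find the file part.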


Utilities (utils.py)

Utility logging and error handling.

  • log_metrics(run_id, metrics): Log experiment metrics via logging.info.
  • monitor_training(run_id): Placeholder for training monitoring.
  • handle_errors(e): Log errors via logging.error.
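A sketch of the two logging helpers built on the standard logging module. The logger name, the message format, and log_metrics returning its message are assumptions; safe_call is a hypothetical wrapper showing the error-handling pattern from Best Practices (log the error, then re-raise).

```python
import logging

logger = logging.getLogger("exso_sdk.utils")  # assumed logger name


def log_metrics(run_id: str, metrics: dict) -> str:
    # Sorted keys keep log lines stable across runs
    message = f"run={run_id} " + " ".join(f"{k}={v:.4f}" for k, v in sorted(metrics.items()))
    logger.info(message)
    return message


def handle_errors(e: Exception) -> None:
    # Log the message and the exception's traceback at ERROR level
    logger.error("Unhandled error: %s", e, exc_info=e)


def safe_call(fn, *args, **kwargs):
    # Log via handle_errors, then re-raise so callers still see the failure
    try:
        return fn(*args, **kwargs)
    except Exception as e:
        handle_errors(e)
        raise
```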

Basic Test Script (test_basic.py)

Simple script to predict one sample from a dict.

  • predict_single_sample(sample):
    • Converts dict → DataFrame
    • Cleans, scales, loads model, predicts, and prints results.
  • Entry Point: Executed when __name__ == '__main__'.

Package Initialization (__init__.py)

Holds the package docstring and avoids side-effect imports.


Workflow / Usage Guide

1. Data Preparation

  • Gather mission CSVs with REQUIRED_COLUMNS.
  • Optionally call fetch_datasets() for examples.
  • Use merge_datasets() to combine missions.

2. Data Validation & Cleaning

from exso_sdk.data import validate_dataset
from exso_sdk.preprocessing import clean_missing
validate_dataset(df)
df_clean = clean_missing(df, strategy='fill')

3. Feature Engineering

from exso_sdk.features import (
    compute_period_features,
    compute_statistical_features,
    compute_domain_features
)
df_feat = compute_period_features(df_clean)
df_feat = compute_statistical_features(df_feat)
df_feat = compute_domain_features(df_feat)

4. Train/Test Split

from exso_sdk.data import split_train_val_test
train_df, val_df, test_df = split_train_val_test(df_feat)

5. Model Training

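from torch.utils.data import DataLoader
from exso_sdk.config import REQUIRED_COLUMNS
from exso_sdk.model import ExoplanetDataset, build_model, train_model
# 'label' is the target column name used in the ExoplanetDataset example
train_loader = DataLoader(ExoplanetDataset(train_df, REQUIRED_COLUMNS, target_col='label'), batch_size=32, shuffle=True)
val_loader   = DataLoader(ExoplanetDataset(val_df, REQUIRED_COLUMNS, target_col='label'), batch_size=32)
model = build_model(len(REQUIRED_COLUMNS))
train_model(model, train_loader, val_loader, {'lr':1e-3, 'epochs':10})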

6. Evaluation

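from exso_sdk.config import REQUIRED_COLUMNS
from exso_sdk.model import ExoplanetDataset, evaluate_model
test_loader = DataLoader(ExoplanetDataset(test_df, REQUIRED_COLUMNS, target_col='label'), batch_size=32)
results = evaluate_model(model, test_loader)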

7. Prediction via CLI or API

  • CLI: run test_basic.py to predict a single sample
  • API: start the server and POST a CSV file to /predict for batch prediction

Diagrams and Flowcharts

Data & API Flow

[Figure: API flowchart]

Model Class Relationships

classDiagram
  class ExoplanetDataset {
    +__init__(df, feature_cols, target_col=None)
    +__len__()
    +__getitem__(idx)
  }
  class SimpleNN {
    +__init__(input_dim, hidden_dim, num_classes)
    +forward(x)
  }
  ExoplanetDataset --> SimpleNN : provides batched data
  build_model ..> SimpleNN : instantiates

Best Practices

  • Extend Features: add new feature functions to features.py and integrate them into the pipeline.
  • Custom Preprocessing: override clean_missing or add new strategies.
  • Model Tuning: set hidden_dim through the build_model config, and the learning rate and epochs through the train_model config.
  • Logging: call log_metrics() inside training loops.
  • Error Handling: catch exceptions around SDK calls and pass them to handle_errors(e) from utils.py, which logs them.

FAQ / Troubleshooting

Q: ValueError: Missing required columns

  • Ensure CSV has all names in REQUIRED_COLUMNS (see config.py).

Q: API returns 400 “No file part”

  • Send the CSV in the form-data field named file.

Q: GPU unavailable

  • The model falls back to CPU when no GPU is available. If your environment does not support CUDA, remove any CUDA-specific code.

Q: Version conflicts

  • Use the provided uv.lock to install the exact pinned versions (for example with uv sync), or recreate the venv and run pip install . from the project root.

Happy Exoplanet Hunting!

🧪 API Testing Suite

The tests/ directory contains a comprehensive testing suite for the Exso-SDK API, which runs against the published PyPI package:

Quick Start

cd tests/
python install_package.py  # Install from PyPI
python run_tests.py        # Automated testing

Test Files

  • test_api.py - Comprehensive test suite covering all endpoints
  • quick_api_test.py - Simple quick test for basic functionality
  • start_api_server.py - Script to start the API server
  • run_tests.py - Test runner with automatic server management
  • install_package.py - Script to install the exso-sdk package from PyPI
  • API_TESTING_GUIDE.md - Detailed testing documentation

Test Coverage

✅ All API endpoints (health, info, predict, feature importance)
✅ JSON and CSV data formats
✅ Batch predictions and error handling
✅ Performance testing and web interface
✅ Feature importance analysis

See tests/README.md for detailed testing instructions.

Download files

Source Distribution

exso_sdk-2.0.1.tar.gz (14.0 MB)

Uploaded Source

Built Distribution

exso_sdk-2.0.1-py3-none-any.whl (8.6 MB)

Uploaded Python 3

File details

Details for the file exso_sdk-2.0.1.tar.gz.

File metadata

  • Download URL: exso_sdk-2.0.1.tar.gz
  • Upload date:
  • Size: 14.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for exso_sdk-2.0.1.tar.gz
Algorithm Hash digest
SHA256 a9c4b396ab556e9ea2841f78ced12d8855fc664a591fd730b5ed3e4ce046e978
MD5 ec76e0c48d1e07e0f6685a83a2f6aa6d
BLAKE2b-256 40de5e0e3cbe7d94ab7421f157debe008ad6bee9a13ea847190f12f7f6f13c4f

File details

Details for the file exso_sdk-2.0.1-py3-none-any.whl.

File metadata

  • Download URL: exso_sdk-2.0.1-py3-none-any.whl
  • Upload date:
  • Size: 8.6 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for exso_sdk-2.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 81d96503b849e7b100bf955284125ad61ddc0dfa0103d79ee30c323c58488c33
MD5 4c5a6f13734e4f773248a1a78fb1f1bb
BLAKE2b-256 977582ec303cae7705fa5a12c8c2e434c7e55edc8d50fc2e6441d2ea38c02287
