Exoplanet candidate classification SDK V2 with stacking ensemble model (LightGBM + XGBoost + CatBoost)

These details have not been verified by PyPI

Project links

Project description

Exso-SDK Documentation

This document provides a comprehensive guide to the Exso-SDK, an exoplanet candidate classification toolkit with preprocessing, feature engineering, model training and serving capabilities.

Project Overview
Installation Guide
Module and Function Documentation
Workflow / Usage Guide
Diagrams and Flowcharts
Best Practices
FAQ / Troubleshooting

Project Overview

The Exso-SDK classifies exoplanet candidates into three classes: False Positive, Candidate, and Positive.
It offers modules for data loading, validation, preprocessing, feature computation, model training, prediction, explanation, and a Flask-based REST API.

Main Features

Data ingestion from CSV or public URLs
Data validation and cleaning
Lightcurve preprocessing
Domain and statistical feature engineering
Neural network model training and evaluation
Prediction with probability outputs
Gradient-based saliency explanations
Flask API for batch prediction
Utility functions for logging and error handling

Dependencies & Requirements

Python ≥ 3.11
pandas ≥ 1.0
numpy ≥ 1.18
scikit-learn ≥ 0.22
torch ≥ 1.7
flask ≥ 1.1
requests
llvmlite ≥ 0.44.0

Dependency declarations appear in pyproject.toml and requirements.txt. The uv.lock file locks exact versions.

Installation Guide

Follow these steps to set up Exso-SDK locally.

1. Clone the Repository

git clone https://github.com/yourname/exso-sdk.git
cd exso-sdk

2. Create Virtual Environment

python -m venv .venv
source .venv/bin/activate  # Linux/Mac
.venv\Scripts\activate     # Windows

3. Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

Alternatively, install via pyproject.toml:

pip install .

4. Configuration

MODEL_PATH: Set EXSO_MODEL_PATH environment variable to override default model file location.
API: No API key required; runs locally on port 5000 by default.

Module and Function Documentation

This section details every module, class, and function in the project.

Configuration (config.py)

Defines global constants for model path and required input schema.

Constant	Description
_PACKAGE_DIR	Absolute path to this package directory
MODEL_PATH	Path to the trained model file (`.pth`), default via envvar
REQUIRED_COLUMNS	List of numeric feature column names for model input

Data Management (data.py)

Handles dataset fetching, loading, validation, merging, and splitting.

fetch_datasets()

Download example mission CSVs and return as DataFrames.

Returns: List[pd.DataFrame]
Raises: requests.HTTPError on download failure

Example:

from exso_sdk.data import fetch_datasets
dfs = fetch_datasets()

load_csv(path)

Load a local CSV file.

Parameters:
- path (str): File path to CSV
Returns: pd.DataFrame
Raises: FileNotFoundError if file missing
Example:
```
df = load_csv("data/exoplanets.csv")
```

validate_dataset(df)

Ensure DataFrame has all required numeric columns and plausible ranges.

Parameters:
- df (pd.DataFrame): Input data
Returns: True if valid
Raises:
- ValueError if columns missing or invalid values
- TypeError if column dtype is non-numeric
Example:
```
validate_dataset(df)
```

merge_datasets(list_of_dfs)

Concatenate multiple mission DataFrames, aligning to REQUIRED_COLUMNS.

Parameters:
- list_of_dfs (List[pd.DataFrame]): DataFrames to merge
Returns: pd.DataFrame merged dataset

Example:

all_df = merge_datasets([df1, df2, df3])

split_train_val_test(df, ratios=(0.7,0.15,0.15), random_state=42)

Split dataset into train/val/test.

Parameters:
- df (pd.DataFrame): Full dataset
- ratios (tuple): Fractions summing to 1.0
- random_state (int): Seed for reproducibility
Returns: (train_df, val_df, test_df)
Raises: ValueError if ratios sum ≠ 1

Example:

train, val, test = split_train_val_test(all_df)

Data Preprocessing (preprocessing.py)

Cleans and scales raw data; handles missing values and encoding.

clean_missing(df, strategy='drop')

Fill or drop missing values.

Parameters:
- df (pd.DataFrame)
- strategy ('drop'|'fill')
Returns: pd.DataFrame cleaned
Raises: ValueError if strategy invalid

Example:

df_clean = clean_missing(df, strategy='fill')

normalize_scale(df, cols, method='standard')

Scale numeric columns by z-score or min-max.

Parameters:
- df (pd.DataFrame)
- cols (List[str]): Columns to scale
- method ('standard'|'minmax')
Returns: (df_scaled, scaler_object)
Raises: ValueError if method invalid

Example:

df_scaled, scaler = normalize_scale(df, REQUIRED_COLUMNS)

encode_categorical(df)

One-hot encode string columns.

Parameters:
- df (pd.DataFrame)
Returns: pd.DataFrame encoded
Example:
```
df_enc = encode_categorical(df)
```

preprocess_lightcurve(lc)

Detrend and resample a lightcurve.

Parameters:
- lc (pd.DataFrame): must contain time, flux
Returns: pd.DataFrame resampled with flux_detrended

Example:

lc_processed = preprocess_lightcurve(lightcurve_df)

Feature Engineering (features.py)

Compute domain-specific and statistical features.

compute_period_features(df)

Add period harmonics and simple folded stats.

Parameters:
- df (pd.DataFrame)
Returns: pd.DataFrame extended
Example:
```
df_feat = compute_period_features(df)
```

compute_statistical_features(df)

Compute skewness and kurtosis for all numeric columns.

Parameters:
- df (pd.DataFrame)
Returns: pd.DataFrame extended

Example:

df_stats = compute_statistical_features(df)

compute_domain_features(df)

Compute transit SNR and vetting flag.

Parameters:
- df (pd.DataFrame)
Returns: pd.DataFrame extended

Example:

df_domain = compute_domain_features(df)

Evaluation Metrics (metrics.py)

Evaluate classification performance.

compute_metrics(y_true, y_pred)

Return common metrics (accuracy, precision, recall, f1, auc).

Parameters:
- y_true (array-like)
- y_pred (array-like)
Returns: dict of metric values

Example:

stats = compute_metrics(y_true, y_pred)

Modeling (model.py)

Defines dataset wrapper, neural network, training, evaluation, and inference.

Class: ExoplanetDataset(Dataset)

Wrap pandas DataFrame for PyTorch.

Constructor:
- df (pd.DataFrame)
- feature_cols (List[str])
- target_col (str or None)
Methods:
- __len__() → int
- __getitem__(idx) → (X[idx], y[idx]) or X[idx]

Example:

from exso_sdk.model import ExoplanetDataset
dataset = ExoplanetDataset(df, REQUIRED_COLUMNS, target_col='label')

Class: SimpleNN(nn.Module)

Feed-forward network with two hidden layers.

Constructor:
- input_dim (int)
- hidden_dim (int, default=64)
- num_classes (int, default=3)
Method:
- forward(x) → logits

build_model(input_dim, config=None)

Instantiate SimpleNN.

Parameters:
- input_dim (int)
- config (dict with keys hidden_dim, num_classes)
Returns: SimpleNN model

Example:

model = build_model(len(REQUIRED_COLUMNS), config={'hidden_dim':128})

train_model(model, train_loader, val_loader, config)

Train model with checkpoint saving.

Parameters:
- model (nn.Module)
- train_loader (DataLoader)
- val_loader (DataLoader)
- config (dict with lr, epochs)
Returns: None (saves best model to MODEL_PATH)

Example:

train_model(model, train_loader, val_loader, {'lr':1e-3, 'epochs':5})

evaluate_model(model, data_loader)

Compute accuracy, precision, recall, f1, confusion matrix.

Parameters:
- model (nn.Module)
- data_loader (DataLoader)
Returns: dict with metrics and confusion_matrix

Example:

results = evaluate_model(model, test_loader)

predict(model, sample)

Predict a single sample.

Parameters:
- model (nn.Module)
- sample (np.ndarray or torch.Tensor)
Returns:
- pred_class (int), probs (np.ndarray)

Example:

cls, probs = predict(model, sample_vector)

save_model(model, path)

Save model state_dict.

Parameters:
- model (nn.Module)
- path (str)

Example:

save_model(model, "exoplanet_model.pth")

load_model(input_dim, path=MODEL_PATH, config=None)

Load model weights into new instance.

Parameters:
- input_dim (int)
- path (str)
- config (dict or None)
Returns: SimpleNN in eval mode

Example:

model = load_model(len(REQUIRED_COLUMNS))

Model Explanations (explain.py)

Gradient-based saliency explanation without external libs.

explain_prediction(model, sample, target_class_index=None)

Compute absolute gradient of class logit w.r.t. input features.

Parameters:
- model (nn.Module)
- sample (np.ndarray or torch.Tensor)
- target_class_index (int or None)
Returns: np.ndarray saliency map

Example:

sal = explain_prediction(model, sample_vec)

REST API (api.py)

Provides a Flask app to serve predictions.

App Initialization: loads model on startup using load_model.

Endpoint: GET `/`

Render HTML form for CSV upload.

{
  "title": "Home Page",
  "description": "Render CSV upload form",
  "method": "GET",
  "baseUrl": "http://localhost:5000",
  "endpoint": "/",
  "headers": [],
  "pathParams": [],
  "queryParams": [],
  "bodyType": "none",
  "responses": {
    "200": {
      "description": "HTML form page",
      "body": "<h1>Exoplanet Predictor</h1>…"
    }
  }
}

Endpoint: POST `/predict`

Process uploaded CSV and return predictions.

{
  "title": "Batch Prediction",
  "description": "Upload CSV and receive predictions",
  "method": "POST",
  "baseUrl": "http://localhost:5000",
  "endpoint": "/predict",
  "headers": [
    {"key":"Content-Type","value":"multipart/form-data","required":true}
  ],
  "bodyType":"form",
  "formData":[
    {"key":"file","value":"CSV file with required columns","required":true}
  ],
  "responses":{
    "200":{"description":"Success","body":"{\"results\":[…]}"},
    "400":{"description":"Bad Request","body":"{\"error\":\"No file part\"}"},
    "500":{"description":"Server Error","body":"{\"error\":\"...\"}"}
  }
}

Integration: calls validate_dataset, clean_missing, normalize_scale, and predict for each row.

Utilities (utils.py)

Utility logging and error handling.

log_metrics(run_id, metrics): Log experiment metrics via logging.info.
monitor_training(run_id): Placeholder for training monitoring.
handle_errors(e): Log errors via logging.error.

Basic Test Script (test_basic.py)

Simple script to predict one sample from a dict.

predict_single_sample(sample):
- Converts dict → DataFrame
- Cleans, scales, loads model, predicts, and prints results.
Entry Point: Executed when __name__ == '__main__'.

Package Initialization (init.py)

Package docstring; prevents side-effect imports.

Workflow / Usage Guide

1. Data Preparation

Gather mission CSVs with REQUIRED_COLUMNS.
Optionally call fetch_datasets() for examples.
Use merge_datasets() to combine missions.

2. Data Validation & Cleaning

validate_dataset(df)
df_clean = clean_missing(df, strategy='fill')

3. Feature Engineering

from exso_sdk.features import (
    compute_period_features,
    compute_statistical_features,
    compute_domain_features
)
df_feat = compute_period_features(df_clean)
df_feat = compute_statistical_features(df_feat)
df_feat = compute_domain_features(df_feat)

4. Train/Test Split

train_df, val_df, test_df = split_train_val_test(df_feat)

5. Model Training

from torch.utils.data import DataLoader
train_loader = DataLoader(ExoplanetDataset(train_df,…), batch_size=32)
val_loader   = DataLoader(ExoplanetDataset(val_df,…), batch_size=32)
train_model(model, train_loader, val_loader, {'lr':1e-3, 'epochs':10})

6. Evaluation

test_loader = DataLoader(ExoplanetDataset(test_df,…), batch_size=32)
results = evaluate_model(model, test_loader)

7. Batch Prediction via CLI or API

CLI: use test_basic.py
API: start server and POST /predict with CSV

Diagrams and Flowcharts

Data & API Flow

Model Class Relationships

classDiagram
  class ExoplanetDataset {
    +__init__(df, feature_cols, target_col=None)
    +__len__()
    +__getitem__(idx)
  }
  class SimpleNN {
    +__init__(input_dim, hidden_dim, num_classes)
    +forward(x)
  }
  ExoplanetDataset --> SimpleNN : provides batched data
  SimpleNN <|-- build_model

Best Practices

Extend Features: add new feature functions in features.py and integrate in pipeline.
Custom Preprocessing: override clean_missing or add new strategies.
Model Tuning: adjust hidden_dim, learning rate, and epochs via train_model config.
Logging: call log_metrics() inside training loops.
Error Handling: wrap calls with handle_errors(e) from utils.py.

FAQ / Troubleshooting

Q: ValueError: Missing required columns

Ensure CSV has all names in REQUIRED_COLUMNS (see config.py).

Q: API returns 400 “No file part”

Send form-data key as file.

Q: GPU unavailable

Model falls back to CPU. Remove CUDA code if unsupported.

Q: Version conflicts

Use provided uv.lock to lock dependencies. Recreate venv and pip install ..

Happy Exoplanet Hunting!

Exso-SDK Documentation

This document provides a comprehensive guide to the Exso-SDK, an exoplanet candidate classification toolkit with preprocessing, feature engineering, model training and serving capabilities.

Project Overview
Installation Guide
Module and Function Documentation
Workflow / Usage Guide
Diagrams and Flowcharts
Best Practices
FAQ / Troubleshooting

Project Overview

Main Features

Data ingestion from CSV or public URLs
Data validation and cleaning
Lightcurve preprocessing
Domain and statistical feature engineering
Neural network model training and evaluation
Prediction with probability outputs
Gradient-based saliency explanations
Flask API for batch prediction
Utility functions for logging and error handling

Dependencies & Requirements

Python ≥ 3.11
pandas ≥ 1.0
numpy ≥ 1.18
scikit-learn ≥ 0.22
torch ≥ 1.7
flask ≥ 1.1
requests
llvmlite ≥ 0.44.0

Dependency declarations appear in pyproject.toml and requirements.txt. The uv.lock file locks exact versions.

Installation Guide

Follow these steps to set up Exso-SDK locally.

1. Clone the Repository

git clone https://github.com/yourname/exso-sdk.git
cd exso-sdk

2. Create Virtual Environment

python -m venv .venv
source .venv/bin/activate  # Linux/Mac
.venv\Scripts\activate     # Windows

3. Install Dependencies

pip install --upgrade pip
pip install -r requirements.txt

Alternatively, install via pyproject.toml:

pip install .

4. Configuration

MODEL_PATH: Set EXSO_MODEL_PATH environment variable to override default model file location.
API: No API key required; runs locally on port 5000 by default.

Module and Function Documentation

This section details every module, class, and function in the project.

Configuration (config.py)

Defines global constants for model path and required input schema.

Constant	Description
_PACKAGE_DIR	Absolute path to this package directory
MODEL_PATH	Path to the trained model file (`.pth`), default via envvar
REQUIRED_COLUMNS	List of numeric feature column names for model input

Data Management (data.py)

Handles dataset fetching, loading, validation, merging, and splitting.

fetch_datasets()

Download example mission CSVs and return as DataFrames.

Returns: List[pd.DataFrame]
Raises: requests.HTTPError on download failure

Example:

from exso_sdk.data import fetch_datasets
dfs = fetch_datasets()

load_csv(path)

Load a local CSV file.

Parameters:
- path (str): File path to CSV
Returns: pd.DataFrame
Raises: FileNotFoundError if file missing
Example:
```
df = load_csv("data/exoplanets.csv")
```

validate_dataset(df)

Ensure DataFrame has all required numeric columns and plausible ranges.

Parameters:
- df (pd.DataFrame): Input data
Returns: True if valid
Raises:
- ValueError if columns missing or invalid values
- TypeError if column dtype is non-numeric
Example:
```
validate_dataset(df)
```

merge_datasets(list_of_dfs)

Concatenate multiple mission DataFrames, aligning to REQUIRED_COLUMNS.

Parameters:
- list_of_dfs (List[pd.DataFrame]): DataFrames to merge
Returns: pd.DataFrame merged dataset

Example:

all_df = merge_datasets([df1, df2, df3])

split_train_val_test(df, ratios=(0.7,0.15,0.15), random_state=42)

Split dataset into train/val/test.

Parameters:
- df (pd.DataFrame): Full dataset
- ratios (tuple): Fractions summing to 1.0
- random_state (int): Seed for reproducibility
Returns: (train_df, val_df, test_df)
Raises: ValueError if ratios sum ≠ 1

Example:

train, val, test = split_train_val_test(all_df)

Data Preprocessing (preprocessing.py)

Cleans and scales raw data; handles missing values and encoding.

clean_missing(df, strategy='drop')

Fill or drop missing values.

Parameters:
- df (pd.DataFrame)
- strategy ('drop'|'fill')
Returns: pd.DataFrame cleaned
Raises: ValueError if strategy invalid

Example:

df_clean = clean_missing(df, strategy='fill')

normalize_scale(df, cols, method='standard')

Scale numeric columns by z-score or min-max.

Parameters:
- df (pd.DataFrame)
- cols (List[str]): Columns to scale
- method ('standard'|'minmax')
Returns: (df_scaled, scaler_object)
Raises: ValueError if method invalid

Example:

df_scaled, scaler = normalize_scale(df, REQUIRED_COLUMNS)

encode_categorical(df)

One-hot encode string columns.

Parameters:
- df (pd.DataFrame)
Returns: pd.DataFrame encoded
Example:
```
df_enc = encode_categorical(df)
```

preprocess_lightcurve(lc)

Detrend and resample a lightcurve.

Parameters:
- lc (pd.DataFrame): must contain time, flux
Returns: pd.DataFrame resampled with flux_detrended

Example:

lc_processed = preprocess_lightcurve(lightcurve_df)

Feature Engineering (features.py)

Compute domain-specific and statistical features.

compute_period_features(df)

Add period harmonics and simple folded stats.

Parameters:
- df (pd.DataFrame)
Returns: pd.DataFrame extended
Example:
```
df_feat = compute_period_features(df)
```

compute_statistical_features(df)

Compute skewness and kurtosis for all numeric columns.

Parameters:
- df (pd.DataFrame)
Returns: pd.DataFrame extended

Example:

df_stats = compute_statistical_features(df)

compute_domain_features(df)

Compute transit SNR and vetting flag.

Parameters:
- df (pd.DataFrame)
Returns: pd.DataFrame extended

Example:

df_domain = compute_domain_features(df)

Evaluation Metrics (metrics.py)

Evaluate classification performance.

compute_metrics(y_true, y_pred)

Return common metrics (accuracy, precision, recall, f1, auc).

Parameters:
- y_true (array-like)
- y_pred (array-like)
Returns: dict of metric values

Example:

stats = compute_metrics(y_true, y_pred)

Modeling (model.py)

Defines dataset wrapper, neural network, training, evaluation, and inference.

Class: ExoplanetDataset(Dataset)

Wrap pandas DataFrame for PyTorch.

Constructor:
- df (pd.DataFrame)
- feature_cols (List[str])
- target_col (str or None)
Methods:
- __len__() → int
- __getitem__(idx) → (X[idx], y[idx]) or X[idx]

Example:

from exso_sdk.model import ExoplanetDataset
dataset = ExoplanetDataset(df, REQUIRED_COLUMNS, target_col='label')

Class: SimpleNN(nn.Module)

Feed-forward network with two hidden layers.

Constructor:
- input_dim (int)
- hidden_dim (int, default=64)
- num_classes (int, default=3)
Method:
- forward(x) → logits

build_model(input_dim, config=None)

Instantiate SimpleNN.

Parameters:
- input_dim (int)
- config (dict with keys hidden_dim, num_classes)
Returns: SimpleNN model

Example:

model = build_model(len(REQUIRED_COLUMNS), config={'hidden_dim':128})

train_model(model, train_loader, val_loader, config)

Train model with checkpoint saving.

Parameters:
- model (nn.Module)
- train_loader (DataLoader)
- val_loader (DataLoader)
- config (dict with lr, epochs)
Returns: None (saves best model to MODEL_PATH)

Example:

train_model(model, train_loader, val_loader, {'lr':1e-3, 'epochs':5})

evaluate_model(model, data_loader)

Compute accuracy, precision, recall, f1, confusion matrix.

Parameters:
- model (nn.Module)
- data_loader (DataLoader)
Returns: dict with metrics and confusion_matrix

Example:

results = evaluate_model(model, test_loader)

predict(model, sample)

Predict a single sample.

Parameters:
- model (nn.Module)
- sample (np.ndarray or torch.Tensor)
Returns:
- pred_class (int), probs (np.ndarray)

Example:

cls, probs = predict(model, sample_vector)

save_model(model, path)

Save model state_dict.

Parameters:
- model (nn.Module)
- path (str)

Example:

save_model(model, "exoplanet_model.pth")

load_model(input_dim, path=MODEL_PATH, config=None)

Load model weights into new instance.

Parameters:
- input_dim (int)
- path (str)
- config (dict or None)
Returns: SimpleNN in eval mode

Example:

model = load_model(len(REQUIRED_COLUMNS))

Model Explanations (explain.py)

Gradient-based saliency explanation without external libs.

explain_prediction(model, sample, target_class_index=None)

Compute absolute gradient of class logit w.r.t. input features.

Parameters:
- model (nn.Module)
- sample (np.ndarray or torch.Tensor)
- target_class_index (int or None)
Returns: np.ndarray saliency map

Example:

sal = explain_prediction(model, sample_vec)

REST API (api.py)

Provides a Flask app to serve predictions.

App Initialization: loads model on startup using load_model.

Endpoint: GET `/`

Render HTML form for CSV upload.

{
  "title": "Home Page",
  "description": "Render CSV upload form",
  "method": "GET",
  "baseUrl": "http://localhost:5000",
  "endpoint": "/",
  "headers": [],
  "pathParams": [],
  "queryParams": [],
  "bodyType": "none",
  "responses": {
    "200": {
      "description": "HTML form page",
      "body": "<h1>Exoplanet Predictor</h1>…"
    }
  }
}

Endpoint: POST `/predict`

Process uploaded CSV and return predictions.

{
  "title": "Batch Prediction",
  "description": "Upload CSV and receive predictions",
  "method": "POST",
  "baseUrl": "http://localhost:5000",
  "endpoint": "/predict",
  "headers": [
    {"key":"Content-Type","value":"multipart/form-data","required":true}
  ],
  "bodyType":"form",
  "formData":[
    {"key":"file","value":"CSV file with required columns","required":true}
  ],
  "responses":{
    "200":{"description":"Success","body":"{\"results\":[…]}"},
    "400":{"description":"Bad Request","body":"{\"error\":\"No file part\"}"},
    "500":{"description":"Server Error","body":"{\"error\":\"...\"}"}
  }
}

Integration: calls validate_dataset, clean_missing, normalize_scale, and predict for each row.

Utilities (utils.py)

Utility logging and error handling.

log_metrics(run_id, metrics): Log experiment metrics via logging.info.
monitor_training(run_id): Placeholder for training monitoring.
handle_errors(e): Log errors via logging.error.

Basic Test Script (test_basic.py)

Simple script to predict one sample from a dict.

predict_single_sample(sample):
- Converts dict → DataFrame
- Cleans, scales, loads model, predicts, and prints results.
Entry Point: Executed when __name__ == '__main__'.

Package Initialization (init.py)

Package docstring; prevents side-effect imports.

Workflow / Usage Guide

1. Data Preparation

Gather mission CSVs with REQUIRED_COLUMNS.
Optionally call fetch_datasets() for examples.
Use merge_datasets() to combine missions.

2. Data Validation & Cleaning

validate_dataset(df)
df_clean = clean_missing(df, strategy='fill')

3. Feature Engineering

from exso_sdk.features import (
    compute_period_features,
    compute_statistical_features,
    compute_domain_features
)
df_feat = compute_period_features(df_clean)
df_feat = compute_statistical_features(df_feat)
df_feat = compute_domain_features(df_feat)

4. Train/Test Split

train_df, val_df, test_df = split_train_val_test(df_feat)

5. Model Training

from torch.utils.data import DataLoader
train_loader = DataLoader(ExoplanetDataset(train_df,…), batch_size=32)
val_loader   = DataLoader(ExoplanetDataset(val_df,…), batch_size=32)
train_model(model, train_loader, val_loader, {'lr':1e-3, 'epochs':10})

6. Evaluation

test_loader = DataLoader(ExoplanetDataset(test_df,…), batch_size=32)
results = evaluate_model(model, test_loader)

7. Batch Prediction via CLI or API

CLI: use test_basic.py
API: start server and POST /predict with CSV

Diagrams and Flowcharts

Data & API Flow

Model Class Relationships

classDiagram
  class ExoplanetDataset {
    +__init__(df, feature_cols, target_col=None)
    +__len__()
    +__getitem__(idx)
  }
  class SimpleNN {
    +__init__(input_dim, hidden_dim, num_classes)
    +forward(x)
  }
  ExoplanetDataset --> SimpleNN : provides batched data
  SimpleNN <|-- build_model

Best Practices

Extend Features: add new feature functions in features.py and integrate in pipeline.
Custom Preprocessing: override clean_missing or add new strategies.
Model Tuning: adjust hidden_dim, learning rate, and epochs via train_model config.
Logging: call log_metrics() inside training loops.
Error Handling: wrap calls with handle_errors(e) from utils.py.

FAQ / Troubleshooting

Q: ValueError: Missing required columns

Ensure CSV has all names in REQUIRED_COLUMNS (see config.py).

Q: API returns 400 “No file part”

Send form-data key as file.

Q: GPU unavailable

Model falls back to CPU. Remove CUDA code if unsupported.

Q: Version conflicts

Use provided uv.lock to lock dependencies. Recreate venv and pip install ..

Happy Exoplanet Hunting!

🧪 API Testing Suite

The tests/ directory contains a comprehensive testing suite for the Exo-SDK API using the published PyPI package:

Quick Start

cd tests/
python install_package.py  # Install from PyPI
python run_tests.py        # Automated testing

Test Files

test_api.py - Comprehensive test suite covering all endpoints
quick_api_test.py - Simple quick test for basic functionality
start_api_server.py - Script to start the API server
run_tests.py - Test runner with automatic server management
install_package.py - Script to install the exso-sdk package from PyPI
API_TESTING_GUIDE.md - Detailed testing documentation

Test Coverage

✅ All API endpoints (health, info, predict, feature importance)
✅ JSON and CSV data formats
✅ Batch predictions and error handling
✅ Performance testing and web interface
✅ Feature importance analysis

See tests/README.md for detailed testing instructions.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

2.0.1

Oct 2, 2025

2.0.0

Oct 2, 2025

0.1.5

Sep 19, 2025

0.1.4

Sep 19, 2025

0.1.3

Sep 19, 2025

0.1.2

Sep 19, 2025

0.1.1

Sep 18, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

exso_sdk-2.0.1.tar.gz (14.0 MB view details)

Uploaded Oct 2, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

exso_sdk-2.0.1-py3-none-any.whl (8.6 MB view details)

Uploaded Oct 2, 2025 Python 3

File details

Details for the file exso_sdk-2.0.1.tar.gz.

File metadata

Download URL: exso_sdk-2.0.1.tar.gz
Upload date: Oct 2, 2025
Size: 14.0 MB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for exso_sdk-2.0.1.tar.gz
Algorithm	Hash digest
SHA256	`a9c4b396ab556e9ea2841f78ced12d8855fc664a591fd730b5ed3e4ce046e978`
MD5	`ec76e0c48d1e07e0f6685a83a2f6aa6d`
BLAKE2b-256	`40de5e0e3cbe7d94ab7421f157debe008ad6bee9a13ea847190f12f7f6f13c4f`

See more details on using hashes here.

File details

Details for the file exso_sdk-2.0.1-py3-none-any.whl.

File metadata

Download URL: exso_sdk-2.0.1-py3-none-any.whl
Upload date: Oct 2, 2025
Size: 8.6 MB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.11.13

File hashes

Hashes for exso_sdk-2.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`81d96503b849e7b100bf955284125ad61ddc0dfa0103d79ee30c323c58488c33`
MD5	`4c5a6f13734e4f773248a1a78fb1f1bb`
BLAKE2b-256	`977582ec303cae7705fa5a12c8c2e434c7e55edc8d50fc2e6441d2ea38c02287`

See more details on using hashes here.

exso-sdk 2.0.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Exso-SDK Documentation

Table of Contents

Project Overview

Installation Guide

1. Clone the Repository

2. Create Virtual Environment

3. Install Dependencies

4. Configuration

Module and Function Documentation

Configuration (config.py)

Data Management (data.py)

fetch_datasets()

load_csv(path)

validate_dataset(df)

merge_datasets(list_of_dfs)

split_train_val_test(df, ratios=(0.7,0.15,0.15), random_state=42)

Data Preprocessing (preprocessing.py)

clean_missing(df, strategy='drop')

normalize_scale(df, cols, method='standard')

encode_categorical(df)

preprocess_lightcurve(lc)

Feature Engineering (features.py)

compute_period_features(df)

compute_statistical_features(df)

compute_domain_features(df)

Evaluation Metrics (metrics.py)

compute_metrics(y_true, y_pred)

Modeling (model.py)

Class: ExoplanetDataset(Dataset)

Class: SimpleNN(nn.Module)

build_model(input_dim, config=None)

train_model(model, train_loader, val_loader, config)

evaluate_model(model, data_loader)

predict(model, sample)

save_model(model, path)

load_model(input_dim, path=MODEL_PATH, config=None)

Model Explanations (explain.py)

explain_prediction(model, sample, target_class_index=None)

REST API (api.py)

Endpoint: GET /

Endpoint: POST /predict

Utilities (utils.py)

Basic Test Script (test_basic.py)

Package Initialization (init.py)

Workflow / Usage Guide

1. Data Preparation

2. Data Validation & Cleaning

3. Feature Engineering

4. Train/Test Split

5. Model Training

6. Evaluation

7. Batch Prediction via CLI or API

Diagrams and Flowcharts

Data & API Flow

Model Class Relationships

Best Practices

FAQ / Troubleshooting

Exso-SDK Documentation

Table of Contents

Project Overview

Installation Guide

1. Clone the Repository

2. Create Virtual Environment

3. Install Dependencies

4. Configuration

Module and Function Documentation

Configuration (config.py)

Data Management (data.py)

fetch_datasets()

load_csv(path)

validate_dataset(df)

Endpoint: GET `/`

Endpoint: POST `/predict`

Endpoint: GET `/`

Endpoint: POST `/predict`