A production-ready CLI toolkit for training, evaluating, and tracking Machine Learning and Deep Learning models with experiment tracking, hyperparameter tuning, model explainability, and an interactive TUI
███╗ ███╗██╗ ██████╗██╗ ██╗
████╗ ████║██║ ██╔════╝██║ ██║
██╔████╔██║██║ ██║ ██║ ██║
██║╚██╔╝██║██║ ██║ ██║ ██║
██║ ╚═╝ ██║███████╗╚██████╗███████╗██║
╚═╝ ╚═╝╚══════╝ ╚═════╝╚══════╝╚═╝
🤖 MLCLI - Machine Learning Command Line Interface
A powerful, modular CLI tool for training, evaluating, and tracking ML/DL models
mlcli is a modular, configuration-driven command-line tool for training, evaluating, saving, and tracking both Machine Learning and Deep Learning models. It also includes an interactive terminal UI for users who prefer a guided workflow.
🚀 Features
- Train ML models:
  - Logistic Regression
  - SVM
  - Random Forest
  - XGBoost
- Train Deep Learning models:
  - TensorFlow DNN
  - CNN models
  - RNN/LSTM/GRU models
- 🆕 Hyperparameter Tuning:
  - Grid Search
  - Random Search
  - Bayesian Optimization (Optuna)
- 🆕 Model Explainability:
  - SHAP (SHapley Additive exPlanations)
  - LIME (Local Interpretable Model-agnostic Explanations)
  - Feature importance visualization
  - Instance-level explanations
- 🆕 Data Preprocessing Pipeline:
  - Scaling: StandardScaler, MinMaxScaler, RobustScaler
  - Normalization: L1, L2, Max norm
  - Encoding: LabelEncoder, OneHotEncoder, OrdinalEncoder
  - Feature Selection: SelectKBest, RFE, VarianceThreshold
  - Pipeline Support: Chain multiple preprocessors
- Unified configuration system (JSON/YAML)
- Automatic Model Registry (plug-and-play trainers)
- Model saving:
  - ML → Pickle, Joblib & ONNX
  - DL → SavedModel & H5
- Built-in experiment tracker (mini-MLflow with JSON storage)
- Interactive terminal UI (TUI)
📁 Project Structure
mlcli/
├── mlcli/
│ ├── __init__.py
│ ├── __main__.py
│ ├── cli.py
│ ├── config/
│ │ ├── __init__.py
│ │ └── loader.py
│ ├── trainers/
│ │ ├── __init__.py
│ │ ├── base_trainer.py
│ │ ├── logistic_trainer.py
│ │ ├── svm_trainer.py
│ │ ├── rf_trainer.py
│ │ ├── xgb_trainer.py
│ │ ├── tf_dnn_trainer.py
│ │ ├── tf_cnn_trainer.py
│ │ └── tf_rnn_trainer.py
│ ├── tuner/ # Hyperparameter Tuning
│ │ ├── __init__.py
│ │ ├── base_tuner.py
│ │ ├── grid_tuner.py
│ │ ├── random_tuner.py
│ │ └── optuna_tuner.py
│ ├── explainer/ # 🆕 Model Explainability
│ │ ├── __init__.py
│ │ ├── base_explainer.py
│ │ ├── shap_explainer.py
│ │ ├── lime_explainer.py
│ │ └── explainer_factory.py
│ ├── preprocessor/ # 🆕 Data Preprocessing Pipeline
│ │ ├── __init__.py
│ │ ├── base_preprocessor.py
│ │ ├── scalers.py
│ │ ├── normalizers.py
│ │ ├── encoders.py
│ │ ├── feature_selectors.py
│ │ ├── preprocessor_factory.py
│ │ └── pipeline.py
│ ├── utils/
│ │ ├── __init__.py
│ │ ├── io.py
│ │ ├── metrics.py
│ │ ├── logger.py
│ │ └── registry.py
│ ├── runner/
│ │ ├── __init__.py
│ │ └── experiment_tracker.py
│ ├── ui/
│ │ ├── __init__.py
│ │ ├── app.py
│ │ ├── screens/
│ │ └── widgets/
│ └── models/
├── configs/
├── data/
├── artifacts/
├── logs/
├── runs/
├── scripts/
├── README.md
├── pyproject.toml
└── requirements.txt
🛠️ Complete Setup Guide (From Scratch)
Step 1: Clone the Repository
git clone https://github.com/codeMaestro78/MLcli.git
cd MLcli
Step 2: Create Virtual Environment
Windows (PowerShell):
python -m venv .venv
.\.venv\Scripts\Activate.ps1
Windows (CMD):
python -m venv .venv
.\.venv\Scripts\activate.bat
Linux/macOS:
python -m venv .venv
source .venv/bin/activate
Step 3: Install Dependencies
pip install --upgrade pip
pip install -r requirements.txt
Step 4: Install mlcli in Development Mode
pip install -e .
Step 5: Verify Installation
mlcli --help
Expected Output:
Usage: mlcli [OPTIONS] COMMAND [ARGS]...
MLCLI - Machine Learning Command Line Interface
Options:
--help Show this message and exit.
Commands:
eval Evaluate a saved model on test data.
export-runs Export experiment runs to CSV.
list-models List all available model trainers.
list-runs List all experiment runs.
show-run Show details of a specific experiment run.
train Train a model using a configuration file.
ui Launch the interactive terminal UI.
📖 All CLI Commands
1. List Available Models
View all registered model trainers:
mlcli list-models
Output:
Available Model Trainers:
================================================================================
logistic_regression Logistic Regression Classifier [sklearn]
svm Support Vector Machine Classifier [sklearn]
random_forest Random Forest Classifier [sklearn]
xgboost XGBoost Gradient Boosting Classifier [xgboost]
tf_dnn TensorFlow Dense Neural Network [tensorflow]
tf_cnn TensorFlow CNN for Image Classification [tensorflow]
tf_rnn TensorFlow RNN for Sequence Data [tensorflow]
================================================================================
2. Train Models
Train with Configuration File
mlcli train --config <path-to-config.json>
Train Logistic Regression
mlcli train --config configs/logistic_config.json
Train Random Forest
mlcli train --config configs/rf_config.json
Train SVM
mlcli train --config configs/svm_config.json
Train XGBoost
mlcli train --config configs/xgb_config.json
Train TensorFlow DNN
mlcli train --config configs/tf_dnn_config.json
Train TensorFlow CNN (for image data)
mlcli train --config configs/tf_cnn_config.json
Train TensorFlow RNN (for sequence data)
mlcli train --config configs/tf_rnn_config.json
Train with Parameter Overrides
mlcli train --config configs/tf_dnn_config.json --epochs 50 --batch-size 64
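How the override flags interact with the file is not spelled out here. A minimal sketch, assuming that flags such as --epochs and --batch-size simply replace the matching keys under model.params; the helper `apply_overrides` is hypothetical and not part of mlcli:

```python
import copy

def apply_overrides(config, overrides):
    """Return a copy of config with non-None overrides written into model.params."""
    merged = copy.deepcopy(config)  # leave the loaded config untouched
    params = merged.setdefault("model", {}).setdefault("params", {})
    for key, value in overrides.items():
        if value is not None:  # a flag that was not passed stays None
            params[key] = value
    return merged

config = {"model": {"type": "tf_dnn", "params": {"epochs": 20, "batch_size": 32}}}
# Equivalent of: --epochs 50 --batch-size 64 (no learning-rate flag given)
merged = apply_overrides(config, {"epochs": 50, "batch_size": 64, "learning_rate": None})
print(merged["model"]["params"])
```

Under this assumption the values in the file act as defaults, and only the flags actually passed on the command line change the run.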
3. 🆕 Hyperparameter Tuning
Tune model hyperparameters using Grid Search, Random Search, or Bayesian Optimization.
List Available Tuning Methods
mlcli list-tuners
Output:
┏━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Method ┃ Name ┃ Best For ┃
┡━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ grid │ Grid Search │ Small parameter spaces with discrete values │
│ random │ Random Search │ Large parameter spaces, continuous params │
│ bayesian │ Bayesian Optimization (Optuna) │ Expensive evaluations, complex param spaces │
└──────────┴──────────────────────────────────┴──────────────────────────────────────────────┘
Tune with Grid Search
mlcli tune --config configs/tune_rf_config.json --method grid --cv 5
Tune with Random Search
mlcli tune --config configs/tune_rf_config.json --method random --n-trials 100 --cv 5
Tune with Bayesian Optimization (Optuna)
mlcli tune --config configs/tune_xgb_config.json --method bayesian --n-trials 200 --scoring accuracy
Tune and Train Best Model
mlcli tune --config configs/tune_rf_config.json --method random --n-trials 50 --train-best
Tune Options
| Option | Description |
|---|---|
| --config, -c | Path to tuning configuration file |
| --method, -m | Tuning method: grid, random, or bayesian |
| --n-trials, -n | Number of trials (for random/bayesian) |
| --cv | Number of cross-validation folds |
| --scoring, -s | Metric to optimize: accuracy, f1, roc_auc, precision, recall |
| --output, -o | Path to save tuning results (JSON) |
| --train-best | Train a model with best params after tuning |
4. 🆕 Model Explainability (SHAP/LIME)
Understand why your models make predictions using SHAP and LIME.
List Available Explainers
mlcli list-explainers
Output:
┏━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Method ┃ Full Name ┃ Best For ┃
┡━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ shap │ SHapley Additive exPlanations │ Tree-based models, global explanations │
│ lime │ Local Interpretable Model-agnostic Explanations │ Any model, local explanations │
└────────┴─────────────────────────────────────────────┴───────────────────────────────────────────┘
Explain Model with SHAP
mlcli explain --model models/rf_model.pkl --data data/train.csv --type random_forest --method shap
Explain Model with LIME
mlcli explain --model models/xgb_model.pkl --data data/train.csv --type xgboost --method lime
Explain with Plot Output
mlcli explain -m models/rf_model.pkl -d data/train.csv -t random_forest -e shap --plot-output feature_importance.png
Explain Single Instance
Understand why a specific prediction was made:
mlcli explain-instance --model models/rf_model.pkl --data data/test.csv --type random_forest --instance 0
mlcli explain-instance -m models/xgb_model.pkl -d data/test.csv -t xgboost -i 5 -e lime
Explainability Options
| Option | Description |
|---|---|
| --model, -m | Path to saved model file |
| --data, -d | Path to data file |
| --type, -t | Model type (random_forest, xgboost, logistic_regression) |
| --method, -e | Explanation method: shap or lime |
| --num-samples, -n | Number of samples to explain (default: 100) |
| --output, -o | Path to save explanation results (JSON) |
| --plot/--no-plot | Generate explanation plot |
| --plot-output, -p | Path to save plot (PNG) |
Understanding SHAP vs LIME
| Feature | SHAP | LIME |
|---|---|---|
| Type | Global + Local | Local |
| Theory | Game Theory (Shapley Values) | Local Surrogate Models |
| Best For | Tree models (RF, XGBoost) | Any black-box model |
| Speed | Fast for trees | Slower (samples required) |
| Consistency | Mathematically consistent | Varies by sampling |
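The "Mathematically consistent" row refers to a property of Shapley values: the per-feature contributions always add up to the difference between the prediction and a baseline prediction. The sketch below computes exact Shapley values for a toy three-feature function by averaging over all feature orderings. The function `toy_model` and the helper `exact_shapley` are illustrative only; the SHAP library uses much faster algorithms, especially for trees.

```python
from itertools import permutations

def toy_model(x):
    # Hypothetical model with interactions between features.
    return 2.0 * x[0] + x[1] * x[2] + 0.5 * x[0] * x[1]

def exact_shapley(model, instance, baseline):
    """Average each feature's marginal contribution over all orderings."""
    n = len(instance)
    contrib = [0.0] * n
    orderings = list(permutations(range(n)))
    for order in orderings:
        current = list(baseline)
        prev = model(current)
        for i in order:
            current[i] = instance[i]  # switch feature i from baseline to its real value
            value = model(current)
            contrib[i] += value - prev
            prev = value
    return [c / len(orderings) for c in contrib]

instance = [1.0, 2.0, 3.0]
baseline = [0.0, 0.0, 0.0]
phi = exact_shapley(toy_model, instance, baseline)
print(phi)
# Efficiency property: contributions sum to prediction minus baseline prediction.
print(sum(phi), toy_model(instance) - toy_model(baseline))
```

Each interaction term is split evenly between the features involved, which is why the attributions do not depend on sampling the way LIME's local surrogate does.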
5. 🆕 Data Preprocessing
Preprocess your data using various scaling, normalization, encoding, and feature selection methods.
List Available Preprocessors
mlcli list-preprocessors
Output:
┏━━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━┳━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┓
┃ Method ┃ Name ┃ Description ┃
┡━━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━╇━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━┩
│ Scaling │ │ │
│ standard_scaler │ StandardScaler │ Standardize features by removing mean and scaling to unit var │
│ minmax_scaler │ MinMaxScaler │ Scale features to a given range (default 0-1) │
│ robust_scaler │ RobustScaler │ Scale features using statistics robust to outliers (median/IQR) │
├──────────────────────┼─────────────────────┼─────────────────────────────────────────────────────────────────┤
│ Normalization │ │ │
│ normalizer │ Normalizer │ Normalize samples individually to unit norm │
│ l1_normalizer │ L1 Normalizer │ Normalize samples to L1 norm (sum of absolute values = 1) │
│ l2_normalizer │ L2 Normalizer │ Normalize samples to L2 norm (Euclidean norm = 1) │
├──────────────────────┼─────────────────────┼─────────────────────────────────────────────────────────────────┤
│ Encoding │ │ │
│ label_encoder │ LabelEncoder │ Encode target labels with values between 0 and n_classes-1 │
│ onehot_encoder │ OneHotEncoder │ Encode categorical features as one-hot numeric arrays │
│ ordinal_encoder │ OrdinalEncoder │ Encode categorical features as ordinal integers │
├──────────────────────┼─────────────────────┼─────────────────────────────────────────────────────────────────┤
│ Feature Selection │ │ │
│ select_k_best │ SelectKBest │ Select features according to the k highest scores │
│ rfe │ RFE │ Recursive Feature Elimination based on model importance │
│ variance_threshold │ VarianceThreshold │ Remove features with variance below threshold │
└──────────────────────┴─────────────────────┴─────────────────────────────────────────────────────────────────┘
Preprocess with StandardScaler
mlcli preprocess --data data/train.csv --output data/train_scaled.csv --method standard_scaler
Preprocess with MinMaxScaler
mlcli preprocess -d data/train.csv -o data/train_minmax.csv -m minmax_scaler --range-min 0 --range-max 1
Preprocess with RobustScaler (outlier-resistant)
mlcli preprocess -d data/train.csv -o data/train_robust.csv -m robust_scaler
Normalize Data (L2 norm)
mlcli preprocess -d data/train.csv -o data/train_norm.csv -m normalizer --norm l2
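The --norm choice determines which quantity of each row is rescaled to 1. A plain-Python sketch of the three norms applied to a single sample; the helper `normalize_row` is illustrative and not mlcli's code:

```python
import math

def normalize_row(row, norm="l2"):
    """Scale one sample so that its chosen norm equals 1."""
    if norm == "l1":
        scale = sum(abs(v) for v in row)
    elif norm == "l2":
        scale = math.sqrt(sum(v * v for v in row))
    elif norm == "max":
        scale = max(abs(v) for v in row)
    else:
        raise ValueError(f"unknown norm: {norm}")
    if scale == 0:
        return list(row)  # an all-zero row is left unchanged
    return [v / scale for v in row]

row = [3.0, -4.0]
l1 = normalize_row(row, "l1")    # absolute values sum to 1
l2 = normalize_row(row, "l2")    # Euclidean length is 1
mx = normalize_row(row, "max")   # largest absolute value is 1
print(l1, l2, mx)
```

Unlike the scalers, this operates per row rather than per column, so it needs no fitting step.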
Feature Selection with SelectKBest
Select top K features based on statistical tests:
mlcli preprocess -d data/train.csv -o data/train_selected.csv -m select_k_best --target label --k 10
Feature Selection with RFE
Recursive Feature Elimination using model importance:
mlcli preprocess -d data/train.csv -o data/train_rfe.csv -m rfe --target label --k 15
Remove Low-Variance Features
mlcli preprocess -d data/train.csv -o data/train_var.csv -m variance_threshold --threshold 0.1
Save Fitted Preprocessor
mlcli preprocess -d data/train.csv -o data/train_scaled.csv -m standard_scaler --save-preprocessor models/scaler.pkl
Apply Preprocessing Pipeline (Multiple Steps)
mlcli preprocess-pipeline --data data/train.csv --output data/processed.csv --steps "standard_scaler,select_k_best" --target label
Preprocessing Options
| Option | Description |
|---|---|
| --data, -d | Path to input CSV data |
| --output, -o | Path to save preprocessed data |
| --method, -m | Preprocessing method |
| --target, -t | Target column (for feature selection) |
| --columns, -c | Specific columns to preprocess |
| --k | Number of features (SelectKBest/RFE) |
| --threshold | Variance threshold |
| --norm | Norm type (l1, l2, max) |
| --range-min, --range-max | MinMaxScaler range |
| --save-preprocessor, -s | Save fitted preprocessor |
Preprocessing Methods Comparison
| Method | Best For | Key Feature |
|---|---|---|
| StandardScaler | Most ML algorithms | Zero mean, unit variance |
| MinMaxScaler | Neural networks, bounded outputs | Fixed range (0-1) |
| RobustScaler | Data with outliers | Uses median/IQR |
| Normalizer | Text data, similarity measures | Unit norm per sample |
| SelectKBest | Quick feature filtering | Statistical scoring |
| RFE | Model-based selection | Iterative importance |
| VarianceThreshold | Removing constant features | Unsupervised |
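To see why RobustScaler is suggested for data with outliers, the sketch below applies the three scaling formulas to one column that contains an outlier, using only the standard library. The quartiles come from `statistics.quantiles` with the inclusive method; on other data this may differ slightly from the percentile interpolation scikit-learn uses.

```python
import statistics

values = [1.0, 2.0, 3.0, 4.0, 100.0]  # 100.0 is an outlier

def standard_scale(xs):
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)  # population standard deviation
    return [(x - mean) / std for x in xs]

def minmax_scale(xs):
    lo, hi = min(xs), max(xs)
    return [(x - lo) / (hi - lo) for x in xs]

def robust_scale(xs):
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in xs]

for name, fn in [("standard", standard_scale), ("minmax", minmax_scale), ("robust", robust_scale)]:
    print(name, [round(v, 3) for v in fn(values)])
```

With min-max scaling the four ordinary values are squeezed into roughly the lowest 3% of the range, while the median/IQR version keeps them evenly spread and pushes only the outlier far away.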
6. Evaluate Models
Evaluate a saved model on test data:
mlcli eval --model-path <path-to-model> --data-path <path-to-test-data> --model-type <model-type>
Evaluate Pickle Model
mlcli eval --model-path artifacts/model.pkl --data-path data/test.csv --model-type logistic_regression
Evaluate Joblib Model
mlcli eval --model-path artifacts/model.joblib --data-path data/test.csv --model-type random_forest
Evaluate TensorFlow Model (H5)
mlcli eval --model-path artifacts/model.h5 --data-path data/test.csv --model-type tf_dnn
7. Experiment Tracking Commands
List All Experiment Runs
mlcli list-runs
Output:
Experiment Runs:
================================================================================
Run ID Model Type Accuracy Duration
--------------------------------------------------------------------------------
abc123-def456-789... random_forest 0.8318 4.2s
xyz789-abc123-456... xgboost 0.8288 1.1s
...
================================================================================
Show Details of a Specific Run
mlcli show-run <run-id>
Example:
mlcli show-run abc123-def456-789
Export All Runs to CSV
mlcli export-runs --output experiments.csv
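The tracker is described as a mini-MLflow with JSON storage. As a rough picture of that idea, the sketch below writes one JSON file per run and flattens the files into a CSV. The file layout, field names, and function names are assumptions made for illustration; mlcli's actual schema may differ.

```python
import csv
import json
import tempfile
import uuid
from pathlib import Path

def log_run(runs_dir, model_type, metrics):
    """Write one run as <runs_dir>/<run_id>.json and return the run id."""
    run_id = uuid.uuid4().hex
    record = {"run_id": run_id, "model_type": model_type, "metrics": metrics}
    Path(runs_dir, f"{run_id}.json").write_text(json.dumps(record, indent=2))
    return run_id

def export_runs(runs_dir, csv_path):
    """Flatten every run file into one CSV row; return the number of rows."""
    records = [json.loads(p.read_text()) for p in sorted(Path(runs_dir).glob("*.json"))]
    metric_names = sorted({name for r in records for name in r["metrics"]})
    with open(csv_path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["run_id", "model_type", *metric_names])
        writer.writeheader()
        for r in records:
            writer.writerow({"run_id": r["run_id"], "model_type": r["model_type"], **r["metrics"]})
    return len(records)

with tempfile.TemporaryDirectory() as tmp:
    log_run(tmp, "random_forest", {"accuracy": 0.8318})
    log_run(tmp, "xgboost", {"accuracy": 0.8288})
    csv_path = Path(tmp, "experiments.csv")
    exported = export_runs(tmp, csv_path)
    header = csv_path.read_text().splitlines()[0]
print(exported, header)
```

Keeping each run in its own file makes runs easy to inspect or delete by hand, at the cost of a directory scan when listing or exporting.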
8. Interactive Terminal UI
Launch the interactive interface:
mlcli ui
TUI Features:
- 🎯 Train Model - Select config, model type, and override parameters
- 📊 Evaluate Model - Load and evaluate saved models
- 📈 View Experiments - Browse, filter, and export experiment history
- 🔧 List Models - View all registered trainers with metadata
📝 Configuration Files
Create a Configuration File
Configuration files define the model, dataset, training parameters, and output settings.
Configuration Structure
{
  "model": {
    "type": "<model-type>",
    "params": { ... }
  },
  "dataset": {
    "path": "<path-to-data>",
    "type": "csv",
    "target_column": "<target-column-name>"
  },
  "training": {
    "test_size": 0.2,
    "random_state": 42
  },
  "output": {
    "model_dir": "artifacts",
    "save_formats": ["pickle", "joblib"]
  }
}
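Before training, it can help to check that a file follows this structure. A minimal sketch of such a check: the required keys mirror the structure above (treating every listed key as mandatory is an assumption), and `validate_config` is a hypothetical helper rather than mlcli's loader.

```python
import json

REQUIRED = {
    "model": ["type", "params"],
    "dataset": ["path", "type", "target_column"],
    "training": ["test_size", "random_state"],
    "output": ["model_dir", "save_formats"],
}

def validate_config(config):
    """Return a list of human-readable problems; an empty list means the shape is fine."""
    problems = []
    for section, keys in REQUIRED.items():
        if section not in config:
            problems.append(f"missing section: {section}")
            continue
        for key in keys:
            if key not in config[section]:
                problems.append(f"missing key: {section}.{key}")
    test_size = config.get("training", {}).get("test_size")
    if test_size is not None and not 0 < test_size < 1:
        problems.append("training.test_size must be between 0 and 1")
    return problems

text = """
{
  "model": {"type": "random_forest", "params": {"n_estimators": 100}},
  "dataset": {"path": "data/train.csv", "type": "csv"},
  "training": {"test_size": 0.2, "random_state": 42},
  "output": {"model_dir": "artifacts", "save_formats": ["pickle", "joblib"]}
}
"""
problems = validate_config(json.loads(text))
print(problems)
```

Here the example file omits dataset.target_column, so the check reports exactly that one problem.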
Example Configurations
Logistic Regression (configs/logistic_config.json)
{
  "model": {
    "type": "logistic_regression",
    "params": {
      "penalty": "l2",
      "C": 1.0,
      "solver": "lbfgs",
      "max_iter": 1000
    }
  },
  "dataset": {
    "path": "data/train.csv",
    "type": "csv",
    "target_column": "target"
  },
  "training": {
    "test_size": 0.2,
    "random_state": 42
  },
  "output": {
    "model_dir": "artifacts",
    "save_formats": ["pickle", "joblib"]
  }
}
Random Forest (configs/rf_config.json)
{
  "model": {
    "type": "random_forest",
    "params": {
      "n_estimators": 100,
      "max_depth": null,
      "min_samples_split": 2,
      "min_samples_leaf": 1,
      "random_state": 42
    }
  },
  "dataset": {
    "path": "data/train.csv",
    "type": "csv",
    "target_column": "target"
  },
  "training": {
    "test_size": 0.2,
    "random_state": 42
  },
  "output": {
    "model_dir": "artifacts",
    "save_formats": ["pickle", "joblib"]
  }
}
XGBoost (configs/xgb_config.json)
{
  "model": {
    "type": "xgboost",
    "params": {
      "n_estimators": 100,
      "max_depth": 6,
      "learning_rate": 0.1,
      "subsample": 0.8,
      "colsample_bytree": 0.8,
      "early_stopping_rounds": 10,
      "random_state": 42
    }
  },
  "dataset": {
    "path": "data/train.csv",
    "type": "csv",
    "target_column": "target"
  },
  "training": {
    "test_size": 0.2,
    "random_state": 42
  },
  "output": {
    "model_dir": "artifacts",
    "save_formats": ["pickle", "joblib"]
  }
}
SVM (configs/svm_config.json)
{
  "model": {
    "type": "svm",
    "params": {
      "kernel": "rbf",
      "C": 1.0,
      "gamma": "scale",
      "probability": true
    }
  },
  "dataset": {
    "path": "data/train.csv",
    "type": "csv",
    "target_column": "target"
  },
  "training": {
    "test_size": 0.2,
    "random_state": 42
  },
  "output": {
    "model_dir": "artifacts",
    "save_formats": ["pickle", "joblib"]
  }
}
TensorFlow DNN (configs/tf_dnn_config.json)
{
  "model": {
    "type": "tf_dnn",
    "params": {
      "layers": [128, 64, 32],
      "activation": "relu",
      "dropout": 0.3,
      "optimizer": "adam",
      "learning_rate": 0.001,
      "epochs": 20,
      "batch_size": 32,
      "early_stopping": true,
      "patience": 5
    }
  },
  "dataset": {
    "path": "data/train.csv",
    "type": "csv",
    "target_column": "target"
  },
  "training": {
    "test_size": 0.2,
    "random_state": 42
  },
  "output": {
    "model_dir": "artifacts",
    "save_formats": ["h5", "savedmodel"]
  }
}
🔧 Hyperparameter Tuning Configuration
Tuning configurations include a tuning.param_space section that defines the search space.
Grid Search Configuration
For grid search, use lists of discrete values:
{
  "model": {
    "type": "random_forest",
    "params": {}
  },
  "dataset": {
    "path": "data/train.csv",
    "type": "csv",
    "target_column": "target"
  },
  "training": {
    "test_size": 0.2,
    "random_state": 42
  },
  "tuning": {
    "param_space": {
      "n_estimators": [50, 100, 200, 300],
      "max_depth": [5, 10, 15, 20, null],
      "min_samples_split": [2, 5, 10],
      "min_samples_leaf": [1, 2, 4],
      "max_features": ["sqrt", "log2"]
    }
  },
  "output": {
    "model_dir": "artifacts",
    "save_formats": ["pickle", "joblib"]
  }
}
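Grid search evaluates every combination, so the cost grows multiplicatively with each list. The sketch below counts the combinations in the param_space above and the number of model fits with 5-fold cross-validation (as in --cv 5). It assumes one fit per fold per combination and ignores any final refit.

```python
from itertools import product
from math import prod

param_space = {
    "n_estimators": [50, 100, 200, 300],
    "max_depth": [5, 10, 15, 20, None],  # JSON null becomes None in Python
    "min_samples_split": [2, 5, 10],
    "min_samples_leaf": [1, 2, 4],
    "max_features": ["sqrt", "log2"],
}

n_combinations = prod(len(values) for values in param_space.values())
cv_folds = 5
total_fits = n_combinations * cv_folds
print(n_combinations, total_fits)  # 360 1800

# Enumerate the grid lazily; each item is one candidate parameter set.
names = list(param_space)
grid = (dict(zip(names, combo)) for combo in product(*param_space.values()))
first = next(grid)
print(first)
```

That is 360 combinations and 1,800 fits for this modest grid, which is why random search with a fixed --n-trials is listed as the better fit for large parameter spaces.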
Random/Bayesian Search Configuration
For random and Bayesian search, use distribution specifications:
{
  "model": {
    "type": "xgboost",
    "params": {}
  },
  "dataset": {
    "path": "data/train.csv",
    "type": "csv",
    "target_column": "target"
  },
  "training": {
    "test_size": 0.2,
    "random_state": 42
  },
  "tuning": {
    "param_space": {
      "n_estimators": {"type": "int", "low": 50, "high": 500},
      "max_depth": {"type": "int", "low": 3, "high": 15},
      "learning_rate": {"type": "loguniform", "low": 0.01, "high": 0.3},
      "subsample": {"type": "uniform", "low": 0.6, "high": 1.0},
      "colsample_bytree": {"type": "uniform", "low": 0.6, "high": 1.0},
      "min_child_weight": {"type": "int", "low": 1, "high": 10}
    }
  },
  "output": {
    "model_dir": "artifacts",
    "save_formats": ["pickle", "joblib"]
  }
}
Parameter Distribution Types
| Type | Description | Example |
|---|---|---|
| list/tuple | Discrete choices | [50, 100, 200] |
| int | Integer range | {"type": "int", "low": 1, "high": 100} |
| uniform | Uniform float | {"type": "uniform", "low": 0.0, "high": 1.0} |
| loguniform | Log-uniform | {"type": "loguniform", "low": 0.001, "high": 1.0} |
| categorical | Choice | {"type": "categorical", "choices": ["a", "b"]} |
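One way to read these specifications is as instructions for drawing a single trial's parameters. The sketch below does that with the standard library. `sample_param` is a hypothetical helper, the parameter name `option` is a placeholder, and the actual sampling in mlcli (or in Optuna for the Bayesian method) may differ, for example in whether `high` is inclusive.

```python
import math
import random

def sample_param(spec, rng):
    """Draw one value from a parameter specification."""
    if isinstance(spec, (list, tuple)):
        return rng.choice(list(spec))
    kind = spec["type"]
    if kind == "int":
        return rng.randint(spec["low"], spec["high"])  # both ends inclusive here
    if kind == "uniform":
        return rng.uniform(spec["low"], spec["high"])
    if kind == "loguniform":
        # Uniform in log space, so each order of magnitude is equally likely.
        return math.exp(rng.uniform(math.log(spec["low"]), math.log(spec["high"])))
    if kind == "categorical":
        return rng.choice(spec["choices"])
    raise ValueError(f"unknown distribution type: {kind}")

param_space = {
    "n_estimators": {"type": "int", "low": 50, "high": 500},
    "learning_rate": {"type": "loguniform", "low": 0.01, "high": 0.3},
    "subsample": {"type": "uniform", "low": 0.6, "high": 1.0},
    "option": {"type": "categorical", "choices": ["a", "b"]},  # placeholder name
    "max_depth": [3, 6, 9],
}

rng = random.Random(42)  # seeded so the trials are repeatable
trials = [
    {name: sample_param(spec, rng) for name, spec in param_space.items()}
    for _ in range(100)
]
print(trials[0])
```

The log-uniform type suits parameters such as learning rates, where the difference between 0.01 and 0.02 matters as much as the difference between 0.1 and 0.2.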
🏨 Real-World Example: Hotel Booking Cancellation Prediction
Step 1: Prepare Your Data
Place your CSV file in the data/ directory:
data/hotel_bookings.csv
Step 2: Preprocess Data (if needed)
Create a preprocessing script scripts/preprocess_data.py:
import pandas as pd
from sklearn.preprocessing import LabelEncoder

# Name of the label column; matches "target_column" in the Step 3 config
TARGET_COLUMN = 'is_canceled'

# Load data
df = pd.read_csv('data/hotel_bookings.csv')

# Handle missing values (a simple placeholder strategy)
df = df.fillna(0)

# Encode categorical columns, leaving the target untouched
label_encoders = {}
for col in df.select_dtypes(include=['object']).columns:
    if col != TARGET_COLUMN:
        le = LabelEncoder()
        df[col] = le.fit_transform(df[col].astype(str))
        label_encoders[col] = le

# Save processed data
df.to_csv('data/hotel_bookings_processed.csv', index=False)
print("Preprocessing complete!")
Run preprocessing:
python scripts/preprocess_data.py
Step 3: Create Configuration Files
Create configs/hotel_rf_config.json:
{
  "model": {
    "type": "random_forest",
    "params": {
      "n_estimators": 100,
      "max_depth": null,
      "random_state": 42
    }
  },
  "dataset": {
    "path": "data/hotel_bookings_processed.csv",
    "type": "csv",
    "target_column": "is_canceled"
  },
  "training": {
    "test_size": 0.2,
    "random_state": 42
  },
  "output": {
    "model_dir": "artifacts",
    "save_formats": ["pickle", "joblib"]
  }
}
Step 4: Train the Model
mlcli train --config configs/hotel_rf_config.json
Step 5: View Results
mlcli list-runs
Step 6: Train Multiple Models for Comparison
# Train Logistic Regression
mlcli train --config configs/hotel_logistic_config.json
# Train Random Forest
mlcli train --config configs/hotel_rf_config.json
# Train XGBoost
mlcli train --config configs/hotel_xgb_config.json
# Train TensorFlow DNN
mlcli train --config configs/hotel_dnn_config.json
Step 7: Export Results
mlcli export-runs --output hotel_experiments.csv
📊 Model Comparison Results (Hotel Booking Dataset)
| Model | Accuracy | Precision | Recall | F1-Score | AUC-ROC | Training Time |
|---|---|---|---|---|---|---|
| Random Forest 🏆 | 83.18% | 83.80% | 83.18% | 82.51% | 90.90% | 4.2s |
| XGBoost | 82.88% | 83.31% | 82.88% | 82.27% | 90.45% | 1.1s |
| Logistic Regression | 79.90% | 81.03% | 79.90% | 78.68% | 85.20% | 2.8s |
| TF DNN | 62.43% | 38.97% | 62.43% | 47.99% | 50.00% | 43.1s |
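Note: The TF DNN row most likely reflects training on unstandardized features. An AUC-ROC of 50.00% together with recall equal to accuracy is what a model produces when it predicts the same class for every sample, so standardize the features (see Troubleshooting, item 4) before comparing the DNN with the other models.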
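The TF DNN row can be checked with a little arithmetic. Assuming the precision, recall, and F1 columns are class-weighted averages (an assumption; the table does not say), a model that always predicts the majority class with share p has weighted recall p, weighted precision p², and weighted F1 of p · 2p / (1 + p):

```python
# Reported TF DNN metrics from the table above.
accuracy = 0.6243
precision = 0.3897
recall = 0.6243
f1 = 0.4799

p = accuracy  # majority-class share, if the model predicts only that class

# Weighted precision: the majority class has precision p and weight p;
# the never-predicted class contributes 0.
weighted_precision = p * p
# Weighted F1: the majority class has precision p and recall 1,
# so its F1 is 2p / (1 + p), again weighted by p.
weighted_f1 = p * (2 * p / (1 + p))

print(round(weighted_precision, 4), precision)
print(round(weighted_f1, 4), f1)
```

Both derived values agree with the table to within rounding, and constant predictions also yield an AUC-ROC of exactly 50%, so the row describes a network that has not learned to separate the classes.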
🧩 Extending mlcli
Adding a New Trainer
- Create a new file in mlcli/trainers/:
from mlcli.trainers.base_trainer import BaseTrainer
from mlcli.utils.registry import register_model

@register_model(
    name="my_custom_model",
    description="My Custom Model Trainer",
    framework="custom",
    model_type="classification"
)
class MyCustomTrainer(BaseTrainer):
    def train(self, X_train, y_train, X_val=None, y_val=None):
        # Implementation
        pass

    def evaluate(self, X_test, y_test):
        # Implementation
        pass

    def predict(self, X):
        # Implementation
        pass

    @classmethod
    def get_default_params(cls):
        return {"param1": "value1"}
- Import in mlcli/trainers/__init__.py:
from mlcli.trainers.my_custom_trainer import MyCustomTrainer
The model will be automatically registered and available via CLI!
🔧 Troubleshooting
Common Issues
1. "mlcli: command not found"
Solution: Make sure the virtual environment is activated and mlcli is installed. The PowerShell activation command is shown; on Linux/macOS, run source .venv/bin/activate instead:
.\.venv\Scripts\Activate.ps1
pip install -e .
2. "ModuleNotFoundError: No module named 'mlcli'"
Solution: Install in development mode:
pip install -e .
3. "FileNotFoundError: data/train.csv"
Solution: Ensure your data file exists at the specified path in the config file.
4. TensorFlow DNN Poor Performance
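Solution: Neural networks need standardized features. Fit a StandardScaler on the training split only, then reuse it for the test split so that no test statistics leak into training:
from sklearn.preprocessing import StandardScaler
# X_train and X_test are the feature arrays from your own train/test split
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)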
5. ONNX Export Errors
Solution: Install skl2onnx:
pip install skl2onnx
6. Optuna Not Found
Solution: Install optuna for Bayesian optimization:
pip install optuna
7. SHAP/LIME Not Found
Solution: Install SHAP and LIME for model explainability:
pip install shap lime matplotlib
8. SHAP TreeExplainer Error
Solution: For non-tree models, SHAP will automatically fall back to KernelExplainer. This is expected behavior.
📚 Quick Reference
| Task | Command |
|---|---|
| Install mlcli | pip install -e . |
| Show help | mlcli --help |
| List models | mlcli list-models |
| List tuners | mlcli list-tuners |
| List explainers | mlcli list-explainers |
| List preprocessors | mlcli list-preprocessors |
| Train model | mlcli train --config <config.json> |
| Tune hyperparameters | mlcli tune --config <config.json> --method random |
| Tune with Bayesian | mlcli tune -c <config> -m bayesian -n 100 |
| Tune and train best | mlcli tune -c <config> -m random --train-best |
| Explain model (SHAP) | mlcli explain -m <model.pkl> -d <data.csv> -t <type> -e shap |
| Explain model (LIME) | mlcli explain -m <model.pkl> -d <data.csv> -t <type> -e lime |
| Explain instance | mlcli explain-instance -m <model.pkl> -d <data.csv> -t <type> -i <idx> |
| Preprocess data | mlcli preprocess -d <data.csv> -o <output.csv> -m standard_scaler |
| Feature selection | mlcli preprocess -d <data.csv> -o <output.csv> -m select_k_best -t label --k 10 |
| Preprocessing pipeline | mlcli preprocess-pipeline -d <data.csv> -o <output.csv> -s "standard_scaler,select_k_best" |
| Evaluate model | mlcli eval --model-path <path> --data-path <path> --model-type <type> |
| List runs | mlcli list-runs |
| Show run details | mlcli show-run <run-id> |
| Export runs | mlcli export-runs --output <file.csv> |
| Launch UI | mlcli ui |
🤝 Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
📄 License
This project is licensed under the MIT License.