An advanced machine learning library for model training and selection
SmartPredict
SmartPredict is an advanced machine learning library designed to simplify model training, evaluation, and selection. It provides a comprehensive set of tools for classification and regression tasks, including automated hyperparameter tuning, feature engineering, ensemble methods, and model explainability.
Installation
You can install SmartPredict using pip:
pip install smartpredict
Features
- Unified API for ML Models: Provides a consistent interface for both classification and regression tasks
- Automated Feature Engineering: Handles missing values, scaling, encoding, feature interactions, and selection
- Robust Ensemble Methods: Supports voting, averaging, weighted combining, and stacking approaches
- Hyperparameter Tuning: Uses Optuna for efficient, reproducible hyperparameter optimization
- Model Explainability: Provides SHAP-based explanations and feature importance analysis
- Comprehensive Error Handling: Gracefully handles common errors during model training and evaluation, such as unrecognized model names (reported as a ValueError)
Quick Start
Here's a quick example to get you started:
from smartpredict import SmartClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load and split data
data = load_breast_cancer()
X = data.data
y = data.target
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create classifier and fit models
try:
    clf = SmartClassifier(
        models=['Random Forest', 'Logistic Regression'],
        verbose=1
    )
    results = clf.fit(X_train, X_test, y_train, y_test)
    # Display model performance results
    print(results)
    # Make predictions with the best model
    predictions = clf.predict(X_test)
except ValueError as e:
    print(f"Error: {e}")
    print("Please check the model names. Available classification models:")
    print("'Logistic Regression', 'Random Forest', 'Gradient Boosting', 'AdaBoost',")
    print("'Decision Tree', 'Support Vector Machine', 'K-Nearest Neighbors',")
    print("'Gaussian Naive Bayes', 'Neural Network', 'XGBoost', 'LightGBM', 'CatBoost'")
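SmartClassifier's fit trains every requested model, scores it, and predict then uses the best one. The loop below is a minimal, library-independent sketch of that selection step. It assumes the best model is the one with the highest test-set accuracy; the toy models and the `fit_and_select` helper are hypothetical and are not SmartPredict internals.

```python
# Hypothetical toy models: anything with fit(X, y) and predict(X) works here.
class MajorityClass:
    """Always predicts the most frequent training label."""

    def fit(self, X, y):
        labels = list(y)
        self.label = max(set(labels), key=labels.count)
        return self

    def predict(self, X):
        return [self.label for _ in X]


class FirstFeatureThreshold:
    """Predicts 1 when the first feature exceeds its training mean, else 0."""

    def fit(self, X, y):
        self.cut = sum(row[0] for row in X) / len(X)
        return self

    def predict(self, X):
        return [1 if row[0] > self.cut else 0 for row in X]


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def fit_and_select(models, X_train, X_test, y_train, y_test):
    """Fit every candidate, score it on the test split, return (scores, best_name)."""
    scores = {}
    for name, model in models.items():
        model.fit(X_train, y_train)
        scores[name] = accuracy(y_test, model.predict(X_test))
    best_name = max(scores, key=scores.get)  # assumed rule: highest test accuracy wins
    return scores, best_name


X_train = [[1.0], [2.0], [3.0], [4.0], [5.0], [6.0]]
y_train = [0, 0, 0, 0, 1, 1]
X_test = [[1.5], [5.5], [6.5], [2.0]]
y_test = [0, 1, 1, 0]

models = {"Majority": MajorityClass(), "Threshold": FirstFeatureThreshold()}
scores, best_name = fit_and_select(models, X_train, X_test, y_train, y_test)
print(scores, best_name)  # {'Majority': 0.5, 'Threshold': 1.0} Threshold
```

This is also why fit takes both splits: the held-out split is what allows the candidates to be compared against each other.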
Usage
Classification
from smartpredict import SmartClassifier

# Assumes X_train, X_test, y_train, y_test from the Quick Start example
# Create a classifier; model names must match the names listed under Available Models
try:
    clf = SmartClassifier(
        models=['Random Forest', 'Logistic Regression', 'Support Vector Machine'],
        # Pass custom parameters for individual models (here Random Forest and Logistic Regression)
        Random_Forest={'n_estimators': 200, 'max_depth': 10},
        Logistic_Regression={'C': 0.1, 'max_iter': 200},
        verbose=1
    )
    # Fit and evaluate all models
    results = clf.fit(X_train, X_test, y_train, y_test)
    # The best model is automatically selected for predictions
    predictions = clf.predict(X_test)  # or any new samples with the same features
except ValueError as e:
    print(f"Error: {e}")
    # List available classification models
    print("Available classification models:")
    print("'Logistic Regression', 'Random Forest', 'Gradient Boosting', 'AdaBoost',")
    print("'Decision Tree', 'Support Vector Machine', 'K-Nearest Neighbors',")
    print("'Gaussian Naive Bayes', 'Neural Network', 'XGBoost', 'LightGBM', 'CatBoost'")
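The keyword arguments `Random_Forest=` and `Logistic_Regression=` line up with the model names 'Random Forest' and 'Logistic Regression', which suggests that spaces in a model name become underscores. The helper below is a hypothetical sketch of that mapping, based only on the example above and not on SmartPredict's code. Models with no keyword get an empty dict, meaning library defaults. Raising on keywords that match no requested model is a design choice made for this sketch.

```python
def resolve_model_params(model_names, **model_params):
    """Map keyword arguments like Random_Forest={...} onto display names like 'Random Forest'.

    Assumed convention (inferred from the README example): the keyword is the model
    name with spaces replaced by underscores. Models without a keyword get {} (defaults).
    """
    keys = {name.replace(" ", "_"): name for name in model_names}
    unknown = sorted(set(model_params) - set(keys))
    if unknown:
        # Design choice for this sketch: fail loudly instead of silently ignoring a typo
        raise ValueError(f"Parameters given for models that were not requested: {unknown}")
    return {name: dict(model_params.get(key, {})) for key, name in keys.items()}


params = resolve_model_params(
    ['Random Forest', 'Logistic Regression', 'Support Vector Machine'],
    Random_Forest={'n_estimators': 200, 'max_depth': 10},
    Logistic_Regression={'C': 0.1, 'max_iter': 200},
)
print(params['Support Vector Machine'])  # {}
```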
Regression
from smartpredict import SmartRegressor
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split

# Regression needs a continuous target, e.g. the diabetes dataset
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create a regressor; model names must match the names listed under Available Models
try:
    reg = SmartRegressor(
        models=['Random Forest', 'Linear Regression', 'Support Vector Machine'],
        # Pass custom parameters for a specific model
        Random_Forest={'n_estimators': 200, 'max_depth': 15},
        verbose=1
    )
    # Fit and evaluate all models
    results = reg.fit(X_train, X_test, y_train, y_test)
    # The best model is automatically selected for predictions
    predictions = reg.predict(X_test)  # or any new samples with the same features
except ValueError as e:
    print(f"Error: {e}")
    # List available regression models
    print("Available regression models:")
    print("'Linear Regression', 'Ridge Regression', 'Lasso Regression', 'Random Forest',")
    print("'Gradient Boosting', 'AdaBoost', 'Decision Tree', 'Support Vector Machine',")
    print("'K-Nearest Neighbors', 'Neural Network', 'XGBoost', 'LightGBM', 'CatBoost'")
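Accuracy cannot compare regressors, so a continuous error metric is needed instead. The README does not say which metric SmartRegressor uses to pick its best model. The sketch below assumes the coefficient of determination R² (higher is better) and also computes RMSE, which is in the target's units. The prediction values are made up for illustration.

```python
import math


def r2_score(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot (1.0 is a perfect fit)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def rmse(y_true, y_pred):
    """Root mean squared error, in the units of the target (lower is better)."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


y_test = [3.0, 5.0, 7.0, 9.0]
test_predictions = {
    "Model A": [3.5, 5.0, 6.5, 9.0],  # small errors
    "Model B": [6.0, 6.0, 6.0, 6.0],  # always predicts the mean, so R^2 = 0
}
scores = {name: r2_score(y_test, pred) for name, pred in test_predictions.items()}
best_name = max(scores, key=scores.get)  # assumed rule: highest R^2 wins
print(scores, best_name)
```

Model B illustrates a useful baseline: a model that always predicts the mean scores R² = 0, so any useful regressor should score above zero.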
Available Models
Classification Models
- 'Logistic Regression'
- 'Random Forest'
- 'Gradient Boosting'
- 'AdaBoost'
- 'Decision Tree'
- 'Support Vector Machine'
- 'K-Nearest Neighbors'
- 'Gaussian Naive Bayes'
- 'Neural Network'
- 'XGBoost'
- 'LightGBM'
- 'CatBoost'
Regression Models
- 'Linear Regression'
- 'Ridge Regression'
- 'Lasso Regression'
- 'Random Forest'
- 'Gradient Boosting'
- 'AdaBoost'
- 'Decision Tree'
- 'Support Vector Machine'
- 'K-Nearest Neighbors'
- 'Neural Network'
- 'XGBoost'
- 'LightGBM'
- 'CatBoost'
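Both usage examples catch a ValueError when a model name is not recognized. The following is a minimal sketch of that kind of check against the classification list above. It assumes exact, case-sensitive matching. The "did you mean" hint uses the standard library's `difflib.get_close_matches` and is an illustrative addition, not documented SmartPredict behavior.

```python
import difflib

CLASSIFICATION_MODELS = (
    "Logistic Regression", "Random Forest", "Gradient Boosting", "AdaBoost",
    "Decision Tree", "Support Vector Machine", "K-Nearest Neighbors",
    "Gaussian Naive Bayes", "Neural Network", "XGBoost", "LightGBM", "CatBoost",
)


def validate_model_names(requested, available=CLASSIFICATION_MODELS):
    """Return the requested names, or raise ValueError for the first unknown one."""
    for name in requested:
        if name not in available:
            # Suggest close spellings, e.g. 'Random Forrest' -> 'Random Forest'
            hints = difflib.get_close_matches(name, available, n=3)
            suggestion = f" Did you mean: {', '.join(hints)}?" if hints else ""
            raise ValueError(f"Unknown model name {name!r}.{suggestion}")
    return list(requested)


print(validate_model_names(["Random Forest", "XGBoost"]))
try:
    validate_model_names(["Random Forrest"])
except ValueError as e:
    print(f"Error: {e}")
```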
Advanced Features
Feature Engineering
from smartpredict.feature_engineering import FeatureEngineer
# Create feature engineer
fe = FeatureEngineer(
    scaler='standard',
    encoder='onehot',
    handle_missing='mean',
    create_interactions=True,
    feature_selection=5  # Keep top 5 features
)
# Fit on the training data, then apply the same learned transformation to the test data
X_train_transformed = fe.fit_transform(X_train)
X_test_transformed = fe.transform(X_test)
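The split between fit_transform on the training set and transform on the test set matters because imputation means and scaling statistics must come from training data only. The sketch below shows three of the listed steps in plain Python: mean imputation, standard scaling, and pairwise interaction terms. `TinyFeatureEngineer` is a hypothetical illustration, not FeatureEngineer's implementation. One-hot encoding and feature selection are omitted, and `None` stands in for a missing value.

```python
import itertools
import math


class TinyFeatureEngineer:
    """Hypothetical illustration: mean imputation, standard scaling, pairwise interactions."""

    def __init__(self, create_interactions=True):
        self.create_interactions = create_interactions

    def fit(self, X):
        self.means_, self.stds_ = [], []
        for j in range(len(X[0])):
            observed = [row[j] for row in X if row[j] is not None]
            mean = sum(observed) / len(observed)
            var = sum((v - mean) ** 2 for v in observed) / len(observed)
            self.means_.append(mean)
            self.stds_.append(math.sqrt(var) or 1.0)  # avoid dividing by zero on constant columns
        return self

    def transform(self, X):
        out = []
        for row in X:
            # Impute with the *training* mean, then standardize with training statistics
            filled = [self.means_[j] if v is None else v for j, v in enumerate(row)]
            scaled = [(v - m) / s for v, m, s in zip(filled, self.means_, self.stds_)]
            if self.create_interactions:
                # Append the product of every pair of scaled features
                scaled += [a * b for a, b in itertools.combinations(scaled, 2)]
            out.append(scaled)
        return out

    def fit_transform(self, X):
        return self.fit(X).transform(X)


fe = TinyFeatureEngineer()
X_train_t = fe.fit_transform([[1.0, 10.0], [3.0, None], [5.0, 30.0]])
X_test_t = fe.transform([[None, 20.0]])  # imputed and scaled with training statistics
print(X_test_t)
```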
Ensemble Methods
from smartpredict.ensemble_methods import EnsembleModel
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
# Create base models as (name, estimator) pairs
models = [
    ('rf', RandomForestClassifier(n_estimators=100)),
    ('lr', LogisticRegression(max_iter=1000))  # extra iterations help convergence on unscaled data
]
# Create ensemble with voting method
ensemble = EnsembleModel(
    models=models,
    method='voting'  # 'voting', 'averaging', 'weighted', or 'stacking'
)
# Fit ensemble
ensemble.fit(X_train, y_train)
# Make predictions
predictions = ensemble.predict(X_test)
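The `method` argument selects how the base models' outputs are combined. The sketch below shows one common reading of three of the options, applied per sample: 'voting' as a majority vote over labels, 'averaging' as the mean of predicted scores such as P(class 1), and 'weighted' as a weighted mean. These interpretations are assumptions, not taken from EnsembleModel's source. Stacking, which trains a second-level model on the base models' predictions, is left out. All numbers are made up.

```python
from collections import Counter


def hard_vote(label_predictions):
    """'voting': majority label per sample; ties go to the label seen first."""
    return [Counter(labels).most_common(1)[0][0] for labels in zip(*label_predictions)]


def average(score_predictions):
    """'averaging': unweighted mean of each model's score (e.g. P(class 1)) per sample."""
    n_models = len(score_predictions)
    return [sum(scores) / n_models for scores in zip(*score_predictions)]


def weighted_average(score_predictions, weights):
    """'weighted': like averaging, but each model contributes in proportion to its weight."""
    total = sum(weights)
    return [
        sum(w * s for w, s in zip(weights, scores)) / total
        for scores in zip(*score_predictions)
    ]


# Per-model predictions for three samples (made-up numbers for illustration)
label_preds = {"rf": [1, 0, 1], "lr": [1, 1, 0], "knn": [0, 0, 1]}
score_preds = {"rf": [0.9, 0.2, 0.6], "lr": [0.5, 0.6, 0.3]}

voted = hard_vote(list(label_preds.values()))
averaged = average(list(score_preds.values()))
weighted = weighted_average(list(score_preds.values()), weights=[3, 1])
final = [1 if p >= 0.5 else 0 for p in weighted]  # threshold combined scores into labels
print(voted, averaged, weighted, final)
```

Weighting lets a stronger base model, here "rf" with weight 3, dominate the combined score without excluding the others.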
Hyperparameter Tuning
from smartpredict.hyperparameter_tuning import tune_hyperparameters
from sklearn.ensemble import RandomForestClassifier
# Create base model
model = RandomForestClassifier()
# Define the search space: a (low, high) range for each parameter
param_dist = {
    'n_estimators': (50, 300),
    'max_depth': (3, 15),
    'min_samples_split': (2, 10)
}
# Tune hyperparameters (random_state makes the search reproducible)
best_model = tune_hyperparameters(
    model=model,
    param_distributions=param_dist,
    X=X_train,
    y=y_train,
    n_trials=100,
    scoring='f1',
    random_state=42
)
# Use the optimized model
predictions = best_model.predict(X_test)
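To see how the search ranges, trial budget and seed interact, here is plain random search written in standard-library Python. This is deliberately not Optuna, whose default sampler adapts its proposals to earlier trials. The sketch assumes each (low, high) tuple is an inclusive integer range. `toy_objective` is a hypothetical stand-in for "cross-validated F1 of a model built with these parameters".

```python
import random


def random_search(objective, param_ranges, n_trials=100, random_state=42):
    """Plain random search over inclusive integer (low, high) ranges; maximizes `objective`.

    Not Optuna's sampler; it only illustrates search-space bounds,
    trial budgets, and seeding for reproducibility.
    """
    rng = random.Random(random_state)  # private, seeded generator -> identical trials per run
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {name: rng.randint(low, high) for name, (low, high) in param_ranges.items()}
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


# Hypothetical stand-in for "cross-validated F1 of a model built with these params"
def toy_objective(params):
    return -((params["n_estimators"] - 180) ** 2) / 1000 - (params["max_depth"] - 8) ** 2


param_dist = {"n_estimators": (50, 300), "max_depth": (3, 15), "min_samples_split": (2, 10)}
best_params, best_score = random_search(toy_objective, param_dist, n_trials=100, random_state=42)
print(best_params, best_score)
```

Running it twice with the same `random_state` produces the same trials and the same winner. That is the property a seeded tuner needs in order to be reproducible.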
Explainability
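from smartpredict.explainability import ModelExplainer
from sklearn.ensemble import RandomForestClassifier
# A fitted model to explain; here a random forest trained on the Quick Start data
trained_model = RandomForestClassifier(random_state=42).fit(X_train, y_train)
feature_names = list(data.feature_names)
# Create explainer
explainer = ModelExplainer(
    model=trained_model,
    feature_names=feature_names
)
# Set training data (needed for some explanation methods)
explainer.set_training_data(X_train, y_train)
# Get feature importance
importance_df = explainer.get_feature_importance()
print(importance_df)
# Explain a single prediction (the first test sample)
explanation = explainer.explain_prediction(X_test[0])
print(explanation)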
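ModelExplainer is described as SHAP-based, and reproducing SHAP is beyond a short example. To show what a model-agnostic "feature importance" score measures, the sketch below uses a different and simpler technique: permutation importance. It shuffles one feature column at a time and records how much the score drops. The model and data are hypothetical, and this is not how ModelExplainer computes its output.

```python
import random


def accuracy(y_true, y_pred):
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def permutation_importance(predict, X, y, metric=accuracy, n_repeats=5, random_state=0):
    """Mean drop in `metric` when one feature column is shuffled (bigger drop = more important)."""
    rng = random.Random(random_state)
    baseline = metric(y, predict(X))
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)  # break the link between feature j and the target
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - metric(y, predict(X_perm)))
        importances.append(sum(drops) / n_repeats)
    return importances


# A hypothetical fitted model that only looks at feature 0
def predict(X):
    return [1 if row[0] > 0.5 else 0 for row in X]


feature_names = ["signal", "noise"]
X = [[0.0, 5.0], [1.0, 3.0], [0.0, 9.0], [1.0, 1.0], [0.0, 2.0], [1.0, 7.0], [0.0, 4.0], [1.0, 6.0]]
y = [0, 1, 0, 1, 0, 1, 0, 1]
importance = dict(zip(feature_names, permutation_importance(predict, X, y)))
print(importance)
```

Because the toy model ignores the "noise" column, shuffling it never changes a prediction, and its importance is exactly zero.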
Contributing
We welcome contributions! Please feel free to submit a Pull Request.
License
SmartPredict is licensed under the MIT License. See the LICENSE file for details.
File details
Details for the file smartpredict-0.1.2.tar.gz.
File metadata
- Download URL: smartpredict-0.1.2.tar.gz
- Upload date:
- Size: 22.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 1e9c10f3fb25ecc0599071730b280595bface464a68cade8dc91ca9ff36132d1 |
| MD5 | 850656f23e7a67c29a9fe495d7156151 |
| BLAKE2b-256 | ba6f746afcd0b446e3f1e13ea2113c0a89c6b6e6155442d70dc163634de47153 |
File details
Details for the file smartpredict-0.1.2-py3-none-any.whl.
File metadata
- Download URL: smartpredict-0.1.2-py3-none-any.whl
- Upload date:
- Size: 24.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.21
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 483f987250d5040b6f9f4e5c261d6396d0f900ee5ac74828e7edde3596474371 |
| MD5 | a743a3bf52a650030aaa4b85b4c42a63 |
| BLAKE2b-256 | 164fc875514a1357da3bd063b4d7420bf0980ba294410bdbbbaec8a06c4ddfdc |