An advanced machine learning library for model training and selection

SmartPredict

SmartPredict is an advanced machine learning library designed to simplify model training, evaluation, and selection. It provides a comprehensive set of tools for classification and regression tasks, including automated hyperparameter tuning, feature engineering, ensemble methods, and model explainability.

Installation

You can install SmartPredict using pip:

pip install smartpredict

Features

  • Unified API for ML Models: Provides a consistent interface for both classification and regression tasks
  • Automated Feature Engineering: Handles missing values, scaling, encoding, feature interactions, and selection
  • Robust Ensemble Methods: Supports voting, averaging, weighted combining, and stacking approaches
  • Hyperparameter Tuning: Uses Optuna for efficient, reproducible hyperparameter optimization
  • Model Explainability: Provides SHAP-based explanations and feature importance analysis
  • Comprehensive Error Handling: Gracefully handles common errors during model training and evaluation

Quick Start

Here's a quick example to get you started:

from smartpredict import SmartClassifier
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split

# Load and split data
data = load_breast_cancer()
X = data.data
y = data.target

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create classifier and fit models
try:
    clf = SmartClassifier(
        models=['Random Forest', 'Logistic Regression'],
        verbose=1
    )
    # fit() takes both feature sets first, then both target sets
    results = clf.fit(X_train, X_test, y_train, y_test)

    # Display model performance results
    print(results)

    # Make predictions with the best model
    predictions = clf.predict(X_test)
except ValueError as e:
    print(f"Error: {e}")
    print("Please check the model names. Available classification models:")
    print("'Logistic Regression', 'Random Forest', 'Gradient Boosting', 'AdaBoost',")
    print("'Decision Tree', 'Support Vector Machine', 'K-Nearest Neighbors',")
    print("'Gaussian Naive Bayes', 'Neural Network', 'XGBoost', 'LightGBM', 'CatBoost'")

Usage

Classification

from smartpredict import SmartClassifier

# X_train, X_test, y_train, y_test come from the split in Quick Start
# Create classifier with valid model names
try:
    clf = SmartClassifier(
        models=['Random Forest', 'Logistic Regression', 'Support Vector Machine'],
        # Pass custom parameters per model; in these examples the keyword is
        # the model name with spaces replaced by underscores
        Random_Forest={'n_estimators': 200, 'max_depth': 10},
        Logistic_Regression={'C': 0.1, 'max_iter': 200},
        verbose=1
    )

    # Fit and evaluate all models
    results = clf.fit(X_train, X_test, y_train, y_test)

    # The best model is automatically selected for predictions;
    # new_data is a placeholder for unseen samples with the same features as X_train
    predictions = clf.predict(new_data)
except ValueError as e:
    print(f"Error: {e}")
    # List available classification models
    print("Available classification models:")
    print("'Logistic Regression', 'Random Forest', 'Gradient Boosting', 'AdaBoost',")
    print("'Decision Tree', 'Support Vector Machine', 'K-Nearest Neighbors',")
    print("'Gaussian Naive Bayes', 'Neural Network', 'XGBoost', 'LightGBM', 'CatBoost'")

Regression

from smartpredict import SmartRegressor

# X_train, X_test, y_train, y_test here are a split of a regression dataset
# Create regressor with valid model names
try:
    reg = SmartRegressor(
        models=['Random Forest', 'Linear Regression', 'Support Vector Machine'],
        # Pass custom parameters for a specific model
        Random_Forest={'n_estimators': 200, 'max_depth': 15},
        verbose=1
    )

    # Fit and evaluate all models
    results = reg.fit(X_train, X_test, y_train, y_test)

    # The best model is automatically selected for predictions;
    # new_data is a placeholder for unseen samples with the same features as X_train
    predictions = reg.predict(new_data)
except ValueError as e:
    print(f"Error: {e}")
    # List available regression models
    print("Available regression models:")
    print("'Linear Regression', 'Ridge Regression', 'Lasso Regression', 'Random Forest',")
    print("'Gradient Boosting', 'AdaBoost', 'Decision Tree', 'Support Vector Machine',")
    print("'K-Nearest Neighbors', 'Neural Network', 'XGBoost', 'LightGBM', 'CatBoost'")
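
The idea of "the best model is automatically selected" can be sketched in a few lines. This is only an illustration of the concept, not SmartPredict's implementation: the helper `select_best`, the score dictionaries, and the numbers in them are all hypothetical, and the real structure of `results` may differ.

```python
def select_best(scores, higher_is_better=True):
    """Return the name of the model with the best score."""
    if not scores:
        raise ValueError("no scores to compare")
    pick = max if higher_is_better else min
    return pick(scores, key=scores.get)


# Made-up scores for illustration only; not real SmartPredict output.
accuracy = {"Random Forest": 0.95, "Logistic Regression": 0.97}
rmse = {"Random Forest": 3.1, "Linear Regression": 4.2}

best_classifier = select_best(accuracy)                     # higher accuracy wins
best_regressor = select_best(rmse, higher_is_better=False)  # lower error wins
```

Note that the direction of "best" depends on the metric: accuracy-like scores are maximized, error-like scores are minimized.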

Available Models

Classification Models

  • 'Logistic Regression'
  • 'Random Forest'
  • 'Gradient Boosting'
  • 'AdaBoost'
  • 'Decision Tree'
  • 'Support Vector Machine'
  • 'K-Nearest Neighbors'
  • 'Gaussian Naive Bayes'
  • 'Neural Network'
  • 'XGBoost'
  • 'LightGBM'
  • 'CatBoost'

Regression Models

  • 'Linear Regression'
  • 'Ridge Regression'
  • 'Lasso Regression'
  • 'Random Forest'
  • 'Gradient Boosting'
  • 'AdaBoost'
  • 'Decision Tree'
  • 'Support Vector Machine'
  • 'K-Nearest Neighbors'
  • 'Neural Network'
  • 'XGBoost'
  • 'LightGBM'
  • 'CatBoost'
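
Because models are selected by exact string name, a misspelled name is an easy mistake to make. The sketch below checks requested names against the classification list above before any classifier is built and suggests close matches using the standard library's `difflib`. The helper `check_model_names` is hypothetical and not part of SmartPredict.

```python
import difflib

# Names copied from the "Classification Models" list above.
CLASSIFICATION_MODELS = [
    "Logistic Regression", "Random Forest", "Gradient Boosting", "AdaBoost",
    "Decision Tree", "Support Vector Machine", "K-Nearest Neighbors",
    "Gaussian Naive Bayes", "Neural Network", "XGBoost", "LightGBM", "CatBoost",
]


def check_model_names(requested, available):
    """Split requested names into valid ones and suggestions for unknown ones.

    Returns (valid, suggestions), where suggestions maps each unknown name
    to a list of up to two close matches (possibly empty).
    """
    valid = [name for name in requested if name in available]
    suggestions = {
        name: difflib.get_close_matches(name, available, n=2, cutoff=0.6)
        for name in requested
        if name not in available
    }
    return valid, suggestions


valid, suggestions = check_model_names(
    ["Random Forest", "Logistic Regresion", "SVM"], CLASSIFICATION_MODELS
)
print(valid)
print(suggestions)
```

Abbreviations such as "SVM" are too far from the full name for fuzzy matching to help, so the full names in the lists above remain the reference.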

Advanced Features

Feature Engineering

from smartpredict.feature_engineering import FeatureEngineer

# Create feature engineer
fe = FeatureEngineer(
    scaler='standard',
    encoder='onehot',
    handle_missing='mean',
    create_interactions=True,
    feature_selection=5  # Keep top 5 features
)

# Fit on the training data, then apply the same fitted transformation to the test data
X_transformed = fe.fit_transform(X_train)
X_test_transformed = fe.transform(X_test)
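
The reason for calling `fit_transform` on the training data and only `transform` on the test data is that statistics such as the mean must be learned from the training set alone. The following is a minimal, standard-library sketch of that pattern for a single column, combining mean imputation with standard scaling. The class `MeanImputeStandardScaler` is hypothetical and is not how `FeatureEngineer` is implemented.

```python
import statistics


class MeanImputeStandardScaler:
    """Toy single-column transformer: mean imputation, then standard scaling."""

    def fit(self, column):
        # Learn statistics from observed (non-missing) training values only.
        observed = [v for v in column if v is not None]
        self.mean_ = statistics.fmean(observed)
        # Population standard deviation; fall back to 1.0 for constant columns.
        self.std_ = statistics.pstdev(observed) or 1.0
        return self

    def transform(self, column):
        # Fill missing values with the training mean, then standardize.
        filled = [self.mean_ if v is None else v for v in column]
        return [(v - self.mean_) / self.std_ for v in filled]

    def fit_transform(self, column):
        return self.fit(column).transform(column)


train_col = [1.0, None, 3.0, 5.0]
test_col = [None, 7.0]

scaler = MeanImputeStandardScaler()
train_scaled = scaler.fit_transform(train_col)
test_scaled = scaler.transform(test_col)  # reuses the training mean and std
```

Missing test values are filled with the training mean, so no information from the test set leaks into the fitted statistics.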

Ensemble Methods

from smartpredict.ensemble_methods import EnsembleModel
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression

# Create base models
models = [
    ('rf', RandomForestClassifier(n_estimators=100)),
    ('lr', LogisticRegression())
]

# Create ensemble with voting method
ensemble = EnsembleModel(
    models=models,
    method='voting'  # 'voting', 'averaging', 'weighted', or 'stacking'
)

# Fit ensemble
ensemble.fit(X_train, y_train)

# Make predictions
predictions = ensemble.predict(X_test)
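
To make the difference between the combining methods concrete, here is a small standard-library sketch of hard voting, plain averaging, and weighted averaging over per-model outputs. It illustrates the general techniques only and is not `EnsembleModel`'s code. The function names and the prediction values are made up for the example, and stacking (training a meta-model on base-model outputs) is left out.

```python
from collections import Counter


def hard_vote(label_predictions):
    """Majority vote over per-model class labels for each sample."""
    return [
        Counter(sample).most_common(1)[0][0]
        for sample in zip(*label_predictions)
    ]


def average(probabilities, weights=None):
    """Weighted mean of per-model probabilities for each sample.

    With weights=None every model counts equally (plain averaging).
    """
    if weights is None:
        weights = [1.0] * len(probabilities)
    total = sum(weights)
    return [
        sum(w * p for w, p in zip(weights, sample)) / total
        for sample in zip(*probabilities)
    ]


# Made-up outputs of three models on four samples (illustrative values only).
labels = [[1, 0, 1, 1], [1, 1, 0, 1], [0, 0, 1, 1]]
probas = [[0.9, 0.2, 0.6, 0.8], [0.7, 0.6, 0.4, 0.9], [0.4, 0.1, 0.7, 0.6]]

voted = hard_vote(labels)
averaged = average(probas)
weighted = average(probas, weights=[2.0, 1.0, 1.0])
```

Voting works on predicted labels, while averaging and weighted combining work on predicted probabilities (or on numeric predictions in regression).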

Hyperparameter Tuning

from smartpredict.hyperparameter_tuning import tune_hyperparameters
from sklearn.ensemble import RandomForestClassifier

# Create base model
model = RandomForestClassifier()

# Define parameter distributions to search
param_dist = {
    'n_estimators': (50, 300),
    'max_depth': (3, 15),
    'min_samples_split': (2, 10)
}

# Tune hyperparameters
best_model = tune_hyperparameters(
    model=model,
    param_distributions=param_dist,
    X=X_train,
    y=y_train,
    n_trials=100,
    scoring='f1',
    random_state=42
)

# Use the optimized model
predictions = best_model.predict(X_test)
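
The role of `random_state` in reproducible tuning can be shown with a seeded search over the same kind of `(low, high)` ranges. The sketch below uses plain random search from the standard library rather than Optuna, treats each range as an inclusive integer interval (an assumption about how the ranges are read), and replaces the cross-validated score with a toy objective. `random_search` and `toy_objective` are hypothetical names.

```python
import random


def random_search(objective, param_ranges, n_trials, random_state):
    """Seeded random search over inclusive integer ranges given as (low, high)."""
    rng = random.Random(random_state)
    best_params, best_score = None, float("-inf")
    for _ in range(n_trials):
        params = {
            name: rng.randint(low, high)
            for name, (low, high) in param_ranges.items()
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(params):
    # Stand-in for a cross-validated score; peaks at n_estimators=200, max_depth=8.
    return (
        -abs(params["n_estimators"] - 200) / 100
        - abs(params["max_depth"] - 8) / 10
    )


param_dist = {
    "n_estimators": (50, 300),
    "max_depth": (3, 15),
    "min_samples_split": (2, 10),
}

# The same seed yields the same sequence of trials and the same result.
first = random_search(toy_objective, param_dist, n_trials=100, random_state=42)
second = random_search(toy_objective, param_dist, n_trials=100, random_state=42)
```

Running the search twice with the same seed returns identical parameters and scores, which is the property a fixed `random_state` is meant to provide.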

Explainability

from smartpredict.explainability import ModelExplainer

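# Create explainer (trained_model is an already fitted model and
# feature_names is the list of column names; both are placeholders here)
explainer = ModelExplainer(
    model=trained_model,
    feature_names=feature_names
)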

# Set training data (needed for some explanation methods)
explainer.set_training_data(X_train, y_train)

# Get feature importance
importance_df = explainer.get_feature_importance()
print(importance_df)

# Explain a prediction
explanation = explainer.explain_prediction(X_test[0])
print(explanation)

Contributing

We welcome contributions! Please feel free to submit a Pull Request.

License

SmartPredict is licensed under the MIT License. See the LICENSE file for details.
