
Library implementation of the Iljiceva model.


Iljicevs ML

Iljicevs ML is a Python library that simplifies selecting, training, and tuning multiple machine learning models combined as an ensemble. It provides dynamic model selection, hyperparameter tuning, feature importance analysis, cross-validation with multiple metrics, and causal analysis through IljicevsCausalModel.

Features

  • Model Hyperparameter Tuning: Searches for the best hyperparameters of each model using scikit-learn's GridSearchCV.
  • Dynamic Model Selection: Selects the best-performing models based on cross-validation results.
  • Weighted Average Predictions: Combines the predictions of multiple models as a weighted average, with each model weighted by its accuracy.
  • Feature Importance Visualization: Displays the importance of features across the ensemble models.
  • Class Balance Checking: Checks the class distribution and, if needed, rebalances it (for example with SMOTE) or suggests remedies for imbalanced datasets.
  • Cross-validation with Metrics: Supports evaluation with multiple metrics (accuracy, F1, ROC AUC, etc.).
  • AutoML Support: Optionally selects models automatically based on dataset characteristics (via TPOT).
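The weighted-average prediction idea can be sketched with plain scikit-learn. This is not iljicevs_ml's internal code: it assumes each model's weight is its mean cross-validated accuracy, normalized to sum to 1, and that the weights are applied to `predict_proba` outputs. The helper name `weighted_average_proba` is hypothetical.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split


def weighted_average_proba(models, weights, X):
    """Combine predict_proba outputs as a weighted average (hypothetical helper)."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # normalize so the weights sum to 1
    # Stack to shape (n_models, n_samples, n_classes), then contract over the model axis
    probas = np.stack([m.predict_proba(X) for m in models])
    return np.tensordot(w, probas, axes=1)  # shape (n_samples, n_classes)


X, y = make_classification(n_samples=500, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

models = [RandomForestClassifier(random_state=42), GradientBoostingClassifier(random_state=42)]

# Assumed weighting scheme: mean 5-fold CV accuracy of each model on the training set
weights = [cross_val_score(m, X_train, y_train, cv=5, scoring="accuracy").mean() for m in models]
for m in models:
    m.fit(X_train, y_train)

proba = weighted_average_proba(models, weights, X_test)
y_pred = models[0].classes_[proba.argmax(axis=1)]
print(f"Weighted ensemble accuracy: {np.mean(y_pred == y_test):.4f}")
```

Because every model's probability rows sum to 1 and the weights are normalized, the combined rows also sum to 1, so the result can be used like any single classifier's `predict_proba`.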

Installation

Install the package from PyPI with pip:

pip install iljicevs_ml

Or install directly from source:

git clone https://github.com/yourusername/iljicevs_ml.git
cd iljicevs_ml
pip install .

Usage

Basic Example: IljicevsModel

The following example uses iljicevs_ml to balance the training data, tune hyperparameters, select the best models, and evaluate the resulting ensemble.

from iljicevs_ml import IljicevsModel
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_classification
from sklearn.metrics import accuracy_score

# Create synthetic dataset for classification
X, y = make_classification(n_samples=1000, n_features=20, n_classes=2, random_state=42, class_sep=1.5)

# Split into training and test sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Define models and parameters for hyperparameter tuning
models = {
    'RandomForest': RandomForestClassifier(random_state=42),
    'GradientBoosting': GradientBoostingClassifier(random_state=42)
}

param_grids = {
    'RandomForest': {'n_estimators': [50, 100], 'max_depth': [3, 5, None]},
    'GradientBoosting': {'n_estimators': [50, 100], 'learning_rate': [0.01, 0.1]}
}

# Create an instance of IljicevsModel with the selected models and hyperparameter grids
iljicevs_model = IljicevsModel(models=models, param_grids=param_grids, search_method="grid")

# Check and balance class distribution using SMOTE (if necessary)
X_train_bal, y_train_bal = iljicevs_model.check_class_balance(X_train, y_train)

# Tune hyperparameters using GridSearchCV
iljicevs_model.tune_hyperparameters(X_train_bal, y_train_bal)

# Select the top two models based on cross-validation results
iljicevs_model.select_best_models(X_train_bal, y_train_bal, top_n=2)

# Train the selected models
iljicevs_model.fit(X_train_bal, y_train_bal)

# Evaluate accuracy on the test set
accuracy = iljicevs_model.score(X_test, y_test)
print(f"Test set accuracy: {accuracy:.4f}")

# Visualize feature importance for the ensemble models
iljicevs_model.feature_importance()

# Plot Precision-Recall curve
iljicevs_model.plot_precision_recall_curve(X_test, y_test)

# Compute stability of the ensemble models
stability = iljicevs_model.stability_metric(X_test)
print(f"Ensemble stability: {stability:.4f}")

# Save and load the model
iljicevs_model.save_model(iljicevs_model.selected_models[0], 'best_model.pkl')
loaded_model = iljicevs_model.load_model('best_model.pkl')

# Evaluate the loaded model on the test set
loaded_accuracy = accuracy_score(y_test, loaded_model.predict(X_test))
print(f"Loaded model accuracy: {loaded_accuracy:.4f}")

# Example of using AutoML via TPOT (optional)
# automl_pipeline = iljicevs_model.run_automl(X_train_bal, y_train_bal)
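To show roughly what the tuning and selection steps involve, here is a sketch using only scikit-learn's GridSearchCV with the same grids as above. It assumes, without confirmation from the library, that `tune_hyperparameters` runs one grid search per model and that `select_best_models` ranks models by their best mean cross-validation score. `top_n = 1` is used here so that the selection actually drops a model.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = make_classification(n_samples=300, n_features=20, random_state=42, class_sep=1.5)

models = {
    "RandomForest": RandomForestClassifier(random_state=42),
    "GradientBoosting": GradientBoostingClassifier(random_state=42),
}
param_grids = {
    "RandomForest": {"n_estimators": [50, 100], "max_depth": [3, 5, None]},
    "GradientBoosting": {"n_estimators": [50, 100], "learning_rate": [0.01, 0.1]},
}

# Tune each model independently with an exhaustive grid search (3-fold CV, accuracy)
searches = {}
for name, model in models.items():
    search = GridSearchCV(model, param_grids[name], cv=3, scoring="accuracy")
    search.fit(X, y)
    searches[name] = search

# Rank models by their best mean cross-validated score and keep the top_n
top_n = 1
ranking = sorted(searches, key=lambda name: searches[name].best_score_, reverse=True)
selected = {name: searches[name].best_estimator_ for name in ranking[:top_n]}

for name in ranking:
    s = searches[name]
    print(f"{name}: best CV accuracy {s.best_score_:.4f}, params {s.best_params_}")
```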

Class Balance Checking

To avoid problems with imbalanced datasets, call check_class_balance(). It summarizes the class distribution, suggests remedies for imbalance, and returns the (possibly resampled) training data, as in the example above:

X_train_bal, y_train_bal = iljicevs_model.check_class_balance(X_train, y_train)
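A minimal sketch of such a check in plain Python and NumPy follows. The function `summarize_class_balance` and its `max_ratio` threshold of 1.5 are hypothetical, chosen for illustration rather than taken from iljicevs_ml.

```python
from collections import Counter

import numpy as np


def summarize_class_balance(y, max_ratio=1.5):
    """Return per-class counts, the majority/minority ratio, and whether it exceeds max_ratio.

    max_ratio is an illustrative threshold, not a value taken from iljicevs_ml.
    """
    counts = Counter(np.asarray(y).tolist())
    ratio = max(counts.values()) / min(counts.values())
    return dict(counts), ratio, ratio > max_ratio


y_train = np.array([0] * 80 + [1] * 20)
counts, ratio, imbalanced = summarize_class_balance(y_train)
print(counts, f"ratio={ratio:.1f}", "imbalanced" if imbalanced else "balanced")
if imbalanced:
    print("Consider oversampling (e.g. SMOTE) or class_weight='balanced'.")
```

If the check flags imbalance, the imbalanced-learn package's `SMOTE().fit_resample(X, y)` (from `imblearn.over_sampling`) is one way to oversample the minority class; SMOTE is the technique named in the main example's comment.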

Feature Importance

To visualize the importance of features across models, use:

iljicevs_model.feature_importance()
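One plausible way to aggregate importances across an ensemble is to average each model's impurity-based `feature_importances_`. The sketch below does this for the two tree ensembles used earlier. Whether `feature_importance()` averages, weights, or plots each model separately is an assumption, and the plot is replaced here by printed output.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier

X, y = make_classification(n_samples=300, n_features=8, n_informative=3, random_state=42)

fitted = [
    RandomForestClassifier(random_state=42).fit(X, y),
    GradientBoostingClassifier(random_state=42).fit(X, y),
]

# Each model's impurity-based importances sum to 1; average them across models
importances = np.mean([m.feature_importances_ for m in fitted], axis=0)

# Print features from most to least important
for idx in np.argsort(importances)[::-1]:
    print(f"feature {idx}: {importances[idx]:.3f}")
```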

Cross-Validation with Metrics

For more detailed model evaluation, you can use cross-validation with custom metrics:

iljicevs_model.cross_validate_with_custom_metrics(X_train, y_train, custom_metrics=['accuracy', 'f1', 'roc_auc'])
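For comparison, the same kind of multi-metric evaluation can be done directly with scikit-learn's `cross_validate`, which scores every fold with each listed metric. How `cross_validate_with_custom_metrics` works internally is not documented here, so this is only an equivalent-in-spirit sketch.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_validate

X, y = make_classification(n_samples=300, n_features=20, random_state=42)
metrics = ["accuracy", "f1", "roc_auc"]

# Score the same 5 folds with several metrics at once
results = cross_validate(
    RandomForestClassifier(n_estimators=50, random_state=42),
    X, y, cv=5, scoring=metrics,
)

# cross_validate stores per-fold scores under "test_<metric>" keys
for metric in metrics:
    scores = results[f"test_{metric}"]
    print(f"{metric}: {scores.mean():.4f} +/- {scores.std():.4f}")
```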

Basic Example: IljicevsCausalModel

Below is an example of how to use the IljicevsCausalModel for causal analysis, visualization, and report generation.

# Import necessary modules
from iljicevs_ml import IljicevsCausalModel
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
import numpy as np

# Generate synthetic data and a random binary treatment assignment (seeded for reproducibility)
X, y = make_classification(n_samples=1000, n_features=5, random_state=42)
treatment = np.random.default_rng(42).binomial(n=1, p=0.5, size=len(y))

# Split the data into training and testing sets
X_train, X_test, y_train, y_test, treatment_train, treatment_test = train_test_split(
    X, y, treatment, test_size=0.3, random_state=42)

# Initialize and train the Causal Model
causal_model = IljicevsCausalModel(model_type='meta')
causal_model.fit(X_train, treatment_train, y_train)

# Directory for saving analysis results
output_dir = "causal_analysis"

# 1. Feature Importance Visualization
causal_model.feature_importance(output_dir)

# 2. Causal Effects Visualization
causal_model.plot_causal_effects(X_test, treatment_test, y_test, output_dir)

# 3. SHAP Values Visualization
causal_model.plot_shap_values(X_test, output_dir)

# 4. Counterfactual Analysis
delta_uplift = causal_model.counterfactual_analysis(X_test, treatment_test, feature_index=0, new_value=1.0)
print(f"Change in predictions when feature 0 is altered: {delta_uplift}")

# 5. Confidence Interval Estimation for Predictions
lower_bound, upper_bound = causal_model.estimate_confidence_intervals(X_test, treatment_test)
print(f"Confidence Interval: [{lower_bound}, {upper_bound}]")

# 6. Interaction Analysis Between Features
causal_model.analyze_interactions(X_test, treatment_test, output_dir)

# 7. Report Generation (Excel and Word)
excel_report, word_report = causal_model.generate_report(X_test, treatment_test, y_test, output_dir)

print(f"Excel report saved at: {excel_report}")
print(f"Word report saved at: {word_report}")
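The `model_type='meta'` option suggests a meta-learner. As an illustration under that assumption, the sketch below implements a T-learner (one classifier per treatment arm, with uplift taken as the difference of predicted probabilities) plus a simple counterfactual check that overwrites one feature. The helpers `predict_uplift` and `counterfactual_delta` are hypothetical and need not match IljicevsCausalModel's estimator or the signature of its `counterfactual_analysis`.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=1000, n_features=5, random_state=42)
treatment = np.random.default_rng(42).binomial(n=1, p=0.5, size=len(y))
X_train, X_test, y_train, y_test, t_train, t_test = train_test_split(
    X, y, treatment, test_size=0.3, random_state=42)

# T-learner: fit one outcome model on treated rows and one on control rows
model_treated = GradientBoostingClassifier(random_state=42).fit(
    X_train[t_train == 1], y_train[t_train == 1])
model_control = GradientBoostingClassifier(random_state=42).fit(
    X_train[t_train == 0], y_train[t_train == 0])


def predict_uplift(X):
    """Estimated uplift: P(y=1 | x, treated) - P(y=1 | x, control)."""
    return model_treated.predict_proba(X)[:, 1] - model_control.predict_proba(X)[:, 1]


def counterfactual_delta(X, feature_index, new_value):
    """Mean change in predicted uplift when one feature is set to new_value in every row."""
    X_cf = X.copy()  # work on a copy so the caller's data is untouched
    X_cf[:, feature_index] = new_value
    return float(np.mean(predict_uplift(X_cf) - predict_uplift(X)))


uplift = predict_uplift(X_test)
print(f"Mean estimated uplift: {uplift.mean():.4f}")
print(f"Counterfactual delta (feature 0 -> 1.0): {counterfactual_delta(X_test, 0, 1.0):.4f}")
```

Because the treatment in this synthetic data is assigned at random and does not affect `y`, the true uplift is zero; any nonzero estimate reflects model noise.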

Key Features

  1. Feature Importance Visualization: Visualizes the importance of each feature according to the causal model and saves the plots as images in the specified directory.
causal_model.feature_importance(output_dir)
  2. Causal Effects Visualization: Shows the estimated causal effect of the treatment on predictions.
causal_model.plot_causal_effects(X_test, treatment_test, y_test, output_dir)
  3. SHAP Values Visualization: Computes and visualizes SHAP values to explain model predictions.
causal_model.plot_shap_values(X_test, output_dir)
  4. Counterfactual Analysis: Shows how predictions change when one feature is set to a new value, which helps reveal causal relationships.
delta_uplift = causal_model.counterfactual_analysis(X_test, treatment_test, feature_index=0, new_value=1.0)
print(f"Change in predictions when feature 0 is altered: {delta_uplift}")
  5. Confidence Interval Estimation: Estimates confidence intervals for model predictions using bootstrap resampling.
lower_bound, upper_bound = causal_model.estimate_confidence_intervals(X_test, treatment_test)
print(f"Confidence Interval: [{lower_bound}, {upper_bound}]")
  6. Interaction Analysis: Analyzes and visualizes feature interactions using SHAP interaction values.
causal_model.analyze_interactions(X_test, treatment_test, output_dir)
  7. Report Generation: Generates a detailed report in both Excel and Word formats containing the analysis results, feature importance, SHAP values, and visualizations.
excel_report, word_report = causal_model.generate_report(X_test, treatment_test, y_test, output_dir)
print(f"Excel report saved at: {excel_report}")
print(f"Word report saved at: {word_report}")
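The confidence-interval feature is described as using bootstrap methods. A percentile bootstrap of the mean uplift is sketched below with NumPy. The function name `bootstrap_ci`, the 1,000 resamples, and the synthetic uplift values are assumptions for illustration, not the library's settings.

```python
import numpy as np


def bootstrap_ci(values, n_boot=1000, alpha=0.05, seed=42):
    """Percentile bootstrap confidence interval for the mean of `values` (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    values = np.asarray(values, dtype=float)
    # Draw n_boot resamples with replacement and record the mean of each
    idx = rng.integers(0, len(values), size=(n_boot, len(values)))
    boot_means = values[idx].mean(axis=1)
    # The interval bounds are the alpha/2 and 1 - alpha/2 quantiles of the resampled means
    lower, upper = np.quantile(boot_means, [alpha / 2, 1 - alpha / 2])
    return float(lower), float(upper)


# Synthetic stand-in for per-row uplift predictions from a causal model
uplift = np.random.default_rng(0).normal(loc=0.05, scale=0.2, size=300)
lower_bound, upper_bound = bootstrap_ci(uplift)
print(f"95% CI for mean uplift: [{lower_bound:.4f}, {upper_bound:.4f}]")
```

Fixing the seed makes the interval reproducible between runs; increasing `n_boot` reduces the Monte Carlo noise in the bounds.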

Report Generation

The generated reports contain:

  • Excel Report: Includes multiple sheets with feature importance, counterfactual analysis, confidence intervals, and inserted visualizations.
  • Word Report: Contains detailed text summaries, tables of results, and visualizations as embedded images.

Contributing

Contributions are welcome! Please feel free to submit a Pull Request or open an Issue if you find bugs or want to suggest new features.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/my-feature)
  3. Commit your changes (git commit -m 'Add some feature')
  4. Push to the branch (git push origin feature/my-feature)
  5. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Download files

Source Distribution

iljicevs_ml-0.2.5.tar.gz (14.2 kB)

Built Distribution

iljicevs_ml-0.2.5-py3-none-any.whl (15.8 kB)

File details

Details for the file iljicevs_ml-0.2.5.tar.gz.

File metadata

  • Download URL: iljicevs_ml-0.2.5.tar.gz
  • Size: 14.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.6

File hashes

Hashes for iljicevs_ml-0.2.5.tar.gz
Algorithm Hash digest
SHA256 4a576bd96680aff69f8fee33911d4b799374c94ed03dd9211b1ca22eb463434e
MD5 d85e7b828cd14f40bead2b0515f9e925
BLAKE2b-256 f8f8e9211eac7b820c24184619c2ca445c0b83019d11c88dbccb4123f7d09a7a

File details

Details for the file iljicevs_ml-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: iljicevs_ml-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 15.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.8.6

File hashes

Hashes for iljicevs_ml-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 77363e38254dec300208d745b2898ff96e5f64207ff937a506bb30a21289992a
MD5 4cbbaec8392a241fccedc4aaa53508a2
BLAKE2b-256 76e1197755cd1b48f141b3032a40cc17de1279e32396dcb355be09acc6ae1452