Information-theoretic model selection, multimodel inference, and machine learning algorithms.

Project description

douroucoulis

douroucoulis is a fun and practical library designed to help with model selection and building predictive models using the latest machine learning algorithms. It follows an Information-Theoretic framework, mainly focused on AIC-based model selection, and includes functionality for a variety of data science tasks like data exploration, model evaluation, and hyperparameter optimization.

This library is especially helpful when you need to perform multi-model inference and use an ensemble of the best-ranked models for more accurate estimates. It provides a complete pipeline from data cleaning to model selection and evaluation.

Features

General Workflow

  • Data Cleaning – Impute missing values, explore relationships between variables, and clean the dataset.
  • Data Exploration – Visualize data and identify important relationships using heatmaps.
  • Model Selection – Use AIC-based model selection to compare multiple models and find the best-fitting ones.
  • Cross-validation – Assess model performance using cross-validation and tune hyperparameters.
  • Model Averaging – Calculate model-averaged estimates for better predictive performance.

Key Functions

douroucoulis.instructions() Produces step-by-step instructions to guide you through the entire modeling process.

douroucoulis.test_dataset(n_samples, n_features, n_informative, random_state, regression) Generates a test dataset for exercises. Set regression=True for a regression dataset; otherwise it creates a classification dataset.
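The parameter names above match those of scikit-learn's make_regression and make_classification, so a comparable dataset can be sketched with those functions directly. This is only an illustration under that assumption; make_test_data is a hypothetical helper, not the library's implementation.

```python
from sklearn.datasets import make_classification, make_regression


def make_test_data(n_samples=100, n_features=5, n_informative=3,
                   random_state=42, regression=True):
    """Hypothetical helper: build a synthetic regression or classification set."""
    maker = make_regression if regression else make_classification
    # Both scikit-learn generators return a (features, target) pair of arrays.
    return maker(n_samples=n_samples, n_features=n_features,
                 n_informative=n_informative, random_state=random_state)


X_reg, y_reg = make_test_data(regression=True)   # continuous target
X_clf, y_clf = make_test_data(regression=False)  # binary class labels
```

Fixing random_state makes the generated data reproducible between runs.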

douroucoulis.check_data(data) Checks the dataset for missing values and provides feedback.

douroucoulis.impute_data(strategy) Imputes missing data using SimpleImputer. Choose 'mean' or 'median' for numeric data, or 'most_frequent', which also works for categorical data.
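To show what the imputation strategy does, here is a small sketch that calls scikit-learn's SimpleImputer directly on a made-up array. The array values are illustrative, and how douroucoulis wraps the imputer internally is not shown here.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy feature matrix with one missing value in each column.
X = np.array([
    [1.0, 10.0],
    [np.nan, 20.0],
    [3.0, np.nan],
])

# strategy="mean" replaces each NaN with the mean of its column.
imputer = SimpleImputer(strategy="mean")
X_filled = imputer.fit_transform(X)
```

Here the missing entry in the first column becomes 2.0 (the mean of 1.0 and 3.0) and the one in the second column becomes 15.0.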

douroucoulis.explore(data, cmap) Visualizes the relationships between explanatory variables (features) and the outcome variable (target) using a heatmap. The cmap argument allows you to specify color maps like 'rainbow', 'seismic', etc.

douroucoulis.aictable(model_set, model_names) Ranks a list of models based on their AIC values. Takes a list of models (model_set) and corresponding names (model_names).
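For readers new to AIC ranking, the sketch below computes the usual quantities in such a table: AIC = 2k - 2 ln(L), the difference from the best model, and the Akaike weight. The function names, log-likelihoods, and parameter counts are hypothetical; this is not the library's code.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike Information Criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def aic_table(log_likelihoods, n_params, names):
    """Rank models by AIC and attach delta-AIC and Akaike weights."""
    scores = [aic(ll, k) for ll, k in zip(log_likelihoods, n_params)]
    best = min(scores)
    deltas = [s - best for s in scores]
    # Relative likelihood of each model given the data: exp(-delta / 2).
    rel_lik = [math.exp(-0.5 * d) for d in deltas]
    total = sum(rel_lik)
    rows = [
        {"model": name, "aic": s, "delta": d, "weight": r / total}
        for name, s, d, r in zip(names, scores, deltas, rel_lik)
    ]
    return sorted(rows, key=lambda row: row["aic"])


# Hypothetical log-likelihoods and parameter counts for three candidate models.
table = aic_table(
    log_likelihoods=[-120.0, -118.5, -125.0],
    n_params=[3, 5, 2],
    names=["m1", "m2", "m3"],
)
```

The weights sum to 1 and can be read as the relative support for each model within the candidate set.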

douroucoulis.best_fit() Returns the name and statistics for the single best-fit model (AIC weight > 0.90). For multi-model inference, use douroucoulis.best_ranked().

douroucoulis.best_ranked() Returns the best-ranked models with cumulative AIC weight > 0.95. You can then use douroucoulis.mod_avg() for model-averaged parameter estimates.
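One common way to build such a set is to add models in order of decreasing AIC weight until the cumulative weight reaches the threshold. The sketch below assumes that approach; confidence_set and the weights are hypothetical, and the library's exact rule may differ.

```python
def confidence_set(weights, threshold=0.95):
    """Return model names, best first, until cumulative weight reaches threshold."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    selected, cumulative = [], 0.0
    for name, weight in ranked:
        selected.append(name)
        cumulative += weight
        if cumulative >= threshold:
            break
    return selected


# Hypothetical Akaike weights for three candidate models.
weights = {"m1": 0.62, "m2": 0.37, "m3": 0.01}
best_models = confidence_set(weights)

# When a single model dominates, the set contains only that model.
dominant = confidence_set({"a": 0.97, "b": 0.03})
```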

douroucoulis.mod_avg() Computes model-averaged estimates for each parameter in the best-ranked models.
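As a rough illustration of model averaging, the sketch below weights each model's coefficient by its AIC weight and sums across models. It assumes the "shrinkage" convention, where a parameter absent from a model counts as 0; the library may use a different convention, and all names and numbers here are hypothetical.

```python
def model_average(estimates, weights):
    """Weighted average of each coefficient across models.

    A coefficient missing from a model is counted as 0 (shrinkage convention).
    """
    total = sum(weights.values())  # renormalise in case weights do not sum to 1
    params = sorted({p for coefs in estimates.values() for p in coefs})
    return {
        p: sum(weights[m] * estimates[m].get(p, 0.0) for m in estimates) / total
        for p in params
    }


# Hypothetical coefficient estimates from two best-ranked models.
estimates = {
    "m1": {"x1": 2.0, "x2": 1.0},
    "m2": {"x1": 1.0},
}
weights = {"m1": 0.75, "m2": 0.25}
averaged = model_average(estimates, weights)
```

With these numbers, x1 averages to 1.75 and x2 to 0.75, since x2 is missing from m2 and contributes 0 there.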

douroucoulis.cross_val(X, y, classification) Evaluates a model’s accuracy using cross-validation. Set classification=True for classification tasks.
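For comparison, k-fold cross-validation can be run directly with scikit-learn's cross_val_score. The sketch below uses a synthetic regression dataset and five folds; the estimator, fold count, and scoring metric are illustrative choices, not settings documented for douroucoulis.cross_val.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Synthetic regression data (illustrative only).
X, y = make_regression(n_samples=100, n_features=5, n_informative=3,
                       noise=1.0, random_state=42)

# Fit and score the model on 5 train/test splits; each score is an R^2 value.
scores = cross_val_score(LinearRegression(), X, y, cv=5, scoring="r2")
mean_r2 = scores.mean()
```

For a classification task, a classifier and a metric such as scoring="accuracy" would take the place of the regressor and R^2.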

douroucoulis.hyper(model) Tunes the hyperparameters of the provided model using GridSearchCV for the most accurate fit.
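A grid search of this kind can be sketched with scikit-learn's GridSearchCV as follows. The Ridge estimator and the alpha grid are assumptions chosen for illustration; the grid that douroucoulis.hyper searches is not documented here.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import Ridge
from sklearn.model_selection import GridSearchCV

# Synthetic regression data (illustrative only).
X, y = make_regression(n_samples=100, n_features=5, n_informative=3,
                       noise=5.0, random_state=42)

# Candidate regularisation strengths to try (hypothetical grid).
param_grid = {"alpha": [0.01, 0.1, 1.0, 10.0]}

# Evaluate every grid point with 5-fold cross-validation.
search = GridSearchCV(Ridge(), param_grid, cv=5,
                      scoring="neg_mean_squared_error")
search.fit(X, y)

best_alpha = search.best_params_["alpha"]
tuned_model = search.best_estimator_  # refit on all data with the best alpha
```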

douroucoulis.best_predictions(new_data) Uses the best-fit, hyperparameter-tuned model to make predictions on a new dataset (new_data).

Fun Sound Functions for Debugging

douroucoulis.tonalhoot(reps) Emits a tonal hoot, repeated for the specified number of times (reps). Useful for tracking model fitting progress and debugging.

douroucoulis.gruffhoot(reps) Emits a gruff hoot, repeated for the specified number of times (reps). Use for debugging and tracking model fitting.

douroucoulis.rwhoop(reps) Emits a resonant whoop, repeated for the specified number of times (reps). Also useful for debugging and progress tracking.

Installation

You can install the library using pip:

    pip install douroucoulis

Example Usage

Here's an example of how to use the douroucoulis library to perform model selection:

    import douroucoulis as do
    # Estimators for the model set (assumed to come from scikit-learn)
    from sklearn.linear_model import LinearRegression, RidgeCV

    # Generate test dataset
    data = do.test_dataset(n_samples=100, n_features=5, n_informative=3, random_state=42, regression=True)

    # Check for missing data
    do.check_data(data)

    # Impute missing data
    do.impute_data(strategy='mean')

    # Explore data relationships
    do.explore(data, cmap='seismic')

    # Create a model set and rank them using AIC
    model_set = [LinearRegression(), RidgeCV()]
    model_names = ['Linear Regression', 'RidgeCV']
    aic_table = do.aictable(model_set, model_names)

    # Get the best-ranked models
    best_models = do.best_ranked()

    # Model-averaged parameter estimates
    averaged_params = do.mod_avg()

    # Perform cross-validation on the best model
    X = data.drop(columns='target')
    y = data['target']
    do.cross_val(X, y, classification=False)

License

This project is licensed under the MIT License – see the LICENSE file for details.

Download files

Source Distribution

douroucoulis-0.1.0.tar.gz (3.3 kB)

Uploaded Source

Built Distribution

douroucoulis-0.1.0-py3-none-any.whl (3.1 kB)

Uploaded Python 3

File details

Details for the file douroucoulis-0.1.0.tar.gz.

File metadata

  • Download URL: douroucoulis-0.1.0.tar.gz
  • Upload date:
  • Size: 3.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for douroucoulis-0.1.0.tar.gz
Algorithm Hash digest
SHA256 5c6e9c79936a6e78f0619965d94bb2eb8ef223b104527a03aa05f73c034b3ee5
MD5 bed51538cd8760cd962b67894f277749
BLAKE2b-256 09343e8541aa5c49bd07adb1fb5eb4c6daffd7b9e6fab9b275513a4c0023d7a9

File details

Details for the file douroucoulis-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: douroucoulis-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 3.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for douroucoulis-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3889c1b8516531a55e7852440ab77671efc4d98942b5cf55e630b6be78f8c586
MD5 32aad1bbf7426cb0d969dfbb98860c6f
BLAKE2b-256 70df0747f903ca0c69e832aa9c377a71ea2549a5b53b02c809d8d221cb5f3641
