
Information-theoretic model selection, multimodel inference, and machine learning algorithms.

Project description

douroucoulis

douroucoulis is a fun and practical library for model selection and for building predictive models with machine learning algorithms. It follows an information-theoretic framework focused mainly on AIC-based model selection, and includes functionality for data science tasks such as data exploration, model evaluation, and hyperparameter optimization.

This library is particularly useful when you need to perform multimodel inference and use an ensemble of the best-ranked models for more accurate estimates. It provides a complete pipeline from data cleaning to model selection and evaluation.

Features

General Workflow

  • Data Cleaning: impute missing values, explore relationships between variables, and clean the dataset.
  • Data Exploration: visualize data and identify important relationships using heatmaps.
  • Model Selection: use AIC-based model selection to compare multiple models and find the best-fitting ones.
  • Cross-validation: assess model performance using cross-validation and tune hyperparameters.
  • Model Averaging: calculate model-averaged estimates for better predictive performance.

Key Functions

douroucoulis.instructions() Produces step-by-step instructions to guide you through the entire modeling process.

douroucoulis.test_dataset(n_samples, n_features, n_informative, random_state, regression) Generates a test dataset for exercises. Set regression=True for a regression task; otherwise, it creates a classification dataset.
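douroucoulis does not document how the test data are generated. The stdlib-only sketch below, built around a hypothetical make_toy_regression helper, only illustrates what n_informative means for a regression dataset: every row has n_features columns, but only the first n_informative of them enter the target. scikit-learn's make_regression and make_classification accept the same n_samples, n_features, n_informative, and random_state parameters if you want a ready-made generator.

```python
import random


def make_toy_regression(n_samples, n_features, n_informative, random_state=None):
    """Hypothetical stand-in for a regression test-dataset generator.

    Only the first n_informative features influence the target; the
    remaining features are pure noise.
    """
    rng = random.Random(random_state)  # seeded generator for reproducibility
    # Non-zero coefficients for the informative features only
    coefs = [rng.uniform(1.0, 10.0) for _ in range(n_informative)]
    rows = []
    for _ in range(n_samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        # Target = weighted sum of informative features + small Gaussian noise
        y = sum(c * xi for c, xi in zip(coefs, x[:n_informative])) + rng.gauss(0.0, 0.1)
        rows.append({**{f"x{i}": v for i, v in enumerate(x)}, "target": y})
    return rows, coefs


rows, coefs = make_toy_regression(n_samples=100, n_features=5, n_informative=3, random_state=42)
print(len(rows), sorted(rows[0]))
```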

douroucoulis.check_data(data) Checks the dataset for missing values and provides feedback.
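A minimal sketch of the kind of check check_data performs, assuming "missing" means None or NaN. The count_missing helper and the toy rows are hypothetical; with a pandas DataFrame, data.isna().sum() gives the same per-column counts.

```python
import math


def count_missing(rows):
    """Count missing entries (None or NaN) per column in a list of dict rows."""
    counts = {}
    for row in rows:
        for col, value in row.items():
            is_missing = value is None or (isinstance(value, float) and math.isnan(value))
            counts[col] = counts.get(col, 0) + int(is_missing)
    return counts


# Hypothetical toy data with one missing value in each feature column
data = [
    {"x0": 1.2, "x1": None, "target": 3.0},
    {"x0": float("nan"), "x1": 0.5, "target": 2.1},
    {"x0": 0.7, "x1": 0.9, "target": 1.8},
]
print(count_missing(data))  # {'x0': 1, 'x1': 1, 'target': 0}
```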

douroucoulis.impute_data(strategy) Imputes missing data using scikit-learn's SimpleImputer. Use 'mean' or 'median' for numeric features, or 'most_frequent' for categorical data.
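The three strategies differ only in the fill value computed from the observed entries of each column. The sketch below, using a hypothetical impute_column helper, reproduces that logic with the statistics module. It is not douroucoulis's code, and tie-breaking for 'most_frequent' may differ: SimpleImputer returns the smallest of the tied values, while statistics.mode returns the first one encountered.

```python
import statistics


def impute_column(values, strategy="mean"):
    """Fill None entries in one column using a SimpleImputer-style strategy."""
    observed = [v for v in values if v is not None]
    if strategy == "mean":
        fill = statistics.mean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "most_frequent":
        fill = statistics.mode(observed)  # also works for categorical values
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    # Replace only the missing entries; observed values are kept as-is
    return [fill if v is None else v for v in values]


print(impute_column([1.0, None, 3.0], "mean"))  # [1.0, 2.0, 3.0]
```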

douroucoulis.explore(data, cmap) Visualizes the relationships between the explanatory variables (features) and the outcome variable (target) using a heatmap. The cmap argument sets the color map, for example 'rainbow' or 'seismic'.
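The description does not say which statistic the heatmap colors. Assuming it is a Pearson correlation matrix, the sketch below computes the numbers such a heatmap would display and leaves out the plotting. The pearson and correlation_matrix helpers and the toy columns are hypothetical; with pandas, data.corr() returns the Pearson matrix directly.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlation_matrix(columns):
    """Pairwise correlations for a dict of column name -> values (the heatmap cells)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical toy columns: x0 rises with the target, x1 falls with it
cols = {"x0": [1, 2, 3, 4], "x1": [4, 3, 2, 1], "target": [2, 4, 6, 8]}
corr = correlation_matrix(cols)
for name, row in corr.items():
    print(name, {k: round(v, 2) for k, v in row.items()})
```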

douroucoulis.aictable(model_set, model_names) Ranks a list of models based on their AIC values. Takes a list of models (model_set) and corresponding names (model_names).
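For intuition about what an AIC table contains, here is a sketch assuming least-squares models with Gaussian errors, where AIC = n ln(RSS/n) + 2k up to an additive constant; douroucoulis may instead use each model's own log-likelihood or the small-sample AICc. Models are ranked by AIC, delta is each model's distance from the minimum, and the Akaike weight is w_i = exp(-delta_i/2) / sum_j exp(-delta_j/2). The model names and RSS values are hypothetical.

```python
import math


def aic_from_rss(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, up to an additive constant."""
    return n * math.log(rss / n) + 2 * k


def aic_table(fits, n):
    """Rank models by AIC and attach delta-AIC and Akaike weights.

    fits: dict of model name -> (residual sum of squares, number of parameters)
    """
    aics = {name: aic_from_rss(rss, n, k) for name, (rss, k) in fits.items()}
    best = min(aics.values())
    # Relative likelihood of each model given the data
    rel = {name: math.exp(-(a - best) / 2) for name, a in aics.items()}
    total = sum(rel.values())
    rows = [
        {"model": name, "AIC": aics[name], "delta": aics[name] - best, "weight": rel[name] / total}
        for name in aics
    ]
    return sorted(rows, key=lambda r: r["AIC"])


# Hypothetical fits on n = 100 observations
table = aic_table({"full": (40.0, 6), "reduced": (42.0, 4), "null": (120.0, 1)}, n=100)
for row in table:
    print(f"{row['model']:8s} AIC={row['AIC']:8.2f} delta={row['delta']:6.2f} w={row['weight']:.3f}")
```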

douroucoulis.best_fit() Returns the name and statistics for the single best-fit model (AIC weight > 0.90). For multi-model inference, use douroucoulis.best_ranked().
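The 0.90 rule can be sketched as follows: compute Akaike weights from the AIC values and report the top model only if its weight exceeds the threshold. The function names are hypothetical, and returning None stands in for whatever douroucoulis does when no single model dominates.

```python
import math


def akaike_weights(aics):
    """Akaike weights from a dict of model name -> AIC."""
    best = min(aics.values())
    rel = {m: math.exp(-(a - best) / 2) for m, a in aics.items()}
    total = sum(rel.values())
    return {m: r / total for m, r in rel.items()}


def single_best_model(aics, threshold=0.90):
    """Return the top model's name if its Akaike weight exceeds threshold, else None."""
    weights = akaike_weights(aics)
    top = max(weights, key=weights.get)
    return top if weights[top] > threshold else None


print(single_best_model({"A": 100.0, "B": 110.0}))  # A
print(single_best_model({"A": 100.0, "B": 101.0}))  # None
```

When the function returns None, no model is clearly best, which is the situation where multimodel inference over the best-ranked set is more appropriate.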

douroucoulis.best_ranked() Returns the best-ranked models with cumulative AIC weight > 0.95. You can then use douroucoulis.mod_avg() for model-averaged parameter estimates.
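A sketch of the cumulative-weight rule, assuming models are added in order of decreasing Akaike weight until the running total exceeds 0.95, which gives a 95% confidence set of models. The function name and the AIC values are hypothetical.

```python
import math


def best_ranked_set(aics, threshold=0.95):
    """Smallest set of top-weighted models whose cumulative Akaike weight exceeds threshold."""
    best = min(aics.values())
    rel = {m: math.exp(-(a - best) / 2) for m, a in aics.items()}
    total = sum(rel.values())
    ranked = sorted(rel, key=rel.get, reverse=True)  # highest weight first
    chosen, cumulative = [], 0.0
    for m in ranked:
        w = rel[m] / total
        chosen.append((m, w))
        cumulative += w
        if cumulative > threshold:
            break
    return chosen


for name, weight in best_ranked_set({"A": 100.0, "B": 101.0, "C": 103.0, "D": 115.0}):
    print(name, round(weight, 3))
```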

douroucoulis.mod_avg() Computes model-averaged estimates for each parameter in the best-ranked models.
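There are two common conventions for averaging a parameter that does not appear in every model: average only over the models that contain it, renormalizing their weights, or treat it as zero in the models that omit it (which shrinks the estimate). Which one douroucoulis uses is not documented; the sketch below assumes the first. The model_average name and the inputs are hypothetical.

```python
def model_average(estimates, weights):
    """Model-averaged estimate per parameter.

    estimates: model name -> {parameter: estimate}
    weights:   model name -> Akaike weight (models in the best-ranked set)
    Each parameter is averaged over the models that contain it, with those
    models' weights renormalized to sum to one.
    """
    params = {p for est in estimates.values() for p in est}
    averaged = {}
    for p in sorted(params):
        holders = [m for m in estimates if p in estimates[m]]
        w_total = sum(weights[m] for m in holders)
        averaged[p] = sum(weights[m] * estimates[m][p] for m in holders) / w_total
    return averaged


# Hypothetical coefficients from two best-ranked models
estimates = {"A": {"x0": 2.0, "x1": 1.0}, "B": {"x0": 3.0}}
weights = {"A": 0.6, "B": 0.4}
print(model_average(estimates, weights))
```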

douroucoulis.cross_val(X, y, classification) Evaluates a model’s accuracy using cross-validation. Set classification=True for classification tasks.
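Cross-validation holds out each of k folds in turn, fits on the remaining data, and scores the predictions on the held-out fold. The stdlib sketch below does this for a one-feature least-squares line scored by mean squared error; the helpers are hypothetical, and douroucoulis presumably relies on scikit-learn instead. As with scikit-learn's KFold at its default shuffle=False, the folds are contiguous, so shuffle the rows first if they are ordered.

```python
import statistics


def k_fold_indices(n, k):
    """Split range(n) into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least-squares slope and intercept for one feature."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return slope, my - slope * mx


def cross_val_mse(xs, ys, k=5):
    """Mean squared error of a one-feature linear fit, averaged over k folds."""
    scores = []
    for test_idx in k_fold_indices(len(xs), k):
        test = set(test_idx)
        train = [i for i in range(len(xs)) if i not in test]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (slope * xs[i] + intercept)) ** 2 for i in test_idx]
        scores.append(statistics.mean(errors))
    return statistics.mean(scores)


xs = list(range(20))
ys = [2 * x + 1 for x in xs]
print(cross_val_mse(xs, ys, k=5))
```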

douroucoulis.hyper(model) Tunes the hyperparameters of the provided model using GridSearchCV for the most accurate fit.
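GridSearchCV evaluates every combination in a parameter grid by cross-validation and keeps the best-scoring one. The exhaustive search itself is sketched below with itertools.product; the toy_score function and its alpha and fit_intercept parameters are hypothetical placeholders for a cross-validated metric.

```python
import itertools


def grid_search(score, param_grid):
    """Evaluate every parameter combination and keep the highest-scoring one.

    score:      callable taking keyword parameters, returning a value to maximize
    param_grid: dict of parameter name -> list of candidate values
    """
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for combo in itertools.product(*(param_grid[n] for n in names)):
        params = dict(zip(names, combo))
        s = score(**params)
        if s > best_score:  # strict '>' keeps the first combination on ties
            best_params, best_score = params, s
    return best_params, best_score


def toy_score(alpha, fit_intercept):
    """Hypothetical score, peaking at alpha = 1.0 with an intercept."""
    return -(alpha - 1.0) ** 2 + (0.5 if fit_intercept else 0.0)


best, best_score = grid_search(toy_score, {"alpha": [0.1, 1.0, 10.0], "fit_intercept": [True, False]})
print(best, best_score)
```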

douroucoulis.best_predictions(new_data) Uses the best-fit model, with its tuned hyperparameters, to make predictions on a new dataset (new_data).

Fun Sound Functions for Debugging

douroucoulis.tonalhoot(reps) Emits a tonal hoot, repeated the specified number of times (reps). Useful for tracking model-fitting progress and debugging.

douroucoulis.gruffhoot(reps) Emits a gruff hoot, repeated the specified number of times (reps). Use it for debugging and tracking model fitting.

douroucoulis.rwhoop(reps) Emits a resonant whoop, repeated the specified number of times (reps). Also useful for debugging and progress tracking.

Installation

You can install the library using pip:

```shell
pip install douroucoulis
```

Example Usage

Here's an example of how to use the douroucoulis library to perform model selection:

```python
import douroucoulis as do
from sklearn.linear_model import LinearRegression, RidgeCV

# Generate a test dataset
data = do.test_dataset(n_samples=100, n_features=5, n_informative=3, random_state=42, regression=True)

# Check for missing data
do.check_data(data)

# Impute missing data
do.impute_data(strategy='mean')

# Explore data relationships
do.explore(data, cmap='seismic')

# Create a model set and rank the models using AIC
model_set = [LinearRegression(), RidgeCV()]
model_names = ['Linear Regression', 'RidgeCV']
aic_table = do.aictable(model_set, model_names)

# Get the best-ranked models
best_models = do.best_ranked()

# Model-averaged parameter estimates
averaged_params = do.mod_avg()

# Perform cross-validation on the best model
X = data.drop(columns='target')
y = data['target']
do.cross_val(X, y, classification=False)
```

License

This project is licensed under the MIT License; see the LICENSE file for details.

Download files


Source Distribution

douroucoulis-0.1.3.tar.gz (14.3 kB)

Built Distribution

douroucoulis-0.1.3-py3-none-any.whl (12.9 kB, Python 3)

File details

Details for the file douroucoulis-0.1.3.tar.gz.

File metadata

  • Download URL: douroucoulis-0.1.3.tar.gz
  • Size: 14.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for douroucoulis-0.1.3.tar.gz
Algorithm Hash digest
SHA256 d875308380db6b1932569f549a3a7e71b2fa702ef95f12a4ff4da9d219dd81e1
MD5 a68edd8a0896b53c4c3d9c5b42d18a75
BLAKE2b-256 b7eb0c007826127f40c5be2ecc4fdb3db0ecabeb2baadd02db2b1436a632aba5


File details

Details for the file douroucoulis-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: douroucoulis-0.1.3-py3-none-any.whl
  • Size: 12.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for douroucoulis-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 957ecbcde525b13504293ac4841839d4e8ea9dab1cd848454695d888ce5297a8
MD5 feb4d2e88908e758ffee5e9b8c441eae
BLAKE2b-256 1c3989a3f896afe5bb6e5e7110d700b025b00b7f7166a2ec6bcb74e36453a728

