Information-theoretic model selection, multimodel inference, and machine learning algorithms.

Project description

douroucoulis

douroucoulis is a fun and practical library designed to help with model selection and with building predictive models using machine learning algorithms. It follows an information-theoretic framework, focused mainly on AIC-based model selection, and includes functionality for data science tasks such as data exploration, model evaluation, and hyperparameter optimization.

This library is especially helpful when you need to perform multi-model inference and use an ensemble of the best-ranked models for more accurate estimates. It provides a complete pipeline from data cleaning to model selection and evaluation.

Features

General Workflow

  • Data Cleaning: Impute missing values, explore relationships between variables, and clean the dataset.
  • Data Exploration: Visualize data and identify important relationships using heatmaps.
  • Model Selection: Use AIC-based model selection to compare multiple models and find the best-fitting ones.
  • Cross-validation: Assess model performance using cross-validation and tune hyperparameters.
  • Model Averaging: Calculate model-averaged estimates for better predictive performance.

Key Functions

douroucoulis.instructions()
Produces step-by-step instructions to guide you through the entire modeling process.

douroucoulis.test_dataset(n_samples, n_features, n_informative, random_state, regression)
Generates a test dataset for exercises. Set regression=True for a regression dataset; otherwise, it creates a classification dataset.

douroucoulis.check_data(data)
Checks the dataset for missing values and provides feedback.

douroucoulis.impute_data(strategy)
Imputes missing data using SimpleImputer. Choose a strategy such as 'mean' or 'median' for numeric data, or 'most_frequent' for categorical data.
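To make the three strategies concrete, here is a minimal standard-library sketch of what each one does to a single column. It is an illustration only, not the library's code: the helper name impute_column is hypothetical, missing values are assumed to be marked with None, and tie-breaking for 'most_frequent' may differ from SimpleImputer.

```python
from statistics import mean, median, mode


def impute_column(values, strategy="mean"):
    """Fill None entries of one column (hypothetical helper, not douroucoulis API)."""
    # Only the observed (non-missing) values are used to compute the fill value.
    observed = [v for v in values if v is not None]
    fill_functions = {"mean": mean, "median": median, "most_frequent": mode}
    fill = fill_functions[strategy](observed)
    return [fill if v is None else v for v in values]


numeric = [1.0, None, 3.0, 8.0]
categorical = ["a", "b", None, "b"]

filled_mean = impute_column(numeric, "mean")
filled_median = impute_column(numeric, "median")
filled_mode = impute_column(categorical, "most_frequent")
print(filled_mean, filled_median, filled_mode)
```

The mean is sensitive to outliers (the 8.0 pulls the fill value up), which is why 'median' is often preferred for skewed numeric columns.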

douroucoulis.explore(data, cmap)
Visualizes the relationships between explanatory variables (features) and the outcome variable (target) using a heatmap. The cmap argument lets you specify a color map such as 'rainbow' or 'seismic'.

douroucoulis.aictable(model_set, model_names)
Ranks a list of models based on their AIC values. Takes a list of models (model_set) and corresponding names (model_names).
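The ranking behind an AIC table can be sketched with the standard formulas: AIC = 2k - 2 ln(L), the difference of each model's AIC from the smallest one, and Akaike weights proportional to exp(-delta / 2). The sketch below assumes you already have each model's maximized log-likelihood and parameter count; the function names and the numbers are hypothetical, and the library's own aictable() may compute or present these values differently.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: AIC = 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood


def rank_by_aic(aic_values, model_names):
    """Return rows (name, AIC, delta AIC, Akaike weight), best model first."""
    best = min(aic_values)
    deltas = [value - best for value in aic_values]
    # Relative likelihood of each model given the data.
    relative = [math.exp(-0.5 * delta) for delta in deltas]
    total = sum(relative)
    rows = [
        (name, value, delta, rel / total)
        for name, value, delta, rel in zip(model_names, aic_values, deltas, relative)
    ]
    return sorted(rows, key=lambda row: row[1])


# Hypothetical maximized log-likelihoods and parameter counts for three models.
fits = {"m1": (-120.0, 3), "m2": (-118.5, 4), "m3": (-125.0, 2)}
table = rank_by_aic([aic(ll, k) for ll, k in fits.values()], list(fits))
for name, value, delta, weight in table:
    print(f"{name}: AIC={value:.1f}  delta={delta:.1f}  weight={weight:.3f}")
```

The weights sum to 1 across the model set, so each can be read as the relative support for that model among the candidates considered.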

douroucoulis.best_fit()
Returns the name and statistics for the single best-fit model (AIC weight > 0.90). For multi-model inference, use douroucoulis.best_ranked().

douroucoulis.best_ranked()
Returns the best-ranked models with cumulative AIC weight > 0.95. You can then use douroucoulis.mod_avg() for model-averaged parameter estimates.
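The cumulative-weight rule can be illustrated as follows: sort the models by Akaike weight and keep adding them until the running total exceeds the threshold. This is a sketch under assumptions, not the library's implementation; the function name confidence_set and the weights are hypothetical.

```python
def confidence_set(weights, threshold=0.95):
    """Return (name, weight) pairs for the top-ranked models whose
    cumulative Akaike weight first exceeds the threshold."""
    ranked = sorted(weights.items(), key=lambda item: item[1], reverse=True)
    selected = []
    cumulative = 0.0
    for name, weight in ranked:
        selected.append((name, weight))
        cumulative += weight
        if cumulative > threshold:
            break
    return selected


# Hypothetical Akaike weights for four candidate models.
weights = {"m1": 0.62, "m2": 0.30, "m3": 0.06, "m4": 0.02}
selected = confidence_set(weights)
print([name for name, _ in selected])
```

Here no single model reaches a weight of 0.90, so there is no clear best-fit model, and the first three models together form the best-ranked set.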

douroucoulis.mod_avg()
Computes model-averaged estimates for each parameter in the best-ranked models.
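One common way to form a model-averaged estimate is a weighted sum of each model's coefficient, using Akaike weights renormalized over the selected models. The sketch below assumes that a parameter missing from a model contributes 0 (sometimes called full-model or shrinkage averaging); the library may use a different convention, and the function name and numbers are hypothetical.

```python
def model_average(estimates, weights):
    """Weighted average of per-model coefficients.

    estimates: {model name: {parameter name: coefficient}}
    weights:   {model name: Akaike weight}
    Assumption: a parameter absent from a model counts as 0.
    """
    # Renormalize the weights so they sum to 1 over the selected models.
    total = sum(weights[model] for model in estimates)
    parameters = sorted({p for coefs in estimates.values() for p in coefs})
    return {
        p: sum(
            weights[model] / total * coefs.get(p, 0.0)
            for model, coefs in estimates.items()
        )
        for p in parameters
    }


# Hypothetical coefficients from two best-ranked models.
estimates = {"m1": {"x1": 2.0, "x2": 1.0}, "m2": {"x1": 1.5}}
weights = {"m1": 0.6, "m2": 0.2}
averaged = model_average(estimates, weights)
print(averaged)
```

Because x2 appears only in m1, its averaged coefficient is shrunk toward zero in proportion to the weight of the model that omits it.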

douroucoulis.cross_val(X, y, classification)
Evaluates a model's accuracy using cross-validation. Set classification=True for classification tasks.
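As a reminder of what k-fold cross-validation does, the sketch below splits the sample indices into k folds, holds each fold out in turn, and scores a deliberately trivial predictor (the training-set mean) by mean squared error. Everything here is an assumption for illustration: the helper names, the choice of k=5, and the mean-only baseline are not taken from the library, which evaluates real models.

```python
import random
from statistics import mean


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train, test) index lists; each sample is held out exactly once."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test


def cross_val_mse(y, k=5, seed=0):
    """Per-fold mean squared error of a mean-only baseline predictor."""
    scores = []
    for train, test in k_fold_indices(len(y), k, seed):
        prediction = mean(y[j] for j in train)  # "fit" on the training folds only
        scores.append(mean((y[j] - prediction) ** 2 for j in test))
    return scores


y = [float(i) for i in range(20)]
scores = cross_val_mse(y)
print(f"{len(scores)} folds, mean MSE = {mean(scores):.2f}")
```

The key point is that the predictor never sees the held-out fold while it is being fitted, so the averaged score estimates performance on unseen data.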

douroucoulis.hyper(model)
Tunes the hyperparameters of the provided model using GridSearchCV for the most accurate fit.
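The idea behind a grid search is to evaluate every combination of candidate hyperparameter values and keep the best-scoring one. The standard-library sketch below shows only that exhaustive loop; GridSearchCV additionally scores each combination by cross-validation. The function names, the parameter grid, and the toy scoring function are all hypothetical.

```python
from itertools import product


def grid_search(param_grid, score_fn):
    """Try every combination in param_grid; return (best params, best score)."""
    names = list(param_grid)
    best_params, best_score = None, float("-inf")
    for values in product(*(param_grid[name] for name in names)):
        params = dict(zip(names, values))
        score = score_fn(**params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_score(alpha, fit_intercept):
    """Stand-in for a cross-validated score; peaks at alpha=1.0 with an intercept."""
    return -(alpha - 1.0) ** 2 + (0.5 if fit_intercept else 0.0)


param_grid = {"alpha": [0.1, 1.0, 10.0], "fit_intercept": [True, False]}
best_params, best_score = grid_search(param_grid, toy_score)
print(best_params, best_score)
```

The cost grows with the product of the grid sizes (here 3 x 2 = 6 evaluations), so grids are usually kept small.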

douroucoulis.best_predictions(new_data)
Uses the best-fit model, with its tuned hyperparameters, to make predictions on a new dataset (new_data).

Fun Sound Functions for Debugging

douroucoulis.tonalhoot(reps)
Emits a tonal hoot, repeated the specified number of times (reps). Useful for tracking model fitting progress and debugging.

douroucoulis.gruffhoot(reps)
Emits a gruff hoot, repeated the specified number of times (reps). Use for debugging and tracking model fitting.

douroucoulis.rwhoop(reps)
Emits a resonant whoop, repeated the specified number of times (reps). Also useful for debugging and progress tracking.

Installation

You can install the library using pip:

    pip install douroucoulis

Example Usage

Here's an example of how to use the douroucoulis library to perform model selection:

    import douroucoulis as do
    from sklearn.linear_model import LinearRegression, RidgeCV

    # Generate test dataset
    data = do.test_dataset(n_samples=100, n_features=5, n_informative=3, random_state=42, regression=True)

    # Check for missing data
    do.check_data(data)

    # Impute missing data
    do.impute_data(strategy='mean')

    # Explore data relationships
    do.explore(data, cmap='seismic')

    # Create a model set and rank them using AIC
    model_set = [LinearRegression(), RidgeCV()]
    model_names = ['Linear Regression', 'RidgeCV']
    aic_table = do.aictable(model_set, model_names)

    # Get the best-ranked models
    best_models = do.best_ranked()

    # Model-averaged parameter estimates
    averaged_params = do.mod_avg()

    # Perform cross-validation on the best model
    X = data.drop(columns='target')
    y = data['target']
    do.cross_val(X, y, classification=False)

License

This project is licensed under the MIT License; see the LICENSE file for details.

Download files

Source Distribution

douroucoulis-0.1.1.tar.gz (3.3 kB)

Uploaded Source

Built Distribution

douroucoulis-0.1.1-py3-none-any.whl (3.1 kB)

Uploaded Python 3

File details

Details for the file douroucoulis-0.1.1.tar.gz.

File metadata

  • Download URL: douroucoulis-0.1.1.tar.gz
  • Upload date:
  • Size: 3.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for douroucoulis-0.1.1.tar.gz
Algorithm Hash digest
SHA256 8cd1351029a72cf58553ad25ca43014b62ec6f52dfebeabe1b33151210cf9b73
MD5 bd20c41c31591e16db14ae17e2e0e047
BLAKE2b-256 27d550d0c20bd29e37c95e575264f6d3195dd0b1526d947d01398e6830121dec

File details

Details for the file douroucoulis-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: douroucoulis-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 3.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.5

File hashes

Hashes for douroucoulis-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 a600d235a4802e34f6b4536bf281d979b36264412651c8fc3c453a96d42bdb41
MD5 172a8a9d9c90cc4303b7da8cceee3b98
BLAKE2b-256 836615dfae9d22f529bc1a2faedcd92a7e4562ffb7f113ddd99cb721a0e27830
