Information-theoretic model selection, multimodel inference, machine learning algorithms.
Project description
douroucoulis is a fun and practical library for model selection and for building predictive models with modern machine learning algorithms. It follows an information-theoretic framework, focused mainly on AIC-based model selection, and includes functionality for data exploration, model evaluation, and hyperparameter optimization.
This library is especially helpful when you need to perform multi-model inference and use an ensemble of the best-ranked models for more accurate estimates. It provides a complete pipeline from data cleaning through model selection and evaluation.
Features
General Workflow
- Data Cleaning: Impute missing values, explore relationships between variables, and clean the dataset.
- Data Exploration: Visualize data and identify important relationships using heatmaps.
- Model Selection: Use AIC-based model selection to compare multiple models and find the best-fitting ones.
- Cross-validation: Assess model performance using cross-validation and tune hyperparameters.
- Model Averaging: Calculate model-averaged estimates for better predictive performance.

Key Functions
douroucoulis.instructions() Produces step-by-step instructions to guide you through the entire modeling process.
douroucoulis.test_dataset(n_samples, n_features, n_informative, random_state, regression) Generates a test dataset for exercises. Set regression=True for a regression dataset; otherwise, a classification dataset is created.
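To make the n_informative argument concrete, here is a minimal sketch of how a synthetic regression dataset of this kind can be built. It is not douroucoulis's implementation: make_test_data and its noise parameter are hypothetical, and the linear target with Gaussian noise is an assumption (scikit-learn's make_regression takes similarly named n_samples, n_features, n_informative and random_state arguments and works along the same lines).

```python
import random

# Illustrative sketch only; make_test_data and `noise` are hypothetical, not douroucoulis's API.
def make_test_data(n_samples, n_features, n_informative, random_state=None, noise=0.1):
    """Return (X, y, coefs): X is a list of rows, y a list of targets.

    Only the first `n_informative` features carry signal; the rest are pure noise.
    """
    if not 0 <= n_informative <= n_features:
        raise ValueError("n_informative must be between 0 and n_features")
    rng = random.Random(random_state)  # seeded generator, so results are reproducible
    # Nonzero coefficients for informative features, zero for the uninformative ones
    coefs = [rng.uniform(1.0, 10.0) for _ in range(n_informative)]
    coefs += [0.0] * (n_features - n_informative)
    X, y = [], []
    for _ in range(n_samples):
        row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
        target = sum(c * x for c, x in zip(coefs, row)) + rng.gauss(0.0, noise)
        X.append(row)
        y.append(target)
    return X, y, coefs

X, y, coefs = make_test_data(n_samples=100, n_features=5, n_informative=3, random_state=42)
```

Because the uninformative features have zero coefficients, a good model-selection procedure should favour models that include only the first three features.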
douroucoulis.check_data(data) Checks the dataset for missing values and provides feedback.
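check_data is described only as reporting missing values. The sketch below shows the basic check, assuming the data is a plain dict of column lists in which None or NaN marks a missing entry; count_missing is a hypothetical helper. With a pandas DataFrame, data.isna().sum() gives the same per-column counts.

```python
import math

# Illustrative sketch only; count_missing is hypothetical, not douroucoulis's API.
def count_missing(columns):
    """Count missing entries (None or NaN) in each column of a dict-of-lists table."""
    def is_missing(value):
        return value is None or (isinstance(value, float) and math.isnan(value))
    # Summing booleans counts the True values
    return {name: sum(is_missing(v) for v in values) for name, values in columns.items()}

table = {
    "x1": [1.0, None, 3.0, float("nan")],
    "x2": ["a", "b", None, "b"],
    "target": [0.5, 1.5, 2.5, 3.5],
}
missing = count_missing(table)
incomplete = [name for name, n in missing.items() if n > 0]
print(missing)  # {'x1': 2, 'x2': 1, 'target': 0}
```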
douroucoulis.impute_data(strategy) Imputes missing data using scikit-learn's SimpleImputer. Choose a strategy such as 'mean' or 'median' for numeric data, or 'most_frequent' for categorical data.
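The three strategies differ only in which summary of the observed values fills the gaps. This minimal sketch uses SimpleImputer's strategy names for a single column; impute_column is hypothetical, and tie-breaking for 'most_frequent' here follows Python's statistics.mode, which may differ from SimpleImputer's.

```python
import math
import statistics

# Illustrative sketch only; impute_column is hypothetical, not douroucoulis's API.
def impute_column(values, strategy="mean"):
    """Fill None/NaN entries of one column using the chosen strategy."""
    def is_missing(v):
        return v is None or (isinstance(v, float) and math.isnan(v))

    observed = [v for v in values if not is_missing(v)]
    if not observed:
        raise ValueError("column has no observed values to impute from")
    if strategy == "mean":
        fill = statistics.fmean(observed)
    elif strategy == "median":
        fill = statistics.median(observed)
    elif strategy == "most_frequent":
        fill = statistics.mode(observed)  # also works on strings, so it suits categorical data
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return [fill if is_missing(v) else v for v in values]

print(impute_column([1.0, None, 3.0, 8.0], "mean"))           # [1.0, 4.0, 3.0, 8.0]
print(impute_column([1.0, None, 3.0, 8.0], "median"))         # [1.0, 3.0, 3.0, 8.0]
print(impute_column(["a", "b", None, "b"], "most_frequent"))  # ['a', 'b', 'b', 'b']
```

The median is less sensitive than the mean to the outlying 8.0, which is why it is often preferred for skewed numeric columns.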
douroucoulis.explore(data, cmap) Visualizes the relationships between explanatory variables (features) and the outcome variable (target) using a heatmap. The cmap argument lets you specify a color map such as 'rainbow' or 'seismic'.
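The description doesn't say which statistic the heatmap colours. Assuming it is pairwise Pearson correlation (a common choice, but an assumption here), this sketch computes the matrix in plain Python; pearson and correlation_matrix are hypothetical helpers. A plotting library can then colour the values, for example matplotlib's imshow, whose cmap argument accepts names such as 'seismic' or 'rainbow'.

```python
import math

# Illustrative sketch only; pearson and correlation_matrix are hypothetical helpers.
def pearson(a, b):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)

def correlation_matrix(columns):
    """Return {(col_i, col_j): r} for every ordered pair of columns: the cells of the heatmap."""
    names = list(columns)
    return {(i, j): pearson(columns[i], columns[j]) for i in names for j in names}

table = {
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [4.0, 3.0, 2.0, 1.0],
    "target": [2.0, 4.0, 6.0, 8.0],
}
corr = correlation_matrix(table)
print(round(corr[("x1", "target")], 6), round(corr[("x2", "target")], 6))  # 1.0 -1.0
```

With a diverging map like 'seismic', strong positive and strong negative correlations appear at opposite ends of the colour scale, which makes both kinds of relationship easy to spot.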
douroucoulis.aictable(model_set, model_names) Ranks a list of models based on their AIC values. Takes a list of models (model_set) and corresponding names (model_names).
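Ranking by AIC rests on three quantities: AIC = 2k - 2 ln L (k estimated parameters, L the maximised likelihood), the difference Δi = AICi - min AIC, and the Akaike weight wi = exp(-Δi/2) / Σj exp(-Δj/2). This sketch shows that standard arithmetic, not douroucoulis's code; gaussian_aic and aic_table are hypothetical, and gaussian_aic uses the least-squares form n ln(RSS/n) + 2k, which equals AIC for Gaussian errors up to an additive constant.

```python
import math

# Illustrative sketch only; gaussian_aic and aic_table are hypothetical helpers.
def gaussian_aic(rss, n, k):
    """AIC for a least-squares fit with Gaussian errors, dropping additive constants.

    rss: residual sum of squares, n: number of observations, k: number of estimated parameters.
    """
    return n * math.log(rss / n) + 2 * k

def aic_table(models):
    """Rank models given as {name: aic}; return rows of (name, aic, delta, weight), best first."""
    best = min(models.values())
    # Relative likelihood of each model given the data: exp(-delta / 2)
    rel = {name: math.exp(-(aic - best) / 2) for name, aic in models.items()}
    total = sum(rel.values())
    rows = [(name, aic, aic - best, rel[name] / total) for name, aic in models.items()]
    return sorted(rows, key=lambda row: row[1])

table = aic_table({"A": 100.0, "B": 102.0, "C": 110.0})
for name, aic, delta, weight in table:
    print(f"{name}: AIC={aic:.1f} dAIC={delta:.1f} w={weight:.3f}")
```

Only differences in AIC matter, so any constant dropped from the likelihood cancels out of Δ and the weights.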
douroucoulis.best_fit() Returns the name and statistics for the single best-fit model (AIC weight > 0.90). For multi-model inference, use douroucoulis.best_ranked().
douroucoulis.best_ranked() Returns the best-ranked models with cumulative AIC weight > 0.95. You can then use douroucoulis.mod_avg() for model-averaged parameter estimates.
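Both selection rules work directly on the Akaike weights, using the thresholds stated above (a single weight > 0.90, or a cumulative weight > 0.95). This sketch illustrates them; single_best and confidence_set are hypothetical names, not the library's best_fit and best_ranked.

```python
# Illustrative sketch only; single_best and confidence_set are hypothetical helpers.
def single_best(weights, threshold=0.90):
    """Return the top model's name if its Akaike weight exceeds `threshold`, else None."""
    name, weight = max(weights.items(), key=lambda item: item[1])
    return name if weight > threshold else None

def confidence_set(weights, cumulative=0.95):
    """Return the smallest set of top-ranked models whose summed weight exceeds `cumulative`."""
    selected, total = [], 0.0
    for name, weight in sorted(weights.items(), key=lambda item: item[1], reverse=True):
        selected.append(name)
        total += weight
        if total > cumulative:
            break
    return selected

weights = {"A": 0.60, "B": 0.30, "C": 0.08, "D": 0.02}
print(single_best(weights))     # prints None (no single model has weight > 0.90)
print(confidence_set(weights))  # ['A', 'B', 'C']
```

When no model dominates, as here, the confidence set is the natural input to model averaging.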
douroucoulis.mod_avg() Computes model-averaged estimates for each parameter in the best-ranked models.
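A model-averaged estimate is the weighted mean θ = Σi wi θi of a parameter's estimates across the best-ranked models. The sketch below assumes one common convention, averaging each parameter only over the models that contain it with those weights renormalised; the other common convention treats a missing parameter as zero, and which one douroucoulis uses isn't stated. model_average is a hypothetical helper.

```python
# Illustrative sketch only; model_average is hypothetical, not douroucoulis's API.
def model_average(estimates, weights):
    """Model-averaged parameter estimates.

    estimates: {model_name: {param_name: value}}
    weights:   {model_name: Akaike weight}
    Each parameter is averaged only over the models that contain it, with those
    models' weights renormalised to sum to 1.
    """
    params = {p for model in estimates.values() for p in model}
    averaged = {}
    for p in sorted(params):
        holders = [m for m in estimates if p in estimates[m]]
        total = sum(weights[m] for m in holders)
        averaged[p] = sum(weights[m] * estimates[m][p] for m in holders) / total
    return averaged

estimates = {
    "A": {"intercept": 1.0, "x1": 2.0},
    "B": {"intercept": 3.0, "x1": 4.0, "x2": 0.5},
}
weights = {"A": 0.75, "B": 0.25}
print(model_average(estimates, weights))  # {'intercept': 1.5, 'x1': 2.5, 'x2': 0.5}
```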
douroucoulis.cross_val(X, y, classification) Evaluates a model’s accuracy using cross-validation. Set classification=True for classification tasks.
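How cross_val splits the data isn't documented here. As an illustration of k-fold cross-validation itself, this sketch shuffles the row indices into k folds, fits on k - 1 folds and scores the held-out fold, using a one-feature least-squares line and mean squared error (a regression metric; classification would typically use accuracy). kfold_indices, fit_line and cross_val_mse are hypothetical; scikit-learn's cross_val_score offers the same idea for its estimators.

```python
import random

# Illustrative sketch only; kfold_indices, fit_line and cross_val_mse are hypothetical.
def kfold_indices(n, k, seed=None):
    """Yield (train, test) index lists for k roughly equal folds after shuffling."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x (one feature)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - b * mx, b

def cross_val_mse(xs, ys, k=5, seed=0):
    """Mean squared error on each held-out fold, one score per fold."""
    scores = []
    for train, test in kfold_indices(len(xs), k, seed):
        a, b = fit_line([xs[i] for i in train], [ys[i] for i in train])
        errors = [(ys[i] - (a + b * xs[i])) ** 2 for i in test]
        scores.append(sum(errors) / len(errors))
    return scores

xs = [float(i) for i in range(20)]
ys = [3.0 + 2.0 * x for x in xs]  # noise-free line, so every fold should score close to 0
print(cross_val_mse(xs, ys, k=5))
```

Every row is held out exactly once, so the fold scores estimate performance on data the model has not seen.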
douroucoulis.hyper(model) Tunes the hyperparameters of the provided model with GridSearchCV, searching for the best-performing combination.
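GridSearchCV is an exhaustive search: it tries every combination in a parameter grid and keeps the best cross-validated score (exposed as its best_params_ attribute). This sketch shows only the search loop; parameter_grid and grid_search are hypothetical, and toy_score is a made-up stand-in for a real cross-validated score.

```python
from itertools import product

# Illustrative sketch only; parameter_grid, grid_search and toy_score are hypothetical.
def parameter_grid(grid):
    """Expand {'name': [values, ...]} into every combination of values."""
    names = sorted(grid)
    for combo in product(*(grid[name] for name in names)):
        yield dict(zip(names, combo))

def grid_search(grid, score):
    """Return (best_params, best_score) for the highest `score(params)` over the grid."""
    best_params, best_score = None, float("-inf")
    for params in parameter_grid(grid):
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Made-up scorer standing in for a cross-validated score; it peaks at alpha=1.0, degree=2
def toy_score(params):
    return -((params["alpha"] - 1.0) ** 2) - (params["degree"] - 2) ** 2

grid = {"alpha": [0.1, 1.0, 10.0], "degree": [1, 2, 3]}
best_params, best_score = grid_search(grid, toy_score)
print(best_params)  # {'alpha': 1.0, 'degree': 2}
```

The cost grows multiplicatively with each parameter added (3 × 3 = 9 fits here), which is why grids are usually kept small.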
douroucoulis.best_predictions(new_data) Uses the best-fit model, with its tuned hyperparameters, to make predictions on a new dataset (new_data).
Fun Sound Functions for Debugging
douroucoulis.tonalhoot(reps) Emits a tonal hoot, repeated reps times. Useful for tracking model-fitting progress and debugging.
douroucoulis.gruffhoot(reps) Emits a gruff hoot, repeated reps times. Useful for debugging and tracking model fitting.
douroucoulis.rwhoop(reps) Emits a resonant whoop, repeated reps times. Also useful for debugging and progress tracking.
Installation
You can install the library using pip:
pip install douroucoulis
Example Usage
Here's an example of how to use the douroucoulis library to perform model selection:
    import douroucoulis as do
    from sklearn.linear_model import LinearRegression, RidgeCV

    # Generate a test regression dataset
    data = do.test_dataset(n_samples=100, n_features=5, n_informative=3, random_state=42, regression=True)

    # Check for missing data
    do.check_data(data)

    # Impute missing data
    do.impute_data(strategy='mean')

    # Explore data relationships
    do.explore(data, cmap='seismic')

    # Create a model set and rank the models using AIC
    model_set = [LinearRegression(), RidgeCV()]
    model_names = ['Linear Regression', 'RidgeCV']
    aic_table = do.aictable(model_set, model_names)

    # Get the best-ranked models
    best_models = do.best_ranked()

    # Model-averaged parameter estimates
    averaged_params = do.mod_avg()

    # Perform cross-validation on the best model
    X = data.drop(columns='target')
    y = data['target']
    do.cross_val(X, y, classification=False)
License
This project is licensed under the MIT License; see the LICENSE file for details.
Download files
File details
Details for the file douroucoulis-0.1.3.tar.gz.
File metadata
- Download URL: douroucoulis-0.1.3.tar.gz
- Upload date:
- Size: 14.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d875308380db6b1932569f549a3a7e71b2fa702ef95f12a4ff4da9d219dd81e1 |
| MD5 | a68edd8a0896b53c4c3d9c5b42d18a75 |
| BLAKE2b-256 | b7eb0c007826127f40c5be2ecc4fdb3db0ecabeb2baadd02db2b1436a632aba5 |
File details
Details for the file douroucoulis-0.1.3-py3-none-any.whl.
File metadata
- Download URL: douroucoulis-0.1.3-py3-none-any.whl
- Upload date:
- Size: 12.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.5
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 957ecbcde525b13504293ac4841839d4e8ea9dab1cd848454695d888ce5297a8 |
| MD5 | feb4d2e88908e758ffee5e9b8c441eae |
| BLAKE2b-256 | 1c3989a3f896afe5bb6e5e7110d700b025b00b7f7166a2ec6bcb74e36453a728 |