MAMUT
Machine Automated Modelling and Utility Toolkit for tabular classification.
Overview
MAMUT is a Python toolkit that automates model selection and evaluation for classification tasks on tabular data. It bundles preprocessing, Optuna-driven hyperparameter optimization, model comparison, and reporting into a single workflow built on scikit-learn and XGBoost.
Key Features
- End-to-end preprocessing: missing values, categorical encoding, skew correction, scaling, outlier filtering, imbalance handling (SMOTE/undersampling/SMOTETomek), optional feature selection, and PCA.
- Model search across common classifiers (LogisticRegression, RandomForestClassifier, SVC, XGBClassifier, MLPClassifier, GaussianNB, KNeighborsClassifier).
- Hyperparameter optimization with Optuna (TPE/Bayesian or random search).
- Report generation via evaluate() with metrics, plots, and SHAP explanations.
- Saved artifacts: fit() stores fitted models; evaluate() writes an HTML report and plots to disk.
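MAMUT runs its search with Optuna (TPE or random sampling). The sketch below shows only the random-search idea in plain Python so it runs without extra dependencies. For each candidate model that is not excluded, it draws n_iterations random configurations, keeps the best score, and then picks the best model overall. SEARCH_SPACES, random_search and toy_objective are hypothetical names. The toy objective stands in for a real validation score, and this is not MAMUT's internal code.

```python
import random
from typing import Callable, Dict, Iterable, Tuple

# Hypothetical search spaces: model name -> {hyperparameter: (low, high)}.
SEARCH_SPACES: Dict[str, Dict[str, Tuple[float, float]]] = {
    "LogisticRegression": {"C": (0.01, 10.0)},
    "SVC": {"C": (0.1, 100.0), "gamma": (0.001, 1.0)},
}

Objective = Callable[[str, Dict[str, float]], float]


def sample_params(space: Dict[str, Tuple[float, float]], rng: random.Random) -> Dict[str, float]:
    """Draw one configuration uniformly from each hyperparameter range."""
    return {name: rng.uniform(low, high) for name, (low, high) in space.items()}


def random_search(
    objective: Objective,
    spaces: Dict[str, Dict[str, Tuple[float, float]]],
    n_iterations: int,
    exclude_models: Iterable[str] = (),
    seed: int = 42,
):
    """Run n_iterations random trials per model and return the overall winner."""
    rng = random.Random(seed)
    excluded = set(exclude_models)
    summary: Dict[str, Tuple[float, Dict[str, float]]] = {}
    for model_name, space in spaces.items():
        if model_name in excluded:
            continue  # skip models excluded by class name
        best_score, best_params = float("-inf"), {}
        for _ in range(n_iterations):
            params = sample_params(space, rng)
            score = objective(model_name, params)
            if score > best_score:  # higher is better, as with accuracy or f1
                best_score, best_params = score, params
        summary[model_name] = (best_score, best_params)
    best_model = max(summary, key=lambda name: summary[name][0])
    return best_model, summary


def toy_objective(model_name: str, params: Dict[str, float]) -> float:
    """Stand-in for a validation score; peaks when C is 1.0 (model_name unused)."""
    return -(params["C"] - 1.0) ** 2


best, summary = random_search(toy_objective, SEARCH_SPACES, n_iterations=5)
print(best, round(summary[best][0], 4))
```

A fixed seed makes the sketch reproducible, which mirrors the role random_state plays in MAMUT's own configuration.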
Installation
Python 3.12 is the target runtime (see .python-version).
From PyPI:
pip install mamut
From source:
pip install -e .
For development with Poetry:
poetry install
Quickstart
from sklearn.datasets import load_iris
from mamut.wrapper import Mamut
X, y = load_iris(as_frame=True, return_X_y=True)
mamut = Mamut(n_iterations=5, optimization_method="bayes")
mamut.fit(X, y)
preds = mamut.predict(X)
proba = mamut.predict_proba(X)
Configuration Notes
- With preprocessing enabled (default), pass X as a pandas DataFrame and y as a Series.
- Targets must be categorical (float targets raise a ValueError).
- fit() performs a stratified 80/20 train/test split controlled by random_state.
- Select the optimization strategy with optimization_method="bayes" or "random_search".
- Control the search budget with n_iterations.
- Exclude models by class name (e.g., exclude_models=["SVC"]).
- Preprocessing options are passed directly into Mamut(...) (e.g., pca=True, feature_selection=True, num_imputation="knn").
- score_metric expects one of: accuracy, precision, recall, f1, balanced_accuracy, jaccard, roc_auc_score.
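A minimal sketch of two of the data rules above in plain Python: a stratified 80/20 split seeded by random_state, and rejection of float targets. check_target and stratified_split are hypothetical names. In practice scikit-learn's train_test_split(X, y, test_size=0.2, stratify=y, random_state=...) performs this kind of split, but this README does not say exactly how MAMUT implements it.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence, Tuple


def check_target(y: Sequence[Hashable]) -> None:
    """Hypothetical check mirroring the stated rule: float targets are rejected."""
    if any(isinstance(label, float) for label in y):
        raise ValueError("Targets must be categorical, not float.")


def stratified_split(
    y: Sequence[Hashable], test_size: float = 0.2, random_state: int = 42
) -> Tuple[List[int], List[int]]:
    """Split row indices so each class keeps roughly the same share in both parts."""
    rng = random.Random(random_state)
    by_class = defaultdict(list)
    for idx, label in enumerate(y):
        by_class[label].append(idx)
    train_idx: List[int] = []
    test_idx: List[int] = []
    for indices in by_class.values():
        rng.shuffle(indices)  # seeded shuffle, so random_state makes it reproducible
        n_test = round(len(indices) * test_size)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


y = ["setosa"] * 50 + ["versicolor"] * 50 + ["virginica"] * 50
check_target(y)
train_idx, test_idx = stratified_split(y, test_size=0.2, random_state=42)
print(len(train_idx), len(test_idx))  # 120 30
```

Splitting per class keeps each label's 80/20 proportion intact. A plain random split can under-represent small classes in the test set.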
Outputs and Reports
- mamut.best_model_: best-performing pipeline after fit.
- mamut.training_summary_: per-model scores and timings.
- mamut.optuna_studies_: Optuna studies keyed by model name.
- mamut.evaluate(): writes an HTML report to ./mamut_report/report_<timestamp>.html and stores plots in ./mamut_report/plots/.
- mamut.save_best_model(path): writes the best model to an existing directory as a .joblib file.
- fit() saves all fitted models to ./fitted_models/<timestamp>/ as .joblib files.
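The outputs above land in fixed, timestamped locations. The sketch below builds those paths with pathlib. It also creates a target directory before saving, because save_best_model(path) is documented to need an existing directory. output_paths and prepare_model_dir are hypothetical helpers. The YYYYMMDD_HHMMSS timestamp format is an assumption, and MAMUT's actual format may differ.

```python
from datetime import datetime
from pathlib import Path
from typing import Dict, Optional, Union


def output_paths(root: Path, now: Optional[datetime] = None) -> Dict[str, Path]:
    """Build the output locations listed above, relative to root.

    Assumption: the real <timestamp> format is not documented here, so
    YYYYMMDD_HHMMSS is used purely for illustration.
    """
    stamp = (now or datetime.now()).strftime("%Y%m%d_%H%M%S")
    return {
        "report": root / "mamut_report" / f"report_{stamp}.html",
        "plots": root / "mamut_report" / "plots",
        "fitted_models": root / "fitted_models" / stamp,
    }


def prepare_model_dir(path: Union[str, Path]) -> Path:
    """Create the directory up front, because save_best_model(path) expects it to exist."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


paths = output_paths(Path("."), now=datetime(2024, 1, 2, 3, 4, 5))
print(paths["report"])
# With a fitted Mamut instance (not constructed here):
# mamut.save_best_model(str(prepare_model_dir("models")))
```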
Development
poetry run pytest
poetry run pre-commit run --all-files
make -C docs html
Examples and Docs
- Notebooks: walkthrough.ipynb and docs/source/notebooks/walkthrough.ipynb.
- Documentation site: https://mamut.readthedocs.io/en/latest/
License
MIT. See LICENSE.
File details
Details for the file mamut-0.1.2.tar.gz.
File metadata
- Download URL: mamut-0.1.2.tar.gz
- Size: 419.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.1 CPython/3.12.12 Linux/6.18.5-200.fc43.x86_64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ba53d367362ee0e79ffe425988231064e74667610a1d5913cd9ac1cf9342451c |
| MD5 | 7fe4ad175fb55333995363093a3304e4 |
| BLAKE2b-256 | f7fdbca8bf8b2e7fbd77d690cd1d8a235150145f5316e645a467dc13cbc6b758 |
File details
Details for the file mamut-0.1.2-py3-none-any.whl.
File metadata
- Download URL: mamut-0.1.2-py3-none-any.whl
- Size: 418.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.1 CPython/3.12.12 Linux/6.18.5-200.fc43.x86_64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5f252ec082525263005d9804fcf3d1d0c0c69aa49d690cca0181e4c8d0417d87 |
| MD5 | b4cfdce76b2118ffadd0d48cc59a8119 |
| BLAKE2b-256 | 76efe946be7d8f8e3d8fb89a30cd5a0c2b5be9fd12f9c4e2d868badd93413db4 |
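To check a downloaded file against the SHA256 digests listed above, hash it and compare the results. The sketch streams the file through hashlib.sha256. sha256_of and matches_published are illustrative names and are not part of MAMUT.

```python
import hashlib
from pathlib import Path
from typing import Union

# Published SHA256 digests, copied from the File hashes listings above.
SDIST_SHA256 = "ba53d367362ee0e79ffe425988231064e74667610a1d5913cd9ac1cf9342451c"
WHEEL_SHA256 = "5f252ec082525263005d9804fcf3d1d0c0c69aa49d690cca0181e4c8d0417d87"


def sha256_of(path: Union[str, Path], chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large archives never need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path: Union[str, Path], expected_hex: str) -> bool:
    """Compare case-insensitively, since hex digests may be written in either case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, after pip download mamut==0.1.2 --no-deps, call matches_published("mamut-0.1.2-py3-none-any.whl", WHEEL_SHA256). pip can also enforce this itself in hash-checking mode with a requirements line such as mamut==0.1.2 --hash=sha256:<digest>. In that mode, every dependency must also be pinned with a hash.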