Transparent tabular classification workflows with model-selection evidence
MAMUT
Machine Automated Modelling and Utility Toolkit for tabular classification.
Overview
MAMUT is a Python toolkit for transparent classification workflows on tabular data. It bundles preprocessing, Optuna-driven hyperparameter optimization, model comparison, validation diagnostics, and reporting into a single workflow built on scikit-learn and XGBoost.
MAMUT is best used as a readable baseline and experiment report generator for beginners, small teams, and portfolio-scale projects. It is not positioned as a replacement for industrial AutoML systems such as AutoGluon, FLAML, or H2O AutoML; its value is in showing what was tried, how the result was validated, and whether simple baselines challenge the selected model.
Key Features
- End-to-end preprocessing: missing values, categorical encoding, skew correction, scaling, outlier filtering, imbalance handling (SMOTE/undersampling/SMOTETomek), optional feature selection, and PCA.
- Model search across common classifiers (LogisticRegression, RandomForestClassifier, SVC, XGBClassifier, MLPClassifier, GaussianNB, KNeighborsClassifier).
- Hyperparameter optimization with Optuna (TPE/Bayesian or random search).
- Validation-based model selection with optional final holdout evaluation.
- Evidence reporting: leakage checks, dummy/logistic/random-forest baselines, repeated stratified CV, and metric confidence intervals.
- Report generation via `evaluate()` with metrics, plots, and SHAP explanations.
- Configurable artifacts: `fit()` keeps models in memory by default and saves fitted models only when `save_models=True`.
- Reproducible benchmark diagnostics via `scripts/benchmark_evidence.py`.
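The evidence-reporting idea in the list above (a dummy baseline, repeated stratified CV, and metric confidence intervals) can be illustrated with plain scikit-learn. This is a minimal sketch of the general technique, not MAMUT's internal implementation: the helper name `cv_summary`, the fold and repeat counts, and the normal-approximation interval are all assumptions made for illustration.

```python
from statistics import NormalDist

import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# Repeated stratified CV: 5 folds x 3 repeats gives 15 scores per model.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)


def cv_summary(model, confidence=0.95):
    """Hypothetical helper: mean accuracy with a rough normal-approximation interval."""
    scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")
    mean = scores.mean()
    # Standard error of the mean; CV folds overlap, so treat the interval as indicative only.
    sem = scores.std(ddof=1) / np.sqrt(len(scores))
    z = NormalDist().inv_cdf((1 + confidence) / 2)
    return mean, mean - z * sem, mean + z * sem


# A trivial baseline that always predicts the most frequent training class.
dummy = cv_summary(DummyClassifier(strategy="most_frequent"))
# A simple linear baseline to compare against.
logit = cv_summary(LogisticRegression(max_iter=1000))

for name, (mean, low, high) in {"dummy": dummy, "logistic": logit}.items():
    print(f"{name}: accuracy={mean:.3f} (approx. 95% interval {low:.3f} to {high:.3f})")
```

If a tuned model's interval overlaps heavily with the logistic baseline's, the extra complexity may not be buying much, which is the kind of question the evidence report is meant to surface.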
Installation
Python 3.12 is the target runtime (see .python-version).
From PyPI:
pip install mamut
From source:
git clone https://github.com/przybytniowskaj/Mamut.git
cd Mamut
pip install -e .
For development with uv:
uv sync --all-groups
Quickstart
from sklearn.datasets import load_iris
from mamut import Mamut
X, y = load_iris(as_frame=True, return_X_y=True)
mamut = Mamut(n_iterations=1, optimization_method="random_search")
mamut.fit(X, y)
preds = mamut.predict(X)
proba = mamut.predict_proba(X)
Configuration Notes
- With preprocessing enabled (default), pass `X` as a pandas `DataFrame` and `y` as a `Series`.
- Targets must be categorical (float targets raise a `ValueError`).
- `fit()` performs a stratified train/validation split controlled by `validation_size` and `random_state`.
- Set `holdout_size` or pass `X_holdout`/`y_holdout` to reserve final evaluation data that is not used for model or ensemble selection.
- Select the optimization strategy with `optimization_method="bayes"` or `"random_search"`.
- Control the search budget with `n_iterations`.
- Exclude models by class name (e.g., `exclude_models=["SVC"]`).
- Preprocessing options are passed directly into `Mamut(...)` (e.g., `pca=True`, `feature_selection=True`, `num_imputation="knn"`).
- Use `save_models=True` to write fitted candidate pipelines under `./fitted_models/<timestamp>/`.
- `score_metric` expects one of: `accuracy`, `precision`, `recall`, `f1`, `balanced_accuracy`, `jaccard`, `roc_auc_score`.
- Configure evidence stability checks with `evidence_cv_splits`, `evidence_cv_repeats`, and `evidence_confidence_level`.
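To make the split-related options above more concrete, the following sketch reproduces the general idea of a stratified holdout plus a stratified train/validation split using scikit-learn's `train_test_split`. It is only an illustration: the fraction values are assumed, and whether MAMUT applies `validation_size` to the full dataset or to the data remaining after the holdout is not stated here, so the second step is an assumption.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(as_frame=True, return_X_y=True)

# Assumed values, chosen only for illustration.
holdout_size = 0.2
validation_size = 0.25
random_state = 42

# Step 1: reserve the final holdout first so it never influences model selection.
X_dev, X_holdout, y_dev, y_holdout = train_test_split(
    X, y, test_size=holdout_size, stratify=y, random_state=random_state
)

# Step 2: split the remaining data into train and validation parts.
# Assumption: the validation fraction is taken from the remaining (non-holdout) rows.
X_train, X_val, y_train, y_val = train_test_split(
    X_dev, y_dev, test_size=validation_size, stratify=y_dev, random_state=random_state
)

# Stratification keeps class proportions the same in every partition.
print(len(X_train), len(X_val), len(X_holdout))
print(y_holdout.value_counts().to_dict())
```

Because `random_state` is fixed, rerunning the script yields the same partitions, which is what makes validation scores comparable across runs.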
Outputs and Reports
- `mamut.best_model_`: validation-selected best-performing pipeline after `fit`.
- `mamut.validation_summary_`: per-model validation scores and timings.
- `mamut.holdout_summary_`: optional final holdout scores when holdout data is configured.
- `mamut.evidence_report_`: validation integrity, evidence-guided selection, leakage checks, baseline comparison, and score stability tables generated by `evaluate()` or `generate_evidence()`.
- `mamut.optuna_studies_`: Optuna studies keyed by model name.
- `mamut.evaluate()`: writes an HTML report to `./mamut_report/report_<timestamp>.html` and stores plots in `./mamut_report/plots/`. It uses holdout data automatically when available and includes evidence sections by default.
- `mamut.save_best_model(path)`: writes the best model to an existing directory as a `.joblib` file.
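Since `save_best_model(path)` expects an existing directory and produces a `.joblib` file, a typical pattern is to create the directory first and later reload the file with `joblib.load`. The sketch below shows that round trip with a stand-in scikit-learn pipeline rather than a real MAMUT model. The file name `best_model.joblib` is hypothetical, since the name MAMUT actually writes is not specified here.

```python
import tempfile
from pathlib import Path

import joblib
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)

# Stand-in for a fitted best pipeline (not produced by MAMUT).
stand_in = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X, y)

# Create the target directory up front, because the save step expects it to exist.
out_dir = Path(tempfile.mkdtemp()) / "models"
out_dir.mkdir(parents=True, exist_ok=True)

# Hypothetical file name, used only for this illustration.
model_path = out_dir / "best_model.joblib"
joblib.dump(stand_in, model_path)

# Reload the pipeline and confirm it predicts exactly like the original.
restored = joblib.load(model_path)
same_predictions = bool((restored.predict(X) == stand_in.predict(X)).all())
print(model_path.name, same_predictions)
```

As with any joblib or pickle artifact, the file should be loaded with library versions compatible with those used when it was saved, and only from sources you trust.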
Development
uv sync --all-groups
uv run deptry .
scripts/audit_dependencies.sh
uv run pytest
uv run pre-commit run --all-files
uv run make -C docs html
uv run sphinx-build -W --keep-going -b html docs/source docs/build/html-strict
uv run python scripts/benchmark_evidence.py
uv build
uv run twine check dist/*
Documentation
- Documentation site: https://mamut.readthedocs.io/en/latest/
- Quickstart: https://mamut.readthedocs.io/en/latest/quickstart.html
- User guide: https://mamut.readthedocs.io/en/latest/user_guide.html
- Reports and artifacts: https://mamut.readthedocs.io/en/latest/reports.html
- Evidence benchmark: https://mamut.readthedocs.io/en/latest/benchmark_evidence.html
- API reference: https://mamut.readthedocs.io/en/latest/mamut.html
- Notebook walkthrough: `docs/source/notebooks/walkthrough.ipynb`
- Changelog: `CHANGELOG.md`
License
MIT. See LICENSE.
File details
Details for the file mamut-0.2.0.tar.gz.
File metadata
- Download URL: mamut-0.2.0.tar.gz
- Upload date:
- Size: 2.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3c7a77e37209f2416f0822e2b24b9b793ec298f1d1ceb9d0c7ec1fe723cecfa0 |
| MD5 | 0b00e23f9429f62ca7370d2fb8964664 |
| BLAKE2b-256 | df8b4fa8fd4d73ed9ac738d620d9abaf5e24731e980e28868fb8348f0e1d7532 |
File details
Details for the file mamut-0.2.0-py3-none-any.whl.
File metadata
- Download URL: mamut-0.2.0-py3-none-any.whl
- Upload date:
- Size: 369.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f45529e65a54f55a6b0d40b3b7d6a2ce6eb7e358e03460fa2b9722cb198010fe |
| MD5 | 604a67217c1c04b3d93c68f14e653b78 |
| BLAKE2b-256 | 59745f19c7456164f10dd46b40a8a3446e7b1ec198062dc0fa4408360206bd69 |