Transparent tabular classification workflows with model-selection evidence

MAMUT

Machine Automated Modelling and Utility Toolkit for tabular classification.

Overview

MAMUT is a Python toolkit for transparent classification workflows on tabular data. It bundles preprocessing, Optuna-driven hyperparameter optimization, model comparison, validation diagnostics, and reporting into a single workflow built on scikit-learn and XGBoost.

MAMUT is best used as a readable baseline and experiment report generator for beginners, small teams, and portfolio-scale projects. It is not positioned as a replacement for industrial AutoML systems such as AutoGluon, FLAML, or H2O AutoML; its value is in showing what was tried, how the result was validated, and whether simple baselines challenge the selected model.

Key Features

  • End-to-end preprocessing: missing values, categorical encoding, skew correction, scaling, outlier filtering, imbalance handling (SMOTE/undersampling/SMOTETomek), optional feature selection, and PCA.
  • Model search across common classifiers (LogisticRegression, RandomForestClassifier, SVC, XGBClassifier, MLPClassifier, GaussianNB, KNeighborsClassifier).
  • Hyperparameter optimization with Optuna (TPE/Bayesian or random search).
  • Validation-based model selection with optional final holdout evaluation.
  • Evidence reporting: leakage checks, dummy/logistic/random-forest baselines, repeated stratified CV, and metric confidence intervals.
  • Report generation via evaluate() with metrics, plots, and SHAP explanations.
  • Configurable artifacts: fit() keeps models in memory by default and saves fitted models only when save_models=True.
  • Reproducible benchmark diagnostics via scripts/benchmark_evidence.py.
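
The evidence-reporting idea above (baselines, repeated stratified CV, and score intervals) can be illustrated with plain scikit-learn. This is a minimal sketch of the general technique, not MAMUT's implementation: the choice of models, the fold counts, and the `percentile_interval` helper are assumptions made for illustration, and the interval is a simple empirical percentile range over fold scores rather than a formal confidence interval.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.dummy import DummyClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import RepeatedStratifiedKFold, cross_val_score

X, y = load_iris(return_X_y=True)

# 5 folds x 3 repeats = 15 scores per model; the fixed seed makes the folds reproducible.
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=3, random_state=0)


def cv_scores(model):
    """Accuracy of `model` on every fold of the repeated stratified CV."""
    return cross_val_score(model, X, y, cv=cv, scoring="accuracy")


def percentile_interval(scores, level=0.95):
    """Empirical percentile range of the fold scores (hypothetical helper)."""
    tail = (1.0 - level) / 2.0 * 100.0
    low, high = np.percentile(scores, [tail, 100.0 - tail])
    return float(low), float(high)


# A trivial baseline that always predicts the most frequent class.
baseline = cv_scores(DummyClassifier(strategy="most_frequent"))
# A simple candidate model to compare against the baseline.
candidate = cv_scores(LogisticRegression(max_iter=1000))

for name, scores in [("dummy", baseline), ("logistic", candidate)]:
    low, high = percentile_interval(scores)
    print(f"{name}: mean={scores.mean():.3f}, interval=[{low:.3f}, {high:.3f}]")
```

If the candidate's interval overlaps the baseline's, the selected model has not clearly earned its complexity, which is the kind of question the evidence report is meant to surface.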

Installation

Python 3.12 is the target runtime (see .python-version).

From PyPI:

pip install mamut

From source:

git clone https://github.com/przybytniowskaj/Mamut.git
cd Mamut
pip install -e .

For development with uv:

uv sync --all-groups

Quickstart

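from sklearn.datasets import load_iris
from mamut import Mamut

# Load the features as a pandas DataFrame and the target as a Series.
X, y = load_iris(as_frame=True, return_X_y=True)

# A search budget of one random-search iteration keeps the demo fast.
mamut = Mamut(n_iterations=1, optimization_method="random_search")
mamut.fit(X, y)

# Predicting on the training data here is for illustration only.
preds = mamut.predict(X)
proba = mamut.predict_proba(X)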

Configuration Notes

  • With preprocessing enabled (default), pass X as a pandas DataFrame and y as a Series.
  • Targets must be categorical (float targets raise a ValueError).
  • fit() performs a stratified train/validation split controlled by validation_size and random_state.
  • Set holdout_size or pass X_holdout/y_holdout to reserve final evaluation data that is not used for model or ensemble selection.
  • Select the optimization strategy with optimization_method="bayes" or "random_search".
  • Control the search budget with n_iterations.
  • Exclude models by class name (e.g., exclude_models=["SVC"]).
  • Preprocessing options are passed directly into Mamut(...) (e.g., pca=True, feature_selection=True, num_imputation="knn").
  • Use save_models=True to write fitted candidate pipelines under ./fitted_models/<timestamp>/.
  • score_metric expects one of: accuracy, precision, recall, f1, balanced_accuracy, jaccard, roc_auc_score.
  • Configure evidence stability checks with evidence_cv_splits, evidence_cv_repeats, and evidence_confidence_level.
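
The split-related notes above describe a stratified train/validation split plus an optional holdout set. The sketch below shows one way to produce such a three-way split with scikit-learn's `train_test_split`. It illustrates the concept only and does not reproduce what `fit()` does internally; the fractions and the seed are assumed values, and both fractions are interpreted here as shares of the full dataset.

```python
from collections import Counter

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

X, y = load_iris(as_frame=True, return_X_y=True)

# Assumed illustrative values, interpreted as fractions of the full dataset.
holdout_size = 0.2
validation_size = 0.2
random_state = 42

# Convert fractions to row counts so both splits are sized against the full data.
n_holdout = round(holdout_size * len(X))
n_validation = round(validation_size * len(X))

# Step 1: carve off the holdout set, which plays no part in model selection.
X_rest, X_holdout, y_rest, y_holdout = train_test_split(
    X, y, test_size=n_holdout, stratify=y, random_state=random_state
)

# Step 2: split the remainder into training and validation sets.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=n_validation, stratify=y_rest, random_state=random_state
)

# Stratification keeps the class proportions the same in every part.
for name, part in [("train", y_train), ("validation", y_val), ("holdout", y_holdout)]:
    print(name, len(part), dict(Counter(part)))
```

A holdout set prepared this way is the kind of data that X_holdout/y_holdout refers to: it is scored once at the end and never influences which model is chosen.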

Outputs and Reports

  • mamut.best_model_: the best-performing pipeline after fit(), selected on validation scores.
  • mamut.validation_summary_: per-model validation scores and timings.
  • mamut.holdout_summary_: optional final holdout scores when holdout data is configured.
  • mamut.evidence_report_: validation integrity, evidence-guided selection, leakage checks, baseline comparison, and score stability tables generated by evaluate() or generate_evidence().
  • mamut.optuna_studies_: Optuna studies keyed by model name.
  • mamut.evaluate(): writes an HTML report to ./mamut_report/report_<timestamp>.html and stores plots in ./mamut_report/plots/. It uses holdout data automatically when available and includes evidence sections by default.
  • mamut.save_best_model(path): writes the best model as a .joblib file into path, which must be an existing directory.
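
Because save_best_model(path) writes into an existing directory, it can help to create the target directory first. The following is a small sketch using only the standard library; `ensure_dir` is a hypothetical helper, the directory names are made up, and whether save_best_model accepts a `str`, a `Path`, or both is an assumption not confirmed here.

```python
import tempfile
from pathlib import Path


def ensure_dir(path):
    """Create the directory (and any parents) if missing and return it as a Path."""
    directory = Path(path)
    directory.mkdir(parents=True, exist_ok=True)
    return directory


# A temporary base keeps this sketch free of side effects; use a project path in practice.
base = Path(tempfile.mkdtemp())
model_dir = ensure_dir(base / "artifacts" / "best_model")
print(model_dir.is_dir())

# With a fitted Mamut instance named `mamut` (not created in this sketch),
# the documented call would look like:
#   mamut.save_best_model(str(model_dir))  # passing a str is an assumption

# Any .joblib files written into the directory can then be listed.
saved = sorted(model_dir.glob("*.joblib"))
print(saved)
```

A saved file can typically be restored with `joblib.load`, provided the environment has the same library versions that were used when the model was written.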

Development

uv sync --all-groups
uv run deptry .
scripts/audit_dependencies.sh
uv run pytest
uv run pre-commit run --all-files
uv run make -C docs html
uv run sphinx-build -W --keep-going -b html docs/source docs/build/html-strict
uv run python scripts/benchmark_evidence.py
uv build
uv run twine check dist/*

Documentation

License

MIT. See LICENSE.
