Common tools for MLOps

Caketool

A Python MLOps toolkit for common machine learning and data science workflows. It provides EDA, feature engineering, model training, explainability, experiment tracking, model monitoring, calibration, and metrics, all designed for production ML pipelines.


Installation

# Core (pandas, numpy, sklearn, xgboost)
pip install caketool

# With MLflow support
pip install "caketool[onprem]"

# With Google Cloud (Vertex AI, BigQuery)
pip install "caketool[gcp]"

# Everything
pip install "caketool[all]"

Quick Start

EDA

from caketool import eda

# Dataset overview
eda.profile(df)
eda.plot_correlations(df)

# Univariate
eda.plot_numeric_distribution(df["age"])
eda.plot_categorical_frequency(df["category"])

# Bivariate
eda.plot_scatter(df, x="income", y="spend", color_by="segment")
eda.plot_distribution_by_group(df, cat_col="segment", num_col="income", mode="box")

Feature Generation

from caketool.feature import generate_features_by_window

result = generate_features_by_window(
    df,
    client_id_col="user_id",
    report_date_col="event_date",
    fs_event_timestamp="snapshot_date",
    numeric_cols=("amount", "balance"),
    string_cols=("category",),
    boolean_cols=("is_active",),
    lookback_days=(0, 7, 30),   # 0 = lifetime
    backend="pandas",           # "pandas" | "polars" | "spark" | "bigframes"
)
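
To make the `lookback_days` idea concrete, here is a dependency-free sketch of windowed aggregation. It is not caketool's implementation: the `window_sums` helper, the `amount_sum_{N}d` column names, and the window boundary (events fewer than N days before the snapshot) are all assumptions made for illustration. Only the "0 = lifetime" convention comes from the example above.

```python
from datetime import date

# Hypothetical event log: (user_id, event_date, amount).
events = [
    ("u1", date(2024, 1, 1), 10.0),
    ("u1", date(2024, 1, 25), 20.0),
    ("u1", date(2024, 1, 30), 5.0),
    ("u2", date(2024, 1, 29), 7.0),
]


def window_sums(events, snapshot_date, lookback_days=(0, 7, 30)):
    """Sum `amount` per user over each lookback window ending at snapshot_date.

    A window of 0 means "lifetime": every event up to the snapshot counts.
    Assumption: an N-day window keeps events that are fewer than N days old.
    """
    features = {}
    for user_id, event_date, amount in events:
        if event_date > snapshot_date:
            continue  # never look past the snapshot date
        age_days = (snapshot_date - event_date).days
        row = features.setdefault(
            user_id, {f"amount_sum_{d}d": 0.0 for d in lookback_days}
        )
        for d in lookback_days:
            if d == 0 or age_days < d:
                row[f"amount_sum_{d}d"] += amount
    return features


features = window_sums(events, snapshot_date=date(2024, 1, 31))
print(features["u1"])
```

Skipping events dated after the snapshot is what keeps features like these free of leakage from the future.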

Model Training

from caketool.model import BoostTree

model = BoostTree()
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
proba = model.predict_proba(X_test)[:, 1]
importance = model.get_feature_importance()

Out-of-Fold Cross-Validation

from caketool.model import BoostTree, EnsembleBoostTree

models, oof_preds, oof_labels = BoostTree.fit_oof(X_train, y_train, n_splits=5)
ensemble = EnsembleBoostTree(models)
proba = ensemble.predict_proba(X_test)[:, 1]
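
For readers new to out-of-fold prediction, the sketch below shows the general pattern in plain Python. It does not reproduce `BoostTree.fit_oof`: `kfold_indices` and `fit_mean_model` are hypothetical helpers, the "model" is a toy that predicts the training positive rate, and averaging the fold models is only one plausible way an ensemble could combine them.

```python
def kfold_indices(n_rows, n_splits):
    """Yield (train_idx, val_idx) pairs; fold k holds out rows k, k + n_splits, ..."""
    for k in range(n_splits):
        val_idx = list(range(k, n_rows, n_splits))
        held_out = set(val_idx)
        train_idx = [i for i in range(n_rows) if i not in held_out]
        yield train_idx, val_idx


def fit_mean_model(y_train):
    """Toy stand-in for a real learner: always predicts the training positive rate."""
    rate = sum(y_train) / len(y_train)
    return lambda n_rows: [rate] * n_rows


y = [0, 1, 0, 1, 1, 0, 1, 0, 1, 1]
oof_preds = [None] * len(y)
models = []

for train_idx, val_idx in kfold_indices(len(y), n_splits=5):
    model = fit_mean_model([y[i] for i in train_idx])
    models.append(model)
    for i, p in zip(val_idx, model(len(val_idx))):
        oof_preds[i] = p  # row i is scored by a model that never saw it

# One way to score new data: average the per-fold models.
ensemble_pred = sum(m(1)[0] for m in models) / len(models)
print(oof_preds, ensemble_pred)
```

Because every row's prediction comes from a model trained without that row, metrics computed on the out-of-fold predictions estimate generalization without a separate holdout set.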

Explainability

from caketool.explainability import PermutationExplainer

explainer = PermutationExplainer(model=model)
explainer.fit(X_test)

# Global feature importance
importance = explainer.get_feature_importance()

# Local explanation for a single sample
local = explainer.get_local_explanation(row_index=0)

# Visualize
explainer.show_summary()
explainer.show_waterfall(row_index=0)
explainer.show_dependence(feature="income")

Score Calibration

from caketool.calibration import calibrate_score_to_normal

calibrated = calibrate_score_to_normal(raw_scores, standard=False)
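
One common way to map scores onto a normal distribution is a rank-based inverse normal transform, sketched below with only the standard library. This illustrates the general technique; whether `calibrate_score_to_normal` works this way, and what its `standard` flag controls, is not stated in this README. The `to_normal_scores` helper and its `mean` and `std` parameters are hypothetical.

```python
from statistics import NormalDist


def to_normal_scores(raw_scores, mean=0.0, std=1.0):
    """Map scores to normal quantiles by rank (ties keep their input order)."""
    n = len(raw_scores)
    order = sorted(range(n), key=lambda i: raw_scores[i])
    target = NormalDist(mu=mean, sigma=std)
    out = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n keeps the probability strictly inside (0, 1).
        out[i] = target.inv_cdf((rank + 0.5) / n)
    return out


raw = [0.02, 0.90, 0.15, 0.40, 0.07]
calibrated = to_normal_scores(raw)
print(calibrated)
```

The transform preserves the ordering of the raw scores, so ranking metrics such as Gini are unchanged; only the shape of the score distribution changes.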

Metrics

from caketool.metric import gini, psi

print(gini(y_true, y_pred))          # Gini coefficient
print(psi(expected, actual))          # Population Stability Index
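
As background on what these two metrics measure, the sketch below uses the standard definitions: Gini as `2 * AUC - 1`, and PSI as the sum of `(a - e) * ln(a / e)` over bins. The function names, the pre-binned input to `psi_from_shares`, and the `eps` floor are assumptions for illustration; caketool's own `gini` and `psi` may bin, weight, or validate inputs differently.

```python
import math


def auc(y_true, y_score):
    """ROC AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini_from_auc(y_true, y_score):
    """Gini coefficient of a binary classifier's scores."""
    return 2 * auc(y_true, y_score) - 1


def psi_from_shares(expected_share, actual_share, eps=1e-6):
    """PSI over pre-binned population shares."""
    total = 0.0
    for e, a in zip(expected_share, actual_share):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total


g = gini_from_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
stable = psi_from_shares([0.25, 0.25, 0.5], [0.25, 0.25, 0.5])
shifted = psi_from_shares([0.5, 0.5], [0.6, 0.4])
print(g, stable, shifted)
```

In the toy data three of the four positive/negative pairs are ordered correctly, so AUC is 0.75 and Gini is 0.5. Identical distributions give a PSI of 0, and any shift in the bin shares makes it positive.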

Risk Report

from caketool.report import decribe_risk_score

report = decribe_risk_score(score_df, pred_col="score", label_col="label")

Drift Detection

from caketool.monitor import AdversarialModel

model = AdversarialModel()
model.fit(reference_df, current_df)
model.show()    # prints ROC AUC and the most important features
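
The idea behind adversarial drift detection is to label reference rows 0 and current rows 1, then ask how well the two sets can be told apart: an AUC near 0.5 suggests no detectable drift, while an AUC near 1.0 suggests strong drift. The sketch below is a simplified per-feature stand-in that uses each raw feature value as the "classifier score" instead of training a model. It is not how `AdversarialModel` is implemented, and `univariate_drift` is a hypothetical helper.

```python
def auc(labels, scores):
    """ROC AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def univariate_drift(reference, current):
    """Return feature -> AUC for separating reference (0) from current (1) rows."""
    result = {}
    for name in reference:
        scores = reference[name] + current[name]
        labels = [0] * len(reference[name]) + [1] * len(current[name])
        a = auc(labels, scores)
        result[name] = max(a, 1 - a)  # the direction of the shift does not matter
    return result


# Hypothetical data: "age" is stable, "income" has shifted upward.
reference = {"age": [30, 41, 25, 52, 38], "income": [10, 12, 11, 13, 9]}
current = {"age": [33, 40, 27, 50, 36], "income": [20, 22, 19, 25, 21]}

drift = univariate_drift(reference, current)
print(drift)
```

A trained classifier can also pick up drift that only shows in feature interactions, which this per-feature check cannot see.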

Experiment Tracking

from caketool.experiment import create_tracker

# MLflow
with create_tracker("mlflow", experiment_name="my-exp", run_name="run-001") as tracker:
    tracker.log_params({"lr": 0.01, "depth": 6})
    tracker.log_metrics({"gini": 0.72})
    tracker.log_pickle(model, "model")

# Vertex AI
with create_tracker("vertex_ai", experiment_name="my-exp", run_name="run-001",
                    project="my-gcp-project", location="us-central1",
                    bucket_name="my-bucket") as tracker:
    tracker.log_params({"lr": 0.01})

API Overview

| Module | Key exports | Description |
| --- | --- | --- |
| caketool.eda | profile, plot_correlations, plot_scatter, plot_distribution_by_group, rank_associations | Exploratory data analysis with Plotly |
| caketool.feature | generate_features_by_window | Multi-backend aggregated feature engineering |
| caketool.model | BoostTree, EnsembleBoostTree, VotingModel | XGBoost training & ensemble |
| caketool.model | FeatureEncoder, FeatureRemover, ColinearFeatureRemover, UnivariateFeatureRemover, InfinityHandler | sklearn-compatible preprocessing transformers |
| caketool.explainability | PermutationExplainer | SHAP-based model-agnostic explainability |
| caketool.calibration | calibrate_score_to_normal | Normal distribution score calibration |
| caketool.metric | gini, psi, psi_from_distribution | Classification and stability metrics |
| caketool.report | decribe_risk_score | Risk score band report |
| caketool.monitor | AdversarialModel | Dataset drift detection |
| caketool.experiment | create_tracker, MLflowTracker, VertexAITracker | Experiment tracking abstraction |

Development

conda create -n caketool python=3.10
conda activate caketool
pip-compile pyproject.toml --all-extras
pip install -e ".[dev,all]"
pre-commit install

Linting

Pre-commit hooks run ruff automatically on each commit. To run the checks manually:

ruff check src/ tests/ --fix  # Lint and auto-fix
ruff format src/ tests/        # Format code
pre-commit run --all-files     # Run all hooks

Tests

pytest tests/ -v --tb=short

Docs

pip install -e ".[docs]"
pdoc src/caketool   # Preview at http://localhost:8080

Docs are published automatically to GitHub Pages when a version tag is pushed.


Publishing

The package version is derived automatically from git tags via setuptools-scm.

# Test on TestPyPI (RC/beta/alpha tags)
git tag v1.8.0-rc1
git push origin v1.8.0-rc1

# Publish to PyPI (stable tags)
git tag v1.8.0
git push origin v1.8.0

GitHub Actions builds and publishes automatically on tag push.


Local Development

python -m pip install -e .
python -c "from caketool import __version__; print(__version__)"
