
Common tools for MLOps


Caketool

Caketool is a Python MLOps toolkit for common machine learning and data science workflows. It provides EDA, feature engineering, model training, explainability, experiment tracking, model monitoring, calibration, and metrics, all designed for production-ready ML pipelines.


Installation

# Core (pandas, numpy, sklearn, xgboost)
pip install caketool

# With MLflow support
pip install "caketool[onprem]"

# With Google Cloud (Vertex AI, BigQuery)
pip install "caketool[gcp]"

# Everything
pip install "caketool[all]"

Quick Start

EDA

from caketool import eda

# Dataset overview
eda.profile(df)
eda.plot_correlations(df)

# Univariate
eda.plot_numeric_distribution(df["age"])
eda.plot_categorical_frequency(df["category"])

# Bivariate
eda.plot_scatter(df, x="income", y="spend", color_by="segment")
eda.plot_distribution_by_group(df, cat_col="segment", num_col="income", mode="box")

Feature Generation

from caketool.feature import generate_features_by_window

result = generate_features_by_window(
    df,
    client_id_col="user_id",
    report_date_col="event_date",
    fs_event_timestamp="snapshot_date",
    numeric_cols=("amount", "balance"),
    string_cols=("category",),
    boolean_cols=("is_active",),
    lookback_days=(0, 7, 30),   # 0 = lifetime
    backend="pandas",           # "pandas" | "polars" | "spark" | "bigframes"
)
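
To make the lookback idea concrete, here is a minimal standard-library sketch of a windowed aggregation. It is not caketool's implementation: the `window_sum` helper, the tuple layout of `events`, and the window boundary (events strictly newer than `snapshot - N days`, up to and including the snapshot) are all assumptions made for illustration. It only mirrors the convention from the call above that a lookback of 0 means lifetime.

```python
from datetime import date, timedelta


def window_sum(events, snapshot, lookback_days):
    """Sum amounts per client over a lookback window ending at `snapshot`.

    Hypothetical helper: lookback_days == 0 is treated as "lifetime"
    (no lower bound), matching the convention shown in the README call.
    """
    totals = {}
    for client_id, event_date, amount in events:
        if event_date > snapshot:
            continue  # never use events from after the snapshot
        if lookback_days and event_date <= snapshot - timedelta(days=lookback_days):
            continue  # event is older than the window
        totals[client_id] = totals.get(client_id, 0.0) + amount
    return totals


# Illustrative data: (client id, event date, amount)
events = [
    ("u1", date(2023, 12, 1), 2.0),
    ("u1", date(2024, 1, 10), 10.0),
    ("u1", date(2024, 1, 28), 5.0),
    ("u2", date(2024, 1, 30), 7.0),
    ("u1", date(2024, 2, 5), 99.0),  # after the snapshot, always ignored
]
snapshot = date(2024, 1, 31)

# One aggregate per window, keyed by lookback length in days
features = {days: window_sum(events, snapshot, days) for days in (0, 7, 30)}
```

Each window yields one feature per client, so three lookback values over one numeric column produce three columns in this sketch.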

Model Training

from caketool.model import BoostTree

model = BoostTree()
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
proba = model.predict_proba(X_test)[:, 1]
importance = model.get_feature_importance()

Out-of-Fold Cross-Validation

from caketool.model import BoostTree, EnsembleBoostTree

models, oof_preds, oof_labels = BoostTree.fit_oof(X_train, y_train, n_splits=5)
ensemble = EnsembleBoostTree(models)
proba = ensemble.predict_proba(X_test)[:, 1]
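
For readers new to the term, the sketch below shows what out-of-fold predictions are in general: every row is scored by a model that was trained without it. It uses only the standard library and a stand-in "model" (the mean training label), so it says nothing about how `BoostTree.fit_oof` is implemented; `kfold_indices` and `oof_predictions` are hypothetical names.

```python
from statistics import mean


def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs with contiguous, near-equal folds."""
    fold_sizes = [
        n_samples // n_splits + (1 if i < n_samples % n_splits else 0)
        for i in range(n_splits)
    ]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        held_out = set(val_idx)
        train_idx = [i for i in range(n_samples) if i not in held_out]
        yield train_idx, val_idx
        start += size


def oof_predictions(y, n_splits):
    """Return one fitted stand-in model per fold and one prediction per row.

    Each row's prediction comes from the model whose training set
    excluded that row.
    """
    models = []
    oof = [None] * len(y)
    for train_idx, val_idx in kfold_indices(len(y), n_splits):
        # Stand-in model: predicts the mean label of its training rows.
        model = mean(y[i] for i in train_idx)
        models.append(model)
        for i in val_idx:
            oof[i] = model
    return models, oof


y = [0, 0, 1, 1, 1, 1]
models, oof = oof_predictions(y, n_splits=3)
```

Because no row influences its own prediction, metrics computed on `oof` against `y` give a less optimistic estimate than metrics computed on training predictions.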

Explainability

from caketool.explainability import PermutationExplainer

explainer = PermutationExplainer(model=model)
explainer.fit(X_test)

# Global feature importance
importance = explainer.get_feature_importance()

# Local explanation for a single sample
local = explainer.get_local_explanation(row_index=0)

# Visualize
explainer.show_summary()
explainer.show_waterfall(row_index=0)
explainer.show_dependence(feature="income")

Score Calibration

from caketool.calibration import calibrate_score_to_normal

calibrated = calibrate_score_to_normal(raw_scores, standard=False)

Metrics

from caketool.metric import gini, psi

print(gini(y_true, y_pred))          # Gini coefficient
print(psi(expected, actual))          # Population Stability Index
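
As background on what these two metrics measure, here is a standard-library sketch of the usual textbook definitions: Gini as `2 * AUC - 1`, and PSI as the sum of `(actual - expected) * ln(actual / expected)` over matching bins. The function names, the `eps` floor, and the choice to take pre-binned proportions are assumptions for illustration; caketool's own `gini` and `psi` may bin, validate, or handle edge cases differently.

```python
import math


def gini_from_scores(y_true, y_score):
    """Gini = 2 * AUC - 1, with AUC from average ranks (ties share a rank)."""
    pairs = sorted(zip(y_score, y_true))
    n = len(pairs)
    ranks = [0.0] * n
    i = 0
    while i < n:
        # Find the run of tied scores starting at position i
        j = i
        while j + 1 < n and pairs[j + 1][0] == pairs[i][0]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[k] = avg_rank
        i = j + 1
    n_pos = sum(1 for _, label in pairs if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    rank_sum_pos = sum(r for r, (_, label) in zip(ranks, pairs) if label == 1)
    auc = (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
    return 2 * auc - 1


def psi_from_proportions(expected, actual, eps=1e-6):
    """PSI over matching bins; inputs are per-bin proportions summing to 1."""
    if len(expected) != len(actual):
        raise ValueError("expected and actual must have the same number of bins")
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # floor to avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total


example_gini = gini_from_scores([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])  # 0.5
example_psi = psi_from_proportions([0.5, 0.5], [0.4, 0.6])
```

A Gini of 0 corresponds to random ranking and 1 to perfect ranking; a PSI of 0 means the two binned distributions are identical, and larger values indicate a larger shift.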

Risk Report

from caketool.report import decribe_risk_score

report = decribe_risk_score(score_df, pred_col="score", label_col="label")

Drift Detection

from caketool.monitor import AdversarialModel

# Use a separate name so the trained classifier in `model` is not overwritten
drift_model = AdversarialModel()
drift_model.fit(reference_df, current_df)
drift_model.show()    # prints ROC AUC and top important features

Experiment Tracking

from caketool.experiment import create_tracker

# MLflow
with create_tracker("mlflow", experiment_name="my-exp", run_name="run-001") as tracker:
    tracker.log_params({"lr": 0.01, "depth": 6})
    tracker.log_metrics({"gini": 0.72})
    tracker.log_pickle(model, "model")

# Vertex AI
with create_tracker("vertex_ai", experiment_name="my-exp", run_name="run-001",
                    project="my-gcp-project", location="us-central1",
                    bucket_name="my-bucket") as tracker:
    tracker.log_params({"lr": 0.01})

API Overview

| Module | Key exports | Description |
| --- | --- | --- |
| caketool.eda | profile, plot_correlations, plot_scatter, plot_distribution_by_group, rank_associations | Exploratory data analysis with Plotly |
| caketool.feature | generate_features_by_window | Multi-backend aggregated feature engineering |
| caketool.model | BoostTree, EnsembleBoostTree, VotingModel | XGBoost training & ensemble |
| caketool.model | FeatureEncoder, FeatureRemover, ColinearFeatureRemover, UnivariateFeatureRemover, InfinityHandler | sklearn-compatible preprocessing transformers |
| caketool.explainability | PermutationExplainer | SHAP-based model-agnostic explainability |
| caketool.calibration | calibrate_score_to_normal | Normal distribution score calibration |
| caketool.metric | gini, psi, psi_from_distribution | Classification and stability metrics |
| caketool.report | decribe_risk_score | Risk score band report |
| caketool.monitor | AdversarialModel | Dataset drift detection |
| caketool.experiment | create_tracker, MLflowTracker, VertexAITracker | Experiment tracking abstraction |

Development

conda create -n caketool python=3.10
conda activate caketool
pip-compile pyproject.toml --all-extras
pip install -e ".[dev,all]"
pre-commit install

Linting

Pre-commit hooks run ruff automatically on every commit. To run the checks manually:

ruff check src/ tests/ --fix  # Lint and auto-fix
ruff format src/ tests/        # Format code
pre-commit run --all-files     # Run all hooks

Tests

pytest tests/ -v --tb=short

Docs

pip install -e ".[docs]"
pdoc src/caketool   # Preview at http://localhost:8080

Docs are published automatically to GitHub Pages when a version tag is pushed.


Publishing

The version is derived automatically from git tags via setuptools-scm.

# Test on TestPyPI (RC/beta/alpha tags)
git tag v1.8.0-rc1
git push origin v1.8.0-rc1

# Publish to PyPI (stable tags)
git tag v1.8.0
git push origin v1.8.0

GitHub Actions builds and publishes automatically on tag push.


Local Development

python -m pip install -e .
python -c "from caketool import __version__; print(__version__)"
