Common tools for MLOps

Caketool

A Python MLOps toolkit for common machine learning and data science workflows. It provides EDA, feature engineering, model training, explainability, experiment tracking, model monitoring, calibration, and metrics, all designed for production ML pipelines.


Installation

# Core (pandas, numpy, sklearn, xgboost)
pip install caketool

# With MLflow support
pip install "caketool[onprem]"

# With Google Cloud (Vertex AI, BigQuery)
pip install "caketool[gcp]"

# Everything
pip install "caketool[all]"

Quick Start

EDA

from caketool import eda

# Dataset overview
eda.profile(df)
eda.plot_correlations(df)

# Univariate
eda.plot_numeric_distribution(df["age"])
eda.plot_categorical_frequency(df["category"])

# Bivariate
eda.plot_scatter(df, x="income", y="spend", color_by="segment")
eda.plot_distribution_by_group(df, cat_col="segment", num_col="income", mode="box")

Feature Generation

from caketool.feature import generate_features_by_window

result = generate_features_by_window(
    df,
    client_id_col="user_id",
    report_date_col="event_date",
    fs_event_timestamp="snapshot_date",
    numeric_cols=("amount", "balance"),
    string_cols=("category",),
    boolean_cols=("is_active",),
    lookback_days=(0, 7, 30),   # 0 = lifetime
    backend="pandas",           # "pandas" | "polars" | "spark" | "bigframes"
)
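
To make the `lookback_days` idea concrete, here is a minimal pure-Python sketch of window aggregation for one client and one numeric column. It is not caketool's implementation: the helper names (`window_sum`, `features_for`), the output column naming, and the window boundary (events strictly after `as_of - N days`, up to and including `as_of`) are all assumptions made for illustration. Only the convention that `0` means lifetime comes from the example above.

```python
from datetime import date, timedelta


def window_sum(events, as_of, lookback_days):
    """Sum amounts for events inside the lookback window ending at as_of.

    events: iterable of (event_date, amount) pairs.
    lookback_days == 0 means no lower bound (lifetime), as in the README.
    The boundary handling here is an assumption, not caketool's rule.
    """
    total = 0.0
    for event_date, amount in events:
        if event_date > as_of:
            continue  # never use events after the snapshot date
        if lookback_days and event_date <= as_of - timedelta(days=lookback_days):
            continue  # older than the window
        total += amount
    return total


def features_for(events, as_of, lookbacks=(0, 7, 30)):
    """Build one feature per lookback window (hypothetical column names)."""
    features = {}
    for days in lookbacks:
        suffix = "lifetime" if days == 0 else f"{days}d"
        features[f"amount_sum_{suffix}"] = window_sum(events, as_of, days)
    return features


events = [
    (date(2024, 1, 1), 10.0),
    (date(2024, 3, 10), 5.0),
    (date(2024, 3, 28), 2.0),
    (date(2024, 4, 5), 100.0),  # after the snapshot, so it is ignored
]
print(features_for(events, as_of=date(2024, 3, 31)))
```

Skipping events after the snapshot date is the part that matters most in practice, since it keeps future information out of the features.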

Model Training

from caketool.model import BoostTree

model = BoostTree()
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
proba = model.predict_proba(X_test)[:, 1]
importance = model.get_feature_importance()

Out-of-Fold Cross-Validation

from caketool.model import BoostTree, EnsembleBoostTree

models, oof_preds, oof_labels = BoostTree.fit_oof(X_train, y_train, n_splits=5)
ensemble = EnsembleBoostTree(models)
proba = ensemble.predict_proba(X_test)[:, 1]
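
For readers new to out-of-fold training, the sketch below shows the general pattern in pure Python: each sample is predicted by a model that never saw it during fitting, and the per-fold models are then averaged at inference time. Everything here is hypothetical (`kfold_indices`, `MeanModel`, `fit_oof_sketch`, `ensemble_predict`); it does not reproduce `BoostTree.fit_oof`, which also returns `oof_labels`, or the way `EnsembleBoostTree` combines models.

```python
def kfold_indices(n_samples, n_splits):
    """Yield (train_idx, val_idx) pairs over contiguous, unshuffled folds."""
    base, extra = divmod(n_samples, n_splits)
    start = 0
    for fold in range(n_splits):
        size = base + (1 if fold < extra else 0)
        val_idx = list(range(start, start + size))
        train_idx = list(range(0, start)) + list(range(start + size, n_samples))
        yield train_idx, val_idx
        start += size


class MeanModel:
    """Hypothetical stand-in model: always predicts the mean training label."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


def fit_oof_sketch(X, y, n_splits=5):
    """Train one model per fold and collect out-of-fold predictions."""
    models = []
    oof_preds = [None] * len(y)
    for train_idx, val_idx in kfold_indices(len(y), n_splits):
        model = MeanModel().fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        fold_preds = model.predict([X[i] for i in val_idx])
        for i, pred in zip(val_idx, fold_preds):
            oof_preds[i] = pred  # prediction from a model that did not train on sample i
        models.append(model)
    return models, oof_preds


def ensemble_predict(models, X):
    """Average the predictions of all fold models (simple mean, an assumption)."""
    per_model = [m.predict(X) for m in models]
    return [sum(column) / len(models) for column in zip(*per_model)]


X = [[i] for i in range(6)]
y = [0, 0, 0, 1, 1, 1]
models, oof_preds = fit_oof_sketch(X, y, n_splits=3)
print(oof_preds)  # [0.75, 0.75, 0.5, 0.5, 0.25, 0.25]
```

Because every out-of-fold prediction comes from a model that did not train on that row, metrics computed on `oof_preds` behave like validation metrics over the whole training set.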

Explainability

from caketool.explainability import PermutationExplainer

explainer = PermutationExplainer(model=model)
explainer.fit(X_test)

# Global feature importance
importance = explainer.get_feature_importance()

# Local explanation for a single sample
local = explainer.get_local_explanation(row_index=0)

# Visualize
explainer.show_summary()
explainer.show_waterfall(row_index=0)
explainer.show_dependence(feature="income")

Score Calibration

from caketool.calibration import calibrate_score_to_normal

calibrated = calibrate_score_to_normal(raw_scores, standard=False)
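
One common way to map raw scores onto a normal distribution is a rank-based inverse normal transform, sketched below with the standard library only. This is an illustration of the general idea, not a description of what `calibrate_score_to_normal` does internally. The function name `rank_to_normal`, its `mean` and `std` parameters, and the `(rank - 0.5) / n` quantile rule are assumptions, and the meaning of `standard=False` is not covered here.

```python
from statistics import NormalDist


def rank_to_normal(scores, mean=0.0, std=1.0):
    """Map scores to normal quantiles by rank; tied scores share an averaged rank."""
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over the run of scores tied with position i.
        while j + 1 < n and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    dist = NormalDist(mean, std)
    # (rank - 0.5) / n keeps every quantile strictly inside (0, 1).
    return [dist.inv_cdf((r - 0.5) / n) for r in ranks]


raw_scores = [0.2, 0.9, 0.5]
print(rank_to_normal(raw_scores))
```

The transform preserves the ordering of the scores, so ranking metrics such as Gini are unchanged; only the shape of the score distribution changes.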

Metrics

from caketool.metric import gini, psi

print(gini(y_true, y_pred))          # Gini coefficient
print(psi(expected, actual))          # Population Stability Index
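
As background on what these two metrics measure, the sketch below computes them from their standard definitions: Gini as `2 * AUC - 1`, and PSI as the sum over bins of `(actual - expected) * ln(actual / expected)`. The function names and the `eps` floor are hypothetical, and the PSI helper takes proportions that are already binned; how caketool's `gini` and `psi` bin or validate their inputs is not specified here.

```python
import math


def auc_from_ranks(y_true, y_score):
    """ROC AUC via the Mann-Whitney U statistic, using average ranks for ties."""
    n = len(y_score)
    order = sorted(range(n), key=lambda i: y_score[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and y_score[order[j + 1]] == y_score[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average rank of the tie group
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    n_pos = sum(1 for label in y_true if label == 1)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("need at least one positive and one negative label")
    rank_sum_pos = sum(r for r, label in zip(ranks, y_true) if label == 1)
    return (rank_sum_pos - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


def gini_sketch(y_true, y_score):
    """Gini coefficient of a binary classifier: 2 * AUC - 1."""
    return 2 * auc_from_ranks(y_true, y_score) - 1


def psi_sketch(expected_pct, actual_pct, eps=1e-6):
    """PSI over pre-binned proportions; eps guards against empty bins."""
    total = 0.0
    for e, a in zip(expected_pct, actual_pct):
        e = max(e, eps)
        a = max(a, eps)
        total += (a - e) * math.log(a / e)
    return total


print(gini_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.5
print(psi_sketch([0.5, 0.5], [0.5, 0.5]))                 # 0.0
```

A Gini of 0 corresponds to random ranking and 1 to perfect ranking. A PSI of 0 means the two binned distributions are identical, and larger values indicate a larger shift.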

Risk Report

from caketool.report import decribe_risk_score

report = decribe_risk_score(score_df, pred_col="score", label_col="label")

Drift Detection

from caketool.monitor import AdversarialModel

model = AdversarialModel()
model.fit(reference_df, current_df)
model.show()    # prints ROC AUC and top important features
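
The idea behind adversarial drift detection is to ask how well the reference and current datasets can be told apart: an AUC near 0.5 means they look alike, and an AUC near 1.0 means they are easy to separate. The sketch below is a deliberately reduced, one-feature-at-a-time version of that idea. Instead of training a classifier, it uses each feature's raw value as the score. It is not how `AdversarialModel` works internally, and the names `pairwise_auc` and `univariate_drift` are hypothetical.

```python
def pairwise_auc(reference_values, current_values):
    """Probability that a random current value exceeds a random reference value.

    Ties count as half a win. 0.5 means the feature cannot separate the datasets.
    """
    wins = 0.0
    for cur in current_values:
        for ref in reference_values:
            if cur > ref:
                wins += 1.0
            elif cur == ref:
                wins += 0.5
    return wins / (len(current_values) * len(reference_values))


def univariate_drift(reference, current):
    """Rank features by how far their AUC is from 0.5 (most drifted first).

    reference, current: dicts mapping feature name -> list of numeric values.
    """
    report = {name: pairwise_auc(reference[name], current[name]) for name in reference}
    return sorted(report.items(), key=lambda item: abs(item[1] - 0.5), reverse=True)


reference = {"age": [30, 40, 50, 60], "income": [1, 2, 3, 4]}
current = {"age": [30, 40, 50, 60], "income": [10, 11, 12, 13]}
for name, auc in univariate_drift(reference, current):
    print(name, auc)
```

A single-feature check like this misses drift that only shows up in feature interactions, which is the reason to train a full classifier on the combined data instead.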

Experiment Tracking

from caketool.experiment import create_tracker

# MLflow
with create_tracker("mlflow", experiment_name="my-exp", run_name="run-001") as tracker:
    tracker.log_params({"lr": 0.01, "depth": 6})
    tracker.log_metrics({"gini": 0.72})
    tracker.log_pickle(model, "model")

# Vertex AI
with create_tracker("vertex_ai", experiment_name="my-exp", run_name="run-001",
                    project="my-gcp-project", location="us-central1",
                    bucket_name="my-bucket") as tracker:
    tracker.log_params({"lr": 0.01})

API Overview

| Module | Key exports | Description |
| --- | --- | --- |
| caketool.eda | profile, plot_correlations, plot_scatter, plot_distribution_by_group, rank_associations | Exploratory data analysis with Plotly |
| caketool.feature | generate_features_by_window | Multi-backend aggregated feature engineering |
| caketool.model | BoostTree, EnsembleBoostTree, VotingModel | XGBoost training & ensemble |
| caketool.model | FeatureEncoder, FeatureRemover, ColinearFeatureRemover, UnivariateFeatureRemover, InfinityHandler | sklearn-compatible preprocessing transformers |
| caketool.explainability | PermutationExplainer | SHAP-based model-agnostic explainability |
| caketool.calibration | calibrate_score_to_normal | Normal distribution score calibration |
| caketool.metric | gini, psi, psi_from_distribution | Classification and stability metrics |
| caketool.report | decribe_risk_score | Risk score band report |
| caketool.monitor | AdversarialModel | Dataset drift detection |
| caketool.experiment | create_tracker, MLflowTracker, VertexAITracker | Experiment tracking abstraction |

Development

conda create -n caketool python=3.10
conda activate caketool
pip-compile pyproject.toml --all-extras
pip install -e ".[dev,all]"
pre-commit install

Linting

Pre-commit hooks run ruff automatically on every commit. To run the checks manually:

ruff check src/ tests/ --fix  # Lint and auto-fix
ruff format src/ tests/        # Format code
pre-commit run --all-files     # Run all hooks

Tests

pytest tests/ -v --tb=short

Docs

pip install -e ".[docs]"
pdoc src/caketool   # Preview at http://localhost:8080

Docs are published automatically to GitHub Pages when a version tag is pushed.


Publishing

The package version is derived automatically from git tags via setuptools-scm.

# Test on TestPyPI (RC/beta/alpha tags)
git tag v1.8.0-rc1
git push origin v1.8.0-rc1

# Publish to PyPI (stable tags)
git tag v1.8.0
git push origin v1.8.0

GitHub Actions builds and publishes automatically on tag push.


Local Development

python -m pip install -e .
python -c "from caketool import __version__; print(__version__)"
