Skip to main content

Common tools for MLOps

Project description

Caketool

A Python MLOps toolkit for common machine learning and data science workflows. Provides EDA, feature engineering, model training, explainability, experiment tracking, model monitoring, calibration, and metrics — all designed for production-ready ML pipelines.


Installation

# Core (pandas, numpy, sklearn, xgboost)
pip install caketool

# With MLflow support
pip install "caketool[onprem]"

# With Google Cloud (Vertex AI, BigQuery)
pip install "caketool[gcp]"

# Everything
pip install "caketool[all]"

Quick Start

EDA

from caketool import eda

# Dataset overview
eda.profile(df)
eda.plot_correlations(df)

# Univariate
eda.plot_numeric_distribution(df["age"])
eda.plot_categorical_frequency(df["category"])

# Bivariate
eda.plot_scatter(df, x="income", y="spend", color_by="segment")
eda.plot_distribution_by_group(df, cat_col="segment", num_col="income", mode="box")

Feature Generation

from caketool.feature import generate_features_by_window

result = generate_features_by_window(
    df,
    client_id_col="user_id",
    report_date_col="event_date",
    fs_event_timestamp="snapshot_date",
    numeric_cols=("amount", "balance"),
    string_cols=("category",),
    boolean_cols=("is_active",),
    lookback_days=(0, 7, 30),   # 0 = lifetime
    backend="pandas",           # "pandas" | "polars" | "spark" | "bigframes"
)

Model Training

from caketool.model import BoostTree

model = BoostTree()
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
proba = model.predict_proba(X_test)[:, 1]
importance = model.get_feature_importance()

Out-of-Fold Cross-Validation

from caketool.model import BoostTree, EnsembleBoostTree

models, oof_preds, oof_labels = BoostTree.fit_oof(X_train, y_train, n_splits=5)
ensemble = EnsembleBoostTree(models)
proba = ensemble.predict_proba(X_test)[:, 1]

Explainability

from caketool.explainability import PermutationExplainer

explainer = PermutationExplainer(model=model)
explainer.fit(X_test)

# Global feature importance
importance = explainer.get_feature_importance()

# Local explanation for a single sample
local = explainer.get_local_explanation(row_index=0)

# Visualize
explainer.show_summary()
explainer.show_waterfall(row_index=0)
explainer.show_dependence(feature="income")

Score Calibration

from caketool.calibration import calibrate_score_to_normal

calibrated = calibrate_score_to_normal(raw_scores, standard=False)

Metrics

from caketool.metric import gini, psi

print(gini(y_true, y_pred))          # Gini coefficient
print(psi(expected, actual))          # Population Stability Index

Risk Report

from caketool.report import decribe_risk_score

report = decribe_risk_score(score_df, pred_col="score", label_col="label")

Drift Detection

from caketool.monitor import AdversarialModel

model = AdversarialModel()
model.fit(reference_df, current_df)
model.show()    # prints ROC AUC and top important features

Experiment Tracking

from caketool.experiment import create_tracker

# MLflow
with create_tracker("mlflow", experiment_name="my-exp", run_name="run-001") as tracker:
    tracker.log_params({"lr": 0.01, "depth": 6})
    tracker.log_metrics({"gini": 0.72})
    tracker.log_pickle(model, "model")

# Vertex AI
with create_tracker("vertex_ai", experiment_name="my-exp", run_name="run-001",
                    project="my-gcp-project", location="us-central1",
                    bucket_name="my-bucket") as tracker:
    tracker.log_params({"lr": 0.01})

API Overview

Module Key exports Description
caketool.eda profile, plot_correlations, plot_scatter, plot_distribution_by_group, rank_associations Exploratory data analysis with Plotly
caketool.feature generate_features_by_window Multi-backend aggregated feature engineering
caketool.model BoostTree, EnsembleBoostTree, VotingModel XGBoost training & ensemble
caketool.model FeatureEncoder, FeatureRemover, ColinearFeatureRemover, UnivariateFeatureRemover, InfinityHandler sklearn-compatible preprocessing transformers
caketool.explainability PermutationExplainer SHAP-based model-agnostic explainability
caketool.calibration calibrate_score_to_normal Normal distribution score calibration
caketool.metric gini, psi, psi_from_distribution Classification and stability metrics
caketool.report decribe_risk_score Risk score band report
caketool.monitor AdversarialModel Dataset drift detection
caketool.experiment create_tracker, MLflowTracker, VertexAITracker Experiment tracking abstraction

Development

conda create -n caketool python=3.10
conda activate caketool
pip-compile pyproject.toml --all-extras
pip install -e ".[dev,all]"
pre-commit install

Linting

Pre-commit hooks run ruff automatically on commit. To run manually:

ruff check src/ tests/ --fix  # Lint and auto-fix
ruff format src/ tests/        # Format code
pre-commit run --all-files     # Run all hooks

Tests

pytest tests/ -v --tb=short

Docs

pip install -e ".[docs]"
pdoc src/caketool   # Preview at http://localhost:8080

Docs are published automatically to GitHub Pages when a version tag is pushed.


Publishing

Version is automatically derived from git tags via setuptools-scm.

# Test on TestPyPI (RC/beta/alpha tags)
git tag v1.8.0-rc1
git push origin v1.8.0-rc1

# Publish to PyPI (stable tags)
git tag v1.8.0
git push origin v1.8.0

GitHub Actions builds and publishes automatically on tag push.


Local Development

python -m pip install -e .
python -c "from caketool import __version__; print(__version__)"

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

caketool-1.8.1.tar.gz (92.7 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

caketool-1.8.1-py3-none-any.whl (71.7 kB view details)

Uploaded Python 3

File details

Details for the file caketool-1.8.1.tar.gz.

File metadata

  • Download URL: caketool-1.8.1.tar.gz
  • Upload date:
  • Size: 92.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for caketool-1.8.1.tar.gz
Algorithm Hash digest
SHA256 4679c2800a9fa6f594e6d58c96de2560d6bffa80f2a0dd56002bf6f366afccdf
MD5 c8d0c34dbe88c040d08e5d9d5833483c
BLAKE2b-256 acbf83b7cc5f133c5be5aa519211051ab862e61b8e72d9b969b60e8542b0dc33

See more details on using hashes here.

File details

Details for the file caketool-1.8.1-py3-none-any.whl.

File metadata

  • Download URL: caketool-1.8.1-py3-none-any.whl
  • Upload date:
  • Size: 71.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for caketool-1.8.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ed4100992323cfbf27542c0221950694e2e4384614ef9250af94d48653629090
MD5 f2cd37c89e07a25013839c2e1a4b1a3a
BLAKE2b-256 eb820cca5459aa541daba9d8955c367fbc4ca418ce866ac008b7f8e2521c240e

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page