Common tools for MLOps

Caketool

A Python MLOps toolkit for common machine learning and data science workflows. It provides feature engineering, model training, experiment tracking, model monitoring, calibration, and metrics, all designed for production credit-risk and ML pipelines.


Installation

# Core (pandas, numpy, sklearn, xgboost)
pip install caketool

# With MLflow support
pip install "caketool[onprem]"

# With Google Cloud (Vertex AI, BigQuery)
pip install "caketool[gcp]"

# Everything
pip install "caketool[all]"

Quick Start

Feature Generation

from caketool.feature import generate_features_by_window

result = generate_features_by_window(
    df,
    client_id_col="user_id",
    report_date_col="event_date",
    fs_event_timestamp="snapshot_date",
    numeric_cols=("amount", "balance"),
    string_cols=("category",),
    boolean_cols=("is_active",),
    lookback_days=(0, 7, 30),   # 0 = lifetime
    backend="pandas",           # "pandas" | "polars" | "spark" | "bigframes"
)
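
To make the lookback idea concrete, here is a minimal standard-library sketch of windowed aggregation. It is not caketool's implementation: the function name `sum_by_window`, the output column names, and the window boundaries (start exclusive, snapshot inclusive) are all assumptions made for illustration. Only the convention that `0` means lifetime comes from the example above.

```python
from datetime import date, timedelta


def sum_by_window(events, snapshot, lookback_days=(0, 7, 30)):
    """Sum `amount` per client over each lookback window ending at `snapshot`.

    events: iterable of (client_id, event_date, amount) tuples.
    A lookback of 0 means "lifetime" (no lower bound), as in the README.
    Window boundaries are an assumption: (snapshot - days, snapshot].
    """
    events = list(events)
    features = {client_id: {} for client_id, _, _ in events}
    for days in lookback_days:
        name = "amount_sum_lifetime" if days == 0 else f"amount_sum_{days}d"
        start = None if days == 0 else snapshot - timedelta(days=days)
        for row in features.values():
            row[name] = 0.0  # clients with no events in the window get 0
        for client_id, event_date, amount in events:
            if event_date > snapshot:
                continue  # never read past the snapshot date
            if start is not None and event_date <= start:
                continue  # older than the window
            features[client_id][name] += amount
    return features


events = [
    ("u1", date(2023, 6, 1), 100.0),
    ("u1", date(2024, 1, 10), 10.0),
    ("u1", date(2024, 1, 28), 5.0),
    ("u1", date(2024, 2, 5), 99.0),  # after the snapshot, ignored
    ("u2", date(2024, 1, 30), 2.0),
]
feats = sum_by_window(events, snapshot=date(2024, 1, 31))
print(feats["u1"])
```

Skipping events dated after the snapshot is the important part of the design: it keeps each feature row limited to information available at scoring time.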

Model Training

from caketool.model import BoostTree

model = BoostTree()
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
proba = model.predict_proba(X_test)[:, 1]
importance = model.get_feature_importance()

Out-of-Fold Cross-Validation

from caketool.model import BoostTree, EnsembleBoostTree

models, oof_preds, oof_labels = BoostTree.fit_oof(X_train, y_train, n_splits=5)
ensemble = EnsembleBoostTree(models)
proba = ensemble.predict_proba(X_test)[:, 1]
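
The general out-of-fold pattern can be sketched without any ML dependency. The code below is an illustration, not how `BoostTree.fit_oof` or `EnsembleBoostTree` are implemented: `MeanModel`, `fit_oof_sketch`, and `ensemble_predict` are hypothetical names, and averaging the fold models' predictions is an assumed ensembling rule.

```python
import random


class MeanModel:
    """Toy stand-in for a real learner: predicts the training label mean."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_] * len(X)


def fit_oof_sketch(X, y, n_splits=5, seed=0):
    """Train one model per fold; predict each row with the model that never saw it."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::n_splits] for i in range(n_splits)]
    models, oof_preds = [], [None] * len(X)
    for held_out in folds:
        held = set(held_out)
        train = [i for i in indices if i not in held]
        model = MeanModel().fit([X[i] for i in train], [y[i] for i in train])
        for i, pred in zip(held_out, model.predict([X[i] for i in held_out])):
            oof_preds[i] = pred  # prediction made without seeing row i
        models.append(model)
    return models, oof_preds, list(y)


def ensemble_predict(models, X):
    """Average the fold models' predictions (assumed ensembling rule)."""
    per_model = [m.predict(X) for m in models]
    return [sum(col) / len(col) for col in zip(*per_model)]


X = [[i] for i in range(10)]
y = [0, 1] * 5
models, oof_preds, oof_labels = fit_oof_sketch(X, y, n_splits=5)
ensemble_out = ensemble_predict(models, [[0], [1]])
```

Because every out-of-fold prediction comes from a model that did not train on that row, `oof_preds` can be scored against `oof_labels` as an honest estimate of generalization.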

Score Calibration

from caketool.calibration import calibrate_score_to_normal

calibrated = calibrate_score_to_normal(raw_scores, standard=False)
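
The README does not say how `calibrate_score_to_normal` works or what `standard=False` controls. For orientation only, one common way to give scores a normal shape is a rank-based inverse normal transform, sketched below with the standard library. The function `rank_to_normal` and its `mean` and `std` parameters are hypothetical, and caketool may use a different method.

```python
from statistics import NormalDist


def rank_to_normal(scores, mean=0.0, std=1.0):
    """Map scores to normal quantiles by rank, preserving their order.

    Ties are broken by input position, so equal scores get slightly
    different outputs; a production version would average tied ranks.
    """
    n = len(scores)
    order = sorted(range(n), key=lambda i: scores[i])
    target = NormalDist(mean, std)
    out = [0.0] * n
    for rank, i in enumerate(order):
        # (rank + 0.5) / n stays strictly inside (0, 1), as inv_cdf requires
        out[i] = target.inv_cdf((rank + 0.5) / n)
    return out


calibrated_sketch = rank_to_normal([0.9, 0.1, 0.5])
```

The transform is monotonic, so ranking metrics such as Gini are unchanged; only the shape of the score distribution is affected.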

Metrics

from caketool.metric import gini, psi

print(gini(y_true, y_pred))          # Gini coefficient
print(psi(expected, actual))          # Population Stability Index
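
For readers unfamiliar with these metrics, the textbook definitions are short enough to write out. This is a sketch of the standard formulas (Gini = 2·AUC − 1, PSI = Σ (a − e)·ln(a / e) over bins), not caketool's code. The names `auc`, `gini_from_auc`, and `psi_from_shares` are illustrative, the `eps` floor is an assumed guard against empty bins, and the PSI function here takes bin shares that are already computed.

```python
import math


def auc(y_true, y_score):
    """Probability that a random positive outscores a random negative (ties count 0.5)."""
    pos = [s for t, s in zip(y_true, y_score) if t == 1]
    neg = [s for t, s in zip(y_true, y_score) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini_from_auc(y_true, y_score):
    """Gini coefficient as a rescaled AUC: 0 is random, 1 is perfect ranking."""
    return 2.0 * auc(y_true, y_score) - 1.0


def psi_from_shares(expected, actual, eps=1e-6):
    """PSI between two lists of bin shares that each sum to 1."""
    total = 0.0
    for e, a in zip(expected, actual):
        e, a = max(e, eps), max(a, eps)  # avoid log(0) on empty bins
        total += (a - e) * math.log(a / e)
    return total


gini_value = gini_from_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
psi_same = psi_from_shares([0.5, 0.5], [0.5, 0.5])
psi_shifted = psi_from_shares([0.5, 0.5], [0.6, 0.4])
print(gini_value)  # 0.5
print(psi_same)    # 0.0
```

The pairwise AUC is quadratic in sample size, which is fine for illustration; rank-based formulas are the usual choice on large datasets.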

Risk Report

from caketool.report import decribe_risk_score

report = decribe_risk_score(score_df, pred_col="score", label_col="label")

Drift Detection

from caketool.monitor import AdversarialModel

model = AdversarialModel()
model.fit(reference_df, current_df)
model.show()    # prints ROC AUC and top important features
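
The idea behind adversarial drift detection is to label each row by its origin (reference or current) and ask how well the two sets can be told apart: an AUC near 0.5 suggests no detectable drift. The sketch below reduces that idea to a per-feature separability check with no trained model, so it is a simplification and not what `AdversarialModel` does internally. The names `origin_auc` and `drift_report` are hypothetical.

```python
def origin_auc(reference, current):
    """AUC for telling `current` values from `reference` values, using the raw value as the score."""
    wins = sum(
        1.0 if c > r else 0.5 if c == r else 0.0
        for c in current
        for r in reference
    )
    return wins / (len(current) * len(reference))


def drift_report(reference_rows, current_rows):
    """Rank features by separability: 0 = indistinguishable, 1 = fully separable.

    Rows are dicts mapping feature name to a number.
    """
    report = []
    for name in reference_rows[0]:
        a = origin_auc(
            [row[name] for row in reference_rows],
            [row[name] for row in current_rows],
        )
        report.append((name, abs(a - 0.5) * 2))
    return sorted(report, key=lambda item: item[1], reverse=True)


reference_rows = [{"age": 30, "income": 10}, {"age": 40, "income": 20}, {"age": 50, "income": 30}]
current_rows = [{"age": 30, "income": 40}, {"age": 40, "income": 50}, {"age": 50, "income": 60}]
drift = drift_report(reference_rows, current_rows)
print(drift)  # [('income', 1.0), ('age', 0.0)]
```

A per-feature check like this only detects shifts in location. A trained classifier can also pick up changes in spread and in feature interactions.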

Experiment Tracking

from caketool.experiment import create_tracker

# MLflow
with create_tracker("mlflow", experiment_name="my-exp", run_name="run-001") as tracker:
    tracker.log_params({"lr": 0.01, "depth": 6})
    tracker.log_metrics({"gini": 0.72})
    tracker.log_pickle(model, "model")

# Vertex AI
with create_tracker("vertex_ai", experiment_name="my-exp", run_name="run-001",
                    project="my-gcp-project", location="us-central1",
                    bucket_name="my-bucket") as tracker:
    tracker.log_params({"lr": 0.01})

API Overview

| Module | Key exports | Description |
| --- | --- | --- |
| caketool.feature | generate_features_by_window | Multi-backend aggregated feature engineering |
| caketool.model | BoostTree, EnsembleBoostTree, VotingModel | XGBoost training & ensemble |
| caketool.model | FeatureEncoder, FeatureRemover, ColinearFeatureRemover, UnivariateFeatureRemover, InfinityHandler | sklearn-compatible preprocessing transformers |
| caketool.calibration | calibrate_score_to_normal | Normal distribution score calibration |
| caketool.metric | gini, psi, psi_from_distribution | Classification and stability metrics |
| caketool.report | decribe_risk_score | Risk score band report |
| caketool.monitor | AdversarialModel | Dataset drift detection |
| caketool.experiment | create_tracker, MLflowTracker, VertexAITracker | Experiment tracking abstraction |

Development

conda create -n caketool python=3.10
conda activate caketool
pip-compile pyproject.toml --all-extras
pip install -e ".[dev,all]"
pre-commit install

Linting

Pre-commit hooks run ruff automatically on each commit. To run the checks manually:

ruff check src/ tests/ --fix  # Lint and auto-fix
ruff format src/ tests/        # Format code
pre-commit run --all-files     # Run all hooks

Tests

pytest tests/ -v --tb=short

Publishing

The version is derived automatically from git tags via setuptools-scm.

# Test on TestPyPI (RC/beta/alpha tags)
git tag v1.8.0-rc1
git push origin v1.8.0-rc1

# Publish to PyPI (stable tags)
git tag v1.8.0
git push origin v1.8.0

GitHub Actions builds and publishes automatically on tag push.


Local Development

python -m pip install -e .
python -c "from caketool import __version__; print(__version__)"

Download files

Source Distribution

caketool-1.8.0.tar.gz (83.9 kB)

Uploaded Source

Built Distribution

caketool-1.8.0-py3-none-any.whl (61.5 kB)

Uploaded Python 3

File details

Details for the file caketool-1.8.0.tar.gz.

File metadata

  • Download URL: caketool-1.8.0.tar.gz
  • Upload date:
  • Size: 83.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for caketool-1.8.0.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 59318c4fbda9433d1da43a5b0d3feaaa2663efa2f004be99a3b993ac0f435473 |
| MD5 | b2a759c4b6d8a0db625ba6e0f4aca491 |
| BLAKE2b-256 | 43fb5e4841d506fba7595ca941e3d044e63297e97166ca201056675c2d18b6fe |

File details

Details for the file caketool-1.8.0-py3-none-any.whl.

File metadata

  • Download URL: caketool-1.8.0-py3-none-any.whl
  • Upload date:
  • Size: 61.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.20

File hashes

Hashes for caketool-1.8.0-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 1a4804c84f4c4a2940b22c32ab921da9aeb50658d37de041caeeb74126d19676 |
| MD5 | f9e2def7ac95d2966127bf7ce1ca1b2a |
| BLAKE2b-256 | d558eebf6aeb7141da4a42fac2c324d4dad2ef61ad1aec79ffcf77f84d51ad9b |
