Common tools for MLOps
Caketool
A Python MLOps toolkit for common machine learning and data science workflows. It provides EDA, feature engineering, model training, explainability, experiment tracking, model monitoring, calibration, and metrics, all designed for production-ready ML pipelines.
Installation
# Core (pandas, numpy, sklearn, xgboost)
pip install caketool
# With MLflow support
pip install "caketool[onprem]"
# With Google Cloud (Vertex AI, BigQuery)
pip install "caketool[gcp]"
# Everything
pip install "caketool[all]"
Quick Start
EDA
from caketool import eda
# Dataset overview
eda.profile(df)
eda.plot_correlations(df)
# Univariate
eda.plot_numeric_distribution(df["age"])
eda.plot_categorical_frequency(df["category"])
# Bivariate
eda.plot_scatter(df, x="income", y="spend", color_by="segment")
eda.plot_distribution_by_group(df, cat_col="segment", num_col="income", mode="box")
Feature Generation
from caketool.feature import generate_features_by_window
result = generate_features_by_window(
    df,
    client_id_col="user_id",
    report_date_col="event_date",
    fs_event_timestamp="snapshot_date",
    numeric_cols=("amount", "balance"),
    string_cols=("category",),
    boolean_cols=("is_active",),
    lookback_days=(0, 7, 30),  # 0 = lifetime
    backend="pandas",  # "pandas" | "polars" | "spark" | "bigframes"
)
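To make the idea of lookback windows concrete, here is a minimal pandas sketch of window-based aggregation. It is not caketool's implementation: the function name `window_features_sketch`, the output column naming, and the choice of an inclusive window measured in whole days before the snapshot date are all assumptions made for illustration. Only the convention that `0` means lifetime comes from the example above.

```python
import pandas as pd


def window_features_sketch(df, id_col, event_col, snapshot_col, value_col,
                           lookback_days=(0, 7, 30)):
    """Aggregate one numeric column per client over several lookback windows.

    Hypothetical helper for illustration only; not part of caketool.
    """
    df = df.copy()
    df[event_col] = pd.to_datetime(df[event_col])
    df[snapshot_col] = pd.to_datetime(df[snapshot_col])

    # Age of each event in whole days, relative to the snapshot date.
    df["_age"] = (df[snapshot_col] - df[event_col]).dt.days
    # Drop events that happened after the snapshot to avoid leakage.
    df = df[df["_age"] >= 0]

    frames = []
    for days in lookback_days:
        # 0 means "lifetime": keep every event up to the snapshot.
        window = df if days == 0 else df[df["_age"] <= days]
        agg = window.groupby([id_col, snapshot_col])[value_col].agg(
            ["sum", "mean", "count"]
        )
        suffix = "lifetime" if days == 0 else f"{days}d"
        agg.columns = [f"{value_col}_{stat}_{suffix}" for stat in agg.columns]
        frames.append(agg)

    # Outer-join the per-window aggregates on (client, snapshot).
    return pd.concat(frames, axis=1).reset_index()
```

Clients with no events inside a given window get `NaN` in that window's columns, which a downstream step would need to fill or handle.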
Model Training
from caketool.model import BoostTree
model = BoostTree()
model.fit(X_train, y_train, eval_set=[(X_val, y_val)])
proba = model.predict_proba(X_test)[:, 1]
importance = model.get_feature_importance()
Out-of-Fold Cross-Validation
from caketool.model import BoostTree, EnsembleBoostTree
models, oof_preds, oof_labels = BoostTree.fit_oof(X_train, y_train, n_splits=5)
ensemble = EnsembleBoostTree(models)
proba = ensemble.predict_proba(X_test)[:, 1]
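For readers unfamiliar with out-of-fold training, the following sketch shows the general pattern with scikit-learn. It is an illustration under assumptions, not the code behind `BoostTree.fit_oof`: `fit_oof_sketch` and `average_proba` are hypothetical names, and any scikit-learn classifier with `predict_proba` stands in for the XGBoost model.

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import StratifiedKFold


def fit_oof_sketch(estimator, X, y, n_splits=5, random_state=0):
    """Train one model per fold and collect out-of-fold predictions.

    Hypothetical helper for illustration only; not part of caketool.
    """
    X = np.asarray(X)
    y = np.asarray(y)
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True,
                          random_state=random_state)
    models = []
    oof_preds = np.zeros(len(y), dtype=float)
    for train_idx, val_idx in skf.split(X, y):
        # Each row is scored by a model that never saw it during training.
        model = clone(estimator).fit(X[train_idx], y[train_idx])
        oof_preds[val_idx] = model.predict_proba(X[val_idx])[:, 1]
        models.append(model)
    return models, oof_preds, y


def average_proba(models, X):
    """Ensemble the fold models by averaging their positive-class scores."""
    return np.mean([m.predict_proba(X)[:, 1] for m in models], axis=0)
```

Because every out-of-fold prediction comes from a model that did not train on that row, metrics computed on `oof_preds` give a less optimistic estimate than metrics on training predictions.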
Explainability
from caketool.explainability import PermutationExplainer
explainer = PermutationExplainer(model=model)
explainer.fit(X_test)
# Global feature importance
importance = explainer.get_feature_importance()
# Local explanation for a single sample
local = explainer.get_local_explanation(row_index=0)
# Visualize
explainer.show_summary()
explainer.show_waterfall(row_index=0)
explainer.show_dependence(feature="income")
Score Calibration
from caketool.calibration import calibrate_score_to_normal
calibrated = calibrate_score_to_normal(raw_scores, standard=False)
Metrics
from caketool.metric import gini, psi
print(gini(y_true, y_pred)) # Gini coefficient
print(psi(expected, actual)) # Population Stability Index
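As background on what these two metrics measure, here is a NumPy-only sketch of the standard definitions: Gini as `2 * AUC - 1`, and PSI as the sum of `(actual - expected) * ln(actual / expected)` over bins. The function names, the quantile binning, the default of 10 bins, and the `eps` clipping are assumptions for illustration; caketool's own `gini` and `psi` may bin or smooth differently.

```python
import numpy as np


def gini_sketch(y_true, y_score):
    """Gini coefficient of a binary classifier, computed as 2 * AUC - 1."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one positive and one negative label")
    # AUC is the share of (positive, negative) pairs ranked correctly;
    # ties count as half. This is O(n_pos * n_neg), fine for a sketch.
    diff = pos[:, None] - neg[None, :]
    auc = ((diff > 0).sum() + 0.5 * (diff == 0).sum()) / diff.size
    return 2.0 * auc - 1.0


def psi_sketch(expected, actual, n_bins=10, eps=1e-6):
    """Population Stability Index between a reference and a current sample."""
    expected = np.asarray(expected, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Bin edges come from quantiles of the expected (reference) sample.
    edges = np.unique(np.quantile(expected, np.linspace(0.0, 1.0, n_bins + 1)))
    # Use interior edges only, so out-of-range values land in the end bins.
    inner = edges[1:-1]
    n = len(inner) + 1
    e_idx = np.searchsorted(inner, expected, side="right")
    a_idx = np.searchsorted(inner, actual, side="right")
    e_pct = np.bincount(e_idx, minlength=n) / len(expected)
    a_pct = np.bincount(a_idx, minlength=n) / len(actual)
    # Clip empty bins to avoid division by zero and log(0).
    e_pct = np.clip(e_pct, eps, None)
    a_pct = np.clip(a_pct, eps, None)
    return float(np.sum((a_pct - e_pct) * np.log(a_pct / e_pct)))
```

A perfectly ranking model has a Gini of 1, a random one is near 0, and two identical samples have a PSI of 0.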
Risk Report
from caketool.report import decribe_risk_score
report = decribe_risk_score(score_df, pred_col="score", label_col="label")
Drift Detection
from caketool.monitor import AdversarialModel
model = AdversarialModel()
model.fit(reference_df, current_df)
model.show() # prints ROC AUC and top important features
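The usual idea behind adversarial drift detection is to train a classifier to tell the reference data from the current data: a ROC AUC near 0.5 means the two are hard to distinguish, while a high AUC signals drift. The sketch below shows that idea with scikit-learn. It is an assumption about the general technique, not a description of `AdversarialModel`; the function name `adversarial_auc` and the choice of `GradientBoostingClassifier` with 5-fold cross-validation are illustrative.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import cross_val_predict


def adversarial_auc(reference, current, random_state=0):
    """Cross-validated ROC AUC of a classifier separating two datasets.

    Hypothetical helper for illustration only; not part of caketool.
    """
    X = np.vstack([np.asarray(reference), np.asarray(current)])
    # Label reference rows 0 and current rows 1.
    y = np.concatenate([np.zeros(len(reference), dtype=int),
                        np.ones(len(current), dtype=int)])
    clf = GradientBoostingClassifier(random_state=random_state)
    # Out-of-fold probabilities keep the AUC from rewarding memorisation.
    proba = cross_val_predict(clf, X, y, cv=5, method="predict_proba")[:, 1]
    return roc_auc_score(y, proba)
```

When the AUC is high, the classifier's feature importances point to the columns that changed most, which matches the "top important features" output mentioned above.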
Experiment Tracking
from caketool.experiment import create_tracker
# MLflow
with create_tracker("mlflow", experiment_name="my-exp", run_name="run-001") as tracker:
    tracker.log_params({"lr": 0.01, "depth": 6})
    tracker.log_metrics({"gini": 0.72})
    tracker.log_pickle(model, "model")
# Vertex AI
with create_tracker("vertex_ai", experiment_name="my-exp", run_name="run-001",
                    project="my-gcp-project", location="us-central1",
                    bucket_name="my-bucket") as tracker:
    tracker.log_params({"lr": 0.01})
API Overview
| Module | Key exports | Description |
|---|---|---|
| caketool.eda | profile, plot_correlations, plot_scatter, plot_distribution_by_group, rank_associations | Exploratory data analysis with Plotly |
| caketool.feature | generate_features_by_window | Multi-backend aggregated feature engineering |
| caketool.model | BoostTree, EnsembleBoostTree, VotingModel | XGBoost training & ensemble |
| caketool.model | FeatureEncoder, FeatureRemover, ColinearFeatureRemover, UnivariateFeatureRemover, InfinityHandler | sklearn-compatible preprocessing transformers |
| caketool.explainability | PermutationExplainer | SHAP-based model-agnostic explainability |
| caketool.calibration | calibrate_score_to_normal | Normal distribution score calibration |
| caketool.metric | gini, psi, psi_from_distribution | Classification and stability metrics |
| caketool.report | decribe_risk_score | Risk score band report |
| caketool.monitor | AdversarialModel | Dataset drift detection |
| caketool.experiment | create_tracker, MLflowTracker, VertexAITracker | Experiment tracking abstraction |
Development
conda create -n caketool python=3.10
conda activate caketool
pip-compile pyproject.toml --all-extras
pip install -e ".[dev,all]"
pre-commit install
Linting
Pre-commit hooks run ruff automatically on commit. To run manually:
ruff check src/ tests/ --fix # Lint and auto-fix
ruff format src/ tests/ # Format code
pre-commit run --all-files # Run all hooks
Tests
pytest tests/ -v --tb=short
Docs
pip install -e ".[docs]"
pdoc src/caketool # Preview at http://localhost:8080
Docs are published automatically to GitHub Pages when a version tag is pushed.
Publishing
Version is automatically derived from git tags via setuptools-scm.
# Test on TestPyPI (RC/beta/alpha tags)
git tag v1.8.0-rc1
git push origin v1.8.0-rc1
# Publish to PyPI (stable tags)
git tag v1.8.0
git push origin v1.8.0
GitHub Actions builds and publishes automatically on tag push.
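The tag convention above can be expressed as a small classifier. This is a sketch under assumptions: the project's actual GitHub Actions workflow is not shown here, so the function name `publish_target`, the exact tag pattern, and the return values are hypothetical and only mirror the rule that RC/beta/alpha tags go to TestPyPI and stable tags go to PyPI.

```python
import re

# Matches tags such as v1.8.0, v1.8.0-rc1, v1.8.0-beta2, v1.8.0a1.
# The accepted spellings are an assumption, not the project's real rule set.
_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)(?:-?(rc|alpha|beta|a|b)\.?(\d+))?$")


def publish_target(tag):
    """Return "testpypi" for pre-release tags, "pypi" for stable tags.

    Returns None when the tag does not look like a version tag.
    """
    match = _TAG.match(tag)
    if match is None:
        return None
    # Group 4 holds the pre-release marker (rc/alpha/beta) when present.
    return "testpypi" if match.group(4) else "pypi"


for tag in ("v1.8.0-rc1", "v1.8.0", "main"):
    print(tag, "->", publish_target(tag))
```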
Local Development
python -m pip install -e .
python -c "from caketool import __version__; print(__version__)"
File details
Details for the file caketool-1.8.1.tar.gz.
File metadata
- Download URL: caketool-1.8.1.tar.gz
- Upload date:
- Size: 92.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4679c2800a9fa6f594e6d58c96de2560d6bffa80f2a0dd56002bf6f366afccdf |
| MD5 | c8d0c34dbe88c040d08e5d9d5833483c |
| BLAKE2b-256 | acbf83b7cc5f133c5be5aa519211051ab862e61b8e72d9b969b60e8542b0dc33 |
File details
Details for the file caketool-1.8.1-py3-none-any.whl.
File metadata
- Download URL: caketool-1.8.1-py3-none-any.whl
- Upload date:
- Size: 71.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.20
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ed4100992323cfbf27542c0221950694e2e4384614ef9250af94d48653629090 |
| MD5 | f2cd37c89e07a25013839c2e1a4b1a3a |
| BLAKE2b-256 | eb820cca5459aa541daba9d8955c367fbc4ca418ce866ac008b7f8e2521c240e |