
UX wrapper over CatBoost: readable errors, pre-flight validation, ergonomic custom losses, sklearn pipeline compat, structured logging.


catboost_utils

A UX wrapper over CatBoost — readable errors, pre-flight data validation, ergonomic custom losses, sklearn pipeline compatibility, structured logging, lossless save/load, and exception-safe callbacks.

Not a fork. Not a replacement. A wrapper. Use it where it helps; mix freely with stock catboost.

Install

pip install catboost-utils
# or, with sklearn pipeline support:
pip install "catboost-utils[sklearn]"

Requires Python 3.10+ and CatBoost 1.2+.

Quick start

from catboost_utils import CBXClassifier

model = CBXClassifier(
    iterations=500,
    auto_cat_features=True,        # detects str/category/bool columns automatically
    nan_fill="__NA__",             # explicit handling of NaN in cat features (no magic)
    early_stopping="auto",         # enables sane defaults when eval_set is given
)
model.fit(X_train, y_train, eval_set=(X_val, y_val))

The result is still a CatBoostClassifier: isinstance(model, CatBoostClassifier) returns True, and clone(), GridSearchCV, and pickle work out of the box.
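The auto_cat_features and nan_fill options can be pictured as an explicit preprocessing step. The following is a minimal pandas sketch. It assumes that detection works by dtype (object/str, category, bool) and that NaN in those columns is replaced with the user-supplied fill string. The helper prepare_cat_features is hypothetical and is not part of catboost_utils.

```python
import numpy as np
import pandas as pd


def prepare_cat_features(
    df: pd.DataFrame, nan_fill: str = "__NA__"
) -> tuple[pd.DataFrame, list[str]]:
    """Hypothetical helper: detect categorical columns by dtype, then fill their NaNs explicitly."""
    cat_cols = [
        col
        for col in df.columns
        if isinstance(df[col].dtype, pd.CategoricalDtype)
        or pd.api.types.is_object_dtype(df[col])
        or pd.api.types.is_string_dtype(df[col])
        or pd.api.types.is_bool_dtype(df[col])
    ]
    out = df.copy()  # never mutate the caller's frame
    for col in cat_cols:
        s = out[col]
        if isinstance(s.dtype, pd.CategoricalDtype):
            # A category must exist before it can be used as a fill value.
            if nan_fill not in s.cat.categories:
                s = s.cat.add_categories([nan_fill])
            out[col] = s.fillna(nan_fill)
        elif not pd.api.types.is_bool_dtype(s):
            # Plain bool columns cannot hold NaN, so only object/str columns need filling.
            out[col] = s.astype(object).where(s.notna(), nan_fill)
    return out, cat_cols
```

Numeric columns are returned untouched, which keeps the "no silent transformations" principle: only the columns reported in cat_cols are changed, and only with the value the caller chose.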

What's in the box

Every module is independent. Use only what you need.

errors — readable error messages

from catboost import CatBoostClassifier
from catboost_utils import wrap, CBXError

m = wrap(CatBoostClassifier(iterations=10))
try:
    m.fit(X, y)   # X has a string column not declared in cat_features
except CBXError as e:
    print(e.human_message)  # e.g. "Feature 'city' (index 5) has invalid type ..."
    print(e.hint)           # e.g. "Convert float values and NaN to strings ..."

wrap() swaps the model's class to a CBX-enhanced subclass — isinstance checks keep working, and pickle round-trips correctly.
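The class-swapping idea behind wrap() can be illustrated in plain Python. This sketch rests on several assumptions. Base stands in for a stock CatBoost class. EnhancedBase, FriendlyError, the _ENHANCED registry and the error-translating fit are all hypothetical, and the real catboost_utils internals may differ. The sketch shows why isinstance keeps working (the swapped-in class subclasses the original). It also shows why pickle round-trips: the subclass is defined at module level, so pickle can find it by its qualified name.

```python
import pickle


class Base:
    """Stand-in for a stock estimator class such as CatBoostClassifier."""

    def __init__(self, iterations: int = 10) -> None:
        self.iterations = iterations

    def fit(self, X, y):
        # Simulates a low-level failure with an unhelpful message.
        raise TypeError("cryptic low-level error")


class FriendlyError(Exception):
    """Hypothetical error type carrying a readable message and a hint."""

    def __init__(self, human_message: str, hint: str) -> None:
        super().__init__(human_message)
        self.human_message = human_message
        self.hint = hint


class EnhancedBase(Base):
    """Defined at module level so pickle can locate it by qualified name."""

    def fit(self, X, y):
        try:
            return super().fit(X, y)
        except TypeError as exc:
            raise FriendlyError(f"Training failed: {exc}", "Check declared cat_features.") from exc


# Hypothetical registry mapping stock classes to their enhanced subclasses.
_ENHANCED = {Base: EnhancedBase}


def wrap(model: Base) -> Base:
    """Swap the instance's class in place; its attributes (__dict__) are left untouched."""
    model.__class__ = _ENHANCED[type(model)]
    return model
```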

validation — pre-flight checks

from catboost_utils import validate

report = validate(X, y, cat_features=["city"])
report                  # in Jupyter: rich HTML table of issues + warnings
report.raise_if_failed()  # raises ValidationError if there are any blocking issues

validate() catches NaN in categorical features, infinite values, single-class targets, undeclared object columns, datetime columns, class-weight conflicts, and GPU/multi-thread non-determinism before training fails with a cryptic message.
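Several of these checks map directly onto pandas operations. This is a minimal sketch and does not reproduce the library's validate(). The function name preflight and its plain list-of-strings report are hypothetical. Only NaN in declared categorical features, infinite values, undeclared object/str columns, datetime columns and single-class targets are covered.

```python
import numpy as np
import pandas as pd


def preflight(X: pd.DataFrame, y: pd.Series, cat_features: list[str]) -> list[str]:
    """Return human-readable blocking issues found before training (hypothetical helper)."""
    issues: list[str] = []
    # CatBoost rejects NaN in categorical features; they must be filled with a string.
    for col in cat_features:
        n_nan = int(X[col].isna().sum())
        if n_nan:
            issues.append(f"cat feature {col!r} has {n_nan} NaN value(s); fill them with a string")
    # Infinite values in numeric columns.
    numeric = X.select_dtypes(include="number")
    for col in numeric.columns:
        values = numeric[col].to_numpy(dtype=float, na_value=np.nan)
        n_inf = int(np.isinf(values).sum())
        if n_inf:
            issues.append(f"numeric feature {col!r} has {n_inf} infinite value(s)")
    # Text columns that were never declared, and datetime columns.
    for col in X.columns:
        is_text = pd.api.types.is_object_dtype(X[col]) or pd.api.types.is_string_dtype(X[col])
        if col not in cat_features and is_text:
            issues.append(f"object column {col!r} is not declared in cat_features")
        if pd.api.types.is_datetime64_any_dtype(X[col]):
            issues.append(f"datetime column {col!r} must be converted to numeric features first")
    # A classifier cannot learn from a target with a single class.
    if y.nunique(dropna=True) < 2:
        issues.append("target has fewer than two distinct values")
    return issues
```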

objectives — custom losses, numba-jit'ed

import numpy as np
from catboost import CatBoostRegressor
from catboost_utils.objectives import objective, metric

@objective(task="regression")
def my_huber(y_true: np.ndarray, y_pred: np.ndarray):
    delta = 1.0
    err = y_pred - y_true
    is_small = np.abs(err) <= delta
    grad = np.where(is_small, err, delta * np.sign(err))
    hess = np.where(is_small, 1.0, 0.0)
    return grad, hess

@metric(task="regression", name="MAE", higher_is_better=False)
def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

model = CatBoostRegressor(loss_function=my_huber, eval_metric=mae)

The decorators handle the CatBoost-specific details (list-of-lists approxes, sign convention, sample weights, the internal sigmoid/softmax transform). Functions are JIT-compiled with numba, so multiclass objectives run at compiled speed despite CatBoost's per-object API.
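The following minimal adapter shows what "sign convention" and "weights" mean in practice. It converts a vectorized (grad, hess) function to the calc_ders_range(approxes, targets, weights) method implemented by CatBoost's documented Python user-defined objectives. CatBoost maximizes, so it expects first and second derivatives of the negated loss, each scaled by the object's weight. VectorizedObjective is a hypothetical name. The sketch covers only single-dimension (regression) objectives and uses no numba, so it should not be read as how catboost_utils is implemented.

```python
from typing import Callable, Optional, Sequence

import numpy as np

GradHess = Callable[[np.ndarray, np.ndarray], tuple[np.ndarray, np.ndarray]]


class VectorizedObjective:
    """Hypothetical adapter from a vectorized loss to CatBoost's calc_ders_range protocol."""

    def __init__(self, fn: GradHess) -> None:
        self.fn = fn

    def calc_ders_range(
        self,
        approxes: Sequence[float],
        targets: Sequence[float],
        weights: Optional[Sequence[float]],
    ) -> list[tuple[float, float]]:
        y_pred = np.asarray(approxes, dtype=float)
        y_true = np.asarray(targets, dtype=float)
        grad, hess = self.fn(y_true, y_pred)
        # CatBoost maximizes, so it wants derivatives of the *negated* loss.
        der1 = -np.asarray(grad, dtype=float)
        der2 = -np.asarray(hess, dtype=float)
        if weights is not None:
            w = np.asarray(weights, dtype=float)
            der1, der2 = der1 * w, der2 * w
        # One (der1, der2) pair per object, as plain Python floats.
        return list(zip(der1.tolist(), der2.tolist()))


def huber(y_true: np.ndarray, y_pred: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Same Huber gradient/Hessian as my_huber above, written against the loss itself."""
    delta = 1.0
    err = y_pred - y_true
    is_small = np.abs(err) <= delta
    grad = np.where(is_small, err, delta * np.sign(err))
    hess = np.where(is_small, 1.0, 0.0)
    return grad, hess
```

Under these assumptions, an instance such as VectorizedObjective(huber) could be passed as loss_function to CatBoostRegressor, typically together with an explicit eval_metric, as in CatBoost's own custom-objective examples.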

pipeline — sklearn-friendly classes

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from catboost_utils import CBXRegressor

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", CBXRegressor(iterations=100)),
])
pipe.fit(X, y)

Works inside Pipeline, GridSearchCV, cross_val_score, and clone().
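Compatibility with clone() and GridSearchCV rests on scikit-learn's estimator convention. Every constructor argument is stored unchanged under an attribute of the same name, and get_params()/set_params() expose those attributes. The sketch below shows that contract without importing scikit-learn. TinyEstimator and clone_like are hypothetical names. clone_like only approximates sklearn.base.clone, which likewise rebuilds an unfitted estimator from get_params().

```python
from typing import Any


class TinyEstimator:
    """Hypothetical estimator following the scikit-learn constructor convention."""

    def __init__(self, iterations: int = 100, depth: int = 6) -> None:
        # Store arguments verbatim: no validation or derived state in __init__.
        self.iterations = iterations
        self.depth = depth

    def get_params(self, deep: bool = True) -> dict[str, Any]:
        return {"iterations": self.iterations, "depth": self.depth}

    def set_params(self, **params: Any) -> "TinyEstimator":
        valid = self.get_params()
        for name, value in params.items():
            if name not in valid:
                raise ValueError(f"Invalid parameter {name!r}")
            setattr(self, name, value)
        return self

    def fit(self, X: list[list[float]], y: list[float]) -> "TinyEstimator":
        # Fitted state gets a trailing underscore and is never copied by cloning.
        self.n_features_in_ = len(X[0])
        return self


def clone_like(est: TinyEstimator) -> TinyEstimator:
    """Rebuild an unfitted copy from the constructor parameters alone."""
    return type(est)(**est.get_params(deep=False))
```

GridSearchCV relies on the same two methods: it clones the estimator and calls set_params() with each candidate parameter combination.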

logging — structured training output

import logging
from catboost_utils.logging import setup_logging, attach

setup_logging(level=logging.INFO, structured=False)
attach(model)
model.fit(X, y)
# INFO catboost_utils.training - iteration=10 learn_loss=0.423 test_loss=0.451 ...

Use structured=True for JSON output. Each parsed line carries a cbx_iteration dict in the log record's extra fields, for downstream log processors.
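Two pieces sit behind structured output. First, a verbose training line is parsed into fields and attached to the log record through logging's extra mechanism. Second, the record is rendered as JSON. The sketch below covers both. The verbose-line layout is an assumption based on CatBoost's usual console output. parse_line and JsonFormatter are hypothetical names, not catboost_utils API.

```python
import json
import logging
import re
from typing import Any, Optional

# Assumed layout of a CatBoost verbose line, e.g.
# "10:\tlearn: 0.4230000\ttest: 0.4510000\tbest: 0.4510000 (10)\ttotal: 1.2s\tremaining: 5s"
_LINE = re.compile(
    r"^(?P<iteration>\d+):\s+learn:\s*(?P<learn>[-+\d.eE]+)"
    r"(?:\s+test:\s*(?P<test>[-+\d.eE]+))?"
)


def parse_line(line: str) -> Optional[dict[str, Any]]:
    """Turn one verbose line into a dict, or return None if it does not match."""
    m = _LINE.match(line)
    if m is None:
        return None
    record: dict[str, Any] = {"iteration": int(m["iteration"]), "learn_loss": float(m["learn"])}
    if m["test"] is not None:
        record["test_loss"] = float(m["test"])
    return record


class JsonFormatter(logging.Formatter):
    """Emit each record as one JSON object, merging the cbx_iteration extra if present."""

    def format(self, record: logging.LogRecord) -> str:
        payload: dict[str, Any] = {"level": record.levelname, "logger": record.name}
        payload.update(getattr(record, "cbx_iteration", {}))
        return json.dumps(payload)
```

In use, a handler with JsonFormatter attached would receive calls such as logger.info("iteration", extra={"cbx_iteration": parse_line(line)}).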

explain — feature importance + SHAP, named DataFrames

from catboost_utils.explain import feature_importance, shap_values, check_early_stopping

fi = feature_importance(model, X)            # sorted DataFrame with feature names
sv = shap_values(model, X)                   # DataFrame: features + expected_value
check_early_stopping(model, eval_set=eval_set)  # raises CBXError if misconfigured
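For regression and binary classification, CatBoost's get_feature_importance(type="ShapValues") returns a matrix with one column per feature plus a final column holding the expected value. The sketch below shows how such raw output could become named DataFrames. A small hand-written array stands in for a real model's output, and named_importance and named_shap are hypothetical helpers, not the library's functions.

```python
import numpy as np
import pandas as pd


def named_importance(importances: np.ndarray, feature_names: list[str]) -> pd.DataFrame:
    """Pair importances with feature names, most important first (hypothetical helper)."""
    df = pd.DataFrame({"feature": feature_names, "importance": importances})
    return df.sort_values("importance", ascending=False, ignore_index=True)


def named_shap(raw: np.ndarray, feature_names: list[str]) -> pd.DataFrame:
    """Split an (n_objects, n_features + 1) SHAP matrix into named columns plus expected_value."""
    if raw.ndim != 2 or raw.shape[1] != len(feature_names) + 1:
        raise ValueError("expected one column per feature plus a trailing expected-value column")
    out = pd.DataFrame(raw[:, :-1], columns=feature_names)
    out["expected_value"] = raw[:, -1]
    return out
```

Each row of the SHAP frame (feature contributions plus expected_value) should sum to the model's raw prediction for that object, which makes a quick sanity check.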

io — lossless save/load

from catboost_utils.io import save, load

save(model, "artifact.cbm")              # writes artifact.cbm + artifact.cbm.meta.json
restored = load("artifact.cbm")          # restores best_iteration, feature_names, etc.

The sidecar bundles best_iteration, feature_names, cat_features, class_names, training params, and version info. load() works without a sidecar (logs a warning) so external .cbm files keep loading.
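The sidecar pattern needs only the standard library. This sketch assumes the metadata is a flat JSON object stored next to the model as <model path>.meta.json, matching the file name in the example above. save_meta and load_meta are hypothetical helpers. The model bytes themselves, which CatBoost's own save_model/load_model handle, are left out.

```python
from __future__ import annotations

import json
import logging
from pathlib import Path
from typing import Any

logger = logging.getLogger(__name__)


def sidecar_path(model_path: str | Path) -> Path:
    """artifact.cbm -> artifact.cbm.meta.json"""
    p = Path(model_path)
    return p.with_name(p.name + ".meta.json")


def save_meta(model_path: str | Path, meta: dict[str, Any]) -> Path:
    """Write metadata (best_iteration, feature_names, ...) next to the model file."""
    path = sidecar_path(model_path)
    path.write_text(json.dumps(meta, indent=2, sort_keys=True), encoding="utf-8")
    return path


def load_meta(model_path: str | Path) -> dict[str, Any]:
    """Return sidecar metadata, or an empty dict plus a warning for external .cbm files."""
    path = sidecar_path(model_path)
    if not path.exists():
        logger.warning("No sidecar found at %s; metadata such as best_iteration is unavailable", path)
        return {}
    return json.loads(path.read_text(encoding="utf-8"))
```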

callbacks — exception-safe wrapper

from catboost_utils.callbacks import safe

cb = safe(my_callback)
model.fit(X, y, callbacks=[cb])
cb.raise_if_failed()   # surfaces any exception your callback raised, with original traceback
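CatBoost's Python callbacks are objects with an after_iteration(info) method that returns True to continue training. A wrapper can therefore trap exceptions there and re-raise them after fit() returns. The original traceback survives because it stays attached to the exception object. SafeCallback is a hypothetical name. Stopping training on the first failure is this sketch's own design choice, not something the source specifies.

```python
from typing import Any, Optional


class SafeCallback:
    """Hypothetical wrapper that traps exceptions raised inside a callback's after_iteration."""

    def __init__(self, callback: Any) -> None:
        self.callback = callback
        self.error: Optional[BaseException] = None

    def after_iteration(self, info: Any) -> bool:
        if self.error is not None:
            return False  # keep asking training to stop after a failure
        try:
            return bool(self.callback.after_iteration(info))
        except Exception as exc:
            # exc.__traceback__ still points at the frame that failed.
            self.error = exc
            return False  # returning False asks CatBoost to stop training

    def raise_if_failed(self) -> None:
        """Re-raise the stored exception, if any, with its original traceback."""
        if self.error is not None:
            raise self.error
```

A wrapper could just as well log the error and return True so training continues; stopping early simply avoids spending iterations after the callback has broken.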

Principles

  • Backwards compatible — anything that works in CatBoost works through catboost_utils.
  • Opt-in — every module is independent. Use what you need; ignore the rest.
  • No magic — no silent data transformations. Auto-fixes are always parameters the user passes explicitly (nan_fill="...", auto_cat_features=True).
  • Strict typing — every public function fully annotated; mypy --strict clean.

Compatibility

  • Python: 3.10, 3.11, 3.12
  • CatBoost: ≥ 1.2, < 2.0
  • sklearn: 1.3+ (optional, only for the pipeline module)

Versioning

Pre-1.0 (0.x.y). Any minor bump may include breaking changes — see CHANGELOG.md. 1.0.0 will be cut once the public API is frozen and CI is green across the matrix.

License

Apache 2.0 — see LICENSE.

Download files

Source Distribution

catboost_utils-0.1.0.tar.gz (242.9 kB)

Built Distribution

catboost_utils-0.1.0-py3-none-any.whl (52.8 kB)

File details

Details for the file catboost_utils-0.1.0.tar.gz.

File metadata

  • Download URL: catboost_utils-0.1.0.tar.gz
  • Size: 242.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for catboost_utils-0.1.0.tar.gz:

  • SHA256: 441323118626a5ca5b679c8a5bd4a30fd1e7e542bac8017e038422dfc3b0708e
  • MD5: 022238444f4b32ebf3239eec8eb81c74
  • BLAKE2b-256: 2f2788aaa57cb650bdb8cca65b554086238c1fd42417476c96c59473fadb7c1d

File details

Details for the file catboost_utils-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: catboost_utils-0.1.0-py3-none-any.whl
  • Size: 52.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for catboost_utils-0.1.0-py3-none-any.whl:

  • SHA256: 6190f5b7fef74457796e1ff224898dd1c47c7d90507715a4b801e678ab481220
  • MD5: 00e7a8b5b958c344a809f81235a07075
  • BLAKE2b-256: 0078c3bb01347febd78d02b4cd77a41df40c6a7343e6e95bc01f529734c9cbea
