UX wrapper over CatBoost: readable errors, pre-flight validation, ergonomic custom losses, sklearn pipeline compat, structured logging.

catboost_utils

A UX wrapper over CatBoost — readable errors, pre-flight data validation, ergonomic custom losses, sklearn pipeline compatibility, structured logging, lossless save/load, and exception-safe callbacks.

Not a fork. Not a replacement. A wrapper. Use it where it helps; mix freely with stock catboost.

Install

pip install catboost-utils
# or, with sklearn pipeline support:
pip install "catboost-utils[sklearn]"

Requires Python 3.10+ and CatBoost 1.2+.

Quick start

from catboost_utils import CBXClassifier

model = CBXClassifier(
    iterations=500,
    auto_cat_features=True,        # detects str/category/bool columns automatically
    nan_fill="__NA__",             # explicit handling of NaN in cat features (no magic)
    early_stopping="auto",         # enables sane defaults when eval_set is given
)
model.fit(X_train, y_train, eval_set=(X_val, y_val))

isinstance(model, CatBoostClassifier) is still True. clone(), GridSearchCV, and pickle work out of the box.

What's in the box

Every module is independent. Use only what you need.

`errors` — readable error messages

from catboost import CatBoostClassifier
from catboost_utils import wrap, CBXError

m = wrap(CatBoostClassifier(iterations=10))
try:
    m.fit(X, y)   # X has a string column not declared in cat_features
except CBXError as e:
    print(e.human_message)  # e.g. "Feature 'city' (index 5) has invalid type ..."
    print(e.hint)           # e.g. "Convert float values and NaN to strings ..."

wrap() swaps the model's class to a CBX-enhanced subclass — isinstance checks keep working, and pickle round-trips correctly.
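
The class swap described above can be illustrated without CatBoost at all. The following is a minimal sketch, assuming the technique is plain `__class__` reassignment to a subclass; `StockModel`, `EnhancedModel`, and `FriendlyError` are hypothetical stand-ins, not names from catboost_utils, and the library's actual mechanism may differ.

```python
class StockModel:
    """Hypothetical stand-in for a stock estimator such as CatBoostClassifier."""

    def __init__(self, iterations: int = 10) -> None:
        self.iterations = iterations

    def fit(self, X, y):
        # Simulate a terse low-level failure.
        raise TypeError("invalid type for feature 5")


class FriendlyError(Exception):
    """Hypothetical analogue of CBXError, carrying a message and a hint."""

    def __init__(self, human_message: str, hint: str) -> None:
        super().__init__(human_message)
        self.human_message = human_message
        self.hint = hint


class EnhancedModel(StockModel):
    """Subclass that translates low-level errors into readable ones."""

    def fit(self, X, y):
        try:
            return super().fit(X, y)
        except TypeError as exc:
            raise FriendlyError(
                human_message=f"Training failed: {exc}",
                hint="Check the dtypes of the columns you pass as features.",
            ) from exc


def wrap(model: StockModel) -> StockModel:
    """Re-point an existing instance at the enhanced subclass, in place."""
    model.__class__ = EnhancedModel
    return model


m = wrap(StockModel(iterations=10))
try:
    m.fit([[1.0]], [0])
except FriendlyError as e:
    message, hint = e.human_message, e.hint
```

Because the instance keeps its attributes and its new class still derives from the original one, `isinstance(m, StockModel)` stays true after wrapping.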

`validation` — pre-flight checks

from catboost_utils import validate

report = validate(X, y, cat_features=["city"])
report                  # in Jupyter: rich HTML table of issues + warnings
report.raise_if_failed()  # raises ValidationError if any blocking issue

Catches NaN in cat features, inf values, single-class targets, undeclared object columns, datetime columns, class-weight conflicts, and GPU/multi-thread non-determinism before training crashes with a cryptic message.
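
To make the idea of a pre-flight check concrete, here is a small standard-library sketch covering four of the listed checks (NaN in a categorical column, inf in a numeric column, undeclared string columns, single-class target). The `Report` class and `preflight` function are hypothetical and operate on a plain dict of column lists; they are not the catboost_utils implementation.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Report:
    """Hypothetical report object collecting blocking issues."""

    issues: list = field(default_factory=list)

    def raise_if_failed(self) -> None:
        if self.issues:
            raise ValueError("; ".join(self.issues))


def _is_nan(value) -> bool:
    return isinstance(value, float) and math.isnan(value)


def preflight(columns: dict, y: list, cat_features: list) -> Report:
    """Inspect the data before training and collect every problem found."""
    report = Report()
    for name, values in columns.items():
        if name in cat_features:
            if any(v is None or _is_nan(v) for v in values):
                report.issues.append(f"categorical feature '{name}' contains NaN")
        elif any(isinstance(v, str) for v in values):
            report.issues.append(f"column '{name}' holds strings but is not in cat_features")
        elif any(isinstance(v, float) and math.isinf(v) for v in values):
            report.issues.append(f"numeric feature '{name}' contains inf")
    if len(set(y)) < 2:
        report.issues.append("target has a single class")
    return report


columns = {
    "city": ["Oslo", float("nan"), "Rome"],
    "age": [31.0, float("inf"), 44.0],
    "note": ["a", "b", "c"],
}
report = preflight(columns, y=[1, 1, 1], cat_features=["city"])
```

Collecting all issues before raising, rather than failing on the first one, lets the user fix the dataset in a single pass.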

`objectives` — custom losses, numba-jit'ed

import numpy as np
from catboost import CatBoostRegressor
from catboost_utils.objectives import objective, metric

@objective(task="regression")
def my_huber(y_true: np.ndarray, y_pred: np.ndarray):
    delta = 1.0
    err = y_pred - y_true
    is_small = np.abs(err) <= delta
    grad = np.where(is_small, err, delta * np.sign(err))
    hess = np.where(is_small, 1.0, 0.0)
    return grad, hess

@metric(task="regression", name="MAE", higher_is_better=False)
def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

model = CatBoostRegressor(loss_function=my_huber, eval_metric=mae)

The decorator handles all CatBoost-isms (list-of-list approxes, sign convention, weights, sigmoid/softmax internal transform). Functions are JIT-compiled with numba; multiclass works at C-speed despite CatBoost's per-object API.
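
For context on what such a decorator has to bridge: CatBoost's custom-objective protocol expects an object with a `calc_ders_range(approxes, targets, weights)` method that returns one `(der1, der2)` pair per object, with derivatives taken of a function to be maximised. The sketch below shows one way an adapter could cover the single-output regression case, assuming a sign flip and per-object weight scaling. `ObjectiveAdapter` is a hypothetical name, the code uses plain lists so it runs without NumPy or numba, and it is not the catboost_utils implementation.

```python
import math


class ObjectiveAdapter:
    """Hypothetical adapter from a vectorised loss to calc_ders_range."""

    def __init__(self, func):
        self.func = func

    def calc_ders_range(self, approxes, targets, weights):
        # The user function sees (y_true, y_pred) and returns (grad, hess)
        # of a loss that should be minimised.
        grad, hess = self.func(list(targets), list(approxes))
        ders = []
        for i, (g, h) in enumerate(zip(grad, hess)):
            w = 1.0 if weights is None else weights[i]
            # CatBoost maximises its objective, so both derivatives are negated.
            ders.append((-g * w, -h * w))
        return ders


def huber(y_true, y_pred, delta=1.0):
    """List-based version of the Huber gradient and hessian."""
    grad, hess = [], []
    for t, p in zip(y_true, y_pred):
        err = p - t
        small = abs(err) <= delta
        grad.append(err if small else math.copysign(delta, err))
        hess.append(1.0 if small else 0.0)
    return grad, hess


adapter = ObjectiveAdapter(huber)
ders = adapter.calc_ders_range(approxes=[0.5, 3.0], targets=[0.0, 0.0], weights=None)
weighted = adapter.calc_ders_range(approxes=[0.5], targets=[0.0], weights=[2.0])
```

Multiclass objectives, link-function handling, and JIT compilation are deliberately left out of this sketch.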

`pipeline` — sklearn-friendly classes

from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from catboost_utils import CBXRegressor

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", CBXRegressor(iterations=100)),
])
pipe.fit(X, y)

Works inside Pipeline, GridSearchCV, cross_val_score, and clone().

`logging` — structured training output

import logging
from catboost_utils.logging import setup_logging, attach

setup_logging(level=logging.INFO, structured=False)
attach(model)
model.fit(X, y)
# INFO catboost_utils.training - iteration=10 learn_loss=0.423 test_loss=0.451 ...

Use structured=True for JSON output. Each parsed line carries a cbx_iteration extra dict for downstream log processors.
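
As an illustration of what "parsed line with an extra dict" could look like, the sketch below turns one CatBoost-style progress line into a `logging` record carrying a `cbx_iteration` attribute. The regular expression, the logger name `sketch.training`, and the helpers `parse_line` and `forward` are assumptions made for this example; how catboost_utils actually captures and parses CatBoost output is not shown here.

```python
import logging
import re

# Matches lines such as "10:\tlearn: 0.4230000\ttest: 0.4510000\t..."
LINE_RE = re.compile(
    r"^(?P<iteration>\d+):\s+learn:\s+(?P<learn>[-+.\deE]+)"
    r"(?:\s+test:\s+(?P<test>[-+.\deE]+))?"
)


def parse_line(line: str):
    """Return a dict for a progress line, or None for any other output."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return None
    parsed = {"iteration": int(match["iteration"]), "learn_loss": float(match["learn"])}
    if match["test"] is not None:
        parsed["test_loss"] = float(match["test"])
    return parsed


class ListHandler(logging.Handler):
    """Collects records in memory so the example can inspect them."""

    def __init__(self) -> None:
        super().__init__()
        self.records = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


logger = logging.getLogger("sketch.training")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
logger.addHandler(handler)


def forward(line: str) -> None:
    """Log a parsed progress line; silently skip everything else."""
    parsed = parse_line(line)
    if parsed is not None:
        message = " ".join(f"{key}={value}" for key, value in parsed.items())
        logger.info(message, extra={"cbx_iteration": parsed})


forward("10:\tlearn: 0.4230000\ttest: 0.4510000\tbest: 0.4510000 (10)\ttotal: 35ms")
forward("Learning rate set to 0.03")
```

A JSON formatter attached to the same logger can then serialise `record.cbx_iteration` directly instead of re-parsing the message text.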

`explain` — feature importance + SHAP, named DataFrames

from catboost_utils.explain import feature_importance, shap_values, check_early_stopping

fi = feature_importance(model, X)            # sorted DataFrame with feature names
sv = shap_values(model, X)                   # DataFrame: features + expected_value
check_early_stopping(model, eval_set=eval_set)  # raises CBXError if misconfigured

`io` — lossless save/load

from catboost_utils.io import save, load

save(model, "artifact.cbm")              # writes artifact.cbm + artifact.cbm.meta.json
restored = load("artifact.cbm")          # restores best_iteration, feature_names, etc.

The sidecar bundles best_iteration, feature_names, cat_features, class_names, training params, and version info. load() works without a sidecar (logs a warning) so external .cbm files keep loading.
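
The sidecar pattern itself is simple to sketch with the standard library. In the example below the model is represented by opaque bytes, and `save`, `load`, and `sidecar_path` are hypothetical helpers that only mirror the behaviour described above (a `.meta.json` file next to the artifact, and a warning instead of an error when it is missing). They are not the catboost_utils functions.

```python
import json
import logging
import tempfile
from pathlib import Path

log = logging.getLogger("sketch.io")


def sidecar_path(model_path: Path) -> Path:
    """artifact.cbm -> artifact.cbm.meta.json"""
    return model_path.with_name(model_path.name + ".meta.json")


def save(payload: bytes, meta: dict, path) -> None:
    """Write the model bytes and a JSON sidecar with the metadata."""
    path = Path(path)
    path.write_bytes(payload)
    sidecar_path(path).write_text(json.dumps(meta, indent=2), encoding="utf-8")


def load(path):
    """Read the model bytes; metadata is empty if no sidecar exists."""
    path = Path(path)
    payload = path.read_bytes()
    side = sidecar_path(path)
    if not side.exists():
        log.warning("no sidecar found for %s; metadata will be empty", path.name)
        return payload, {}
    return payload, json.loads(side.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "artifact.cbm"
    save(b"model-bytes", {"best_iteration": 42, "feature_names": ["city", "age"]}, target)
    payload, meta = load(target)

    # Simulate an external .cbm file that never had a sidecar.
    sidecar_path(target).unlink()
    _, meta_without_sidecar = load(target)
```

Keeping the metadata in a separate file leaves the model file itself untouched, so tools that only understand the stock format can still read it.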

`callbacks` — exception-safe wrapper

from catboost_utils.callbacks import safe

cb = safe(my_callback)
model.fit(X, y, callbacks=[cb])
cb.raise_if_failed()   # surfaces any exception your callback raised, with original traceback
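
A rough idea of how an exception-safe wrapper can work: CatBoost callbacks are objects with an `after_iteration(info)` method whose boolean return value tells training whether to continue. The sketch below stores any exception instead of letting it escape into the training loop and re-raises it on request. `SafeCallback`, its `stop_on_error` flag, and the simulated loop are assumptions for illustration; whether catboost_utils stops or continues training after a callback failure is not stated in this document.

```python
from types import SimpleNamespace


class SafeCallback:
    """Hypothetical wrapper that never lets a callback exception escape."""

    def __init__(self, callback, stop_on_error: bool = True) -> None:
        self._callback = callback
        self._stop_on_error = stop_on_error
        self.error = None

    def after_iteration(self, info) -> bool:
        try:
            return bool(self._callback.after_iteration(info))
        except Exception as exc:  # keep the exception object, including its traceback
            self.error = exc
            return not self._stop_on_error

    def raise_if_failed(self) -> None:
        if self.error is not None:
            raise self.error


class Flaky:
    """User callback that fails on the third iteration."""

    def after_iteration(self, info) -> bool:
        if info.iteration == 2:
            raise RuntimeError("boom at iteration 2")
        return True


cb = SafeCallback(Flaky())
completed = 0
for i in range(5):  # stand-in for the training loop
    if not cb.after_iteration(SimpleNamespace(iteration=i)):
        break
    completed += 1
```

Calling `cb.raise_if_failed()` after this loop raises the stored `RuntimeError`, so the failure surfaces in user code rather than inside the training loop.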

Principles

Backwards compatible — anything that works in CatBoost works through catboost_utils.
Opt-in — every module is independent. Use what you need; ignore the rest.
No magic — no silent data transformations. Auto-fixes are always parameters the user passes explicitly (nan_fill="...", auto_cat_features=True).
Strict typing — every public function fully annotated; mypy --strict clean.

Compatibility

Python: 3.10, 3.11, 3.12
CatBoost: ≥ 1.2, < 2.0
sklearn: 1.3+ (optional, only for the pipeline module)

Versioning

Pre-1.0 (0.x.y). Any minor bump may include breaking changes — see CHANGELOG.md. 1.0.0 will be cut once the public API is frozen and CI is green across the matrix.

License

Apache 2.0 — see LICENSE.

Project details

This version

0.1.0

Apr 28, 2026

Download files

Download the file for your platform.

Source Distribution

catboost_utils-0.1.0.tar.gz (242.9 kB)

Uploaded Apr 28, 2026 Source

Built Distribution

catboost_utils-0.1.0-py3-none-any.whl (52.8 kB)

Uploaded Apr 28, 2026 Python 3

File details

Details for the file catboost_utils-0.1.0.tar.gz.

File metadata

Download URL: catboost_utils-0.1.0.tar.gz
Upload date: Apr 28, 2026
Size: 242.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for catboost_utils-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`441323118626a5ca5b679c8a5bd4a30fd1e7e542bac8017e038422dfc3b0708e`
MD5	`022238444f4b32ebf3239eec8eb81c74`
BLAKE2b-256	`2f2788aaa57cb650bdb8cca65b554086238c1fd42417476c96c59473fadb7c1d`

File details

Details for the file catboost_utils-0.1.0-py3-none-any.whl.

File metadata

Download URL: catboost_utils-0.1.0-py3-none-any.whl
Upload date: Apr 28, 2026
Size: 52.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for catboost_utils-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6190f5b7fef74457796e1ff224898dd1c47c7d90507715a4b801e678ab481220`
MD5	`00e7a8b5b958c344a809f81235a07075`
BLAKE2b-256	`0078c3bb01347febd78d02b4cd77a41df40c6a7343e6e95bc01f529734c9cbea`
