UX wrapper over CatBoost: readable errors, pre-flight validation, ergonomic custom losses, sklearn pipeline compat, structured logging.
catboost_utils
A UX wrapper over CatBoost — readable errors, pre-flight data validation, ergonomic custom losses, sklearn pipeline compatibility, structured logging, lossless save/load, and exception-safe callbacks.
Not a fork. Not a replacement. A wrapper. Use it where it helps; mix freely with stock catboost.
Install
pip install catboost-utils
# or, with sklearn pipeline support:
pip install "catboost-utils[sklearn]"
Requires Python 3.10+ and CatBoost 1.2+.
Quick start
from catboost_utils import CBXClassifier

model = CBXClassifier(
    iterations=500,
    auto_cat_features=True,  # detects str/category/bool columns automatically
    nan_fill="__NA__",       # explicit handling of NaN in cat features (no magic)
    early_stopping="auto",   # enables sane defaults when eval_set is given
)
model.fit(X_train, y_train, eval_set=(X_val, y_val))
isinstance(model, CatBoostClassifier) is still True. clone(), GridSearchCV, and pickle work out of the box.
What's in the box
Every module is independent. Use only what you need.
errors — readable error messages
from catboost import CatBoostClassifier
from catboost_utils import wrap, CBXError

m = wrap(CatBoostClassifier(iterations=10))
try:
    m.fit(X, y)  # X has a string column not declared in cat_features
except CBXError as e:
    print(e.human_message)  # e.g. "Feature 'city' (index 5) has invalid type ..."
    print(e.hint)           # e.g. "Convert float values and NaN to strings ..."
wrap() swaps the model's class to a CBX-enhanced subclass — isinstance checks keep working, and pickle round-trips correctly.
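The class swap can be illustrated in plain Python. This is a minimal sketch, not the library's actual implementation: `Model`, `EnhancedModel`, `FriendlyError`, and `wrap_sketch` are hypothetical stand-ins. It shows how reassigning an instance's `__class__` to a subclass keeps isinstance checks passing while adding translated errors.

```python
class Model:
    """Hypothetical stand-in for a stock estimator such as CatBoostClassifier."""

    def __init__(self, iterations: int = 10) -> None:
        self.iterations = iterations

    def fit(self, X, y):
        # Mimic a cryptic low-level failure on undeclared string features.
        for row in X:
            for value in row:
                if isinstance(value, str):
                    raise ValueError("cryptic low-level error")
        return self


class FriendlyError(Exception):
    """Hypothetical analogue of CBXError: readable message plus a hint."""

    def __init__(self, human_message: str, hint: str) -> None:
        super().__init__(human_message)
        self.human_message = human_message
        self.hint = hint


class EnhancedModel(Model):
    """Subclass that translates low-level errors into FriendlyError."""

    def fit(self, X, y):
        try:
            return super().fit(X, y)
        except ValueError as exc:
            raise FriendlyError(
                human_message=f"Training failed: {exc}",
                hint="Declare string columns as categorical features.",
            ) from exc


def wrap_sketch(model: Model) -> Model:
    # Reassign the instance's class in place; its attributes are untouched.
    model.__class__ = EnhancedModel
    return model


m = wrap_sketch(Model(iterations=10))
```

Because `EnhancedModel` subclasses `Model`, code that checks `isinstance(m, Model)` is unaffected by the swap.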
validation — pre-flight checks
from catboost_utils import validate
report = validate(X, y, cat_features=["city"])
report # in Jupyter: rich HTML table of issues + warnings
report.raise_if_failed() # raises ValidationError if any blocking issue
Catches NaN-in-cat-features, inf, single-class targets, undeclared object columns, datetime columns, class-weights conflicts, and GPU/multi-thread non-determinism — before training crashes with a cryptic message.
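To give a feel for what a few of these checks involve, here is a minimal sketch in plain Python. It is not the library's code: `Issue`, `Report`, and `validate_sketch` are hypothetical names, columns are passed as a dict of lists rather than a DataFrame, and a plain `ValueError` stands in for ValidationError.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Issue:
    code: str
    message: str
    blocking: bool = True


@dataclass
class Report:
    issues: list[Issue] = field(default_factory=list)

    def raise_if_failed(self) -> None:
        blocking = [i for i in self.issues if i.blocking]
        if blocking:
            raise ValueError("; ".join(i.message for i in blocking))


def _is_nan(value: object) -> bool:
    return isinstance(value, float) and math.isnan(value)


def validate_sketch(columns: dict[str, list], y: list, cat_features: list[str]) -> Report:
    report = Report()
    for name, values in columns.items():
        if name in cat_features:
            # Missing values in a declared categorical column.
            if any(v is None or _is_nan(v) for v in values):
                report.issues.append(
                    Issue("nan_in_cat", f"Categorical feature '{name}' contains missing values.")
                )
        elif any(isinstance(v, str) for v in values):
            # String column that was never declared as categorical.
            report.issues.append(
                Issue("undeclared_str", f"Column '{name}' holds strings but is not in cat_features.")
            )
        elif any(isinstance(v, float) and math.isinf(v) for v in values):
            report.issues.append(Issue("inf", f"Column '{name}' contains infinite values."))
    # Only meaningful for classification targets.
    if len(set(y)) < 2:
        report.issues.append(Issue("single_class", "Target has a single class."))
    return report


report = validate_sketch(
    {"city": ["Paris", None], "age": [31.0, float("inf")], "zip": ["75001", "10115"]},
    y=[1, 1],
    cat_features=["city"],
)
codes = [issue.code for issue in report.issues]
```

Collecting every issue into a report before raising lets the caller see all problems at once instead of fixing them one crash at a time.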
objectives — custom losses, numba-jit'ed
import numpy as np
from catboost import CatBoostRegressor
from catboost_utils.objectives import objective, metric

@objective(task="regression")
def my_huber(y_true: np.ndarray, y_pred: np.ndarray):
    delta = 1.0
    err = y_pred - y_true
    is_small = np.abs(err) <= delta
    grad = np.where(is_small, err, delta * np.sign(err))
    hess = np.where(is_small, 1.0, 0.0)
    return grad, hess

@metric(task="regression", name="MAE", higher_is_better=False)
def mae(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    return float(np.mean(np.abs(y_true - y_pred)))

model = CatBoostRegressor(loss_function=my_huber, eval_metric=mae)
The decorator handles all CatBoost-isms (list-of-list approxes, sign convention, weights, sigmoid/softmax internal transform). Functions are JIT-compiled with numba; multiclass works at C-speed despite CatBoost's per-object API.
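For context on the sign convention and weights: as far as I understand CatBoost's custom-objective protocol, it expects an object with a `calc_ders_range(approxes, targets, weights)` method returning `(first derivative, second derivative)` pairs of the quantity being maximized. The sketch below shows the kind of adaptation a decorator would have to do. `ObjectiveAdapter` is a hypothetical name, not the library's class, and plain lists replace numpy arrays and numba to keep the example dependency-free.

```python
from typing import Callable, Optional, Sequence

LossFn = Callable[[Sequence[float], Sequence[float]], tuple[list[float], list[float]]]


class ObjectiveAdapter:
    """Hypothetical adapter exposing a (grad, hess) loss via calc_ders_range."""

    def __init__(self, fn: LossFn) -> None:
        self._fn = fn

    def calc_ders_range(self, approxes, targets, weights: Optional[Sequence[float]]):
        grad, hess = self._fn(list(targets), list(approxes))
        result = []
        for i in range(len(grad)):
            w = 1.0 if weights is None else weights[i]
            # The loss is minimized, but the protocol (as assumed here) wants
            # derivatives of the maximized quantity, so both are negated.
            result.append((-grad[i] * w, -hess[i] * w))
        return result


def huber(y_true, y_pred, delta: float = 1.0):
    """Huber loss gradient and Hessian with respect to the prediction."""
    grad, hess = [], []
    for t, p in zip(y_true, y_pred):
        err = p - t
        if abs(err) <= delta:
            grad.append(err)
            hess.append(1.0)
        else:
            grad.append(delta if err > 0 else -delta)
            hess.append(0.0)
    return grad, hess


adapter = ObjectiveAdapter(huber)
ders = adapter.calc_ders_range(approxes=[0.5, 3.0], targets=[0.0, 0.0], weights=[1.0, 2.0])
```

The first object falls in the quadratic region and the second in the linear region, so the two pairs differ in both derivative and weighting.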
pipeline — sklearn-friendly classes
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from catboost_utils import CBXRegressor

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("model", CBXRegressor(iterations=100)),
])
pipe.fit(X, y)
Works inside Pipeline, GridSearchCV, cross_val_score, and clone().
logging — structured training output
import logging
from catboost_utils.logging import setup_logging, attach
setup_logging(level=logging.INFO, structured=False)
attach(model)
model.fit(X, y)
# INFO catboost_utils.training - iteration=10 learn_loss=0.423 test_loss=0.451 ...
Use structured=True for JSON output. Each parsed line carries a cbx_iteration extra dict for downstream log processors.
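The following sketch shows one way a parsed training line could end up as a structured record, using only the standard library. It is an illustration under assumptions, not the library's code: the progress-line format in `LINE_RE` is assumed, and `JsonFormatter`, `log_training_line`, and `ListHandler` are hypothetical names. Only the `cbx_iteration` extra key comes from the description above.

```python
import json
import logging
import re

# Assumed shape of a CatBoost progress line, e.g.
# "10:\tlearn: 0.4230000\ttest: 0.4510000\tbest: 0.4510000 (10)"
LINE_RE = re.compile(
    r"^(?P<iteration>\d+):\s+learn:\s+(?P<learn>[-\d.eE+]+)"
    r"(?:\s+test:\s+(?P<test>[-\d.eE+]+))?"
)


class JsonFormatter(logging.Formatter):
    """Render a record as JSON, merging in the cbx_iteration extra dict."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        payload.update(getattr(record, "cbx_iteration", {}))
        return json.dumps(payload)


def log_training_line(logger: logging.Logger, line: str) -> bool:
    """Parse one progress line and log it; return False if it does not match."""
    match = LINE_RE.match(line.strip())
    if match is None:
        return False
    fields = {"iteration": int(match["iteration"]), "learn_loss": float(match["learn"])}
    if match["test"] is not None:
        fields["test_loss"] = float(match["test"])
    message = " ".join(f"{key}={value}" for key, value in fields.items())
    logger.info(message, extra={"cbx_iteration": fields})
    return True


class ListHandler(logging.Handler):
    """Collect formatted records in memory so the example needs no I/O."""

    def __init__(self) -> None:
        super().__init__()
        self.lines: list[str] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.lines.append(self.format(record))


logger = logging.getLogger("catboost_utils.training.sketch")
logger.setLevel(logging.INFO)
logger.propagate = False
handler = ListHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

parsed = log_training_line(logger, "10:\tlearn: 0.4230000\ttest: 0.4510000\tbest: 0.4510000 (10)")
event = json.loads(handler.lines[0])
```

A downstream processor can read `record.cbx_iteration` directly from the log record instead of re-parsing the message string.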
explain — feature importance + SHAP, named DataFrames
from catboost_utils.explain import feature_importance, shap_values, check_early_stopping
fi = feature_importance(model, X) # sorted DataFrame with feature names
sv = shap_values(model, X) # DataFrame: features + expected_value
check_early_stopping(model, eval_set=eval_set) # raise CBXError if misconfigured
io — lossless save/load
from catboost_utils.io import save, load
save(model, "artifact.cbm") # writes artifact.cbm + artifact.cbm.meta.json
restored = load("artifact.cbm") # restores best_iteration, feature_names, etc.
The sidecar bundles best_iteration, feature_names, cat_features, class_names, training params, and version info. load() works without a sidecar (logs a warning) so external .cbm files keep loading.
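The sidecar pattern itself is simple to sketch. The example below is an assumption-laden illustration, not the library's implementation: `save_sketch`, `load_sketch`, and `sidecar_path` are hypothetical names, and raw bytes stand in for a real model file. It shows the naming scheme and the warn-and-continue behaviour when no sidecar is present.

```python
import json
import logging
import tempfile
from pathlib import Path

log = logging.getLogger("sidecar_sketch")


def sidecar_path(model_path: Path) -> Path:
    # "artifact.cbm" -> "artifact.cbm.meta.json", matching the naming above.
    return model_path.with_name(model_path.name + ".meta.json")


def save_sketch(model_bytes: bytes, meta: dict, path: Path) -> None:
    path.write_bytes(model_bytes)
    sidecar_path(path).write_text(json.dumps(meta, indent=2), encoding="utf-8")


def load_sketch(path: Path) -> tuple[bytes, dict]:
    model_bytes = path.read_bytes()
    meta_file = sidecar_path(path)
    if not meta_file.exists():
        # External model files have no sidecar: warn, but still load.
        log.warning("No sidecar found for %s; metadata not restored.", path)
        return model_bytes, {}
    return model_bytes, json.loads(meta_file.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    artifact = Path(tmp) / "artifact.cbm"
    save_sketch(
        b"model-bytes",
        {"best_iteration": 42, "feature_names": ["city", "age"]},
        artifact,
    )
    restored_bytes, restored_meta = load_sketch(artifact)

    sidecar_path(artifact).unlink()  # simulate an external .cbm with no sidecar
    _, missing_meta = load_sketch(artifact)
```

Keeping the metadata in a separate JSON file leaves the model file itself untouched, so it remains loadable by stock tooling.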
callbacks — exception-safe wrapper
from catboost_utils.callbacks import safe
cb = safe(my_callback)
model.fit(X, y, callbacks=[cb])
cb.raise_if_failed() # surfaces any exception your callback raised, with original traceback
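A minimal sketch of the idea behind an exception-safe wrapper follows. It makes two assumptions: that callbacks expose an `after_iteration(info)` method returning a bool, which is my understanding of CatBoost's callback protocol, and that training should continue after the callback fails. `SafeCallback` and `FlakyCallback` are hypothetical names, not the library's classes.

```python
from types import SimpleNamespace
from typing import Optional


class SafeCallback:
    """Hypothetical wrapper that captures callback exceptions for later."""

    def __init__(self, inner) -> None:
        self._inner = inner
        self.error: Optional[BaseException] = None

    def after_iteration(self, info) -> bool:
        if self.error is not None:
            return True  # already failed once: stop calling the inner callback
        try:
            return bool(self._inner.after_iteration(info))
        except Exception as exc:
            # Assumption: keep training alive and report the failure later.
            self.error = exc
            return True

    def raise_if_failed(self) -> None:
        if self.error is not None:
            # The stored exception keeps its __traceback__, so re-raising
            # still points at the line where the callback failed.
            raise self.error


class FlakyCallback:
    """Example callback that fails on the third iteration."""

    def after_iteration(self, info) -> bool:
        if info.iteration == 2:
            raise RuntimeError("metric sink unavailable")
        return True


cb = SafeCallback(FlakyCallback())
# Simulate four training iterations without a real model.
results = [cb.after_iteration(SimpleNamespace(iteration=i)) for i in range(4)]
```

Calling `cb.raise_if_failed()` after this loop re-raises the stored `RuntimeError`, so a failing callback is never silently lost even though training ran to completion.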
Principles
- Backwards compatible: anything that works in CatBoost works through catboost_utils.
- Opt-in: every module is independent. Use what you need; ignore the rest.
- No magic: no silent data transformations. Auto-fixes are always parameters the user passes explicitly (nan_fill="...", auto_cat_features=True).
- Strict typing: every public function is fully annotated; mypy --strict clean.
Compatibility
- Python: 3.10, 3.11, 3.12
- CatBoost: ≥ 1.2, < 2.0
- sklearn: 1.3+ (optional, only for the pipeline module)
Versioning
Pre-1.0 (0.x.y). Any minor bump may include breaking changes — see CHANGELOG.md. 1.0.0 will be cut once the public API is frozen and CI is green across the matrix.
License
Apache 2.0 — see LICENSE.