Xgboost Label Encoding
xgboost-label-encoding
xgboost-label-encoding provides small sklearn-style wrappers around
xgboost.XGBClassifier for classification workflows where the target labels are
strings or other non-numeric values.
XGBoost trains on numeric class labels. This package encodes y during fit,
trains the underlying XGBoost classifier, and decodes predictions back to the
original labels. It is intended to be used as a drop-in estimator in places where
manually applying sklearn.preprocessing.LabelEncoder to the target would be
awkward.
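The encode, train, decode round trip described above can be illustrated without XGBoost at all. The following is a minimal sketch in plain Python, not the package's implementation: `fit_label_mapping` is a hypothetical helper, and the sorted class order is an assumption chosen to match how `sklearn.preprocessing.LabelEncoder` orders its `classes_`.

```python
def fit_label_mapping(y):
    """Map each distinct label to an integer code, in sorted order.

    Hypothetical helper for illustration; it mimics the sorted class
    order that sklearn's LabelEncoder uses.
    """
    classes = sorted(set(y))
    to_code = {label: code for code, label in enumerate(classes)}
    return classes, to_code


y_train = ["Healthy", "HIV", "Healthy", "Lupus"]
classes, to_code = fit_label_mapping(y_train)

# Encode: these integer codes are what a numeric-only learner would train on.
y_encoded = [to_code[label] for label in y_train]

# Stand-in for model output: integer class codes predicted for three new rows.
predicted_codes = [2, 0, 1]

# Decode: index back into the class list to recover the original labels.
predicted_labels = [classes[code] for code in predicted_codes]
```

A wrapper that does this inside `fit` and `predict` lets callers work only with the original labels.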
Installation
pip install xgboost_label_encoding
The package requires Python 3.8+ and installs against xgboost<2.
For local development:
pip install -r requirements_dev.txt
pip install -e .
make test
Usage
Use XGBoostClassifierWithLabelEncoding in place of xgboost.XGBClassifier:
from xgboost_label_encoding import XGBoostClassifierWithLabelEncoding
clf = XGBoostClassifierWithLabelEncoding(
    n_estimators=100,
    class_weight="balanced",
)
clf.fit(X_train, y_train)  # y_train may contain labels like "Healthy" or "HIV"
labels = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
classes = clf.classes_
Most XGBoost classifier parameters are passed through unchanged. The wrapper adds these project-specific options:
- class_weight: passed to sklearn.utils.class_weight.compute_sample_weight; if sample_weight is also supplied, the two weights are multiplied.
- fail_if_nothing_learned: defaults to True; raises ValueError after fitting if all feature importances are zero.
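To make the two options concrete, here is a rough sketch of the arithmetic in plain Python. It is an illustration under stated assumptions, not the wrapper's code: all three function names are hypothetical, the "balanced" formula is the standard `n_samples / (n_classes * class_count)` heuristic, and the error message is invented.

```python
from collections import Counter


def balanced_sample_weight(y):
    """Per-sample weights using the 'balanced' heuristic:
    n_samples / (n_classes * count_of_that_sample's_class)."""
    counts = Counter(y)
    n_samples, n_classes = len(y), len(counts)
    return [n_samples / (n_classes * counts[label]) for label in y]


def combine_weights(y, sample_weight=None):
    """Multiply class-derived weights by user-supplied sample weights, if any."""
    weights = balanced_sample_weight(y)
    if sample_weight is None:
        return weights
    return [w * s for w, s in zip(weights, sample_weight)]


def check_something_learned(feature_importances):
    """Raise ValueError when every feature importance is zero (hypothetical check)."""
    if all(value == 0 for value in feature_importances):
        raise ValueError("All feature importances are zero; nothing was learned.")


y = ["Healthy", "Healthy", "Healthy", "HIV"]
# The minority class "HIV" gets a larger class weight than "Healthy";
# the third sample's weight is then doubled by the user-supplied weight.
combined = combine_weights(y, sample_weight=[1.0, 1.0, 2.0, 1.0])
```

Multiplying the two sources of weight lets per-sample importance and class rebalancing apply at the same time.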
Cross-Validated Fitting
XGBoostClassifierWithLabelEncodingWithCV combines label encoding with
cross-validation over XGBoost parameters:
from sklearn.model_selection import StratifiedKFold
from xgboost_label_encoding import XGBoostClassifierWithLabelEncodingWithCV
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
clf = XGBoostClassifierWithLabelEncodingWithCV(
    cv=cv,
    max_num_trees=200,
    early_stopping_patience=10,
    class_weight="balanced",
)
clf.fit(X_train, y_train)
During fit, the CV wrapper:
- builds a small default grid of learning_rate and min_child_weight values unless param_grid is provided;
- runs xgboost.cv with early stopping for each parameter set;
- selects the best parameter set and number of boosting rounds;
- fits the final classifier on the full training data.
If the provided CV splitter accepts a groups argument, groups can be passed
to fit.
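The selection steps listed above can be sketched without XGBoost by treating the cross-validation run as a black box that returns one validation loss per boosting round. Everything here is hypothetical: `best_round_with_patience`, `select_params`, and the `cv_loss_curve` callable stand in for whatever the package does around `xgboost.cv`, and lower loss is assumed to be better.

```python
from itertools import product


def best_round_with_patience(losses, patience):
    """Scan per-round losses; stop once `patience` rounds pass without improvement.

    Returns (best_round, best_loss), with rounds counted from 1.
    """
    best_round, best_loss = 0, float("inf")
    for round_number, loss in enumerate(losses, start=1):
        if loss < best_loss:
            best_round, best_loss = round_number, loss
        elif round_number - best_round >= patience:
            break
    return best_round, best_loss


def select_params(param_grid, cv_loss_curve, patience):
    """Try every parameter combination; keep the one with the lowest CV loss."""
    best = None
    keys = sorted(param_grid)
    for values in product(*(param_grid[key] for key in keys)):
        params = dict(zip(keys, values))
        rounds, loss = best_round_with_patience(cv_loss_curve(params), patience)
        if best is None or loss < best[2]:
            best = (params, rounds, loss)
    return best


def toy_curve(params):
    """Stand-in for a real CV run: fixed loss curves per learning rate."""
    if params["learning_rate"] == 0.1:
        return [0.9, 0.7, 0.6, 0.65, 0.66, 0.5]
    return [0.8, 0.75, 0.74, 0.73]


grid = {"learning_rate": [0.1, 0.3], "min_child_weight": [1]}
best_params, best_rounds, best_loss = select_params(grid, toy_curve, patience=2)
```

The final model would then be refit on all training data with `best_params` and `best_rounds` trees. Note that in the toy curve, early stopping never sees the late improvement to 0.5, which is the usual trade-off of a short patience window.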
Behavior And Limitations
- Training data must contain at least two classes.
- predict returns original labels, not encoded integers.
- predict_proba returns one probability column per class in clf.classes_.
- For pandas DataFrame inputs, feature names containing [, ], or < are renamed internally before reaching XGBoost. feature_names_in_ still exposes the original feature names, and the same renaming is applied during predict and predict_proba.
- XGBoostCV is also available as a standalone helper for numeric-label XGBoost classification with CV-selected hyperparameters and tree count.
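As an illustration of the feature-name handling, the sketch below replaces the three problematic characters with underscores. The replacement character and the `sanitize_feature_names` helper are assumptions made for this example; the text above does not say what the package substitutes.

```python
def sanitize_feature_names(names):
    """Replace '[', ']' and '<' with '_' in each name.

    Hypothetical scheme for illustration only; the package's actual
    replacement rule may differ.
    """
    table = str.maketrans({"[": "_", "]": "_", "<": "_"})
    return [name.translate(table) for name in names]


original_names = ["age", "IGHV[1-69]", "count<5"]
safe_names = sanitize_feature_names(original_names)

# Keep the mapping so the same renaming can be reapplied at predict time
# while the original names remain available to the caller.
rename_map = dict(zip(original_names, safe_names))
```

A simple character substitution like this can make two distinct names collide (for example `a[b` and `a<b`), so a real implementation would need to handle that case.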
Development
Useful local commands:
make test
make lint
make docs
make dist
Changelog
0.0.1
- First release on PyPI.