Xgboost Label Encoding

Development Status
- 2 - Pre-Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Natural Language
- English

Project description

xgboost-label-encoding

xgboost-label-encoding provides small sklearn-style wrappers around xgboost.XGBClassifier for classification workflows where the target labels are strings or other non-numeric values.

XGBoost trains on numeric class labels. This package encodes y during fit, trains the underlying XGBoost classifier, and decodes predictions back to the original labels. It is intended to be used as a drop-in estimator in places where manually applying sklearn.preprocessing.LabelEncoder to the target would be awkward.
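
For comparison, the manual round trip that the wrapper is meant to replace looks roughly like the sketch below. It uses only sklearn.preprocessing.LabelEncoder; the encoded predictions are hard-coded stand-ins for model output (an assumption for illustration), so no classifier is trained here.

```python
import numpy as np
from sklearn.preprocessing import LabelEncoder

# String targets, as in the usage example of this README.
y_train = np.array(["Healthy", "HIV", "Healthy", "HIV"])

# Step 1: encode the string labels as integers before training.
encoder = LabelEncoder()
y_encoded = encoder.fit_transform(y_train)

# Step 2 (omitted): train a numeric-label classifier on y_encoded.
# The array below is a made-up stand-in for its integer predictions.
encoded_predictions = np.array([0, 1, 1])

# Step 3: map integer predictions back to the original labels.
decoded_predictions = encoder.inverse_transform(encoded_predictions)

print(encoder.classes_)
print(decoded_predictions)
```

LabelEncoder sorts the classes, so the integer codes follow the sorted order of the labels rather than their order of appearance.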

Installation

pip install xgboost_label_encoding

The package requires Python 3.8+ and installs against xgboost<2.

For local development:

pip install -r requirements_dev.txt
pip install -e .
make test

Usage

Use XGBoostClassifierWithLabelEncoding in place of xgboost.XGBClassifier:

from xgboost_label_encoding import XGBoostClassifierWithLabelEncoding

clf = XGBoostClassifierWithLabelEncoding(
    n_estimators=100,
    class_weight="balanced",
)

clf.fit(X_train, y_train)  # y_train may contain labels like "Healthy" or "HIV"

labels = clf.predict(X_test)
probabilities = clf.predict_proba(X_test)
classes = clf.classes_

Most XGBoost classifier parameters are passed through unchanged. The wrapper adds these project-specific options:

- class_weight: passed to sklearn.utils.class_weight.compute_sample_weight; if sample_weight is also supplied, the two weights are multiplied.
- fail_if_nothing_learned: defaults to True; raises ValueError after fitting if all feature importances are zero.
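
The two options above can be pictured with the following minimal sketch. The helper names combine_weights and check_something_learned are hypothetical and are not part of the package; only compute_sample_weight is a real sklearn function. The package's internal implementation may differ.

```python
import numpy as np
from sklearn.utils.class_weight import compute_sample_weight


def combine_weights(y, class_weight=None, sample_weight=None):
    """Hypothetical helper: multiply class-derived weights by user weights."""
    weights = np.ones(len(y), dtype=float)
    if class_weight is not None:
        # "balanced" weights each sample by n_samples / (n_classes * class_count).
        weights = weights * compute_sample_weight(class_weight, y)
    if sample_weight is not None:
        weights = weights * np.asarray(sample_weight, dtype=float)
    return weights


def check_something_learned(feature_importances):
    """Hypothetical helper: raise if every feature importance is zero."""
    if np.all(np.asarray(feature_importances) == 0):
        raise ValueError("All feature importances are zero; nothing was learned.")


y = np.array(["Healthy", "Healthy", "Healthy", "HIV"])
weights = combine_weights(
    y, class_weight="balanced", sample_weight=[1.0, 1.0, 2.0, 1.0]
)
print(weights)
```

In this toy input the minority class receives the largest class weight, and the third sample's weight is doubled by the user-supplied sample_weight.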

Cross-Validated Fitting

XGBoostClassifierWithLabelEncodingWithCV combines label encoding with cross-validation over XGBoost parameters:

from sklearn.model_selection import StratifiedKFold
from xgboost_label_encoding import XGBoostClassifierWithLabelEncodingWithCV

cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)

clf = XGBoostClassifierWithLabelEncodingWithCV(
    cv=cv,
    max_num_trees=200,
    early_stopping_patience=10,
    class_weight="balanced",
)

clf.fit(X_train, y_train)

During fit, the CV wrapper:

- builds a small default grid of learning_rate and min_child_weight values unless param_grid is provided;
- runs xgboost.cv with early stopping for each parameter set;
- selects the best parameter set and number of boosting rounds;
- fits the final classifier on the full training data.
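
The selection logic in the steps above can be sketched without XGBoost itself. In the example below, the grid values, the per-round validation losses, and the helper best_round_with_patience are all made up for illustration; in the real wrapper the losses would come from xgboost.cv, and the package's actual defaults and selection rule may differ.

```python
from itertools import product

# Hypothetical grid values; the package's actual defaults are not listed here.
param_grid = [
    {"learning_rate": lr, "min_child_weight": mcw}
    for lr, mcw in product([0.1, 0.3], [1, 5])
]

# Made-up mean validation loss per boosting round for each parameter set.
cv_losses = {
    (0.1, 1): [0.69, 0.60, 0.55, 0.52, 0.51, 0.515, 0.52],
    (0.1, 5): [0.69, 0.62, 0.58, 0.57, 0.575, 0.58, 0.59],
    (0.3, 1): [0.65, 0.50, 0.48, 0.49, 0.50, 0.47, 0.46],
    (0.3, 5): [0.66, 0.54, 0.53, 0.535, 0.54, 0.55, 0.56],
}


def best_round_with_patience(losses, patience):
    """Return (n_rounds, loss) at the point where early stopping settles.

    Scanning stops once `patience` consecutive rounds fail to improve on
    the best loss seen so far.
    """
    best_idx, best_loss = 0, losses[0]
    for idx, loss in enumerate(losses):
        if loss < best_loss:
            best_idx, best_loss = idx, loss
        elif idx - best_idx >= patience:
            break
    return best_idx + 1, best_loss


results = []
for params in param_grid:
    key = (params["learning_rate"], params["min_child_weight"])
    n_rounds, loss = best_round_with_patience(cv_losses[key], patience=2)
    results.append((loss, n_rounds, params))

# Pick the parameter set with the lowest early-stopped loss.
best_loss, best_rounds, best_params = min(results, key=lambda r: r[0])
print(best_params, best_rounds, best_loss)
```

Note that early stopping can discard later improvements: the third loss curve reaches lower values after the stopping point, but those rounds are never evaluated once patience runs out.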

If the provided CV splitter accepts a groups argument, groups can be passed to fit.
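
As an illustration of a group-aware splitter, the sketch below uses sklearn's GroupKFold on toy data. The group labels are hypothetical patient IDs, and the groups keyword in the commented fit call is an assumption based on the sentence above, not a confirmed signature.

```python
import numpy as np
from sklearn.model_selection import GroupKFold

X = np.arange(12).reshape(6, 2)
y = np.array(["Healthy", "HIV", "Healthy", "HIV", "Healthy", "HIV"])
groups = np.array(["p1", "p1", "p2", "p2", "p3", "p3"])  # hypothetical patient IDs

cv = GroupKFold(n_splits=3)
folds = list(cv.split(X, y, groups=groups))

# Each group appears on only one side of every split.
for train_idx, test_idx in folds:
    assert set(groups[train_idx]).isdisjoint(groups[test_idx])

# Assumed usage with the CV wrapper (keyword name is an assumption):
# clf = XGBoostClassifierWithLabelEncodingWithCV(cv=cv)
# clf.fit(X, y, groups=groups)
```

Keeping all samples from one group in the same fold avoids leaking information between the training and validation sides of each split.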

Behavior And Limitations

- Training data must contain at least two classes.
- predict returns original labels, not encoded integers.
- predict_proba returns one probability column per class in clf.classes_.
- For pandas DataFrame inputs, feature names containing [, ], or < are renamed internally before reaching XGBoost. feature_names_in_ still exposes the original feature names, and the same renaming is applied during predict and predict_proba.
- XGBoostCV is also available as a standalone helper for numeric-label XGBoost classification with CV-selected hyperparameters and tree count.
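
The feature-name renaming described above can be pictured with a small sketch. The helper sanitize_feature_names and the choice of "_" as the replacement character are assumptions for illustration; the package's actual renaming scheme is not documented here.

```python
# Characters that, per the note above, are renamed before reaching XGBoost.
FORBIDDEN_CHARS = "[]<"


def sanitize_feature_names(names, replacement="_"):
    """Hypothetical helper: replace each forbidden character in every name."""
    table = str.maketrans({ch: replacement for ch in FORBIDDEN_CHARS})
    return [str(name).translate(table) for name in names]


original = ["age", "gene[1]", "score<5"]
renamed = sanitize_feature_names(original)

# Keep the mapping so the original names can still be reported to the user.
name_map = dict(zip(original, renamed))
print(name_map)
```

A simple character replacement like this can make two distinct names collide (for example "a[" and "a<"), so a real implementation would need to guard against duplicates.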

Development

Useful local commands:

make test
make lint
make docs
make dist

Changelog

0.0.1

First release on PyPI.

Release history

- 0.0.7 (this version): May 30, 2026
- 0.0.6: Jan 4, 2024
- 0.0.5: Jan 2, 2024
- 0.0.4: Dec 25, 2023
- 0.0.3: Oct 25, 2023
- 0.0.2: Jul 23, 2023
- 0.0.1: Apr 12, 2023
