A gradient boosted tree library with automatic feature engineering.
gbt is a library for gradient boosted trees with minimal coding required. It is a thin wrapper around lightgbm.
Features
- Zero feature engineering needed - automatic encoding of categorical features
- Built-in train/validation splitting
- Automatic artifact saving and loading
- Pre-configured model presets for common tasks (binary, multiclass, regression)
What you need:
- a pandas dataframe,
- the target column to predict on,
- categorical feature columns (can be empty),
- numerical feature columns (can be empty, but you should have at least one categorical or numerical feature),
- model_lib: one of "binary", "multiclass", "mape", or "l2", which selects the prediction objective and the default hyperparameters.
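The inputs listed above can be sanity-checked before training. The following is a minimal sketch of such a check; `check_training_inputs` is a hypothetical helper and not part of gbt, and it only looks at column names.

```python
# Hypothetical pre-flight check; not part of the gbt API.
VALID_MODEL_LIBS = {"binary", "multiclass", "mape", "l2"}


def check_training_inputs(columns, label_column, categorical, numerical, model_lib):
    """Return a list of problems found; an empty list means the inputs look usable."""
    problems = []
    columns = set(columns)
    if label_column not in columns:
        problems.append(f"label column {label_column!r} is missing")
    if not categorical and not numerical:
        problems.append("at least one categorical or numerical feature is required")
    for name in list(categorical) + list(numerical):
        if name not in columns:
            problems.append(f"feature column {name!r} is missing")
        elif name == label_column:
            problems.append(f"feature column {name!r} is also the label column")
    if model_lib not in VALID_MODEL_LIBS:
        problems.append(f"unknown model_lib {model_lib!r}")
    return problems


# With a pandas dataframe you would pass list(df.columns) as the first argument.
print(check_training_inputs(
    ["feature_1", "category", "target"], "target", ["category"], ["feature_1"], "binary"
))  # []
```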
You don't need to (though you are welcome to):
- normalize the numerical feature values
- construct the encoder to one-hot encode categorical features
- manage saving of artifacts for the above feature transformations
- implement evaluation metrics
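For context, this is roughly the kind of preprocessing the list above refers to, written by hand in plain Python. It is an illustration only and not gbt's internal implementation; all function names here are hypothetical.

```python
def fit_one_hot(values):
    """Learn a category -> column index mapping from training values."""
    return {category: index for index, category in enumerate(sorted(set(values)))}


def one_hot(value, vocabulary):
    """Encode one value; unseen categories become an all-zero row."""
    row = [0] * len(vocabulary)
    if value in vocabulary:
        row[vocabulary[value]] = 1
    return row


def standardize(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mean = sum(values) / len(values)
    std = (sum((v - mean) ** 2 for v in values) / len(values)) ** 0.5
    return [(v - mean) / std if std else 0.0 for v in values]


vocabulary = fit_one_hot(["A", "B", "A", "C", "B"])
print(vocabulary)                # {'A': 0, 'B': 1, 'C': 2}
print(one_hot("C", vocabulary))  # [0, 0, 1]
print(one_hot("Z", vocabulary))  # [0, 0, 0]
```

In a hand-rolled pipeline, the learned `vocabulary` is the kind of artifact that has to be stored next to the model so the same encoding can be applied at prediction time.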
Prerequisites
- Python 3.7+
- pandas, numpy, scikit-learn, lightgbm
Install
pip install gbt
Quickstart
import pandas as pd
from gbt import train

# Your data
df = pd.DataFrame({
    "feature_1": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"],
    "target": [0, 1, 0, 1, 1],
})

# Train model
model = train(
    df,
    model_lib="binary",  # binary, multiclass, mape, or l2
    label_column="target",
    categorical_feature_columns=["category"],
    numerical_feature_columns=["feature_1"],
)

# Make predictions
new_data = pd.DataFrame({
    "feature_1": [6, 7],
    "category": ["A", "B"],
})
predictions = model.predict(new_data)
print(predictions)  # e.g. [0.23, 0.78]
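The sample output suggests that a "binary" model returns one score per row. Assuming those scores are probabilities of the positive class (an assumption, not something the description confirms), they can be turned into class labels with a threshold. `to_labels` is a hypothetical helper, not part of gbt.

```python
def to_labels(scores, threshold=0.5):
    """Map positive-class probabilities to 0/1 labels."""
    return [1 if score >= threshold else 0 for score in scores]


print(to_labels([0.23, 0.78]))  # [0, 1]
```

Raising the threshold trades recall for precision, so 0.5 is only a starting point.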
Save and Load Models
# Save model
model.save("my_model")
# Load model later
from gbt import load
loaded_model = load("my_model")
predictions = loaded_model.predict(new_data)
Advanced Usage
For more control over training:
import pandas as pd
from gbt import TrainingPipeline

# Custom training configuration
pipeline = TrainingPipeline(
    categorical_feature_columns=["category"],
    numerical_feature_columns=["feature_1"],
    params_preset="binary",
    params_override={"num_leaves": 50},  # Custom hyperparameters
    val_size=0.3,  # 30% validation split
    verbose=False,  # Quiet training
)

# Train with custom data loader
class DatasetBuilder:
    def training_dataset(self):
        return pd.read_csv("train.csv")

    def testing_dataset(self):
        return pd.read_csv("test.csv")

pipeline.fit(DatasetBuilder())

# Get model for deployment
model = pipeline.create_model()
model.save("production_model")
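To make `val_size` concrete, here is a minimal sketch of a random hold-out split over row indices. How gbt actually splits the data (shuffling, seeding, stratification) is not stated here, so treat `split_indices` as a hypothetical illustration rather than the library's behavior.

```python
import random


def split_indices(n_rows, val_size=0.2, seed=0):
    """Shuffle row indices and hold out a `val_size` fraction for validation."""
    if not 0.0 < val_size < 1.0:
        raise ValueError("val_size must be between 0 and 1")
    indices = list(range(n_rows))
    random.Random(seed).shuffle(indices)
    n_val = int(round(n_rows * val_size))
    # First n_val shuffled indices go to validation, the rest to training.
    return indices[n_val:], indices[:n_val]


train_idx, val_idx = split_indices(10, val_size=0.3)
print(len(train_idx), len(val_idx))  # 7 3
```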
API Reference
Main Functions
train(df, model_lib="l2", label_column, categorical_feature_columns, numerical_feature_columns, **kwargs)
Train a gradient boosting model.
Parameters:
- df: Training dataframe
- model_lib: Model type - "binary", "multiclass", "mape", or "l2"
- label_column: Target column name
- categorical_feature_columns: List of categorical feature names
- numerical_feature_columns: List of numerical feature names
- val_size: Validation split fraction (default 0.2)
- log_dir: Directory to save artifacts (optional)
Returns: Trained model ready for prediction
load(path)
Load a saved model.
Returns: Model ready for inference
Model Types
| model_lib | Use Case | Loss Function |
|---|---|---|
| "binary" | Binary classification | Log loss |
| "multiclass" | Multi-class classification | Multi-class log loss |
| "l2" | Regression | Mean squared error |
| "mape" | Regression | Mean absolute percentage error |
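The loss functions named in the table can be written out in a few lines of plain Python. This is a sketch for intuition, not LightGBM's implementation; the function names are hypothetical, and the multiclass case is omitted for brevity.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Binary log loss: mean of -[y*log(p) + (1-y)*log(1-p)]."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Mean of squared differences between targets and predictions."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; undefined when a target is 0."""
    return sum(abs((t - p) / t) for t, p in zip(y_true, y_pred)) / len(y_true)


print(mean_squared_error([1.0, 2.0], [1.0, 4.0]))  # 2.0
print(mape([100.0, 200.0], [110.0, 180.0]))        # 0.1
```

Mean squared error penalizes errors in absolute terms, while MAPE penalizes them relative to the target, which matters when targets span several orders of magnitude.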
Model Methods
- .predict(df): Make predictions
- .save(path): Save model to disk
For advanced usage, see TrainingPipeline class documentation.
File details
Details for the file gbt-0.3.tar.gz.
File metadata
- Download URL: gbt-0.3.tar.gz
- Upload date:
- Size: 15.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 18409fc5171be22f84cb008ce0d3a3b08de5d763566e8cc5d2c91e31ee69a58b |
| MD5 | eb62d7d69642635cce8e5707522d6ba8 |
| BLAKE2b-256 | 0c11c0224ea66bf363c7c7310e69a2dc9d1745644d3b57d4d693a8601985f388 |
File details
Details for the file gbt-0.3-py3-none-any.whl.
File metadata
- Download URL: gbt-0.3-py3-none-any.whl
- Upload date:
- Size: 16.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 10aa94a9e76711896b6f5ab84229b93ebfc1555f20a925d57d3fa49c13381dad |
| MD5 | e6f6e82cf63e69447a679a2d84ed8913 |
| BLAKE2b-256 | ff1e761ed6a57fa3104981ba140f969c6643fa14db0686b959c10aa2bb9c3e8c |