A gradient boosted tree library with automatic feature engineering.

Project description

gbt is a library for training gradient boosted trees with minimal code. It is a thin wrapper around LightGBM.

Features

  • Zero feature engineering needed - automatic encoding of categorical features
  • Built-in train/validation splitting
  • Automatic artifact saving and loading
  • Pre-configured model presets for common tasks (binary, multiclass, regression)

What you need:

  • a pandas dataframe,
  • the target column to predict on,
  • categorical feature columns (can be empty),
  • numerical feature columns (can be empty, but you should have at least one categorical or numerical feature),
  • model_lib: one of "binary", "multiclass", "mape", or "l2", which selects the prediction objective and the default hyperparameters to use.
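
The requirements listed above can be checked before training. The following is a minimal sketch of such a pre-flight check. The helper name check_training_inputs and its messages are hypothetical and not part of gbt; the library may validate its inputs differently or not at all.

```python
VALID_MODEL_LIBS = {"binary", "multiclass", "mape", "l2"}


def check_training_inputs(columns, label_column, categorical, numerical, model_lib):
    """Hypothetical pre-flight check (not part of gbt); returns a list of problems."""
    problems = []
    if model_lib not in VALID_MODEL_LIBS:
        problems.append(f"unknown model_lib: {model_lib!r}")
    if label_column not in columns:
        problems.append(f"label column not found: {label_column!r}")
    features = list(categorical) + list(numerical)
    if not features:
        problems.append("need at least one categorical or numerical feature")
    missing = [c for c in features if c not in columns]
    if missing:
        problems.append(f"feature columns not found: {missing}")
    if label_column in features:
        problems.append("label column is also listed as a feature")
    return problems


columns = ["feature_1", "category", "target"]  # e.g. list(df.columns)
ok = check_training_inputs(columns, "target", ["category"], ["feature_1"], "binary")
bad = check_training_inputs(columns, "target", [], [], "l1")
print(ok)   # no problems found
print(bad)  # unknown model_lib and no features
```

An empty list means the arguments satisfy the requirements as this README states them.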

You don't need to (though you are welcome to):

  • normalize the numerical feature values
  • construct the encoder to one-hot encode categorical features
  • manage saving of artifacts for the feature transformations above
  • implement evaluation metrics
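
For context, here is roughly what the first three items look like when done by hand. This is a pure-Python sketch with hypothetical helper names (standardize, fit_one_hot, one_hot). It illustrates the manual work the list refers to and is not a description of how gbt transforms features internally.

```python
from statistics import mean, pstdev


def standardize(values):
    """Scale values to zero mean and unit variance (population std)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


def fit_one_hot(values):
    """Learn a sorted category vocabulary from the training values."""
    return sorted(set(values))


def one_hot(values, vocabulary):
    """Encode each value as a 0/1 vector; unseen categories become all zeros."""
    return [[1 if v == cat else 0 for cat in vocabulary] for v in values]


feature_1 = [1, 2, 3, 4, 5]
category = ["A", "B", "A", "C", "B"]

scaled = standardize(feature_1)
vocabulary = fit_one_hot(category)  # this is the artifact you would have to save
encoded = one_hot(category, vocabulary)
```

The vocabulary learned at training time has to be stored and reused at prediction time, which is the artifact management the list mentions.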

Prerequisites

  • Python 3.7+
  • pandas, numpy, scikit-learn, lightgbm

Install

pip install gbt

Quickstart

import pandas as pd
from gbt import train

# Your data
df = pd.DataFrame({
    "feature_1": [1, 2, 3, 4, 5],
    "category": ["A", "B", "A", "C", "B"], 
    "target": [0, 1, 0, 1, 1]
})

# Train model
model = train(
    df,
    model_lib="binary",  # binary, multiclass, mape, or l2
    label_column="target",
    categorical_feature_columns=["category"],
    numerical_feature_columns=["feature_1"]
)

# Make predictions
new_data = pd.DataFrame({
    "feature_1": [6, 7],
    "category": ["A", "B"]
})
predictions = model.predict(new_data)
print(predictions)  # one score per row of new_data; exact values depend on the trained model
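
If you need class labels rather than scores, you can threshold the output yourself. The sketch below assumes that predict returns positive-class probabilities for the "binary" preset, which this README does not state explicitly. The helper to_labels and the sample scores are illustrative only.

```python
def to_labels(probabilities, threshold=0.5):
    """Map positive-class probabilities to 0/1 labels."""
    return [1 if p >= threshold else 0 for p in probabilities]


scores = [0.23, 0.78]  # illustrative values, not real model output
labels = to_labels(scores)
strict = to_labels(scores, threshold=0.8)
```

Raising the threshold trades recall for precision on the positive class.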

Save and Load Models

# Save model
model.save("my_model")

# Load model later
from gbt import load
loaded_model = load("my_model")
predictions = loaded_model.predict(new_data)

Advanced Usage

For more control over training:

import pandas as pd
from gbt import TrainingPipeline

# Custom training configuration  
pipeline = TrainingPipeline(
    categorical_feature_columns=["category"],
    numerical_feature_columns=["feature_1"],
    params_preset="binary",
    params_override={"num_leaves": 50},  # Custom hyperparameters
    val_size=0.3,  # 30% validation split
    verbose=False   # Quiet training
)

# Train with custom data loader
class DatasetBuilder:
    def training_dataset(self):
        return pd.read_csv("train.csv")
    
    def testing_dataset(self):
        return pd.read_csv("test.csv")

pipeline.fit(DatasetBuilder())

# Get model for deployment
model = pipeline.create_model()
model.save("production_model")
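
Judging from the example above, a dataset builder is any object with training_dataset and testing_dataset methods. The sketch below writes that inferred interface down as a typing.Protocol (Python 3.8+). The names DatasetBuilderLike and InMemoryBuilder are hypothetical, and gbt may expect more than these two methods.

```python
from typing import Any, Protocol, runtime_checkable


@runtime_checkable
class DatasetBuilderLike(Protocol):
    """Hypothetical name for the interface inferred from the README example."""

    def training_dataset(self) -> Any: ...

    def testing_dataset(self) -> Any: ...


class InMemoryBuilder:
    """Serves pre-loaded train/test objects (pandas DataFrames in practice)."""

    def __init__(self, train, test):
        self._train = train
        self._test = test

    def training_dataset(self):
        return self._train

    def testing_dataset(self):
        return self._test


class Incomplete:
    """Lacks testing_dataset, so it does not satisfy the protocol."""

    def training_dataset(self):
        return []


# Plain lists stand in for DataFrames to keep the sketch dependency-free.
builder = InMemoryBuilder(train=[{"x": 1}], test=[{"x": 2}])
conforms = isinstance(builder, DatasetBuilderLike)
rejects = isinstance(Incomplete(), DatasetBuilderLike)
```

An in-memory builder like this is handy when the data is already loaded and reading CSV files again would be wasteful.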

API Reference

Main Functions

train(df, model_lib="l2", label_column, categorical_feature_columns, numerical_feature_columns, **kwargs)

Train a gradient boosting model.

Parameters:

  • df: Training dataframe
  • model_lib: Model type - "binary", "multiclass", "mape", or "l2"
  • label_column: Target column name
  • categorical_feature_columns: List of categorical feature names
  • numerical_feature_columns: List of numerical feature names
  • val_size: Validation split fraction (default 0.2)
  • log_dir: Directory to save artifacts (optional)

Returns: Trained model ready for prediction
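
To make val_size concrete, here is a minimal sketch of a fractional hold-out split. Whether gbt shuffles, stratifies, or uses a fixed seed is not stated here, so the shuffling and the seed argument below are assumptions, and split_rows is a hypothetical helper.

```python
import random


def split_rows(rows, val_size=0.2, seed=0):
    """Shuffle row indices and hold out a `val_size` fraction for validation."""
    if not 0 < val_size < 1:
        raise ValueError("val_size must be between 0 and 1")
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_val = max(1, round(len(rows) * val_size))  # keep at least one validation row
    val_idx = set(indices[:n_val])
    train = [r for i, r in enumerate(rows) if i not in val_idx]
    val = [r for i, r in enumerate(rows) if i in val_idx]
    return train, val


rows = list(range(10))
train_rows, val_rows = split_rows(rows, val_size=0.2)
```

With 10 rows and val_size=0.2, two rows are held out and eight are used for training.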

load(path)

Load a saved model.

Returns: Model ready for inference

Model Types

model_lib    | Use case                   | Loss function
"binary"     | Binary classification      | Log loss
"multiclass" | Multi-class classification | Multi-class log loss
"l2"         | Regression                 | Mean squared error
"mape"       | Regression                 | Mean absolute percentage error
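
For reference, the loss functions named above have the following textbook definitions. These are plain-Python sketches for intuition, not the optimized implementations LightGBM uses internally. Multi-class log loss generalizes the binary version by summing over classes.

```python
import math


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped away from 0 and 1."""
    total = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(y_true)


def mean_squared_error(y_true, y_pred):
    """Average of squared differences between targets and predictions."""
    return sum((y - f) ** 2 for y, f in zip(y_true, y_pred)) / len(y_true)


def mape(y_true, y_pred):
    """Mean absolute percentage error as a fraction; undefined if a target is 0."""
    return sum(abs((y - f) / y) for y, f in zip(y_true, y_pred)) / len(y_true)


mse = mean_squared_error([100.0, 200.0], [110.0, 180.0])  # (100 + 400) / 2 = 250
pct = mape([100.0, 200.0], [110.0, 180.0])                # (0.10 + 0.10) / 2 = 0.10
ll = log_loss([0, 1], [0.5, 0.5])                         # ln 2, about 0.693
```

The example shows why the choice matters for regression: the second prediction dominates the squared error, while both predictions contribute equally to the percentage error.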

Model Methods

  • .predict(df): Make predictions
  • .save(path): Save model to disk

For advanced usage, see the TrainingPipeline class documentation.

File details

Details for the file gbt-0.3.tar.gz.

File metadata

  • Download URL: gbt-0.3.tar.gz
  • Upload date:
  • Size: 15.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gbt-0.3.tar.gz

Algorithm   | Hash digest
SHA256      | 18409fc5171be22f84cb008ce0d3a3b08de5d763566e8cc5d2c91e31ee69a58b
MD5         | eb62d7d69642635cce8e5707522d6ba8
BLAKE2b-256 | 0c11c0224ea66bf363c7c7310e69a2dc9d1745644d3b57d4d693a8601985f388

File details

Details for the file gbt-0.3-py3-none-any.whl.

File metadata

  • Download URL: gbt-0.3-py3-none-any.whl
  • Upload date:
  • Size: 16.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gbt-0.3-py3-none-any.whl

Algorithm   | Hash digest
SHA256      | 10aa94a9e76711896b6f5ab84229b93ebfc1555f20a925d57d3fa49c13381dad
MD5         | e6f6e82cf63e69447a679a2d84ed8913
BLAKE2b-256 | ff1e761ed6a57fa3104981ba140f969c6643fa14db0686b959c10aa2bb9c3e8c
