xgbauto: tuning XGBoost with Optuna (AutoXGB with AUCPR for binary classification)

Project description

AutoXGB

XGBoost + Optuna: no-brainer

  • auto-train XGBoost directly from CSV files
  • auto-tune XGBoost using Optuna
  • auto-serve the best XGBoost model using FastAPI

NOTE: PRs are currently not accepted. If you run into problems, please open an issue.

Installation

Install using pip:

pip install xgbauto

Usage

Training a model using AutoXGB is a piece of cake. All you need is some tabular data.
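
For a sense of what "tabular data" means here, the sketch below builds a toy CSV with a binary `income` target, mirroring the bundled sample; the feature columns are invented for illustration.

import pandas as pd

# A toy table in the shape AutoXGB expects: ordinary feature columns
# plus a target column (here `income`, matching the sample below).
# The feature names are made up for illustration.
df = pd.DataFrame(
    {
        "age": [39, 50, 38],
        "education": ["Bachelors", "Masters", "HS-grad"],
        "hours_per_week": [40, 13, 40],
        "income": ["<=50K", ">50K", "<=50K"],
    }
)
df.to_csv("train.csv", index=False)  # any tabular CSV like this will do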

Parameters

###############################################################################
### required parameters
###############################################################################

# path to training data
train_filename = "data_samples/binary_classification.csv"

# path to output folder to store artifacts
output = "output"

###############################################################################
### optional parameters
###############################################################################

# path to test data. if specified, the model will be evaluated on the test data
# and test_predictions.csv will be saved to the output folder
# if not specified, only OOF predictions will be saved
# test_filename = "test.csv"
test_filename = None

# task: classification or regression
# if not specified, the task will be inferred automatically
# task = "classification"
# task = "regression"
task = None

# an id column
# if not specified, the id column will be generated automatically with the name `id`
# idx = "id"
idx = None

# target columns are a list of strings
# if not specified, the target column will be assumed to be named `target`
# and the problem will be treated as one of: binary classification, multiclass classification,
# or single column regression
# targets = ["target"]
# targets = ["target1", "target2"]
targets = ["income"]

# feature columns are a list of strings
# if not specified, all columns except `id`, `targets` & `kfold` columns will be used
# features = ["col1", "col2"]
features = None

# categorical_features is a list of strings
# if not specified, categorical columns will be inferred automatically
# categorical_features = ["col1", "col2"]
categorical_features = None

# use_gpu is a boolean
# if not specified, GPU is not used
# use_gpu = True
# use_gpu = False
use_gpu = True

# number of folds to use for cross-validation
# default is 5
num_folds = 5

# random seed for reproducibility
# default is 42
seed = 42

# number of optuna trials to run
# default is 1000
# num_trials = 1000
num_trials = 100

# time_limit for optuna trials in seconds
# if not specified, timeout is not set and all trials are run
# time_limit = None
time_limit = 360

# if fast is set to True, the hyperparameter tuning will use only one fold
# however, the model will be trained on all folds in the end
# to generate OOF predictions and test predictions
# default is False
# fast = False
fast = False
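
Since only the first two parameters are required, the shortest possible run leans on the defaults above. A minimal sketch:

from xgbauto import AutoXGB

# Only train_filename and output are required; every optional
# parameter above falls back to its documented default
# (task inference, 5 folds, seed 42, 1000 trials, no time limit).
axgb = AutoXGB(
    train_filename="data_samples/binary_classification.csv",
    output="output",
)
axgb.train()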

Python API

To train a new model, you can run:

from xgbauto import AutoXGB


# required parameters:
train_filename = "data_samples/binary_classification.csv"
output = "output"

# optional parameters
test_filename = None
task = None
idx = None
targets = ["income"]
features = None
categorical_features = None
use_gpu = True
num_folds = 5
seed = 42
num_trials = 100
time_limit = 360
fast = False

# Now it's time to train the model!
axgb = AutoXGB(
    train_filename=train_filename,
    output=output,
    test_filename=test_filename,
    task=task,
    idx=idx,
    targets=targets,
    features=features,
    categorical_features=categorical_features,
    use_gpu=use_gpu,
    num_folds=num_folds,
    seed=seed,
    num_trials=num_trials,
    time_limit=time_limit,
    fast=fast,
)
axgb.train()
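
After train() completes, the artifacts land in the output folder. If a test_filename was supplied, the test predictions can be inspected like this (a sketch assuming the default output path from above):

import pandas as pd

# When test_filename is set, test_predictions.csv is written to the
# output folder (see the parameter notes above); otherwise only the
# OOF predictions are saved there.
preds = pd.read_csv("output/test_predictions.csv")
print(preds.head())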

CLI

Train the model using the xgbauto train command. The parameters are the same as above.

xgbauto train \
 --train_filename datasets/30train.csv \
 --output outputs/30days \
 --test_filename datasets/30test.csv \
 --use_gpu

You can also serve the trained model using the xgbauto serve command.

xgbauto serve --model_path outputs/mll --host 0.0.0.0 --debug
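
The serve command exposes the model through a FastAPI app. As a rough sketch of querying it with Python (the port, endpoint path, and payload shape below are assumptions, not confirmed by this page; FastAPI's auto-generated /docs page on the running server shows the actual schema):

import requests

# Hypothetical client for the served model. The port and the /predict
# endpoint are assumptions; open the FastAPI docs at
# http://<host>:<port>/docs to see the real routes and schema.
response = requests.post(
    "http://0.0.0.0:8000/predict",
    json={"age": 39, "education": "Bachelors", "hours_per_week": 40},
)
print(response.json())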

To know more about a command, run:

xgbauto <command> --help

For example, `xgbauto train --help` prints:


usage: xgbauto <command> [<args>] train [-h] --train_filename TRAIN_FILENAME [--test_filename TEST_FILENAME] --output
                                        OUTPUT [--task {classification,regression}] [--idx IDX] [--targets TARGETS]
                                        [--num_folds NUM_FOLDS] [--features FEATURES] [--use_gpu] [--fast]
                                        [--seed SEED] [--time_limit TIME_LIMIT]

optional arguments:
  -h, --help            show this help message and exit
  --train_filename TRAIN_FILENAME
                        Path to training file
  --test_filename TEST_FILENAME
                        Path to test file
  --output OUTPUT       Path to output directory
  --task {classification,regression}
                        User defined task type
  --idx IDX             ID column
  --targets TARGETS     Target column(s). If there are multiple targets, separate by ';'
  --num_folds NUM_FOLDS
                        Number of folds to use
  --features FEATURES   Features to use, separated by ';'
  --use_gpu             Whether to use GPU for training
  --fast                Whether to use fast mode for tuning params. Only one fold will be used if fast mode is set
  --seed SEED           Random seed
  --time_limit TIME_LIMIT
                        Time limit for optimization
