
vespatune: no-code training for tabular models

Project description

VespaTune

Gradient boosting + Optuna: a no-brainer

  • Web UI for training, monitoring, and managing models
  • Tune models directly from CSV files
  • Real-time training progress with WebSocket updates
  • Export models to ONNX format for deployment

Installation

Install using pip:

pip install vespatune

Quick Start

Web UI (Recommended)

Start the web interface:

vespatune

This launches the VespaTune UI at http://127.0.0.1:9999, where you can:

  • Upload train/validation CSV files
  • Configure model type, target columns, and hyperparameters
  • Start training with real-time progress monitoring
  • View trial results and metrics
  • Download trained models and artifacts
  • Manage multiple training runs

You can also specify host and port:

vespatune --host 0.0.0.0 --port 8080

CLI

Train a model:

vespatune train \
  --train_filename train.csv \
  --valid_filename valid.csv \
  --output outputs/my_model \
  --model xgboost

Make predictions:

vespatune predict \
  --model_path outputs/my_model \
  --test_filename test.csv \
  --output_filename predictions.csv

Serve a trained model for predictions:

vespatune serve --model_path outputs/my_model --host 0.0.0.0 --port 8000

Python API

from vespatune import VespaTune

vtune = VespaTune(
    train_filename="train.csv",
    valid_filename="valid.csv",
    output="outputs/my_model",
    model_type="xgboost",  # or "lightgbm" or "catboost"
    targets=["target"],
    num_trials=100,
    time_limit=3600,
)
vtune.train()

Web UI Features

The web interface provides:

  • File Upload: Drag and drop CSV files for training and validation
  • Auto Column Detection: Automatically detects columns for target and ID selection
  • Model Selection: Choose between XGBoost, LightGBM, or CatBoost
  • Real-time Monitoring: Watch training progress with live trial updates via WebSocket
  • Metrics Visualization: View loss curves and hyperparameter importance
  • Run Management: Start, stop, and delete training runs
  • Artifact Downloads: Download trained models, configs, and ONNX exports

Parameters

Required

Parameter Description
train_filename Path to training CSV file
valid_filename Path to validation CSV file
output Path to output directory for model artifacts

Optional

Parameter Default Description
model_type "xgboost" Model to use: "xgboost", "lightgbm", or "catboost"
test_filename None Path to test CSV file (predictions saved if provided)
task None "classification" or "regression" (auto-detected if not specified)
idx "id" Name of the ID column
targets ["target"] List of target column names
features None List of feature columns (all non-id/target columns if not specified)
categorical_features None List of categorical columns (auto-detected if not specified)
use_gpu False Whether to use GPU for training
seed 42 Random seed for reproducibility
num_trials 1000 Number of Optuna trials for hyperparameter tuning
time_limit None Time limit for optimization in seconds
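num_trials and time_limit bound the same search: optimization stops at whichever limit is reached first. A minimal stdlib sketch of that stopping rule (the search loop and objective here are stand-ins for illustration, not VespaTune's or Optuna's internals):

```python
import random
import time

def tune(objective, num_trials=1000, time_limit=None, seed=42):
    """Run up to num_trials evaluations, stopping early once time_limit (seconds) elapses."""
    rng = random.Random(seed)
    start = time.monotonic()
    best_score, best_params = float("inf"), None
    for _ in range(num_trials):
        if time_limit is not None and time.monotonic() - start >= time_limit:
            break  # time budget exhausted before the trial budget
        # Sample a candidate hyperparameter set (stand-in search space).
        params = {"learning_rate": rng.uniform(0.01, 0.3),
                  "max_depth": rng.randint(2, 10)}
        score = objective(params)
        if score < best_score:
            best_score, best_params = score, params
    return best_params, best_score

# Stand-in objective: pretend a learning rate near 0.05 is best.
best_params, best_score = tune(lambda p: abs(p["learning_rate"] - 0.05), num_trials=50)
```

With time_limit=None the loop runs all num_trials evaluations; with a tight time_limit it may return after far fewer.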

Supported Models

XGBoost

  • Default model with extensive hyperparameter search
  • Supports GPU acceleration
  • Best for general-purpose tasks

LightGBM

  • Native categorical feature support
  • Fast training on large datasets
  • Supports GPU acceleration

CatBoost

  • Best native categorical feature handling
  • Robust to overfitting
  • Supports GPU acceleration

Data Splitting

VespaTune uses an explicit train/validation split. If you have a single dataset, use the splitter utility:

vespatune splitter \
  --data_filename data.csv \
  --output splits/ \
  --target target \
  --task classification \
  --num_folds 5

Or via Python:

from vespatune import VespaTuneSplitter

splitter = VespaTuneSplitter(
    data_filename="data.csv",
    output="splits/",
    target="target",
    task="classification",
    num_folds=5,
)
splitter.split()

This creates fold_0_train.csv, fold_0_valid.csv, etc. for k-fold cross-validation.
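For intuition, a stratified split of the kind the splitter performs for classification can be sketched in plain Python: group rows by class, then deal each class's rows round-robin into folds so every fold keeps roughly the overall class proportions. This is an illustration of the idea, not VespaTune's implementation:

```python
from collections import defaultdict

def stratified_folds(rows, target, num_folds=5):
    """Assign each row a fold index, balancing class counts across folds."""
    by_class = defaultdict(list)
    for i, row in enumerate(rows):
        by_class[row[target]].append(i)
    fold_of = {}
    for indices in by_class.values():
        for j, i in enumerate(indices):
            fold_of[i] = j % num_folds  # round-robin within each class
    return fold_of

# Toy data: ids 0-9 alternating between two classes.
rows = [{"id": i, "x": i * 0.1, "target": i % 2} for i in range(10)]
fold_of = stratified_folds(rows, "target", num_folds=5)

# Fold k's validation set is the rows assigned to k; the rest form its training set.
valid_0 = [r for i, r in enumerate(rows) if fold_of[i] == 0]
train_0 = [r for i, r in enumerate(rows) if fold_of[i] != 0]
```

The real splitter does the equivalent partitioning and then writes each fold's training and validation rows out as fold_k_train.csv and fold_k_valid.csv.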

Prediction

Using the trained model

from vespatune import VespaTunePredict

predictor = VespaTunePredict(model_path="outputs/my_model")
predictions = predictor.predict_file("test.csv")

Using ONNX model

from vespatune import VespaTuneONNXPredict

predictor = VespaTuneONNXPredict(model_path="onnx_model/")
predictions = predictor.predict_file("test.csv")

CLI Reference

Default (UI)

vespatune [--host HOST] [--port PORT]

options:
  --host                Host to serve on (default: 127.0.0.1)
  --port                Port to serve on (default: 9999)
  --version, -v         Display VespaTune version

train

vespatune train --help

options:
  --train_filename      Path to training file (required)
  --valid_filename      Path to validation file (required)
  --output              Path to output directory (required)
  --model               Model type: xgboost, lightgbm, catboost (default: xgboost)
  --test_filename       Path to test file
  --task                Task type: classification, regression
  --idx                 ID column name
  --targets             Target column(s), separate multiple by ';'
  --features            Feature columns, separate by ';'
  --use_gpu             Use GPU for training
  --seed                Random seed (default: 42)
  --num_trials          Number of Optuna trials (default: 100)
  --time_limit          Time limit in seconds

predict

vespatune predict --help

options:
  --model_path          Path to trained model directory (required)
  --test_filename       Path to test file (required)
  --output_filename     Path to output predictions file (required)

export

vespatune export --help

options:
  --model_path          Path to trained model directory (required)
  --output_dir          Path to ONNX output directory

serve

vespatune serve --help

options:
  --model_path          Path to ONNX export directory
  --host                Host to bind (default: 127.0.0.1)
  --port                Port to bind (default: 9999)
  --workers             Number of workers (default: 1)
  --reload              Enable auto-reload for development

splitter

vespatune splitter --help

options:
  --data_filename       Path to data file (required)
  --output              Path to output directory (required)
  --target              Target column name (required)
  --task                Task type: classification, regression (required)
  --num_folds           Number of folds (default: 5)

Output Files

After training, the following files are created in the output directory:

File Description
vtune_model.final Trained model
vtune.config Model configuration
vtune.best_params Best hyperparameters from Optuna
vtune.categorical_encoder Categorical feature encoder
vtune.target_encoder Target encoder (for classification)
params.db Optuna study database
train.feather Processed training data
valid.feather Processed validation data
onnx/ ONNX export directory (after export)
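The vtune.categorical_encoder and vtune.target_encoder artifacts exist so the string-to-integer mappings fitted at training time can be replayed exactly at prediction time. A simplified sketch of that round trip (assumed behavior for illustration, not the actual on-disk format):

```python
class SimpleLabelEncoder:
    """Map category strings to stable integer ids, and back."""

    def fit(self, values):
        self.classes_ = sorted(set(values))
        self._index = {v: i for i, v in enumerate(self.classes_)}
        return self

    def transform(self, values):
        return [self._index[v] for v in values]

    def inverse_transform(self, ids):
        return [self.classes_[i] for i in ids]

enc = SimpleLabelEncoder().fit(["red", "green", "blue", "green"])
codes = enc.transform(["green", "blue"])   # categories as integer ids
labels = enc.inverse_transform(codes)      # back to the original strings
```

Because the fitted mapping is persisted alongside the model, prediction-time inputs are encoded with the same ids the model saw during training.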

Example

from vespatune import VespaTune

# Train with LightGBM
vtune = VespaTune(
    train_filename="data/train.csv",
    valid_filename="data/valid.csv",
    output="outputs/lgb_model",
    model_type="lightgbm",
    targets=["price"],
    task="regression",
    num_trials=200,
    time_limit=1800,
    use_gpu=False,
    seed=42,
)
vtune.train()
