
vespatune: no-code training for tabular models


VespaTune

Gradient Boosting + Optuna: a no-brainer

  • Web UI for training, monitoring, and managing models
  • Tune models directly from CSV files
  • Real-time training progress with WebSocket updates
  • Export models to ONNX format for deployment

Installation

Install using pip:

pip install vespatune

Quick Start

Web UI (Recommended)

Start the web interface:

vespatune

This launches the VespaTune UI at http://127.0.0.1:9999 where you can:

  • Upload train/validation CSV files
  • Configure model type, target columns, and hyperparameters
  • Start training with real-time progress monitoring
  • View trial results and metrics
  • Download trained models and artifacts
  • Manage multiple training runs

You can also specify host and port:

vespatune --host 0.0.0.0 --port 8080

CLI

Train a model:

vespatune train \
  --train_filename train.csv \
  --valid_filename valid.csv \
  --output outputs/my_model \
  --model xgboost

Make predictions:

vespatune predict \
  --model_path outputs/my_model \
  --test_filename test.csv \
  --output_filename predictions.csv

Serve a trained model for predictions:

vespatune serve --model_path outputs/my_model --host 0.0.0.0 --port 8000

Python API

from vespatune import VespaTune

vtune = VespaTune(
    train_filename="train.csv",
    valid_filename="valid.csv",
    output="outputs/my_model",
    model_type="xgboost",  # or "lightgbm" or "catboost"
    targets=["target"],
    num_trials=100,
    time_limit=3600,
)
vtune.train()
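If you just want to see the file shape the API expects, you can generate toy CSVs first. This is a stdlib-only sketch: the "id" and "target" column names match VespaTune's documented defaults (idx="id", targets=["target"]); the feature columns and values are made up for illustration.

```python
# Write toy train/valid CSVs in the default VespaTune schema:
# an "id" column, feature columns, and a "target" column.
import csv
import random

random.seed(42)

def write_toy_csv(path, n_rows):
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "feature1", "feature2", "target"])
        for i in range(n_rows):
            writer.writerow([
                i,                                 # id column
                round(random.random(), 4),         # numeric feature
                random.choice(["A", "B", "C"]),    # categorical feature
                random.randint(0, 1),              # binary target
            ])

write_toy_csv("train.csv", 80)
write_toy_csv("valid.csv", 20)
```

Real datasets can use any column names; pass them via the idx, targets, and features parameters.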

Web UI Features

The web interface provides:

  • File Upload: Drag and drop CSV files for training and validation
  • Auto Column Detection: Automatically detects columns for target and ID selection
  • Model Selection: Choose between XGBoost, LightGBM, or CatBoost
  • Real-time Monitoring: Watch training progress with live trial updates via WebSocket
  • Metrics Visualization: View loss curves and hyperparameter importance
  • Run Management: Start, stop, and delete training runs
  • Artifact Downloads: Download trained models, configs, and ONNX exports

Parameters

Required

Parameter        Description
train_filename   Path to training CSV file
valid_filename   Path to validation CSV file
output           Path to output directory for model artifacts

Optional

Parameter             Default      Description
model_type            "xgboost"    Model to use: "xgboost", "lightgbm", "catboost", or "logreg"
test_filename         None         Path to test CSV file (predictions saved if provided)
task                  None         "classification" or "regression" (auto-detected if not specified)
idx                   "id"         Name of the ID column
targets               ["target"]   List of target column names
features              None         List of feature columns (all non-id/target columns if not specified)
categorical_features  None         List of categorical columns (auto-detected if not specified)
use_gpu               False        Whether to use GPU for training
seed                  42           Random seed for reproducibility
num_trials            1000         Number of Optuna trials for hyperparameter tuning
time_limit            None         Time limit for optimization in seconds
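num_trials and time_limit bound the same search: optimization stops at whichever limit is hit first. A stdlib-only sketch of that stopping logic (random sampling stands in for Optuna's sampler, and the objective is a toy function, not VespaTune's):

```python
# Toy illustration of how num_trials and time_limit bound a tuning loop.
import random
import time

def tune(objective, num_trials=1000, time_limit=None, seed=42):
    rng = random.Random(seed)
    deadline = None if time_limit is None else time.monotonic() + time_limit
    best_params, best_score = None, float("inf")
    for _ in range(num_trials):
        if deadline is not None and time.monotonic() >= deadline:
            break  # time budget exhausted before all trials ran
        # Sample a candidate configuration (random search stand-in).
        params = {"learning_rate": rng.uniform(0.01, 0.3),
                  "max_depth": rng.randint(2, 10)}
        score = objective(params)
        if score < best_score:
            best_params, best_score = params, score
    return best_params, best_score

# Toy objective: pretend the optimum is learning_rate=0.1, max_depth=6.
best, score = tune(
    lambda p: abs(p["learning_rate"] - 0.1) + abs(p["max_depth"] - 6),
    num_trials=200,
    time_limit=5,
)
```

Whichever budget you hit first ends the search; the best parameters found so far are kept.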

Supported Models

XGBoost

  • Default model with extensive hyperparameter search
  • Supports GPU acceleration
  • Best for general-purpose tasks

LightGBM

  • Native categorical feature support
  • Fast training on large datasets
  • Supports GPU acceleration

CatBoost

  • Best native categorical feature handling
  • Robust to overfitting
  • Supports GPU acceleration

Logistic Regression

  • Linear model for classification tasks only
  • Searches over preprocessing (imputation, scaling) and regularization
  • Fast training, interpretable coefficients

Data Splitting

VespaTune uses an explicit train/validation split. If you have a single dataset, use the splitter utility:

vespatune splitter \
  --data_filename data.csv \
  --output splits/ \
  --target target \
  --task classification \
  --num_folds 5

Or via Python:

from vespatune import VespaTuneSplitter

splitter = VespaTuneSplitter(
    data_filename="data.csv",
    output="splits/",
    target="target",
    task="classification",
    num_folds=5,
)
splitter.split()

This creates fold_0_train.csv, fold_0_valid.csv, etc. for k-fold cross-validation.
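The fold layout can be illustrated with a stdlib-only sketch. This shows the idea of stratified fold assignment and the fold_k_train.csv / fold_k_valid.csv naming; it is not VespaTune's implementation (use `vespatune splitter` for real work):

```python
# Sketch of stratified k-fold splitting with the fold_*_train.csv /
# fold_*_valid.csv output layout described above.
import csv
import random
from collections import defaultdict

def stratified_kfold(rows, target, num_folds=5, seed=42):
    """Assign each row a fold id, keeping class proportions per fold."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, row in enumerate(rows):
        by_class[row[target]].append(idx)
    fold_of = {}
    for indices in by_class.values():
        rng.shuffle(indices)
        for pos, idx in enumerate(indices):
            fold_of[idx] = pos % num_folds
    return fold_of

def write_folds(data_filename, output_dir, target, num_folds=5):
    with open(data_filename, newline="") as f:
        reader = csv.DictReader(f)
        fieldnames, rows = reader.fieldnames, list(reader)
    fold_of = stratified_kfold(rows, target, num_folds)
    for k in range(num_folds):
        for split in ("train", "valid"):
            # Fold k's rows become the validation set; the rest train.
            keep = [r for i, r in enumerate(rows)
                    if (fold_of[i] == k) == (split == "valid")]
            path = f"{output_dir}/fold_{k}_{split}.csv"
            with open(path, "w", newline="") as f:
                writer = csv.DictWriter(f, fieldnames=fieldnames)
                writer.writeheader()
                writer.writerows(keep)
```

Each fold's validation file holds roughly 1/num_folds of the data with class proportions preserved.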

Prediction

Using the trained model

from vespatune import VespaTunePredict

predictor = VespaTunePredict(model_path="outputs/my_model")

# Predict on file
predictor.predict_file("test.csv", "predictions.csv")

# Predict single sample
prediction = predictor.predict_single({"feature1": 1.0, "feature2": "A"})

Using ONNX model

from vespatune import VespaTuneONNXPredict

predictor = VespaTuneONNXPredict(model_path="onnx_model/")

# Predict on file
predictor.predict_file("test.csv", "predictions.csv")

# Predict single sample
prediction = predictor.predict_single({"feature1": 1.0, "feature2": "A"})

Standalone Preprocessing

Use VespaTuneProcessor when you want to preprocess data independently and pass it to an external ONNX runtime or inference system:

from vespatune import VespaTuneProcessor
import onnxruntime as ort

# Load preprocessor from model or ONNX export directory
processor = VespaTuneProcessor(model_path="outputs/my_model")

# Transform DataFrame
processed = processor.transform(df)  # Returns float32 numpy array

# Transform single sample
processed = processor.transform_single({"feature1": 1.0, "feature2": "A"})

# Get feature metadata
processor.get_feature_names()        # Input feature names
processor.get_categorical_features() # Categorical feature names
processor.get_feature_names_out()    # Output feature names after transform
processor.get_input_schema()         # Pydantic schema for API validation

# Pass to ONNX runtime
session = ort.InferenceSession("model.onnx")
predictions = session.run(None, {"input": processed})

CLI Reference

Default (UI)

vespatune [--host HOST] [--port PORT]

options:
  --host                Host to serve on (default: 127.0.0.1)
  --port                Port to serve on (default: 9999)
  --version, -v         Display VespaTune version

train

vespatune train --help

options:
  --train_filename      Path to training file (required)
  --valid_filename      Path to validation file (required)
  --output              Path to output directory (required)
  --model               Model type: xgboost, lightgbm, catboost, logreg (default: xgboost)
  --test_filename       Path to test file
  --task                Task type: classification, regression
  --idx                 ID column name
  --targets             Target column(s), separate multiple by ';'
  --features            Feature columns, separate by ';'
  --use_gpu             Use GPU for training
  --seed                Random seed (default: 42)
  --num_trials          Number of Optuna trials (default: 100)
  --time_limit          Time limit in seconds

predict

vespatune predict --help

options:
  --model_path          Path to trained model directory (required)
  --test_filename       Path to test file (required)
  --output_filename     Path to output predictions file (required)

export

vespatune export --help

options:
  --model_path          Path to trained model directory (required)
  --output_dir          Path to ONNX output directory

serve

vespatune serve --help

options:
  --model_path          Path to ONNX export directory
  --host                Host to bind (default: 127.0.0.1)
  --port                Port to bind (default: 9999)
  --workers             Number of workers (default: 1)
  --reload              Enable auto-reload for development

splitter

vespatune splitter --help

options:
  --data_filename       Path to data file (required)
  --output              Path to output directory (required)
  --target              Target column name (required)
  --task                Task type: classification, regression (required)
  --num_folds           Number of folds (default: 5)

Output Files

After training, the following files are created in the output directory:

File                       Description
vtune_model.final          Trained model
vtune.config               Model configuration
vtune.best_params          Best hyperparameters from Optuna
vtune.preprocessor.joblib  Fitted preprocessor (encoding, scaling, imputation)
vtune.target_encoder       Target encoder (for classification)
params.db                  Optuna study database
train.feather              Processed training data
valid.feather              Processed validation data
onnx/                      ONNX export directory (after export)

Example

from vespatune import VespaTune

# Train with LightGBM
vtune = VespaTune(
    train_filename="data/train.csv",
    valid_filename="data/valid.csv",
    output="outputs/lgb_model",
    model_type="lightgbm",
    targets=["price"],
    task="regression",
    num_trials=200,
    time_limit=1800,
    use_gpu=False,
    seed=42,
)
vtune.train()

Download files

Source Distribution
vespatune-0.0.4.tar.gz (74.2 kB)
SHA256: f6e8d6ff092242d3522adc495b3a756a56c0833ee836ae73b0da58122699903b

Built Distribution
vespatune-0.0.4-py3-none-any.whl (68.6 kB)
SHA256: e62d352119be810cfd7cf9d64be49f9cd16e952e7fe13a50b18c0ee15f67ca47