VespaTune: no-code training for tabular models
Gradient Boosting + Optuna: a no-brainer.
- Web UI for training, monitoring, and managing models
- Tune models directly from CSV files
- Real-time training progress with WebSocket updates
- Export models to ONNX format for deployment
Installation
Install using pip:
pip install vespatune
Quick Start
Web UI (Recommended)
Start the web interface:
vespatune
This launches the VespaTune UI at http://127.0.0.1:9999, where you can:
- Upload train/validation CSV files
- Configure model type, target columns, and hyperparameters
- Start training with real-time progress monitoring
- View trial results and metrics
- Download trained models and artifacts
- Manage multiple training runs
You can also specify host and port:
vespatune --host 0.0.0.0 --port 8080
CLI
Train a model with explicit train/valid split:
vespatune train \
  --train_filename train.csv \
  --valid_filename valid.csv \
  --output outputs/my_model \
  --model xgboost
Or let VespaTune auto-split your data:
vespatune train \
  --train_filename data.csv \
  --output outputs/my_model \
  --model xgboost
Make predictions:
vespatune predict \
  --model_path outputs/my_model \
  --test_filename test.csv \
  --output_filename predictions.csv
Serve a trained model for predictions:
vespatune serve --model_path outputs/my_model --host 0.0.0.0 --port 8000
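You can then POST rows to the server from any HTTP client. The /predict route and flat JSON payload below are illustrative assumptions, not a documented contract; check the running server's API docs for the actual endpoints:

import requests

# Hypothetical client call: the /predict route and payload shape are
# assumptions for illustration, not a documented VespaTune contract.
resp = requests.post(
    "http://127.0.0.1:8000/predict",
    json={"feature1": 1.0, "feature2": "A"},
)
print(resp.json())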
Python API
from vespatune import VespaTune

# With explicit validation file
vtune = VespaTune(
    train_filename="train.csv",
    valid_filename="valid.csv",
    output="outputs/my_model",
    model_type="xgboost",  # or "lightgbm", "catboost", "logreg"
    targets=["target"],
    num_trials=100,
    time_limit=3600,
)
vtune.train()

# Or with auto-split (no validation file needed)
vtune = VespaTune(
    train_filename="data.csv",
    output="outputs/my_model",
    model_type="xgboost",
    targets=["target"],
    num_trials=100,
)
vtune.train()
Web UI Features
The web interface provides:
- File Upload: Drag and drop CSV files for training (validation file is optional)
- Auto-Split: If no validation file is provided, automatically splits training data
- Auto Column Detection: Automatically detects columns for target and ID selection
- Model Selection: Choose between XGBoost, LightGBM, or CatBoost
- Real-time Monitoring: Watch training progress with live trial updates via WebSocket
- Metrics Visualization: View loss curves and hyperparameter importance
- Run Management: Start, stop, and delete training runs
- Artifact Downloads: Download trained models, configs, and ONNX exports
Parameters
Required
| Parameter | Description |
|---|---|
| train_filename | Path to training CSV file |
| output | Path to output directory for model artifacts |
Optional
| Parameter | Default | Description |
|---|---|---|
| valid_filename | None | Path to validation CSV file (auto-splits training data if not provided) |
| model_type | "xgboost" | Model to use: "xgboost", "lightgbm", "catboost", or "logreg" |
| test_filename | None | Path to test CSV file (predictions saved if provided) |
| task | None | "classification" or "regression" (auto-detected if not specified) |
| idx | "id" | Name of the ID column |
| targets | ["target"] | List of target column names |
| features | None | List of feature columns (all non-id/target columns if not specified) |
| categorical_features | None | List of categorical columns (auto-detected if not specified) |
| use_gpu | False | Whether to use GPU for training |
| seed | 42 | Random seed for reproducibility |
| num_trials | 1000 | Number of Optuna trials for hyperparameter tuning |
| time_limit | None | Time limit for optimization in seconds |
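As a quick illustration of how these options combine, the call below exercises most of them; the column names (customer_id, churn, plan, region) are placeholders for your own data, not part of the API:

from vespatune import VespaTune

# Illustrative only: column names below are placeholders for your own data.
vtune = VespaTune(
    train_filename="train.csv",
    valid_filename="valid.csv",
    test_filename="test.csv",  # predictions for this file are saved after training
    output="outputs/churn_model",
    model_type="catboost",
    task="classification",
    idx="customer_id",
    targets=["churn"],
    categorical_features=["plan", "region"],
    use_gpu=False,
    seed=42,
    num_trials=50,
    time_limit=600,
)
vtune.train()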
Supported Models
XGBoost
- Default model with extensive hyperparameter search
- Supports GPU acceleration
- Best for general-purpose tasks
LightGBM
- Native categorical feature support
- Fast training on large datasets
- Supports GPU acceleration
CatBoost
- Best native categorical feature handling
- Robust to overfitting
- Supports GPU acceleration
Logistic Regression
- Linear model for classification tasks only
- Searches over preprocessing (imputation, scaling) and regularization (see the sketch after this list)
- Fast training, interpretable coefficients
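For intuition, a preprocessing-plus-regularization search can be expressed as an Optuna objective like the sketch below. This is an illustration of the idea on toy data, not VespaTune's actual search space:

import optuna
from sklearn.datasets import make_classification
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Toy data so the sketch runs end to end.
X, y = make_classification(n_samples=500, random_state=42)

def objective(trial):
    # Search over imputation strategy, optional scaling, and regularization strength.
    steps = [SimpleImputer(strategy=trial.suggest_categorical("impute", ["mean", "median"]))]
    if trial.suggest_categorical("scale", [True, False]):
        steps.append(StandardScaler())
    steps.append(LogisticRegression(C=trial.suggest_float("C", 1e-3, 1e2, log=True), max_iter=1000))
    return cross_val_score(make_pipeline(*steps), X, y, cv=3).mean()

study = optuna.create_study(direction="maximize")
study.optimize(objective, n_trials=50)
print(study.best_params)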
Data Splitting
VespaTune supports two modes:
- Explicit split: Provide both train_filename and valid_filename
- Auto-split: Provide only train_filename; VespaTune automatically creates a 5-fold split and uses fold 0 (80% train, 20% valid), as sketched below
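The auto-split is roughly equivalent to taking fold 0 of a shuffled 5-fold split. The snippet below illustrates that behavior with scikit-learn; it is not VespaTune's internal code, and details such as stratification may differ:

import pandas as pd
from sklearn.model_selection import KFold

df = pd.read_csv("data.csv")
kf = KFold(n_splits=5, shuffle=True, random_state=42)
train_idx, valid_idx = next(iter(kf.split(df)))  # fold 0: ~80% train, ~20% valid
train_df, valid_df = df.iloc[train_idx], df.iloc[valid_idx]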
For manual control over splits, use the splitter utility:
vespatune splitter \
  --data_filename data.csv \
  --output splits/ \
  --target target \
  --task classification \
  --num_folds 5
Or via Python:
from vespatune import VespaTuneSplitter

splitter = VespaTuneSplitter(
    data_filename="data.csv",
    output="splits/",
    target="target",
    task="classification",
    num_folds=5,
)
splitter.split()
This creates fold_0_train.csv, fold_0_valid.csv, and so on, which you can feed back into VespaTune for k-fold cross-validation, as sketched below.
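A minimal cross-validation loop over those files, using only the VespaTune parameters documented above:

from vespatune import VespaTune

# Train one model per fold produced by VespaTuneSplitter.
for fold in range(5):
    vtune = VespaTune(
        train_filename=f"splits/fold_{fold}_train.csv",
        valid_filename=f"splits/fold_{fold}_valid.csv",
        output=f"outputs/fold_{fold}",
        model_type="xgboost",
        targets=["target"],
        num_trials=50,
    )
    vtune.train()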
Prediction
Using the trained model
from vespatune import VespaTunePredict
predictor = VespaTunePredict(model_path="outputs/my_model")
# Predict on file
predictor.predict_file("test.csv", "predictions.csv")
# Predict single sample
prediction = predictor.predict_single({"feature1": 1.0, "feature2": "A"})
Using ONNX model
from vespatune import VespaTuneONNXPredict
predictor = VespaTuneONNXPredict(model_path="onnx_model/")
# Predict on file
predictor.predict_file("test.csv", "predictions.csv")
# Predict single sample
prediction = predictor.predict_single({"feature1": 1.0, "feature2": "A"})
Standalone Preprocessing
Use VespaTuneProcessor when you want to preprocess data independently and pass it to an external ONNX runtime or inference system:
from vespatune import VespaTuneProcessor
import onnxruntime as ort
# Load preprocessor from model or ONNX export directory
processor = VespaTuneProcessor(model_path="outputs/my_model")
# Transform DataFrame
processed = processor.transform(df) # Returns float32 numpy array
# Transform single sample
processed = processor.transform_single({"feature1": 1.0, "feature2": "A"})
# Get feature metadata
processor.get_feature_names() # Input feature names
processor.get_categorical_features() # Categorical feature names
processor.get_feature_names_out() # Output feature names after transform
processor.get_input_schema() # Pydantic schema for API validation
# Pass to ONNX runtime
session = ort.InferenceSession("model.onnx")
predictions = session.run(None, {"input": processed})
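One possible integration is to put the preprocessor and ONNX session behind a small FastAPI service. This sketch assumes get_input_schema() returns a Pydantic model class (as its comment above suggests) and that transform_single() returns an array the session accepts; adjust shapes to your model:

from fastapi import FastAPI
import onnxruntime as ort
from vespatune import VespaTuneProcessor

processor = VespaTuneProcessor(model_path="outputs/my_model")
InputSchema = processor.get_input_schema()  # assumed to be a Pydantic model class
session = ort.InferenceSession("model.onnx")
app = FastAPI()

@app.post("/predict")
def predict(sample: InputSchema):
    # model_dump() is Pydantic v2; use .dict() on v1.
    processed = processor.transform_single(sample.model_dump())
    outputs = session.run(None, {"input": processed})
    return {"prediction": outputs[0].tolist()}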
CLI Reference
Default (UI)
vespatune [--host HOST] [--port PORT]
options:
--host Host to serve on (default: 127.0.0.1)
--port Port to serve on (default: 9999)
--version, -v Display VespaTune version
train
vespatune train --help
options:
--train_filename Path to training file (required)
--valid_filename Path to validation file (optional, auto-splits if not provided)
--output Path to output directory (required)
--model Model type: xgboost, lightgbm, catboost, logreg (default: xgboost)
--test_filename Path to test file
--task Task type: classification, regression
--idx ID column name
--targets Target column(s); separate multiple with ';' (see example below)
--features Feature columns, separate by ';'
--use_gpu Use GPU for training
--seed Random seed (default: 42)
--num_trials Number of Optuna trials (default: 100)
--time_limit Time limit in seconds
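For example, multiple targets go in a single ';'-separated string (target1 and target2 are placeholder column names):

vespatune train \
  --train_filename train.csv \
  --output outputs/multi_target \
  --targets "target1;target2"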
predict
vespatune predict --help
options:
--model_path Path to trained model directory (required)
--test_filename Path to test file (required)
--output_filename Path to output predictions file (required)
export
vespatune export --help
options:
--model_path Path to trained model directory (required)
--output_dir Path to ONNX output directory
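A typical invocation, writing the export into an onnx/ directory alongside the model artifacts (the exact default location may differ):

vespatune export \
  --model_path outputs/my_model \
  --output_dir outputs/my_model/onnx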
serve
vespatune serve --help
options:
--model_path Path to ONNX export directory
--host Host to bind (default: 127.0.0.1)
--port Port to bind (default: 9999)
--workers Number of workers (default: 1)
--reload Enable auto-reload for development
splitter
vespatune splitter --help
options:
--data_filename Path to data file (required)
--output Path to output directory (required)
--target Target column name (required)
--task Task type: classification, regression (required)
--num_folds Number of folds (default: 5)
Output Files
After training, the following files are created in the output directory:
| File | Description |
|---|---|
| vtune_model.final | Trained model |
| vtune.config | Model configuration |
| vtune.best_params | Best hyperparameters from Optuna |
| vtune.preprocessor.joblib | Fitted preprocessor (encoding, scaling, imputation) |
| vtune.target_encoder | Target encoder (for classification) |
| params.db | Optuna study database |
| train.feather | Processed training data |
| valid.feather | Processed validation data |
| onnx/ | ONNX export directory (after export) |
| _splits/ | Auto-generated train/valid splits (only if no validation file provided) |
Example
from vespatune import VespaTune

# Train with LightGBM
vtune = VespaTune(
    train_filename="data/train.csv",
    valid_filename="data/valid.csv",
    output="outputs/lgb_model",
    model_type="lightgbm",
    targets=["price"],
    task="regression",
    num_trials=200,
    time_limit=1800,
    use_gpu=False,
    seed=42,
)
vtune.train()
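Once training completes, the saved model can be reused with the prediction API shown earlier:

from vespatune import VespaTunePredict

predictor = VespaTunePredict(model_path="outputs/lgb_model")
predictor.predict_file("data/test.csv", "predictions.csv")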