Neo LS-SVM
Neo LS-SVM is a modern Least-Squares Support Vector Machine implementation in Python that offers several benefits over sklearn's classic sklearn.svm.SVC classifier and sklearn.svm.SVR regressor:
- ⚡ Linear complexity in the number of training examples with Orthogonal Random Features.
- 🚀 Hyperparameter free: zero-cost optimization of the regularisation parameter γ and kernel parameter σ.
- 🏔️ Adds a new tertiary objective that minimizes the complexity of the prediction surface.
- 🎁 Returns the leave-one-out residuals and error for free after fitting.
- 🌀 Learns an affine transformation of the feature matrix to optimally separate the target's bins.
- 🪞 Can solve the LS-SVM both in the primal and dual space.
- 🌡️ Isotonically calibrated predict_proba based on the leave-one-out predictions.
- 🎲 Asymmetric conformal Bayesian confidence intervals for classification and regression.
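The "leave-one-out residuals for free" and "hyperparameter free" claims can be illustrated with a standard identity for ridge-regularised least squares. Suppose ŷ = Hy with hat matrix H. Then the residual of example i, when i is left out of the fit, equals (y_i - ŷ_i)/(1 - H_ii), so no refitting is needed. After a single SVD of the feature matrix, each candidate regularisation value costs only O(nD).

The NumPy sketch below combines this identity with orthogonal random features. It is not the library's implementation, and it rests on several assumptions:

- orthogonal_random_features and loo_residuals are hypothetical helpers.
- γ acts as a ridge penalty on the primal weights, whereas Neo LS-SVM may parametrise γ differently.
- There is no bias term.
- The kernel bandwidth σ is fixed rather than optimised.

```python
import numpy as np


def orthogonal_random_features(X, num_features, sigma, rng):
    """Hypothetical helper: random cosine features approximating an RBF kernel with bandwidth sigma.

    Each block of d projection rows is a random orthogonal matrix whose rows are
    rescaled by chi-distributed norms, instead of i.i.d. Gaussian rows.
    """
    d = X.shape[1]
    blocks = []
    for _ in range(-(-num_features // d)):  # ceil(num_features / d) blocks
        Q = np.linalg.qr(rng.standard_normal((d, d)))[0]  # d x d, orthonormal rows
        norms = np.sqrt(rng.chisquare(df=d, size=d))  # norms of d-dim Gaussian vectors
        blocks.append(norms[:, None] * Q)
    W = np.vstack(blocks)[:num_features] / sigma  # (num_features, d) projections
    b = rng.uniform(0.0, 2.0 * np.pi, size=num_features)  # random phases
    return np.sqrt(2.0 / num_features) * np.cos(X @ W.T + b)


def loo_residuals(Phi, y, gammas):
    """Hypothetical helper: exact leave-one-out residuals of ridge regression on Phi.

    Assumes gamma is a ridge penalty on the primal weights and there is no bias term.
    """
    U, s, _ = np.linalg.svd(Phi, full_matrices=False)  # Phi = U @ diag(s) @ Vt
    Uty = U.T @ y
    residuals = {}
    for gamma in gammas:
        shrink = s**2 / (s**2 + gamma)  # nonzero eigenvalues of the hat matrix H
        y_hat = U @ (shrink * Uty)  # H @ y
        h = np.einsum("ij,j,ij->i", U, shrink, U)  # diag(H)
        residuals[gamma] = (y - y_hat) / (1.0 - h)
    return residuals


# Illustrative usage on synthetic data: pick gamma with the lowest LOO mean squared error.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
y = np.sin(X[:, 0]) + 0.1 * rng.standard_normal(200)
Phi = orthogonal_random_features(X, num_features=64, sigma=1.0, rng=rng)
gammas = [1e-3, 1e-2, 1e-1, 1.0, 10.0]
loo = loo_residuals(Phi, y, gammas)
best_gamma = min(gammas, key=lambda g: np.mean(loo[g] ** 2))
```

Because the identity is exact, these residuals match a brute-force loop that refits the model n times. Yet they cost only one SVD plus a few O(nD) operations per candidate γ.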
Using
Installing
First, install this package with:
pip install neo-ls-svm
Classification and regression
Then, you can import neo_ls_svm.NeoLSSVM as an sklearn-compatible binary classifier and regressor. Example usage:
from neo_ls_svm import NeoLSSVM
from pandas import get_dummies
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
# Binary classification example:
X, y = fetch_openml("churn", version=3, return_X_y=True, as_frame=True, parser="auto")
X_train, X_test, y_train, y_test = train_test_split(get_dummies(X), y, test_size=0.15, random_state=42)
model = NeoLSSVM().fit(X_train, y_train)
model.score(X_test, y_test) # 93.1% (compared to sklearn.svm.SVC's 89.6%)
# Regression example:
X, y = fetch_openml("ames_housing", version=1, return_X_y=True, as_frame=True, parser="auto")
X_train, X_test, y_train, y_test = train_test_split(get_dummies(X), y, test_size=0.15, random_state=42)
model = NeoLSSVM().fit(X_train, y_train)
model.score(X_test, y_test) # 82.4% (compared to sklearn.svm.SVR's -11.8%)
Confidence intervals
Neo LS-SVM implements conformal prediction with a Bayesian nonconformity estimate to compute confidence intervals for both classification and regression. Example usage:
from neo_ls_svm import NeoLSSVM
from pandas import get_dummies
from sklearn.datasets import fetch_openml
from sklearn.model_selection import train_test_split
# Load a regression problem and split it into a train and test set.
X, y = fetch_openml("ames_housing", version=1, return_X_y=True, as_frame=True, parser="auto")
X_train, X_test, y_train, y_test = train_test_split(get_dummies(X), y, test_size=50, random_state=42)
# Fit a Neo LS-SVM model.
model = NeoLSSVM().fit(X_train, y_train)
# Predict the house prices and confidence intervals on the test set.
ŷ = model.predict(X_test)
ŷ_conf = model.predict_proba(X_test, confidence_interval=True, confidence_level=0.95)
# ŷ_conf[:, 0] and ŷ_conf[:, 1] are the lower and upper bounds of the confidence interval for the predictions ŷ, respectively.
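Conformal prediction turns a point predictor into intervals with a finite-sample marginal coverage guarantee, assuming the data are exchangeable. The sketch below shows the simplest variant, plain split conformal prediction. It builds symmetric intervals from the absolute residuals on a held-out calibration set.

This is not Neo LS-SVM's method, which uses a Bayesian nonconformity estimate and yields asymmetric intervals. The helper split_conformal_interval is hypothetical, and a NumPy least-squares fit stands in for the model.

```python
import numpy as np


def split_conformal_interval(y_calib, y_calib_pred, y_test_pred, coverage=0.95):
    """Hypothetical helper: symmetric split-conformal intervals from calibration residuals."""
    scores = np.sort(np.abs(np.asarray(y_calib) - np.asarray(y_calib_pred)))  # nonconformity scores
    n = len(scores)
    k = int(np.ceil((n + 1) * coverage))  # rank of the finite-sample corrected quantile
    q = scores[k - 1] if k <= n else np.inf  # too few calibration points: unbounded interval
    y_test_pred = np.asarray(y_test_pred, dtype=float)
    return np.column_stack([y_test_pred - q, y_test_pred + q])  # columns: lower, upper


# Illustrative usage on synthetic data with a least-squares stand-in model.
rng = np.random.default_rng(42)
X = rng.standard_normal((600, 2))
y = X @ np.array([2.0, -1.0]) + rng.standard_normal(600)
X_fit, y_fit = X[:300], y[:300]  # used to fit the model
X_cal, y_cal = X[300:500], y[300:500]  # held out to calibrate the interval width
X_new, y_new = X[500:], y[500:]  # new points to predict
coef, *_ = np.linalg.lstsq(X_fit, y_fit, rcond=None)
intervals = split_conformal_interval(y_cal, X_cal @ coef, X_new @ coef, coverage=0.95)
```

The output uses the same layout as described for ŷ_conf: column 0 holds the lower bound and column 1 the upper bound.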
Let's visualize the confidence intervals on the test set with the following code:
import matplotlib.pyplot as plt
import matplotlib.ticker as ticker
import numpy as np
idx = np.argsort(-ŷ)
y_ticks = np.arange(1, len(X_test) + 1)
plt.figure(figsize=(4, 5))
plt.barh(y_ticks, ŷ_conf[idx, 1] - ŷ_conf[idx, 0], left=ŷ_conf[idx, 0], label="95% Confidence interval", color="lightblue")
plt.plot(y_test.iloc[idx], y_ticks, "s", markersize=3, markerfacecolor="none", markeredgecolor="cornflowerblue", label="Actual value")
plt.plot(ŷ[idx], y_ticks, "s", color="mediumblue", markersize=0.6, label="Predicted value")
plt.xlabel("House price")
plt.ylabel("Test house index")
plt.yticks(y_ticks, y_ticks)
plt.tick_params(axis="y", labelsize=6)
plt.grid(axis="x", color="lightsteelblue", linestyle=":", linewidth=0.5)
plt.gca().xaxis.set_major_formatter(ticker.StrMethodFormatter('${x:,.0f}'))
plt.gca().spines["top"].set_visible(False)
plt.gca().spines["right"].set_visible(False)
plt.legend()
plt.tight_layout()
plt.show()
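The plot lets you eyeball how many actual prices fall inside their interval. A small helper can put numbers on this with two summaries:

- The empirical coverage, meaning the fraction of actual values inside their interval. For a well-calibrated method it should hover around the requested level on enough test data.
- The mean interval width.

The helper interval_diagnostics and the toy arrays below are hypothetical. The sketch only assumes that intervals are stored as (lower, upper) columns, as ŷ_conf is described above.

```python
import numpy as np


def interval_diagnostics(y_true, intervals):
    """Hypothetical helper: empirical coverage and mean width of (lower, upper) intervals."""
    y_true = np.asarray(y_true, dtype=float)
    lower, upper = intervals[:, 0], intervals[:, 1]
    covered = (y_true >= lower) & (y_true <= upper)  # actual value inside its interval
    return covered.mean(), (upper - lower).mean()


# Toy house prices and intervals, made up for illustration.
y_true = np.array([100_000.0, 150_000.0, 200_000.0, 250_000.0])
intervals = np.array([
    [90_000.0, 120_000.0],
    [160_000.0, 190_000.0],  # misses its actual value of 150,000
    [180_000.0, 230_000.0],
    [240_000.0, 260_000.0],
])
coverage, width = interval_diagnostics(y_true, intervals)  # 3 of 4 values covered
```

With the variables from the example above, this could be called as interval_diagnostics(y_test.to_numpy(), ŷ_conf). That call assumes ŷ_conf is an (n, 2) NumPy array.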
Benchmarks
We select all binary classification and regression datasets below 1M entries from the AutoML Benchmark. Each dataset is split into 85% for training and 15% for testing. We apply skrub.TableVectorizer as a preprocessing step for neo_ls_svm.NeoLSSVM, sklearn.svm.SVC, and sklearn.svm.SVR to vectorize the pandas DataFrame training data into a NumPy array. Models are fitted only once on each dataset, with their default settings and no hyperparameter tuning.
Binary classification
ROC-AUC on 15% test set:
dataset | LGBMClassifier | NeoLSSVM | SVC |
---|---|---|---|
ada | 🥇 90.9% (0.1s) | 🥇 90.9% (0.8s) | 83.1% (1.0s) |
adult | 🥇 93.0% (0.5s) | 🥈 89.1% (6.0s) | / |
amazon_employee_access | 🥇 85.6% (0.5s) | 🥈 64.5% (2.8s) | / |
arcene | 🥈 78.0% (0.6s) | 70.0% (4.4s) | 🥇 82.0% (3.4s) |
australian | 🥇 88.3% (0.2s) | 79.9% (0.4s) | 🥈 81.9% (0.0s) |
bank-marketing | 🥇 93.5% (0.3s) | 🥈 91.0% (4.1s) | / |
blood-transfusion-service-center | 62.0% (0.1s) | 🥇 71.0% (0.5s) | 🥈 69.7% (0.0s) |
churn | 🥇 91.7% (0.4s) | 🥈 81.0% (0.8s) | 70.6% (0.8s) |
click_prediction_small | 🥇 67.7% (0.4s) | 🥈 66.6% (3.3s) | / |
jasmine | 🥇 86.1% (0.3s) | 79.5% (1.2s) | 🥈 85.3% (1.8s) |
kc1 | 🥇 78.9% (0.2s) | 🥈 76.6% (0.5s) | 45.7% (0.2s) |
kr-vs-kp | 🥇 100.0% (0.2s) | 99.2% (0.8s) | 🥈 99.4% (0.6s) |
madeline | 🥇 93.1% (0.4s) | 65.6% (0.8s) | 🥈 82.5% (4.5s) |
ozone-level-8hr | 🥈 91.2% (0.3s) | 🥇 91.6% (0.7s) | 72.8% (0.2s) |
pc4 | 🥇 95.3% (0.3s) | 🥈 90.9% (0.5s) | 25.7% (0.1s) |
phishingwebsites | 🥇 99.5% (0.3s) | 🥈 98.9% (1.3s) | 98.7% (2.6s) |
phoneme | 🥇 95.6% (0.2s) | 🥈 93.5% (0.8s) | 91.2% (0.7s) |
qsar-biodeg | 🥇 92.7% (0.2s) | 🥈 91.1% (1.2s) | 86.8% (0.1s) |
satellite | 🥈 98.7% (0.2s) | 🥇 99.5% (0.8s) | 98.5% (0.1s) |
sylvine | 🥇 98.5% (0.2s) | 🥈 97.1% (0.8s) | 96.5% (1.0s) |
wilt | 🥈 99.5% (0.2s) | 🥇 99.8% (0.9s) | 98.9% (0.2s) |
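ROC-AUC equals the probability that a model scores a randomly chosen positive example above a randomly chosen negative one. This is the normalised Mann-Whitney U statistic. Random scoring therefore achieves 50%. Values below 50%, such as SVC's on kc1 and pc4, mean the scores rank the two classes worse than chance.

The helper roc_auc below is a hypothetical NumPy sketch of this definition. The benchmark does not state which implementation it used to compute ROC-AUC.

```python
import numpy as np


def roc_auc(y_true, scores):
    """Hypothetical helper: P(random positive scores above random negative); ties count 1/2.

    Compares all positive/negative pairs, so memory grows as n_pos * n_neg:
    fine for a sketch, not for large test sets.
    """
    y_true = np.asarray(y_true, dtype=bool)
    scores = np.asarray(scores, dtype=float)
    pos, neg = scores[y_true], scores[~y_true]
    greater = np.sum(pos[:, None] > neg[None, :])  # correctly ordered pairs
    ties = np.sum(pos[:, None] == neg[None, :])  # tied pairs
    return (greater + 0.5 * ties) / (pos.size * neg.size)


y_true = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.4, 0.35, 0.8])
auc = roc_auc(y_true, scores)  # 3 of the 4 positive/negative pairs are ordered correctly
flipped = roc_auc(y_true, -scores)  # reversing the ranking gives 1 - auc when there are no ties
```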
Regression
R² on 15% test set:
dataset | LGBMRegressor | NeoLSSVM | SVR |
---|---|---|---|
abalone | 🥈 56.2% (0.1s) | 🥇 59.5% (1.1s) | 51.3% (0.2s) |
boston | 🥇 91.7% (0.2s) | 🥈 89.3% (0.4s) | 35.1% (0.0s) |
brazilian_houses | 🥈 55.9% (0.4s) | 🥇 88.3% (1.5s) | 5.4% (2.0s) |
colleges | 🥇 58.5% (0.4s) | 🥈 43.7% (4.1s) | 40.2% (5.1s) |
diamonds | 🥇 98.2% (0.7s) | 🥈 95.2% (4.5s) | / |
elevators | 🥇 87.7% (0.4s) | 🥈 82.6% (2.6s) | / |
house_16h | 🥇 67.7% (0.3s) | 🥈 52.8% (2.4s) | / |
house_prices_nominal | 🥇 89.0% (0.6s) | 🥈 78.2% (1.3s) | -2.9% (0.3s) |
house_sales | 🥇 89.2% (1.3s) | 🥈 77.8% (2.2s) | / |
mip-2016-regression | 🥇 59.2% (0.4s) | 🥈 34.9% (2.6s) | -27.3% (0.1s) |
moneyball | 🥇 93.2% (0.2s) | 🥈 91.2% (0.6s) | 0.8% (0.1s) |
pol | 🥇 98.7% (0.3s) | 🥈 75.2% (1.7s) | / |
quake | -10.7% (0.2s) | 🥇 -0.1% (0.5s) | 🥈 -10.7% (0.0s) |
sat11-hand-runtime-regression | 🥇 78.3% (0.5s) | 🥈 61.7% (1.0s) | -56.3% (1.0s) |
sensory | 🥇 29.2% (0.2s) | 3.8% (0.4s) | 🥈 16.4% (0.0s) |
socmob | 🥇 79.6% (0.2s) | 🥈 72.5% (1.5s) | 30.8% (0.0s) |
space_ga | 🥇 70.3% (0.2s) | 🥈 43.7% (0.6s) | 35.9% (0.1s) |
tecator | 🥈 98.3% (0.1s) | 🥇 99.4% (0.2s) | 78.5% (0.0s) |
us_crime | 🥈 62.8% (0.4s) | 🥇 63.0% (0.8s) | 6.7% (0.2s) |
wine_quality | 🥇 45.6% (0.6s) | -8.0% (0.9s) | 🥈 16.4% (0.5s) |
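R² compares a model's squared error with that of always predicting the mean of the test targets: R² = 1 - SS_res / SS_tot. It is 100% for a perfect fit and 0% for the constant-mean predictor. Anything worse than the constant-mean predictor scores negative, which explains entries such as SVR's -56.3% on sat11-hand-runtime-regression.

The sketch below implements this standard definition in NumPy. The helper name r_squared is our own, and the benchmark does not state which implementation it used.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Hypothetical helper: coefficient of determination 1 - SS_res / SS_tot."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    ss_res = np.sum((y_true - y_pred) ** 2)  # model's squared error
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # squared error of predicting the mean
    return 1.0 - ss_res / ss_tot


y = [1.0, 2.0, 3.0, 4.0]
perfect = r_squared(y, [1.0, 2.0, 3.0, 4.0])  # 1.0
mean_only = r_squared(y, [2.5, 2.5, 2.5, 2.5])  # 0.0
reversed_fit = r_squared(y, [4.0, 3.0, 2.0, 1.0])  # SS_res = 20, SS_tot = 5 -> -3.0
```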
Contributing
Prerequisites
1. Set up Git to use SSH
- Generate an SSH key and add the SSH key to your GitHub account.
- Configure SSH to automatically load your SSH keys:
cat << EOF >> ~/.ssh/config
Host *
  AddKeysToAgent yes
  IgnoreUnknown UseKeychain
  UseKeychain yes
EOF
2. Install Docker
- Install Docker Desktop.
- Enable "Use Docker Compose V2" in Docker Desktop's preferences window.
- Linux only:
- Export your user's user id and group id so that files created in the Dev Container are owned by your user:
cat << EOF >> ~/.bashrc
export UID=$(id --user)
export GID=$(id --group)
EOF
3. Install VS Code or PyCharm
- Install VS Code and VS Code's Dev Containers extension. Alternatively, install PyCharm.
- Optional: install a Nerd Font such as FiraCode Nerd Font and configure VS Code or configure PyCharm to use it.
Development environments
The following development environments are supported:
- ⭐️ GitHub Codespaces: click on Code and select Create codespace to start a Dev Container with GitHub Codespaces.
- ⭐️ Dev Container (with container volume): click on Open in Dev Containers to clone this repository in a container volume and create a Dev Container with VS Code.
- Dev Container: clone this repository, open it with VS Code, and run Ctrl/⌘ + ⇧ + P → Dev Containers: Reopen in Container.
- PyCharm: clone this repository, open it with PyCharm, and configure Docker Compose as a remote interpreter with the dev service.
- Terminal: clone this repository, open it with your terminal, and run docker compose up --detach dev to start a Dev Container in the background, and then run docker compose exec dev zsh to open a shell prompt in the Dev Container.
Developing
- This project follows the Conventional Commits standard to automate Semantic Versioning and Keep A Changelog with Commitizen.
- Run poe from within the development environment to print a list of Poe the Poet tasks available to run on this project.
- Run poetry add {package} from within the development environment to install a runtime dependency and add it to pyproject.toml and poetry.lock. Add --group test or --group dev to install a CI or development dependency, respectively.
- Run poetry update from within the development environment to upgrade all dependencies to the latest versions allowed by pyproject.toml.
- Run cz bump to bump the package's version, update the CHANGELOG.md, and create a git tag.
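The Conventional Commits specification ties commit types to Semantic Versioning:

- fix corresponds to a PATCH bump.
- feat corresponds to a MINOR bump.
- A breaking change, marked with "!" after the type or with a "BREAKING CHANGE:" footer, corresponds to a MAJOR bump.

The stdlib sketch below illustrates that mapping only. The function bump_kind is hypothetical and is not how Commitizen is implemented. Commitizen's actual rules are configurable and may differ, for example while the version is still 0.x.

```python
import re

# Header of a Conventional Commits message: type(optional scope)!: description
HEADER = re.compile(r"^(?P<type>[a-z]+)(\([^)]*\))?(?P<breaking>!)?: .+")


def bump_kind(messages):
    """Hypothetical helper: return 'major', 'minor', 'patch' or None for a list of commit messages."""
    kind = None
    for message in messages:
        match = HEADER.match(message)
        if match is None:
            continue  # not a Conventional Commit: ignored in this sketch
        if match["breaking"] or "BREAKING CHANGE:" in message:
            return "major"  # any breaking change wins
        if match["type"] == "feat":
            kind = "minor"
        elif match["type"] == "fix" and kind is None:
            kind = "patch"
    return kind


print(bump_kind(["fix: handle NaN targets", "feat(api): add predict_interval"]))
```

Other types such as docs or chore trigger no release in this sketch.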
Hashes for neo_ls_svm-0.2.0-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | 12bd05696fa52733f10d807c8361ca36153d370218ad4dcc6421ff7d424c54a6 |
MD5 | 75aa04d5848a38d48c342d70d093b71f |
BLAKE2b-256 | 7d3468dead35e7fb9e84d400ff3000a542b0eb5495ea7ee88c39e5adb81bdcf0 |