Fast computation of possibly centered/scaled training set kernel matrices in a cross-validation setting.
CVMatrix
The cvmatrix package implements the fast algorithms by Engstrøm [1] for computing training set $\mathbf{X}^{\mathbf{T}}\mathbf{X}$ and $\mathbf{X}^{\mathbf{T}}\mathbf{Y}$ in a cross-validation setting. In addition to correctly handling arbitrary row-wise preprocessing, the algorithms efficiently and correctly handle any combination of column-wise centering and scaling of X and Y based on training set statistics.
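To make the target quantities concrete, here is a minimal NumPy sketch of the straightforward per-fold computation that the fast algorithms are designed to avoid. The helper name `naive_training_xtx_xty` and the use of the sample standard deviation (`ddof=1`) are assumptions made for illustration; they are not part of the cvmatrix API.

```python
import numpy as np


def naive_training_xtx_xty(X, Y, cv_splits, val_split, center=True, scale=True):
    """Hypothetical helper: recompute everything from the training rows only."""
    train = cv_splits != val_split  # Boolean mask selecting the training rows.
    X_train, Y_train = X[train], Y[train]
    if scale:
        # Assumption: scale by the training set sample standard deviation.
        X_std = X_train.std(axis=0, ddof=1)
        Y_std = Y_train.std(axis=0, ddof=1)
    if center:
        # Center with the training set means, never the full-data means.
        X_train = X_train - X_train.mean(axis=0)
        Y_train = Y_train - Y_train.mean(axis=0)
    if scale:
        X_train = X_train / X_std
        Y_train = Y_train / Y_std
    return X_train.T @ X_train, X_train.T @ Y_train


rng = np.random.default_rng(0)
X = rng.uniform(size=(20, 4))
Y = rng.uniform(size=(20, 2))
cv_splits = np.arange(20) % 5  # 5 folds with 4 validation samples each.

XTX, XTY = naive_training_xtx_xty(X, Y, cv_splits, val_split=0)
```

Every fold repeats the mean, standard deviation, and matrix product computations over nearly the whole dataset, which is the cost the algorithms in [1] are intended to cut.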
For an implementation of the fast cross-validation algorithms combined with Improved Kernel Partial Least Squares [2], see the Python package ikpls.
Installation
- Install the package for Python 3 using the following command:

      pip3 install cvmatrix

- Now you can import the class implementing all the algorithms with:

      from cvmatrix.cvmatrix import CVMatrix
Quick Start
Use the cvmatrix package for fast computation of training set kernel matrices:

    import numpy as np
    from cvmatrix.cvmatrix import CVMatrix

    N = 100  # Number of samples.
    K = 50  # Number of features.
    M = 10  # Number of targets.

    X = np.random.uniform(size=(N, K))  # Random X data
    Y = np.random.uniform(size=(N, M))  # Random Y data
    cv_splits = np.arange(N) % 5  # 5-fold cross-validation

    # Instantiate CVMatrix
    cvm = CVMatrix(
        cv_splits=cv_splits,
        center_X=True,
        center_Y=True,
        scale_X=True,
        scale_Y=True,
    )

    # Fit on X and Y
    cvm.fit(X=X, Y=Y)

    # Compute training set XTX and/or XTY for each fold
    for val_split in cvm.val_folds_dict.keys():
        # Get both XTX and XTY
        training_XTX, training_XTY = cvm.training_XTX_XTY(val_split)
        # Get only XTX
        training_XTX = cvm.training_XTX(val_split)
        # Get only XTY
        training_XTY = cvm.training_XTY(val_split)
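The `cv_splits` array in the quick start assigns one fold label per sample, and each distinct label defines one validation fold. As a small illustration in plain NumPy, the sketch below shows how such labels partition the row indices. The dictionary `val_indices_by_fold` is a hypothetical stand-in for illustration and is not claimed to match the package's internal data structures.

```python
import numpy as np

N = 10
cv_splits = np.arange(N) % 5  # Labels: 0, 1, 2, 3, 4, 0, 1, 2, 3, 4

# Hypothetical mapping from each fold label to its validation row indices.
val_indices_by_fold = {
    int(label): np.flatnonzero(cv_splits == label)
    for label in np.unique(cv_splits)
}

for label, val_idx in val_indices_by_fold.items():
    # All rows not in the validation fold form the training set for that fold.
    train_idx = np.flatnonzero(cv_splits != label)
    print(label, val_idx.tolist(), train_idx.size)
```

With the modulo assignment, folds are interleaved rather than contiguous, so every fold samples rows from across the whole dataset.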
Examples
In examples, you will find:
Benchmarks
In benchmarks, we compare the fast algorithms in cvmatrix against the straightforward, naive algorithms implemented in NaiveCVMatrix.
Left: Benchmarking the CVMatrix implementation versus the straightforward, naive implementation (NaiveCVMatrix) using three common combinations of centering and scaling. Right: Benchmarking the CVMatrix implementation for all possible combinations of centering and scaling.
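The advantage over a naive approach comes from reusing quantities computed once on the full dataset. As a rough sketch of the underlying idea (not the package's actual implementation), the uncentered training set product for a fold equals the full product minus the product over the validation rows, and centering can then be applied as a rank-one correction using training means derived from sums in the same way. All variable names below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
N, K, M = 30, 5, 2
X = rng.uniform(size=(N, K))
Y = rng.uniform(size=(N, M))
cv_splits = np.arange(N) % 5

# Computed once for the whole dataset and reused for every fold.
XTX_total = X.T @ X
XTY_total = X.T @ Y
X_sum_total = X.sum(axis=0)
Y_sum_total = Y.sum(axis=0)

# Per fold, only the validation rows need to be touched.
val = cv_splits == 0
X_val, Y_val = X[val], Y[val]
n_train = N - X_val.shape[0]

# Uncentered training products: subtract the validation contribution.
XTX_train = XTX_total - X_val.T @ X_val
XTY_train = XTY_total - X_val.T @ Y_val

# Training set means obtained from the full sums minus the validation sums.
X_mean = (X_sum_total - X_val.sum(axis=0)) / n_train
Y_mean = (Y_sum_total - Y_val.sum(axis=0)) / n_train

# Centering expressed as a rank-one correction of the uncentered products.
XTX_centered = XTX_train - n_train * np.outer(X_mean, X_mean)
XTY_centered = XTY_train - n_train * np.outer(X_mean, Y_mean)

# Sanity check against direct computation on the centered training rows.
X_tr = X[~val] - X[~val].mean(axis=0)
Y_tr = Y[~val] - Y[~val].mean(axis=0)
fast_matches_naive = bool(
    np.allclose(XTX_centered, X_tr.T @ X_tr)
    and np.allclose(XTY_centered, X_tr.T @ Y_tr)
)
print(fast_matches_naive)
```

In this sketch the per-fold matrix products involve only the validation rows, so the per-fold cost depends on the validation set size rather than the training set size. Scaling by training set standard deviations, which [1] also covers, is omitted here for brevity.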
Contribute
To contribute, please read the Contribution Guidelines.
References
1. Engstrøm, O.-C. G. (2024). Shortcutting Cross-Validation: Efficiently Deriving Column-Wise Centered and Scaled Training Set $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$ Without Full Recomputation of Matrix Products or Statistical Moments.
2. Dayal, B. S., & MacGregor, J. F. (1997). Improved PLS algorithms. Journal of Chemometrics, 11(1), 73-85.