Improved Kernel Partial Least Squares (IKPLS) and Fast Cross-Validation

The ikpls software package provides fast and efficient tools for PLS (Partial Least Squares) modeling. This package is designed to help researchers and practitioners handle PLS modeling faster than previously possible - particularly on large datasets.

Citation

If you use the ikpls software package for your work, please cite this Journal of Open Source Software article. If you use the fast cross-validation algorithm implemented in ikpls.fast_cross_validation.numpy_ikpls, please also cite this arXiv preprint.

Unlock the Power of Fast and Stable Partial Least Squares Modeling with IKPLS

Dive into cutting-edge Python implementations of the IKPLS (Improved Kernel Partial Least Squares) Algorithms #1 and #2 [1] for CPUs, GPUs, and TPUs. IKPLS is both fast [2] and numerically stable [3], making it well suited to PLS modeling.

  • Use our NumPy [4] based CPU implementations for seamless integration with scikit-learn's [5] ecosystem of machine learning algorithms and pipelines. As the implementations subclass scikit-learn's BaseEstimator, they can be used with scikit-learn's cross_validate.
  • Use our JAX [6] implementations on CPUs or leverage powerful GPUs and TPUs for PLS modeling. Our JAX implementations are end-to-end differentiable, allowing gradient propagation when using PLS as a layer in a deep learning model.
  • Use our combination of IKPLS with Engstrøm's unbelievably fast cross-validation algorithm [7] to quickly determine the optimal combination of preprocessing and number of PLS components.

The documentation is available at https://ikpls.readthedocs.io/en/latest/; examples can be found at https://github.com/Sm00thix/IKPLS/tree/main/examples.

Fast Cross-Validation

In addition to the standalone IKPLS implementations, this package contains an implementation of IKPLS combined with the novel, fast cross-validation algorithm by Engstrøm [7]. The fast cross-validation algorithm benefits both IKPLS algorithms, especially Algorithm #2. It is mathematically equivalent to classical cross-validation but much faster. It correctly handles (column-wise) centering and scaling of the X and Y input matrices, using training set means and standard deviations to avoid data leakage from the validation set. Centering and scaling can be enabled or disabled independently of each other, and independently for X and Y, by setting the parameters center_X, center_Y, scale_X, and scale_Y.

The fast cross-validation algorithm also correctly handles row-wise preprocessing that operates independently on each sample, such as (row-wise) centering and scaling of the X and Y input matrices, convolution, or other preprocessing. Such row-wise preprocessing can safely be applied before passing the data to the fast cross-validation algorithm.
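The idea of column-wise centering with training set statistics, without recomputing matrix products for every fold, can be illustrated in a few lines of NumPy. The sketch below is not the package's implementation and covers centering only. It assumes that the full-data product X^T X and the column sums of X are computed once, and then derives the centered training-set X^T X for a single fold by subtracting the contribution of the validation rows. The names centered_train_XtX and val_idx are hypothetical and exist only for this illustration.

```python
import numpy as np


def centered_train_XtX(XtX, col_sums, X, val_idx):
    """Centered training-set X^T X for one fold (hypothetical helper).

    XtX and col_sums are computed once on the full data set. Only the
    validation rows are touched per fold.
    """
    X_val = X[val_idx]
    n_train = X.shape[0] - X_val.shape[0]
    # Remove the validation rows' contribution from the full product.
    train_XtX = XtX - X_val.T @ X_val
    # Training-set column means, derived from the full column sums.
    train_mean = (col_sums - X_val.sum(axis=0)) / n_train
    # Centering identity: Xc^T Xc = X^T X - n * outer(mean, mean).
    return train_XtX - n_train * np.outer(train_mean, train_mean)


rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 5))
XtX = X.T @ X            # computed once for all folds
col_sums = X.sum(axis=0)  # computed once for all folds

val_idx = np.arange(20)  # the first 20 samples form the validation split
fast = centered_train_XtX(XtX, col_sums, X, val_idx)

# Reference: center the training rows explicitly, using training means only.
X_train = np.delete(X, val_idx, axis=0)
X_train_c = X_train - X_train.mean(axis=0)
direct = X_train_c.T @ X_train_c

print(np.allclose(fast, direct))
```

Because only training set means enter the computation, no information from the validation rows leaks into the centered product. The same bookkeeping applies to X^T Y.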

Prerequisites

The JAX implementations support running on CPU, GPU, and TPU.

  • To enable NVIDIA GPU execution, install JAX and CUDA with:

    pip3 install -U "jax[cuda12]"
    
  • To enable Google Cloud TPU execution, install JAX with:

    pip3 install -U "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
    

These instructions cover the most common installations. For customized installations, follow the instructions from the JAX Installation Guide.

To ensure that JAX implementations use Float64, set the environment variable JAX_ENABLE_X64=True as per the Current Gotchas.
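As a minimal sketch, the variable can also be set from within Python, provided this happens before JAX is imported for the first time in the process. The import guard below is an assumption made only so that the snippet also runs on machines without JAX installed.

```python
import os

# Must be set before JAX is imported for the first time in this process.
os.environ["JAX_ENABLE_X64"] = "True"

try:
    import jax.numpy as jnp
except ImportError:
    jnp = None  # JAX is not installed; nothing to check.

if jnp is not None:
    # With 64-bit mode enabled, floating-point arrays default to float64.
    print(jnp.ones(3).dtype)
```

Setting the variable in the shell before starting Python has the same effect and avoids any dependence on import order.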

Installation

  • Install the package for Python 3 using the following command:

    pip3 install ikpls
    
  • Now you can import the NumPy and JAX implementations with:

    from ikpls.numpy_ikpls import PLS as NpPLS
    from ikpls.jax_ikpls_alg_1 import PLS as JAXPLS_Alg_1
    from ikpls.jax_ikpls_alg_2 import PLS as JAXPLS_Alg_2
    from ikpls.fast_cross_validation.numpy_ikpls import PLS as NpPLS_FastCV
    

Quick Start

Use the ikpls package for PLS modeling

import numpy as np

from ikpls.numpy_ikpls import PLS


N = 100  # Number of samples.
K = 50  # Number of features.
M = 10  # Number of targets.
A = 20  # Number of latent variables (PLS components).

# Using float64 is important for numerical stability.
X = np.random.uniform(size=(N, K)).astype(np.float64)
Y = np.random.uniform(size=(N, M)).astype(np.float64)

# The other PLS algorithms and implementations have the same interface for fit()
# and predict(). The fast cross-validation implementation with IKPLS has a
# different interface.
np_ikpls_alg_1 = PLS(algorithm=1)
np_ikpls_alg_1.fit(X, Y, A)

# Has shape (A, N, M) = (20, 100, 10). Contains a prediction for all possible
# numbers of components up to and including A.
y_pred = np_ikpls_alg_1.predict(X)

# Has shape (N, M) = (100, 10).
y_pred_20_components = np_ikpls_alg_1.predict(X, n_components=20)
(y_pred_20_components == y_pred[19]).all()  # True

# The internal model parameters can be accessed as follows:

# Regression coefficients tensor of shape (A, K, M) = (20, 50, 10).
np_ikpls_alg_1.B

# X weights matrix of shape (K, A) = (50, 20).
np_ikpls_alg_1.W

# X loadings matrix of shape (K, A) = (50, 20).
np_ikpls_alg_1.P

# Y loadings matrix of shape (M, A) = (10, 20).
np_ikpls_alg_1.Q

# X rotations matrix of shape (K, A) = (50, 20).
np_ikpls_alg_1.R

# X scores matrix of shape (N, A) = (100, 20).
# This is only computed for IKPLS Algorithm #1.
np_ikpls_alg_1.T
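To make the roles of B, W, P, Q, R, and T more concrete, the following is a minimal NumPy sketch of the IKPLS Algorithm #1 recursion as described in [1]. It is an illustration under stated assumptions, not the code used by ikpls: the function name ikpls_alg_1_sketch is hypothetical, X and Y are assumed to be centered beforehand, and the weight vector is taken from an SVD of the deflated X^T Y rather than from whatever routine the package uses.

```python
import numpy as np


def ikpls_alg_1_sketch(X, Y, A):
    """Illustrative IKPLS Algorithm #1 on already-centered X and Y."""
    N, K = X.shape
    M = Y.shape[1]
    W = np.zeros((K, A))  # X weights
    P = np.zeros((K, A))  # X loadings
    Q = np.zeros((M, A))  # Y loadings
    R = np.zeros((K, A))  # X rotations
    T = np.zeros((N, A))  # X scores
    B = np.zeros((A, K, M))  # regression coefficients per component count
    XtY = X.T @ Y
    for i in range(A):
        # X weights: dominant left singular vector of the deflated X^T Y.
        u, _, _ = np.linalg.svd(XtY, full_matrices=False)
        w = u[:, 0]
        # X rotations: express w in terms of the original (undeflated) X.
        r = w.copy()
        for j in range(i):
            r -= (P[:, j] @ w) * R[:, j]
        t = X @ r
        tt = t @ t
        p = (X.T @ t) / tt
        q = (XtY.T @ r) / tt
        # Only X^T Y is deflated; X and Y themselves are left untouched.
        XtY = XtY - tt * np.outer(p, q)
        W[:, i], P[:, i], Q[:, i], R[:, i], T[:, i] = w, p, q, r, t
        # Coefficients for i + 1 components accumulate one rank-1 term each.
        B[i] = (B[i - 1] if i > 0 else 0) + np.outer(r, q)
    return B, W, P, Q, R, T


rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 5))
Y = rng.uniform(size=(100, 3))
Xc = X - X.mean(axis=0)
Yc = Y - Y.mean(axis=0)

B, W, P, Q, R, T = ikpls_alg_1_sketch(Xc, Yc, A=5)

# Sanity check: with as many components as features (and full-rank X),
# the PLS coefficients coincide with the ordinary least squares solution.
B_ols = np.linalg.lstsq(Xc, Yc, rcond=None)[0]
print(np.allclose(B[-1], B_ols))
```

Stacking one coefficient matrix per number of components is what gives B its leading dimension of size A, which in turn explains why predict() can return predictions for every number of components at once.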

Examples

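Usage examples are available in the examples directory of the repository at https://github.com/Sm00thix/IKPLS/tree/main/examples.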

Contribute

To contribute, please read the Contribution Guidelines.

References

  1. Dayal, B. S., & MacGregor, J. F. (1997). Improved PLS algorithms. Journal of Chemometrics, 11(1), 73-85.
  2. Alin, A. (2009). Comparison of PLS algorithms when the number of objects is much larger than the number of variables. Statistical Papers, 50, 711-720.
  3. Andersson, M. (2009). A comparison of nine PLS1 algorithms. Journal of Chemometrics, 23(10), 518-529.
  4. NumPy
  5. scikit-learn
  6. JAX
  7. Engstrøm, O.-C. G. (2024). Shortcutting Cross-Validation: Efficiently Deriving Column-Wise Centered and Scaled Training Set $\mathbf{X}^\mathbf{T}\mathbf{X}$ and $\mathbf{X}^\mathbf{T}\mathbf{Y}$ Without Full Recomputation of Matrix Products or Statistical Moments
