Efficient approximate Bayesian machine learning

xGPR

xGPR is a library for fitting approximate Gaussian process regression models and approximate kernel classification models to datasets ranging in size from thousands to millions of datapoints. It can also be used for efficient approximate kernel k-means and approximate kernel PCA.

The docs provide a number of examples showing how to use xGPR to fit regression models to protein sequences, small molecule structures, and tabular data (classification is also available as of v0.4.8).

xGPR uses a fast Hadamard transform-based implementation of the random features approximation (aka random Fourier features). It is designed to run on either CPU or GPU (GPU is better for training; either is fine for inference), to model tabular data, sequence and time series data, and graph data, and to make it straightforward to fit datasets too large to load into memory.
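To make the idea concrete, here is a minimal NumPy sketch of one published Hadamard-based random features construction (structured orthogonal random features, which replaces a dense Gaussian matrix with products of Hadamard transforms and random sign flips). Treat it as an illustration of the general technique only: the function names `fwht` and `sorf_features` are hypothetical, and whether xGPR's internal implementation matches this construction in detail is an assumption.

```python
import numpy as np


def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform of each row of a 2-D array.

    The number of columns must be a power of two; cost is O(d log d) per row.
    """
    x = np.array(x, dtype=float)  # work on a copy
    n, d = x.shape
    if d & (d - 1) != 0:
        raise ValueError("number of columns must be a power of two")
    h = 1
    while h < d:
        # Butterfly step: combine adjacent blocks of width h.
        x = x.reshape(n, d // (2 * h), 2, h)
        top, bottom = x[:, :, 0, :], x[:, :, 1, :]
        x = np.stack([top + bottom, top - bottom], axis=2).reshape(n, d)
        h *= 2
    return x


def sorf_features(x, n_blocks, lengthscale, rng):
    """Random features whose inner products approximate an RBF kernel.

    Each block applies (H D) three times, where H is the normalized Hadamard
    matrix and D is a diagonal matrix of random signs.
    """
    n, d_in = x.shape
    d = 1 << max(d_in - 1, 0).bit_length()  # next power of two >= d_in
    padded = np.zeros((n, d))
    padded[:, :d_in] = x / lengthscale  # zero-pad up to a power of two
    blocks = []
    for _ in range(n_blocks):
        z = padded
        for _ in range(3):
            signs = rng.choice([-1.0, 1.0], size=d)
            z = fwht(z * signs) / np.sqrt(d)
        # Rescale so each projection row has norm sqrt(d), like a Gaussian row.
        blocks.append(np.sqrt(d) * z)
    proj = np.hstack(blocks)
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(proj.shape[1])
```

The product `feats @ feats.T` then approximates the RBF kernel matrix `exp(-||x - y||^2 / (2 * lengthscale^2))`. Note that the sign vectors are drawn inside the call, so in this sketch training and test points must be featurized together (or the signs stored and reused) to remain comparable.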

Unlike exact Gaussian processes, which need O(N^2) memory and, with a standard Cholesky-based fit, O(N^3) time, and are therefore impractical for large datasets, xGPR scales easily; it is straightforward to quickly fit a few million datapoints on a GPU. The approximation we use provides improved accuracy compared with variational or sparse GP approximations. Unlike other libraries for Gaussian processes, which only provide kernels for fixed-length vector data, xGPR provides convolution kernels for variable-length time series, sequences and graphs.
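The scaling argument can be illustrated with a small sketch. With M random features, regression reduces to Bayesian linear regression on an N x M feature matrix, so the only quantities that must be held in memory are an M x M matrix and an M-vector, both of which can be accumulated one chunk of data at a time. The code below is a generic NumPy illustration using plain Gaussian random Fourier features; the function names (`rff`, `fit_in_chunks`, `predict`) are hypothetical and this is not xGPR's API or its actual fitting routine.

```python
import numpy as np


def rff(x, freqs):
    """Random Fourier features for an RBF kernel; freqs has shape (d, M)."""
    proj = x @ freqs
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(freqs.shape[1])


def fit_in_chunks(chunks, freqs, noise_var):
    """Accumulate Z^T Z and Z^T y chunk by chunk, then solve for the weights.

    Memory use depends on the number of features, not on the dataset size.
    """
    m = 2 * freqs.shape[1]
    ztz = np.zeros((m, m))
    zty = np.zeros(m)
    for x_chunk, y_chunk in chunks:
        z = rff(x_chunk, freqs)
        ztz += z.T @ z
        zty += z.T @ y_chunk
    chol = np.linalg.cholesky(ztz + noise_var * np.eye(m))
    weights = np.linalg.solve(chol.T, np.linalg.solve(chol, zty))
    return weights, chol


def predict(x_new, freqs, weights, chol, noise_var):
    """Posterior mean and variance of the latent function at x_new."""
    z = rff(x_new, freqs)
    mean = z @ weights
    # Var = noise_var * z^T (Z^T Z + noise_var I)^{-1} z, via the Cholesky factor.
    half = np.linalg.solve(chol, z.T)
    var = noise_var * np.sum(half ** 2, axis=0)
    return mean, var
```

Here `chunks` can be any iterable of `(x, y)` array pairs, for example batches read from disk. The cost is linear in the number of datapoints for a fixed number of features, and the predictive variance grows for inputs far from the training data.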

What's new in v0.4.8

An approximate kernel classifier is now included. Unlike the xGPRegression object, however, it does not currently compute marginal likelihood, so to tune its hyperparameters you will have to evaluate performance on a validation set. We hope to implement approximate marginal likelihood calculations for the classifier soon.
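As a rough illustration of validation-set tuning, the sketch below grid-searches a kernel lengthscale by fitting on a training split and scoring accuracy on a held-out split. It uses a simple random-features ridge classifier as a stand-in; it does not use xGPR's classifier or API, and all names (`rff`, `fit_ridge_classifier`, `tune_lengthscale`) are hypothetical.

```python
import numpy as np


def rff(x, freqs):
    """Random Fourier features for an RBF kernel; freqs has shape (d, M)."""
    proj = x @ freqs
    return np.hstack([np.cos(proj), np.sin(proj)]) / np.sqrt(freqs.shape[1])


def fit_ridge_classifier(z, labels, n_classes, ridge):
    """Least-squares fit to one-hot targets (a stand-in for a kernel classifier)."""
    targets = np.eye(n_classes)[labels]
    gram = z.T @ z + ridge * np.eye(z.shape[1])
    return np.linalg.solve(gram, z.T @ targets)


def tune_lengthscale(x_train, y_train, x_val, y_val, lengthscales,
                     n_freqs=200, ridge=1e-3, seed=0):
    """Return the lengthscale with the best validation accuracy, plus all scores."""
    rng = np.random.default_rng(seed)
    # One shared Gaussian draw, rescaled for each candidate lengthscale.
    base = rng.normal(size=(x_train.shape[1], n_freqs))
    n_classes = int(max(y_train.max(), y_val.max())) + 1
    scores = {}
    for ls in lengthscales:
        freqs = base / ls
        weights = fit_ridge_classifier(rff(x_train, freqs), y_train, n_classes, ridge)
        preds = np.argmax(rff(x_val, freqs) @ weights, axis=1)
        scores[ls] = float(np.mean(preds == y_val))
    best = max(scores, key=scores.get)
    return best, scores
```

Integer class labels starting at 0 are assumed. In practice a final evaluation on a separate test set is advisable, since the validation score of the selected setting is optimistically biased.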

You can now build custom Datasets (similar to the DataLoader in PyTorch), so that with minor tweaks you can use any kind of data source (SQLite database, HDF5, etc.) as input when training.
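The general pattern behind such a dataset is an object that yields the data in chunks instead of loading it all at once. The sketch below shows that pattern for SQLite using only the Python standard library. The class name `SQLiteChunkDataset` and its `iter_chunks` method are hypothetical and are not the interface xGPR expects; see the documentation for the methods a custom Dataset must actually provide.

```python
import sqlite3


class SQLiteChunkDataset:
    """Hypothetical dataset that streams (features, targets) chunks from SQLite.

    Table and column names are interpolated into the SQL text, so they must
    come from trusted code, never from user input.
    """

    def __init__(self, conn, table, feature_cols, target_col, chunk_size=1000):
        self.conn = conn
        self.table = table
        self.feature_cols = list(feature_cols)
        self.target_col = target_col
        self.chunk_size = chunk_size

    def __len__(self):
        query = f"SELECT COUNT(*) FROM {self.table}"
        return self.conn.execute(query).fetchone()[0]

    def iter_chunks(self):
        """Yield (x_rows, y_values), holding at most chunk_size rows in memory."""
        cols = ", ".join(self.feature_cols + [self.target_col])
        cursor = self.conn.execute(
            f"SELECT {cols} FROM {self.table} ORDER BY rowid"
        )
        while True:
            rows = cursor.fetchmany(self.chunk_size)
            if not rows:
                break
            x_rows = [row[:-1] for row in rows]  # feature tuples
            y_values = [row[-1] for row in rows]  # regression targets
            yield x_rows, y_values
```

Each chunk can be converted to arrays and passed to a chunk-wise fitting routine, so the full table never has to fit in memory.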

Installation

In most cases, installation should be as simple as:

pip install xGPR

See the documentation for important information about installation and requirements.

Documentation

The documentation covers a variety of use cases, including tabular data, sequences and graphs, installation requirements and much more.

Citations

If you use xGPR for research intended for publication, please cite either the journal article:

Jonathan Parkinson and Wei Wang. (2023). Linear-Scaling Kernels for Protein Sequences and Small Molecules Outperform Deep Learning While Providing Uncertainty Quantitation and Improved Interpretability. Journal of Chemical Information and Modeling, 63(15), 4589-4601. DOI: 10.1021/acs.jcim.3c00601

or the preprint:

Jonathan Parkinson and Wei Wang. (2023). Linear Scaling Kernels for Protein Sequences and Small Molecules Outperform Deep Learning while Providing Uncertainty Quantitation and Improved Interpretability. https://arxiv.org/abs/2302.03294

Download files

Source Distribution

xgpr-0.4.8.5.tar.gz (5.1 MB)

Uploaded Source

Built Distributions

xgpr-0.4.8.5-cp312-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (665.0 kB)

Uploaded CPython 3.12+, manylinux: glibc 2.17+ x86-64

xgpr-0.4.8.5-cp311-cp311-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (666.4 kB)

Uploaded CPython 3.11, manylinux: glibc 2.17+ x86-64

xgpr-0.4.8.5-cp310-cp310-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (666.7 kB)

Uploaded CPython 3.10, manylinux: glibc 2.17+ x86-64

xgpr-0.4.8.5-cp39-cp39-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (666.9 kB)

Uploaded CPython 3.9, manylinux: glibc 2.17+ x86-64
