Efficient approximate Bayesian machine learning

xGPR

xGPR is a library for fitting approximate Gaussian process regression models and approximate kernel classification models to datasets ranging in size from thousands to millions of datapoints. It can also be used for efficient approximate kernel k-means and approximate kernel PCA.

The docs provide a number of examples of how to use xGPR to fit protein sequences, small molecule structures, and tabular data for regression (classification is also available as of v0.4.8).

xGPR uses a fast Hadamard transform-based implementation of the random features approximation (aka random Fourier features). It is designed to run on either CPU or GPU (GPU is better for training, either is fine for inference), to model tabular data, sequence & time series data and graph data, and to fit datasets too large to load into memory in a straightforward way.
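To make the random features idea concrete, here is a minimal NumPy sketch of random Fourier features for an RBF kernel. It is an illustration only and not xGPR's implementation: it draws a dense Gaussian frequency matrix, whereas xGPR replaces that step with a fast Hadamard transform-based construction. The names `make_rff_map` and `rbf_kernel` are hypothetical helpers introduced for this example.

```python
import numpy as np

def make_rff_map(n_dims, n_features, lengthscale, seed=0):
    """Return a function mapping X (n_samples, n_dims) to random features
    whose inner products approximate an RBF kernel."""
    rng = np.random.default_rng(seed)
    # Frequencies are drawn from the RBF kernel's spectral density,
    # a Gaussian with standard deviation 1 / lengthscale.
    W = rng.normal(scale=1.0 / lengthscale, size=(n_dims, n_features))
    # Random phases make each cosine feature an unbiased kernel estimate.
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def transform(X):
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    return transform

def rbf_kernel(X, Y, lengthscale):
    """Exact RBF kernel exp(-||x - y||^2 / (2 * lengthscale^2))."""
    sq_dists = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-np.maximum(sq_dists, 0.0) / (2.0 * lengthscale ** 2))

rng = np.random.default_rng(1)
X = rng.normal(size=(50, 5))

transform = make_rff_map(n_dims=5, n_features=4096, lengthscale=2.0)
Z = transform(X)

# Z @ Z.T approximates the exact kernel matrix; the error shrinks
# as the number of random features grows.
max_err = np.abs(Z @ Z.T - rbf_kernel(X, X, 2.0)).max()
print(f"largest absolute error: {max_err:.4f}")
```

Once the data are mapped to `Z`, a kernel model reduces to a linear model in the feature space, which is what makes the approach scale to large datasets.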

Unlike exact Gaussian processes, which require O(N^2) memory and O(N^3) time and are therefore impractical for large datasets, xGPR scales easily; it is straightforward to fit a few million datapoints quickly on a GPU. The approximation we use provides improved accuracy compared with variational or sparse GP approximations. Unlike other libraries for Gaussian processes, which only provide kernels for fixed-vector data, xGPR provides convolution kernels for variable-length time series, sequences and graphs.
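The scaling argument can be sketched as follows. With M random features, a ridge-style fit only needs the M x M matrix Z^T Z and the vector Z^T y, both of which can be accumulated one chunk at a time, so the cost grows linearly with N and the full dataset never has to sit in memory. This is a simplified illustration under stated assumptions (dense random features, a direct linear solve, hypothetical helper names); xGPR's own fitting routines may differ.

```python
import numpy as np

def make_feature_map(n_dims, n_features, lengthscale, seed=0):
    """Random Fourier feature map for an RBF kernel (dense sketch)."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / lengthscale, size=(n_dims, n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return lambda X: np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def fit_from_chunks(chunks, feature_map, n_features, ridge=1e-3):
    """Accumulate Z^T Z and Z^T y chunk by chunk, then solve once.

    Time is O(N * M^2) and memory is O(M^2) for M features,
    independent of the number of datapoints N.
    """
    ZtZ = np.zeros((n_features, n_features))
    Zty = np.zeros(n_features)
    for X_chunk, y_chunk in chunks:
        Z = feature_map(X_chunk)
        ZtZ += Z.T @ Z
        Zty += Z.T @ y_chunk
    return np.linalg.solve(ZtZ + ridge * np.eye(n_features), Zty)

def chunked(X, y, chunk_size):
    """Stand-in for a data source that yields one minibatch at a time."""
    for start in range(0, len(X), chunk_size):
        yield X[start:start + chunk_size], y[start:start + chunk_size]

# Toy regression problem: noisy samples of sin(x).
rng = np.random.default_rng(0)
X = rng.uniform(-3.0, 3.0, size=(5000, 1))
y = np.sin(X[:, 0]) + 0.1 * rng.normal(size=5000)

feature_map = make_feature_map(n_dims=1, n_features=256, lengthscale=1.0)
weights = fit_from_chunks(chunked(X, y, 500), feature_map, n_features=256)

X_test = np.linspace(-2.5, 2.5, 50)[:, None]
predictions = feature_map(X_test) @ weights
rmse = np.sqrt(np.mean((predictions - np.sin(X_test[:, 0])) ** 2))
print(f"RMSE against the noise-free function: {rmse:.4f}")
```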

What's new in v0.4.8

An approximate kernel classifier is now included. Unlike the xGPRegression object, however, it does not currently compute the marginal likelihood, so to tune its hyperparameters you will have to evaluate performance on a validation set. We hope to implement approximate marginal likelihood calculations for the classifier soon.
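As a rough illustration of validation-set tuning, the sketch below picks a kernel lengthscale by held-out accuracy. It does not use the xGPR classifier API: the classifier here is a regularized least-squares model on random Fourier features, chosen only to keep the example self-contained, and every name in it (`validation_accuracy`, the candidate grid, the toy data) is an assumption made for this example.

```python
import numpy as np

def validation_accuracy(lengthscale, X_tr, y_tr, X_val, y_val,
                        n_features=200, ridge=1e-2, seed=0):
    """Fit a least-squares classifier on random features for one
    lengthscale and return its accuracy on the validation set."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=1.0 / lengthscale, size=(X_tr.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)

    def featurize(X):
        return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

    Z_tr, Z_val = featurize(X_tr), featurize(X_val)
    # Labels are in {-1, +1}; the sign of the fitted value is the prediction.
    w = np.linalg.solve(Z_tr.T @ Z_tr + ridge * np.eye(n_features), Z_tr.T @ y_tr)
    return float(np.mean(np.sign(Z_val @ w) == y_val))

# Toy two-class problem: +1 inside a circle, -1 outside.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(1200, 2))
y = np.where((X ** 2).sum(axis=1) < 2.5, 1.0, -1.0)

# Hold out the last 400 points as the validation set.
X_tr, y_tr, X_val, y_val = X[:800], y[:800], X[800:], y[800:]

candidates = [0.01, 0.1, 1.0, 10.0, 100.0]
scores = {ls: validation_accuracy(ls, X_tr, y_tr, X_val, y_val) for ls in candidates}
best_lengthscale = max(scores, key=scores.get)

for ls, acc in scores.items():
    print(f"lengthscale={ls:<6} validation accuracy={acc:.3f}")
print("selected lengthscale:", best_lengthscale)
```

The same loop applies to any hyperparameter: fit on the training split, score on the held-out split, and keep the setting with the best score.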

You can now build custom Datasets (similar to the DataLoader in PyTorch), so that with minor tweaks you can use any kind of data source (SQLite db, HDF5, etc.) as input when training.
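The general pattern behind such a data source is a minibatch iterator over storage that is too large to load at once. The sketch below shows that pattern for SQLite using only the standard library and NumPy. It is not xGPR's Dataset interface (see the documentation for the actual base class and the methods it expects); the function `iter_minibatches`, the table name, and the column layout are all hypothetical.

```python
import sqlite3
import numpy as np

def iter_minibatches(conn, table, chunk_size):
    """Yield (features, targets) arrays from a table whose last column
    is the target, reading at most chunk_size rows at a time."""
    # The table name is interpolated directly, so it must come from
    # trusted code, never from user input.
    cursor = conn.execute(f"SELECT * FROM {table}")
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break
        block = np.asarray(rows, dtype=np.float64)
        yield block[:, :-1], block[:, -1]

# Build a small in-memory database to stand in for a large on-disk one.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE measurements (x1 REAL, x2 REAL, y REAL)")
rng = np.random.default_rng(0)
data = rng.normal(size=(1000, 3))
conn.executemany("INSERT INTO measurements VALUES (?, ?, ?)", data.tolist())
conn.commit()

batch_shapes = [
    (X.shape, y.shape) for X, y in iter_minibatches(conn, "measurements", 256)
]
print(batch_shapes)
```

Only one chunk is held in memory at a time, so the same loop works unchanged when the table holds far more rows than fit in RAM.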

Installation

In most cases, installation is as simple as:

pip install xGPR

See the documentation for important information about installation and requirements.

Documentation

The documentation covers a variety of use cases, including tabular data, sequences and graphs, installation requirements and much more.

Citations

If using xGPR for research intended for publication, please cite either:

Jonathan Parkinson and Wei Wang. Linear-Scaling Kernels for Protein Sequences and Small Molecules Outperform Deep Learning While Providing Uncertainty Quantitation and Improved Interpretability. Journal of Chemical Information and Modeling 2023, 63 (15), 4589-4601. DOI: 10.1021/acs.jcim.3c00601

The preprint is available at:

Jonathan Parkinson, & Wei Wang. (2023). Linear Scaling Kernels for Protein Sequences and Small Molecules Outperform Deep Learning while Providing Uncertainty Quantitation and Improved Interpretability https://arxiv.org/abs/2302.03294
