Skip to main content

PLS and OPLS regressors

Project description

OPLS-MD

OPLS and PLS regressors with some utility to help with molecular dynamics (MD) simulation analysis.

Installation

Run

pip install OPLS-MD

or to get the latest development commits clone the github repo and run

pip install .

General usage

Given a structure file and trajectory as struct.pdb and traj.xtc we can e.g. define a function as the distance of two specific C alpha atoms:

import MDAnalysis as mda
import numpy as np

# Make universe and load trajectory to memory
u = mda.Universe("struct.pdb","traj.xtc")
coords = np.array([u.atoms.positions.copy() for ts in u.trajectory])
# Calculate function as distance between res 2 and 150 CA
sel = u.select_atoms("resid 2 150 and name CA")
y = np.linalg.norm(coords[:,sel.indices[0],:]-coords[:,sel.indices[1],:], axis=-1)
# Flatten from N by M by 3 dimensions to N by 3M
X = coords.reshape((coords.shape[0],-1))

Now that we have shape N by 3M array of X values and a one dimensional y, we can start running the OPLS

from OPLS_MD import PLS, OPLS

opls = OPLS(n_components=1).fit(X,y)

This will fit a one component OPLS model on the data. To get a new X array, where we have filtered out the orthogonal components, we can use the transform function

X_filt = opls.transform(X)

Finally to get our model, we can fit the PLS with the filtered data.

pls = PLS(n_components=5).fit(X_filt,y)

If we have a new set of data, called X_new, we can predict the y values for it as

X_new_filt = opls.transform(X_new)
y_new_predicted = pls.predict(X_new_filt)

or if the corresponding y_new are known, we can estimate the coefficient of determination (r2) with

X_new_filt = opls.transform(X_new)
r2 = pls.score(X_new_filt, y_new)

Having to run two separate models one after the other can be tiresome, so the package also includes as OPLS_PLS object, which does the above much easier:

from OPLS import OPLS_PLS

# Fitting:
opls_pls = OPLS_PLS(n_components=1,pls_components=5).fit(X,y)

# Predicting:
y_new_predicted = opls_pls.predict(X_new)

# Scoring:
r2 = opls_pls.score(X_new, y_new)

MD-simulation utility

In the previous example we had to flatten the MD coordinates before running the regressor. The three regressors have MD-utility counterparts with a _MD-suffix that can take the coordinates as input. The first two blocks of the example then become

import MDAnalysis as mda
import numpy as np

# Make universe and load trajectory to memory
u = mda.Universe("struct.pdb","traj.xtc")
coords = np.array([u.atoms.positions.copy() for ts in u.trajectory])
# Calculate function as distance between res 2 and 150 CA
sel = u.select_atoms("resid 2 150 and name CA")
y = np.linalg.norm(coords[:,sel.indices[0],:]-coords[:,sel.indices[1],:], axis=-1)


from OPLS_MD import PLS_MD, OPLS_MD

opls = OPLS_MD(n_components=1).fit(coord,y)

Instead of a coordinate array, the X can also be given as an MDAnalysis Universe or AtomGroup, with a trajectory length matching the y-array.

Visualising the model

The regressors include an inverse_predict-function, which takes in the y-values and outputs the interpolated corresponding X-structures. This can be useful with univariate y to visualize the coefficent vector of the model. With the *_MD variants the output will have the correct shape of (y.shape[0], natoms, ndim). With a trained PLS_MD model pls, we can write a trajectory with

nsteps=101
u = mda.Universe("struct.pdb")
with mda.Writer("pls_coeff.xtc", u.atoms.n_atoms) as w:
        X_interp = pls.inverse_predict(np.linspace(y.min(),y.max(),nsteps))
        for crd in X_interp:
            u.atoms.positions = crd
            w.write(u)

Testing the number of components

Normally to test the different numbers of components you would simply retrain the models with different numbers. The PLS model saves the coefficients of each previous component up to n_components. The predict, inverse_predict and score -functions take ndim as an optional argument to set with how many components the calculation should be made.

To get the score over each number of components up to 10 you can run

maxcomp = 10
pls = PLS(n_components=maxcomp).fit(X_train, y_train)
score = []
for k in range(1,maxcomp+1):
    score.append(pls.score(X_test, y_test, ncomp=k))

The same works with PLS_MD and OPLS_PLS (and of course OPLS_PLS_MD). With the latter it of course only affects the underlying PLS model.

References

Main references for OPLS:

[1] Wold S, et al. PLS-regression: a basic tool of chemometrics.
    Chemometr Intell Lab Sys 2001, 58, 109–130.
[2] Bylesjo M, et al. Model Based Preprocessing and Background
    Elimination: OSC, OPLS, and O2PLS. in Comprehensive Chemometrics.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

OPLS-MD-0.0.2.tar.gz (16.5 kB view details)

Uploaded Source

Built Distribution

OPLS_MD-0.0.2-py3-none-any.whl (19.9 kB view details)

Uploaded Python 3

File details

Details for the file OPLS-MD-0.0.2.tar.gz.

File metadata

  • Download URL: OPLS-MD-0.0.2.tar.gz
  • Upload date:
  • Size: 16.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for OPLS-MD-0.0.2.tar.gz
Algorithm Hash digest
SHA256 f7902add4c6ed9bc0701688a526ede34d418f0fdbb635908bf7cb0e6089ea1d5
MD5 425fdc406e45ab93c721a4f7b06626e0
BLAKE2b-256 2e2331d5dd7042e70f7c1d731a39828f65770664dfd517a9662c774794c6ced8

See more details on using hashes here.

File details

Details for the file OPLS_MD-0.0.2-py3-none-any.whl.

File metadata

  • Download URL: OPLS_MD-0.0.2-py3-none-any.whl
  • Upload date:
  • Size: 19.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.8

File hashes

Hashes for OPLS_MD-0.0.2-py3-none-any.whl
Algorithm Hash digest
SHA256 2a0854e42d88662c7850539cb3eee991550f1acdecb42a5bc388b4d3999fa03e
MD5 0b4a5c60b22109411819528ce3e96765
BLAKE2b-256 5cdca5c3daa5dc11bb7926f003d9f07ed04081c10880a336604cbb8e38b6fc30

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page