Skip to main content

PLS and OPLS regressors

Project description

OPLS-MD

OPLS and PLS regressors with some utility to help with molecular dynamics (MD) simulation analysis.

Installation

Run

pip install OPLS-MD

or to get the latest development commits (if any) clone the github repo and run

pip install .

General usage

Given a structure file and trajectory as struct.pdb and traj.xtc we can e.g. define a function as the distance of two specific C alpha atoms:

import MDAnalysis as mda
import numpy as np

# Make universe and load trajectory to memory
u = mda.Universe("struct.pdb","traj.xtc")
coords = np.array([u.atoms.positions.copy() for ts in u.trajectory])
# Calculate function as distance between res 2 and 150 CA
sel = u.select_atoms("resid 2 150 and name CA")
y = np.linalg.norm(coords[:,sel.indices[0],:]-coords[:,sel.indices[1],:], axis=-1)
# Flatten from N by M by 3 dimensions to N by 3M
X = coords.reshape((coords.shape[0],-1))

Now that we have shape N by 3M array of X values and a one dimensional y, we can start running the OPLS

from OPLS_MD import PLS, OPLS

opls = OPLS(n_components=1).fit(X,y)

This will fit a one component OPLS model on the data. To get a new X array, where we have filtered out the orthogonal components, we can use the transform function

X_filt = opls.transform(X)

Finally to get our model, we can fit the PLS with the filtered data.

pls = PLS(n_components=5).fit(X_filt,y)

If we have a new set of data, called X_new, we can predict the y values for it as

X_new_filt = opls.transform(X_new)
y_new_predicted = pls.predict(X_new_filt)

or if the corresponding y_new are known, we can estimate the coefficient of determination (r2) with

X_new_filt = opls.transform(X_new)
r2 = pls.score(X_new_filt, y_new)

Having to run two separate models one after the other can be tiresome, so the package also includes as OPLS_PLS object, which does the above much easier:

from OPLS import OPLS_PLS

# Fitting:
opls_pls = OPLS_PLS(n_components=1,pls_components=5).fit(X,y)

# Predicting:
y_new_predicted = opls_pls.predict(X_new)

# Scoring:
r2 = opls_pls.score(X_new, y_new)

MD-simulation utility

In the previous example we had to flatten the MD coordinates before running the regressor. The three regressors have MD-utility counterparts with a _MD-suffix that can take the coordinates as input. The first two blocks of the example then become

import MDAnalysis as mda
import numpy as np

# Make universe and load trajectory to memory
u = mda.Universe("struct.pdb","traj.xtc")
coords = np.array([u.atoms.positions.copy() for ts in u.trajectory])
# Calculate function as distance between res 2 and 150 CA
sel = u.select_atoms("resid 2 150 and name CA")
y = np.linalg.norm(coords[:,sel.indices[0],:]-coords[:,sel.indices[1],:], axis=-1)


from OPLS_MD import PLS_MD, OPLS_MD

opls = OPLS_MD(n_components=1).fit(coord,y)

Instead of a coordinate array, the X can also be given as an MDAnalysis Universe or AtomGroup, with a trajectory length matching the y-array.

Visualising the model

The regressors include an inverse_predict-function, which takes in the y-values and outputs the interpolated corresponding X-structures. This can be useful with univariate y to visualize the coefficent vector of the model. With the *_MD variants the output will have the correct shape of (y.shape[0], natoms, ndim). With a trained PLS_MD model pls, we can write a trajectory with

nsteps=101
u = mda.Universe("struct.pdb")
with mda.Writer("pls_coeff.xtc", u.atoms.n_atoms) as w:
        X_interp = pls.inverse_predict(np.linspace(y.min(),y.max(),nsteps))
        for crd in X_interp:
            u.atoms.positions = crd
            w.write(u)

Testing the number of components

Normally to test the different numbers of components you would simply retrain the models with different numbers. The PLS model saves the coefficients of each previous component up to n_components. The predict, inverse_predict and score -functions take ndim as an optional argument to set with how many components the calculation should be made.

To get the score over each number of components up to 10 you can run

maxcomp = 10
pls = PLS(n_components=maxcomp).fit(X_train, y_train)
score = []
for k in range(1,maxcomp+1):
    score.append(pls.score(X_test, y_test, ncomp=k))

The same works with PLS_MD and OPLS_PLS_MD. With the latter it only affects the underlying PLS model, not the OPLS filter.

References

Main references for OPLS:

[1] Wold S, et al. PLS-regression: a basic tool of chemometrics.
    Chemometr Intell Lab Sys 2001, 58, 109–130.
[2] Bylesjo M, et al. Model Based Preprocessing and Background
    Elimination: OSC, OPLS, and O2PLS. in Comprehensive Chemometrics.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

OPLS-MD-0.0.5.tar.gz (15.9 kB view details)

Uploaded Source

Built Distribution

OPLS_MD-0.0.5-py3-none-any.whl (19.3 kB view details)

Uploaded Python 3

File details

Details for the file OPLS-MD-0.0.5.tar.gz.

File metadata

  • Download URL: OPLS-MD-0.0.5.tar.gz
  • Upload date:
  • Size: 15.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for OPLS-MD-0.0.5.tar.gz
Algorithm Hash digest
SHA256 3ee6ee18ee748aa2ffca189d42a1fcd612e044ba7f82f2bffbfa27d8d320b72e
MD5 846118b741a536e0de9c4f1765f2cce3
BLAKE2b-256 973f2496bf4b2c1e8a28bead4d3bb73b97e60b02868580589f206462e251619c

See more details on using hashes here.

File details

Details for the file OPLS_MD-0.0.5-py3-none-any.whl.

File metadata

  • Download URL: OPLS_MD-0.0.5-py3-none-any.whl
  • Upload date:
  • Size: 19.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.11.5

File hashes

Hashes for OPLS_MD-0.0.5-py3-none-any.whl
Algorithm Hash digest
SHA256 5c727214c5869913a2ab3066e605ff60fff19364ec6e7d60be83a3e5802049cd
MD5 8c747d61838acdaa5c5f5b3135d8f11a
BLAKE2b-256 8a623fd0a02d1f797382adb3c4b25e9b01a96ffa4ff4057cfcdb9378bb505f9e

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page