Merging, linking and placing compounds by stitching them together like a reanimated corpse

Fragmenstein

Fragmenstein: Merging, linking and placing compounds by stitching bound compounds together like a reanimated corpse.

Name	Colab Link	PyRosetta	Description
Pipeline		✔	Given a template and some hits, merge them and place the most similar purchasable analogues from Enamine REAL
Light		❌	Generate molecules and see how they merge and how a placed compound fares

For manuscript data see manuscript data repository.
For authors see Authors.
For command line interface see Command line interface.

Stitched molecules

Fragmenstein can perform two different tasks.

Combine hits
Place a given followup molecule (SMILES) based on a series of hits


Like Frankenstein's creation, it may violate the laws of chemistry: trigonal planar topologies may be tetrahedral, bonds unnaturally long, etc. This monstrosity is therefore energy minimised with strong constraints within the protein.

Classes

There are four main classes, named after characters from the Frankenstein book and movies:

Monster makes the stitched-together molecules independent of the protein (documentation)
Igor uses PyRosetta to minimise the Fragmenstein monster followup within the protein (documentation)
Victor is a pipeline that calls the parts, with several features, such as warhead switching (documentation)
Laboratory does all the combinatorial operations with Victor (specific case)

NB. In the absence of PyRosetta (which requires an academic licence), all classes bar Igor work, and an alternative Victor class needs to be used, for example Wictor (RDKit minimisation only) or OpenVictor (using OpenMM).

Additionally, there are a few minor classes.

One of these is mRMSD, a multiple RMSD variant which does not superpose/align and chooses which atoms to use based on coordinates (documentation).

The class Walton performs geometric manipulations of compounds, to set them up to demonstrate features of Fragmenstein (like Captain Walton, it does not partake in the plot, but is key to the narration).
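To make the idea behind mRMSD more concrete, here is a minimal standard-library sketch of an RMSD computed without any superposition, where atoms are paired purely by proximity. It is not the mRMSD implementation: the function names, the nearest-neighbour pairing and the 1 Å cutoff are all assumptions made for illustration.

```python
import math

def nearest_pairs(hit_xyz, followup_xyz, cutoff=1.0):
    """Pair each hit atom with its closest followup atom, if within `cutoff` Å."""
    pairs = []
    for h in hit_xyz:
        best = min(followup_xyz, key=lambda f: math.dist(h, f))
        if math.dist(h, best) <= cutoff:
            pairs.append((h, best))
    return pairs

def multi_rmsd(hits_xyz, followup_xyz, cutoff=1.0):
    """RMSD over atom pairs pooled from several hits; coordinates are never aligned."""
    squared = [math.dist(h, f) ** 2
               for hit in hits_xyz
               for h, f in nearest_pairs(hit, followup_xyz, cutoff)]
    if not squared:
        return float('nan')
    return math.sqrt(sum(squared) / len(squared))

# Toy coordinates (hypothetical, in Å): two hits and one followup
hit_a = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0)]
hit_b = [(3.0, 0.0, 0.0)]
followup = [(0.0, 0.3, 0.0), (1.5, 0.0, 0.0), (3.0, 0.0, 0.4)]
value = multi_rmsd([hit_a, hit_b], followup)
```

Because nothing is superposed, a followup that drifts away from the hits is penalised, which is the point of using such a metric to judge faithfulness.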

There are two modules hosted elsewhere:

Rectifier from molecular_rectifier is a class that corrects mistakes in the molecule automatically merged by Monster.
Params from rdkit to params module parameterises the ligands

Combine

It can also merge and link fragment hits by itself and find the best scoring mergers. For details about linking see linking notes. It uses the same overlapping position clustering, but also has a decent amount of impossible/uncommon chemistry prevention.

Monster:

from fragmenstein import Monster
# hit_a and hit_b are rdkit.Chem.Mol instances of the bound hits
monster = Monster(hits=[hit_a, hit_b])
monster.combine()
monster.positioned_mol #: rdkit.Chem.Mol

Victor:

from fragmenstein import Victor
import pyrosetta
pyrosetta.init( extra_options='-no_optH false -mute all -ex1 -ex2 -ignore_unrecognized_res false -load_PDB_components false -ignore_waters false')

victor = Victor(hits=[hit_a, hit_b], 
                pdb_filename='foo.pdb',  # or pdb_block='ATOM 1 MET ...'
                covalent_resi=1) # if not covalent, just put the first residue or something.
victor.combine()
victor.minimized_mol

The PyRosetta init step can be done with the helper function:

from fragmenstein import Igor
Igor.init_pyrosetta()

The two seem similar, but Victor places with Monster and minimises with Igor. As a result it has energy scores:

victor.ddG

Fragmenstein is not really a docking algorithm, as it does not find the pose with the lowest energy within a given volume. Instead, it is a method to assess how faithful a given followup is to the hits provided. Hence the minimised pose should be assessed by the RMSD metric or similar, and the ∆∆G score used solely as a cutoff (lower than zero).
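The selection logic described above can be sketched in a few lines of plain Python. The dictionary keys, the compound names and the 1 Å RMSD threshold are hypothetical and only serve to show ∆∆G acting as a pass/fail cutoff while RMSD does the ranking.

```python
from typing import Dict, List

def select_faithful(results: List[Dict], rmsd_cutoff: float = 1.0) -> List[Dict]:
    """Keep placements with negative ddG and low RMSD, most faithful first."""
    kept = [r for r in results if r['ddG'] < 0 and r['rmsd'] <= rmsd_cutoff]
    return sorted(kept, key=lambda r: r['rmsd'])

# Hypothetical results table
results = [
    {'name': 'cmpd-1', 'ddG': -5.2, 'rmsd': 0.4},
    {'name': 'cmpd-2', 'ddG': -9.8, 'rmsd': 1.9},  # good score, unfaithful pose
    {'name': 'cmpd-3', 'ddG': 1.3, 'rmsd': 0.2},   # faithful pose, unfavourable score
    {'name': 'cmpd-4', 'ddG': -0.7, 'rmsd': 0.3},
]
selected = [r['name'] for r in select_faithful(results)]
```

Note that cmpd-2 is dropped despite having the most negative ∆∆G, since the score is not used for ranking.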

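For a large number of combinations:

import pandas as pd
from fragmenstein import Laboratory

# pdbblock is the PDB block (str) of the template, hits is a list of rdkit.Chem.Mol
lab = Laboratory(pdbblock=pdbblock, covalent_resi=None)
combinations: pd.DataFrame = lab.combine(hits, n_cores=28)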

Place

Here is an interactive example of placed molecules.

It is rather tolerant to erroneous/excessive submissions (by automatically excluding them) and can energy minimise strained conformations.

Three mapping approaches were tested, but the key is that hits are pairwise mapped to each other by means of one-to-one atom matching based upon position as opposed to similarity which is easily led astray. For example, note here that the benzene and the pyridine rings overlap, not the two pyridine rings:
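As a toy illustration of position-based pairing (not Fragmenstein's actual mapping code), the sketch below greedily pairs atoms of two hits by distance and deliberately ignores the element. The function name, the greedy strategy and the 1 Å cutoff are assumptions.

```python
import math

def map_by_position(atoms_a, atoms_b, cutoff=1.0):
    """Greedy one-to-one mapping of atom indices by distance; elements are ignored."""
    candidates = sorted(
        (math.dist(xyz_a, xyz_b), i, j)
        for i, (_, xyz_a) in enumerate(atoms_a)
        for j, (_, xyz_b) in enumerate(atoms_b)
    )
    mapping, used_b = {}, set()
    for distance, i, j in candidates:
        if distance > cutoff:
            break  # remaining candidates are even further apart
        if i not in mapping and j not in used_b:
            mapping[i] = j
            used_b.add(j)
    return mapping

# (element, (x, y, z)) tuples with made-up coordinates in Å
hit_a = [('C', (0.0, 0.0, 0.0)), ('C', (1.4, 0.0, 0.0)), ('N', (5.0, 0.0, 0.0))]
hit_b = [('N', (0.1, 0.0, 0.0)), ('C', (1.3, 0.1, 0.0)), ('N', (9.0, 0.0, 0.0))]
mapping = map_by_position(hit_a, hit_b)
```

Here the first carbon of hit_a is paired with the overlapping nitrogen of hit_b, while the two distant nitrogens are left unpaired, mirroring the benzene/pyridine behaviour described above.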

RDKit only and OpenMM

PyRosetta is needed for the pocket-centric minimisation. Two alternatives are available:

Wictor (without): stops at the RDKit minimisation
OpenVictor (with OpenMM): uses OpenMM to minimise in the protein

Whereas the PyRosetta steps operate via Igor, OpenVictor uses Fritz. OpenMM is a lot slower than PyRosetta on CPU only, but is free, open source and potentially more accurate.

Igor is a much larger class as it needs to disable rotamer sampling and other things, which is not an issue in OpenMM.

A further detail is that OpenMM is already parallel, therefore request only one core when using it with Laboratory.

import pandas as pd
from fragmenstein import Laboratory, OpenVictor
from fragmenstein.demo import MPro

Laboratory.Victor = OpenVictor
lab = Laboratory(pdbblock=MPro.get_template())
combinations: pd.DataFrame = lab.combine(hits,
                                         n_cores=1,  # 1 core unless $OPENMM_CPU_THREADS is set
                                         timeout=600,  # seconds (600 s is 10 minutes)
                                         combination_size=2,  # pairwise
                                         max_tasks=0)  # 0 is no chunking
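The max_tasks argument in the call above controls chunking, with 0 meaning no chunking. What that means in practice can be sketched with a small generator; this is an illustration of the concept only, with a hypothetical function name, and not the code Laboratory uses.

```python
from typing import Iterator, List, Sequence

def chunk_tasks(tasks: Sequence, max_tasks: int = 0) -> Iterator[List]:
    """Yield tasks in batches of at most `max_tasks`; 0 yields a single batch."""
    if max_tasks <= 0:
        yield list(tasks)
        return
    for start in range(0, len(tasks), max_tasks):
        yield list(tasks[start:start + max_tasks])

# Hypothetical task names
tasks = [f'combo-{i}' for i in range(7)]
batch_sizes = [len(batch) for batch in chunk_tasks(tasks, max_tasks=3)]
unchunked = [len(batch) for batch in chunk_tasks(tasks, max_tasks=0)]
```

Processing results after each batch keeps memory use bounded, at the cost of producing several sequential output files.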

Examples

Monster:

from fragmenstein import Monster
# hit_a and hit_b are rdkit.Chem.Mol instances of the bound hits
monster = Monster(hits=[hit_a, hit_b])
monster.place_smiles('CCO')
monster.positioned_mol

Victor:

from fragmenstein import Victor, Igor
Igor.init_pyrosetta()
victor = Victor(hits=[hit_a, hit_b], pdb_filename='foo.pdb')
victor.place('CCO')
victor.minimized_mol

For a lengthier example see example notes or documentation.

Demo data

Some demo data is provided in the demo submodule.

from fragmenstein.demo import MPro, Mac1

pdbblock: str = Mac1.get_template()
for hitname in Mac1.get_hit_list():
    Mac1.get_hit(hitname)
    ...

To use SARS-CoV-2 MPro as a test bed, the following may be helpful:

fragmenstein.MProVictor, a derived class (of Victor), with various presets specific for MPro.
fragmenstein.get_mpro_template(), returns the PDB block (str) of MPro
fragmenstein.get_mpro_molblock(xnumber), returns the mol block (str) of a MPro hit from Fragalysis
fragmenstein.get_mpro_mol(xnumber), as above but returns a Chem.Mol instance.

For the matched sets of derivative hits to reference hits see the manuscript's data repository.

Other features

Installation

Fragmenstein and dependencies

Python 3.6 or above. Install from PyPI:

python -m pip install fragmenstein

Requires PyRosetta

Warning: PyRosetta no longer runs on CentOS 7 due to old kernel headers (cf. blog post).

PyRosetta requires a password to be downloaded (academic licence), obtained from https://els2.comotion.uw.edu/product/pyrosetta. This is a different licence from the Rosetta one. The username for the Rosetta binaries is a formatted variant of "academic user", while the PyRosetta one is the name of a researcher whose name bears an important concept in protein folding, like boltzmann + constant (but is not that). PyRosetta can be downloaded via a browser from http://www.pyrosetta.org/dow, or in the terminal via:

curl -u 👾👾👾:👾👾👾 https://graylab.jhu.edu/download/PyRosetta4/archive/release/PyRosetta4.Release.python38.linux/PyRosetta4.Release.python38.linux.release-NNN.tar.bz2 -o a.tar.bz2
tar -xf a.tar.bz2
cd PyRosetta4.Release.python38.linux
sudo pip3 install .

or using conda

or using install_pyrosetta from the pyrosetta-help package.

pip install pyrosetta-help
PYROSETTA_USERNAME=👾👾👾 PYROSETTA_PASSWORD=👾👾👾 install_pyrosetta

The PYROSETTA_USERNAME and PYROSETTA_PASSWORD are environment variables, which should not be shared publicly (i.e. store them as private environmental variables in your target application).
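One way to keep the credentials out of source code is to read them from the environment at run time and fail early if they are absent. The helper below is a hypothetical sketch (its name is not part of pyrosetta-help or Fragmenstein), demonstrated with a stand-in mapping rather than real credentials.

```python
import os

def get_pyrosetta_credentials(environ=os.environ):
    """Return (username, password) from the environment or raise a clear error."""
    keys = ('PYROSETTA_USERNAME', 'PYROSETTA_PASSWORD')
    missing = [key for key in keys if not environ.get(key)]
    if missing:
        raise EnvironmentError(f"Missing environment variable(s): {', '.join(missing)}")
    return environ['PYROSETTA_USERNAME'], environ['PYROSETTA_PASSWORD']

# Stand-in mapping instead of real credentials
demo_env = {'PYROSETTA_USERNAME': 'placeholder-user',
            'PYROSETTA_PASSWORD': 'placeholder-pass'}
username, password = get_pyrosetta_credentials(demo_env)

# An empty environment produces an explicit error message
try:
    get_pyrosetta_credentials({})
    message = ''
except EnvironmentError as error:
    message = str(error)
```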

Origin

See Fragmenstein and COVID moonshot.

Fragmenstein was created to see how reasonable the molecules of fragment mergers submitted in the COVID Moonshot project are, because after all the underlying method is fragment-based screening. This dataset has some unique peculiarities that are potentially not encountered in other projects.

Command line interface

The strength of Fragmenstein is as a Python module, but there is a command line interface. This allows different levels of usage. The top level is the fragmenstein pipeline, which does the whole thing, namely it:

places the reference hits against themselves and gets the PLIP interactions
combines the hits in a given combination size, while skipping blacklisted named compounds
searches SmallWorld for analogues of the top N mergers
places them and
ranks them based on a customisable multiobjective function, which takes into account the PLIP interactions along with the number of novel atoms (increase in risk & novelty).

This in effect reflects the pipeline I commonly use.
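The combination step of the list above can be pictured as enumerating hit names in groups of the requested size while skipping blacklisted names. The sketch below is an assumption-laden illustration: the function name and the convention of joining hit names with a hyphen are hypothetical, and the real pipeline may differ, for example in whether order matters.

```python
from itertools import combinations

def enumerate_combinations(hit_names, combination_size=2, blacklist=()):
    """Yield tuples of hit names, skipping any whose joined name is blacklisted."""
    blocked = set(blacklist)
    for combo in combinations(sorted(hit_names), combination_size):
        if '-'.join(combo) in blocked:
            continue
        yield combo

# Hypothetical hit names and blacklist entry
hits = ['hitA', 'hitB', 'hitC']
pairs = list(enumerate_combinations(hits, 2, blacklist=['hitA-hitC']))
```

The number of combinations grows quickly with the number of hits and the combination size, which is why the pipeline exposes options for core count and chunking.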


usage: fragmenstein pipeline [-h] -t TEMPLATE -i INPUT [-o OUTPUT] [-r RANKING] [-c CUTOFF] [-q QUICK] [-d SW_DIST] [-l SW_LENGTH] [-b SW_DATABASES [SW_DATABASES ...]] [-s SUFFIX] [-n N_CORES] [-m COMBINATION_SIZE] [-k TOP_MERGERS] [-e TIMEOUT] [-x MAX_TASKS] [-z BLACKLIST] [-j WEIGHTS] [-v]

export N_CORES=$(cat /proc/cpuinfo | grep processor | wc -l);
fragmenstein pipeline \
                      --template reference.pdb \
                      --hits filtered.sdf \
                      --n_cores $(($N_CORES - 1)) \
                      --suffix _pairs \
                      --max_tasks 5000 \
                      --sw_databases REAL-Database-22Q1.smi.anon MculeUltimate-20Q2.smi.anon \
                      --combination_size 2 \
                      --timeout 600;

template: The template, preferably a polished PDB
hits: The hits in sdf format. These need to have unique names.
output: The output folder
suffix: The suffix for the output files. Note that due to max_tasks there will be multiple sequential files for some steps.
quick: Does not reattempt "reanimation" if it failed, as each reattempt relaxes the constraints further and so allows more deviation.
blacklist: A file with a line for each molecule name to not perform (say hitA–hitZ)
cutoff: The joining cutoff in Ångström after which linkages will not be attempted (default is 5Å)
sw_databases: See SmallWorld or the SmallWorld API in Python for what datasets are available (e.g. 'Enamine-BB-Stock-Mar2022.smi.anon').
sw_length: How many analogues for each query to keep
sw_dist: The distance cutoff for the SmallWorld search
max_tasks: To avoid memory issues, the pipeline performs a number of tasks (controlled via max_tasks) before processing them, to disable this use --max_tasks 0.
weights: This is a JSON file that controls the ranking
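To give a feel for how a weights file could drive a multiobjective ranking, here is a minimal sketch using a weighted sum. The JSON keys, the candidate fields and the "lower is better" convention are all hypothetical; the actual schema expected by --weights is not described here and should be checked in the documentation.

```python
import json

# Hypothetical weights file content (key names are assumptions)
weights = json.loads(
    '{"ddG": 1.0, "rmsd": 2.0, "n_novel_atoms": 0.5, "n_interactions": -1.0}'
)

def score(candidate: dict, weights: dict) -> float:
    """Weighted sum of the candidate's metrics; missing metrics count as 0."""
    return sum(weight * candidate.get(metric, 0.0)
               for metric, weight in weights.items())

# Made-up candidates
candidates = [
    {'name': 'merger-1', 'ddG': -6.0, 'rmsd': 0.5, 'n_novel_atoms': 4, 'n_interactions': 3},
    {'name': 'merger-2', 'ddG': -8.0, 'rmsd': 1.5, 'n_novel_atoms': 1, 'n_interactions': 2},
]
ranked = sorted(candidates, key=lambda c: score(c, weights))  # lowest score first
ranking = [c['name'] for c in ranked]
```

A negative weight rewards a metric (here, more interactions), while a positive weight penalises it (here, more novel atoms, reflecting increased risk).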

Specific cases

fragmenstein monster combine -i hit1.mol hit2.mol >> combo.mol
fragmenstein monster place -i hit1.mol hit2.mol -s 'CCO' >> placed.mol
fragmenstein victor combine -i hit1.mol hit2.mol -t protein.pdb -o output >> combo.mol
fragmenstein victor place -i hit1.mol hit2.mol -s 'NCO' -n molname -t protein.pdb -o output >> placed.mol
fragmenstein laboratory combine -i hits.sdf -o output -d output.csv -s output.sdf -c 24

Authors

Author	Role	Homepage	Department
Matteo Ferla	main developer	WCHG	Wellcome Centre for Human Genetics, University of Oxford
Rubén Sánchez-Garcia	discussion/code	Stats	Department of Statistics, University of Oxford
Rachael Skyner	discussion/editing/code
Stefan Gahbauer	discussion
Jenny Taylor	PI	WCHG	Wellcome Centre for Human Genetics, University of Oxford
Charlotte Deane	PI
Frank von Delft	PI	CMD	Diamond Lightsource / CMD, Oxford
Brian Marsden	PI	CMD	CMD, Oxford

Download files

Source Distribution

Fragmenstein-0.13.20.tar.gz (610.2 kB)

Uploaded Oct 16, 2023 Source

Hashes for Fragmenstein-0.13.20.tar.gz
Algorithm	Hash digest
SHA256	`ce3a517cca174dfe62d4ea6830986b0a02f3496db786e23b5e92d7ebf0ab5c43`
MD5	`35ae2f0d189619199e7d6e08d3eb766a`
BLAKE2b-256	`b2f51a9531c5757d6d4c82c2236d2dc919228c71f33e7229a70e70c959b0e751`