pmapper: 3D pharmacophore hashes and fingerprints
Project description
Pmapper - 3D pharmacophore signatures and fingerprints
Pmapper is a Python module to generate 3D pharmacophore signatures and fingerprints. Signatures uniquely encode 3D pharmacophores with hashes suitable for fast identification of identical pharmacophores.
Dependency
rdkit >= 2017.09
networkx >= 2
Installation
pip install pmapper
Changelog
1.0.0
-
added functionality to calculate 3D pharmacophore descriptors for molecules with exclusion of single atoms (for the purpose of model interpretation)
-
added convenience function get_feature_ids
-
added function add_feature to manually edit/construct a pharmacophore
-
added save/load of pharmit pharmacophore models
-
IMPORTANT: changed the hashing procedure to make it more stable (pickle dependency was removed). This breaks compatibility with previously generated md5 hashes with
get_signature_md5
,iterate_pharm
anditerate_pharm1
functions, all other functionality was not affected.
1.0.1
fit_model
function can return rms by request
1.0.2
fit_model
function now returns a dict of mapped feature ids
1.0.3
- add
get_subpharmacophore
function - fix
get_mirror_pharmacophore
function to use the same bin step and cached args as for the source pharmacophore instance
1.0.4
- fix installation of dependency
networkx
- add citations on examples of
pmapper
descriptors used for machine learning
1.1
- change SMARTS pattern to avoid matching positively charge nitrogen atoms as H-bond acceptors
1.1.1
- fix missing aromatic features when reading LigandScout models
Examples
Load modules
from pmapper.pharmacophore import Pharmacophore as P
from rdkit import Chem
from rdkit.Chem import AllChem
from pprint import pprint
Create pharmacophore from a single conformer using default feature definitions
# load a molecule from SMILES and generate 3D coordinates
mol = Chem.MolFromSmiles('C1CC(=O)NC(=O)C1N2C(=O)C3=CC=CC=C3C2=O') # talidomide
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, randomSeed=42)
# create pharmacophore
p = P()
p.load_from_mol(mol)
Get 3D pharmacophore signature
# get 3D pharmacophore signature
sig = p.get_signature_md5()
print(sig)
Output:
98504647beeb143ae50bb6b7798ca0f0
Get 3D pharmacophore signature with non-zero tolerance
sig = p.get_signature_md5(tol=5)
print(sig)
Output:
bc54806ba01bf59736a7b62b017d6e1d
Create pharmacophores for a multiple conformer compound
from pmapper.utils import load_multi_conf_mol
# create multiple conformer molecule
AllChem.EmbedMultipleConfs(mol, numConfs=10, randomSeed=1024)
ps = load_multi_conf_mol(mol)
sig = [p.get_signature_md5() for p in ps]
pprint(sig) # identical signatures occur
Output:
['d5f5f9d65e39cb8605f1fa9db5b2fbb0',
'6204791002d1e343b2bde323149fa780',
'abfabd8a4fcf5719ed6bf2c71a60852c',
'dfe9f17d30210cb94b8dd7acf77feae9',
'abfabd8a4fcf5719ed6bf2c71a60852c',
'e739fb5f9985ce0c65a16da41da4a33f',
'2297ddf0e437b7fc32077f75e3924dcd',
'e739fb5f9985ce0c65a16da41da4a33f',
'182a00bd9057abd0c455947d9cfa457c',
'68f226d474808e60ab1256245f64c2b7']
Identical hashes should correspond to pharmacophores with low RMSD. Pharmacophores #2 and #4 have identical hash abfabd8a4fcf5719ed6bf2c71a60852c
. Let's check RMSD.
from pmapper.utils import get_rms
for i in range(len(ps)):
print("rmsd bewteen 2 and %i pharmacophore:" % i, round(get_rms(ps[2], ps[i]), 2))
Output
rmsd bewteen 2 and 0 pharmacophore: 0.63
rmsd bewteen 2 and 1 pharmacophore: 0.99
rmsd bewteen 2 and 2 pharmacophore: 0.0
rmsd bewteen 2 and 3 pharmacophore: 0.41
rmsd bewteen 2 and 4 pharmacophore: 0.18
rmsd bewteen 2 and 5 pharmacophore: 0.19
rmsd bewteen 2 and 6 pharmacophore: 1.15
rmsd bewteen 2 and 7 pharmacophore: 0.32
rmsd bewteen 2 and 8 pharmacophore: 0.69
rmsd bewteen 2 and 9 pharmacophore: 0.36
They really have RMSD < binning step (1A by default). However, other pharmacophores with distinct hashes also have low RMSD to #2. Identical hashes guarantee low RMSD between corresponding pharmacophores, but not vice versa.
Pharmacophore match
Create a two-point pharmacophore model and match with a pharmacophore of a molecule (both pharmacophores should have identical binning steps)
q = P()
q.load_from_feature_coords([('a', (3.17, -0.23, 0.24)), ('D', (-2.51, -1.28, -1.14))])
p.fit_model(q)
Output
(0, 1)
If they do not match None
will be returned
Generate 3D pharmacophore fingerprint
# generate 3D pharmacophore fingerprint which takes into account stereoconfiguration
b = p.get_fp(min_features=4, max_features=4) # set of activated bits
print(b)
Output (a set of activated bit numbers):
{259, 1671, 521, 143, 912, 402, 278, 406, 1562, 1692, 1835, 173, 558, 1070, 942, 1202, 1845, 823, 1476, 197, 968, 1355, 845, 1741, 1364, 87, 1881, 987, 1515, 378, 628, 1141, 1401, 1146, 2043}
Change settings:
b = p.get_fp(min_features=4, max_features=4, nbits=4096, activate_bits=2)
print(b)
Output (a set of activated bit numbers):
{389, 518, 2821, 1416, 2952, 395, 3339, 511, 3342, 1937, 1042, 2710, 1817, 1690, 3482, 3737, 286, 1824, 1700, 804, 1318, 2729, 3114, 812, 556, 175, 3763, 2356, 3124, 1077, 1975, 3384, 1081, 185, 65, 1223, 713, 1356, 1998, 1487, 2131, 85, 3670, 1877, 3030, 2395, 1116, 2141, 1885, 347, 2404, 1382, 1257, 3049, 2795, 3691, 2541, 1646, 2283, 241, 113, 3698, 756, 2548, 4086, 2293, 1528, 2802, 127}
Save/load pharmacophore
p.save_to_pma('filename.pma')
Output is a text file having json format.
p = P()
p.load_from_pma('filename.pma')
Support other formats
Pharmacophores can be saved/loaded from LigandScout pml-files. Also pharmacophores can be read from xyz-files.
Caching
Pharmacophores can be created with enabled cached
argument. This will speed up all futher repeated calls to retrive hash, fingerprints or descriptors.
p = P(cached=True)
Speed tests
Generation of pharmacophore signatures (hashes) is a CPU-bound task. The computation speed depends on the number of features in pharmacophores.
Tests were run on a random subset of compounds from Drugbank. Up to 50 conformers were generated for each compound.
Laptop configuration:
- Intel(R) Core(TM) i7-5500U CPU @ 2.40GHz
- 12 GB RAM
- calculation was run in 1 thread (the module is thread safe and calculations can be parallelized)
To run the speed test use pmapper_speed_test
command line tool
========== Reading of conformers of molecules ==========
329 molecules were read in 0.0134 s
========== Creation of pharmacophores (with enabled caching) ==========
1938 pharmacophores were created in 3.17065 s
========== First calculation of hashes ==========
2 pharmacophores with 0 features - 0.00014s or 7e-05s per pharmacophore
2 pharmacophores with 1 features - 0.0001s or 5e-05s per pharmacophore
12 pharmacophores with 2 features - 0.00042s or 3e-05s per pharmacophore
44 pharmacophores with 3 features - 0.00212s or 5e-05s per pharmacophore
100 pharmacophores with 4 features - 0.00933s or 9e-05s per pharmacophore
103 pharmacophores with 5 features - 0.05155s or 0.0005s per pharmacophore
105 pharmacophores with 6 features - 0.10857s or 0.00103s per pharmacophore
109 pharmacophores with 7 features - 0.25322s or 0.00232s per pharmacophore
117 pharmacophores with 8 features - 0.59508s or 0.00509s per pharmacophore
101 pharmacophores with 9 features - 0.8795s or 0.00871s per pharmacophore
105 pharmacophores with 10 features - 1.61349s or 0.01537s per pharmacophore
100 pharmacophores with 11 features - 2.24937s or 0.02249s per pharmacophore
103 pharmacophores with 12 features - 3.53308s or 0.0343s per pharmacophore
117 pharmacophores with 13 features - 6.49837s or 0.05554s per pharmacophore
103 pharmacophores with 14 features - 7.54796s or 0.07328s per pharmacophore
142 pharmacophores with 15 features - 14.92654s or 0.10512s per pharmacophore
104 pharmacophores with 16 features - 13.86378s or 0.13331s per pharmacophore
100 pharmacophores with 17 features - 17.94023s or 0.1794s per pharmacophore
120 pharmacophores with 18 features - 28.01455s or 0.23345s per pharmacophore
136 pharmacophores with 19 features - 42.53481s or 0.31276s per pharmacophore
113 pharmacophores with 20 features - 45.88228s or 0.40604s per pharmacophore
========== Second calculation of hashes of the same pharmacophores ==========
2 pharmacophores with 0 features - 5e-05s or 2e-05s per pharmacophore
2 pharmacophores with 1 features - 3e-05s or 1e-05s per pharmacophore
12 pharmacophores with 2 features - 0.00012s or 1e-05s per pharmacophore
44 pharmacophores with 3 features - 0.00041s or 1e-05s per pharmacophore
100 pharmacophores with 4 features - 0.00089s or 1e-05s per pharmacophore
103 pharmacophores with 5 features - 0.00166s or 2e-05s per pharmacophore
105 pharmacophores with 6 features - 0.00316s or 3e-05s per pharmacophore
109 pharmacophores with 7 features - 0.00707s or 6e-05s per pharmacophore
117 pharmacophores with 8 features - 0.0166s or 0.00014s per pharmacophore
101 pharmacophores with 9 features - 0.02005s or 0.0002s per pharmacophore
105 pharmacophores with 10 features - 0.03527s or 0.00034s per pharmacophore
100 pharmacophores with 11 features - 0.05271s or 0.00053s per pharmacophore
103 pharmacophores with 12 features - 0.08097s or 0.00079s per pharmacophore
117 pharmacophores with 13 features - 0.13274s or 0.00113s per pharmacophore
103 pharmacophores with 14 features - 0.1588s or 0.00154s per pharmacophore
142 pharmacophores with 15 features - 0.32687s or 0.0023s per pharmacophore
104 pharmacophores with 16 features - 0.29255s or 0.00281s per pharmacophore
100 pharmacophores with 17 features - 0.38286s or 0.00383s per pharmacophore
120 pharmacophores with 18 features - 0.61327s or 0.00511s per pharmacophore
136 pharmacophores with 19 features - 0.93486s or 0.00687s per pharmacophore
113 pharmacophores with 20 features - 0.94041s or 0.00832s per pharmacophore
Documentation
More documentation can be found here - https://pmapper.readthedocs.io/en/latest/
Citation
Ligand-Based Pharmacophore Modeling Using Novel 3D Pharmacophore Signatures
Alina Kutlushina, Aigul Khakimova, Timur Madzhidov, Pavel Polishchuk
Molecules 2018, 23(12), 3094
https://doi.org/10.3390/molecules23123094
Further publications
MD pharmacophores
Virtual Screening Using Pharmacophore Models Retrieved from Molecular Dynamic Simulations
Pavel Polishchuk, Alina Kutlushina, Dayana Bashirova, Olena Mokshyna, Timur Madzhidov
Int. J. Mol. Sci. 2019, 20(23), 5834
https://doi.org/10.3390/ijms20235834
Pmapper descriptors in machine learning
QSAR Modeling Based on Conformation Ensembles Using a Multi-Instance Learning Approach
Zankov, D. V.; Matveieva, M.; Nikonenko, A. V.; Nugmanov, R. I.; Baskin, I. I.; Varnek, A.; Polishchuk, P.; Madzhidov, T. I.
J. Chem. Inf. Model. 2021, 61 (10), 4913-4923
https://doi.org/10.1021/acs.jcim.1c00692
Multi-Instance Learning Approach to the Modeling of Enantioselectivity of Conformationally Flexible Organic Catalysts
Zankov, D.; Madzhidov, T.; Polishchuk, P.; Sidorov, P.; Varnek, A.
J. Chem. Inf. Model. 2023, 63 (21), 6629-6641
https://doi.org/10.1021/acs.jcim.3c00393
License
BSD-3 clause
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file pmapper-1.1.1.tar.gz
.
File metadata
- Download URL: pmapper-1.1.1.tar.gz
- Upload date:
- Size: 570.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.0.post20200528 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.6.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | f918c46dfbc3fa8e5c28296b259c84eb784fa08fff2b79a03117a847d07987f4 |
|
MD5 | 13b058ca69a59b4b851f5d769ea841ef |
|
BLAKE2b-256 | 5b61f75cfcad4f3e18218bce9849ad8c412095bb1a099cc050a2f99b7df9a507 |
File details
Details for the file pmapper-1.1.1-py3-none-any.whl
.
File metadata
- Download URL: pmapper-1.1.1-py3-none-any.whl
- Upload date:
- Size: 567.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/2.0.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/47.1.0.post20200528 requests-toolbelt/0.9.1 tqdm/4.47.0 CPython/3.6.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 79922063b6862b535f1d2c48e28936bec7b6604060f62a766b9a6ac2aaf75416 |
|
MD5 | 7b9348397e161ecaa9e9183461e0bb28 |
|
BLAKE2b-256 | 8b689bc65d65168ac360758af6a0e921c294e14c60ff3af37b40cfa9082c0525 |