ChEMBL Structure Pipeline
The ChEMBL protocols used to standardise and salt-strip molecules. First used in ChEMBL 26.
Check the wiki and paper[1] for a detailed description of the different processes.
Installation
From source:
git clone https://github.com/chembl/ChEMBL_Structure_Pipeline.git
pip install ./ChEMBL_Structure_Pipeline
With pip:
pip install chembl_structure_pipeline
With conda:
conda install -c conda-forge chembl_structure_pipeline
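To confirm that the installation worked, a check along these lines may help. It is a minimal sketch using only the standard library; the helper name `installed_version` is made up for this example, and it assumes the installed distribution is registered under the same name used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the distribution name matches the pip install name.
    version = installed_version("chembl_structure_pipeline")
    print(version if version else "chembl_structure_pipeline is not installed")
```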
Usage
Standardise a compound
from chembl_structure_pipeline import standardizer
o_molblock = """
  Mrv1810 07121910172D

  4  3  0  0  0  0            999 V2000
   -2.5038    0.4060    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
   -2.5038    1.2310    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   -3.2182   -0.0065    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  4  0  0  0
M  CHG  2   2  -1   3   1
M  END
"""
std_molblock = standardizer.standardize_molblock(o_molblock)
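When more than one structure needs standardising, the single call above can be wrapped in a small batch helper. The following is a sketch only: `standardize_many` and its `standardize` parameter are hypothetical names introduced here, and the `try`/`except` is a defensive assumption, since how the library reacts to unparseable input is not described in this document.

```python
def standardize_many(molblocks, standardize=None):
    """Standardise each molblock, recording failures instead of stopping.

    `standardize` defaults to standardizer.standardize_molblock. It is a
    parameter only so the helper can be exercised without the library.
    """
    if standardize is None:
        # Imported lazily so the helper itself has no hard dependency.
        from chembl_structure_pipeline import standardizer
        standardize = standardizer.standardize_molblock

    results, failures = [], []
    for index, molblock in enumerate(molblocks):
        try:
            results.append(standardize(molblock))
        except Exception as exc:
            # Keep the output aligned with the input and note what failed.
            results.append(None)
            failures.append((index, str(exc)))
    return results, failures
```

Calling `standardize_many([o_molblock])` would use the library function by default. The result list keeps one entry per input, so failed structures can be traced back by position.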
Get the parent compound
from chembl_structure_pipeline import standardizer
o_molblock = """
  Mrv1810 07121910262D

  3  1  0  0  0  0            999 V2000
   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5186    1.5178    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -2.8647    1.5789    0.0000 Cl  0  5  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  CHG  2   2   1   3  -1
M  END
"""
parent_molblock, _ = standardizer.get_parent_molblock(o_molblock)
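The call above returns two values and discards the second. The sketch below keeps both under named fields. It assumes that the second value is a flag marking structures the pipeline would exclude; that reading is not stated in this document, and `ParentResult`, `parent_with_flag` and `get_parent` are hypothetical names used only for illustration.

```python
from typing import NamedTuple


class ParentResult(NamedTuple):
    molblock: str
    exclude: bool  # assumed meaning: the structure was flagged for exclusion


def parent_with_flag(molblock, get_parent=None):
    """Return the parent molblock together with the second returned value."""
    if get_parent is None:
        # Imported lazily so the helper can be tried without the library.
        from chembl_structure_pipeline import standardizer
        get_parent = standardizer.get_parent_molblock

    parent_molblock, exclude = get_parent(molblock)
    return ParentResult(parent_molblock, bool(exclude))
```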
Check a compound
The checker assesses the quality of a structure and highlights specific features or issues that may need to be revised. Along with a description of each issue, it returns a penalty score between 0 and 9 that reflects how serious the issue is: the higher the score, the more critical the issue.
from chembl_structure_pipeline import checker
o_molblock = """
  Mrv1810 02151908462D

  4  3  0  0  0  0            999 V2000
    2.2321    4.4196    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0023    4.7153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4117    4.5059    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.9568    3.6420    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  1  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
M  END
"""
issues = checker.check_molblock(o_molblock)
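Once the issues are in hand, the penalty scores can drive a simple triage step. The sketch below assumes `issues` is an iterable of `(score, description)` pairs, which is how the checker output is described above but whose exact shape is not shown in this document. The helper name `summarise_issues`, the threshold of 5 and the sample data are all made up for illustration.

```python
def summarise_issues(issues, threshold=5):
    """Return (worst_score, descriptions of issues scoring >= threshold)."""
    # Sort so the most critical issues come first.
    ordered = sorted(issues, key=lambda pair: pair[0], reverse=True)
    worst = ordered[0][0] if ordered else 0
    serious = [description for score, description in ordered if score >= threshold]
    return worst, serious


# Hypothetical checker output, for illustration only.
example_issues = ((2, "example minor issue"), (6, "example serious issue"))
worst, serious = summarise_issues(example_issues)
print(worst, serious)
```

An empty result yields a worst score of 0 and no serious issues, so the helper can be applied uniformly to clean and flagged structures.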
References
[1] Bento, A.P., Hersey, A., Félix, E. et al. An open source chemical structure curation pipeline using RDKit. J Cheminform 12, 51 (2020). https://doi.org/10.1186/s13321-020-00456-1