ChEMBL Structure Pipeline
The protocols ChEMBL uses to standardise molecules and strip salts from them. First used for ChEMBL 26.
See the wiki and the paper [1] for a detailed description of each process.
Installation
From source:
git clone https://github.com/chembl/ChEMBL_Structure_Pipeline.git
pip install ./ChEMBL_Structure_Pipeline
With pip:
pip install chembl_structure_pipeline
With conda:
conda install -c conda-forge chembl_structure_pipeline
Usage
Standardise a compound
from chembl_structure_pipeline import standardizer
# V2000 molblocks are column-sensitive: keep the spacing and the blank comment line.
o_molblock = """
  Mrv1810 07121910172D

  4  3  0  0  0  0            999 V2000
   -2.5038    0.4060    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
   -2.5038    1.2310    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   -3.2182   -0.0065    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  4  0  0  0
M  CHG  2   2  -1   3   1
M  END
"""
# Returns the standardised structure as a molblock string.
std_molblock = standardizer.standardize_molblock(o_molblock)
Get the parent compound
from chembl_structure_pipeline import standardizer
# V2000 molblocks are column-sensitive: keep the spacing and the blank comment line.
o_molblock = """
  Mrv1810 07121910262D

  3  1  0  0  0  0            999 V2000
   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5186    1.5178    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -2.8647    1.5789    0.0000 Cl  0  5  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  CHG  2   2   1   3  -1
M  END
"""
# get_parent_molblock returns a pair; the second value is not needed here.
parent_molblock, _ = standardizer.get_parent_molblock(o_molblock)
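The two calls above can be chained for a single input. The following is a minimal sketch, not the package's own workflow: the helper name `standardize_then_parent`, the `Curated` container, and the ordering (standardise first, then derive the parent) are assumptions made for illustration. The two steps are passed in as callables so the sketch does not depend on anything beyond the call shapes shown in the examples above.

```python
from typing import Callable, NamedTuple, Tuple


class Curated(NamedTuple):
    """Hypothetical container for the outputs of both steps."""
    standardized: str  # molblock returned by the standardise step
    parent: str        # molblock returned by the parent step
    extra: object      # second value returned by the parent step, passed through unchanged


def standardize_then_parent(
    molblock: str,
    standardize: Callable[[str], str],
    get_parent: Callable[[str], Tuple[str, object]],
) -> Curated:
    """Run the standardise step, then the parent step, on one molblock."""
    std_molblock = standardize(molblock)
    # The parent step is assumed to return a pair, as in the example above.
    parent_molblock, extra = get_parent(std_molblock)
    return Curated(std_molblock, parent_molblock, extra)
```

With the package installed, the real functions can be passed in directly: `standardize_then_parent(o_molblock, standardizer.standardize_molblock, standardizer.get_parent_molblock)`.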
Check a compound
The checker assesses the quality of a structure and highlights specific features or issues that may need to be revised. Alongside a description of each issue, it returns a penalty score from 0 to 9 that reflects how serious the issue is: the higher the score, the more critical the issue.
from chembl_structure_pipeline import checker
# V2000 molblocks are column-sensitive: keep the spacing and the blank comment line.
o_molblock = """
  Mrv1810 02151908462D

  4  3  0  0  0  0            999 V2000
    2.2321    4.4196    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0023    4.7153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4117    4.5059    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.9568    3.6420    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  1  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
M  END
"""
# Collects the issues found in the structure, each with its penalty score.
issues = checker.check_molblock(o_molblock)
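Once the issues are available, the penalty scores can be used to triage structures. The sketch below assumes that `issues` is an iterable of `(penalty score, description)` pairs; that shape is an assumption here, so confirm it against the wiki before relying on it. The helper names, the sample data, and the threshold are hypothetical and chosen only for illustration.

```python
from typing import Iterable, List, Tuple

# Assumed shape of one checker result: (penalty score, description).
Issue = Tuple[int, str]


def max_penalty(issues: Iterable[Issue]) -> int:
    """Return the highest penalty score, or 0 when no issues were reported."""
    return max((score for score, _ in issues), default=0)


def serious_issues(issues: Iterable[Issue], threshold: int) -> List[Issue]:
    """Keep issues scoring at or above `threshold`, most serious first."""
    kept = [issue for issue in issues if issue[0] >= threshold]
    return sorted(kept, key=lambda issue: issue[0], reverse=True)


# Hand-written sample data, not real checker output.
example_issues = ((2, "example minor issue"), (5, "example stereo issue"))

print(max_penalty(example_issues))        # highest score in the sample
print(serious_issues(example_issues, 5))  # only the issue scoring 5 or more
```

Which threshold counts as serious enough to reject or manually review a structure is a curation decision; the value 5 above is arbitrary.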
References
[1] Bento, A.P., Hersey, A., Félix, E. et al. An open source chemical structure curation pipeline using RDKit. J Cheminform 12, 51 (2020). https://doi.org/10.1186/s13321-020-00456-1
Hashes for chembl_structure_pipeline-1.2.2.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 625ffc25f86d7d6fe1b459a381b2a52fb1df5fe0c8bbfe9c615e8b7f75c301c8
MD5 | eef7d3b1e6644cbbf39917c6840fb3de
BLAKE2b-256 | a8a8b37b0d4373e534c036a4e201f72e47e928c2a806f9699af60d126699b9b9
Hashes for chembl_structure_pipeline-1.2.2-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | fe7fdcc87e276223af721b0111503c4f168b36fae92257420c00f73515c6485d
MD5 | b627769d9816f2520879059acc44de19
BLAKE2b-256 | 97576813202cfe8afe187ac01c3e96396fca5727309de016309b3d1fda351b84