ChEMBL Structure Pipeline
The protocols ChEMBL uses to standardise and salt-strip molecules. First used in ChEMBL 26.
See the wiki and the paper [1] for a detailed description of the individual processes.
Installation
From source:
git clone https://github.com/chembl/ChEMBL_Structure_Pipeline.git
pip install ./ChEMBL_Structure_Pipeline
With pip:
pip install chembl_structure_pipeline
With conda:
conda install -c conda-forge chembl_structure_pipeline
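To confirm that the installation succeeded, the installed version can be read back from the package metadata. The following is a minimal sketch using only the standard library; the helper name `installed_version` is an illustrative choice, and it assumes the distribution is registered under the same name used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name="chembl_structure_pipeline"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


if __name__ == "__main__":
    version = installed_version()
    print(version if version else "chembl_structure_pipeline is not installed")
```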
Usage
Standardise a compound (info)
from chembl_structure_pipeline import standardizer

o_molblock = """
  Mrv1810 07121910172D

  4  3  0  0  0  0            999 V2000
   -2.5038    0.4060    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
   -2.5038    1.2310    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   -3.2182   -0.0065    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  4  0  0  0
M  CHG  2   2  -1   3   1
M  END
"""

std_molblock = standardizer.standardize_molblock(o_molblock)
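To see what standardisation changed, the original and standardised molblocks can be compared line by line. This is a minimal sketch using the standard-library `difflib` module; `molblock_diff` is a hypothetical helper, not part of the package, and it only assumes that both inputs are plain molblock strings.

```python
import difflib


def molblock_diff(before, after):
    """Return unified-diff lines between two molblock strings."""
    return list(
        difflib.unified_diff(
            before.splitlines(),
            after.splitlines(),
            fromfile="original",
            tofile="standardised",
            lineterm="",  # input lines carry no newline, so add none to headers
        )
    )
```

With the variables from the example, `print("\n".join(molblock_diff(o_molblock, std_molblock)))` prints the differences; an empty list means the two molblocks are textually identical.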
Get the parent compound (info)
from chembl_structure_pipeline import standardizer

o_molblock = """
  Mrv1810 07121910262D

  3  1  0  0  0  0            999 V2000
   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5186    1.5178    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -2.8647    1.5789    0.0000 Cl  0  5  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  CHG  2   2   1   3  -1
M  END
"""

parent_molblock, _ = standardizer.get_parent_molblock(o_molblock)
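The call above discards the second return value. Assuming that this value is a boolean exclusion flag (an assumption worth checking against the wiki), a batch can be split into compounds with a usable parent and compounds flagged for exclusion. The sketch below takes the parent function as an argument so that it runs without the package; `split_by_exclusion` and the dictionary keys are hypothetical.

```python
def split_by_exclusion(molblocks, get_parent):
    """Split named molblocks into parents to keep and names flagged for exclusion.

    molblocks: dict mapping a compound name to its molblock string.
    get_parent: callable returning (parent_molblock, exclude_flag),
        for example standardizer.get_parent_molblock (assumed return shape).
    """
    parents, excluded = {}, []
    for name, molblock in molblocks.items():
        parent_molblock, exclude = get_parent(molblock)
        if exclude:
            excluded.append(name)
        else:
            parents[name] = parent_molblock
    return parents, excluded
```

A possible call is `split_by_exclusion({"cpd1": o_molblock}, standardizer.get_parent_molblock)`.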
Check a compound (info)
The checker assesses the quality of a structure and highlights specific features or issues that may need to be revised. Along with a description of each issue, it returns a penalty score from 0 to 9 that reflects how serious the issue is: the higher the score, the more critical the issue.
from chembl_structure_pipeline import checker

o_molblock = """
  Mrv1810 02151908462D

  4  3  0  0  0  0            999 V2000
    2.2321    4.4196    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    3.0023    4.7153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
    1.4117    4.5059    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0
    1.9568    3.6420    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  1  0  0  0
  1  3  1  0  0  0  0
  1  4  1  0  0  0  0
M  END
"""

issues = checker.check_molblock(o_molblock)
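Assuming `issues` is a sequence of `(penalty_score, description)` pairs, which is an assumption about the return shape rather than something stated above, the most serious issue and a pass/fail decision can be derived as follows. The helper name `summarise_issues` and the cutoff `max_allowed` are hypothetical; pick a threshold that matches your own curation policy.

```python
def summarise_issues(issues, max_allowed=5):
    """Summarise checker output given as (penalty_score, description) pairs.

    Returns a dict with the highest score (0 when there are no issues),
    the descriptions carrying that score, and whether the structure
    stays at or below the max_allowed penalty.
    """
    if not issues:
        return {"max_score": 0, "worst": [], "acceptable": True}
    max_score = max(score for score, _ in issues)
    worst = [text for score, text in issues if score == max_score]
    return {
        "max_score": max_score,
        "worst": worst,
        "acceptable": max_score <= max_allowed,
    }
```

For example, `summarise_issues(issues)["acceptable"]` could gate whether a structure is loaded or sent for manual review.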
References
[1] Bento, A.P., Hersey, A., Félix, E. et al. An open source chemical structure curation pipeline using RDKit. J Cheminform 12, 51 (2020). https://doi.org/10.1186/s13321-020-00456-1