ChEMBL Structure Pipeline

License: MIT

ChEMBL protocols used to standardise molecules and strip their salts. First used in ChEMBL 26.

See the wiki and the paper [1] for a detailed description of the different processes.

Installation

From source:

git clone https://github.com/chembl/ChEMBL_Structure_Pipeline.git
pip install ./ChEMBL_Structure_Pipeline

With pip:

pip install chembl_structure_pipeline

With conda:

conda install -c conda-forge chembl_structure_pipeline

Usage

Standardise a compound

from chembl_structure_pipeline import standardizer

o_molblock = """
  Mrv1810 07121910172D          

  4  3  0  0  0  0            999 V2000
   -2.5038    0.4060    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0
   -2.5038    1.2310    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0
   -3.2182   -0.0065    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
  1  3  1  0  0  0  0
  1  4  1  4  0  0  0
M  CHG  2   2  -1   3   1
M  END
"""

std_molblock = standardizer.standardize_molblock(o_molblock)
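
To see what standardisation changes, it can help to read the formal charges out of a molblock before and after the call. The sketch below is not part of chembl_structure_pipeline: `formal_charges` is a hypothetical helper that parses V2000 `M  CHG` property lines with plain Python, and the sample input is the molblock from the example above, rebuilt line by line.

```python
def formal_charges(molblock):
    """Return {atom_index: charge} read from V2000 'M  CHG' lines.

    Hypothetical helper for illustration only. Atom indices are
    1-based, as they are in the molblock itself.
    """
    charges = {}
    for line in molblock.splitlines():
        if not line.startswith("M  CHG"):
            continue
        # Fields: 'M', 'CHG', entry count, then (atom, charge) pairs.
        fields = line.split()
        count = int(fields[2])
        pairs = fields[3:3 + 2 * count]
        for atom, charge in zip(pairs[0::2], pairs[1::2]):
            charges[int(atom)] = int(charge)
    return charges


# The input molblock from the example above, rebuilt line by line.
o_molblock = "\n".join([
    "",
    "  Mrv1810 07121910172D          ",
    "",
    "  4  3  0  0  0  0            999 V2000",
    "   -2.5038    0.4060    0.0000 C   0  0  3  0  0  0  0  0  0  0  0  0",
    "   -2.5038    1.2310    0.0000 O   0  5  0  0  0  0  0  0  0  0  0  0",
    "   -3.2182   -0.0065    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0",
    "   -1.7893   -0.0065    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "  1  3  1  0  0  0  0",
    "  1  4  1  4  0  0  0",
    "M  CHG  2   2  -1   3   1",
    "M  END",
    "",
])

input_charges = formal_charges(o_molblock)
print(input_charges)  # {2: -1, 3: 1}
```

Calling `formal_charges(std_molblock)` on the standardised result, assuming it is also a V2000 molblock, shows which charges remain after the pipeline has run.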

Get the parent compound

from chembl_structure_pipeline import standardizer

o_molblock = """
  Mrv1810 07121910262D          

  3  1  0  0  0  0            999 V2000
   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0
   -4.5186    1.5178    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0
   -2.8647    1.5789    0.0000 Cl  0  5  0  0  0  0  0  0  0  0  0  0
  1  2  1  0  0  0  0
M  CHG  2   2   1   3  -1
M  END
"""

parent_molblock, _ = standardizer.get_parent_molblock(o_molblock)
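
`get_parent_molblock` returns two values, and the example above keeps only the first. A quick sanity check on that first value is to compare atom counts before and after salt stripping. The sketch below assumes V2000 molblocks and does not call the library: `atom_and_bond_counts` is a hypothetical helper that reads the fixed-width counts line, applied here to the input molblock from the example above.

```python
def atom_and_bond_counts(molblock):
    """Return (n_atoms, n_bonds) from the counts line of a V2000 molblock.

    Hypothetical helper for illustration only. The counts line is the
    fourth line of the block; atoms and bonds occupy its first two
    three-character fields.
    """
    lines = molblock.splitlines()
    if len(lines) < 4 or "V2000" not in lines[3]:
        raise ValueError("expected a V2000 counts line on line 4")
    counts = lines[3]
    return int(counts[0:3]), int(counts[3:6])


# The input molblock from the example above, rebuilt line by line.
o_molblock = "\n".join([
    "",
    "  Mrv1810 07121910262D          ",
    "",
    "  3  1  0  0  0  0            999 V2000",
    "   -5.2331    1.1053    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0",
    "   -4.5186    1.5178    0.0000 N   0  3  0  0  0  0  0  0  0  0  0  0",
    "   -2.8647    1.5789    0.0000 Cl  0  5  0  0  0  0  0  0  0  0  0  0",
    "  1  2  1  0  0  0  0",
    "M  CHG  2   2   1   3  -1",
    "M  END",
    "",
])

input_counts = atom_and_bond_counts(o_molblock)
print(input_counts)  # (3, 1)
```

If the counter-ion was stripped, `atom_and_bond_counts(parent_molblock)` should report fewer atoms than the input.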

Check a compound

The checker assesses the quality of a structure and highlights specific features or issues that may need to be revised. Along with a description of each issue, it returns a penalty score between 0 and 9 that reflects how serious the issue is: the higher the score, the more critical the issue.

from chembl_structure_pipeline import checker

o_molblock = """ 
  Mrv1810 02151908462D           
 
  4  3  0  0  0  0            999 V2000 
    2.2321    4.4196    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0 
    3.0023    4.7153    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0 
    1.4117    4.5059    0.0000 O   0  0  0  0  0  0  0  0  0  0  0  0 
    1.9568    3.6420    0.0000 C   0  0  0  0  0  0  0  0  0  0  0  0 
  1  2  1  1  0  0  0 
  1  3  1  0  0  0  0 
  1  4  1  0  0  0  0 
M  END 
"""

issues = checker.check_molblock(o_molblock)
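
The penalty scores lend themselves to simple triage. The sketch below assumes that `issues` is a sequence of `(penalty_score, description)` pairs. The helper names, the threshold of 5, and the sample data are all hypothetical and only illustrate the shape of the logic; they are not output from the checker.

```python
def max_penalty(issues):
    """Return the highest penalty score, or 0 when no issues were reported."""
    return max((score for score, _ in issues), default=0)


def issues_at_or_above(issues, threshold):
    """Return the (score, description) pairs whose score meets the threshold."""
    return [(score, text) for score, text in issues if score >= threshold]


# Hypothetical sample data in the assumed (score, description) shape.
sample_issues = (
    (5, "example issue A"),
    (2, "example issue B"),
)

worst = max_penalty(sample_issues)
serious = issues_at_or_above(sample_issues, threshold=5)
print(worst)    # 5
print(serious)  # [(5, 'example issue A')]
```

A structure with no reported issues yields a maximum penalty of 0, which keeps downstream comparisons against a threshold simple.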

References

[1] Bento, A.P., Hersey, A., Félix, E. et al. An open source chemical structure curation pipeline using RDKit. J Cheminform 12, 51 (2020). https://doi.org/10.1186/s13321-020-00456-1
