Chemical Checker Package.

Project description

The Chemical Checker (CC) is a data-driven resource of small-molecule bioactivity data. Its main goal is to express data in a format that can be used off the shelf in everyday computational drug discovery tasks. The resource is organized into 5 levels of increasing complexity, ranging from the chemical properties of the compounds to their clinical outcomes. In between, we consider targets, off-targets, perturbed biological networks and several cell-based assays, including gene expression, growth inhibition, and morphological profiles. The CC differs from other integrative compound databases in almost every aspect: the classical, relational representation of the data is replaced here by a less explicit, more machine-learning-friendly abstraction.

The CC resource is ever-growing and maintained by the Structural Bioinformatics & Network Biology Laboratory at the Institute for Research in Biomedicine (IRB Barcelona). Should you have any questions, please send an email to miquel.duran@irbbarcelona.org or patrick.aloy@irbbarcelona.org.

This project was first presented to the scientific community in the following paper:

Duran-Frigola M, et al. “Extending the small-molecule similarity principle to all levels of biology with the Chemical Checker.” Nature Biotechnology (2020)

and has since produced a number of related publications.

Source data and datasets

The CC is built from public bioactivity data. We are committed to updating the resource every 6 months (versions named accordingly, e.g. chemical_checker_2019_01). New datasets may be incorporated upon request.
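The version naming can be illustrated with a short sketch. It assumes that the suffix of a name such as chemical_checker_2019_01 encodes year and month, and that releases follow the stated 6-month cadence exactly; the function names `parse_version` and `next_release` are hypothetical helpers and are not part of the chemicalchecker package.

```python
import re
from datetime import date

# Assumed pattern, generalized from the single example "chemical_checker_2019_01".
VERSION_PATTERN = re.compile(r"^chemical_checker_(\d{4})_(\d{2})$")


def parse_version(name: str) -> date:
    """Return the first day of the month encoded in a CC version name."""
    match = VERSION_PATTERN.match(name)
    if match is None:
        raise ValueError(f"Not a recognized CC version name: {name!r}")
    year, month = int(match.group(1)), int(match.group(2))
    return date(year, month, 1)  # raises ValueError for months outside 1-12


def next_release(name: str, months: int = 6) -> str:
    """Name the release expected `months` later (assumes a regular cadence)."""
    current = parse_version(name)
    month_index = current.month - 1 + months
    year = current.year + month_index // 12
    month = month_index % 12 + 1
    return f"chemical_checker_{year:04d}_{month:02d}"


print(parse_version("chemical_checker_2019_01"))
print(next_release("chemical_checker_2019_01"))
```

Actual release names may deviate from this schedule, so the sketch is best read as a description of the naming convention rather than a list of available versions.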

The basic data unit of the CC is the dataset. There are 5 data levels (A Chemistry, B Targets, C Networks, D Cells and E Clinics) and, in turn, each level is divided into 5 sublevels or coordinates (A1-E5). Each dataset belongs to one and only one of the 25 coordinates, and each coordinate can have a finite number of datasets (e.g. A1.001), one of which is selected as being exemplary.
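The coordinate scheme described above can be sketched in a few lines of Python. The level letters and names come from the text; the assumption that every dataset code follows the shape of the example A1.001 (coordinate, dot, three digits) is ours, and `parse_dataset_code` is a hypothetical helper, not a function of the chemicalchecker package.

```python
import re
from itertools import product

# The five levels named in the text.
LEVELS = {
    "A": "Chemistry",
    "B": "Targets",
    "C": "Networks",
    "D": "Cells",
    "E": "Clinics",
}
SUBLEVELS = range(1, 6)  # sublevels 1 to 5

# All 25 coordinates, from A1 to E5.
COORDINATES = [f"{level}{sub}" for level, sub in product(LEVELS, SUBLEVELS)]

# Assumed dataset code format, generalized from the example "A1.001".
DATASET_CODE = re.compile(r"^([A-E])([1-5])\.(\d{3})$")


def parse_dataset_code(code: str) -> dict:
    """Split a dataset code such as 'A1.001' into its components."""
    match = DATASET_CODE.match(code)
    if match is None:
        raise ValueError(f"Not a recognized dataset code: {code!r}")
    level, sublevel, number = match.groups()
    return {
        "level": level,
        "level_name": LEVELS[level],
        "coordinate": f"{level}{sublevel}",
        "number": int(number),
    }


print(len(COORDINATES), COORDINATES[0], COORDINATES[-1])
print(parse_dataset_code("A1.001"))
```

Because each dataset maps to exactly one coordinate, the coordinate can be recovered from the dataset code alone, which is what the parser relies on.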

The CC is a chemistry-first biomedical resource and, as such, it contains several predefined compound collections that are of interest to drug discoverers, including approved drugs, natural products, and commercial screening libraries.

Signaturization of the data

The main task of the CC is to convert raw data into formats that are suitable inputs for machine-learning toolkits such as scikit-learn.

Accordingly, the backbone pipeline of the CC is devoted to processing every dataset and converting it to a series of formats that may be readily useful for machine learning. The main assets of the CC are the so-called CC signatures:

| Signature | Abbreviation | Description | Advantages | Disadvantages |
| --- | --- | --- | --- | --- |
| Type 0 | sign0 | Raw dataset data, expressed in a matrix format. | Explicit data. | Possibly sparse, heterogeneous, unprocessed. |
| Type 1 | sign1 | PCA/LSI projections of the data, accounting for 90% of the data. | Biological signatures of this type can be obtained by simple projection. Easy to compute and require no fine-tuning. | Variable dimensions; they may still be sparse. |
| Type 2 | sign2 | Network embedding of the similarity network. | Fixed-length, usually acceptably short. Suitable for machine learning. Capture global properties of the similarity network. | Information leak due to similarity measures. Hyper-parameter tuning. |
| Type 3 | sign3 | Network embedding of the inferred similarity network. | Fixed dimension and available for any molecule. | Possibly very noisy, hence useless, especially for low-data datasets. |
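The variable dimensionality of Type 1 signatures follows from how the number of retained components is chosen. The sketch below assumes that "accounting for 90% of the data" means keeping the smallest number of components whose cumulative explained variance reaches 90%; this interpretation, the function name `components_for_threshold`, and the toy variance ratios are illustrative assumptions, not the CC pipeline itself.

```python
from itertools import accumulate


def components_for_threshold(explained_variance_ratios, threshold=0.9):
    """Smallest number of components whose cumulative ratio reaches threshold."""
    # Components are ranked from most to least explanatory, as in PCA.
    ordered = sorted(explained_variance_ratios, reverse=True)
    for count, cumulative in enumerate(accumulate(ordered), start=1):
        # Small tolerance guards against floating-point rounding.
        if cumulative >= threshold - 1e-12:
            return count
    return len(ordered)  # threshold never reached: keep everything


# Toy ratios for two hypothetical datasets (not real CC data).
concentrated = [0.5, 0.25, 0.125, 0.0625, 0.0625]
spread_out = [0.125] * 8

print(components_for_threshold(concentrated))  # 4 components
print(components_for_threshold(spread_out))    # 8 components
```

Two datasets with different variance profiles therefore end up with signatures of different lengths, which is the drawback that the fixed-length Type 2 and Type 3 signatures address.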

Download files

Source Distribution

chemicalchecker-1.0.6.tar.gz (29.9 MB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.9.13

Hashes for chemicalchecker-1.0.6.tar.gz
Algorithm Hash digest
SHA256 a864989a18420a4c3e5b685fe600f59986e091cbb50a45ffdcd27fa6b28cc510
MD5 d7f2bd21116f6bc0b78847c10b4bef6b
BLAKE2b-256 617c957cec3a6f86f400cd00b106d87fbfae0535c146760787b38115d21164e8

Built Distribution

chemicalchecker-1.0.6-py3-none-any.whl (8.7 MB)

  • Tags: Python 3

Hashes for chemicalchecker-1.0.6-py3-none-any.whl
Algorithm Hash digest
SHA256 8121b7f9bff908d08cefcaa90b25e322567b8c296a0b7942573e82297d914c3b
MD5 c882d2b337be2f04239ce43321ddfc26
BLAKE2b-256 9076c6b77646f2fea5e2acbdbc738c6d4cf9e5c3125769a62f4f6c1d6b06af5b
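A downloaded file can be checked against the SHA256 digests listed above using only the Python standard library. This is a minimal sketch: the helper names `sha256_of_file` and `matches_published_digest` are hypothetical, and the local path in the usage comment is an assumption about where the file was saved.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large archives need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, published_hex):
    """Return True if the file's SHA256 equals the published digest."""
    expected = published_hex.strip().lower()
    return hmac.compare_digest(sha256_of_file(path), expected)


# Usage (assumes the source distribution sits in the current directory):
# matches_published_digest(
#     "chemicalchecker-1.0.6.tar.gz",
#     "a864989a18420a4c3e5b685fe600f59986e091cbb50a45ffdcd27fa6b28cc510",
# )
```

A mismatch usually points to an incomplete or altered download, in which case the file should be fetched again before installing.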
