
insilico: A Python package to process & model ChEMBL data.

License: MIT

ChEMBL is a manually curated chemical database of bioactive molecules with drug-like properties. It is maintained by the European Bioinformatics Institute (EBI), part of the European Molecular Biology Laboratory (EMBL), based in Hinxton, UK.

insilico helps researchers find promising compounds for drug discovery. It preprocesses ChEMBL molecular data and outputs Lipinski's descriptors and chemical fingerprints using popular bioinformatics libraries. Additionally, this package can be used to build a decision tree model that predicts drug efficacy.
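For context, Lipinski's descriptors are conventionally the four quantities behind the rule of five: molecular weight, LogP (the octanol-water partition coefficient), and the hydrogen-bond donor and acceptor counts. The sketch below is not part of insilico. The `Descriptors` class and `lipinski_violations` function are hypothetical names, and the values are illustrative rather than taken from ChEMBL. It only shows how such descriptors are commonly screened.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for the four Lipinski descriptors."""
    mol_weight: float   # molecular weight in daltons
    log_p: float        # octanol-water partition coefficient
    h_donors: int       # number of hydrogen-bond donors
    h_acceptors: int    # number of hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> list:
    """Return the names of the rule-of-five criteria that a compound breaks."""
    checks = {
        "mol_weight": d.mol_weight <= 500,
        "log_p": d.log_p <= 5,
        "h_donors": d.h_donors <= 5,
        "h_acceptors": d.h_acceptors <= 10,
    }
    return [name for name, passed in checks.items() if not passed]


# Illustrative values only, not taken from ChEMBL.
small = Descriptors(mol_weight=320.4, log_p=2.1, h_donors=2, h_acceptors=5)
bulky = Descriptors(mol_weight=812.0, log_p=6.3, h_donors=3, h_acceptors=12)

print(lipinski_violations(small))
print(lipinski_violations(bulky))
```

A compound is commonly treated as drug-like when it breaks at most one of the four criteria.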

About the package name

The term in silico is a neologism for pharmacology hypothesis development and testing performed by computer (silicon). It is related to the more commonly known biological terms in vivo ("within the living") and in vitro ("within the glass").

Installation

Installation via pip:

$ pip install insilico

Installation via cloned repository:

$ git clone https://github.com/konstanzer/insilico
$ cd insilico
$ pip install .
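
To confirm that either installation route worked, the installed version can be queried from Python. This is a minimal sketch using only the standard library (Python 3.8 or later); `installed_version` is a hypothetical helper name, not part of insilico.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if insilico is installed, otherwise None.
print(installed_version("insilico"))
```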

Python dependencies

Preprocessing requires rdkit-pypi, padelpy, and chembl_webresource_client. Modeling requires scikit-learn (imported as sklearn) and seaborn.

Basic Usage

insilico offers two primary functions: target_search, which searches the ChEMBL database, and process_target_data, which takes a ChEMBL ID, returns the preprocessed data, and saves the chemical fingerprint in the data folder.

Using the chemical fingerprint, the ModelChembl class fits a decision tree and outputs residual plots and metrics. When instantiating the class, you may specify a test set size and a variance threshold, which sets the minimum variance a feature column must have to be kept. This optional filter can eliminate hundreds of features that are unhelpful for modeling.
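
To illustrate what a variance threshold does, here is a minimal standard-library sketch. How insilico implements the filter internally is not shown in this README (scikit-learn's VarianceThreshold would be one plausible choice, but that is an assumption). The function names and the toy fingerprint are hypothetical, and this sketch drops columns whose variance is at or below the threshold.

```python
from statistics import pvariance


def low_variance_columns(rows, threshold):
    """Return indices of columns whose population variance is <= threshold."""
    columns = list(zip(*rows))  # transpose rows into columns
    return [i for i, col in enumerate(columns) if pvariance(col) <= threshold]


def drop_columns(rows, to_drop):
    """Return a copy of rows without the given column indices."""
    dropped = set(to_drop)
    return [[v for i, v in enumerate(row) if i not in dropped] for row in rows]


# Toy binary fingerprint: 5 compounds x 4 bits (made-up values).
fingerprints = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [1, 1, 1, 0],
]

# Column 0 is constant (variance 0) and column 3 is nearly constant (0.16).
to_drop = low_variance_columns(fingerprints, threshold=0.2)
filtered = drop_columns(fingerprints, to_drop)
print(to_drop)
print(filtered[0])
```

For a binary fingerprint bit that is set in a fraction p of compounds, the variance is p(1 - p). A threshold of 0.15 therefore removes bits that are set in fewer than about 18% or more than about 82% of compounds.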

When calling the tree method, you may specify the maximum tree depth and the cost-complexity pruning alpha, two hyperparameters that control overfitting.
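
The parameter names max_depth and ccp_alpha match those of scikit-learn's decision trees. Assuming they are passed through unchanged (an assumption, since the README does not say), alpha is the penalty in minimal cost-complexity pruning, where a subtree is scored as its error plus alpha times its number of leaves. The sketch below uses made-up numbers and hypothetical function names to show how a larger alpha favors a smaller tree.

```python
def cost_complexity(error, n_leaves, alpha):
    """Penalized score R(T) + alpha * |leaves| used in cost-complexity pruning."""
    return error + alpha * n_leaves


def best_subtree(candidates, alpha):
    """Pick the candidate (n_leaves, error) with the lowest penalized score."""
    return min(candidates, key=lambda c: cost_complexity(c[1], c[0], alpha))


# Hypothetical pruning sequence: (number of leaves, training error).
# Larger trees fit the training data better but pay a larger penalty.
candidates = [(1, 0.90), (4, 0.40), (16, 0.15), (64, 0.05)]

for alpha in (0.0, 0.01, 0.1):
    leaves, error = best_subtree(candidates, alpha)
    print(f"alpha={alpha}: keep the tree with {leaves} leaves")
```

With alpha set to 0 there is no penalty, so the largest tree wins and only the maximum depth limits its size.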

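# plot_descriptors and ModelChembl are used below, so they are imported too
# (assumed to be exported at the package top level)
from insilico import target_search, process_target_data, plot_descriptors, ModelChembl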

# return search results for 'P. falciparum D6'
result = target_search('P. falciparum D6')

# return molecular data for CHEMBL2367107 (P. falciparum D6)
df = process_target_data('CHEMBL2367107')

# display molecular descriptor plots
plot_descriptors(df)

model = ModelChembl(df, test_size=0.2, var_threshold=0.15)

# return a fitted decision tree & test set predictions
tree, predictions = model.tree(max_depth=50, ccp_alpha=0.)

# return metrics (R^2 and MAE) & display plots for test set
metrics = model.evaluate(predictions)

# return split data for other modeling
X_train, X_test, y_train, y_test = model.get_data()
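
The two metrics named above, R^2 and MAE, have standard definitions that can be checked by hand. The sketch below is a plain-Python illustration with made-up values. The function names are hypothetical, and the return format of model.evaluate is not documented here, so this is not a description of its output.

```python
def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between observed and predicted values."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Made-up observed and predicted values for four test compounds.
y_true_demo = [5.0, 6.0, 7.0, 8.0]
y_pred_demo = [5.5, 6.0, 6.5, 8.0]

print(mean_absolute_error(y_true_demo, y_pred_demo))
print(r_squared(y_true_demo, y_pred_demo))
```

An R^2 of 1 means the predictions match the observations exactly, while a value near 0 means the model does no better than always predicting the mean.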

Advanced option: use the optional 'fp' parameter to specify the fingerprinter

Valid fingerprinters are "PubchemFingerprinter" (default), "ExtendedFingerprinter", "EStateFingerprinter", "GraphOnlyFingerprinter", "MACCSFingerprinter", "SubstructureFingerprinter", "SubstructureFingerprintCount", "KlekotaRothFingerprinter", "KlekotaRothFingerprintCount", "AtomPairs2DFingerprinter", and "AtomPairs2DFingerprintCount".

df = process_target_data('CHEMBL2367107', fp='SubstructureFingerprinter')
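
Because fingerprinting can take a while, it may help to check the name before calling process_target_data. The helper below is a hypothetical sketch, not part of insilico. It matches names exactly against the list above, and whether insilico itself is case-sensitive is not stated in this README.

```python
# Names copied from the list of valid fingerprinters in this README.
VALID_FINGERPRINTERS = (
    "PubchemFingerprinter",
    "ExtendedFingerprinter",
    "EStateFingerprinter",
    "GraphOnlyFingerprinter",
    "MACCSFingerprinter",
    "SubstructureFingerprinter",
    "SubstructureFingerprintCount",
    "KlekotaRothFingerprinter",
    "KlekotaRothFingerprintCount",
    "AtomPairs2DFingerprinter",
    "AtomPairs2DFingerprintCount",
)


def check_fingerprinter(name):
    """Return name unchanged if it is a listed fingerprinter, else raise ValueError."""
    if name not in VALID_FINGERPRINTERS:
        options = ", ".join(VALID_FINGERPRINTERS)
        raise ValueError(f"Unknown fingerprinter {name!r}; expected one of: {options}")
    return name


fp = check_fingerprinter("SubstructureFingerprinter")
print(fp)
```

The checked name can then be passed as the fp argument.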

Contributing, Reporting Issues & Support

Make a pull request if you'd like to contribute to insilico. Contributions should include documentation and tests for any new features. File an issue to report problems with the software or to request features. Include information such as error messages, your OS/environment, and your Python version.

Questions may be sent to Steven Newton (steven.j.newton99@gmail.com).

References

Bioinformatics Project from Scratch: Drug Discovery by Chanin Nantasenamat
