A package to characterize small ligands
Project description
DrugTax
Categorize small ligands according to chemical properties. Derive simple and explainable features. Only SMILES strings are required as input. For a more detailed description of the reasoning behind DrugTax, refer to the scientific paper DrugTax: package for drug taxonomy identification and explainable feature extraction, Preto, AJ et al., 2022.
Installation
To install DrugTax, first make sure you have Python 3.9.x installed. Then run:
pip install drugtax
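Because the instructions above ask for Python 3.9.x, it can help to check the interpreter version before installing. The snippet below is a minimal sketch of such a check; the helper name `is_supported_python` is hypothetical and not part of DrugTax.

```python
import sys


def is_supported_python(version_info=sys.version_info):
    """Return True when the given version tuple is a 3.9.x release."""
    # Only the major and minor components matter for the 3.9.x requirement.
    return tuple(version_info[:2]) == (3, 9)


if not is_supported_python():
    # Warn rather than fail: other versions are simply not covered by the instructions.
    print("DrugTax's instructions ask for Python 3.9.x; found", sys.version.split()[0])
```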
Usage
First, import DrugTax:
import drugtax
The basic usage of DrugTax stems from the DrugTax class, which takes a single SMILES string as input.
molecule = drugtax.DrugTax("OC(=O)C1=C(C(O)=O)C(C(O)=O)=C(C(O)=O)C(C(O)=O)=C1C(O)=O")
The molecule object now has a series of useful properties such as:
- molecule.smile: allows the user to check the SMILES at any time
- molecule.superclasses: displays all the superclasses to which the input SMILES belongs
- molecule.features: retrieves simple and explainable features that can be used in prediction tasks or dataset characterization
- molecule.kingdom: indicates whether the molecule is organic or inorganic
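The four properties listed above can be gathered into a single record, for example when building a dataset row per molecule. The following is a minimal sketch: `summarize_molecule` and `summarize_smiles` are hypothetical helper names, and the stand-in object carries placeholder values that do not reflect real DrugTax output or its return types.

```python
from types import SimpleNamespace


def summarize_molecule(molecule):
    """Collect the four documented attributes of a molecule object into a dict."""
    return {
        "smile": molecule.smile,
        "kingdom": molecule.kingdom,
        "superclasses": molecule.superclasses,
        "features": molecule.features,
    }


def summarize_smiles(smiles):
    """Build a DrugTax object from one SMILES string and summarize it."""
    import drugtax  # imported lazily so summarize_molecule works without the package

    return summarize_molecule(drugtax.DrugTax(smiles))


# Stand-in object with placeholder values, used only to show the record's shape.
stub = SimpleNamespace(
    smile="CCO",
    kingdom="placeholder-kingdom",
    superclasses=["placeholder-superclass"],
    features=[0, 1],
)
summary = summarize_molecule(stub)
print(sorted(summary))
```

With DrugTax installed, `summarize_smiles("CCO")` would return the same four keys filled from a real `drugtax.DrugTax` object.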
Bulk analysis
For superclass computation, instead of invoking the DrugTax class directly, it is possible to use retrieve_taxonomic_class on several different inputs. The example below uses only a SMILES list; however, it is also possible to feed a list of drug names, in which case DrugTax leverages pubchempy to retrieve the isomeric SMILES, or a file. This function outputs a table with the SMILES and their respective taxonomy, as well as a summary table detailing how many entries of each superclass combination are present in the dataset.
smiles_table, summary_table = superclasses.retrieve_taxonomic_class(["CCNO","CCC"], input_mode = "smiles_list", output_name = "testing", write_values = True)
The retrieve_taxonomic_class function has different arguments that can be used to pick input and output information for bulk analysis submission. These are:
- input_data: the only mandatory argument, corresponding to a SMILES list, a list of drug names, or a file.
- input_mode: must be set according to input_data. The default mode is file, which requires an input .csv file and must be coupled with target_column, specifying the name of the column containing the input SMILES. To input a list of SMILES, set input_mode to smiles_list. To input a list of drug names from which isomeric SMILES are retrieved with the aid of pubchempy, set input_mode to drugs_list.
- target_column: must be specified when using the file input mode.
- output_name: should be specified if the user wishes to save the output files.
- write_values: defaults to False; set it to True to save the output tables.
- input_sep: if the input mode is file, this argument can be changed to match the column separator of the input file.
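The file input mode combines several of the arguments above, so a small sketch may help. The helpers `write_smiles_csv` and `file_mode_kwargs` are hypothetical names introduced here for illustration; they only prepare a .csv file and the documented keyword arguments. The import path in the trailing comment (`from drugtax import superclasses`) is an assumption, since the example above does not show how `superclasses` is imported.

```python
import csv


def write_smiles_csv(path, smiles, column="smiles", sep=","):
    """Write one SMILES per row under a single named column."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter=sep)
        writer.writerow([column])  # header row, later referenced by target_column
        writer.writerows([s] for s in smiles)


def file_mode_kwargs(column="smiles", sep=",", output_name=None):
    """Assemble the documented keyword arguments for the file input mode."""
    kwargs = {"input_mode": "file", "target_column": column, "input_sep": sep}
    if output_name is not None:
        # Saving the output tables needs both a name and write_values=True.
        kwargs["output_name"] = output_name
        kwargs["write_values"] = True
    return kwargs


print(file_mode_kwargs(output_name="testing"))

# Possible use with DrugTax installed (import path is an assumption):
# from drugtax import superclasses
# write_smiles_csv("ligands.csv", ["CCNO", "CCC"])
# smiles_table, summary_table = superclasses.retrieve_taxonomic_class(
#     "ligands.csv", **file_mode_kwargs(output_name="testing")
# )
```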
Plotting
To visualize the data retrieved from bulk analysis, DrugTax leverages UpSetPlot, a package designed to allow the visualization of a large number of intersecting sets. Plotting requires a file generated as described in the Bulk analysis section above. When the files are written, the summary table is the one whose name ends in _assess.csv; the beginning of the name depends on the user's chosen output_name. This is the file that can be fed to the plot_categories function.
plotting.plot_categories("testing_assess.csv", output_name = "plot")
The plot_categories function has three arguments:
- input_file: the name of the *_assess.csv file previously generated.
- output_name: a name for the output *.png file.
- threshold: with default 1, this argument triggers an aggregation of sparsely populated superclass combinations into their above counterpart. threshold is the minimum number of entries in the file for it to be aggregated.
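Bulk analysis and plotting are linked only through the name of the summary file, so the two steps can be chained once that name is derived from output_name. The sketch below assumes that `superclasses` and `plotting` can be imported from the `drugtax` package, which the examples above do not state; `assess_filename` and `bulk_then_plot` are hypothetical helper names.

```python
def assess_filename(output_name):
    """Return the summary-table file name written for a given output_name."""
    return f"{output_name}_assess.csv"


def bulk_then_plot(smiles, output_name, plot_name="plot", threshold=1):
    """Run bulk analysis on a SMILES list, then plot the saved summary table."""
    # Assumed import path; adjust if the modules live elsewhere.
    from drugtax import plotting, superclasses

    tables = superclasses.retrieve_taxonomic_class(
        smiles,
        input_mode="smiles_list",
        output_name=output_name,
        write_values=True,  # the plot reads the file written in this step
    )
    plotting.plot_categories(
        assess_filename(output_name), output_name=plot_name, threshold=threshold
    )
    return tables


print(assess_filename("testing"))
```

Calling `bulk_then_plot(["CCNO", "CCC"], "testing")` would mirror the two separate calls shown in the Bulk analysis and Plotting examples.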
File details
Details for the file drugtax-0.0.13.tar.gz.
File metadata
- Download URL: drugtax-0.0.13.tar.gz
- Upload date:
- Size: 37.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 2d3691ace595fcd075d0e69536ab72168aef788410d6879c99c77846ac2df066 |
| MD5 | 8fd1e9e7be78f05d0e21138583969f36 |
| BLAKE2b-256 | be42cd3a69c93e5f0a17efaec7f5d2395e44871f5e283d48a2289a830a8c92d7 |
File details
Details for the file drugtax-0.0.13-py3-none-any.whl.
File metadata
- Download URL: drugtax-0.0.13-py3-none-any.whl
- Upload date:
- Size: 39.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ab894ea952a2b77e5c8e76d30fe52193ec422eb1ccd3cde59285aa0402b03b12 |
| MD5 | a663bf108bed46a590bb13a9421ca8e9 |
| BLAKE2b-256 | b767cd5f4cfd08e5cfa02020a867f2da38970f0c13b1ef6765007e4d5bc3e098 |