Skip to main content

Tools and helpers for RDKit.

Project description

RDKIT-TOOLS

Tools for use with RDKit.

See also:

RDKit:

Dependencies

  • RDKit Python package (via conda recommended).
$ conda create -n rdkit -c conda-forge rdkit ipykernel
$ conda activate rdkit
(rdkit) $ conda install -c conda-forge pyvis 
(rdkit) $ conda install -c conda-forge networkx=2.5 

See also: conda/environment.yml

Contents

  • Formats - chemical file format conversion
  • Depictions - 2D molecular depictions
  • Standardization - molecular standardization
  • Fingerprints - molecular path and pattern based binary feature vectors, similarity, and clustering tools
  • Conformations - distance geometry based 3D conformation generation
  • Properties - molecular property calculation: Lipinsky, Wildman-Crippen LogP, Kier-Hall electrotopological descriptors, solvent accessible surface area (SASA), and more.
  • Scaffolds - Bemis-Murcko and BRICS scaffold analysis, rdScaffoldNetworks.
  • SMARTS - molecular pattern matching (subgraph isomorphism)
  • Reactions - SMIRKS based reaction transforms

Formats

(rdkit) $ python3 -m rdktools.formats.App -h
usage: App.py [-h] [--i IFILE] [--o OFILE] [--kekulize] [--sanitize] [--header]
              [--delim DELIM] [--smilesColumn SMILESCOLUMN] [--nameColumn NAMECOLUMN]
              [-v]
              {mdl2smi,mdl2tsv,smi2mdl,smiclean,mdlclean,mol2inchi,mol2inchikey,demo}

RDKit chemical format utility

positional arguments:
  {mdl2smi,mdl2tsv,smi2mdl,smiclean,mdlclean,mol2inchi,mol2inchikey,demo}
                        operation

optional arguments:
  -h, --help            show this help message and exit
  --i IFILE             input file (SMILES/TSV or SDF)
  --o OFILE             output file (specify '-' for stdout)
  --kekulize            Kekulize
  --sanitize            Sanitize
  --header              input SMILES/TSV file has header line
  --delim DELIM         delimiter for SMILES/TSV
  --smilesColumn SMILESCOLUMN
                        input SMILES column
  --nameColumn NAMECOLUMN
                        input name column
  -v, --verbose

Depictions

(rdkit) $ python3 -m rdktools.depict.App -h
usage: App.py [-h] [--i IFILE] [--ifmt {AUTO,SMI,MDL}] [--ofmt {PNG,JPEG,PDF}]
              [--smilesColumn SMILESCOLUMN] [--nameColumn NAMECOLUMN] [--header]
              [--delim DELIM] [--height HEIGHT] [--width WIDTH] [--kekulize]
              [--wedgebonds] [--pdf_title PDF_TITLE] [--batch_dir BATCH_DIR]
              [--batch_prefix BATCH_PREFIX] [--o OFILE] [-v]
              {single,batch,pdf,demo,demo2}

RDKit molecule depiction utility

positional arguments:
  {single,batch,pdf,demo,demo2}
                        OPERATION

optional arguments:
  -h, --help            show this help message and exit
  --i IFILE             input molecule file
  --ifmt {AUTO,SMI,MDL}
                        input file format
  --ofmt {PNG,JPEG,PDF}
                        output file format
  --smilesColumn SMILESCOLUMN
  --nameColumn NAMECOLUMN
  --header              SMILES/TSV file has header
  --delim DELIM         SMILES/TSV field delimiter
  --height HEIGHT       height of image
  --width WIDTH         width of image
  --kekulize            display Kekule form
  --wedgebonds          stereo wedge bonds
  --pdf_title PDF_TITLE
                        PDF doc title
  --batch_dir BATCH_DIR
                        destination for batch files
  --batch_prefix BATCH_PREFIX
                        prefix for batch files
  --o OFILE             output file
  -v, --verbose

Modes: single = one image; batch = multiple images; pdf = multi-page

Scaffolds

(rdkit) $ python3 -m rdktools.scaffold.App -h
usage: App.py [-h] [--i IFILE] [--o OFILE] [--o_html OFILE_HTML]
              [--scratchdir SCRATCHDIR] [--smicol SMICOL] [--namcol NAMCOL]
              [--idelim IDELIM] [--odelim ODELIM] [--iheader] [--oheader]
              [--brics] [-v]
              {bmscaf,scafnet,demobm,demonet,demonetvis}

RDKit scaffold analysis

positional arguments:
  {bmscaf,scafnet,demobm,demonet,demonetvis}
                        OPERATION

optional arguments:
  -h, --help            show this help message and exit
  --i IFILE             input file, TSV or SDF
  --o OFILE             output file, TSV|SDF
  --o_html OFILE_HTML   output file, HTML
  --scratchdir SCRATCHDIR
  --smicol SMICOL       SMILES column from TSV (counting from 0)
  --namcol NAMCOL       name column from TSV (counting from 0)
  --idelim IDELIM       delim for input TSV
  --odelim ODELIM       delim for output TSV
  --iheader             input TSV has header
  --oheader             output TSV has header
  --brics               BRICS fragmentation rules (Degen, 2008)
  -v, --verbose

Standardization

(rdkit) $ python3 -m rdktools.standard.App
usage: App.py [-h] [--i IFILE] [--o OFILE] [--norms {default,unm}]
              [--i_norms IFILE_NORMS] [--remove_isomerism] [-v]
              {standardize,list_norms,show_params,demo}
App.py: error: the following arguments are required: op
(rdkit) $ python3 -m rdktools.standard.App -h
usage: App.py [-h] [--i IFILE] [--o OFILE] [--norms {default,unm}]
              [--i_norms IFILE_NORMS] [--remove_isomerism] [-v]
              {standardize,list_norms,show_params,demo}

RDKit chemical standardizer

positional arguments:
  {standardize,list_norms,show_params,demo}
                        operation

optional arguments:
  -h, --help            show this help message and exit
  --i IFILE             input file, SMI or SDF
  --o OFILE             output file, SMI or SDF
  --norms {default,unm}
                        normalizations
  --i_norms IFILE_NORMS
                        input normalizations file, format: SMIRKS<space>NAME
  --remove_isomerism    if true, output SMILES isomerism removed
  -v, --verbose

Conformations

(rdkit) $ python3 -m rdktools.conform.App -h
usage: App.py [-h] [--i IFILE] [--o OFILE] [--ff {UFF,MMFF}] [--optiters OPTITERS]
              [--nconf NCONF] [--etol ETOL] [--title_in_header] [-v]

RDKit Conformer Generation

optional arguments:
  -h, --help           show this help message and exit
  --i IFILE            input file, SMI or SDF
  --o OFILE            output SDF with 3D
  --ff {UFF,MMFF}      force-field
  --optiters OPTITERS  optimizer iterations per conf
  --nconf NCONF        # confs per mol
  --etol ETOL          energy tolerance
  --title_in_header    title line in header
  -v, --verbose

Based on distance geometry method by Blaney et al.

Fingerprints

(rdkit) $ python3 -m rdktools.fp.App MolSimilarity -h
usage: App.py [-h] [--i IFILE] [--o OFILE] [--useHs] [--useValence]
              [--dbName DBNAME] [--tableName TABLENAME] [--minSize MINSIZE]
              [--maxSize MAXSIZE] [--density DENSITY] [--outTable OUTTABLE]
              [--outDbName OUTDBNAME] [--fpColName FPCOLNAME]
              [--minPath MINPATH] [--maxPath MAXPATH]
              [--nBitsPerHash NBITSPERHASH] [--discrim]
              [--smilesColumn SMILESCOLUMN] [--molPkl MOLPKL]
              [--input_format {SMILES,SD}] [--idColumn IDCOLUMN]
              [--maxMols MAXMOLS] [--fpAlgo {RDKIT,MACCS,MORGAN}]
              [--morgan_nbits MORGAN_NBITS] [--morgan_radius MORGAN_RADIUS]
              [--keepTable] [--smilesTable SMILESTABLE] [--topN TOPN]
              [--thresh THRESH] [--querySmiles QUERYSMILES]
              [--metric {ALLBIT,ASYMMETRIC,DICE,COSINE,KULCZYNSKI,MCCONNAUGHEY,ONBIT,RUSSEL,SOKAL,TANIMOTO,TVERSKY}]
              [--tversky_alpha TVERSKY_ALPHA] [--tversky_beta TVERSKY_BETA]
              [--clusterAlgo {WARD,SLINK,CLINK,UPGMA,BUTINA}]
              [--actTable ACTTABLE] [--actName ACTNAME]
              [--reportFreq REPORTFREQ] [-v]
              {FingerprintMols,MolSimilarity,ClusterMols}

RDKit fingerprint-based analytics

positional arguments:
  {FingerprintMols,MolSimilarity,ClusterMols}
                        OPERATION

optional arguments:
  -h, --help            show this help message and exit
  --i IFILE             Input file; if provided and no tableName is specified,
                        data will be read from the input file. Text files
                        delimited with either commas (extension .csv) or tabs
                        (extension .txt) are supported.
  --o OFILE             Name of the output file (output will be a pickle file
                        with one label,fingerprint entry for each molecule).
  --useHs               Include Hs in the fingerprint Default is *false*.
  --useValence          Include valence information in the fingerprints
                        Default is *false*.
  --dbName DBNAME       Name of the database from which to pull input molecule
                        information. If output is going to a database, this
                        will also be used for that unless the --outDbName
                        option is used.
  --tableName TABLENAME
                        Name of the database table from which to pull input
                        molecule information
  --minSize MINSIZE     Minimum size of the fingerprints to be generated
                        (limits the amount of folding that happens).
  --maxSize MAXSIZE     Base size of the fingerprints to be generated.
  --density DENSITY     Target bit density in the fingerprint. The fingerprint
                        will be folded until this density is reached.
  --outTable OUTTABLE   name of the output db table used to store
                        fingerprints. If this table already exists, it will be
                        replaced.
  --outDbName OUTDBNAME
                        name of output database, if it's being used. Defaults
                        to be the same as the input db.
  --fpColName FPCOLNAME
                        name to use for the column which stores fingerprints
                        (in pickled format) in the output db table.
  --minPath MINPATH     Minimum path length to be included in fragment-based
                        fingerprints.
  --maxPath MAXPATH     Maximum path length to be included in fragment-based
                        fingerprints.
  --nBitsPerHash NBITSPERHASH
                        Number of bits to be set in the output fingerprint for
                        each fragment.
  --discrim             Use of path-based discriminators to hash bits.
  --smilesColumn SMILESCOLUMN
                        Name of the SMILES column in the input database.
  --molPkl MOLPKL
  --input_format {SMILES,SD}
                        SMILES table or SDF file.
  --idColumn IDCOLUMN   Name of the id column in the input database. Defaults
                        to the first column for dbs.
  --maxMols MAXMOLS     Maximum number of molecules to be fingerprinted.
  --fpAlgo {RDKIT,MACCS,MORGAN}
                        RDKIT = Daylight path-based; MACCS = MDL MACCS 166
                        keys
  --morgan_nbits MORGAN_NBITS
  --morgan_radius MORGAN_RADIUS
  --keepTable
  --smilesTable SMILESTABLE
  --topN TOPN           Top N similar; precedence over threshold.
  --thresh THRESH       Similarity threshold.
  --querySmiles QUERYSMILES
                        Query smiles for similarity screening.
  --metric {ALLBIT,ASYMMETRIC,DICE,COSINE,KULCZYNSKI,MCCONNAUGHEY,ONBIT,RUSSEL,SOKAL,TANIMOTO,TVERSKY}
                        Similarity algorithm
  --tversky_alpha TVERSKY_ALPHA
                        Tversky alpha parameter, weights query molecule
                        features
  --tversky_beta TVERSKY_BETA
                        Tversky beta parameter, weights target molecule
                        features
  --clusterAlgo {WARD,SLINK,CLINK,UPGMA,BUTINA}
                        Clustering algorithm: WARD = Ward's minimum variance;
                        SLINK = single-linkage clustering algorithm; CLINK =
                        complete-linkage clustering algorithm; UPGMA = group-
                        average clustering algorithm; BUTINA = Butina JCICS 39
                        747-750 (1999)
  --actTable ACTTABLE   name of table containing activity values (used to
                        color points in the cluster tree).
  --actName ACTNAME     name of column with activities in the activity table.
                        The values in this column should either be integers or
                        convertible into integers.
  --reportFreq REPORTFREQ
  -v, --verbose

This app employs custom, updated versions of RDKit FingerprintMols.py,
MolSimilarity.py, ClusterMols.py, with enhanced command-line functionality for
molecular fingerprint-based analytics.

Examples:

(rdkit) $ python3 -m rdktools.fp.App FingerprintMols --i drugcentral.smi --smilesColumn "smiles" --idColumn "name" --fpAlgo MORGAN --morgan_nbits 2048
(rdkit) $ python3 -m rdktools.fp.App MolSimilarity --i drugcentral.smi --smilesColumn "smiles" --idColumn "name" --querySmiles "NCCc1ccc(O)c(O)c1 dopamine" --fpAlgo MORGAN --morgan_nbits 512 --metric TVERSKY --tversky_alpha 0.8 --tversky_beta 0.2
(rdkit) $ python3 -m rdktools.fp.App ClusterMols --i drugcentral.smi --smilesColumn "smiles" --idColumn "name" --fpAlgo MORGAN --morgan_nbits 512 --clusterAlgo BUTINA --metric TANIMOTO

SMARTS

(rdkit) $ python3 -m rdktools.smarts.App -h
usage: App.py [-h] [--i IFILE] [--o OFILE] [--smarts SMARTS] [--usa] [--delim DELIM]
              [--smilesColumn SMILESCOLUMN] [--nameColumn NAMECOLUMN] [--header] [-v]
              {matchCounts,matchFilter,demo}

RDKit SMARTS utility

positional arguments:
  {matchCounts,matchFilter,demo}
                        OPERATION

optional arguments:
  -h, --help            show this help message and exit
  --i IFILE             input file, SMI or SDF
  --o OFILE             output file, TSV
  --smarts SMARTS       query SMARTS
  --usa                 unique set-of-atoms match counts
  --delim DELIM         delimiter for SMILES/TSV
  --smilesColumn SMILESCOLUMN
  --nameColumn NAMECOLUMN
  --header              SMILES/TSV has header line
  -v, --verbose

Properties

(rdkit) $ python3 -m rdktools.properties.App -h
usage: App.py [-h] --i IFILE [--o OFILE] [--iheader] [--oheader] [--kekulize]
              [--sanitize] [--delim DELIM] [--smilesColumn SMILESCOLUMN]
              [--nameColumn NAMECOLUMN] [-v]
              {descriptors,descriptors3d,lipinski,logp,estate,freesasa,demo}

RDKit molecular properties utility

positional arguments:
  {descriptors,descriptors3d,lipinski,logp,estate,freesasa,demo}
                        OPERATION

optional arguments:
  -h, --help            show this help message and exit
  --i IFILE             input molecule file
  --o OFILE             output file with data (TSV)
  --iheader             input file has header line
  --oheader             include TSV header line with smiles output
  --kekulize            Kekulize
  --sanitize            Sanitize
  --delim DELIM         SMILES/TSV delimiter
  --smilesColumn SMILESCOLUMN
                        input SMILES column
  --nameColumn NAMECOLUMN
                        input name column
  -v, --verbose

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rdktools-0.9.0.tar.gz (1.2 MB view hashes)

Uploaded Source

Built Distribution

rdktools-0.9.0-py3-none-any.whl (89.8 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page