A standalone module to help generate molecular descriptors from various cheminformatics software

chemdescriptor - Molecular descriptor generator

A generic wrapper around cheminformatics software packages that simplifies the process of generating molecular descriptors.

To install

Type:

pip install chemdescriptor

Requirements

  1. Pandas
  2. Working copy of ChemAxon cxcalc

Usage

chemdescriptor currently supports only ChemAxon cxcalc, but the module can be expanded to cover other generators. Example input files can be found in the examples/ folder of this repo and in the pip-installed package.

Important! The code requires the environment variable CXCALC_PATH to be set to the folder where cxcalc is installed.
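
A minimal sketch of a pre-flight check, assuming only what the note above states: CXCALC_PATH names the folder that contains cxcalc. The helper name check_cxcalc_path is hypothetical and is not part of chemdescriptor.

```python
import os


def check_cxcalc_path(env=None):
    """Return the CXCALC_PATH folder, or raise if it is unset or not a folder.

    Hypothetical helper for illustration; not part of chemdescriptor.
    """
    # Allow a plain dict to be passed in so the check is easy to test.
    env = os.environ if env is None else env
    path = env.get('CXCALC_PATH')
    if not path:
        raise RuntimeError('CXCALC_PATH is not set')
    if not os.path.isdir(path):
        raise RuntimeError('CXCALC_PATH is not a folder: ' + path)
    return path
```

Calling check_cxcalc_path() before creating a generator would surface a missing or mistyped path early, rather than when cxcalc is first invoked.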

Command Line

chemdescriptor-cx -m /path/to/SMILES/file -d /path/to/descriptor/whitelist/json -p 6.8 7.0 7.2 -o output.csv

usage: chemdescriptor-cx [-h] -m MOLECULE -d DESCRIPTORS -p PH [PH ...]
                         [-c COMMANDS] [-pc PHCOMMANDS] -o OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  -m MOLECULE, --molecule MOLECULE
                        Path to input SMILES file
  -d DESCRIPTORS, --descriptors DESCRIPTORS
                        Path to descriptor white list json file
  -p PH [PH ...], --pH PH [PH ...]
                        List of pH values at which to calculate descriptors
  -c COMMANDS, --commands COMMANDS
                        Optional command stems for descriptors in json format
  -pc PHCOMMANDS, --phcommands PHCOMMANDS
                        Optional command stems for pH dependent descriptors in
                        json format
  -o OUTPUT, --output OUTPUT
                        Path to output file
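
To make the argument layout concrete, here is a sketch of an argparse parser reconstructed from the usage text above. It is an illustration only: the real parser inside chemdescriptor may differ, and in particular whether pH values are converted to numbers is not stated, so they are left as strings here. The file names mols.smi and whitelist.json are placeholders.

```python
import argparse


def build_parser():
    # Reconstructed from the usage text; not copied from chemdescriptor.
    parser = argparse.ArgumentParser(prog='chemdescriptor-cx')
    parser.add_argument('-m', '--molecule', required=True,
                        help='Path to input SMILES file')
    parser.add_argument('-d', '--descriptors', required=True,
                        help='Path to descriptor white list json file')
    # nargs='+' collects one or more pH values into a list.
    parser.add_argument('-p', '--pH', required=True, nargs='+',
                        help='List of pH values at which to calculate descriptors')
    parser.add_argument('-c', '--commands',
                        help='Optional command stems for descriptors in json format')
    parser.add_argument('-pc', '--phcommands',
                        help='Optional command stems for pH dependent descriptors in json format')
    parser.add_argument('-o', '--output', required=True,
                        help='Path to output file')
    return parser


args = build_parser().parse_args(
    ['-m', 'mols.smi', '-d', 'whitelist.json',
     '-p', '6.8', '7.0', '7.2', '-o', 'output.csv'])
```

The three values after -p end up together in args.pH, and the optional -c and -pc arguments default to None when omitted.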

In code

Set CXCALC_PATH

import os
os.environ['CXCALC_PATH'] = '/path/to/cxcalc'

Import the generator class

from chemdescriptor import ChemAxonDescriptorGenerator

Instantiate a generator

cag = ChemAxonDescriptorGenerator('/path/to/SMILES/file',
                                  '/path/to/descriptor/whitelist/json',
                                  ph_values=[6, 7, 8],
                                  command_stems=None,
                                  ph_command_stems=None)

Generate csv output

cag.generate('output.csv', dataframe=False, lec=False)

generate accepts two optional keyword arguments. dataframe (boolean, default False): if True, a pandas dataframe is returned in addition to writing the csv. lec (boolean, default False): if True, each SMILES code is converted to an intermediate "Low Energy Conformer (LEC)" representation before descriptors are generated. A license is most likely required to generate LECs.

Notes:

The input SMILES file contains one SMILES code per line.
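
As a small illustration of that layout, the sketch below writes three SMILES codes to a file, one per line, and reads them back. The molecules and the file name molecules.smi are arbitrary examples, not files shipped with chemdescriptor.

```python
import os
import tempfile

# Example molecules: ethanol, benzene, acetic acid.
smiles = ['CCO', 'c1ccccc1', 'CC(=O)O']

# Write one SMILES code per line into a temporary folder.
path = os.path.join(tempfile.mkdtemp(), 'molecules.smi')
with open(path, 'w') as handle:
    handle.write('\n'.join(smiles) + '\n')

# Read the file back, skipping any blank lines.
with open(path) as handle:
    read_back = [line.strip() for line in handle if line.strip()]
```

The resulting path could then be passed as the first argument to ChemAxonDescriptorGenerator.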

Descriptor whitelist is a json file of the form:

{
    "descriptors": [
        "refractivity",
        "maximalprojectionarea",
        "maximalprojectionradius",
        "maximalprojectionsize",
        "minimalprojectionarea",
        "minimalprojectionradius",
        "minimalprojectionsize"
    ],
    "ph_descriptors": [
        "avgpol",
        "molpol",
        "vanderwaals",
        "asa",
        "asa+",
        "asa-",
        "asa_hydrophobic",
        "asa_polar",
        "hbda_acc",
        "hbda_don",
        "polar_surface_area"
    ]
}

chemdescriptor expects two keys: "descriptors" lists the generic descriptors and "ph_descriptors" lists the pH-dependent descriptors.
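
A whitelist can be checked for the two expected keys before it is handed to the generator. The sketch below is one way to do that; load_whitelist is a hypothetical helper, not a chemdescriptor function, and the short whitelist is a trimmed version of the example above.

```python
import json


def load_whitelist(text):
    """Parse whitelist json text and return (descriptors, ph_descriptors).

    Hypothetical helper for illustration; not part of chemdescriptor.
    """
    data = json.loads(text)
    missing = {'descriptors', 'ph_descriptors'} - set(data)
    if missing:
        raise ValueError('whitelist is missing keys: ' + ', '.join(sorted(missing)))
    return data['descriptors'], data['ph_descriptors']


example = json.dumps({
    'descriptors': ['refractivity'],
    'ph_descriptors': ['avgpol', 'asa'],
})
generic, ph_dependent = load_whitelist(example)
```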

Two optional dictionaries, "command_stems" and "ph_command_stems", can be passed to ChemAxonDescriptorGenerator. These dictionaries "translate" the above descriptors into commands that ChemAxon cxcalc can understand.

For example, if no value is passed for ph_command_stems, the following dictionary is used:

_default_ph_command_stems = {
    'avgpol': 'avgpol',
    'molpol': 'molpol',
    'vanderwaals': 'vdwsa',
    'asa': ['molecularsurfacearea', '-t', 'ASA'],
    'asa+': ['molecularsurfacearea', '-t', 'ASA+'],
    'asa-': ['molecularsurfacearea', '-t', 'ASA-'],
    'asa_hydrophobic': ['molecularsurfacearea', '-t', 'ASA_H'],
    'asa_polar': ['molecularsurfacearea', '-t', 'ASA_P'],
    'hbda_acc': 'acceptorcount',
    'hbda_don': 'donorcount',
    'polar_surface_area': 'polarsurfacearea',
}

Note that a command made up of multiple words is written as a list with one entry per word. For example, the command

molecularsurfacearea -t ASA

is represented in the dictionary as ['molecularsurfacearea', '-t', 'ASA']
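
Because a stem can be either a single string or a list of words, code that consumes these dictionaries has to handle both shapes. The sketch below shows one way to normalize them. The helper stem_to_args is hypothetical, and how chemdescriptor handles this internally is not described here.

```python
def stem_to_args(stem):
    """Return a command stem as a list of argument words.

    Hypothetical helper for illustration; not part of chemdescriptor.
    """
    if isinstance(stem, str):
        # A one-word command becomes a one-element list.
        return [stem]
    # A multi-word command is already a sequence of words; copy it.
    return list(stem)


# A custom dictionary in the same shape as the defaults shown above.
custom_ph_command_stems = {
    'avgpol': 'avgpol',
    'asa': ['molecularsurfacearea', '-t', 'ASA'],
}

normalized = {name: stem_to_args(stem)
              for name, stem in custom_ph_command_stems.items()}
```

A dictionary such as custom_ph_command_stems is what would be passed as ph_command_stems when instantiating ChemAxonDescriptorGenerator.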

To Do

[ ] Test on different machines

[ ] Get feedback on what needs to be changed/improved

[ ] Expand to cover other descriptor generators
