A standalone module to help generate molecular descriptors from various cheminformatics software

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

chemdescriptor - Molecular descriptor generator

A generic wrapper around various molecular descriptor software packages that simplifies the process of generating descriptors.

To install

git clone https://github.com/darkreactions/chemdescriptor

cd chemdescriptor

git checkout cxcalc_rewrite

pip install .

Requirements

Pandas
ChemAxon descriptors
- Working copy of ChemAxon cxcalc
RDKit descriptors
- RDKit installed

Usage

The package currently supports only ChemAxon cxcalc and RDKit, and it can be extended to cover other descriptor generators. Example input files can be found in the examples/ folder of this repository and in the pip-installed package.

CXCalc

Important: the code requires the environment variable CXCALC_PATH to be set to the folder where cxcalc is installed.

Command Line

chemdescriptor-cx -m /path/to/SMILES/file -d /path/to/descriptor/whitelist/json -p 6.8 7.0 7.2 -o output.csv

usage: chemdescriptor-cx [-h] -m MOLECULE -d DESCRIPTORS -p PH [PH ...]
                         [-c COMMANDS] -o OUTPUT

optional arguments:
  -h, --help            show this help message and exit
  -m MOLECULE, --molecule MOLECULE
                        Path to input SMILES file
  -d DESCRIPTORS, --descriptors DESCRIPTORS
                        Path to descriptor white list json file
  -p PH [PH ...], --pH PH [PH ...]
                        List of pH values at which to calculate descriptors
  -c COMMANDS, --commands COMMANDS
                        Optional command stems for descriptors in json format
  -o OUTPUT, --output OUTPUT
                        Path to output file
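When the CLI is driven from a Python script, the flags above map directly onto an argument list. The sketch below assumes chemdescriptor-cx is on PATH after pip install; build_cx_args and run_cx are hypothetical helper names, not part of chemdescriptor.

```python
import subprocess
from typing import List, Optional, Sequence


def build_cx_args(molecule: str, descriptors: str, ph_values: Sequence[float],
                  output: str, commands: Optional[str] = None) -> List[str]:
    """Assemble argv for chemdescriptor-cx from the documented flags (hypothetical helper)."""
    args = ["chemdescriptor-cx", "-m", molecule, "-d", descriptors]
    # -p takes one or more pH values, each passed as a separate argument
    args += ["-p"] + [str(ph) for ph in ph_values]
    if commands is not None:
        # -c is optional; its value (command stems in JSON format) is passed through unchanged
        args += ["-c", commands]
    args += ["-o", output]
    return args


def run_cx(*args, **kwargs) -> subprocess.CompletedProcess:
    # check=True raises CalledProcessError if the CLI exits with a non-zero status
    return subprocess.run(build_cx_args(*args, **kwargs), check=True)
```

For example, build_cx_args('mols.smi', 'whitelist.json', [6.8, 7.0, 7.2], 'output.csv') produces the same invocation as the command-line example above. Passing a list instead of a single string avoids shell quoting issues with paths that contain spaces.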

In code

The package first searches for the cxcalc executable on PATH; if it is not found there, it falls back to CXCALC_PATH.

Set CXCALC_PATH

import os
os.environ['CXCALC_PATH'] = '/path/to/cxcalc'
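The lookup order described above can be approximated as follows. This is a sketch of the documented behavior, not chemdescriptor's own code; find_cxcalc is a hypothetical name, and it assumes the executable inside the CXCALC_PATH folder is named exactly cxcalc (on Windows the file name may differ).

```python
import os
import shutil
from typing import Optional


def find_cxcalc(search_path: Optional[str] = None) -> str:
    """Return a path to cxcalc: PATH first, then the CXCALC_PATH folder.

    search_path overrides PATH for the first step (None means use os.environ["PATH"]).
    """
    # 1. Look for an executable named cxcalc on PATH
    found = shutil.which("cxcalc", path=search_path)
    if found is not None:
        return found
    # 2. Fall back to the folder named by CXCALC_PATH (assumed executable name: cxcalc)
    cx_dir = os.environ.get("CXCALC_PATH")
    if not cx_dir:
        raise FileNotFoundError("cxcalc is not on PATH and CXCALC_PATH is not set")
    return os.path.join(cx_dir, "cxcalc")
```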

Import the generator class

from chemdescriptor.generator.chemaxon import ChemAxonDescriptorGenerator as CAG

Load the SMILES list and the whitelist

import json

with open('/path/to/SMILES/file', 'r') as f:
    smiles_list = f.read().splitlines()

with open('/path/to/descriptor/whitelist/json', 'r') as f:
    whitelist = json.load(f)

Instantiate a generator. smiles_list is a list of SMILES strings; whitelist is a dictionary of descriptor names that are keys in command_dict; logfile is the path to a log file that records information useful for debugging, such as the final cxcalc command, the columns that were renamed, and other errors.

The ChemAxon standardize command can be used to remove small fragments from the molecules in smiles_list. To enable it, set standardize=True and also set STANDARDIZE_PATH:

os.environ['STANDARDIZE_PATH'] = '/path/to/standardize'

cag = CAG(smiles_list,
          whitelist,
          ph_values=[6, 7, 8],
          command_dict={},
          logfile='/path/to/logfile',
          standardize=True)
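As a rough illustration of what removing small fragments means: in SMILES, disconnected fragments such as counter-ions are separated by '.'. The sketch below keeps the longest component as a naive stand-in. It is not what ChemAxon standardize does, and largest_fragment is a hypothetical helper; for real work, ChemAxon standardize or RDKit's rdMolStandardize.LargestFragmentChooser are more reliable.

```python
def largest_fragment(smiles: str) -> str:
    """Naive fragment removal: keep the longest '.'-separated SMILES component.

    String length is only a rough proxy for fragment size, and ring-closure
    bonds that span a '.' are not handled.
    """
    fragments = [frag for frag in smiles.split(".") if frag]
    if not fragments:
        return smiles
    # max() returns the first of several equally long fragments
    return max(fragments, key=len)
```

Applied to an input list, this would be smiles_list = [largest_fragment(s) for s in smiles_list].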

Generate the CSV output:

cag.generate('output.csv', dataframe=False, lec=False)

generate accepts two optional keyword arguments:
- dataframe (boolean, default False): if True, a pandas DataFrame is returned in addition to the CSV being written.
- lec (boolean, default False): if True, each SMILES string is converted to an intermediate "Low Energy Conformer (LEC)" representation before descriptors are generated. Generating LECs most likely requires a license.

Notes:

The descriptor whitelist is a Python dictionary of the form:

{
    "descriptors": [
        "refractivity",
        "maximalprojectionarea",
        "maximalprojectionradius",
        "maximalprojectionsize",
        "minimalprojectionarea",
        "minimalprojectionradius",
        "minimalprojectionsize"
    ],
    "ph_descriptors": [
        "avgpol",
        "molpol",
        "vanderwaals",
        "asa",
        "asa+",
        "asa-",
        "asa_hydrophobic",
        "asa_polar",
        "hbda_acc",
        "hbda_don",
        "polar_surface_area"
    ]
}

chemdescriptor expects two keys in the whitelist: "descriptors" lists generic descriptors, and "ph_descriptors" lists pH-dependent descriptors.
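Because chemdescriptor expects both keys, a quick check before instantiating the generator can catch a malformed whitelist early. load_whitelist below is a hypothetical pre-flight helper, not part of the package; it takes the JSON text (for example, the contents of the whitelist file) and assumes each key maps to a list of descriptor-name strings, as in the example above.

```python
import json
from typing import Dict, List

REQUIRED_KEYS = ("descriptors", "ph_descriptors")


def load_whitelist(text: str) -> Dict[str, List[str]]:
    """Parse whitelist JSON and check that both expected keys hold lists of strings."""
    whitelist = json.loads(text)
    for key in REQUIRED_KEYS:
        if key not in whitelist:
            raise KeyError(f"whitelist is missing the {key!r} key")
        names = whitelist[key]
        if not isinstance(names, list) or not all(isinstance(n, str) for n in names):
            raise TypeError(f"whitelist[{key!r}] must be a list of strings")
    return whitelist
```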

An optional dictionary, command_dict, can be passed to ChemAxonDescriptorGenerator. It "translates" the descriptor names above into commands that ChemAxon cxcalc understands, and it also specifies the column names used in the final output.

Note: If the command_dict is not given or is empty, a default command dict is used whose definition is here

An example of a command_dict is:

command_dict = {
    "descriptors": {
        "atomcount_c": {
            "command": [
                "atomcount",
                "-z",
                "6"
            ],
            "column_names": [
                "_feat_AtomCount_C"
            ]
        },
        "wateraccessiblesurfacearea": {
            "command": [
                "wateraccessiblesurfacearea"
            ],
            "column_names": [
                "_feat_ASA",
                "_feat_ASA+",
                "_feat_ASA-",
                "_feat_ASA_H",
                "_feat_ASA_P"
            ]
        }
    },
    "ph_descriptors": {
        "acceptorcount": {
            "command": [
                "acceptorcount"
            ],
            "column_names": [
                "_feat_Hacceptorcount"
            ]
        },
        "donorcount": {
            "command": [
                "donorcount"
            ],
            "column_names": [
                "_feat_Hdonorcount"
            ]
        }
    }
}

command_dict consists of two dictionaries, under the keys descriptors and ph_descriptors. The keys within each dictionary are the descriptor names referred to in the whitelist.

Each descriptor entry requires two lists: command and column_names.

command holds the cxcalc command-line options, as documented here. Note that a command with multiple words is split into separate list entries. For example, the command atomcount -z 6 is represented in the dictionary as ['atomcount', '-z', '6'].
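The list form of command can be produced from a command written as a single string with the standard-library shlex.split, and the commands of the whitelisted entries can then be concatenated in order. make_entry and command_tokens are hypothetical helpers; how chemdescriptor itself assembles the final cxcalc call (input files, pH options) is not described here, so this only illustrates the dictionary layout.

```python
import shlex
from typing import Dict, List


def make_entry(command: str, column_names: List[str]) -> Dict[str, List[str]]:
    """Build one command_dict entry; shlex.split turns 'atomcount -z 6' into a token list."""
    return {"command": shlex.split(command), "column_names": list(column_names)}


def command_tokens(names: List[str], section: Dict[str, dict]) -> List[str]:
    """Concatenate the 'command' lists of the given descriptor names, in whitelist order."""
    tokens: List[str] = []
    for name in names:
        tokens.extend(section[name]["command"])
    return tokens
```

For instance, make_entry("atomcount -z 6", ["_feat_AtomCount_C"]) yields the atomcount_c entry shown in the example command_dict.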

column_names is the list of names that replace the column names in the CSV generated by cxcalc.

Some commands generate multiple columns. For example, wateraccessiblesurfacearea generates 5 columns, so its column_names list is:

"column_names": [
    "_feat_ASA",
    "_feat_ASA+",
    "_feat_ASA-",
    "_feat_ASA_H",
    "_feat_ASA_P"
]

Note: if the number of columns generated by cxcalc does not match the number of entries in column_names, none of the column names are renamed.
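The all-or-nothing renaming rule can be sketched on plain lists of header names. rename_columns is a hypothetical helper, and applying the rule per descriptor is an assumption; the generated names passed in would be whatever headers cxcalc actually writes.

```python
import logging
from typing import List

logger = logging.getLogger(__name__)


def rename_columns(generated: List[str], column_names: List[str]) -> List[str]:
    """Apply column_names only if the counts match; otherwise keep cxcalc's names."""
    if len(generated) != len(column_names):
        # Mismatch: log it and leave every column name unchanged
        logger.warning("expected %d columns, cxcalc produced %d; not renaming",
                       len(column_names), len(generated))
        return list(generated)
    return list(column_names)
```

With pandas, the returned list could be assigned to DataFrame.columns for the corresponding block of columns.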

RDKit

Much easier to use: it only needs a list of descriptors, similar to the cxcalc whitelist.

To Do

[ ] Test on different machines

[ ] Get feedback on what needs to be changed/improved

[ ] Expand to cover other descriptor generators


Release history

- 0.2.4 (this version): Apr 21, 2020
- 0.2.3: Apr 8, 2020
- 0.2.2: Apr 8, 2020
- 0.2.1: Apr 7, 2020
- 0.2.0: Mar 31, 2020
- 0.1.0: Feb 19, 2020
- 0.0.7: Aug 20, 2019
- 0.0.6: Aug 9, 2019
- 0.0.5: Aug 7, 2019
- 0.0.4: Aug 7, 2019
- 0.0.3: Jun 4, 2019
- 0.0.2: May 30, 2019
- 0.0.1: May 8, 2019

Download files


Source Distribution

chemdescriptor-0.2.4.tar.gz (17.0 kB)

Uploaded Apr 21, 2020 Source

Built Distribution

chemdescriptor-0.2.4-py3-none-any.whl (16.3 kB)

Uploaded Apr 21, 2020 Python 3

File details

Details for the file chemdescriptor-0.2.4.tar.gz.

File metadata

Download URL: chemdescriptor-0.2.4.tar.gz
Upload date: Apr 21, 2020
Size: 17.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0.post20200309 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.6

File hashes

Hashes for chemdescriptor-0.2.4.tar.gz
Algorithm	Hash digest
SHA256	`fdd34eae883cf165ff03af36e049d13a8f98e05f299808aa3195f20239a5a07b`
MD5	`f81295ac8d778c6460527ec253489f9b`
BLAKE2b-256	`b9cbeddc3229440f7c4b754fb776aae4422f1e2621db8d57a6a7f889f489e667`


File details

Details for the file chemdescriptor-0.2.4-py3-none-any.whl.

File metadata

Download URL: chemdescriptor-0.2.4-py3-none-any.whl
Upload date: Apr 21, 2020
Size: 16.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.0.0.post20200309 requests-toolbelt/0.9.1 tqdm/4.32.2 CPython/3.7.6

File hashes

Hashes for chemdescriptor-0.2.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`594235f68cde494851094cf0880db37d7c1285c5f0b938862ab2156648919872`
MD5	`42d4639bce4485aee5e29cac64d3dbb0`
BLAKE2b-256	`4b28afc9b9131e6de41007213b58a8d8c93fb64b41b408184cfa5733bfcbd558`

