
A Python script to parallelize basic MolVS functions.

Project description

chemo_sanitizer

A set of scripts that use RDKit and MolVS to standardize and sanitize chemical structures and convert them between formats, with the work parallelized across multiple CPUs.

In its current state, it accepts only InChI as input and returns sanitized SMILES, InChI, InChIKeys, short InChIKeys, MolecularFormula, ExactMass and XLogP.
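The per-structure conversion could look roughly like the following minimal sketch, assuming RDKit and MolVS are installed. It is not the project's code. The names `describe_inchi` and `short_inchikey`, the order of the standardization steps, and the reading of "short InChIKey" as the 14-character first block of the key are all assumptions. RDKit's Wildman-Crippen logP is used as a stand-in for XLogP; it is a different method.

```python
from typing import Optional


def short_inchikey(inchikey: str) -> str:
    """Return the first (connectivity) block of a standard InChIKey.

    Assumption: "short InChIKey" means this 14-character first block.
    """
    blocks = inchikey.split("-")
    if len(blocks) != 3 or len(blocks[0]) != 14:
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return blocks[0]


def describe_inchi(inchi: str) -> Optional[dict]:
    """Standardize one InChI and compute the output columns (hypothetical helper)."""
    # Imported inside the function so this module can be loaded (e.g. for
    # short_inchikey) even where RDKit/MolVS are not installed.
    from rdkit import Chem
    from rdkit.Chem import Crippen, Descriptors, rdMolDescriptors
    from molvs import Standardizer
    from molvs.charge import Uncharger
    from molvs.fragment import LargestFragmentChooser

    mol = Chem.MolFromInchi(inchi)
    if mol is None:  # unparsable input
        return None
    mol = Standardizer().standardize(mol)
    mol = LargestFragmentChooser().choose(mol)  # keep the largest fragment
    mol = Uncharger().uncharge(mol)  # neutralize charges where possible
    key = Chem.MolToInchiKey(mol)
    return {
        "smiles": Chem.MolToSmiles(mol),
        "inchi": Chem.MolToInchi(mol),
        "inchikey": key,
        "short_inchikey": short_inchikey(key),
        "formula": rdMolDescriptors.CalcMolFormula(mol),
        "exact_mass": Descriptors.ExactMolWt(mol),
        # Stand-in only: Wildman-Crippen logP, not XLogP.
        "logp": Crippen.MolLogP(mol),
    }
```

Returning `None` for unparsable input lets a caller drop failed rows without aborting the whole batch.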

Requirements

Create the conda environment with:

conda env create -f environment.yml

Specific requirement

Note that some MolVS classes are not exposed by the top-level molvs package by default. You will need to edit MolVS's __init__.py, located in ~/opt/anaconda3/lib/python3.7/site-packages/molvs/ in the author's setup (adjust the path to your own installation), and add these lines:

from .fragment import LargestFragmentChooser, FragmentRemover
from .charge import Uncharger
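The manual edit can also be scripted. The following is a minimal sketch, not part of the project: `patch_molvs_init`, `find_molvs_init` and `EXTRA_IMPORTS` are hypothetical names. It appends only the lines that are missing, so running it twice is harmless, and it locates the installed package through `importlib.util.find_spec` instead of hard-coding the Anaconda path.

```python
import importlib.util
from pathlib import Path
from typing import List

# The two lines the README asks to add to molvs/__init__.py.
EXTRA_IMPORTS = (
    "from .fragment import LargestFragmentChooser, FragmentRemover",
    "from .charge import Uncharger",
)


def find_molvs_init() -> Path:
    """Locate molvs/__init__.py in the active environment without importing molvs."""
    spec = importlib.util.find_spec("molvs")
    if spec is None or spec.origin is None:
        raise ModuleNotFoundError("molvs is not installed in this environment")
    return Path(spec.origin)


def patch_molvs_init(init_path: Path) -> List[str]:
    """Append any missing import lines to init_path and return the lines added."""
    text = init_path.read_text()
    present = {line.strip() for line in text.splitlines()}
    missing = [line for line in EXTRA_IMPORTS if line not in present]
    if missing:
        if text and not text.endswith("\n"):
            text += "\n"  # keep the appended lines on their own lines
        init_path.write_text(text + "\n".join(missing) + "\n")
    return missing
```

Typical use, inside the activated environment: `patch_molvs_init(find_molvs_init())`.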

Usage

Go to the src folder, then run chemosanitizer.py with the input file path and the output file path as the first and second arguments, the header of the InChI column as the third argument, and the number of CPUs to use as the fourth.
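The four positional arguments map naturally onto an argparse parser. This is a minimal sketch, not the script's actual argument handling: `build_parser`, `positive_int` and the argument names `input`, `output`, `inchi_column` and `cpus` are illustrative assumptions. argparse adds `--help` output and rejects a CPU count that is not a positive integer.

```python
import argparse


def positive_int(value: str) -> int:
    """argparse type that accepts only integers >= 1."""
    number = int(value)  # a ValueError here becomes a usage error in argparse
    if number < 1:
        raise argparse.ArgumentTypeError("must be at least 1")
    return number


def build_parser() -> argparse.ArgumentParser:
    """Hypothetical parser mirroring the four positional arguments."""
    parser = argparse.ArgumentParser(prog="chemosanitizer.py")
    parser.add_argument("input", help="path to the input file")
    parser.add_argument("output", help="path of the file to write")
    parser.add_argument("inchi_column", help="header of the column holding InChI strings")
    parser.add_argument("cpus", type=positive_int, help="number of CPUs to use")
    return parser
```

Calling `build_parser().parse_args()` with no arguments reads `sys.argv`, so the command line shown in the example would parse unchanged.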

Example:

cd src
python chemosanitizer.py ~/translatedStructureRdkit.tsv ./test.tsv structureTranslated 6
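The parallel step can be sketched with the standard library alone: read the TSV, map a per-InChI worker over the chosen column with a `multiprocessing.Pool` of the requested size, and write the results back out as TSV. This is an assumption about how such a script can be organized, not the project's code; `sanitize_file` and its `worker` parameter are hypothetical. The worker is expected to return a dict of output columns, or `None` for a structure that failed.

```python
import csv
from multiprocessing import Pool
from typing import Callable, Dict, Optional


def sanitize_file(
    in_path: str,
    out_path: str,
    inchi_column: str,
    cpus: int,
    worker: Callable[[str], Optional[Dict[str, object]]],
) -> int:
    """Apply worker to every InChI in in_path; return the number of rows written.

    Rows for which worker returns None are skipped. The output keeps the input
    InChI under its original column header, followed by the worker's columns.
    """
    with open(in_path, newline="") as handle:
        inchis = [row[inchi_column] for row in csv.DictReader(handle, delimiter="\t")]

    if cpus <= 1:
        results = list(map(worker, inchis))  # no process start-up cost
    else:
        with Pool(processes=cpus) as pool:
            # Pool.map preserves input order; chunksize batches tasks per process.
            chunk = max(1, len(inchis) // (cpus * 4))
            results = pool.map(worker, inchis, chunksize=chunk)

    rows = [{inchi_column: src, **res} for src, res in zip(inchis, results) if res is not None]
    with open(out_path, "w", newline="") as handle:
        if rows:
            writer = csv.DictWriter(handle, fieldnames=list(rows[0]), delimiter="\t")
            writer.writeheader()
            writer.writerows(rows)
    return len(rows)
```

With more than one CPU the worker is pickled and sent to child processes, so it must be defined at module level. On platforms that start workers with spawn (Windows, macOS), also call `sanitize_file` from under an `if __name__ == "__main__":` guard.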

Download files

Source Distribution

chemosanitizer-0.0.1.tar.gz (9.1 kB)