
Adds hydrogen atoms to molecular representations as specified by pH


dimorphite_dl


DOI (archived release): https://doi.org/10.5281/zenodo.15486131

Dimorphite-DL is a fast, accurate, accessible, and modular open-source program for enumerating small-molecule ionization states. It adds or removes hydrogen atoms from molecular representations to produce the protonation states appropriate for a user-specified pH range.

Accurate protonation states are crucial in cheminformatics and computational drug discovery because a molecule's ionization state affects its physicochemical properties, biological activity, and interactions with targets. Dimorphite-DL prepares molecules with appropriate ionization states for downstream applications such as docking, molecular dynamics, and virtual screening.

Installation

You can install the latest released version from PyPI with the following command:

pip install dimorphite_dl

Or you can install the latest development version from the main branch on GitHub with:

pip install git+https://github.com/durrantlab/dimorphite_dl.git
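To confirm which release pip actually installed (for example, after switching between the PyPI and GitHub sources), a minimal sketch using only the standard library's importlib.metadata could look like this. It assumes the package is registered under the distribution name dimorphite_dl, matching the pip commands above; the helper name installed_version is illustrative only.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str = "dimorphite_dl") -> Optional[str]:
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # No distribution with this name is installed in the current environment
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found is not None else "dimorphite_dl is not installed")
```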

Usage

CLI

The command-line interface (dimorphite_dl) provides straightforward access to Dimorphite-DL's functionality.

Positional Arguments:

  • SMI: SMILES string or path to a file containing SMILES strings to protonate.

Options:

  • --ph_min MIN: Minimum pH to consider (default: 6.4).
  • --ph_max MAX: Maximum pH to consider (default: 8.4).
  • --precision PRE: pKa precision factor, representing the number of standard deviations from the mean pKa to consider when determining ionization states (default: 1.0).
  • --output_file FILE: Optional path to a file where the protonated SMILES are written.
  • --max_variants MXV: Limits the number of protonation variants generated per input compound (default: 128).
  • --label_states: If set, output SMILES will be labeled with their target ionization state ("DEPROTONATED", "PROTONATED", or "BOTH").
  • --log_level: Enables logging at the given level, one of none, debug, info, warning, error, or critical (default: no logging).
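How --ph_min, --ph_max, --precision, --label_states, and --max_variants interact can be illustrated with a simplified model. This is a minimal sketch, not Dimorphite-DL's implementation. It assumes each ionizable site has a mean pKa and a standard deviation, treats mean ± precision × std as the uncertain pKa window, and labels a site PROTONATED when the whole pH range lies below that window, DEPROTONATED when it lies above, and BOTH when the two overlap. The Site type, its fields, the site names, and the pKa values are hypothetical.

```python
from dataclasses import dataclass
from itertools import islice, product
from typing import List, Sequence, Tuple

# Each variant is a tuple of (site name, state) pairs, one pair per site
Variant = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class Site:
    """Hypothetical ionizable site with a mean pKa and its standard deviation."""

    name: str
    pka_mean: float
    pka_std: float


def site_state(site: Site, ph_min: float, ph_max: float, precision: float) -> str:
    """Label a site by comparing [ph_min, ph_max] with mean +/- precision * std."""
    low = site.pka_mean - precision * site.pka_std
    high = site.pka_mean + precision * site.pka_std
    if ph_max < low:
        return "PROTONATED"  # the whole pH range lies below the pKa window
    if ph_min > high:
        return "DEPROTONATED"  # the whole pH range lies above the pKa window
    return "BOTH"  # the ranges overlap, so both forms are kept


def enumerate_variants(
    sites: Sequence[Site],
    ph_min: float,
    ph_max: float,
    precision: float,
    max_variants: int,
) -> List[Variant]:
    """Combine per-site choices, returning at most max_variants variants."""
    choices = []
    for site in sites:
        state = site_state(site, ph_min, ph_max, precision)
        if state == "BOTH":
            choices.append([(site.name, "PROTONATED"), (site.name, "DEPROTONATED")])
        else:
            choices.append([(site.name, state)])
    # product() yields 2**n combinations for n "BOTH" sites; islice caps the count
    return list(islice(product(*choices), max_variants))


if __name__ == "__main__":
    sites = [Site("site_a", 4.0, 0.5), Site("site_b", 7.5, 1.0)]
    for variant in enumerate_variants(sites, 6.4, 8.4, 1.0, 128):
        print(variant)
```

In this model, a larger precision widens each pKa window, so more sites overlap the pH range and become BOTH. Each BOTH site doubles the number of combinations, which is why a cap like --max_variants matters. The real tool may choose which variants to keep differently from the simple truncation used here.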

Examples

Protonate molecules from a file:

dimorphite_dl sample_molecules.smi

Protonate a single SMILES string within a specific pH range:

dimorphite_dl --ph_min -3.0 --ph_max -2.0 "CCC(=O)O"

Protonate a SMILES string and save output to a file:

dimorphite_dl --ph_min -3.0 --ph_max -2.0 --output_file output.smi "CCCN"

Protonate molecules from a file with increased pKa precision and state labels:

dimorphite_dl --precision 2.0 --label_states sample_molecules.smi
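When the CLI has to be driven from another Python program (for example, a batch pipeline), the documented flags can be assembled into an argument list and passed to subprocess.run. This is a sketch: the helper build_dimorphite_command is hypothetical, it uses only the options listed above, and it assumes the dimorphite_dl executable is on PATH.

```python
import shutil
import subprocess
from typing import List, Optional


def build_dimorphite_command(
    smi: str,
    ph_min: float = 6.4,
    ph_max: float = 8.4,
    precision: float = 1.0,
    output_file: Optional[str] = None,
    label_states: bool = False,
) -> List[str]:
    """Assemble a dimorphite_dl argument list from the documented CLI options."""
    cmd = [
        "dimorphite_dl",
        "--ph_min", str(ph_min),
        "--ph_max", str(ph_max),
        "--precision", str(precision),
    ]
    if output_file is not None:
        cmd += ["--output_file", output_file]
    if label_states:
        cmd.append("--label_states")
    cmd.append(smi)  # positional SMI argument (SMILES string or file path) goes last
    return cmd


if __name__ == "__main__":
    command = build_dimorphite_command("CCC(=O)O", ph_min=-3.0, ph_max=-2.0)
    if shutil.which("dimorphite_dl") is None:
        print("dimorphite_dl not found on PATH; command would be:", command)
    else:
        # Passing a list (no shell) avoids quoting problems with SMILES characters
        result = subprocess.run(command, capture_output=True, text=True, check=True)
        print(result.stdout)
```

Passing a list rather than a shell string matters here because SMILES often contain characters such as parentheses, brackets, and `#` that a shell would otherwise interpret.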

Scripting

Dimorphite-DL can be easily integrated into your Python scripts. The primary entry point is protonate_smiles, defined in dimorphite_dl.protonate and also importable directly from dimorphite_dl.

from dimorphite_dl import protonate_smiles

# Protonate a single SMILES string with custom pH range and precision
protonated_mol_1: list[str] = protonate_smiles(
    "CCC(=O)O", ph_min=6.8, ph_max=7.9, precision=0.5
)
print(f"Protonated 'CCC(=O)O': {protonated_mol_1}")

# Protonate a list of SMILES strings
protonated_mol_list: list[str] = protonate_smiles(["CCC(=O)O", "CCCN"])
print(f"Protonated list: {protonated_mol_list}")

# Protonate molecules from a SMILES file
# Make sure '~/example.smi' exists and contains SMILES strings
# protonated_from_file: list[str] = protonate_smiles("~/example.smi")
# print(f"Protonated from file: {protonated_from_file}")

# Example with labeling states and limiting variants
protonated_labeled: list[str] = protonate_smiles(
    "C1CCCCC1C(=O)O", ph_min=7.0, ph_max=7.4, label_states=True, max_variants=5
)
print(f"Protonated with labels: {protonated_labeled}")
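The commented-out file example above passes a path directly. If you would rather control file parsing yourself (for example, to expand ~ explicitly or to skip blank lines), you can read the SMILES into a list and pass that list to protonate_smiles, which accepts a list as the second example above shows. This sketch assumes a common .smi layout in which the SMILES is the first whitespace-separated token on each line, optionally followed by a name; read_smiles_file is a hypothetical helper, not part of Dimorphite-DL.

```python
import os
from typing import List


def read_smiles_file(path: str) -> List[str]:
    """Hypothetical helper: return the first whitespace-separated token of each non-blank line."""
    smiles: List[str] = []
    # expanduser turns "~/example.smi" into a path under the user's home directory
    with open(os.path.expanduser(path), encoding="utf-8") as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                smiles.append(fields[0])
    return smiles


# Usage (requires dimorphite_dl to be installed):
# from dimorphite_dl import protonate_smiles
# protonated = protonate_smiles(read_smiles_file("~/example.smi"), ph_min=6.4, ph_max=8.4)
```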

Known issues

Dimorphite-DL is designed to handle the vast majority of ionizable functional groups accurately, but in some edge cases the current SMARTS patterns and pKa assignments may not behave as expected. Users should be aware of the following known limitations when working with specific molecular substructures:

  • Tertiary Amides: Tertiary amides (e.g., N-acetylpiperidine CC(=O)N1CCCCC1) are incorrectly treated as basic amines (pKa ~8) instead of neutral species because current amide SMARTS patterns require an N-H bond.
  • Indoles and Pyrroles: These heterocycles are correctly deprotonated around pH 14.5 but are not protonated at very low pH (~-3.5), even though they are expected to protonate under such strongly acidic conditions.

Development

We use pixi to manage Python environments and simplify the developer workflow. Once you have pixi installed, move into the dimorphite_dl directory (e.g., cd dimorphite_dl) and install the environment with:

pixi install

Now you can activate the new environment with:

pixi shell

Citation

If you use Dimorphite-DL in your research, please cite:

Ropp PJ, Kaminsky JC, Yablonski S, Durrant JD (2019) Dimorphite-DL: An open-source program for enumerating the ionization states of drug-like small molecules. J Cheminform 11:14. doi: 10.1186/s13321-019-0336-9.

License

This project is released under the Apache-2.0 License as specified in LICENSE.md.



Download files


Source Distribution

dimorphite_dl-2.0.2.tar.gz (140.7 kB)

Uploaded Source

Built Distribution


dimorphite_dl-2.0.2-py3-none-any.whl (39.9 kB)

Uploaded Python 3

File details

Details for the file dimorphite_dl-2.0.2.tar.gz.

File metadata

  • Download URL: dimorphite_dl-2.0.2.tar.gz
  • Upload date:
  • Size: 140.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.0

File hashes

Hashes for dimorphite_dl-2.0.2.tar.gz
  • SHA256: fa042935312c9681203a3b0f3df04252eb1d17f73cd71c8e7d8ed248380fae9c
  • MD5: 2447005b1dd725cc9c2fa5ecc2e6961b
  • BLAKE2b-256: dba696dc4c9277d5fd656740950ca4421db6109f46a3800d6151b78152a9a3e0


File details

Details for the file dimorphite_dl-2.0.2-py3-none-any.whl.

File metadata

  • Download URL: dimorphite_dl-2.0.2-py3-none-any.whl
  • Upload date:
  • Size: 39.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.0

File hashes

Hashes for dimorphite_dl-2.0.2-py3-none-any.whl
  • SHA256: 44125dfe2a5e1a37239f61914428d7f51d8fa61ec701a1337e71c293514133a8
  • MD5: 378bfc7eb73e0f69ba289565d4bdd57a
  • BLAKE2b-256: d5f9d721ee039c5d070efbf67696b01aaa84b431b82c7a7e57504f2c1f46ced2
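To check that a downloaded file matches the SHA256 digest listed above, you can hash it in chunks with the standard library's hashlib and compare the result. This is a minimal sketch; the helper names and the script name check_hash.py are illustrative, and the embedded digest is the one listed for the wheel.

```python
import hashlib
import sys

# SHA256 digest listed for dimorphite_dl-2.0.2-py3-none-any.whl
WHEEL_SHA256 = "44125dfe2a5e1a37239f61914428d7f51d8fa61ec701a1337e71c293514133a8"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash a file in fixed-size chunks so it is never read into memory all at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_digest(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA256 equals expected_hex (case-insensitive)."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    # Usage: python check_hash.py dimorphite_dl-2.0.2-py3-none-any.whl
    if len(sys.argv) == 2:
        print("OK" if matches_digest(sys.argv[1], WHEEL_SHA256) else "MISMATCH")
```

pip can also enforce digests itself in hash-checking mode, by listing `--hash=sha256:...` next to a pinned requirement and installing with `--require-hashes`.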

