Adds hydrogen atoms to molecular representations as specified by pH
dimorphite_dl
Dimorphite-DL is a fast, accurate, accessible, and modular open-source program designed for enumerating small-molecule ionization states. It specifically adds or removes hydrogen atoms from molecular representations to achieve the appropriate protonation state for a user-specified pH range.
Accurate protonation states are crucial in cheminformatics and computational drug discovery, as a molecule's ionization state significantly impacts its physicochemical properties, biological activity, and interactions with targets. Dimorphite-DL addresses this by providing a robust solution for preparing molecules for various downstream applications like docking, molecular dynamics, and virtual screening.
Installation
You can install the latest released version on PyPI using the following command.
pip install dimorphite_dl
Or you can install the latest development version from the main branch on GitHub using
pip install git+https://github.com/durrantlab/dimorphite_dl.git
Usage
CLI
The command-line interface (dimorphite_dl) provides straightforward access to Dimorphite-DL's functionalities.
Positional Arguments:
SMI: SMILES string or path to a file containing SMILES strings to protonate.
Options:
- --ph_min MIN: Minimum pH to consider (default: 6.4).
- --ph_max MAX: Maximum pH to consider (default: 8.4).
- --precision PRE: pKa precision factor, representing the number of standard deviations from the mean pKa to consider when determining ionization states (default: 1.0).
- --output_file FILE: Optional path to a file to write the protonated SMILES results.
- --max_variants MXV: Limits the number of protonation variants generated per input compound (default: 128).
- --label_states: If set, output SMILES will be labeled with their target ionization state ("DEPROTONATED", "PROTONATED", or "BOTH").
- --log_level: Enable logging and set the level. Can be none, debug, info, warning, error, or critical. Defaults to no logging.
Examples
Protonate molecules from a file:
dimorphite_dl sample_molecules.smi
Protonate a single SMILES string within a specific pH range:
dimorphite_dl --ph_min -3.0 --ph_max -2.0 "CCC(=O)O"
Protonate a SMILES string and save output to a file:
dimorphite_dl --ph_min -3.0 --ph_max -2.0 --output_file output.smi "CCCN"
Protonate molecules from a file with increased pKa precision and state labels:
dimorphite_dl --precision 2.0 --label_states sample_molecules.smi
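If you drive the CLI from another program, it can help to assemble the argument list in one place. The following is a minimal sketch that uses only the options documented above; the helper name build_command is hypothetical, and actually running the command assumes dimorphite_dl is installed and on your PATH.

```python
import shlex
from typing import List, Optional


def build_command(smi: str, ph_min: float = 6.4, ph_max: float = 8.4,
                  precision: float = 1.0,
                  output_file: Optional[str] = None,
                  label_states: bool = False) -> List[str]:
    """Assemble a dimorphite_dl argument list from the documented options."""
    cmd = ["dimorphite_dl",
           "--ph_min", str(ph_min),
           "--ph_max", str(ph_max),
           "--precision", str(precision)]
    if output_file is not None:
        cmd += ["--output_file", output_file]
    if label_states:
        cmd.append("--label_states")
    cmd.append(smi)  # the positional SMI argument goes last
    return cmd


cmd = build_command("CCCN", ph_min=-3.0, ph_max=-2.0, output_file="output.smi")
print(shlex.join(cmd))  # shell-quoted form, useful for logging

# To run it (requires dimorphite_dl on PATH):
# import subprocess
# subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than a single string avoids shell-quoting problems with SMILES characters such as parentheses and "=".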
Scripting
Dimorphite-DL can be easily integrated into your Python scripts.
The primary function for this is protonate_smiles from dimorphite_dl.protonate.
from dimorphite_dl import protonate_smiles
# Protonate a single SMILES string with custom pH range and precision
protonated_mol_1: list[str] = protonate_smiles(
    "CCC(=O)O", ph_min=6.8, ph_max=7.9, precision=0.5
)
print(f"Protonated 'CCC(=O)O': {protonated_mol_1}")
# Protonate a list of SMILES strings
protonated_mol_list: list[str] = protonate_smiles(["CCC(=O)O", "CCCN"])
print(f"Protonated list: {protonated_mol_list}")
# Protonate molecules from a SMILES file
# Make sure '~/example.smi' exists and contains SMILES strings
# protonated_from_file: list[str] = protonate_smiles("~/example.smi")
# print(f"Protonated from file: {protonated_from_file}")
# Example with labeling states and limiting variants
protonated_labeled: list[str] = protonate_smiles(
    "C1CCCCC1C(=O)O", ph_min=7.0, ph_max=7.4, label_states=True, max_variants=5
)
print(f"Protonated with labels: {protonated_labeled}")
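If you need to know which variants came from which input molecule, one option is to call the function once per molecule and keep the results in a dictionary. The helper name protonate_each is hypothetical, and the sketch assumes only what the examples above show: that protonate_smiles accepts a single SMILES string plus keyword options and returns a list of strings. The protonation function is passed in as an argument so the helper itself has no import-time dependency.

```python
from typing import Callable, Dict, Iterable, List


def protonate_each(smiles: Iterable[str],
                   protonate: Callable[..., List[str]],
                   **options) -> Dict[str, List[str]]:
    """Map each input SMILES to its own list of protonation variants."""
    results: Dict[str, List[str]] = {}
    for smi in smiles:
        # One call per molecule keeps the input/output pairing explicit.
        results[smi] = list(protonate(smi, **options))
    return results


# Usage (requires dimorphite_dl to be installed):
# from dimorphite_dl import protonate_smiles
# by_input = protonate_each(["CCC(=O)O", "CCCN"], protonate_smiles,
#                           ph_min=6.8, ph_max=7.9)
# for smi, variants in by_input.items():
#     print(smi, "->", variants)
```

This trades some speed for clarity; for large libraries, passing a list or file directly is likely the better choice.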
Known issues
Dimorphite-DL is designed to handle the vast majority of ionizable functional groups accurately, but in some edge cases the current SMARTS patterns and pKa assignments may not behave as expected. Users working with the following substructures should be aware of these known limitations:
- Tertiary Amides: Tertiary amides (e.g., N-acetylpiperidine, CC(=O)N1CCCCC1) are incorrectly treated as basic amines (pKa ~8) instead of neutral species because current amide SMARTS patterns require an N-H bond.
- Indoles and Pyrroles: These heterocycles are correctly deprotonated around pH 14.5 but are not protonated at very low pH (~-3.5) where they would be expected to protonate under extremely acidic conditions.
Development
We use pixi to manage Python environments and simplify the developer workflow.
Once you have pixi installed, move into the dimorphite_dl directory (e.g., cd dimorphite_dl) and install the environment using the command
pixi install
Now you can activate the new virtual environment using
pixi shell
Citation
If you use Dimorphite-DL in your research, please cite:
Ropp PJ, Kaminsky JC, Yablonski S, Durrant JD (2019) Dimorphite-DL: An open-source program for enumerating the ionization states of drug-like small molecules. J Cheminform 11:14. doi: 10.1186/s13321-019-0336-9.
License
This project is released under the Apache-2.0 License as specified in LICENSE.md.
Download files
File details
Details for the file dimorphite_dl-2.0.2.tar.gz.
File metadata
- Download URL: dimorphite_dl-2.0.2.tar.gz
- Upload date:
- Size: 140.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | fa042935312c9681203a3b0f3df04252eb1d17f73cd71c8e7d8ed248380fae9c |
| MD5 | 2447005b1dd725cc9c2fa5ecc2e6961b |
| BLAKE2b-256 | dba696dc4c9277d5fd656740950ca4421db6109f46a3800d6151b78152a9a3e0 |
File details
Details for the file dimorphite_dl-2.0.2-py3-none-any.whl.
File metadata
- Download URL: dimorphite_dl-2.0.2-py3-none-any.whl
- Upload date:
- Size: 39.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 44125dfe2a5e1a37239f61914428d7f51d8fa61ec701a1337e71c293514133a8 |
| MD5 | 378bfc7eb73e0f69ba289565d4bdd57a |
| BLAKE2b-256 | d5f9d721ee039c5d070efbf67696b01aaa84b431b82c7a7e57504f2c1f46ced2 |