Conformer generation for drug-like molecules
openconf
openconf is a conformer generator for drug-like molecules. It uses torsional Monte Carlo moves to generate diverse ensembles quickly, relies on RDKit and MMFF94s throughout, and is fast enough for large-scale workflows.
Installation
pip install openconf
Quick Start
Python API
from rdkit import Chem
from openconf import generate_conformers, ConformerConfig
# From SMILES
mol = Chem.MolFromSmiles("CCCCc1ccccc1")
ensemble = generate_conformers(mol)
print(f"Generated {ensemble.n_conformers} conformers")
print(ensemble.summary())
# Save to SDF
ensemble.to_sdf("output.sdf")
# Or XYZ
ensemble.to_xyz("output.xyz")
Named Presets
Five use-case presets are available out of the box. Four of them work with generate_conformers:
from openconf import generate_conformers
ensemble = generate_conformers(mol, preset="rapid") # fast virtual screening
ensemble = generate_conformers(mol, preset="ensemble") # property prediction
ensemble = generate_conformers(mol, preset="spectroscopic") # NMR / IR / VCD
ensemble = generate_conformers(mol, preset="docking") # docking pose recovery
For FEP-style analogue generation from a fixed pose, see generate_conformers_from_pose below.
Custom Configuration
For full control, pass a ConformerConfig directly. You can also use
preset_config() as a starting point and override individual fields:
from openconf import generate_conformers, preset_config, ConformerConfig
# Start from a preset and tweak
config = preset_config("docking")
config.max_out = 500
config.random_seed = 42
ensemble = generate_conformers(mol, config=config)
# Or build from scratch
config = ConformerConfig(
max_out=200, # Maximum conformers to return
pool_max=2000, # Internal pool size
n_steps=500, # Exploration steps
energy_window_kcal=12.0, # Energy window for filtering
random_seed=42, # For reproducibility
)
ensemble = generate_conformers(mol, config=config)
Command Line
# From SMILES file
openconf molecules.smi --max-out 200 --out conformers.sdf
# From SDF file
openconf input.sdf --method hybrid --ew 15 --out output.sdf
# With verbose output
openconf "CCCCc1ccccc1" --max-out 100 -o butylbenzene.sdf -v
Use-Case Examples
The right configuration depends on the downstream task. Named presets cover the most common workflows:
from openconf import generate_conformers, preset_config
# One-liner with a preset
ensemble = generate_conformers(mol, preset="docking")
# Or get the config object to inspect / tweak before use
config = preset_config("spectroscopic")
config.max_out = 200 # override a single field
ensemble = generate_conformers(mol, config=config)
Available presets: "rapid", "ensemble", "spectroscopic", "docking", "analogue".
Below are representative wall-clock timings measured on a single CPU core (Apple M2 Pro), mean over 3 runs.
1. Rapid — fast virtual screening
Enumerate a handful of diverse shapes per molecule as fast as possible. Appropriate for ligand-based virtual screening at large scale.
- max_out=5, n_steps=30: minimal per-molecule budget
- do_final_refine=False: skip the final MMFF pass (shape tools re-minimize anyway)
- seed_n_per_rotor=2, seed_prune_rms_thresh=1.5: coarser seeding
- minimize_batch_size=16: larger parallel batches for multi-core machines
from openconf import generate_conformers
ensemble = generate_conformers("CC(C)Cc1ccc(cc1)C(C)C(=O)O", preset="rapid")
Full config equivalent
from openconf import ConformerConfig, generate_conformers
config = ConformerConfig(
max_out=5,
pool_max=100,
n_steps=30,
energy_window_kcal=20.0,
seed_n_per_rotor=2,
seed_prune_rms_thresh=1.5,
do_final_refine=False,
minimize_batch_size=16,
dedupe_period=15,
shake_period=10,
final_select="diverse",
)
ensemble = generate_conformers("CC(C)Cc1ccc(cc1)C(C)C(=O)O", config=config)
| Molecule | Heavy atoms | Rotors | Time (s) | Conformers |
|---|---|---|---|---|
| butylbenzene | 13 | 3 | 0.043 | 5 |
| ibuprofen | 18 | 5 | 0.046 | 5 |
| celecoxib | 26 | 4 | 0.063 | 5 |
| maraviroc | 34 | 7 | 0.848 | 5 |
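Small drug-like molecules take about 45 ms each on a single core. Larger ones cost more: maraviroc, with 34 heavy atoms, takes about 0.85 s. At the small-molecule rate, a 32-core machine processes roughly 60 million molecules per day. A billion-molecule campaign therefore needs about 16 machine-days, which is practical on a cluster.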
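The throughput estimate is simple arithmetic on the single-core timings above. The sketch below assumes perfect linear scaling across cores and no I/O overhead, both of which are optimistic. The function names are illustrative and are not part of openconf.

```python
SECONDS_PER_DAY = 86_400


def molecules_per_day(seconds_per_molecule: float, n_cores: int) -> float:
    """Daily throughput, assuming embarrassingly parallel, perfectly scaled work."""
    return SECONDS_PER_DAY / seconds_per_molecule * n_cores


def machine_days(n_molecules: float, seconds_per_molecule: float, n_cores: int) -> float:
    """Days one n_cores machine needs to process n_molecules."""
    return n_molecules / molecules_per_day(seconds_per_molecule, n_cores)


# ~45 ms per small molecule with the "rapid" preset on a 32-core machine
print(f"{molecules_per_day(0.045, 32) / 1e6:.1f} M molecules/day")  # 61.4 M molecules/day
print(f"{machine_days(1e9, 0.045, 32):.1f} machine-days for 1B")  # 16.3 machine-days for 1B
# Maraviroc-sized molecules (~0.848 s each) cut throughput by almost 20x
print(f"{molecules_per_day(0.848, 32) / 1e6:.2f} M molecules/day")  # 3.26 M molecules/day
```

In practice, the size distribution of the library dominates the estimate. Sampling a few hundred molecules and timing them is more reliable than extrapolating from a single average.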
2. Ensemble — property prediction
A compact, diverse ensemble for downstream ML or physics-based properties (logP, pKa, conformational descriptors).
- max_out=50, n_steps=200: balanced quality and speed
- energy_window_kcal=10.0: covers the thermally accessible range
- final_select="diverse": maximize conformational diversity across the ensemble
from openconf import generate_conformers
ensemble = generate_conformers("CC(C)Cc1ccc(cc1)C(C)C(=O)O", preset="ensemble")
Full config equivalent
from openconf import ConformerConfig, generate_conformers
config = ConformerConfig(
max_out=50,
pool_max=500,
n_steps=200,
energy_window_kcal=10.0,
seed_n_per_rotor=3,
seed_prune_rms_thresh=1.0,
do_final_refine=True,
minimize_batch_size=8,
final_select="diverse",
)
ensemble = generate_conformers("CC(C)Cc1ccc(cc1)C(C)C(=O)O", config=config)
| Molecule | Heavy atoms | Rotors | Time (s) | Conformers |
|---|---|---|---|---|
| butylbenzene | 13 | 3 | 0.122 | 50 |
| ibuprofen | 18 | 5 | 0.186 | 50 |
| celecoxib | 26 | 4 | 0.275 | 50 |
| maraviroc | 34 | 7 | 1.580 | 50 |
3. Spectroscopic — NMR, IR, VCD
Exhaustively populate all thermally accessible conformers with accurate relative MMFF energies for Boltzmann-weighted spectral averaging.
- energy_window_kcal=5.0: conformers within ~3 kcal/mol of the minimum typically account for >99% of the Boltzmann population at 300 K; 5 kcal/mol adds margin for MMFF error
- final_select="energy": return the lowest-energy conformers, to be weighted by exp(-ΔE/RT)
- parent_strategy="softmax": bias sampling toward low-energy basins
- seed_n_per_rotor=5, seed_prune_rms_thresh=0.5: dense seeding to avoid missing shallow minima
- do_final_refine=True: accurate relative energies are critical here
import numpy as np
from openconf import generate_conformers
ensemble = generate_conformers("CC(C)Cc1ccc(cc1)C(C)C(=O)O", preset="spectroscopic")
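# Boltzmann weights at 300 K
RT = 1.987204e-3 * 300.0  # gas constant (kcal/(mol*K)) times T: ~0.596 kcal/mol
energies = np.array(ensemble.energies)  # per-conformer MMFF energies, kcal/mol
# Subtract the minimum before exponentiating to avoid underflow
weights = np.exp(-(energies - energies.min()) / RT)
weights /= weights.sum()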
Full config equivalent
from openconf import ConformerConfig, generate_conformers
config = ConformerConfig(
max_out=100,
pool_max=1000,
n_steps=400,
energy_window_kcal=5.0,
seed_n_per_rotor=5,
seed_prune_rms_thresh=0.5,
do_final_refine=True,
minimize_batch_size=8,
parent_strategy="softmax",
final_select="energy",
)
ensemble = generate_conformers("CC(C)Cc1ccc(cc1)C(C)C(=O)O", config=config)
| Molecule | Heavy atoms | Rotors | Time (s) | Conformers |
|---|---|---|---|---|
| butylbenzene | 13 | 3 | 0.181 | 91 |
| ibuprofen | 18 | 5 | 0.327 | 100 |
| celecoxib | 26 | 4 | 0.289 | 36 |
| maraviroc | 34 | 7 | 2.374 | 75 |
Celecoxib and maraviroc return fewer than max_out=100 conformers because of the tight 5 kcal/mol window: rigid, aromatic-rich scaffolds have few populated conformers at room temperature.
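The ">99% within ~3 kcal/mol" rule of thumb depends on how many conformers sit above the cutoff, so it is worth checking on a real ensemble. The sketch below is pure Python and assumes relative energies in kcal/mol. The helper names and the example values are hypothetical and are not part of openconf.

```python
import math

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def boltzmann_weights(energies, temperature=300.0):
    """Normalized Boltzmann weights for energies in kcal/mol."""
    rt = R_KCAL * temperature
    e_min = min(energies)
    # Shift by the minimum so the largest term is exp(0) = 1 (avoids underflow)
    raw = [math.exp(-(e - e_min) / rt) for e in energies]
    total = sum(raw)
    return [w / total for w in raw]


def population_within(energies, window_kcal, temperature=300.0):
    """Fraction of the Boltzmann population within window_kcal of the minimum."""
    e_min = min(energies)
    weights = boltzmann_weights(energies, temperature)
    return sum(w for e, w in zip(energies, weights) if e - e_min <= window_kcal)


def boltzmann_average(values, energies, temperature=300.0):
    """Boltzmann-weighted average of a per-conformer property, e.g. an NMR shift."""
    weights = boltzmann_weights(energies, temperature)
    return sum(v * w for v, w in zip(values, weights))


# Hypothetical relative energies (kcal/mol) and a per-conformer property
energies = [0.0, 0.5, 1.2, 3.5, 6.0]
shifts = [10.0, 20.0, 30.0, 40.0, 50.0]
print(f"{population_within(energies, 3.0):.4f}")  # 0.9982
print(f"{boltzmann_average(shifts, energies):.2f}")
```

If many conformers sit just above 3 kcal/mol, their combined weight can push the excluded population well past 1%. That is one reason the preset uses a 5 kcal/mol window.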
4. Docking Pose Recovery
Maximize the chance that the bioactive conformation is in the output set,
i.e. minimize best-RMSD-to-crystal across the ensemble. This is the right
choice when preparing a single compound for docking where conformer quality
matters. For bulk library preparation (thousands of molecules), "rapid" is
usually more appropriate.
- parent_strategy="uniform": broad exploration; energy-biased sampling hurts recall of strained bioactive conformers
- energy_window_kcal=18.0: bioactive conformations are often 5–15 kcal/mol above the MMFF global minimum
- do_final_refine=False: docking programs minimize inside the binding site; pre-minimized geometries can hurt pose recall
- max_out=250, n_steps=500: a larger ensemble improves recall at acceptable cost
from openconf import generate_conformers
ensemble = generate_conformers("CC(C)Cc1ccc(cc1)C(C)C(=O)O", preset="docking")
ensemble.to_sdf("output.sdf")
Full config equivalent
from openconf import ConformerConfig, PrismConfig, generate_conformers
config = ConformerConfig(
max_out=250,
pool_max=2500,
n_steps=500,
energy_window_kcal=18.0,
seed_n_per_rotor=4,
seed_prune_rms_thresh=0.8,
do_final_refine=False,
minimize_batch_size=8,
parent_strategy="uniform",
final_select="diverse",
prism_config=PrismConfig(energy_window_kcal=18.0),
)
ensemble = generate_conformers("CC(C)Cc1ccc(cc1)C(C)C(=O)O", config=config)
ensemble.to_sdf("docking_input.sdf")
| Molecule | Heavy atoms | Rotors | Time (s) | Conformers |
|---|---|---|---|---|
| butylbenzene | 13 | 3 | 0.232 | 140 |
| ibuprofen | 18 | 5 | 0.326 | 169 |
| celecoxib | 26 | 4 | 0.397 | 231 |
| maraviroc | 34 | 7 | 1.826 | 172 |
5. Analogue / FEP-style R-group exploration
Generate conformers for an MCS-aligned analogue while keeping the core scaffold
exactly fixed at the input pose. The correct entry point here is
generate_conformers_from_pose rather than generate_conformers.
- Starts from the supplied conformer — no ETKDG seeding
- Only free terminal rotors are explored (those whose moving fragment is entirely outside the constrained core)
- MMFF minimization uses stiff position restraints on all constrained atoms, then snaps them to exact starting coordinates so there is zero drift
- Global shake is suppressed to avoid thrashing the starting pose
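The "free terminal rotor" rule can be expressed as a graph walk. Rotating about a bond moves everything on one side of it, so a rotor is usable only if that side contains no constrained atom.

This is a pure-Python sketch over a heavy-atom adjacency list. It assumes the list of rotatable bonds is already known. The helper names are hypothetical, and openconf's actual implementation may differ in detail.

```python
from collections import deque


def moving_fragment(adjacency, axis_atom, pivot_atom):
    """Atoms that move when rotating about the bond axis_atom-pivot_atom.

    Walks outward from pivot_atom without crossing the bond itself. The pivot
    lies on the rotation axis and does not move, so it is excluded.
    """
    seen = {pivot_atom}
    queue = deque([pivot_atom])
    while queue:
        atom = queue.popleft()
        for nbr in adjacency[atom]:
            if atom == pivot_atom and nbr == axis_atom:
                continue  # do not cross the rotatable bond
            if nbr not in seen:
                seen.add(nbr)
                queue.append(nbr)
    seen.discard(pivot_atom)
    return seen


def free_rotors(adjacency, rotatable_bonds, constrained):
    """Return (axis_atom, pivot_atom) pairs whose moving side avoids the core."""
    constrained = set(constrained)
    free = []
    for a, b in rotatable_bonds:
        for axis, pivot in ((a, b), (b, a)):
            frag = moving_fragment(adjacency, axis, pivot)
            if frag and not (frag & constrained):
                free.append((axis, pivot))
                break
    return free


# Heavy-atom graph of propylbenzene, CCCc1ccccc1 (indices follow the SMILES)
bonds = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (7, 8), (8, 3)]
adjacency = {i: [] for i in range(9)}
for i, j in bonds:
    adjacency[i].append(j)
    adjacency[j].append(i)

ring_atoms = [3, 4, 5, 6, 7, 8]
print(free_rotors(adjacency, [(1, 2), (2, 3)], ring_atoms))  # [(2, 1), (3, 2)]
```

Both chain rotors qualify because the side that moves is the propyl tail. If atom 0 were also constrained, neither rotor would be free.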
from rdkit import Chem
from rdkit.Chem import AllChem
from openconf import generate_conformers_from_pose
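# Suppose we have an MCS-aligned analogue with a propyl substituent
# replacing the butyl chain on a benzene scaffold. For illustration we embed
# a fresh 3D structure here; in practice, pass the aligned pose instead.
mol = Chem.MolFromSmiles("CCCc1ccccc1")
mol = Chem.AddHs(mol)
AllChem.EmbedMolecule(mol, AllChem.ETKDGv3())
# Ring heavy-atom indices (SMILES order: chain atoms 0-2, ring atoms 3-8).
# AddHs appends hydrogens after the heavy atoms, so these indices are stable.
# These atoms must not move.
ring_atoms = [3, 4, 5, 6, 7, 8]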
ensemble = generate_conformers_from_pose(mol, constrained_atoms=ring_atoms)
ensemble.to_sdf("analogues.sdf")
The default preset ("analogue") returns up to 50 conformers. Pass preset= or
config= to override:
from openconf import ConformerConfig, generate_conformers_from_pose
# Fewer conformers, faster turnaround
config = ConformerConfig(max_out=10, n_steps=60, pool_max=200)
ensemble = generate_conformers_from_pose(mol, constrained_atoms=ring_atoms, config=config)
Full analogue preset equivalent
from openconf import ConformerConfig, PrismConfig, generate_conformers_from_pose
config = ConformerConfig(
max_out=50,
pool_max=500,
n_steps=150,
energy_window_kcal=10.0,
do_final_refine=True,
minimize_batch_size=8,
parent_strategy="softmax",
final_select="diverse",
prism_config=PrismConfig(energy_window_kcal=10.0),
)
ensemble = generate_conformers_from_pose(mol, constrained_atoms=ring_atoms, config=config)
Configuration Quick Reference
| Parameter | Rapid | Ensemble | Spectroscopic | Docking | Analogue |
|---|---|---|---|---|---|
| max_out | 5 | 50 | 100 | 250 | 50 |
| n_steps | 30 | 200 | 400 | 500 | 150 |
| energy_window_kcal | 20 | 10 | 5 | 18 | 10 |
| seed_n_per_rotor | 2 | 3 | 5 | 4 | n/a |
| seed_prune_rms_thresh | 1.5 | 1.0 | 0.5 | 0.8 | n/a |
| do_final_refine | False | True | True | False | True |
| parent_strategy | softmax | softmax | softmax | uniform | softmax |
| final_select | diverse | diverse | energy | diverse | diverse |
Analogue mode uses generate_conformers_from_pose; seeding parameters are unused because ETKDG is skipped.
How It Works
1. Seeding
Generates initial conformers using RDKit's ETKDGv3 algorithm. The seed count is computed automatically from molecular topology. For molecules with small non-aromatic rings, ETKDGv3's useSmallRingTorsions is enabled; for macrocycles (rings ≥ 10 atoms), useMacrocycleTorsions is enabled, applying crystallography-derived distance bounds for better ring-closure geometries.
2. Hybrid Exploration
The default "hybrid" strategy combines:
- Torsion library: 365 crystallography-derived SMARTS rules (RDKit CrystalFF, Riniker & Landrum 2016) with Boltzmann-weighted angle preferences
- MCMM moves: random torsion perturbations biased by the library
- Correlated moves: simultaneous changes to adjacent rotors
- Ring flip moves: SVD plane-reflection of non-aromatic 5–7-membered rings to sample chair/envelope inversions
- Global shakes: periodic large perturbations to escape local basins
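Two of these moves can be sketched in a few lines. This is an illustration, not openconf's code.

- sample_torsion draws a dihedral from a hypothetical rule given as (angle, relative energy) pairs with Boltzmann weights, standing in for one torsion-library entry.
- reflect_ring mirrors ring atoms through their SVD best-fit plane, which inverts the ring pucker.

Both function names, the Gaussian jitter width, and the rule format are assumptions. A real ring flip would also carry substituents along and then re-minimize.

```python
import math
import random

import numpy as np

R_KCAL = 1.987204e-3  # gas constant, kcal/(mol*K)


def sample_torsion(preferences, rng, temperature=300.0, jitter_deg=10.0):
    """Draw a dihedral (degrees) from Boltzmann-weighted preferred angles.

    preferences: (angle_deg, relative_energy_kcal) pairs for one torsion rule.
    Gaussian jitter keeps proposals from collapsing onto exact grid angles.
    """
    rt = R_KCAL * temperature
    angles = [a for a, _ in preferences]
    weights = [math.exp(-e / rt) for _, e in preferences]
    center = rng.choices(angles, weights=weights, k=1)[0]
    return (center + rng.gauss(0.0, jitter_deg)) % 360.0


def reflect_ring(coords, ring_idx):
    """Reflect ring atoms through their SVD best-fit plane (inverts the pucker)."""
    new = np.array(coords, dtype=float)
    ring = new[ring_idx]
    centroid = ring.mean(axis=0)
    # The right-singular vector with the smallest singular value is the plane normal
    _, _, vt = np.linalg.svd(ring - centroid)
    normal = vt[-1]
    offsets = (ring - centroid) @ normal  # signed out-of-plane distances
    new[ring_idx] = ring - 2.0 * offsets[:, None] * normal
    return new


rng = random.Random(0)
# Hypothetical rule: two equal minima at 60 and 180 degrees, a strained one at 300
rule = [(60.0, 0.0), (180.0, 0.0), (300.0, 3.0)]
samples = [sample_torsion(rule, rng) for _ in range(2000)]

# Idealized puckered six-membered ring: atoms alternate +/-0.25 A out of plane
ring_xyz = np.array([[1.5 * math.cos(math.radians(60 * k)),
                      1.5 * math.sin(math.radians(60 * k)),
                      0.25 * (-1) ** k] for k in range(6)])
flipped = reflect_ring(ring_xyz, list(range(6)))
print(np.round(flipped[:, 2], 3))  # out-of-plane offsets change sign
```

The 3 kcal/mol penalty on the 300° minimum makes it roughly 150 times less likely than each of the other two angles at 300 K. This is the sense in which Boltzmann weighting biases the proposals.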
3. Minimization
Each proposed conformer is minimized with MMFF94s to ensure physically reasonable geometries.
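For reference, seeding (step 1) and minimization (step 3) correspond to standard RDKit calls. The sketch below is a plain RDKit baseline, not openconf's internal pipeline. It differs from openconf in two ways:

- It uses a fixed conformer count rather than openconf's topology-derived one.
- It sets both ring-torsion flags unconditionally rather than choosing them per molecule.

```python
from rdkit import Chem
from rdkit.Chem import AllChem

mol = Chem.AddHs(Chem.MolFromSmiles("CCCCc1ccccc1"))  # butylbenzene

# Step 1: ETKDGv3 seeding with a fixed random seed for reproducibility
params = AllChem.ETKDGv3()
params.randomSeed = 42
# Ring-specific torsion preferences; only relevant when matching rings are present
params.useSmallRingTorsions = True
params.useMacrocycleTorsions = True
conf_ids = list(AllChem.EmbedMultipleConfs(mol, numConfs=10, params=params))

# Step 3: MMFF94s minimization of every embedded conformer
results = AllChem.MMFFOptimizeMoleculeConfs(mol, maxIters=500, mmffVariant="MMFF94s")
energies = [energy for _not_converged, energy in results]  # kcal/mol
print(len(conf_ids), min(energies))
```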
4. Deduplication
Uses PRISM Pruner for efficient duplicate removal via moment-of-inertia filtering followed by RMSD-based pruning.
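The two-stage idea can be sketched with NumPy. Principal moments of inertia do not change under rotation and are cheap to compute. Pairs whose moments clearly differ are treated as distinct without computing an RMSD. Only the remaining pairs get a Kabsch-aligned RMSD.

This is an illustrative re-implementation with made-up tolerances (moi_rel_tol, rmsd_thresh). It is not PRISM Pruner's actual algorithm or API.

```python
import numpy as np


def principal_moments(coords, masses):
    """Sorted principal moments of inertia (rotation/translation invariant)."""
    com = np.average(coords, axis=0, weights=masses)
    x = coords - com
    r2 = np.sum(x * x, axis=1)
    inertia = np.sum(masses * r2) * np.eye(3) - (masses[:, None] * x).T @ x
    return np.linalg.eigvalsh(inertia)  # ascending order


def kabsch_rmsd(a, b):
    """RMSD between two (n, 3) conformers after optimal rigid superposition."""
    a = a - a.mean(axis=0)
    b = b - b.mean(axis=0)
    u, _, vt = np.linalg.svd(a.T @ b)
    d = 1.0 if np.linalg.det(vt.T @ u.T) >= 0 else -1.0  # forbid reflections
    rot = vt.T @ np.diag([1.0, 1.0, d]) @ u.T
    diff = a @ rot.T - b
    return float(np.sqrt(np.mean(np.sum(diff * diff, axis=1))))


def prune(conformers, masses, moi_rel_tol=0.01, rmsd_thresh=0.5):
    """Greedy dedupe: keep a conformer unless an already-kept one matches it."""
    kept, kept_moments = [], []
    for conf in conformers:
        moments = principal_moments(conf, masses)
        duplicate = False
        for ref, ref_moments in zip(kept, kept_moments):
            # Heuristic prefilter: clearly different moments => treat as distinct
            tol = moi_rel_tol * np.maximum(ref_moments, 1e-8)
            if np.any(np.abs(moments - ref_moments) > tol):
                continue
            if kabsch_rmsd(conf, ref) < rmsd_thresh:
                duplicate = True
                break
        if not duplicate:
            kept.append(conf)
            kept_moments.append(moments)
    return kept


# Hypothetical 5-atom conformer, a rotated + translated copy, and a stretched variant
base = np.array([[0.0, 0.0, 0.0], [1.5, 0.0, 0.0], [2.0, 1.4, 0.0],
                 [3.5, 1.4, 0.3], [4.0, 2.8, -0.5]])
theta = np.radians(40.0)
rz = np.array([[np.cos(theta), -np.sin(theta), 0.0],
               [np.sin(theta), np.cos(theta), 0.0],
               [0.0, 0.0, 1.0]])
copy = base @ rz.T + np.array([1.0, 2.0, 3.0])
stretched = base * np.array([1.5, 1.0, 1.0])
masses = np.full(5, 12.011)
print(len(prune([base, copy, stretched], masses)))  # 2
```

The moment check saves most of the SVD work when the pool is large and shapes vary widely. The RMSD step then resolves the near-ties.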
5. Selection
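Final selection returns up to max_out conformers from the PRISM-deduplicated pool. With final_select="energy" it returns the lowest-energy conformers; with final_select="diverse" it returns a structurally diverse subset.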
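The two final_select modes can be sketched as follows.

- "energy" simply sorts the in-window pool by energy.
- For "diverse", this sketch uses greedy max-min (farthest-point) picking seeded with the global minimum. This is a common choice, but it is an assumption here: the exact criterion openconf uses is not described in this README.

The function name and the precomputed distance matrix (for example, pairwise RMSDs) are hypothetical.

```python
def select_conformers(energies, dist, max_out, window_kcal, mode="diverse"):
    """Pick up to max_out conformer indices from those within window_kcal of the minimum."""
    e_min = min(energies)
    pool = [i for i, e in enumerate(energies) if e - e_min <= window_kcal]
    if mode == "energy":
        return sorted(pool, key=lambda i: energies[i])[:max_out]
    if mode != "diverse":
        raise ValueError(f"unknown mode: {mode!r}")
    # Greedy max-min: start from the global minimum, then repeatedly add the
    # conformer whose nearest already-chosen neighbour is farthest away
    chosen = [min(pool, key=lambda i: energies[i])]
    remaining = [i for i in pool if i != chosen[0]]
    while remaining and len(chosen) < max_out:
        best = max(remaining, key=lambda i: min(dist[i][j] for j in chosen))
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Hypothetical pool: energies in kcal/mol and 1-D "positions" standing in for shape
energies = [0.0, 0.5, 3.0, 1.0, 9.0, 2.0]
positions = [0.0, 1.0, 2.0, 10.0, 11.0, 20.0]
dist = [[abs(a - b) for b in positions] for a in positions]
print(select_conformers(energies, dist, 3, 5.0, mode="energy"))   # [0, 1, 3]
print(select_conformers(energies, dist, 3, 5.0, mode="diverse"))  # [0, 5, 3]
```

Conformer 4 is excluded in both modes because it lies outside the 5 kcal/mol window. The diverse pick trades the low-energy near-duplicate 1 for the far-away conformers 5 and 3.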
For the full algorithm description and parameter tuning guide, see SCIENCE.md.
Benchmarks
Validated on the Iridium benchmark (120 drug-like molecules, bioactive conformer recovery from crystal structures). At N=200, openconf achieves a median best-RMSD of 0.58 Å vs. 0.63 Å for ETKDG+MMFF94s, at 10–15× lower wall time. The advantage is concentrated in flexible molecules (7–9 rotatable bonds), where openconf's torsion-library biasing and ring flip moves outperform pure distance-geometry seeding.
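Best-RMSD recovery reduces each molecule's ensemble to a single number: the smallest RMSD between any generated conformer and the crystal pose. The benchmark statistic is then the median of those per-molecule minima.

The pure-Python sketch below uses made-up RMSD values. Computing the RMSDs themselves, for example with symmetry-aware alignment, is out of scope here. The 1.0 Å recall threshold is an illustrative addition, not a number from this benchmark.

```python
from statistics import median


def best_rmsd(rmsds_to_crystal):
    """Lowest RMSD of any conformer in one molecule's ensemble."""
    return min(rmsds_to_crystal)


def summarize(per_molecule_rmsds, threshold=1.0):
    """Median best-RMSD and fraction of molecules recovered at or below threshold."""
    best = [best_rmsd(r) for r in per_molecule_rmsds]
    recovered = sum(1 for b in best if b <= threshold) / len(best)
    return median(best), recovered


# Made-up RMSDs (Angstrom) of each generated conformer to the crystal pose
ensembles = [
    [1.9, 0.42, 1.1],       # best 0.42
    [0.75, 2.3],            # best 0.75
    [1.6, 1.3, 2.8, 1.25],  # best 1.25
]
med, frac = summarize(ensembles)
print(med, round(frac, 3))  # 0.75 0.667
```

Because only the minimum counts, adding conformers can never worsen a molecule's best RMSD. This is why larger ensembles (as in the "docking" preset) improve recall at the cost of runtime.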
openconf is not recommended for macrocycles (ring size ≥ 12). On macrocyclic ring systems, ETKDG+MMFF94s with a large conformer budget outperforms openconf in both RMSD and ensemble coverage metrics. ETKDGv3 has dedicated macrocycle distance-geometry bounds that openconf does not replicate.
API Reference
Main Functions
- generate_conformers(mol, method="hybrid", config=None): Main entry point; presets are selected with preset=
- generate_conformers_from_smiles(smiles, ...): Convenience wrapper
- generate_conformers_from_pose(mol, constrained_atoms, config=None): FEP-style analogue generation from an aligned pose
Configuration Classes
- ConformerConfig: Main configuration
- PrismConfig: PRISM Pruner settings
- ConstraintSpec: Positional constraints for pose-locked generation
Data Classes
- ConformerEnsemble: Container for conformers and metadata
- ConformerRecord: Per-conformer metadata
Lower-Level Components
- prepare_molecule(mol): Sanitize and add hydrogens
- build_rotor_model(mol): Identify rotatable bonds and flippable rings
- TorsionLibrary: 365 crystallography-derived SMARTS torsion rules; load custom rules with TorsionLibrary.from_json(path)
- prism_dedupe(mol, conf_ids, config): Deduplication
Dependencies
- RDKit >= 2022.03
- NumPy >= 1.20
- prism-pruner >= 0.0.3
License
MIT License
Citation
If you use openconf in your research, please cite:
@software{openconf,
title = {openconf: Modular conformer generation for docking and ensemble workflows},
year = {2026},
url = {https://github.com/rowansci/openconf}
}
Acknowledgments
- PRISM Pruner by Nicolò Tampellini for efficient conformer deduplication
- RDKit for cheminformatics infrastructure and the CrystalFF torsion library (Riniker & Landrum, J. Chem. Inf. Model. 56, 2016)
File details
Details for the file openconf-0.0.5.tar.gz.
File metadata
- Download URL: openconf-0.0.5.tar.gz
- Upload date:
- Size: 109.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 91ff3a4fd799a786c5d215640dfe8de5f637daebb2be2bc59ccc1e2e2e67db94 |
| MD5 | 79ccefb78670c9b99f73eb45cf416235 |
| BLAKE2b-256 | 60ee88f763af3df6de3f759fe98041f2b749f9f375e216e3747c7efaec6bb2aa |
File details
Details for the file openconf-0.0.5-py3-none-any.whl.
File metadata
- Download URL: openconf-0.0.5-py3-none-any.whl
- Upload date:
- Size: 46.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c3f1a040e222e0895c57a35539e63fff7de3009af01756256d5b38cbe04d5b4a |
| MD5 | 16adf630f28c1cff7fcefc4fe4c28684 |
| BLAKE2b-256 | 4694a10d23f817b3e47066a17f97807606e587c8c4bb7c35e5de23f37b621baa |