Skip to main content

A python library to work with molecules. Built on top of RDKit.

Project description

Molecular Manipulation Made Easy

DOI Binder PyPI Conda PyPI - Downloads Conda PyPI - Python Version license GitHub Repo stars GitHub Repo stars Codecov

Datamol is a python library to work with molecules. It's a layer built on top of RDKit and aims to be as light as possible.

  • 🐍 Simple pythonic API
  • ⚗️ RDKit first: all you manipulate are rdkit.Chem.Mol objects.
  • ✅ Manipulating molecules often rely on many options; Datamol provides good defaults by design.
  • 🧠 Performance matters: built-in efficient parallelization when possible with optional progress bar.
  • 🕹️ Modern IO: out-of-the-box support for remote paths using fsspec to read and write multiple formats (sdf, xlsx, csv, etc).

Try Online

Visit Binder and try Datamol online.




Use conda:

mamba install -c conda-forge datamol

Quick API Tour

import datamol as dm

# Common functions
mol = dm.to_mol("O=C(C)Oc1ccccc1C(=O)O", sanitize=True)
fp = dm.to_fp(mol)
selfies = dm.to_selfies(mol)
inchi = dm.to_inchi(mol)

# Standardize and sanitize
mol = dm.to_mol("O=C(C)Oc1ccccc1C(=O)O")
mol = dm.fix_mol(mol)
mol = dm.sanitize_mol(mol)
mol = dm.standardize_mol(mol)

# Dataframe manipulation
df =
mols = dm.from_df(df)

# 2D viz
legends = [dm.to_smiles(mol) for mol in mols[:10]]
dm.viz.to_image(mols[:10], legends=legends)

# Generate conformers
smiles = "O=C(C)Oc1ccccc1C(=O)O"
mol = dm.to_mol(smiles)
mol_with_conformers = dm.conformers.generate(mol)

# 3D viz (using nglview)
dm.viz.conformers(mol, n_confs=10)

# Compute SASA from conformers
sasa = dm.conformers.sasa(mol_with_conformers)

# Easy IO
mols = dm.read_sdf("s3://my-awesome-data-lake/smiles.sdf", as_df=False)
dm.to_sdf(mols, "gs://data-bucket/smiles.sdf")

How to cite

Please cite Datamol if you use it in your research: DOI.


Version compatibilities are an essential topic for production-software stacks. We are cautious about documenting compatibility between datamol, python and rdkit.

See below the associated versions of Python and RDKit, for which a minor version of Datamol has been tested during its whole lifecycle.

datamol python rdkit
0.7 [3.8, 3.9] [2021.09, 2022.03]
0.6 [3.8, 3.9] [2021.09]
0.5 [3.8, 3.9] [2021.03, 2021.09]
0.4 [3.8, 3.9] [2020.09, 2021.03]
0.3 [3.8, 3.9] [2020.09, 2021.03]

CI Status

The CI run tests and perform code quality checks for the following combinations:

  • The three major platforms: Windows, OSX and Linux.
  • The two latest Python versions.
  • The two latest RDKit versions.
Lib build & Testing GitHub Workflow Status
Code Sanity (linting and type analysis) GitHub Workflow Status
Documentation Build GitHub Workflow Status


See the latest changelogs at CHANGELOG.rst.


Under the Apache-2.0 license. See LICENSE.


See AUTHORS.rst.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

datamol-0.7.14.tar.gz (278.4 kB view hashes)

Uploaded source

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page