A Python library to work with molecules. Built on top of RDKit.
Project description
datamol - molecular processing made easy
Datamol is a Python library to work with molecules. It's a layer built on top of RDKit and aims to be as light as possible.
- 🐍 Simple pythonic API
- ⚗️ RDKit first: all you manipulate are rdkit.Chem.Mol objects.
- ✅ Manipulating molecules often relies on many options; Datamol provides good defaults by design.
- 🧠 Performance matters: built-in efficient parallelization when possible with an optional progress bar.
- 🕹️ Modern IO: out-of-the-box support for remote paths using fsspec to read and write multiple formats (sdf, xlsx, csv, etc.).
Documentation
Visit https://docs.datamol.io.
Installation
Use mamba (or conda, with the same arguments):
mamba install -c conda-forge datamol
Quick API Tour
import datamol as dm
# Common functions
mol = dm.to_mol("O=C(C)Oc1ccccc1C(=O)O", sanitize=True)
fp = dm.to_fp(mol)
selfies = dm.to_selfies(mol)
inchi = dm.to_inchi(mol)
# Standardize and sanitize
mol = dm.to_mol("O=C(C)Oc1ccccc1C(=O)O")
mol = dm.fix_mol(mol)
mol = dm.sanitize_mol(mol)
mol = dm.standardize_mol(mol)
# Dataframe manipulation
df = dm.data.freesolv()
mols = dm.from_df(df)
# 2D viz
legends = [dm.to_smiles(mol) for mol in mols[:10]]
dm.viz.to_image(mols[:10], legends=legends)
# Generate conformers
smiles = "O=C(C)Oc1ccccc1C(=O)O"
mol = dm.to_mol(smiles)
mol_with_conformers = dm.conformers.generate(mol)
# 3D viz (using nglview) of the molecule that carries the generated conformers
dm.viz.conformers(mol_with_conformers, n_confs=10)
# Compute SASA from conformers
sasa = dm.conformers.sasa(mol_with_conformers)
# Easy IO (remote paths need the matching fsspec backend, e.g. s3fs for s3:// or gcsfs for gs://)
mols = dm.read_sdf("s3://my-awesome-data-lake/smiles.sdf", as_df=False)
dm.to_sdf(mols, "gs://data-bucket/smiles.sdf")
How to cite
Please cite Datamol if you use it in your research.
Compatibilities
Version compatibility is an essential topic for production software stacks. We are careful to document compatibility between datamol, Python and RDKit.
The table below lists the Python and RDKit versions against which each minor version of Datamol has been tested during its whole lifecycle. Other combinations may work, but they are not tested.
datamol | python | rdkit
---|---|---
0.12.x | [3.10, 3.11] | [2023.03, 2023.09]
0.11.x | [3.9, 3.10, 3.11] | [2022.09, 2023.03]
0.10.x | [3.9, 3.10, 3.11] | [2022.03, 2022.09]
0.9.x | [3.9, 3.10, 3.11] | [2022.03, 2022.09]
0.8.x | [3.8, 3.9, 3.10] | [2021.09, 2022.03, 2022.09]
0.7.x | [3.8, 3.9] | [2021.09, 2022.03]
0.6.x | [3.8, 3.9] | [2021.09]
0.5.x | [3.8, 3.9] | [2021.03, 2021.09]
0.4.x | [3.8, 3.9] | [2020.09, 2021.03]
0.3.x | [3.8, 3.9] | [2020.09, 2021.03]
CI Status
The CI runs tests and performs code quality checks for the following combinations:
- The three major platforms: Windows, OSX and Linux.
- The two latest Python versions.
- The two latest RDKit versions.
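Assuming the CI runs the full cross-product of these three axes, the job matrix can be enumerated with itertools.product. The platform labels are illustrative, and the concrete Python and RDKit versions are an assumption borrowed from the 0.12.x row of the compatibility table; the real CI configuration may pin different ones.

```python
from itertools import product

# The three platforms named above (labels chosen for illustration).
PLATFORMS = ["windows", "macos", "linux"]
# ASSUMPTION: the "two latest" versions, borrowed from the 0.12.x row of the
# compatibility table; the actual CI configuration may differ.
PYTHON_VERSIONS = ["3.10", "3.11"]
RDKIT_VERSIONS = ["2023.03", "2023.09"]


def ci_matrix(platforms, pythons, rdkits):
    """Return every (platform, python, rdkit) job of a full cross-product matrix."""
    return list(product(platforms, pythons, rdkits))


jobs = ci_matrix(PLATFORMS, PYTHON_VERSIONS, RDKIT_VERSIONS)
print(len(jobs), "jobs")  # prints: 12 jobs
```

The matrix grows multiplicatively, so limiting each dependency to its two latest versions keeps it at 3 × 2 × 2 = 12 jobs.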
The following workflows are tracked on the main branch:
- Lib build & Testing
- Code Sanity (linting and type analysis)
- Documentation Build
License
Under the Apache-2.0 license. See LICENSE.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file datamol-0.12.5.tar.gz.
File metadata
- Download URL: datamol-0.12.5.tar.gz
- Upload date:
- Size: 3.9 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8f0c6f498d542b0c9182f685ba46860f0628ac4c0caa3b195923150641bdcc57
MD5 | f4ca60b1538f8526b73e9ee605641fa3
BLAKE2b-256 | e4875e0eeea2f1bf4de215de2d74af78f041de1936214de9009d46e4ff7503cf
File details
Details for the file datamol-0.12.5-py3-none-any.whl.
File metadata
- Download URL: datamol-0.12.5-py3-none-any.whl
- Upload date:
- Size: 495.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.0.0 CPython/3.12.4
File hashes
Algorithm | Hash digest
---|---
SHA256 | 70a3c60ac3379d853611e266051ea38aad02bb44611d56cc7e8a470f904d704d
MD5 | ed3c0b3a65fa268b192c6c3663a633b6
BLAKE2b-256 | d4e73df58df1af04dcc694a6c970e7138ba3ec3447a442deb427914438326107
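Before installing from a manually downloaded file, you can check it against the SHA256 digests published above. This is a minimal sketch using only the standard library; EXPECTED_SHA256, sha256_of and verify are hypothetical helper names, not part of datamol or pip.

```python
import hashlib
from pathlib import Path
from typing import Mapping

# SHA256 digests copied from the "File hashes" tables above, keyed by file name.
EXPECTED_SHA256 = {
    "datamol-0.12.5.tar.gz": "8f0c6f498d542b0c9182f685ba46860f0628ac4c0caa3b195923150641bdcc57",
    "datamol-0.12.5-py3-none-any.whl": "70a3c60ac3379d853611e266051ea38aad02bb44611d56cc7e8a470f904d704d",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected: Mapping[str, str] = EXPECTED_SHA256) -> bool:
    """Compare a file's SHA256 digest with the published one for its file name."""
    path = Path(path)
    if path.name not in expected:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected[path.name]
```

For example, verify(Path("datamol-0.12.5.tar.gz")) returns True only if the downloaded archive matches the published digest. Python 3.11 also offers hashlib.file_digest; the chunked loop works on older versions too. pip can enforce the same check itself in hash-checking mode (a requirements line such as datamol==0.12.5 followed by --hash=sha256: and the digest), although that mode then requires every dependency to be pinned and hashed as well.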