unimol_tools is a Python package for property prediction with Uni-Mol on molecules, materials, and proteins.

Uni-Mol Tools

Unimol_tools is an easy-to-use wrapper for property prediction, representation, and downstream tasks with Uni-Mol.

Uni-Mol tools for various prediction and downstream tasks.

📖 Documentation: unimol-tools.readthedocs.io

Install

PyTorch is required; please install it according to your environment. If you are using CUDA, install a CUDA-enabled build of PyTorch. More details can be found at https://pytorch.org/get-started/locally/

Option 1: Installing from PyPI (recommended, for the stable version)

pip install unimol_tools --upgrade

We recommend installing huggingface_hub so that the required Uni-Mol models can be downloaded automatically at runtime. It can be installed with:

pip install huggingface_hub

huggingface_hub allows you to easily download and manage models from the Hugging Face Hub, which is key for using Uni-Mol models.

Option 2: Installing from source (for the latest version)

## Clone repository
git clone https://github.com/deepmodeling/unimol_tools.git
cd unimol_tools

## Dependencies installation
pip install -r requirements.txt

## Install
python setup.py install

Models on Hugging Face

The UniMol pretrained models can be found at dptech/Uni-Mol-Models.

If pretrained_model_path or pretrained_dict_path is left as None, the toolkit automatically downloads the corresponding files from this Hugging Face repository at runtime.

If the download is slow, you can use a mirror, such as:

export HF_ENDPOINT=https://hf-mirror.com

By default unimol_tools first tries the official Hugging Face endpoint. If that fails and HF_ENDPOINT is not set, it automatically retries using https://hf-mirror.com. Set HF_ENDPOINT yourself if you want to explicitly choose a mirror or the official site.
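The retry order described above can be sketched as a small function. This is only an illustration of the documented behaviour, not the package's actual code: the helper name endpoints_to_try is hypothetical, and treating an explicitly set HF_ENDPOINT as the only endpoint tried is an assumption drawn from the paragraph above.

```python
import os

OFFICIAL_ENDPOINT = "https://huggingface.co"
MIRROR_ENDPOINT = "https://hf-mirror.com"


def endpoints_to_try(environ=None):
    """Return the endpoints in the order they would be attempted.

    Hypothetical helper: it mirrors the behaviour described in the text
    and is not part of unimol_tools.
    """
    environ = os.environ if environ is None else environ
    explicit = environ.get("HF_ENDPOINT")
    if explicit:
        # An explicit choice is respected; no automatic mirror retry.
        return [explicit]
    # Default: official endpoint first, then the mirror as a fallback.
    return [OFFICIAL_ENDPOINT, MIRROR_ENDPOINT]


print(endpoints_to_try({}))
print(endpoints_to_try({"HF_ENDPOINT": MIRROR_ENDPOINT}))
```

Passing a plain dict instead of os.environ keeps the sketch easy to try without touching the real environment.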

Modify the default directory for weights

Set the UNIMOL_WEIGHT_DIR environment variable to specify the directory that holds the pre-trained weights, for example when the weights have been downloaded from another source.

export UNIMOL_WEIGHT_DIR=/path/to/your/weights/dir/
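The same variable can also be set from Python, which may be convenient in notebooks. The sketch below assumes the package reads UNIMOL_WEIGHT_DIR when it is imported or later, so the assignment is placed before any import of unimol_tools. The directory name is a hypothetical placeholder.

```python
import os
from pathlib import Path

# Hypothetical location; replace it with the directory that holds your weights.
weight_dir = Path.home() / "unimol_weights"

# Set the variable before importing unimol_tools so that the package can
# see it whenever it resolves the weight directory (assumed behaviour).
os.environ["UNIMOL_WEIGHT_DIR"] = str(weight_dir)

print(os.environ["UNIMOL_WEIGHT_DIR"])
```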

News

2025-09-22: Lightweight pre-training tools are now available in Unimol_tools!
2025-05-26: Unimol_tools is now independent of the Uni-Mol repository!
2025-03-28: Unimol_tools now supports Distributed Data Parallel (DDP)!
2024-11-22: Unimol V2 has been added to Unimol_tools!
2024-07-23: User experience improvement: added the UNIMOL_WEIGHT_DIR environment variable.
2024-06-25: unimol_tools has been published to PyPI! Hugging Face is now used to manage the pretrained models.
2024-06-20: unimol_tools v0.1.0 released. The dependency on Uni-Core has been removed, and the package will be published to PyPI soon.
2024-03-20: unimol_tools documentation is available at https://unimol-tools.readthedocs.io/en/latest/

Examples

Molecule property prediction

from rdkit.Chem import PandasTools  # needed for the SDF example below
from unimol_tools import MolTrain, MolPredict
clf = MolTrain(
    task='classification',
    data_type='molecule',
    epochs=10,
    batch_size=16,
    metrics='auc',
    # pretrained weights are downloaded automatically when left as ``None``
    # pretrained_model_path='/path/to/checkpoint.ckpt',
    # pretrained_dict_path='/path/to/dict.txt',
)
# train_data is a placeholder for your own input (a file path or a dict, see below)
clf.fit(data = train_data)
# Supported inputs: a CSV/TXT file with SMILES, an SDF file with molecules,
# or a custom dict such as {'atoms':[['C','C'],['C','H','O']], 'coordinates':[coordinates_1,coordinates_2]}

# The dict can be written by hand in the format below or built from an SDF file;
# either form can be passed directly to the model.
train_sdf = PandasTools.LoadSDF('exp/unimol_conformers_train.sdf')
train_dict = {
    'atoms': [[atom.GetSymbol() for atom in mol.GetAtoms()] for mol in train_sdf['ROMol']],
    # atoms[0]: ['C', 'C', 'O', 'C', 'O', 'C', ...]
    'coordinates': [mol.GetConformers()[0].GetPositions() for mol in train_sdf['ROMol']],
    # coordinates[0]: array([[ 6.6462, -1.8268,  1.9275],
    #                        [ 6.1552, -1.9367,  0.4873],
    #                        [ 5.1832, -0.8757,  0.3007],
    #                        [ 5.4651, -0.0272, -0.7266],
    #                        [ 4.8586, -0.0844, -1.7917],
    #                        [ 6.5362,  0.9767, -0.3742],
    #                        ...,])
    'TARGET': train_sdf['TARGET'].tolist()
    # TARGET: [0, 1, 0, 0, 1, 0, ...]
}
# clf.fit(data = train_sdf)
# clf.fit(data = train_dict)


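# load_model points to the directory where the trained model was saved;
# test_data is a placeholder for your own input, like train_data above
clf = MolPredict(load_model='../exp')
res = clf.predict(data = test_data)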
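Before passing a hand-built dict to fit, it can help to check that it is internally consistent. The helper below is a hypothetical sketch and not part of unimol_tools. It assumes one list of atom symbols and one list of x, y, z rows per molecule, as in the example above, and the toy coordinates are made up for illustration.

```python
def check_mol_dict(data, target_key="TARGET"):
    """Check that a custom input dict is internally consistent.

    Hypothetical helper, not part of unimol_tools. Returns the number of
    molecules, or raises ValueError on the first inconsistency found.
    """
    atoms = data["atoms"]
    coords = data["coordinates"]
    if len(atoms) != len(coords):
        raise ValueError("atoms and coordinates must list the same number of molecules")
    for i, (symbols, xyz) in enumerate(zip(atoms, coords)):
        if len(symbols) != len(xyz):
            raise ValueError(f"molecule {i}: {len(symbols)} atoms but {len(xyz)} coordinate rows")
        if any(len(row) != 3 for row in xyz):
            raise ValueError(f"molecule {i}: every coordinate row needs x, y, z")
    if target_key in data and len(data[target_key]) != len(atoms):
        raise ValueError("one target value is expected per molecule")
    return len(atoms)


# Toy input with made-up geometries, only to exercise the checks.
example = {
    "atoms": [["C", "O"], ["O", "H", "H"]],
    "coordinates": [
        [[0.0, 0.0, 0.0], [1.13, 0.0, 0.0]],
        [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]],
    ],
    "TARGET": [0, 1],
}
print(check_mol_dict(example))
```

Because the checks only use len and iteration, the same function also accepts NumPy arrays such as those returned by GetPositions().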

Molecule representation

import numpy as np
from unimol_tools import UniMolRepr
# single SMILES UniMol representation. If no paths are provided the
# pretrained model and dictionary are fetched from Hugging Face.
clf = UniMolRepr(
    data_type='molecule',
    remove_hs=False,
    # pretrained_model_path='/path/to/checkpoint.ckpt',
    # pretrained_dict_path='/path/to/dict.txt',
)
smiles = 'c1ccc(cc1)C2=NCC(=O)Nc3c2cc(cc3)[N+](=O)[O-]'
smiles_list = [smiles]
unimol_repr = clf.get_repr(smiles_list, return_atomic_reprs=True)
# CLS token repr
print(np.array(unimol_repr['cls_repr']).shape)
# atom-level representations, aligned with the atom order of RDKit's mol.GetAtoms()
print(np.array(unimol_repr['atomic_reprs']).shape)
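One common use of the CLS representation is comparing molecules by cosine similarity. The sketch below uses short made-up vectors as stand-ins for entries of unimol_repr['cls_repr'], so it runs without downloading a model. The function is a plain implementation of cosine similarity and is not part of unimol_tools.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 4-dimensional stand-ins for CLS vectors; real ones are much longer.
repr_a = [1.0, 0.0, 2.0, 0.0]
repr_b = [2.0, 0.0, 4.0, 0.0]  # same direction as repr_a
repr_c = [0.0, 3.0, 0.0, 1.0]  # orthogonal to repr_a

print(cosine_similarity(repr_a, repr_b))
print(cosine_similarity(repr_a, repr_c))
```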

Command-line utilities

Hydra-powered entry points make training, prediction, and representation available from the command line. Key-value pairs override options from the YAML files in unimol_tools/config.

Training

python -m unimol_tools.cli.run_train \
    train_path=train.csv \
    task=regression \
    save_path=./exp \
    smiles_col=smiles \
    target_cols=[target1] \
    epochs=10 \
    learning_rate=1e-4 \
    batch_size=16 \
    kfold=5

Prediction

python -m unimol_tools.cli.run_predict load_model=./exp data_path=test.csv

Representation

python -m unimol_tools.cli.run_repr data_path=test.csv smiles_col=smiles
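The key=value arguments in the commands above replace defaults from the YAML configuration. The sketch below illustrates the idea with a simplified, hypothetical override function over a nested dictionary. It is not Hydra's implementation: values stay strings, list syntax is not handled, and the default values shown are invented for the example.

```python
import copy


def apply_overrides(config, overrides):
    """Return a copy of ``config`` with ``key=value`` overrides applied.

    Simplified illustration: values stay strings and dotted keys address
    nested dictionaries. Hydra itself also handles typing, lists, and more.
    """
    result = copy.deepcopy(config)
    for item in overrides:
        key, _, value = item.partition("=")
        *parents, leaf = key.split(".")
        node = result
        for name in parents:
            # Walk (or create) the nested dictionaries for dotted keys.
            node = node.setdefault(name, {})
        node[leaf] = value
    return result


# Invented defaults standing in for a YAML file.
defaults = {"task": "classification", "epochs": "10", "training": {"batch_size": "16"}}
merged = apply_overrides(defaults, ["task=regression", "training.batch_size=32"])
print(merged)
```

Working on a deep copy leaves the defaults untouched, which matches the way overrides apply to a single run only.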

Molecule pretraining

unimol_tools provides a command-line utility for pretraining Uni-Mol models on your own dataset. The script uses Hydra so configuration values can be overridden at the command line. Two common invocation examples are shown below: one for LMDB data and one for a CSV of SMILES strings.

LMDB dataset

export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
export HYDRA_FULL_ERROR=1
export OMP_NUM_THREADS=1

torchrun --standalone --nproc_per_node=NUM_GPUS \
    -m unimol_tools.cli.run_pretrain \
    dataset.train_path=train.lmdb \
    dataset.valid_path=valid.lmdb \
    dataset.data_type=lmdb \
    dataset.dict_path=dict.txt \
    training.total_steps=1000000 \
    training.batch_size=16 \
    training.update_freq=1

dataset.dict_path is optional. The effective batch size is n_gpu * training.batch_size * training.update_freq.
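As a quick check of the formula, the effective batch size can be computed directly. The function name is illustrative, and the values correspond to a hypothetical run on 8 GPUs with the settings from the command above.

```python
def effective_batch_size(n_gpu, batch_size, update_freq=1):
    """Samples contributing to one optimizer step across all GPUs."""
    return n_gpu * batch_size * update_freq


# 8 GPUs, training.batch_size=16, training.update_freq=1
print(effective_batch_size(n_gpu=8, batch_size=16, update_freq=1))  # 128
```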

CSV dataset

export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
export HYDRA_FULL_ERROR=1
export OMP_NUM_THREADS=1

torchrun --standalone --nproc_per_node=NUM_GPUS \
    -m unimol_tools.cli.run_pretrain \
    dataset.train_path=train.csv \
    dataset.valid_path=valid.csv \
    dataset.data_type=csv \
    dataset.smiles_column=smiles \
    training.total_steps=1000000 \
    training.batch_size=16 \
    training.update_freq=1

For multi-node training, specify additional arguments, for example:

export TORCH_NCCL_ASYNC_ERROR_HANDLING=1
export HYDRA_FULL_ERROR=1
export OMP_NUM_THREADS=1

torchrun --nnodes=2 --nproc_per_node=8 --node_rank=0 \
    --master_addr=<master-ip> --master_port=<port> \
    -m unimol_tools.cli.run_pretrain ...
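In a multi-node launch, only the node rank differs between machines. The sketch below builds the per-node command strings with a hypothetical helper. The master address, port, and the single Hydra override are placeholder values for illustration, and the remaining overrides are left to the caller.

```python
import shlex


def torchrun_command(node_rank, nnodes, nproc_per_node, master_addr, master_port, overrides):
    """Build the torchrun command line for one node (hypothetical helper)."""
    args = [
        "torchrun",
        f"--nnodes={nnodes}",
        f"--nproc_per_node={nproc_per_node}",
        f"--node_rank={node_rank}",
        f"--master_addr={master_addr}",
        f"--master_port={master_port}",
        "-m",
        "unimol_tools.cli.run_pretrain",
        *overrides,
    ]
    # shlex.join quotes arguments where needed so the string is shell-safe.
    return shlex.join(args)


# Placeholder address, port, and override; substitute your own values.
commands = [
    torchrun_command(rank, 2, 8, "10.0.0.1", 29500, ["dataset.data_type=lmdb"])
    for rank in range(2)
]
for command in commands:
    print(command)
```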

All available options are defined in pretrain_config.py, and checkpoints along with the dictionary are saved to the run directory. When GPU memory is limited, lower training.batch_size and increase training.update_freq to accumulate gradients, so that the effective batch size n_gpu * training.batch_size * training.update_freq stays the same.
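The trade-off between per-GPU batch size and gradient accumulation can be worked out with a small helper. This is a sketch with a hypothetical function name. It assumes the target effective batch size should be matched exactly and raises an error otherwise.

```python
def update_freq_for(target_effective, n_gpu, batch_size):
    """Accumulation steps needed to reach a target effective batch size."""
    per_step = n_gpu * batch_size
    if target_effective % per_step != 0:
        raise ValueError(
            f"{target_effective} is not a multiple of n_gpu * batch_size = {per_step}"
        )
    return target_effective // per_step


# Keep an effective batch size of 128 on 8 GPUs after lowering the
# per-GPU batch size from 16 to 4 to save memory.
print(update_freq_for(target_effective=128, n_gpu=8, batch_size=4))  # 4
```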

Credits

We thank all contributors from the community for their suggestions, bug reports, and chemistry advice. Currently unimol-tools is maintained by Yaning Cui, Xiaohong Ji, and Zhifeng Gao from DP Technology and the AI for Science Institute, Beijing.

Please cite our paper if you use this tool.


@article{gao2023uni,
  title={Uni-qsar: an auto-ml tool for molecular property prediction},
  author={Gao, Zhifeng and Ji, Xiaohong and Zhao, Guojiang and Wang, Hongshuai and Zheng, Hang and Ke, Guolin and Zhang, Linfeng},
  journal={arXiv preprint arXiv:2304.12239},
  year={2023}
}

License

This project is licensed under the terms of the MIT license. See LICENSE for additional details.

Release history

0.1.5 (this version): Sep 24, 2025
0.1.4.post1: May 28, 2025
0.1.4: May 26, 2025
0.1.3.post1: Apr 7, 2025
0.1.3: Mar 28, 2025
0.1.2.post2: Jan 6, 2025
0.1.2.post1: Dec 26, 2024
0.1.2: Dec 23, 2024
0.1.1.post1: Nov 29, 2024
0.1.1: Nov 21, 2024
0.1.0.post4: Sep 9, 2024
0.1.0.post3: Sep 9, 2024
0.1.0.post2: Jul 24, 2024
0.1.0.post1: Jun 25, 2024

Download files

Source Distribution

unimol_tools-0.1.5.tar.gz (88.4 kB), uploaded Sep 24, 2025, Source

Built Distribution

unimol_tools-0.1.5-py3-none-any.whl (106.3 kB), uploaded Sep 24, 2025, Python 3

File details

Details for the file unimol_tools-0.1.5.tar.gz.

File metadata

Download URL: unimol_tools-0.1.5.tar.gz
Upload date: Sep 24, 2025
Size: 88.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for unimol_tools-0.1.5.tar.gz
Algorithm	Hash digest
SHA256	`58b995881797ad57824df9363dc9ea91ab396b65492b8898b6faf3418980ac77`
MD5	`2110b9350fd39880cb42dabaef1a54e9`
BLAKE2b-256	`18cc2664de775c033ea96c3423c789f45275c68f89feaff02ae16a3a5f2c9227`

File details

Details for the file unimol_tools-0.1.5-py3-none-any.whl.

File metadata

Download URL: unimol_tools-0.1.5-py3-none-any.whl
Upload date: Sep 24, 2025
Size: 106.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.18

File hashes

Hashes for unimol_tools-0.1.5-py3-none-any.whl
Algorithm	Hash digest
SHA256	`311a80fdcfeebd842c3270b32775acaf4f8989bcc6f20ab2b0e8acaa73bcbdca`
MD5	`8194b6e9fd4adf00f0f304920d19b365`
BLAKE2b-256	`1cd19ad33541fb5d8f6051af52ee8b4760bcea9356341a5a71ee36e2cabb43ef`
