
OF-DFT using machine learning


SCIAI-DFT



This repository contains the code used in our publication Stable and Accurate Orbital-Free Density Functional Theory Powered by Machine Learning. Using equivariant graph neural networks, we enable orbital-free density functional theory calculations by learning the kinetic energy functional from data.

Contributors

Installation

UV (recommended)

For installation with CUDA support, run

uv sync

inside this directory. For the CPU-only version, run

uv sync --group pyg-cpu --no-group pyg

PyPI

Please note that the PyPI package does not support data generation or training, only inference! If you want to train your own model, please clone the GitHub project.

# install prerequisites
pip install torch-scatter torch-sparse torch-cluster --find-links https://data.pyg.org/whl/torch-2.4.1+cu124.html
# Install tensorframes
pip install git+https://github.com/sciai-lab/tensor_frames.git@cd1addfd3c82a47095c9961ab999dcabfab4c21d#
# install mldft
pip install mldft

Alternatively, install everything with uv in one go:

uv pip install mldft torch-scatter torch-sparse torch-cluster --find-links https://data.pyg.org/whl/torch-2.4.1+cu124.html git+https://github.com/sciai-lab/tensor_frames.git@cd1addfd3c82a47095c9961ab999dcabfab4c21d#
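
After installation you can check that the command-line interface is available:

# quick sanity check of the installed CLI
mldft --help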

Install using Conda/Mamba/Micromamba

For conda or mamba, replace micromamba with conda or mamba in the commands below. If you want to create the environment with CPU support only, replace environment.yaml with environment_cpu.yaml.

micromamba env create -f environment.yaml  # create mamba environment
micromamba activate mldft                  # activate environment
pip install -e .                           # install as an editable package
pip install -e tensorframes                # install tensorframes

Install using Pip

pip install -r requirements.txt -e .       # install requirements and package
pip install -e tensorframes                # install tensorframes

Setup using script

To actually run an OF-DFT calculation using an ML model, you need to obtain a model and set up a few environment variables. To make this easier, we supply a small setup script that you can call with

mldft_setup

It will ask you where to place the datasets and ML models (defaults are $HOME/dft_data and $HOME/dft_models) and export these paths as the environment variables DFT_DATA and DFT_MODELS. It will then offer to download our two models from our repository on Hugging Face (note that this requires the huggingface_hub Python package, which you can install with pip install huggingface_hub). Finally, it will offer to download dataset statistics, which are required if you want to use the SAD guess during density optimization with the CLI.
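
For example, assuming you accept the default paths and want to download the models, a typical setup session boils down to:

pip install huggingface_hub   # only needed for the model download step
mldft_setup                   # interactive: choose paths, download models and dataset statistics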

Environment variables

Before running the code you need to set the following two environment variables: DFT_DATA, the path where the data should be stored, and DFT_MODELS, the path where the training runs (including model checkpoints, logs and TensorBoard files) should be stored. You can set them in your .bashrc or .zshrc:

export DFT_DATA="/path/to/data"
export DFT_MODELS="/path/to/models"

Usage

This is a general usage manual. To reproduce the results from our paper, see REPLICATION_GUIDE.md.

We use Hydra to manage configurations. The main configuration files are located in configs/.
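
Any value in these configs can be overridden from the command line using Hydra's syntax. As an illustration (the config names in angle brackets are placeholders, and the keys follow the examples used later in this README):

# select config groups and override a nested value from the command line
python mldft/ml/train.py data=<train_data_config> model=<model_config> data.target_key=kin_plus_xc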

Data generation

The datasets used in our publication are available on Dryad. To generate your own data, follow the steps below; a consolidated example is sketched after the list.

  1. (Optional) Create your own dataset class, or use the MISC dataset and provide .xyz files specifying which molecules should be generated.

  2. (Optional) Create a config file in configs/datagen/dataset/

  3. Run Kohn-Sham DFT on the dataset and create .chk files in $DFT_DATA/dataset/kohn_sham:

    Example: mldft_ks dataset=<your_dataset_config_name> n_molecules=1000 start_idx=0

  4. Based on the Kohn-Sham results, perform density fitting, compute energies and gradients, and save them as labels for the machine learning model in $DFT_DATA/dataset/labels:

    Example: mldft_labelgen dataset=<your_dataset_config_name> n_molecules=-1

  5. Split the dataset into train, validation and test splits using mldft/utils/create_dataset_splits.py.

    Example: python mldft/utils/create_dataset_splits.py <dataset_name>

  6. Create a train data config in configs/ml/data linking to the dataset; the important settings are dataset_name and the correct atom types of the dataset.

  7. Transform the data into a basis (to reduce dataloading computations during training); for Graphformer models use local_frames_global_natrep.

    Example: python mldft/datagen/transform_dataset.py data=<your_train_data_config_name> data/transforms=local_frames_global_natrep

  8. Compute dataset statistics; it is important to compute them for the transformation and the energy target that you want to use.

    Example: python mldft/ml/compute_dataset_statistics.py data=<your_dataset_config_name>
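
Putting steps 3 to 8 together, a run for a hypothetical dataset whose datagen config, dataset name and train data config are all called my_dataset might look like the following sketch (in practice these names may differ, so adjust them to your own configs):

# hypothetical end-to-end data generation for a dataset called "my_dataset"
mldft_ks dataset=my_dataset n_molecules=1000 start_idx=0      # step 3: Kohn-Sham reference calculations
mldft_labelgen dataset=my_dataset n_molecules=-1              # step 4: density fitting and label generation
python mldft/utils/create_dataset_splits.py my_dataset        # step 5: train/val/test splits
python mldft/datagen/transform_dataset.py data=my_dataset data/transforms=local_frames_global_natrep  # step 7: pre-transform
python mldft/ml/compute_dataset_statistics.py data=my_dataset # step 8: dataset statistics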

Now you can start training.

Training

Training can be run with:

python mldft/ml/train.py data=<train_data_config> model=<model_config>

Two important settings are:

  • data/transforms: This determines whether the data has been pre-transformed. The default is local_frames_global_natrep, which means that both local frames and the global natural reparametrization have been applied.
  • data.target_key: The target you are training on. The default is kin_plus_xc, which means you train on the total kinetic plus exchange-correlation energy and their gradient. Alternatives are kin_minus_apbe, a delta-learning approach to the kinetic energy relative to the APBE kinetic energy functional, and tot, which means you are training on the total electronic energy. An example combining these settings is sketched below.
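
For example, a delta-learning run on pre-transformed data could be launched like this (config names are placeholders):

# illustrative training run: delta learning relative to the APBE kinetic energy functional
python mldft/ml/train.py data=<train_data_config> model=<model_config> \
    data/transforms=local_frames_global_natrep data.target_key=kin_minus_apbe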

Density Optimization

On a Dataset

To run density optimization on a dataset in our format, you can run the following command:

python mldft/ofdft/run_density_optimization.py run_path=<path_to_ml_model> \
    n_molecules=<number_of_molecules> device=<device> initialization=<initialization> num_devices=<num_devices>
  • path_to_ml_model is the path to the model, relative to DFT_MODELS
  • n_molecules is the number of molecules to compute
  • device is the device on which the computation should run, e.g. cuda or cpu
  • initialization is the density initialization to use: sad, minao or hückel; the sad initialization requires appropriate dataset statistics.

By default this will run on the validation set of the dataset the model was trained on, but you can override split_file_path to use another split file and split to toggle between the train, val and test splits of the dataset. Results are plotted in the files density_optimization.pdf and density_optimization_summary.pdf.
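
For instance, to run density optimization for 100 molecules of the test split on a GPU with the SAD guess, an invocation could look like this (the run path is a placeholder relative to DFT_MODELS):

# illustrative density optimization run on the test split
python mldft/ofdft/run_density_optimization.py run_path=<path_to_ml_model> \
    n_molecules=100 device=cuda initialization=sad num_devices=1 split=test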

On arbitrary molecules

If you want to run the density optimization on molecules in .xyz files which are not part of any dataset you can do so with:

mldft example.xyz --model /path/to/some/model
# get all options:
mldft --help

--model needs to be the path to the model directory containing a hparams.yaml file as well as a checkpoints/ directory with a last.ckpt checkpoint. This will only work if the model has been trained for all atom types present in the molecule. A logfile with the same base name as the .xyz file and the .log suffix will be created. Additionally, if you have the required dataset statistics you can use the sad initialization (the default is minao). The result will be saved in a .pt file with the same base name as your .xyz file.
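
As a concrete sketch, you could run the CLI on a small water molecule (the geometry below is illustrative, and the model path is a placeholder for one of your downloaded or trained models):

# write a minimal .xyz file for water (illustrative geometry, Angstrom)
cat > water.xyz << 'EOF'
3
water molecule
O 0.0000  0.0000  0.1173
H 0.0000  0.7572 -0.4692
H 0.0000 -0.7572 -0.4692
EOF
# run density optimization; this creates water.log and water.pt next to the input
mldft water.xyz --model $DFT_MODELS/<model_directory>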

Alternatively, if you downloaded our models using the setup script, you can refer to them by name:

mldft xyzfile --model str25_qm9  # or str25_qmugs

Additional Info

Build documentation

make docs
# or to build from scratch:
make docs-clean

Template

For more details about the template visit: https://github.com/ashleve/lightning-hydra-template

Third-party licenses

This code adapts code from third-party libraries, which are distributed under the MIT license; see the license file.

Citation

If you use this repository in your research, please cite the following paper:

@article{Remme_Stable_and_Accurate_2025,
author = {Remme, Roman and Kaczun, Tobias and Ebert, Tim and Gehrig, Christof A. and Geng, Dominik and Gerhartz, Gerrit and Ickler, Marc K. and Klockow, Manuel V. and Lippmann, Peter and Schmidt, Johannes S. and Wagner, Simon and Dreuw, Andreas and Hamprecht, Fred A.},
doi = {10.1021/jacs.5c06219},
journal = {Journal of the American Chemical Society},
number = {32},
pages = {28851--28859},
title = {{Stable and Accurate Orbital-Free Density Functional Theory Powered by Machine Learning}},
url = {https://doi.org/10.1021/jacs.5c06219},
volume = {147},
year = {2025}
}
