OF-DFT using machine learning
Project description
This repository contains the code used in our publication Stable and Accurate Orbital-Free Density Functional Theory Powered by Machine Learning. Using equivariant graph neural networks we enable Orbital-Free Density Functional Theory calculations by learning the kinetic energy functional from data.
Contributors
Roman Remme
Tobias Kaczun
Tim Ebert
Christof A. Gehrig
Dominik Geng
Gerrit Gerhartz
Marc K. Ickler
Manuel V. Klockow
Peter Lippmann
Johannes S. Schmidt
Simon Wagner
Fred A. Hamprecht
Andreas Dreuw |
Installation
UV (recommended)
For installation with CUDA, run
uv sync
inside this directory. For the CPU-only version, run
uv sync --group pyg-cpu --no-group pyg
PyPI
Please note that the PyPI package supports only inference, not data generation or training! If you want to train your own model, please clone the GitHub project.
# install prerequisites
pip install torch-scatter torch-sparse torch-cluster --find-links https://data.pyg.org/whl/torch-2.4.1+cu124.html
# Install tensorframes
pip install git+https://github.com/sciai-lab/tensor_frames.git@cd1addfd3c82a47095c9961ab999dcabfab4c21d#
# install mldft
pip install mldft
Alternatively with uv in one go:
uv pip install mldft torch-scatter torch-sparse torch-cluster --find-links https://data.pyg.org/whl/torch-2.4.1+cu124.html git+https://github.com/sciai-lab/tensor_frames.git@cd1addfd3c82a47095c9961ab999dcabfab4c21d#
Install using Conda/Mamba/Micromamba
For conda or mamba, replace micromamba with conda or mamba in the commands below.
If you want to create the environment with CPU support only, you can replace
environment.yaml with environment_cpu.yaml.
micromamba env create -f environment.yaml # create mamba environment
micromamba activate mldft # activate environment
pip install -e . # install as an editable package
pip install -e tensorframes # install tensorframes
Install using Pip
pip install -r requirements.txt -e . # install requirements and package
pip install -e tensorframes # install tensorframes
Setup using script
To actually run an OF-DFT calculation using an ML model, you need to obtain a model and set up some environment variables. To make this easier, we supply a small setup script that you can call with
mldft_setup
It will ask you where to place the datasets and ML models (the defaults are `$HOME/dft_data` and `$HOME/dft_models`) and export these paths as the environment variables `DFT_DATA` and `DFT_MODELS`. It will then offer to download our two models from our repository on Hugging Face (note that this requires the `huggingface_hub` Python package, which you can install with `pip install huggingface_hub`). It will furthermore offer to download dataset statistics, which are required if you want to use the SAD guess during density optimization with the CLI.
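If you prefer to fetch a model manually rather than through the setup script, a download along the following lines should work; the repository id below is a placeholder, not the actual model location, so check our Hugging Face page for the real one.

```bash
# Hypothetical manual download; replace <org>/<model_name> with the actual
# model repository listed on our Hugging Face page.
pip install huggingface_hub
huggingface-cli download <org>/<model_name> --local-dir "$DFT_MODELS/<model_name>"
```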
Environment variables
Before running the code you need to set the following two environment variables: `DFT_DATA`, the path where the data should be stored, and `DFT_MODELS`, the path where the training runs (including model checkpoints, logs and TensorBoard files) should be stored. You can set them in your `.bashrc` or `.zshrc`:
export DFT_DATA="/path/to/data"
export DFT_MODELS="/path/to/models"
Usage
This is a general usage manual. To reproduce results from our paper see REPLICATION_GUIDE.md.
We use hydra to manage configurations. The main configuration files are located in configs/.
Data generation
The datasets used in our work are available on Dryad.
- (Optional) Create your own dataset class, or use the MISC dataset and provide .xyz files to set which molecules should be generated (an example file is sketched after this list).
- (Optional) Create a config file in `configs/datagen/dataset/`.
- Run Kohn-Sham DFT on the dataset and create `.chk` files in `$DFT_DATA/dataset/kohn_sham`. Example:
  mldft_ks dataset=<your_dataset_config_name> n_molecules=1000 start_idx=0
- Based on the Kohn-Sham result, do density fitting, compute energies and gradients, and save them as labels for the machine learning model in `$DFT_DATA/dataset/labels`. Example:
  mldft_labelgen dataset=<your_dataset_config_name> n_molecules=-1
- Split the dataset into train, validation and test sets using `mldft/utils/create_dataset_splits.py`. Example:
  python mldft/utils/create_dataset_splits.py <dataset_name>
- Create a train data config in `configs/ml/data` to link to the dataset; important are `dataset_name` and the correct setting of the atom types in the dataset.
- Transform the data into a basis (to reduce data-loading computations during training); for `Graphformer` models use `local_frames_global_natrep`. Example:
  python mldft/datagen/transform_dataset.py data=<your_train_data_config_name> data/transforms=local_frames_global_natrep
- Compute dataset statistics; it is important to compute them for the transformation and the energy target that you want to use. Example:
  python mldft/ml/compute_dataset_statistics.py data=<your_dataset_config_name>
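For the first step, the provided .xyz files follow the standard XYZ format: an atom count, a comment line, and one element symbol plus Cartesian coordinates per atom (typically in Angstrom). The water geometry below is only an illustration of that format, not part of any of our datasets.

```bash
# Illustrative XYZ input file (atom count, comment line, then one atom per line).
cat > water.xyz <<'EOF'
3
water molecule
O  0.0000  0.0000  0.1173
H  0.0000  0.7572 -0.4692
H  0.0000 -0.7572 -0.4692
EOF
```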
Now you can start training.
Training
Training can be run with:
python mldft/ml/train.py data=<train_data_config> model=<model_config>
Two important settings are (an example invocation combining them is sketched after this list):
- `data/transforms`: determines whether the data has been pre-transformed. The default is `local_frames_global_natrep`, which means that both local frames and the global natural reparametrization have been applied.
- `data.target_key`: the target you are training on. The default is `kin_plus_xc`, which means you train on the total kinetic energy and exchange-correlation energy and their gradient. Alternatives are `kin_minus_apbe`, a delta-learning approach to the kinetic energy obtained from the APBE kinetic energy functional, and `tot`, which means you are training on the total electronic energy.
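As a sketch, a training run that sets both of these explicitly could be launched as below; the config names in angle brackets are placeholders for your own data and model configs, and the override values are simply the documented default transform and the delta-learning target.

```bash
# Hypothetical training invocation; <...> are placeholders for your own configs.
python mldft/ml/train.py \
    data=<train_data_config> \
    model=<model_config> \
    data/transforms=local_frames_global_natrep \
    data.target_key=kin_minus_apbe
```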
Density Optimization
On a Dataset
To run density optimization on a dataset in our format, you can run the following command:
python mldft/ofdft/run_density_optimization run_path=<path_to_ml_model> \
n_molecules=<number_of_molecules> device=<device> initialization=<initialization> num_devices=<num_devices>
- `run_path` (`<path_to_ml_model>`): the path to the model, relative to `DFT_MODELS`
- `n_molecules`: the number of molecules that should be computed
- `device`: the device on which the computation should run, e.g. `cuda`, `cpu`, ...
- `initialization`: the initialization to use, either `sad`, `minao` or `hückel`; the `sad` initialization requires appropriate dataset statistics.
By default, this will run on the validation set of the dataset the model was trained on, but you can override `split_file_path` to use another split file and `split` to toggle between the train, val and test splits of the dataset.
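For instance, evaluating the test split of a custom split file could look like the sketch below; the paths in angle brackets are placeholders.

```bash
# Hypothetical invocation; <...> are placeholders for your own paths.
python mldft/ofdft/run_density_optimization run_path=<path_to_ml_model> \
    split_file_path=<path_to_split_file> split=test \
    n_molecules=100 device=cuda initialization=minao
```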
Results are plotted in the files density_optimization.pdf and density_optimization_summary.pdf.
On arbitrary molecules
If you want to run the density optimization on molecules in .xyz files that are not part of any dataset, you can do so with:
mldft example.xyz --model /path/to/some/model
# get all options:
mldft --help
`--model` needs to be the path to the model directory containing an `hparams.yaml` file as well as a `checkpoints/` directory with a `last.ckpt` checkpoint.
This will only work if the model has been trained for all atom types present in the molecule. A logfile with the same base name as the .xyz file and the .log suffix will be created.
Additionally, if you have the required dataset statistics, you can use the `sad` initialization; the default is `minao`.
The result will be saved in a file with the .pt extension and the same base name as your .xyz file.
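To take a quick look at that result file you can load it with PyTorch, for example as below; the exact contents of the saved object depend on the mldft version, so this is only a sketch.

```bash
# Sketch: inspect the .pt result written next to example.xyz.
python -c "
import torch
result = torch.load('example.pt', map_location='cpu')
print(type(result))
print(result)
"
```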
Alternatively, if you downloaded our models using the setup script, you can also refer to them by name:
mldft xyzfile --model str25_qm9 # or str25_qmugs
Additional Info
Build documentation
make docs
# or to build from scratch:
make docs-clean
Template
For more details about the template visit: https://github.com/ashleve/lightning-hydra-template
Third-party licenses
This code adapts code from several third-party libraries, which are distributed under the MIT license; the license text can be found in the license file.
Citation
If you use this repository in your research, please cite the following paper:
@article{Remme_Stable_and_Accurate_2025,
author = {Remme, Roman and Kaczun, Tobias and Ebert, Tim and Gehrig, Christof A. and Geng, Dominik and Gerhartz, Gerrit and Ickler, Marc K. and Klockow, Manuel V. and Lippmann, Peter and Schmidt, Johannes S. and Wagner, Simon and Dreuw, Andreas and Hamprecht, Fred A.},
doi = {10.1021/jacs.5c06219},
journal = {Journal of the American Chemical Society},
number = {32},
pages = {28851--28859},
title = {{Stable and Accurate Orbital-Free Density Functional Theory Powered by Machine Learning}},
url = {https://doi.org/10.1021/jacs.5c06219},
volume = {147},
year = {2025}
}