
OF-DFT using machine learning


SCIAI-DFT



This repository contains the code used in our publication Stable and Accurate Orbital-Free Density Functional Theory Powered by Machine Learning. Using equivariant graph neural networks, we enable orbital-free density functional theory calculations by learning the kinetic energy functional from data.

Contributors

Installation

UV (recommended)

For installation with CUDA support, run

uv sync

inside this directory. For the CPU-only version, run

uv sync --group pyg-cpu --no-group pyg

PyPI

Please note that the PyPI package does not support data generation or training, only inference! If you want to train your own model, please clone the GitHub project.

# install prerequisites
pip install torch-scatter torch-sparse torch-cluster --find-links https://data.pyg.org/whl/torch-2.4.1+cu124.html
# Install tensorframes
pip install git+https://github.com/sciai-lab/tensor_frames.git@cd1addfd3c82a47095c9961ab999dcabfab4c21d#
# install mldft
pip install mldft

Alternatively, install everything with uv in one go:

uv pip install mldft torch-scatter torch-sparse torch-cluster --find-links https://data.pyg.org/whl/torch-2.4.1+cu124.html git+https://github.com/sciai-lab/tensor_frames.git@cd1addfd3c82a47095c9961ab999dcabfab4c21d#
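
After installation you can check that the command-line interface is available:

# quick sanity check of the installed CLI
mldft --help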

Install using Conda/Mamba/Micromamba

For conda or mamba, replace micromamba with conda or mamba in the commands below. If you want to create the environment with CPU support only, replace environment.yaml with environment_cpu.yaml.

micromamba env create -f environment.yaml  # create mamba environment
micromamba activate mldft                  # activate environment
pip install -e .                           # install as an editable package
pip install -e tensorframes                # install tensorframes

Install using Pip

pip install -r requirements.txt -e .       # install requirements and package
pip install -e tensorframes                # install tensorframes

Setup using script

To actually run an OF-DFT calculation using an ML model, you need to obtain a model and set up a few environment variables. To make this easier, we supply a small setup script that you can call with

mldft_setup

It will ask you where to place the datasets and ML models (defaults are $HOME/dft_data and $HOME/dft_models) and export these paths as the environment variables DFT_DATA and DFT_MODELS. It will then offer to download our two models from our repository on Hugging Face (note that this requires the huggingface_hub Python package, which you can install with pip install huggingface_hub). Finally, it will offer to download dataset statistics, which are required if you want to use the SAD guess during density optimization with the CLI.
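
For example, assuming you accept the default paths and want to download the models, a typical setup session boils down to:

pip install huggingface_hub   # only needed for the model download step
mldft_setup                   # interactive: choose paths, download models and dataset statistics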

Environment variables

Before running the code you need to set the following two environment variables: DFT_DATA, the path where the data should be stored, and DFT_MODELS, the path where the training runs (including model checkpoints, logs and TensorBoard files) should be stored. You can set them in your .bashrc or .zshrc:

export DFT_DATA="/path/to/data"
export DFT_MODELS="/path/to/models"

Usage

This is a general usage manual. To reproduce the results from our paper, see REPLICATION_GUIDE.md.

We use Hydra to manage configurations. The main configuration files are located in configs/.
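
Any value in these configs can be overridden from the command line using Hydra's syntax. As an illustration (the config names in angle brackets are placeholders, and the keys follow the examples used later in this README):

# select config groups and override a nested value from the command line
python mldft/ml/train.py data=<train_data_config> model=<model_config> data.target_key=kin_plus_xc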

Data generation

The datasets used in our publication are available on Dryad. To generate your own data, follow the steps below; a consolidated example is sketched after the list.

  1. (Optional) Create your own dataset class, or use the MISC dataset and provide .xyz files specifying which molecules should be generated.

  2. (Optional) Create a config file in configs/datagen/dataset/

  3. Run Kohn-Sham DFT on the dataset and create .chk files in $DFT_DATA/dataset/kohn_sham:

    Example: mldft_ks dataset=<your_dataset_config_name> n_molecules=1000 start_idx=0

  4. Based on the Kohn-Sham results, perform density fitting, compute energies and gradients, and save them as labels for the machine learning model in $DFT_DATA/dataset/labels:

    Example: mldft_labelgen dataset=<your_dataset_config_name> n_molecules=-1

  5. Split the dataset into train, validation and test splits using mldft/utils/create_dataset_splits.py.

    Example: python mldft/utils/create_dataset_splits.py <dataset_name>

  6. Create a train data config in configs/ml/data linking to the dataset; the important settings are dataset_name and the correct atom types of the dataset.

  7. Transform the data into a basis (to reduce dataloading computations during training); for Graphformer models use local_frames_global_natrep.

    Example: python mldft/datagen/transform_dataset.py data=<your_train_data_config_name> data/transforms=local_frames_global_natrep

  8. Compute dataset statistics; it is important to compute them for the transformation and the energy target that you want to use.

    Example: python mldft/ml/compute_dataset_statistics.py data=<your_dataset_config_name>
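
Putting steps 3 to 8 together, a run for a hypothetical dataset whose datagen config, dataset name and train data config are all called my_dataset might look like the following sketch (in practice these names may differ, so adjust them to your own configs):

# hypothetical end-to-end data generation for a dataset called "my_dataset"
mldft_ks dataset=my_dataset n_molecules=1000 start_idx=0      # step 3: Kohn-Sham reference calculations
mldft_labelgen dataset=my_dataset n_molecules=-1              # step 4: density fitting and label generation
python mldft/utils/create_dataset_splits.py my_dataset        # step 5: train/val/test splits
python mldft/datagen/transform_dataset.py data=my_dataset data/transforms=local_frames_global_natrep  # step 7: pre-transform
python mldft/ml/compute_dataset_statistics.py data=my_dataset # step 8: dataset statistics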

Now you can start training.

Training

Training can be run with:

python mldft/ml/train.py data=<train_data_config> model=<model_config>

Two important settings are:

  • data/transforms: This determines whether the data has been pre-transformed. The default is local_frames_global_natrep, which means that both local frames and the global natural reparametrization have been applied.
  • data.target_key: The target you are training on. The default is kin_plus_xc, which means you train on the total kinetic plus exchange-correlation energy and their gradient. Alternatives are kin_minus_apbe, a delta-learning approach to the kinetic energy relative to the APBE kinetic energy functional, and tot, which means you are training on the total electronic energy. An example combining these settings is sketched below.
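
For example, a delta-learning run on pre-transformed data could be launched like this (config names are placeholders):

# illustrative training run: delta learning relative to the APBE kinetic energy functional
python mldft/ml/train.py data=<train_data_config> model=<model_config> \
    data/transforms=local_frames_global_natrep data.target_key=kin_minus_apbe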

Density Optimization

On a Dataset

To run density optimization on a dataset in our format, you can run the following command:

python mldft/ofdft/run_density_optimization.py run_path=<path_to_ml_model> \
    n_molecules=<number_of_molecules> device=<device> initialization=<initialization> num_devices=<num_devices>
  • path_to_ml_model is the path to the model, relative to DFT_MODELS
  • n_molecules is the number of molecules to compute
  • device is the device on which the computation should run, e.g. cuda or cpu
  • initialization is the density initialization to use: sad, minao or hückel; the sad initialization requires appropriate dataset statistics.

By default this will run on the validation set of the dataset the model was trained on, but you can override split_file_path to use another split file and split to toggle between the train, val and test splits of the dataset. Results are plotted in the files density_optimization.pdf and density_optimization_summary.pdf.
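
For instance, to run density optimization for 100 molecules of the test split on a GPU with the SAD guess, an invocation could look like this (the run path is a placeholder relative to DFT_MODELS):

# illustrative density optimization run on the test split
python mldft/ofdft/run_density_optimization.py run_path=<path_to_ml_model> \
    n_molecules=100 device=cuda initialization=sad num_devices=1 split=test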

On arbitrary molecules

If you want to run the density optimization on molecules in .xyz files which are not part of any dataset you can do so with:

mldft example.xyz --model /path/to/some/model
# get all options:
mldft --help

--model needs to be the path to the model directory containing a hparams.yaml file as well as a checkpoints/ directory with a last.ckpt checkpoint. This will only work if the model has been trained for all atom types present in the molecule. A logfile with the same base name as the .xyz file and the .log suffix will be created. Additionally, if you have the required dataset statistics you can use the sad initialization (the default is minao). The result will be saved in a .pt file with the same base name as your .xyz file.
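
As a concrete sketch, you could run the CLI on a small water molecule (the geometry below is illustrative, and the model path is a placeholder for one of your downloaded or trained models):

# write a minimal .xyz file for water (illustrative geometry, Angstrom)
cat > water.xyz << 'EOF'
3
water molecule
O 0.0000  0.0000  0.1173
H 0.0000  0.7572 -0.4692
H 0.0000 -0.7572 -0.4692
EOF
# run density optimization; this creates water.log and water.pt next to the input
mldft water.xyz --model $DFT_MODELS/<model_directory>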

Alternatively, if you downloaded our models using the setup script, you can refer to them by name:

mldft xyzfile --model str25_qm9  # or str25_qmugs

Additional Info

Build documentation

make docs
# or to build from scratch:
make docs-clean

Template

For more details about the template visit: https://github.com/ashleve/lightning-hydra-template

Third-party licenses

This code adapts code from third-party libraries, which are distributed under the MIT license; see the license file.

Citation

If you use this repository in your research, please cite the following paper:

@article{Remme_Stable_and_Accurate_2025,
author = {Remme, Roman and Kaczun, Tobias and Ebert, Tim and Gehrig, Christof A. and Geng, Dominik and Gerhartz, Gerrit and Ickler, Marc K. and Klockow, Manuel V. and Lippmann, Peter and Schmidt, Johannes S. and Wagner, Simon and Dreuw, Andreas and Hamprecht, Fred A.},
doi = {10.1021/jacs.5c06219},
journal = {Journal of the American Chemical Society},
number = {32},
pages = {28851--28859},
title = {{Stable and Accurate Orbital-Free Density Functional Theory Powered by Machine Learning}},
url = {https://doi.org/10.1021/jacs.5c06219},
volume = {147},
year = {2025}
}
