Atomistic simulation tools based on Gaussian Process Regression


ænet-gpr

Efficient Data Augmentation for ANN Potential Training Using GPR Surrogate Models

aenet-gpr is a Python package that enables scalable and cost-efficient training of artificial neural network (ANN) potentials by leveraging Gaussian Process Regression (GPR) as a surrogate model.
It automates data augmentation to:

  • Reduce the number of expensive DFT calculations
  • Lower ANN training overhead, which is particularly critical for complex and heterogeneous interface systems
  • Maintain accuracy comparable to that of demanding direct force training

📄 Reference:
In Won Yeu, Alexander Urban, Nongnuch Artrith et al., “Scalable Training of Neural Network Potentials for Complex Interfaces Through Data Augmentation”, npj Computational Materials 11, 156 (2025)

🔁 Workflow Overview

  1. Data Grouping

    • Split the initial DFT database into homogeneous subsets (same composition and number of atoms)
  2. Train

    • Construct local GPR models using the structure, energy, and atomic force data of each subset
  3. Test

    • Predict and evaluate target properties with the trained GPR models
  4. Augment

    • Perturb reference structures and generate new data
    • Tag with GPR-predicted energies to expand the training dataset

✅ Outputs are saved in XCrysDen Structure Format (XSF), fully compatible with the ænet package for indirect force training (GPR-ANN).
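
To make the Augment step concrete, the sketch below perturbs reference structures with plain ASE calls, mirroring the Num_copy and Disp_length keywords described in train.in below. This is a conceptual illustration only, not aenet-gpr's internal code; the tagging with GPR-predicted energies is left as a comment.

# Conceptual sketch of the Augment step (not aenet-gpr's internal code)
import glob
from ase.io import read

augmented = []
for path in glob.glob("./train_set/file_*.xsf"):  # reference training data
    ref = read(path)
    for i in range(20):                           # cf. Num_copy in train.in
        new = ref.copy()
        new.rattle(stdev=0.05, seed=i)            # cf. Disp_length in train.in
        augmented.append(new)                     # each copy is then tagged with a GPR-predicted energy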

🔑 Key Features

  • GPR-based prediction of energies and atomic forces with uncertainty estimates
  • Supports various descriptors including Cartesian and SOAP
  • Applicable to periodic and non-periodic systems
  • Batch-based kernel computation for speed and memory efficiency
  • Accepts multiple input formats (e.g., XSF, VASP OUTCAR)
  • Fully controlled through a single input file (train.in)
  • Compatible with various GPR applications such as GPR-NEB, GPR-ANN, and ASE-Calculator

📦 Installation

Requirements:

  • Python with PyTorch (to be installed separately, see below)
  • Other dependencies (NumPy, ASE) are installed automatically when installing aenet-gpr

1. Install PyTorch

Refer to the official guide and install compatible versions depending on the availability of a GPU and CUDA:

  • With CUDA (optional for GPU support):

    $ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu118

  • CPU-only:

    $ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
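
To verify the PyTorch installation and check whether a GPU is visible:

    $ python -c "import torch; print(torch.__version__, torch.cuda.is_available())"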

2. Install ænet-gpr

  • Installation using pip

    $ pip install aenet-gpr
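
  • Verify that the package is importable

    $ python -c "import aenet_gpr"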

📘 Tutorial

Interactive notebooks (*.ipynb) are available in the ./tutorial/ folder and can also be run directly on Google Colab:

  • GPR tutorials for various systems
  • GPR applications for accelerating atomistic simulations

The ./example/ directory includes example input and output data files.

📂 Input Files

1. Structure–Energy–Force Data

By default, input data is provided in .xsf format.

Example: aenet XSF format (non-periodic)

The first comment line should specify the total energy of the structure. Each line following the keyword ATOMS contains the atomic symbol, the three Cartesian coordinates, and the three components of the atomic force. The length, energy, and force units are Å, eV, and eV/Å, respectively.

# total energy =  -0.0970905812353288 eV

ATOMS
H    -0.91666666666667    0.00000000000000    0.00000000000000    0.32660398877491    0.00000000000000    0.00000000000000
H    0.91666666666667    0.00000000000000    0.00000000000000    -0.32660398877491    0.00000000000000    0.00000000000000

Example: aenet XSF format (periodic)

# total energy = -16688.9969866290994105 eV

CRYSTAL
PRIMVEC
 10.31700000000000 0.00000000000000 0.00000000000000
 0.00000000000000 10.31700000000000 0.00000000000000
 0.00000000000000 0.00000000000000 32.00000000000000
PRIMCOORD
 46 1
Li     -0.02691046000000     0.02680527000000     10.32468480000000     -0.01367780493112     -0.01466501222916     0.08701630310868
Li     -0.04431013000000     3.46713645000000     10.25290534000000     0.06865473174602     -0.00786890285541     0.15426435842600
Li     0.02355300000000     6.82569825000000     10.31803445000000     0.00877419275000     0.03943267659765     0.14805797440506
...

Other formats, such as VASP OUTCAR (selected with the line File_format vasp-out in train.in below), are also supported as long as they can be read by ASE.
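
For reference, the snippet below shows how such a file exposes the structures, energies, and forces through plain ASE; the OUTCAR path is a placeholder, and the snippet is illustrative rather than part of aenet-gpr.

# Sketch: reading a VASP OUTCAR through ASE (placeholder path)
from ase.io import read

images = read("OUTCAR", index=":", format="vasp-out")  # all ionic steps
for atoms in images:
    energy = atoms.get_potential_energy()  # eV
    forces = atoms.get_forces()            # eV/Å
    print(len(atoms), energy, forces.shape)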

2. Configuration File

Example: train.in (comments explain the meaning of each keyword)

# File path
Train_file ./example/3_Li-EC/train_set/file_*.xsf
Test_file ./example/3_Li-EC/test_set/file_*.xsf

# File format (default: xsf)
File_format xsf  # other DFT output formats readable via ASE, such as "vasp-out", "aims-output", "espresso-out", are also supported

# Uncertainty estimation (default: True)
Get_variance True  # False -> only energy and forces are evaluated without uncertainty estimate

# Descriptor (default: cartesian coordinates)
Descriptor cart  # cart or soap

# Kernel parameter (default: Squared exponential)
scale 0.4  # default: 0.4
weight 1.0  # default: 1.0

# Data process (default: batch, 25)
data_process batch  # batch (memory cost up, time cost down) or iterative (no-batch: memory down, time up)
batch_size 25

# Flags for xsf file writing (default: False)
Train_write False  # True -> xsf files for reference training set are written under "./train_xsf/" directory
Test_write False  # True -> xsf files for reference test set are written under "./test_xsf/" directory
Additional_write False  # True -> additional xsf files are written under "./additional_xsf/" directory; False -> Augmentation step is not executed

# Data augmentation parameter (default: 0.055, 25)
Disp_length 0.05
Num_copy 20  # [num_copy] multiples of reference training data are augmented

🚀 Usage Example

With the train.in file and datasets prepared, simply run:

$ python -m aenet_gpr ./train.in > train.out

The Train–Test–Augment steps will be executed sequentially. Augmented data will be saved in the ./additional_xsf/ directory.

🖥️ Running on an HPC system (SLURM)

To run aenet_gpr on an HPC cluster using SLURM, use a batch script like the following:

#!/bin/bash
#SBATCH --job-name=aenet-job
#SBATCH --nodes=1
#SBATCH --tasks-per-node=8
#SBATCH --cpus-per-task=4
#SBATCH --time=1:00:00

module load anaconda3
source activate aenet-env

ulimit -s unlimited
python -m aenet_gpr ./train.in > train.out

⚙️ Tuning Tips

1. Accuracy – Descriptor and Kernel Scale Parameter

  • Descriptor: Cartesian, SOAP, and others supported by DScribe
  • Default kernel: Squared Exponential (sqexp)
  • Kernel parameters: scale and weight
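
To illustrate how these two parameters act, the sketch below uses one common parameterization of the squared-exponential kernel; the exact convention used inside aenet-gpr is not spelled out here, so treat the form as an assumption.

# One common form of the squared-exponential kernel (assumed for illustration):
# k(x, x') = weight^2 * exp(-|x - x'|^2 / (2 * scale^2))
import numpy as np

def sqexp_kernel(x1, x2, scale=0.4, weight=1.0):
    d2 = np.sum((x1 - x2) ** 2)
    return weight**2 * np.exp(-d2 / (2.0 * scale**2))

# A larger scale gives a smoother model: distant structures remain correlated.
x1, x2 = np.zeros(3), np.ones(3)
for s in (0.4, 3.0):
    print(s, sqexp_kernel(x1, x2, scale=s))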

The following figure shows the energy prediction errors for the ./example/3_Li-EC/ example with different kernel parameters and descriptors.

When the Cartesian descriptor is used (gray circles), the error decreases as the scale parameter increases, converging at scale = 3.0. With the periodic SOAP descriptor (see the DScribe documentation for details), the error is reduced by roughly one order of magnitude compared to the Cartesian descriptor.

As demonstrated by the ./example/2_EC-EC/ example (results available in the example directory), non-periodic systems can be well represented with non-periodic Cartesian descriptors, while periodic systems are expected to yield better accuracy with periodic SOAP descriptors.

For the SOAP descriptor example here, eight uniformly distributed points within the rectangular cuboid of the Li slab were used as the centers argument for SOAP.

The corresponding train.in input arguments are

Descriptor soap
soap_r_cut 5.0
soap_n_max 6
soap_l_max 4
soap_centers [[2.20113706670393, 2.328998192856251, 6.952547732109352], [2.20113706670393, 2.328998192856251, 11.895790642109352], [2.20113706670393, 6.760484232856251, 6.952547732109352], [2.20113706670393, 6.760484232856251, 11.895790642109352], [6.63924050670393, 2.328998192856251, 6.952547732109352], [6.63924050670393, 2.328998192856251, 11.895790642109352], [6.63924050670393, 6.760484232856251, 6.952547732109352], [6.63924050670393, 6.760484232856251, 11.895790642109352]]
soap_n_jobs 4  
  
scale 2.0  
weight 1.0
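
For comparison, the sketch below reproduces these eight centers programmatically and sets up the equivalent descriptor with DScribe directly. The species list is an assumption for the Li/EC system, and the r_cut/n_max/l_max keywords follow recent DScribe versions.

# Sketch: equivalent DScribe SOAP setup (species list assumed for Li/EC)
from itertools import product
from dscribe.descriptors import SOAP

# Eight corner points of the Li-slab cuboid, matching soap_centers above
xs = (2.20113706670393, 6.63924050670393)
ys = (2.328998192856251, 6.760484232856251)
zs = (6.952547732109352, 11.895790642109352)
centers = [list(p) for p in product(xs, ys, zs)]

soap = SOAP(species=["Li", "C", "O", "H"], periodic=True,
            r_cut=5.0, n_max=6, l_max=4)
# features = soap.create(atoms, centers=centers, n_jobs=4)  # atoms: an ASE Atoms object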

2. Efficiency – Data Processing Mode

  • data_process iterative: kernels are computed data-by-data, which requires n_data × n_data sequential kernel evaluations; this minimizes memory overhead but significantly increases computational time.

  • data_process batch: aenet-gpr groups the kernel computation into batches of a specified size (batch_size 25), which significantly reduces training and evaluation time while keeping memory usage efficient.
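
The tradeoff can be pictured with a small PyTorch sketch (an illustration of the idea, not aenet-gpr internals): the kernel matrix is filled one row-block at a time, which bounds memory while avoiding one-by-one kernel evaluations.

# Sketch: block-wise kernel evaluation (illustration, not aenet-gpr internals)
import torch

def sqexp(a, b, scale=0.4, weight=1.0):
    d2 = torch.cdist(a, b) ** 2
    return weight**2 * torch.exp(-d2 / (2 * scale**2))

x = torch.randn(100, 6)            # 100 structures, 6 descriptor dimensions
K = torch.empty(100, 100)
batch = 25                         # cf. batch_size in train.in
for i in range(0, 100, batch):     # one row-block at a time
    K[i:i + batch] = sqexp(x[i:i + batch], x)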

Below, we provide a benchmark comparing the required time and memory for different batch sizes on the ./example/3_Li-EC/ example.
