augment-atoms

augment-atoms is a tool for augmenting datasets of atomic configurations via a model-driven, GPU-accelerated, rattle-relax-repeat procedure.

For each structure in the starting dataset, augment-atoms uses the provided potential energy surface (PES) model to generate a "family tree" of new structures. In the beginning, the tree consists of the single starting structure. To generate a new "child" structure, augment-atoms:

  1. selects a "parent" structure from the tree,
  2. rattles the atomic positions and unit cell,
  3. relaxes using the PES model to get a new structure,
  4. labels the child structure with the PES model, and
  5. inserts the child structure into the tree.
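
The five steps above can be sketched as a small loop. This is only an illustration, not the augment-atoms implementation: `grow_tree` and the callables `select`, `rattle`, `relax` and `label` are hypothetical placeholders, and numbering generations from 1 at the seed (with each child one generation older than its parent) is an assumption.

```python
import random


def grow_tree(seed, n_children, select, rattle, relax, label, rng):
    """Grow a family tree from one seed structure.

    select, rattle, relax and label are caller-supplied callables standing
    in for steps 1-4; they are placeholders, not augment-atoms APIs.
    """
    # The tree starts with the single seed structure (assumed generation 1).
    tree = [{"structure": seed, "generation": 1, "energy": label(seed)}]
    while len(tree) < n_children + 1:
        parent = select(tree, rng)                       # step 1
        child = relax(rattle(parent["structure"], rng))  # steps 2 and 3
        tree.append({                                    # steps 4 and 5
            "structure": child,
            "generation": parent["generation"] + 1,
            "energy": label(child),
        })
    return tree


# Toy usage: a "structure" is a single float and the "PES" is x**2.
toy_tree = grow_tree(
    seed=1.0,
    n_children=5,
    select=lambda tree, rng: rng.choice(tree),
    rattle=lambda s, rng: s + rng.gauss(0.0, 0.1),
    relax=lambda s: 0.9 * s,
    label=lambda s: s * s,
    rng=random.Random(0),
)
```

Passing the steps in as callables keeps the loop independent of how structures are represented.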

For precise details of each of these steps, see the Details section below.

Installation

pip install augment-atoms

This installs the augment-atoms command line tool. It requires Python 3.9+; see pyproject.toml for the dependencies. Using uv is recommended: it installs augment-atoms with the correct dependencies in under 20 seconds, starting from scratch.

There are no specific hardware requirements for augment-atoms. If a GPU is available, and the PES model supports it, the GPU will be used to accelerate structure generation. augment-atoms has been tested on both Linux and macOS.

Usage

augment-atoms config.yaml

where config.yaml is a YAML file containing the following:

data:
  # an ase-readable file containing the starting structures
  input: input.xyz 

  # an ase-writeable path to append the new structures to
  output: output.xyz

config:
  # number of augmentations per starting structure
  n_per_structure: 10
  
  # the temperature
  T: 300  # units are Kelvin

  # the explore-vs-exploit trade-off (see below)
  beta: 0.5

  # the range of values from which to sample a 
  # standard deviation to rattle with at each step
  sigma_range: [0.01, 0.1]  # units are Å

  # the random seed to use (for reproducibility)
  seed: 42

  # the standard deviation of the cell perturbation
  # if null, no cell perturbation is applied
  cell_sigma: null  # units are Å
  
  # the units of the energies generated by the PES model
  units: eV

  # the maximum force magnitude to relax to
  max_force: 30  # units are (energy / Å)

  # the minimum separation between atoms to consider
  min_separation: 0.5  # units are Å

  # the maximum number of relaxations to perform per iteration
  max_relax_steps: 20

  # the threshold for considering a structure too similar to the existing pool
  similarity_threshold: 0.1  # units are Å

model:
  # the calculator to use to generate the PES model
  calculator: +lennard_jones()

In-built options for the calculator are:

  • a Lennard-Jones calculator:

model:
  calculator: +lennard_jones()

  • any model from the graph-pes package. If a GPU is available, it will be used to accelerate the PES model.

model:
  calculator:
    +graph_pes_calculator:
      path: path/to/model.pt

Alternatively, you are free to point to any instance of an ase.Calculator object. If you have my_function in my_file.py that returns an ase.Calculator object, you can use it as follows:

model:
  calculator: +my_file.my_function()

Details

1. Selecting a parent structure

To choose a new parent structure, we randomly sample from all structures in the tree, such that structure $i$ has a probability of being picked given by

$$\mathbb{P}_i = \beta \cdot \frac{e^{-E_i / kT}}{\sum_j e^{-E_j / kT}} + (1-\beta) \cdot \frac{G_i}{\sum_j G_j}$$

where $E_i$ is the energy of structure $i$, $G_i \in \mathbb{Z}^+$ is the "generation" of the structure, $k$ is the Boltzmann constant, $T$ is the temperature and $\beta \in [0, 1]$. Small values of $\beta$ favour the sampling of "younger" structures in the family tree, and hence a greater degree of exploration. Large values of $\beta$ favour the sampling of lower energy structures, and hence a denser sampling of the PES around energy minima.
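
As a rough illustration of the formula above, the selection probabilities can be computed as follows. This is a sketch under assumptions: the function names are hypothetical, energies are taken to be in eV, and the minimum-energy shift is a numerical-stability detail added here (it cancels in the normalised Boltzmann term).

```python
import math
import random

K_B_EV_PER_K = 8.617333262e-5  # Boltzmann constant in eV/K


def selection_probabilities(energies, generations, T, beta):
    """Return P_i for every structure in the tree.

    energies: E_i in eV (assumed); generations: positive integers G_i.
    """
    kT = K_B_EV_PER_K * T
    # Shift by the minimum energy so the exponentials cannot overflow;
    # the shift cancels between numerator and denominator.
    e_min = min(energies)
    weights = [math.exp(-(e - e_min) / kT) for e in energies]
    z = sum(weights)
    g_total = sum(generations)
    return [
        beta * w / z + (1 - beta) * g / g_total
        for w, g in zip(weights, generations)
    ]


def select_parent(energies, generations, T, beta, rng):
    """Draw the index of the next parent according to P_i."""
    probs = selection_probabilities(energies, generations, T, beta)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


# Toy usage: three structures, the third being the youngest.
parent_index = select_parent(
    [-10.0, -10.0, -9.9], [1, 2, 3], T=300, beta=0.5, rng=random.Random(42)
)
```

Setting `beta=0` reduces the probabilities to $G_i / \sum_j G_j$, while `beta=1` leaves only the Boltzmann term.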

2. Rattling the atomic positions and unit cell

To create a "child" from this parent structure, we perform the following transformation:

$$\begin{aligned} R^\prime &\leftarrow [(A + I) \times R] + B \\ C^\prime &\leftarrow (A + I) \times C_0 \end{aligned}$$

where

  • $R$ are the atomic positions
  • $C_0$ is the unit cell of the original seed structure
  • $A \in \mathbb{R}^{3\times 3}$ has entries sampled from $\mathcal{N}(0, \sigma_{A})$ where $\sigma_{A} \in [0, \rm{cell \_ sigma}]$
  • $B \in \mathbb{R}^{N \times 3}$ has entries sampled from $\mathcal{N}(0, \sigma_{B})$ where $\sigma_{B} \in \rm{sigma \_ range}$

In the case of isolated structures, we only rattle the positions (i.e. $A = 0^{3 \times 3}$).
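
A minimal pure-Python sketch of this transformation is given below. It rests on assumptions: `rattle` and `matvec` are hypothetical names, each position and each lattice vector is treated as a column vector multiplied by $(A + I)$, the second argument of $\mathcal{N}$ is read as a standard deviation, and `sigma_a` and `sigma_b` are passed in directly rather than drawn from the config.

```python
import random


def matvec(m, v):
    """Multiply a 3x3 matrix (list of rows) by a 3-vector."""
    return [sum(m[i][j] * v[j] for j in range(3)) for i in range(3)]


def rattle(positions, cell, sigma_a, sigma_b, rng):
    """Apply R' = (A + I) R + B and C' = (A + I) C_0.

    positions: list of N [x, y, z] vectors; cell: list of 3 lattice vectors.
    """
    # A + I: the identity plus Gaussian noise of standard deviation sigma_a.
    strain = [
        [(1.0 if i == j else 0.0) + rng.gauss(0.0, sigma_a) for j in range(3)]
        for i in range(3)
    ]
    new_positions = []
    for r in positions:
        strained = matvec(strain, r)
        # B: independent Gaussian displacement for every coordinate.
        new_positions.append([x + rng.gauss(0.0, sigma_b) for x in strained])
    new_cell = [matvec(strain, c) for c in cell]
    return new_positions, new_cell


# Toy usage: two atoms in a 5 Å cubic cell.
rattled_positions, rattled_cell = rattle(
    positions=[[0.0, 0.0, 0.0], [1.0, 1.0, 1.0]],
    cell=[[5.0, 0.0, 0.0], [0.0, 5.0, 0.0], [0.0, 0.0, 5.0]],
    sigma_a=0.01,
    sigma_b=0.05,
    rng=random.Random(42),
)
```

Calling `rattle` with `sigma_a=0.0` leaves the cell untouched, which corresponds to the isolated-structure case $A = 0^{3 \times 3}$.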

3. Relaxing the rattled child structure

To relax the rattled child structure, we use energies and forces from the PES model in a scheme inspired by the Robbins-Monro algorithm.

Step $x$ of this relaxation involves updating the atomic positions according to:

$$R^\prime \leftarrow R + \frac{\sigma_B}{x} \cdot \frac{F}{||F||}$$

where $F/||F||$ is the unit vector along the force on each atom. We perform up to $M$ relaxation steps (config.max_relax_steps), but stop early with probability $\min(0.25, e^{-\Delta E / kT})$, provided that the maximum force magnitude is less than config.max_force, where $\Delta E$ is the energy difference between the relaxed child and its starting parent structure. We reject all final structures that have any pair of atoms closer than config.min_separation Å.
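
The update rule, the early-stopping probability and the separation check can be sketched as below. The helper names are hypothetical and the details are assumptions: steps are counted from 1, atoms with zero force are left in place, and the separation check ignores periodic images for simplicity.

```python
import math


def relax_step(positions, forces, sigma_b, step):
    """One update R' = R + (sigma_b / step) * F / ||F||, applied per atom."""
    scale = sigma_b / step  # step counts from 1, so moves shrink as 1/x
    new_positions = []
    for r, f in zip(positions, forces):
        norm = math.sqrt(sum(c * c for c in f))
        if norm == 0.0:
            new_positions.append(list(r))  # no force, so no direction to move
            continue
        new_positions.append([ri + scale * fi / norm for ri, fi in zip(r, f)])
    return new_positions


def early_stop_probability(delta_e, kT):
    """min(0.25, exp(-delta_e / kT)), with delta_e and kT in the same units."""
    if delta_e <= 0:
        return 0.25  # exp(-delta_e / kT) >= 1 here, so the cap applies
    return min(0.25, math.exp(-delta_e / kT))


def has_close_pair(positions, min_separation):
    """True if any two atoms are closer than min_separation (no periodicity)."""
    for i in range(len(positions)):
        for j in range(i + 1, len(positions)):
            if math.dist(positions[i], positions[j]) < min_separation:
                return True
    return False


# Toy usage: one atom pushed along a (3, 0, 4) force at step 2.
moved = relax_step([[0.0, 0.0, 0.0]], [[3.0, 0.0, 4.0]], sigma_b=0.1, step=2)
```

Because the forces are normalised, every atom moves by the same distance `sigma_b / step` regardless of the force magnitude; only the direction comes from the PES model.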

Demo

This demo uses structures and a model taken from this repo's sister repository, found here.

We include a stand-alone demo in the demo directory. This takes 3 water structures as input and uses a PaiNN model to generate and label 27 new structures, for a total of 30 structures.

The demo directory has the following files:

  • input.xyz contains 3 starting water structures
  • config.yaml contains the configuration for the demo
  • model.pt is a PaiNN model trained on water structures from ...
  • output.xyz is the augmented dataset output.

To run this demo yourself:

# clone the repository
git clone https://github.com/jla-gardner/augment-atoms.git
cd augment-atoms/demo
# remove the output file if it exists
rm -rf output.xyz
# run the demo
augment-atoms config.yaml

This entire script took under 10 seconds on my M1 MacBook Pro.

Citation

If you use augment-atoms in your research, please cite the following pre-print:

@misc{Gardner-25-06,
  title = {Distillation of Atomistic Foundation Models across Architectures and Chemical Domains},
  author = {Gardner, John L. A. and du Toit, Daniel F. Thomas and Mahmoud, Chiheb Ben and Beaulieu, Zo{\'e} Faure and Juraskova, Veronika and Pa{\c s}ca, Laura-Bianca and Rosset, Louise A. M. and Duarte, Fernanda and Martelli, Fausto and Pickard, Chris J. and Deringer, Volker L.},
  year = {2025},
  number = {arXiv:2506.10956},
  doi = {10.48550/arXiv.2506.10956},
}
