Skip to main content

Python package for analyzing vacancy hopping mechanism

Project description

VacHopPy


VacHopPy is a Python package for analyzing vacancy hopping mechanisms based on Ab initio molecular dynamics (AIMD) simulations. A detailed explanation on VacHopPy framwork is available here.

Features

  • Tracking of vacancy trajectories in AIMD simulations
  • Extraction of effective hopping parameter set
  • Assessment of lattice stability or phase transitions

Effective hopping parameter set, a key improvement of VacHopPy, is a single, representative set of hopping parameters, which is determined by integrating all possible hopping paths in a given system considering energetic and geometric properties. Hence, the effective hopping parameters are suitable for multiscaling modeling, bridging the ab initio calculations and device-scale simulations (e.g., continuum models).

The list of effective hopping parameters, which can be obtained using VacHopPy is summarized below:

Symbol Description
D Diffusion coefficient (m²/s)
f Correlation factor
τ Residence time (ps)
a Hopping distance (Å)
Eₐ Hopping barrier (eV)
z Number of equivalent paths
ν Attempt frequency (THz)

Within the VacHopPy framework, the temperature dependencies of overall diffusion behavior, represented by D and τ, are simplified to an Arrehnius equation composed of the effective hopping parameters. Please see the original paper for a detailed description.

Contents

  • Installation
  • Requirement
  • List of commands
  • How to implement
    1. Preparation
    2. Vacancy trajectory visualization
    3. Extraction of effective hopping parameters
    4. Assessment of lattice stability
  • Reference
  • License

Installation

This package can be easily installed via pip.

pip install vachoppy

If you need parallelized calculations, download the mpi4py package:

pip install mpi4py

Requirements

VacHopPy requires Python 3.10 or higher and the following Python packages:

  • numpy
  • lxml
  • tqdm
  • colorama
  • matplotlib >= 3.10.0
  • scipy
  • tabulate
  • pymatgen >= 2024.6.10

Current version of VacHopPy was developed and tested using VASP 5.4.4 and LAMMPS (12 Jun 2025).

Available commands

VacHopPy provides a command-line interface (CLI). Belows are available CLI commands:

Option 1 Option 2 Use
-m
(main)
t Make an animation for vacancy trajectories
p Calculate effective hopping parameters (excluding z and ν)
pp Calculate z and ν (post-processing for -m p mode)
f Perform fingerprint analyses
-u
(utility)
extract_data Extract AIMD data (cond.json, pos.npy, force.npy)
combine_vasprun Combine two successive vasprun.xml
crop_vasprun Crop vasprun.xml
fingerprint Extract fingerprint
cosine_distance Calculate cosine distance

For detailed descriptions, please use -h flag:

vachoppy -h # list of available commands
vachoppy -m p -h # explanation for '-m p' mode

For time-consuming commands, vachoppy -m p and vachoppy -m f, parallelization is supported by mpirun. For parallelization, please specify --parallel flag:

vachoppy -m p 0.1 # serial
mpirun -np {num_nodes} vachoppy -m p 0.1 --parallel # parallel

Below is a summary of the main commands (only main modules are shown for clarity):

How to implement

Example files can be downloaded from:

1. Preparation

Download Example1 directory linked above.

Input data

To run VacHopPy, the user needs three types of input data: AIMD data, POSCAR_LATTICE, and neb.csv (optional). The current version of VacHopPy supports AIMD simulations performed under the NVT ensemble only.

(1) AIMD data

AIMD data can be extracted from the VASP or LAMMPS output files using the vachoppy -u extract_data command.

  • Extracting AIMD data from VASP

Navigate to the Example1/vasp directory and run:

vachoppy -u extract_data O vasprun.xml 

Here, the arguments are:

  • atom symbol = O
  • MD result = vasprun.xml

The vasprun.xml file contains the result of an AIMD simulation of a rutile TiO₂ system with two oxygen vacancies at 2100 K. The atom symbol specifies the type of atom to track, as well as the type of vacancy. When the atom symbol is set to O, this command extracts the positions and forces of oxygen atoms from the vasprun.xml file.

This command generates the following three output files:

  • cond.json
    cond.json contains simulation conditions, such as temperature, time, atom numbers, and lattice parameters.

  • pos.npy (numpy binary)
    pos.npy contains evolution in atomic positions during the simulation. The raw trajectories are refined with consideration of periodic boundary condition (PBC).

  • force.npy (numpy binary)
    force.npy contains force vectors acting on atoms.

To reduce the size of the input data, only atoms of the specified type (as defined by the atom symbol) are included in the output files.

  • Extract AIMD data from LAMMPS

Starting from version 2.0.0, VacHopPy supports integration with LAMMPS. To extract AIMD data from LAMMPS output files, use the -l flag:

Navigate to the Example1/lammps directory and run:

vachoppy -u extract_data O lammps.in -l

Here, the arguments are:

  • atom symbol = O
  • MD result = lammps.in

This command reads both the LAMMPS data file (e.g., coo.lammps) and the LAMMPS dump file (e.g., lammps.dump) specified in the lammps.in, and produces the same three output files: cond.json, pos.npy, and force.npy.

Using LAMMPS with machine learning potentials (MLPs) can greatly reduce the computational cost required to sample sufficient hopping events in MD simulations. Although LAMMPS is not an AIMD engine, it can still provide reliable MD trajectories when used with a well-trained MLP.

(2) POSCAR_LATTICE

This file contains the perfect crystal structure without vacnacies. Its lattice parameters must match those of input structure (e.g., POSCAR) of the MD simulations. This file is used to define the lattice points for vacancy identification.

(3) neb.csv (optional)

This file contains hopping barriers (Eₐ) for all vacancy hopping paths in the system. Below is an example of neb.csv (example system: rutile TiO2):

# neb.csv
A1,0.8698
A2,1.058
A3,1.766

Here, the first column corresponds to the path names, and the second column contains the Eₐ values. The user can obtain a list of possible vacancy hopping paths by running the vachoppy -m t or vachoppy -m p command. For the vachoppy -m t command (when used with the -v flag), hopping path information is saved to the trajectory.txt file. For the vachoppy -m p command, the information is saved to the parameter.txt file.

Note1: the neb.csv file is only required to extract the effective values for coordination number (z) and attempt frequency (ν) (by running the vachoppy -m pp command).

Note2: It is highly recommended to perform NEB calculations using a larger supercell than that used in MD simulations. In MD simulations, thermal fluctuations attenuate interactions with periodic images and provide a broader sampling of atomic configurations, which helps approximate the effects of a larger supercell.

(4) File organization

Since AIMD simulations typically cover timescales shorter than a nanosecond, a single AIMD simulation may contain a few hopping events. However, since VacHopPy computes the effective hopping parameters in static manner, sufficient sampling of hopping events is necessary to ensure reliablilty. To address this, VacHopPy processes multiple AIMD datasets simultaneously. Each AIMD dataset is distinguished by a label appended after an underscore in the file names (e.g., cond_{label}.json, pos_{label}.npy, force_{label}.npy). Below is an example of the recommended file structure:

{where VacHopPy is executed}
  traj # name (traj) is specified by -p1 flag
   traj.1900K # prefix (traj) is specifed by -p2 flag
    cond_01.json, pos_01.npy, force_03.npy  
    cond_02.json, pos_02.npy, force_03.npy  
    cond_02.json, pos_02.npy, force_04.npy  
   traj.2000K
    cond_01.json, pos_01.npy, force_03.npy  
    cond_02.json, pos_02.npy, force_03.npy  
    cond_02.json, pos_02.npy, force_04.npy  
   traj.2100K
     cond_01.json, pos_01.npy, force_03.npy  
     cond_02.json, pos_02.npy, force_03.npy  
     cond_02.json, pos_02.npy, force_04.npy  
   POSCAR_LATTICE
  neb.csv

The name of the outer directory is specified by the -p1 flag (default: traj), and the prefix of the inner directories is specified by the -p2 flag (default: traj). Each inner directory must contain AIMD datasets generated at the same temperature.

Hyperparameter: tinterval

To run VacHopPy, the user needs to determine one hyperparameter, tinterval, in advance. This parameter defines the time interval for averaging atomic positions and forces. Thermal fluctuations in AIMD simulations can make it difficult to precisely determine atomic occupancies. However, since these fluctuations are random, they can be effectively averaged out over time. VacHopPy processes AIMD data by dividing it into segments of length of tinterval. Each segment corresponds to one analysis step, representing the averaged structure over that time interval. The total number of steps is given by tsimulation/tinterval, where tsimulation is the total AIMD simulation time.

Choosing an appropriate tinterval is crucial for reliable analysis. The tinterval should be large enough to mitigate thermal fluctuations but short enough to prevent multiple hopping events from being included in a single step. A typical value is around 0.05-0.1 ps, through it may vary depending on the system.

One recommended approach for determining the optimal tinterval is through convergence tests using the correlation factor (f). Below is an example of a convergence test (example system: rutile TiO2):

The left and right figures show the convergences of f with respect to the number of AIMD datasets (Ncell) and tinterval, respectively, at each temperature. The results confirm that convergence is achieved at Ncell=20 and tinterval=0.07 ps.

2. Vacancy trajectory visualization

Download Example2 directory linked above.

Navigate to the Example2 directory and run:

 vachoppy -m t 0.07 2100 07 -v 

Here, the arguments are:

  • tinterval = 0.07 ps
  • temperature = 2100 K
  • label = 07

The -v flag (verbosity flag) enables the generation of the trajectory.txt file, which contains the identified vacancy hopping paths and hopping history. You can adjust the resolution of the animation using the --dpi flag (default: 300). The number and type of vacancies are automatically determined by comparing the POSCAR_LATTICE file with the AIMD datasets.

Output:

The trajectory animation is saved as traj.gif, with individual snapshots stored in the snapshot directory. Below is an example of traj.gif:

In this animation, the solid box represents the lattice (here, rutile TiO2 2×2×3 supercell), and the color-coded circles indicate the lattice points corresponding to the selected atom type (here, oxygen). The yellow-colored circles mark the vacancy positions, while other colors denote occupied lattice points. Atomic movements are depicted with arrows matching the color of the moving atoms.

Occasionally, spurious vacancies may appear; hence, the detected number of vacancies exceeds the initial count in the system. This scenario arise when two or more atoms are assigned to the same lattice site and are commonly associated with non-vacancy-hopping mechanism (e.g., kick-out mechanism). Because such spurious vacancies typically vanish within a few hundred femtoseconds, they are referred to as transient vacancies.

In this animation, the transient vacancies are represented as orange-colored circles. Below three successive snapshots shows the kick-out mechanism:

3. Extraction of effective hopping parameters

Navigate to the Example2 directory and run:

For serial computation:

vachoppy -m p 0.07

For parallel computation:

mpirun -np {num_nodes} vachoppy -m p 0.07 --parallel

Here, the arguments are:

  • tinterval = 0.07 ps

For serial computation, the process is displayed via a progress bar. For parallel computation, process is recorded in the VACHOPPY_PROGRESS file in real time. The number and type of vacancies are automatically determined by comparing the POSCAR_LATTICE file with the AIMD datasets.

Output:

All results are stored in the parameter.txt file, which includes:

  1. A list of vacancy hopping paths in the system

  2. Effective hopping parameters (excluding z and ν)

  3. Vacancy hopping history for each AIMD dataset

To print the vacancy hopping paths, use:

awk '/Vacancy hopping paths :/ {f=1} f; /^$/ {f=0}' parameter.txt

To print the effective hopping parameters, use:

awk '/Effective/ {f=1} f; /^$/ {f=0}' parameter.txt

Below is the expected output:

The names of hopping paths are automatically assigned in ascending order of hopping distance. Symmetrically distinct sites are labeled with different alphabets (e.g., A1, A2, … for site 1 and B1, B2, … for site 2, where site 1 and site 2 are symmetrically distinct).

The correlation factor (f) is inherently temperature-dependent, but an average value across all simulated temperatures is displayed. To print f values for each temperature, use:

awk '/Cumulative/ {f=1} f; /^$/ {f=0}' parameter.txt

Below is the expected output:

3-1. Extract effective values for z and ν (optional)

To obtain the effective values of z and ν, users must first perform NEB calculations for all vacancy hopping paths. The results of the NEB calculations should be stored in a file named neb.csv (see above).

Navigate to the Example2 directory and run:

vachoppy -m pp

This command reads parameter.txt and neb.csv files and outputs postprocess.txt which contains the complete set of the effective hopping parameters.

To print the effective hopping parameters, use:

awk '/hopping parameter/ {f=1} f; /^$/ {f=0}' postprocess.txt

Below is the expected output:

In the upper table, the effective hopping parameters averaged over all simulated temperatures are shown, while the temperature-dependent parameters are presented in the lower table.

Additionally, VacHopPy provides individual attempt frequencies for each hopping path. To print them, use

awk '/Jump attempt frequency/ {f=1} f; /^$/ {f=0}' postprocess.txt

Attempt frequencies are estimated based on statistical approach. Therefore, only the values for hopping paths with a sufficient number of hopping events can be considered reliable.

4. Assessment of lattice stability

Download and unzip the Example3 file linked above.

VacHopPy employs the fingerprint analysis proposed by Oganov et al. to assess lattice stability. The key quantities used in this analysis are the fingerprint vector (ψ) and the cosine distance (dcos). Detailed descriptions can be found in this paper.

Fingerprint vector (ψ)

To construct ψ, three parameters are required:

  1. Threshold radius (Rmax)
  2. Bin size (Δ)
  3. Standard deviation for Gaussian-smeared delta function (σ)

A well-defined ψ satisfies ψ(r=0) = -1 and converges to 0 as r → ∞. Therefore, the user needs to set these parameters appropriately to ensure these conditions are met. The ψ can be generated by using vachoppy -u fingerprint command:

Navigate to the Example3 directory and run:

vachoppy -u fingerprint POSCAR_MONO 20.0 0.04 0.04 -d 

Here, the arguments are:

  • Atomic structure = POSCAR_MONO (monoclinic HfO2)
  • Rmax = 20 Å
  • Δ =0.04 Å
  • σ = 0.04 Å

Using -d flag displays the resulting ψ in a pop-up window. Below is an example output (ψ for monoclinic HfO2):

To enhance robustness of ψ, VacHopPy considers all possible atom pairs (e.g., Hf-Hf, Hf-O, and O-O) and concatenates them to construct a single well-defined ψ.


Cosine distance (dcos)

Cosine distance (dcos(x)) quantifies structural similarity to a reference phase x, where a lower dcos(x) indicates a greater similarity. By analyzing variations in dcos(x) over time, users can assess lattice stability or explore phase transitions occurred in the AIMD simulations.

(1) Assessment of lattice stability

Navigate to the Example3 directory and run:

For serial computation:

vachoppy -m f 0.07 20 0.04 0.04 -v vasprun_1600K.xml -p POSCAR_MONO

For parallel computation:

mpirun -np {num_nodes} vachoppy -m f 0.07 20 0.04 0.04 -v vasprun_1600K.xml -p POSCAR_MONO --parallel

Here, the arguments are:

  • tinterval = 0.07 ps
  • Rmax = 20 Å
  • Δ = 0.04 Å
  • σ = 0.04 Å

The -v flag specifies vasprun.xml file (default: vasprun.xml), where vasprun_1600K.xml contains the AIMD trajectory at 1600 K. The -p flag specifies the reference phase (default: POSCAR_REF), where POSCAR_MONO contains monoclinic HfO2 lattice.

Results are stored in cosine_distance.txt and cosine_distance.png. To avoid overwriting, rename cosine_distance.txt to dcos_1600K_mono.txt.


Next, run:

For serial computation:

vachoppy -m f 0.07 20 0.04 0.04 -v vasprun_2200K.xml -p POSCAR_MONO

For parallel computation:

mpirun -np {num_nodes} vachoppy -m f 0.07 20 0.04 0.04 -v vasprun_2200K.xml -p POSCAR_MONO --parallel 

Here, vasprun_2200K.xml contains the AIMD data at 2200 K. Rename cosine_distance.txt to dcos_2200K_mono.txt.


For comparison, plot dcos_1600K_mono.txt and dcos_2200K_mono.txt together using plot.py:

# plot.py
import sys
import numpy as np
import matplotlib.pyplot as plt

data, num_data = [], len(sys.argv)-1
for i in range(num_data):
    data.append(np.loadtxt(sys.argv[i+1], skiprows=2))

plt.rcParams['figure.figsize'] = (6, 2.5)
plt.rcParams['font.size'] = 10

space = 0.09
for i, data_i in enumerate(data):
    data_i[:,1] -= np.average(data_i[:,1]) - space * i
    plt.scatter(data_i[:,0], data_i[:,1], s=10)
    
plt.yticks([])
plt.xlabel('Time (ps)', fontsize=12)
plt.ylabel(r'$d_{cos}$($x$)', fontsize=12)
plt.legend(loc='center right')
plt.show()
python plot.py dcos_1600K_mono.txt dcos_2200K_mono.txt

In this figure, the dcos data at each temperature is arranged vertically; hence, the absolute y-values are meaningless. Instead, the focus is on the relative change in dcos over time.

  • At 1600 K, dcos remains nearly constant, indicating structural stability.
  • At 2200 K, dcos exhibits substantial fluctuations near 20 ps, suggesting that the monoclinic lattice becomes unstable at high temperatures.

It is important to note that the lattice parameters were constrained to those of monoclinic lattice since the AIMD simulations were performed under NVT ensemble. As a result, any lattice distortion is not sustained but instead revert to the original lattice, producing peaks in the dcos trace.

In unstable lattices, such as monoclinic HfO2 at 2200 K, vacancies are poorly defined since atomic vibration centers may shift away from the original lattice point. Consequently, vacancy trajectory determination (vachoppy -m t) and effective hopping parameter extraction (vachoppy -m p) may lack accuracy.


(2) Exploring phase transition

By varying the reference phase, users can explore phase transitions occuring in AIMD simulations.

Navigate to the Example3 directory and run:

For serial computation:

vachoppy -m f 0.07 20 0.04 0.04 -v vasprun_2200K.xml -p POSCAR_TET

For parallel computation:

mpirun -np {num_nodes} vachoppy -m f 0.07 20 0.04 0.04 -v vasprun_2200K.xml -p POSCAR_TET --parallel 

Here, POSCAR_TET contains the atomic structure of tetragonal HfO2. To avoid overwriting, rename cosine_distance.txt to dcos_2200K_tet.txt.


Next, run:

For serial computation:

vachoppy -m f 0.07 20 0.04 0.04 -v vasprun_2200K.xml -p POSCAR_AO

For parallel computation:

mpirun -np {num_nodes} vachoppy -m f 0.07 20 0.04 0.04 -v vasprun_2200K.xml -p POSCAR_AO --parallel 

Here, POSCAR_AO contains the atomic structure of antipolar orthorhombic HfO2. Rename cosine_distance.txt to dcos_2200K_ao.txt.


To compare the results, run plot.py:

python plot.py dcos_2200K_mono.txt dcos_2200K_tet.txt dcos_2200K_ao.txt

As before, the dcos data is arranged vertically, so the absolute y-values are not meaningful. Instead, the focus is on the relative change in dcos over time.

  • As dcos(mono) increases,
  • dcos(tet) decreases,
  • while dcos(ao) remain nearly constant.

This result clearly suggests that the phase transition is directed toward the tetragonal phase.

References

If you used VacHopPy package, please cite this paper

If you used vachoppy -m f or vachoppy -u fingerprint commands, also cite this paper.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vachoppy-2.0.0.tar.gz (55.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

vachoppy-2.0.0-py3-none-any.whl (51.1 kB view details)

Uploaded Python 3

File details

Details for the file vachoppy-2.0.0.tar.gz.

File metadata

  • Download URL: vachoppy-2.0.0.tar.gz
  • Upload date:
  • Size: 55.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for vachoppy-2.0.0.tar.gz
Algorithm Hash digest
SHA256 2db1632f9efe3b12f9e9bdb3e67f67ef906831e0dfce95d72b2ba6fd0fbd7c9a
MD5 5716dd506e8d6b59a15362d4bc84096e
BLAKE2b-256 0398892e2468cf28e5caa085ef1431861661bca66b4b2153a23542cb59e4bdd6

See more details on using hashes here.

File details

Details for the file vachoppy-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: vachoppy-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 51.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for vachoppy-2.0.0-py3-none-any.whl
Algorithm Hash digest
SHA256 c2b5d795eb2e0648fa2fff5ee7150ea32e84d3f0b1c9fb9f07dc920f1f35ade8
MD5 5a2ac24ce109e40e875f08bc5fa667dd
BLAKE2b-256 dd4d6c6052c0d0d4cb73bab85a5f2036ab7e2133c9d8067616fa52ffbc4dc5b1

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page