
LiTraj: Li-ion migration dataset



LiTraj: A Dataset for Benchmarking Machine Learning Models for Predicting Lithium Ion Migration


This repository contains links to datasets for benchmarking machine learning models for predicting Li-ion migration, as described in our paper "Benchmarking machine learning models for predicting lithium ion migration", along with Python utilities for handling the datasets.


About

Modern electrochemical devices like Li-ion batteries require materials with fast ion transport. While traditional quantum chemistry methods for predicting ion mobility are computationally intensive, machine learning offers a faster alternative but depends on high-quality data. We introduce the LiTraj dataset, which provides 13,000 percolation barriers, 122,000 migration barriers, and 1,700 migration trajectories — key metrics for evaluating Li-ion mobility in crystals.

The datasets are calculated using density functional theory (DFT) and bond valence site energy (BVSE) methods. For the calculations, we used crystal structures collected from the Materials Project database (see license). The data is stored in the extended .xyz format, which can be read by the Atomistic Simulation Environment (ASE) Python library.
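In extended XYZ, each frame's second line stores per-structure metadata as `key=value` pairs, which ASE exposes as `atoms.info` (for example `material_id` or `E_1D`). The following is a simplified sketch of how such a line can be parsed, for illustration only. For real work, use ASE's `ase.io.read(path, index=':')`. The helper name `parse_extxyz_comment` and the sample line are hypothetical, and unlike the full format this sketch does not convert `T`/`F` to booleans.

```python
import shlex


def parse_extxyz_comment(line: str) -> dict:
    """Parse key=value pairs from the comment line of an extended XYZ frame (simplified)."""
    info = {}
    # shlex.split keeps quoted values such as Lattice="..." together as one token
    for token in shlex.split(line):
        key, sep, value = token.partition("=")
        if not sep:
            # a bare key is treated as a boolean flag
            info[key] = True
            continue
        for cast in (int, float):
            try:
                info[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            # neither int nor float: keep the raw string
            info[key] = value
    return info


# Hypothetical comment line shaped like a BVEL13k frame
comment = ('Lattice="4.0 0.0 0.0 0.0 4.0 0.0 0.0 0.0 4.0" '
           'Properties=species:S:1:pos:R:3 material_id=mp-0000 E_1D=0.42')
info = parse_extxyz_comment(comment)
print(info["material_id"], info["E_1D"])  # mp-0000 0.42
```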

Available datasets and source links

We provide training, validation, and test splits for each dataset. See the Python tools and Datasets in detail sections to learn how to work with the datasets.

| Dataset | Theory level | Target | # of samples | Download | Size (unzipped) |
|---|---|---|---|---|---|
| nebDFT2k | DFT (PBE) | migration barrier, geometry | 1,681 | zip (65 MB) | 0.7 GB |
| nebDFT2k_U | DFT (PBE+U) | migration barrier, geometry | in progress | in progress | |
| MPLiTrj | DFT (PBE) | energy, forces, stress tensor | 929,066 | zip (3.8 GB) | 16 GB |
| MPLiTrj_subsample | DFT (PBE) | energy, forces, stress tensor | 118,024 | zip (0.5 GB) | 2.1 GB |
| BVEL13k | BVSE | 1-3D percolation barrier | 12,807 | zip (11 MB) | 35 MB |
| nebBVSE122k | BVSE | migration barrier | 122,421 | zip (0.2 GB) | 0.9 GB |

Python tools

Installation

git clone https://github.com/AIRI-Institute/LiTraj
cd LiTraj
pip install .

Download a specific dataset

from litraj.data import download_dataset
download_dataset('BVEL13k', '.', unzip=True)  # save to the current directory and unzip

Read the downloaded dataset

from litraj.data import load_data
train, val, test, index = load_data('BVEL13k', '.')

for atoms in train:
    mp_id = atoms.info['material_id']
    e1d = atoms.info['E_1D']
    e2d = atoms.info['E_2D']
    e3d = atoms.info['E_3D']
    # do stuff
    pass

For more details, see "Datasets basics" in the notebooks.

Datasets in detail

nebDFT2k dataset


The nebDFT2k dataset includes Li-ion migration trajectories and corresponding migration barriers for 1,681 vacancy hops between initial and final equilibrium sites. The trajectories are optimized using the climbing image nudged elastic band (NEB) method with DFT for force and energy calculations. Initial trajectories were generated via linear interpolation between start and end positions, followed by preconditioning using the BVSE-NEB method from the ions library.
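A minimal sketch of the linear-interpolation step, assuming fractional coordinates and assuming that the periodic offset from the `edge_id` is simply added to the target site's fractional coordinates. Only the migrating ion is shown. In an actual NEB setup every atom of each image is interpolated (ASE's `NEB` class, for instance, provides an `interpolate()` method), and the subsequent BVSE-NEB preconditioning from the ions library is not reproduced here. The function name and coordinates are hypothetical.

```python
from typing import List, Sequence, Tuple


def linear_interpolation(start: Sequence[float], end: Sequence[float],
                         n_images: int) -> List[Tuple[float, ...]]:
    """Return both endpoints plus n_images evenly spaced intermediate points."""
    if n_images < 0:
        raise ValueError("n_images must be non-negative")
    n_segments = n_images + 1
    return [
        tuple(s + (i / n_segments) * (e - s) for s, e in zip(start, end))
        for i in range(n_segments + 1)
    ]


# Hypothetical hop: the target site sits in the neighbouring cell along -a,
# so the assumed offset (-1, 0, 0) is added before interpolating.
start = (0.1, 0.5, 0.5)
end = tuple(f + o for f, o in zip((0.9, 0.5, 0.5), (-1, 0, 0)))
path = linear_interpolation(start, end, n_images=3)
for point in path:
    print([round(x, 3) for x in point])
```

Adding the offset first makes the path cross the cell boundary along the short route instead of cutting through the whole cell.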

For benchmarking universal machine learning interatomic potentials (uMLIPs), the BVSE-NEB optimized initial trajectories serve as the starting point for uMLIP-NEB optimization. The resulting trajectories are compared to the DFT ground truth. See an example to learn how to benchmark the pre-trained MACE_MP model.
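Comparing a uMLIP-NEB trajectory with the DFT ground truth ultimately comes down to comparing barriers extracted from the two energy profiles. The following is a sketch under an assumed barrier definition: the highest image energy minus the initial-image energy. Check the paper for the exact definition behind the `em` values. The energy profiles are made-up numbers, not dataset values.

```python
from typing import Sequence


def migration_barrier(energies: Sequence[float]) -> float:
    """Assumed definition: highest image energy minus the initial-image energy."""
    if len(energies) < 2:
        raise ValueError("a path needs at least two images")
    return max(energies) - energies[0]


def mean_absolute_error(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Average absolute deviation between predicted and reference barriers."""
    if len(predicted) != len(reference) or len(predicted) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - r) for p, r in zip(predicted, reference)) / len(predicted)


# Hypothetical energy profiles (eV, relative to the initial image) for two hops
dft_profiles = [[0.0, 0.2, 0.5, 0.2, 0.0], [0.0, 0.1, 0.3, 0.25, 0.1]]
umlip_profiles = [[0.0, 0.15, 0.4, 0.15, 0.0], [0.0, 0.12, 0.35, 0.2, 0.1]]

dft_barriers = [migration_barrier(p) for p in dft_profiles]
umlip_barriers = [migration_barrier(p) for p in umlip_profiles]
print(f"MAE = {mean_absolute_error(umlip_barriers, dft_barriers):.3f} eV")
```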

As input for graph neural network (GNN) models that predict Li-ion migration barriers directly from structure, we use supercells with an added centroid: a dummy atom with the chemical symbol 'X' placed between the starting and final positions of the migrating ion.
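The following sketch shows one way the centroid position could be computed. It assumes the centroid is the midpoint between the start site and the end site shifted by an integer cell offset, with rows of `lattice` as the cell vectors. The function names, cell, and coordinates are hypothetical. For the dataset's supercells, the coordinates would refer to the supercell. With ASE, the resulting point could then be added as an atom with the dummy symbol 'X'.

```python
from typing import Sequence, Tuple

Vector = Tuple[float, float, float]


def frac_to_cart(frac: Sequence[float], lattice: Sequence[Sequence[float]]) -> Vector:
    """Fractional -> Cartesian; rows of `lattice` are the cell vectors a, b, c."""
    return tuple(sum(frac[i] * lattice[i][k] for i in range(3)) for k in range(3))


def hop_centroid(start_frac: Sequence[float], end_frac: Sequence[float],
                 offset: Sequence[int], lattice: Sequence[Sequence[float]]) -> Vector:
    """Midpoint between the start site and the end site shifted by an integer cell offset."""
    end_unwrapped = [e + o for e, o in zip(end_frac, offset)]
    mid_frac = [(s + e) / 2 for s, e in zip(start_frac, end_unwrapped)]
    return frac_to_cart(mid_frac, lattice)


lattice = [[4.0, 0.0, 0.0], [0.0, 4.0, 0.0], [0.0, 0.0, 4.0]]  # hypothetical cubic cell, Å
centroid = hop_centroid((0.1, 0.5, 0.5), (0.9, 0.5, 0.5), (-1, 0, 0), lattice)
print(centroid)
```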

Structure of the nebDFT2k dataset

nebDFT2k/
├── nebDFT2k_index.csv              # Table with material_id, edge_id, chemsys, _split, ... columns
├── edge-id1_init.xyz               # Initial (BVSE-NEB optimized) trajectory file, where edge-id1 = mp-id1_source_target_offsetx_offsety_offsetz
├── edge-id1_relaxed.xyz            # Final (DFT-NEB optimized) trajectory file 
├── ...
└── nebDFT2k_centroids.xyz          # File with centroid supercells
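Each hop is identified by an `edge_id` of the form `mp-id_source_target_offsetx_offsety_offsetz`. The following sketch splits it into its parts and builds the two trajectory file names. It assumes that `source` and `target` are integer site indices, that the offsets are integers, and that the `edge-id1` placeholder in the tree is replaced by the literal `edge_id`. The `Edge` type, helper names, and example ID are hypothetical.

```python
from typing import NamedTuple, Tuple


class Edge(NamedTuple):
    material_id: str
    source: int
    target: int
    offset: Tuple[int, int, int]


def parse_edge_id(edge_id: str) -> Edge:
    """Split mp-id_source_target_offsetx_offsety_offsetz into its components."""
    # rsplit from the right keeps the material ID intact even if it contained underscores
    parts = edge_id.rsplit("_", 5)
    if len(parts) != 6:
        raise ValueError(f"unexpected edge_id format: {edge_id!r}")
    material_id, source, target, ox, oy, oz = parts
    return Edge(material_id, int(source), int(target), (int(ox), int(oy), int(oz)))


def trajectory_files(edge_id: str) -> Tuple[str, str]:
    """Assumed <edge_id>_init.xyz / <edge_id>_relaxed.xyz naming."""
    return f"{edge_id}_init.xyz", f"{edge_id}_relaxed.xyz"


edge = parse_edge_id("mp-1234_0_5_0_-1_0")  # hypothetical edge_id
print(edge)
print(trajectory_files("mp-1234_0_5_0_-1_0"))
```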

Usage example

from litraj.data import download_dataset, load_data

# download the dataset to the selected folder
download_dataset('nebDFT2k', 'path/to/save/data') 

# read the dataset from the folder
data = load_data('nebDFT2k', 'path/to/save/data') 

data_train = data[data._split == 'train']
for atoms in data_train.centroid:
    edge_id = atoms.info['edge_id']
    mp_id = atoms.info['material_id']
    em = atoms.info['em']
    # do stuff


for traj_init, traj_relaxed in zip(data.trajectory_init, data.trajectory_relaxed):
    # do stuff
    pass

MPLiTrj dataset

The MPLiTrj dataset contains 929,066 configurations with calculated energies, forces and stress tensors obtained during the DFT-NEB optimization of 2,698 Li-ion migration pathways. Its subsample comprises 118,024 configurations. Each dataset has three .xyz files corresponding to the training, validation, and test data splits.

Structure of the MPLiTrj dataset

MPLiTrj/
├── MPLiTrj_train.xyz               # Training file
├── MPLiTrj_val.xyz                 # Validation file
└── MPLiTrj_test.xyz                # Test file 

Usage example

from litraj.data import download_dataset, load_data

# download the dataset to the selected folder
download_dataset('MPLiTrj_subsample', 'path/to/save/data') 

train, val, test = load_data('MPLiTrj_subsample', 'path/to/save/data') 

structures, energies, forces, stresses = [], [], [], []
for atoms in train:
    structures.append(atoms)
    energies.append(atoms.calc.get_potential_energy())
    forces.append(atoms.calc.get_forces().tolist())
    stresses.append(atoms.calc.get_stress().tolist())
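ASE returns stresses as six Voigt components in the order (xx, yy, zz, yz, xz, xy), in eV/Å³. Many training pipelines expect a full 3x3 tensor or different units. The following is a plain-Python sketch of that conversion. The helper names and sample values are hypothetical. Check the unit and sign conventions your training code expects.

```python
from typing import List, Sequence

# 1 eV/Å^3 = 1.602176634e-19 J / 1e-30 m^3 = 160.2176634 GPa
EV_PER_A3_IN_GPA = 160.2176634


def voigt_to_matrix(voigt: Sequence[float]) -> List[List[float]]:
    """Expand (xx, yy, zz, yz, xz, xy) Voigt components into a symmetric 3x3 matrix."""
    xx, yy, zz, yz, xz, xy = voigt
    return [[xx, xy, xz],
            [xy, yy, yz],
            [xz, yz, zz]]


def stress_in_gpa(voigt: Sequence[float]) -> List[List[float]]:
    """Full 3x3 stress tensor converted from eV/Å^3 to GPa."""
    return [[v * EV_PER_A3_IN_GPA for v in row] for row in voigt_to_matrix(voigt)]


stress = [0.01, 0.02, 0.03, 0.001, 0.002, 0.003]  # hypothetical values in eV/Å^3
for row in stress_in_gpa(stress):
    print([round(v, 3) for v in row])
```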

BVEL13k dataset


The BVEL13k dataset contains Li-ion 1-3D percolation barriers calculated for 12,807 Li-containing ionic crystal structures. The percolation barriers are calculated using the bond valence energy landscape (BVEL) method as implemented in the BVlain Python package. The structures span 73 chemical elements; each structure contains at most 160 atoms and has a unit cell volume smaller than 1500 Å³.
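The following is a sketch of how the three percolation barriers might be used for screening. It assumes each barrier is the lowest energy threshold (in eV) at which the accessible region percolates in 1, 2, or 3 dimensions. Under that assumption a 3D network also percolates in fewer dimensions, so E_1D <= E_2D <= E_3D is expected. The records, threshold, and helper names are hypothetical, and real entries may need handling for missing values.

```python
from typing import Any, Dict, List

Record = Dict[str, Any]


def barriers_ordered(record: Record) -> bool:
    """Expected ordering E_1D <= E_2D <= E_3D under the assumed definition."""
    return record["E_1D"] <= record["E_2D"] <= record["E_3D"]


def screen(records: List[Record], threshold: float, key: str = "E_3D") -> List[str]:
    """Material IDs whose percolation barrier `key` (eV) does not exceed `threshold`."""
    return [r["material_id"] for r in records if r[key] <= threshold]


records = [  # hypothetical entries shaped like atoms.info in BVEL13k
    {"material_id": "mp-A", "E_1D": 0.20, "E_2D": 0.35, "E_3D": 0.40},
    {"material_id": "mp-B", "E_1D": 0.50, "E_2D": 0.90, "E_3D": 1.20},
]
assert all(barriers_ordered(r) for r in records)
print(screen(records, threshold=0.5))  # ['mp-A']
print(screen(records, threshold=0.5, key="E_1D"))  # ['mp-A', 'mp-B']
```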

Structure of the BVEL13k dataset

BVEL13k/
├── BVEL13k_index.csv                # Table with material_id, chemsys, _split, E_1D, E_2D, E_3D columns
├── BVEL13k_train.xyz                # Training file 
├── BVEL13k_val.xyz                  # Validation file 
└── BVEL13k_test.xyz                 # Test file 
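The index CSV can be inspected without loading any structures, for example to count entries per split. The following is a standard-library sketch. The CSV excerpt is invented and only mirrors the column names listed above. With a real file, you would open `BVEL13k/BVEL13k_index.csv` instead (path assumed). With pandas, `pd.read_csv(path)["_split"].value_counts()` gives the same counts.

```python
import csv
import io
from collections import Counter
from typing import Dict, List

# Hypothetical excerpt with the columns listed for BVEL13k_index.csv
SAMPLE = """material_id,chemsys,_split,E_1D,E_2D,E_3D
mp-A,Li-O-P,train,0.20,0.35,0.40
mp-B,Fe-Li-O-P,train,0.50,0.90,1.20
mp-C,Li-S,val,0.30,0.30,0.45
mp-D,Cl-Li-Y,test,0.25,0.40,0.60
"""


def read_index(text: str) -> List[Dict[str, str]]:
    """Parse CSV text into one dict per row (all values as strings)."""
    return list(csv.DictReader(io.StringIO(text)))


def split_counts(rows: List[Dict[str, str]]) -> Counter:
    """Number of entries per data split."""
    return Counter(row["_split"] for row in rows)


rows = read_index(SAMPLE)
print(split_counts(rows))
```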

Usage example

from litraj.data import download_dataset, load_data

# download the dataset to the selected folder
download_dataset('BVEL13k', 'path/to/save/data') 

# get train, val, test split of the dataset and the index dataframe
atoms_list_train, atoms_list_val, atoms_list_test, index = load_data('BVEL13k', 'path/to/save/data') 

# the data is stored as ASE Atoms objects
for atoms in atoms_list_train:
    mp_id = atoms.info['material_id']
    e1d = atoms.info['E_1D']
    e2d = atoms.info['E_2D']
    e3d = atoms.info['E_3D']
    # do stuff

nebBVSE122k dataset

The nebBVSE122k dataset contains Li-ion migration barriers calculated for 122,421 Li-ion vacancy hops from their starting to final equilibrium positions. The migration barriers are calculated with the NEB method using the BVSE approach, as implemented in the ions Python package.

As input for GNN models for structure-to-property prediction, we use supercells with an added centroid: a dummy atom with the chemical symbol 'X' placed between the starting and final positions of the migrating ion.

Structure of the nebBVSE122k dataset

nebBVSE122k/
├── nebBVSE122k_index.csv            # Table with material_id, chemsys, _split, em columns
├── nebBVSE122k_train.xyz            # Training file 
├── nebBVSE122k_val.xyz              # Validation file 
└── nebBVSE122k_test.xyz             # Test file 

Usage example

import numpy as np
from litraj.data import download_dataset, load_data

# download the dataset to the selected folder
download_dataset('nebBVSE122k', 'path/to/save/data') 

# get train, val, test split of the dataset and the index dataframe
atoms_list_train, atoms_list_val, atoms_list_test, index = load_data('nebBVSE122k', 'path/to/save/data')

for atoms_with_centroid in atoms_list_train:
    edge_id = atoms_with_centroid.info['edge_id']   # mp-id_source_target_offsetx_offsety_offsetz
    mp_id = atoms_with_centroid.info['material_id']
    em = atoms_with_centroid.info['em']
    centroid_index = np.argwhere(atoms_with_centroid.symbols =='X')
    # do stuff
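Some workflows need the host structure without the dummy centroid, for example to compare against the plain crystal. The following plain-Python sketch separates 'X' sites from the rest. With ASE, the inputs would come from `atoms.get_chemical_symbols()` and `atoms.get_positions()`. The function name and the four-site example are hypothetical.

```python
from typing import List, Sequence, Tuple

Position = Tuple[float, float, float]


def split_centroid(symbols: Sequence[str], positions: Sequence[Position]
                   ) -> Tuple[List[int], List[str], List[Position]]:
    """Return indices of dummy 'X' sites plus host-only symbols and positions."""
    centroid_idx = [i for i, s in enumerate(symbols) if s == "X"]
    host_symbols = [s for s in symbols if s != "X"]
    host_positions = [p for s, p in zip(symbols, positions) if s != "X"]
    return centroid_idx, host_symbols, host_positions


symbols = ["Li", "Li", "O", "X"]  # hypothetical 4-site cell, centroid last
positions = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0), (1.0, 1.0, 1.0), (1.0, 0.0, 0.0)]
idx, host_symbols, host_positions = split_centroid(symbols, positions)
print(idx, host_symbols)  # [3] ['Li', 'Li', 'O']
```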

Notebooks

How to cite

If you use the LiTraj dataset, please consider citing our paper:

@article{dembitskiy2025benchmarking,
	title = {{Benchmarking machine learning models for predicting lithium ion migration}},
	author = {Dembitskiy, Artem D. and Humonen, Innokentiy S. and Eremin, Roman A. and Aksyonov, Dmitry A. and Fedotov, Stanislav S. and Budennyy, Semen A.},
	journal = {npj Comput. Mater.},
	volume = {11},
	number = {131},
	year = {2025},
	publisher = {Nature Publishing Group},
	doi = {10.1038/s41524-025-01571-z}
}


This version

0.1

Download files


Source Distribution

litraj-0.1.tar.gz (13.4 kB)

Uploaded Source

Built Distribution


litraj-0.1-py3-none-any.whl (11.0 kB)

Uploaded Python 3

File details

Details for the file litraj-0.1.tar.gz.

File metadata

  • Download URL: litraj-0.1.tar.gz
  • Upload date:
  • Size: 13.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.1 pkginfo/1.8.2 requests/2.28.1 requests-toolbelt/0.9.1 tqdm/4.66.2 CPython/3.8.8

File hashes

Hashes for litraj-0.1.tar.gz
Algorithm Hash digest
SHA256 53d28ca09cf0ed9e0a075abae92f2b614a8e58d671a45dd3e545f90e88248ca6
MD5 a4ff8b3c4036218c66dd8ca4781fe026
BLAKE2b-256 1c21744fccdfb6ac5cdf0fee96cb33b0da44a283adde5a690ff562374c5510a7


File details

Details for the file litraj-0.1-py3-none-any.whl.

File metadata

  • Download URL: litraj-0.1-py3-none-any.whl
  • Upload date:
  • Size: 11.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.8.1 pkginfo/1.8.2 requests/2.28.1 requests-toolbelt/0.9.1 tqdm/4.66.2 CPython/3.8.8

File hashes

Hashes for litraj-0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 d3c614b5c7f85ef2bea845ef6e782e37e1aa9993643e949f5f9a01e4e98c8e0f
MD5 337f5b1450c5e8c86220f6e4dcbf9024
BLAKE2b-256 1dfabdd82a5cbbf29180535f1efef8ae7a3392a348dd379191d557284deb4bfb

