alignn
Table of Contents
- Introduction
- Installation
- Examples
- Pre-trained models
- Quick start using colab
- JARVIS-ALIGNN webapp
- ALIGNN-FF & ASE Calculator
- Performances on a few datasets
- Useful notes
- References
- How to contribute
- Correspondence
- Funding support
ALIGNN (Introduction)
The Atomistic Line Graph Neural Network (https://www.nature.com/articles/s41524-021-00650-1) introduces a new graph convolution layer that explicitly models both two and three body interactions in atomistic systems.
This is achieved by composing two edge-gated graph convolution layers, the first applied to the atomistic line graph L(g) (representing triplet interactions) and the second applied to the atomistic bond graph g (representing pair interactions).
The atomistic graph g consists of a node for each atom i (with atom/node representations h_i), and one edge for each atom pair within a cutoff radius (with bond/pair representations e_ij).
The atomistic line graph L(g) represents relationships between atom triplets: it has nodes corresponding to bonds (sharing representations e_ij with those in g) and edges corresponding to bond angles (with angle/triplet representations t_ijk).
The line graph convolution updates the triplet representations and the pair representations; the direct graph convolution further updates the pair representations and the atom representations.
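The relationship between g and L(g) can be illustrated with a small pure-Python sketch. It is not the ALIGNN implementation (which builds DGL graphs); the function name `build_line_graph` and the undirected, deduplicated bond list are simplifying assumptions made for illustration only.

```python
from itertools import combinations


def build_line_graph(bonds):
    """Return the edges of L(g) for an undirected bond list.

    bonds: list of (i, j) atom-index pairs, i.e. the edges of g.
    Each bond becomes a node of L(g). Two bonds are connected when they
    share exactly one atom, which corresponds to a bond angle (a triplet).
    Each result is (bond_index_a, bond_index_b, shared_atom).
    """
    triplets = []
    for (a, bond_a), (b, bond_b) in combinations(enumerate(bonds), 2):
        shared = set(bond_a) & set(bond_b)
        if len(shared) == 1:
            triplets.append((a, b, shared.pop()))
    return triplets


# Toy example: atom 0 bonded to atoms 1 and 2 (one bond angle at atom 0).
bonds = [(0, 1), (0, 2)]
line_edges = build_line_graph(bonds)
print(line_edges)
```

In this picture, the pair representations e_ij live on the entries of `bonds`, and the triplet representations t_ijk live on the entries of `line_edges`.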
Installation
First, create a conda environment. Install Miniconda from https://conda.io/miniconda.html. Depending on your system, you will get a file named something like 'Miniconda3-latest-XYZ'.
Now run one of the following:
bash Miniconda3-latest-Linux-x86_64.sh (for Linux)
bash Miniconda3-latest-MacOSX-x86_64.sh (for Mac)
On Windows, download the 32/64-bit Python 3.10 Miniconda exe and install it.
Next, create a conda environment, here called "version" (choose another name if you like):
conda create --name version python=3.10
source activate version
Optional GPU dependencies
If you need CUDA support, it's best to install PyTorch and DGL before installing alignn to ensure that you get a CUDA-enabled version of DGL.
To install the stable release of PyTorch on Linux with cudatoolkit 11.8, run
conda install pytorch torchvision torchaudio pytorch-cuda=11.8 -c pytorch -c nvidia
Then install the matching DGL version
conda install -c dglteam/label/cu118 dgl
Some of our models may not be stable with the latest DGL release (v1.1.0) so you may wish to install v1.0.2 instead:
conda install -c dglteam/label/cu118 dgl==1.0.2.cu118
Method 1 (editable in-place install)
You can install a development version of alignn by cloning the repository and installing in place with pip:
git clone https://github.com/usnistgov/alignn
cd alignn
python -m pip install -e .
Method 2 (using PyPI)
As an alternative, ALIGNN can also be installed with pip as follows:
pip install alignn
pip install dgl==1.0.1+cu117 -f https://data.dgl.ai/wheels/cu117/repo.html
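After either install method, it can help to confirm which packages are importable before training. The following is a minimal sketch using only the standard library; the helper name `installed_packages` is hypothetical, and the CUDA check runs only if PyTorch is present.

```python
import importlib.util


def installed_packages(names=("torch", "dgl", "alignn")):
    """Map each package name to True/False depending on whether it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


status = installed_packages()
for name, found in status.items():
    print(f"{name}: {'found' if found else 'missing'}")

# If PyTorch is present, report whether it can see a CUDA device.
if status["torch"]:
    import torch

    print("CUDA available:", torch.cuda.is_available())
```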
Examples
Dataset
The main script to train a model is train_folder.py. A user needs at least the following to train a model: 1) id_prop.csv, listing each structure file name and its corresponding target value, and 2) config_example.json, a config file with training settings and hyperparameters.
Users can keep their structure files in POSCAR, .cif, .xyz or .pdb format in a directory. The examples below use POSCAR format files. The same directory should contain an id_prop.csv file.
In id_prop.csv, the file names and corresponding target values are kept in comma-separated values (csv) format.
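As an illustration of the id_prop.csv layout described above, the sketch below writes one `filename,value` row per structure. It assumes the file has no header row, and the file names and values are made up for the example; compare against the sample_data directory before relying on it.

```python
import csv
import os
import tempfile


def write_id_prop(directory, targets):
    """Write id_prop.csv with one 'filename,value' row per structure (no header assumed)."""
    path = os.path.join(directory, "id_prop.csv")
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle)
        for filename, value in targets.items():
            writer.writerow([filename, value])
    return path


# Hypothetical file names and band gaps, for illustration only.
example_targets = {"POSCAR-A.vasp": 0.0, "POSCAR-B.vasp": 1.25}
with tempfile.TemporaryDirectory() as tmp:
    csv_path = write_id_prop(tmp, example_targets)
    with open(csv_path) as handle:
        rows = handle.read().splitlines()
print(rows)
```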
Here is an example of training on OptB88vdW bandgaps of 50 materials from the JARVIS-DFT database. The example is created using the generate_sample_data_reg.py script. Users can modify the script to include more than 50 entries, or make their own dataset in this format. For a list of available datasets, see Databases.
The dataset is split 80:10:10 into training, validation and test sets (controlled by train_ratio, val_ratio, test_ratio). To change the split proportions and other parameters, edit the config_example.json file. If users want to train on certain sets and validate/test on another dataset, set n_train, n_val, n_test manually in config_example.json and also set keep_data_order to True there so that random shuffling is disabled.
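A fixed split can be prepared by editing the config programmatically. The sketch below works on a plain dictionary standing in for the contents of config_example.json; only the keys named in the text are used, and the helper name `use_fixed_split` is hypothetical.

```python
import json


def use_fixed_split(config, n_train, n_val, n_test, batch_size=32):
    """Return a copy of config with explicit split sizes and shuffling disabled."""
    updated = dict(config)
    updated.update(
        n_train=n_train,
        n_val=n_val,
        n_test=n_test,
        keep_data_order=True,  # keep the order given in id_prop.csv
        batch_size=batch_size,
    )
    return updated


# Stand-in for config_example.json; a real config contains more keys.
config = {"train_ratio": 0.8, "val_ratio": 0.1, "test_ratio": 0.1, "batch_size": 2}
new_config = use_fixed_split(config, n_train=40, n_val=5, n_test=5)
print(json.dumps(new_config, indent=2))
```

With a real file, the same idea applies: load it with `json.load`, update the keys, and write it back with `json.dump`.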
A brief help guide (-h) can be obtained as follows.
train_folder.py -h
Regression example
Now, the model is trained as follows. Please increase the batch_size parameter to something like 32 or 64 in config_example.json for general training runs.
train_folder.py --root_dir "alignn/examples/sample_data" --config "alignn/examples/sample_data/config_example.json" --output_dir=temp
Classification example
While the above example is for regression, the following example shows a classification task for metal/non-metal based on the above bandgap values. We transform the dataset into 1 or 0 based on a threshold of 0.01 eV (controlled by the parameter classification_threshold) and train a similar classification model. Currently, the script allows binary classification tasks only.
train_folder.py --root_dir "alignn/examples/sample_data" --classification_threshold 0.01 --config "alignn/examples/sample_data/config_example.json" --output_dir=temp
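The thresholding step can be pictured with the sketch below. It assumes that values strictly above the threshold map to 1 and all others to 0; the handling of values exactly at the threshold is an assumption here, and the function name `binarize` is hypothetical.

```python
def binarize(values, threshold=0.01):
    """Map each target to 1 if it exceeds the threshold, else 0 (boundary handling assumed)."""
    return [1 if value > threshold else 0 for value in values]


# Illustrative band gaps in eV: the first three count as metals (0), the rest as non-metals (1).
band_gaps = [0.0, 0.005, 0.01, 0.5, 2.3]
labels = binarize(band_gaps)
print(labels)
```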
Multi-output model example
While the regression example above was for single-output values, we can train multi-output regression models as well. An example is given below for training formation energy per atom, bandgap and total energy per atom simultaneously. The script to generate the example data is provided in the script folder of sample_data_multi_prop. Another example, for training electron and phonon density of states, is also provided.
train_folder.py --root_dir "alignn/examples/sample_data_multi_prop" --config "alignn/examples/sample_data/config_example.json" --output_dir=temp
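For multi-output training, each structure carries several target values. The sketch below parses rows under the assumption that additional targets simply follow the file name as extra comma-separated columns; check the id_prop.csv in sample_data_multi_prop for the actual layout. The function name and sample values are hypothetical.

```python
import csv
import io


def parse_id_prop(text):
    """Parse 'filename,v1[,v2,...]' rows into {filename: float or list of floats}."""
    records = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip blank lines
        values = [float(v) for v in row[1:]]
        records[row[0]] = values[0] if len(values) == 1 else values
    return records


# Made-up rows: formation energy, bandgap and total energy per structure.
sample = "POSCAR-A.vasp,-1.2,0.0,-3.4\nPOSCAR-B.vasp,-0.8,1.5,-2.9\n"
targets = parse_id_prop(sample)
print(targets)
```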
Automated model training
Users can also train on multiple datasets (such as JARVIS-DFT, Materials Project, QM9_JCTC etc.) using the example scripts matching alignn/scripts/train_*.py. These scripts exist primarily to automate training, so that users do not need to prepare folders, csv files etc. by hand. They automatically download datasets from Databases in jarvis-tools and train several models. Make sure you specify the details of your queuing system in the scripts.
Using pre-trained models
All the trained models are distributed on [Figshare](https://figshare.com/projects/ALIGNN_models/126478).
The pretrained.py script can be applied to use them. These models can be used to directly make predictions.
A brief help section (-h) is shown using:
pretrained.py -h
An example of predicting formation energy per atom using a model trained on the JARVIS-DFT dataset is shown below:
pretrained.py --model_name jv_formation_energy_peratom_alignn --file_format poscar --file_path alignn/examples/sample_data/POSCAR-JVASP-10.vasp
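To run the same prediction over several structure files, the command can be assembled in Python. The sketch below only builds and prints the command lines, using the flags shown above; the helper name `pretrained_command` is hypothetical, and actually executing the commands (for example with `subprocess.run(cmd, check=True)`) assumes pretrained.py is on your PATH.

```python
import shlex


def pretrained_command(file_path,
                       model_name="jv_formation_energy_peratom_alignn",
                       file_format="poscar"):
    """Build the pretrained.py argument list for one structure file."""
    return [
        "pretrained.py",
        "--model_name", model_name,
        "--file_format", file_format,
        "--file_path", file_path,
    ]


paths = ["alignn/examples/sample_data/POSCAR-JVASP-10.vasp"]
commands = [pretrained_command(path) for path in paths]
for cmd in commands:
    print(shlex.join(cmd))
```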
Quick start using GoogleColab notebook example
The following notebook provides an example of 1) installing the ALIGNN model, 2) training on the example data and 3) using the pretrained models. For this example, you don't need to install the alignn package on your local computer or cluster; it only requires a Gmail account to log in. Learn more about Google colab here.
The following notebook provides an example of ALIGNN-FF model.
Web-app
A basic web-app for direct prediction is available at JARVIS-ALIGNN app. Given an atomistic structure in POSCAR format, it predicts formation energy, total energy per atom and bandgap using models trained on the JARVIS-DFT dataset.
ALIGNN-FF
The ASE calculator provides an interface to various codes. An example for ALIGNN-FF is given below. Note that there are multiple pretrained ALIGNN-FF models available; here we use the default_path model. As more accurate models are developed, they will be made available as well:
from alignn.ff.ff import AlignnAtomwiseCalculator, default_path
from ase import Atom, Atoms
import numpy as np
import matplotlib.pyplot as plt

# Load the default pretrained ALIGNN-FF model as an ASE calculator
model_path = default_path()
calc = AlignnAtomwiseCalculator(path=model_path)

# Scan the lattice constant of FCC Cu and record the energy at each value
lattice_params = np.linspace(3.5, 3.8)
fcc_energies = []
for a in lattice_params:
    atoms = Atoms([Atom('Cu', (0, 0, 0))],
                  cell=0.5 * a * np.array([[1.0, 1.0, 0.0],
                                           [0.0, 1.0, 1.0],
                                           [1.0, 0.0, 1.0]]),
                  pbc=True)
    atoms.set_tags(np.ones(len(atoms)))
    atoms.calc = calc
    e = atoms.get_potential_energy()
    fcc_energies.append(e)

# In a Jupyter notebook, run `%matplotlib inline` first to show the plot inline
plt.plot(lattice_params, fcc_energies)
plt.title('1x1x1')
plt.xlabel(r'Lattice constant ($\AA$)')
plt.ylabel('Total energy (eV)')
plt.show()
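Once an energy-versus-lattice-constant scan like the one above is available, the equilibrium lattice constant can be estimated from the lowest point of the curve. The sketch below is a generic three-point parabolic interpolation in pure Python, not part of ALIGNN; the function name is hypothetical and the data are synthetic stand-ins for `lattice_params` and `fcc_energies`.

```python
def parabolic_minimum(xs, ys):
    """Estimate the x position of the minimum of y(x) from sampled points.

    Fits a parabola through the lowest sample and its two neighbours.
    Falls back to the lowest sample itself if it sits on the boundary.
    """
    k = min(range(len(ys)), key=ys.__getitem__)
    if k == 0 or k == len(ys) - 1:
        return xs[k]
    x0, x1, x2 = xs[k - 1], xs[k], xs[k + 1]
    y0, y1, y2 = ys[k - 1], ys[k], ys[k + 1]
    num = (x1 - x0) ** 2 * (y1 - y2) - (x1 - x2) ** 2 * (y1 - y0)
    den = (x1 - x0) * (y1 - y2) - (x1 - x2) * (y1 - y0)
    if den == 0:
        return x1  # the three points are collinear
    return x1 - 0.5 * num / den


# Synthetic scan with a known minimum at 3.62 (arbitrary illustrative values).
xs = [3.50 + 0.05 * i for i in range(7)]
ys = [(x - 3.62) ** 2 - 3.7 for x in xs]
a0 = parabolic_minimum(xs, ys)
print(f"Estimated equilibrium lattice constant: {a0:.3f}")
```

For real data, a fit to an equation of state over more points would usually be preferable; this sketch only refines the grid minimum.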
To train ALIGNN-FF, use the train_folder_ff.py script, which uses the atomwise_alignn model.
The atomwise prediction example follows a similar setup as before, but instead of id_prop.csv it requires an id_prop.json file (see the example in the sample_data_ff directory). Note that ALIGNN-FF requires the energy to be stored as energy per atom:
train_folder_ff.py --root_dir "alignn/examples/sample_data_ff" --config "alignn/examples/sample_data_ff/config_example_atomwise.json" --output_dir=temp
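Because the training data must store energy per atom, total energies from a calculation need to be divided by the number of atoms first. The sketch below shows only that conversion; the record fields (`id`, `total_energy`, `n_atoms`) are hypothetical and do not describe the id_prop.json schema, for which the sample_data_ff example is the reference.

```python
def energy_per_atom(total_energy, n_atoms):
    """Convert a total energy (eV) to energy per atom (eV/atom)."""
    if n_atoms <= 0:
        raise ValueError("n_atoms must be positive")
    return total_energy / n_atoms


# Made-up records, for illustration only.
records = [
    {"id": "structure-1", "total_energy": -14.8, "n_atoms": 4},
    {"id": "structure-2", "total_energy": -7.0, "n_atoms": 2},
]
per_atom = {r["id"]: energy_per_atom(r["total_energy"], r["n_atoms"]) for r in records}
print(per_atom)
```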
A pretrained ALIGNN-FF (under active development right now) can be used for predicting several properties, such as:
run_alignn_ff.py --file_path alignn/examples/sample_data/POSCAR-JVASP-10.vasp --task="unrelaxed_energy"
run_alignn_ff.py --file_path alignn/examples/sample_data/POSCAR-JVASP-10.vasp --task="optimize"
run_alignn_ff.py --file_path alignn/examples/sample_data/POSCAR-JVASP-10.vasp --task="ev_curve"
To learn about other tasks, run:
run_alignn_ff.py -h
Performances
Please refer to JARVIS-Leaderboard to check the performance of ALIGNN models on several databases.
1) On JARVIS-DFT 2021 dataset (classification)
Model | Threshold | ALIGNN |
---|---|---|
Metal/non-metal classifier (OPT) | 0.01 eV | 0.92 |
Metal/non-metal classifier (MBJ) | 0.01 eV | 0.92 |
Magnetic/non-Magnetic classifier | 0.05 µB | 0.91 |
High/low SLME | 10 % | 0.83 |
High/low spillage | 0.1 | 0.80 |
Stable/unstable (ehull) | 0.1 eV | 0.94 |
High/low-n-Seebeck | -100 µVK-1 | 0.88 |
High/low-p-Seebeck | 100 µVK-1 | 0.92 |
High/low-n-powerfactor | 1000 µW(mK2)-1 | 0.74 |
High/low-p-powerfactor | 1000 µW(mK2)-1 | 0.74 |
2) On JARVIS-DFT 2021 dataset (regression)
Property | Units | MAD | CFID | CGCNN | ALIGNN | MAD: MAE |
---|---|---|---|---|---|---|
Formation energy | eV(atom)-1 | 0.86 | 0.14 | 0.063 | 0.033 | 26.06 |
Bandgap (OPT) | eV | 0.99 | 0.30 | 0.20 | 0.14 | 7.07 |
Total energy | eV(atom)-1 | 1.78 | 0.24 | 0.078 | 0.037 | 48.11 |
Ehull | eV | 1.14 | 0.22 | 0.17 | 0.076 | 15.00 |
Bandgap (MBJ) | eV | 1.79 | 0.53 | 0.41 | 0.31 | 5.77 |
Kv | GPa | 52.80 | 14.12 | 14.47 | 10.40 | 5.08 |
Gv | GPa | 27.16 | 11.98 | 11.75 | 9.48 | 2.86 |
Mag. mom | µB | 1.27 | 0.45 | 0.37 | 0.26 | 4.88 |
SLME (%) | No unit | 10.93 | 6.22 | 5.66 | 4.52 | 2.42 |
Spillage | No unit | 0.52 | 0.39 | 0.40 | 0.35 | 1.49 |
Kpoint-length | Å | 17.88 | 9.68 | 10.60 | 9.51 | 1.88 |
Plane-wave cutoff | eV | 260.4 | 139.4 | 151.0 | 133.8 | 1.95 |
єx (OPT) | No unit | 57.40 | 24.83 | 27.17 | 20.40 | 2.81 |
єy (OPT) | No unit | 57.54 | 25.03 | 26.62 | 19.99 | 2.88 |
єz (OPT) | No unit | 56.03 | 24.77 | 25.69 | 19.57 | 2.86 |
єx (MBJ) | No unit | 64.43 | 30.96 | 29.82 | 24.05 | 2.68 |
єy (MBJ) | No unit | 64.55 | 29.89 | 30.11 | 23.65 | 2.73 |
єz (MBJ) | No unit | 60.88 | 29.18 | 30.53 | 23.73 | 2.57 |
є (DFPT:elec+ionic) | No unit | 45.81 | 43.71 | 38.78 | 28.15 | 1.63 |
Max. piezoelectric strain coeff (dij) | CN-1 | 24.57 | 36.41 | 34.71 | 20.57 | 1.19 |
Max. piezo. stress coeff (eij) | Cm-2 | 0.26 | 0.23 | 0.19 | 0.147 | 1.77 |
Exfoliation energy | meV(atom)-1 | 62.63 | 63.31 | 50.0 | 51.42 | 1.22 |
Max. EFG | 10^21 Vm-2 | 43.90 | 24.54 | 24.7 | 19.12 | 2.30 |
avg. me | electron mass unit | 0.22 | 0.14 | 0.12 | 0.085 | 2.59 |
avg. mh | electron mass unit | 0.41 | 0.20 | 0.17 | 0.124 | 3.31 |
n-Seebeck | µVK-1 | 113.0 | 56.38 | 49.32 | 40.92 | 2.76 |
n-PF | µW(mK2)-1 | 697.80 | 521.54 | 552.6 | 442.30 | 1.58 |
p-Seebeck | µVK-1 | 166.33 | 62.74 | 52.68 | 42.42 | 3.92 |
p-PF | µW(mK2)-1 | 691.67 | 505.45 | 560.8 | 440.26 | 1.57 |
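The MAD:MAE column in the table is the ratio of the two error columns (for example, 0.86 / 0.033 ≈ 26.06 for formation energy). The sketch below computes such a ratio from raw values, assuming MAD denotes the mean absolute deviation of the targets about their mean and MAE the mean absolute error of the predictions; the numbers are made up for illustration.

```python
def mean_absolute_deviation(values):
    """Mean absolute deviation of the values about their mean."""
    mean = sum(values) / len(values)
    return sum(abs(v - mean) for v in values) / len(values)


def mean_absolute_error(targets, predictions):
    """Mean absolute error between targets and predictions."""
    return sum(abs(t - p) for t, p in zip(targets, predictions)) / len(targets)


# Made-up targets and predictions.
targets = [0.0, 1.0, 2.0, 3.0]
predictions = [0.1, 0.9, 2.2, 2.8]
mad = mean_absolute_deviation(targets)
mae = mean_absolute_error(targets, predictions)
print(f"MAD = {mad:.3f}, MAE = {mae:.3f}, MAD:MAE = {mad / mae:.2f}")
```

A larger ratio means the model error is small compared with the spread of the data, so a ratio near 1 indicates little improvement over predicting the mean.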
3) On Materials project 2018 dataset
The results from models other than ALIGNN are reported as given in corresponding papers, not necessarily reproduced by us.
Prop | Unit | MAD | CFID | CGCNN | MEGNet | SchNet | ALIGNN | MAD:MAE |
---|---|---|---|---|---|---|---|---|
Ef | eV(atom)-1 | 0.93 | 0.104 | 0.039 | 0.028 | 0.035 | 0.022 | 42.27 |
Eg | eV | 1.35 | 0.434 | 0.388 | 0.33 | - | 0.218 | 6.19 |
4) On QM9 dataset
Note the issue related to QM9 dataset. The results from models other than ALIGNN are reported as given in corresponding papers, not necessarily reproduced by us. These models were trained with same parameters as solid-state databases but for 1000 epochs.
Target | Units | SchNet | MEGNet | DimeNet++ | ALIGNN |
---|---|---|---|---|---|
HOMO | eV | 0.041 | 0.043 | 0.0246 | 0.0214 |
LUMO | eV | 0.034 | 0.044 | 0.0195 | 0.0195 |
Gap | eV | 0.063 | 0.066 | 0.0326 | 0.0381 |
ZPVE | eV | 0.0017 | 0.00143 | 0.00121 | 0.0031 |
µ | Debye | 0.033 | 0.05 | 0.0297 | 0.0146 |
α | Bohr3 | 0.235 | 0.081 | 0.0435 | 0.0561 |
R2 | Bohr2 | 0.073 | 0.302 | 0.331 | 0.5432 |
U0 | eV | 0.014 | 0.012 | 0.00632 | 0.0153 |
U | eV | 0.019 | 0.013 | 0.00628 | 0.0144 |
H | eV | 0.014 | 0.012 | 0.00653 | 0.0147 |
G | eV | 0.014 | 0.012 | 0.00756 | 0.0144 |
5) On hMOF dataset
Property | Unit | MAD | MAE | MAD:MAE | R2 | RMSE |
---|---|---|---|---|---|---|
Grav. surface area | m2 g-1 | 1430.82 | 91.15 | 15.70 | 0.99 | 180.89 |
Vol. surface area | m2 cm-3 | 561.44 | 107.81 | 5.21 | 0.91 | 229.24 |
Void fraction | No unit | 0.16 | 0.017 | 9.41 | 0.98 | 0.03 |
LCD | Å | 3.44 | 0.75 | 4.56 | 0.83 | 1.83 |
PLD | Å | 3.55 | 0.92 | 3.86 | 0.78 | 2.12 |
All adsp | mol kg-1 | 1.70 | 0.18 | 9.44 | 0.95 | 0.49 |
Adsp at 0.01bar | mol kg-1 | 0.12 | 0.04 | 3.00 | 0.77 | 0.11 |
Adsp at 2.5bar | mol kg-1 | 2.16 | 0.48 | 4.50 | 0.90 | 0.97 |
6) On qMOF dataset
MAE on electronic bandgap: 0.20 eV
7) On OMDB dataset
coming soon!
8) On HOPV dataset
coming soon!
9) On QETB dataset
coming soon!
10) On OpenCatalyst dataset
DataSplit | CGCNN | DimeNet | SchNet | DimeNet++ | ALIGNN | MAD: MAE |
---|---|---|---|---|---|---|
10k | 0.988 | 1.0117 | 1.059 | 0.8837 | 0.61 | - |
Useful notes (based on some of the queries we received)
- If you are using GPUs, make sure you have a compatible dgl-cuda version installed, for example dgl-cu101 or dgl-cu111, e.g. pip install dgl-cu111.
- While conventional '.cif' and '.pdb' files can be read using jarvis-tools, for complex files you might have to install cif2cell and pytraj respectively, i.e. pip install cif2cell==2.0.0a3 and conda install -c ambermd pytraj.
- Make sure you use a batch_size of 32 or 64 for large datasets, and not 2 as given in the example config file; otherwise training will take much longer and performance might drop a lot.
- Note that train_folder.py and pretrained.py in the alignn folder are executable Python scripts, so they should work even if you don't provide their absolute paths.
- Learn about the issue with QM9 results here: https://github.com/usnistgov/alignn/issues/54
- Make sure your pandas version is 1.2.3.
References
- Atomistic Line Graph Neural Network for improved materials property predictions
- Prediction of the Electron Density of States for Crystalline Compounds with Atomistic Line Graph Neural Networks (ALIGNN)
- Recent advances and applications of deep learning methods in materials science
- Designing High-Tc Superconductors with BCS-inspired Screening, Density Functional Theory and Deep-learning
- A Deep-learning Model for Fast Prediction of Vacancy Formation in Diverse Materials
- Graph neural network predictions of metal organic framework CO2 adsorption properties
- Rapid Prediction of Phonon Structure and Properties using an Atomistic Line Graph Neural Network (ALIGNN)
- Unified graph neural network force-field for the periodic table
- Large Scale Benchmark of Materials Design Methods
Please see detailed publications list here.
How to contribute
For detailed instructions, please see Contribution instructions
Correspondence
Please report bugs as Github issues (https://github.com/usnistgov/alignn/issues) or email to kamal.choudhary@nist.gov.
Funding support
NIST-MGI (https://www.nist.gov/mgi).
Code of conduct
Please see Code of conduct