Tools for working with MatPES.

Introduction

Welcome to MatPES, a foundational potential energy surface (PES) dataset for materials. This is a collaboration between the Materials Virtual Lab and the Materials Project.

Machine learning interatomic potentials (MLIPs) have revolutionized the field of computational materials science. MLIPs use ML to reproduce the PES (energies, forces, and stresses) of a collection of atoms, typically computed using an ab initio method such as density functional theory (DFT). This enables the simulation of materials at much larger length and longer time scales at near-ab initio accuracy.
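
For reference, the three PES quantities named above are linked through derivatives of the total energy. The relations below assume the usual textbook definitions; the symbols are introduced here for illustration, and the sign convention for stress differs between codes.

```latex
\mathbf{F}_i = -\frac{\partial E}{\partial \mathbf{r}_i},
\qquad
\sigma_{\alpha\beta} = \frac{1}{V}\,\frac{\partial E}{\partial \varepsilon_{\alpha\beta}}
```

Here $E$ is the total energy, $\mathbf{r}_i$ the position of atom $i$, $V$ the cell volume, and $\varepsilon_{\alpha\beta}$ the strain tensor. An MLIP that predicts $E$ as a differentiable function of the atomic positions and the cell can therefore obtain forces and stresses from the same model.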

One of the most exciting developments in the past few years is the emergence of universal MLIPs (uMLIPs, also known as materials foundation models), with near-complete coverage of the periodic table of elements. Examples include M3GNet,[^1] CHGNet,[^2] and MACE,[^3] to name a few. uMLIPs have broad applications, including materials discovery and the prediction of PES-derived properties such as elastic constants and phonon dispersions.

However, most current uMLIPs were trained on DFT relaxation calculations from the Materials Project. This dataset, referred to as MPF or MPTraj in the literature, suffers from several issues:

  1. The energies, forces, and stresses are not converged to the accuracies necessary to train a high quality MLIP.
  2. Most of the structures are near-equilibrium, with very little coverage of non-equilibrium local environments.
  3. The calculations were performed using the common Perdew-Burke-Ernzerhof (PBE) generalized gradient approximation (GGA) functional, even though improved functionals with better performance across diverse chemistries and bonding, such as the strongly constrained and appropriately normed (SCAN) meta-GGA functional, already exist.

MatPES is a continuing effort to address these limitations comprehensively.

Goals

The aims of MatPES are three-fold:

  1. Accuracy. The data in MatPES was computed using static DFT calculations with stringent convergence criteria. Please refer to the MatPESStaticSet in pymatgen for details.
  2. Diversity. The structures in MatPES are robustly sampled from 300 K MD simulations using the original M3GNet uMLIP.[^1] MatPES uses a modified version of DImensionality-Reduced Encoded Clusters with sTratified (DIRECT) sampling to ensure comprehensive coverage of structures and local environments.[^4]
  3. Quality. MatPES contains not only data computed using the PBE functional, but also the revised regularized SCAN (r2SCAN) meta-GGA functional. The r2SCAN functional recovers all 17 exact constraints presently known for meta-GGA functionals and has shown good transferable accuracy across diverse bonding and chemistries.
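
To make the idea behind stratified sampling concrete, here is a minimal sketch of the final selection step only. It assumes the structures have already been encoded as feature vectors, reduced in dimensionality, and assigned cluster labels; the function name `stratified_sample` and the toy data are hypothetical, and this is not the modified DIRECT implementation used for MatPES.

```python
from collections import defaultdict
import math


def stratified_sample(features, labels, per_cluster=1):
    """Pick up to `per_cluster` items from every cluster.

    features: list of equal-length feature vectors (e.g. reduced encodings).
    labels:   cluster label for each feature vector.
    Returns the sorted indices of the selected items.
    """
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same length")

    # Group item indices by cluster label.
    clusters = defaultdict(list)
    for idx, label in enumerate(labels):
        clusters[label].append(idx)

    selected = []
    for members in clusters.values():
        # Centroid of the cluster in feature space.
        dim = len(features[members[0]])
        centroid = [
            sum(features[i][d] for i in members) / len(members) for d in range(dim)
        ]
        # Rank members by Euclidean distance to the centroid (ties by index).
        ranked = sorted(members, key=lambda i: (math.dist(features[i], centroid), i))
        selected.extend(ranked[:per_cluster])
    return sorted(selected)


# Toy data (made up): two multi-member clusters and one singleton outlier.
features = [
    [0.0, 0.0], [0.2, 0.0], [0.1, 0.0],   # cluster 0
    [5.0, 5.0], [5.0, 5.4], [5.0, 6.0],   # cluster 1
    [9.0, 0.0],                           # cluster 2
]
labels = [0, 0, 0, 1, 1, 1, 2]
picked = stratified_sample(features, labels, per_cluster=1)
print(picked)
```

Because every cluster contributes at least one item, sparsely populated regions of feature space (the singleton above) are retained rather than being swamped by densely sampled, near-equilibrium ones.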

How to use MatPES

Download

You can download the entire MatPES dataset at MatPES.ai. Note that the links will not be functional until publication.

We have also provided a simple tool to extract subsets of the data, e.g., by elements or chemical system, via the matpes package.

pip install matpes
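
The kind of subsetting described above can be illustrated without relying on the matpes API, which is not shown here. The sketch below assumes the data has been loaded as a list of dictionaries with an `elements` field; that record layout and the helper names `chemsys_of`, `filter_by_chemsys`, and `filter_by_elements` are hypothetical and may not match the real MatPES schema or package.

```python
def chemsys_of(elements):
    """Return a canonical chemical-system string, e.g. ['O', 'Li'] -> 'Li-O'."""
    return "-".join(sorted(set(elements)))


def filter_by_chemsys(records, chemsys):
    """Keep records whose element set exactly matches `chemsys` (e.g. 'Li-O')."""
    target = chemsys_of(chemsys.split("-"))
    return [r for r in records if chemsys_of(r["elements"]) == target]


def filter_by_elements(records, elements):
    """Keep records that contain every element in `elements`."""
    required = set(elements)
    return [r for r in records if required <= set(r["elements"])]


# Hypothetical records for illustration; the real MatPES schema may differ.
records = [
    {"id": "a", "elements": ["Li", "O"]},
    {"id": "b", "elements": ["O", "Li", "Fe"]},
    {"id": "c", "elements": ["Fe", "O"]},
    {"id": "d", "elements": ["O", "Li"]},
]

li_o = filter_by_chemsys(records, "O-Li")      # order of elements does not matter
with_fe = filter_by_elements(records, ["Fe"])  # any system containing Fe
print([r["id"] for r in li_o], [r["id"] for r in with_fe])
```

Filtering by chemical system is an exact match on the element set, whereas filtering by elements is a containment test, so the second is the one to use when every system containing a given element is wanted.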

Example code

Exploring

The MatPES Explorer provides a statistical visualization of the dataset.

Pre-trained uMLIPs

Together with MatPES, we have released a set of MatPES uMLIPs in various architectures (M3GNet, CHGNet, TensorNet) in the MatGL package. This is by far the easiest way to get started with using MatPES.

Notebooks

We have provided a series of Jupyter notebooks demonstrating how to load the MatPES dataset, train a model, and perform fine-tuning.

Citing MatPES

If you use MatPES, please cite the following work:

Aaron Kaplan, Runze Liu, Ji Qi, Tsz Wai Ko, Bowen Deng, Gerbrand Ceder, Kristin A. Persson, Shyue Ping Ong.
A foundational potential energy surface dataset for materials. Submitted.

[^1]: Chen, C.; Ong, S. P. A Universal Graph Deep Learning Interatomic Potential for the Periodic Table. Nat Comput Sci 2022, 2 (11), 718-728. DOI: 10.1038/s43588-022-00349-3.

[^2]: Deng, B.; Zhong, P.; Jun, K.; Riebesell, J.; Han, K.; Bartel, C. J.; Ceder, G. CHGNet as a Pretrained Universal Neural Network Potential for Charge-Informed Atomistic Modelling. Nat Mach Intell 2023, 5 (9), 1031-1041. DOI: 10.1038/s42256-023-00716-3.

[^3]: Batatia, I.; Kovacs, D. P.; Simm, G.; Ortner, C.; Csanyi, G. MACE: Higher Order Equivariant Message Passing Neural Networks for Fast and Accurate Force Fields. Advances in Neural Information Processing Systems 2022, 35, 11423-11436.

[^4]: Qi, J.; Ko, T. W.; Wood, B. C.; Pham, T. A.; Ong, S. P. Robust Training of Machine Learning Interatomic Potentials with Dimensionality Reduction and Stratified Sampling. npj Computational Materials 2024, 10 (43), 1-11. DOI: 10.1038/s41524-024-01227-4.
