Matbench Discovery

A machine learning benchmark that simulates high-throughput screening for new materials and ranks energy models by their ability to increase the hit rate of stable crystals.

Requires Python 3.9+.

Matbench Discovery is an interactive leaderboard and associated PyPI package for benchmarking ML energy models on a task designed to closely emulate a real-world computational materials discovery workflow. In this workflow, the models act as a triaging step before DFT, deciding how to allocate a limited compute budget for structure relaxations.
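
To make that triage step concrete, here is a minimal sketch of the kind of pre-filtering such a workflow performs, assuming pymatgen structures. The `predict_energy` callable is a hypothetical stand-in for any ML energy model and is not part of the matbench-discovery API.

```python
from collections.abc import Callable, Sequence

from pymatgen.core import Structure


def triage(
    candidates: Sequence[Structure],
    predict_energy: Callable[[Structure], float],  # hypothetical stand-in for any ML energy model
    budget: int,  # how many DFT relaxations we can afford
) -> list[Structure]:
    """Keep the `budget` candidates with the lowest predicted energy so
    that only the most promising structures reach DFT relaxation."""
    ranked = sorted(candidates, key=predict_energy)
    return ranked[:budget]
```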

We welcome contributions that add new models to the leaderboard through GitHub PRs. See the usage and contributing guide for details.

Several new energy models specifically designed to handle unrelaxed structures were published in 2021/22.

Such models are suited to a materials discovery workflow in which they pre-filter and/or pre-relax structures that are then fed into high-throughput DFT. Yet even for someone trying to keep up with the literature, it's unclear which model performs best at this task. Consequently, we think a follow-up to Chris Bartel's 2020 work is in order:

A critical examination of compound stability predictions from machine-learned formation energies

This project aims to complement Matbench using the WBM dataset published in Predicting stable crystalline compounds using chemical similarity. The authors generated ~250k structures via chemical-similarity-based elemental substitution and relaxed all of them; ~20k (about 10%) were found to lie on the Materials Project convex hull. The substitution process was run for 5 iterations, a unique and compelling feature of the dataset since it enables out-of-distribution testing: because repeated substitutions should, on average, increase chemical dissimilarity, we can measure how a model performs on structures increasingly different from its training set (which for all models in this benchmark is currently restricted to MP).
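
To sketch what this out-of-distribution test might look like in practice, one can group a model's errors by substitution round. The column names below (`iteration`, `e_form_true`, `e_form_pred`) and the toy values are assumptions for illustration, not the actual WBM data schema.

```python
import pandas as pd

# One row per WBM structure: its substitution round, the DFT formation
# energy, and a model's prediction (all values here are made up).
df = pd.DataFrame(
    {
        "iteration": [1, 1, 2, 3, 5],
        "e_form_true": [-0.12, 0.30, -0.05, 0.41, -0.22],  # eV/atom
        "e_form_pred": [-0.10, 0.25, 0.02, 0.35, -0.30],  # eV/atom
    }
)

# Mean absolute error per substitution round: an MAE that grows with the
# iteration number signals degrading accuracy on increasingly
# out-of-distribution structures.
mae_by_round = (df.e_form_pred - df.e_form_true).abs().groupby(df.iteration).mean()
print(mae_by_round)
```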

A good set of baseline models comprises CGCNN, Wren, and Voronoi tessellation combined with a random forest. Alongside these, the benchmark includes BOWSR and M3GNet to see how many of the 20k stable structures each model recovers and how its performance changes as a function of iteration number, i.e. how well it extrapolates. Like Matbench, this benchmark accepts future model submissions via PRs to this repo.

Our goal with this site is an interactive dashboard that makes it easy for researchers to compare energy models on metrics like precision, recall, and discovery acceleration, and so to make an informed choice of the model that best suits their needs: trading off the compute savings that a higher hit rate brings against the more complete coverage of a materials space of interest that comes with higher recall.
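
For intuition, the sketch below derives these metrics from energies above the convex hull, classifying a structure as stable when its energy sits at or below a threshold. Defining discovery acceleration as the model's hit rate divided by the hit rate of unguided (random) selection is our assumption here, not a definition taken from this package.

```python
import numpy as np


def stability_metrics(
    e_above_hull_true: np.ndarray,
    e_above_hull_pred: np.ndarray,
    threshold: float = 0.0,  # eV/atom; at or below the hull counts as stable
) -> dict[str, float]:
    """Precision, recall and an assumed discovery acceleration factor
    (model hit rate / random-selection hit rate) for hull-based
    stability classification."""
    true_stable = np.asarray(e_above_hull_true) <= threshold
    pred_stable = np.asarray(e_above_hull_pred) <= threshold

    true_pos = int((true_stable & pred_stable).sum())
    precision = true_pos / max(int(pred_stable.sum()), 1)
    recall = true_pos / max(int(true_stable.sum()), 1)
    random_hit_rate = true_stable.mean()  # prevalence of stable crystals
    acceleration = precision / random_hit_rate if random_hit_rate else float("nan")
    return {"precision": precision, "recall": recall, "acceleration": acceleration}
```

An acceleration factor above 1 means the model enriches the DFT queue with stable crystals relative to random screening.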

On a more philosophical note, another primary goal of this benchmark is to at least partly answer how useful ML energy models really are for accelerating inorganic crystal search, and whether DFT emulators like M3GNet or one-shot predictors like Wren are better suited to the job.
