
Fair and transparent benchmark of machine learning interatomic potentials (MLIPs), beyond error-based regression metrics


MLIP Arena


MLIP Arena is a unified platform for evaluating foundation machine learning interatomic potentials (MLIPs) beyond conventional error metrics. It focuses on revealing the physical soundness learned by MLIPs and assessing their utilitarian performance agnostic to underlying model architecture. The platform's benchmarks are specifically designed to evaluate the readiness and reliability of open-source, open-weight models in accurately reproducing both qualitative and quantitative behaviors of atomic systems.

MLIP Arena uses Prefect, a modern Pythonic workflow orchestrator, to enable advanced task/flow chaining and caching.

[!NOTE] Contributions of new tasks are very welcome! If you're interested in joining the effort, please reach out to Yuan at cyrusyc@berkeley.edu. See the project page for some outstanding tasks, or propose a new one in Discussion.


Installation

From PyPI (Prefect workflow only, without pretrained models)

pip install mlip-arena

From source

[!CAUTION] We recommend a clean build in a new virtual environment because of compatibility issues between several popular MLIPs. We provide a single installation script that uses uv to minimize package conflicts and speed up installation.

[!CAUTION] To download the fairchem OMat24 checkpoint automatically, make sure you have been granted download access to their Hugging Face model repo (not the dataset repo), and log in locally on your machine with huggingface-cli login (see HF hub authentication).

Linux

# (Optional) Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

git clone https://github.com/atomind-ai/mlip-arena.git
cd mlip-arena

# One script uv pip installation
bash scripts/install-linux.sh

Mac

# (Optional) Install uv
curl -LsSf https://astral.sh/uv/install.sh | sh
source $HOME/.local/bin/env

git clone https://github.com/atomind-ai/mlip-arena.git
cd mlip-arena

# One script uv pip installation
bash scripts/install-macosx.sh

Quickstart

The first example: Molecular Dynamics

Arena provides a unified interface to run all the compiled MLIPs. This can be achieved simply by looping through MLIPEnum:

from mlip_arena.models import MLIPEnum
from mlip_arena.tasks.md import run as MD
# Convenience import, equivalent to the line above:
# from mlip_arena.tasks import MD
from mlip_arena.tasks.utils import get_calculator

from ase import units
from ase.build import bulk

atoms = bulk("Cu", "fcc", a=3.6)

results = []

for model in MLIPEnum:
    result = MD(
        atoms=atoms,
        calculator=get_calculator(
            model,
            calculator_kwargs=dict(),  # passed to the model's calculator
            dispersion=True,
            # passed to TorchDFTD3Calculator
            dispersion_kwargs=dict(damping="bj", xc="pbe", cutoff=40.0 * units.Bohr),
        ),
        ensemble="nve",
        dynamics="velocityverlet",
        total_time=1e3, # 1 ps = 1e3 fs
        time_step=2, # fs
    )
    results.append(result)
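To make the ensemble="nve" and dynamics="velocityverlet" settings more concrete, here is a minimal, dependency-free sketch of the velocity Verlet scheme applied to a 1D harmonic oscillator. It is not how MLIP Arena or ASE implements MD: the function names, the spring constant, the mass, and the reduced units are all illustrative assumptions. The step count of 500 assumes the number of steps is the total time divided by the time step (1e3 fs / 2 fs).

```python
def velocity_verlet(x, v, force, mass, dt, n_steps):
    """Integrate Newton's equation of motion; return the list of (x, v) states."""
    trajectory = [(x, v)]
    f = force(x)
    for _ in range(n_steps):
        v_half = v + 0.5 * dt * f / mass  # first half-step velocity update
        x = x + dt * v_half               # full-step position update
        f = force(x)                      # force at the new position
        v = v_half + 0.5 * dt * f / mass  # second half-step velocity update
        trajectory.append((x, v))
    return trajectory


# Hypothetical toy system in reduced units (not fs or eV).
K, MASS = 1.0, 1.0


def harmonic_force(x):
    return -K * x


def total_energy(x, v):
    return 0.5 * MASS * v * v + 0.5 * K * x * x


n_steps = 500  # same count as 1e3 fs / 2 fs, under the assumption stated above
trajectory = velocity_verlet(1.0, 0.0, harmonic_force, MASS, dt=0.01, n_steps=n_steps)
energies = [total_energy(x, v) for x, v in trajectory]
drift = max(energies) - min(energies)  # stays small in an NVE run
```

In an NVE simulation no thermostat or barostat acts on the system, so the total energy should fluctuate only slightly around its initial value. A large drift usually points to a time step that is too long or to forces that are not consistent with the energy.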

🚀 Parallelize Benchmarks at Scale

To run multiple benchmarks in parallel, call .submit on the task function instead of calling it directly, and wrap all the tasks in a flow so that they are dispatched to workers for concurrent execution. See the Prefect documentation on tasks and flows for more details.

...
from prefect import flow

@flow
def run_all_tasks():

    futures = []
    for model in MLIPEnum:
        future = MD.submit(
            atoms=atoms,
            ...
        )
        futures.append(future)

    return [f.result(raise_on_failure=False) for f in futures]
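The submit-then-collect pattern can be tried without Prefect or any MLIP installed. The sketch below uses Python's standard concurrent.futures module as a stand-in for Prefect's task runner, so it is an analogy rather than Prefect code. The fake_md function and the model names are hypothetical placeholders, and returning the exception object for a failed task only loosely mirrors raise_on_failure=False.

```python
from concurrent.futures import ThreadPoolExecutor

MODEL_NAMES = ["model_a", "model_b", "model_c"]  # hypothetical placeholders


def fake_md(model_name):
    """Stand-in for an expensive benchmark task; one model fails on purpose."""
    if model_name == "model_b":
        raise RuntimeError(f"{model_name} failed")
    return {"model": model_name, "steps": 500}


def run_all_tasks():
    with ThreadPoolExecutor(max_workers=3) as pool:
        # Submit every task first so they can run concurrently.
        futures = [pool.submit(fake_md, name) for name in MODEL_NAMES]
        results = []
        for future in futures:
            # exception() blocks until the task finishes and returns None on success.
            error = future.exception()
            results.append(error if error is not None else future.result())
    return results


results = run_all_tasks()
```

Collecting failures instead of raising them lets one broken model fail without discarding the results of the others.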

For a more practical example, please refer to MOF classification.

List of implemented tasks

The implemented tasks are available as mlip_arena.tasks.<module>.run, or through from mlip_arena.tasks import * for convenience (the convenience import currently fails if phonopy is not installed).

  • OPT: Structure optimization
  • EOS: Equation of state (energy-volume scan)
  • MD: Molecular dynamics with flexible dynamics (NVE, NVT, NPT) and temperature/pressure scheduling (annealing, shearing, etc.)
  • PHONON: Phonon calculation driven by phonopy
  • NEB: Nudged elastic band
  • NEB_FROM_ENDPOINTS: Nudged elastic band with convenient image interpolation (linear or IDPP)
  • ELASTICITY: Elastic tensor calculation
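As an illustration of the linear image interpolation mentioned for NEB_FROM_ENDPOINTS, the sketch below generates intermediate images between two endpoint geometries in plain Python. The interpolate_images helper is hypothetical and is not the MLIP Arena API. It ignores periodic boundary conditions and does not perform IDPP refinement.

```python
def interpolate_images(start, end, n_images):
    """Return n_images geometries linearly spaced between start and end.

    start and end are lists of (x, y, z) tuples with identical atom ordering.
    The endpoints themselves are not included in the returned list.
    """
    if len(start) != len(end):
        raise ValueError("endpoints must contain the same number of atoms")
    images = []
    for i in range(1, n_images + 1):
        t = i / (n_images + 1)  # fractional progress along the path, 0 < t < 1
        images.append([
            tuple(a + t * (b - a) for a, b in zip(p, q))
            for p, q in zip(start, end)
        ])
    return images


# Two-atom toy system: the second atom moves from x = 1.0 to x = 3.0.
start = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
end = [(0.0, 0.0, 0.0), (3.0, 0.0, 0.0)]
images = interpolate_images(start, end, n_images=3)
```

Linear interpolation is the simplest initial guess for a reaction path. It can place atoms unphysically close together along the way, which is the problem that IDPP interpolation is designed to reduce.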

Contribute

MLIP Arena is now in pre-alpha. If you're interested in joining the effort, please reach out to Yuan at cyrusyc@berkeley.edu.

Development

git lfs fetch --all
git lfs pull
streamlit run serve/app.py

Add new benchmark tasks (WIP)

[!NOTE] Please reuse, extend, or chain the general tasks defined above.

Add new MLIP models

If you have pretrained MLIP models that you would like to contribute to MLIP Arena and have benchmarked in real time, there are two ways:

External ASE Calculator (easy)

  1. Implement a new ASE Calculator class in mlip_arena/models/externals.
  2. Name your class after your awesome model and add the same name to the registry with metadata.

[!CAUTION] Remove unnecessary outputs from the results class attribute to avoid errors in MD simulations. Please refer to the other class definitions for examples.
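One way to read the caution above is sketched below: filter the results dictionary down to the properties that downstream tasks consume. The prune_results helper and the set of kept keys are assumptions made for illustration. They are not taken from the MLIP Arena code base, so check the existing classes in mlip_arena/models/externals for the convention actually used.

```python
# Assumption: only these properties are needed downstream; adjust for your model.
KEPT_PROPERTIES = ("energy", "forces", "stress")


def prune_results(results, keep=KEPT_PROPERTIES):
    """Return a copy of a calculator results dict containing only the kept keys."""
    return {key: value for key, value in results.items() if key in keep}


# Hypothetical raw output with an extra, model-specific entry.
raw = {
    "energy": -3.5,
    "forces": [[0.0, 0.0, 0.0]],
    "latent_features": [0.1, 0.2],
}
pruned = prune_results(raw)
```

In a calculator class, a filter like this would typically be applied at the end of the calculate method, after the model's outputs have been written to the results attribute.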

Hugging Face Model (recommended, difficult)

  1. Make your awesome model class inherit from the Hugging Face ModelHubMixin class. We recommend PyTorchModelHubMixin.
  2. Create a new Hugging Face Model repository and upload the model file using the push_to_hub method.
  3. Follow the template to code the I/O interface for your model here.
  4. Update the model registry with metadata.

Citation

If you find the work useful, please consider citing the following:

@inproceedings{
    chiang2025mlip,
    title={{MLIP} Arena: Advancing Fairness and Transparency in Machine Learning Interatomic Potentials through an Open and Accessible Benchmark Platform},
    author={Yuan Chiang and Tobias Kreiman and Elizabeth Weaver and Ishan Amin and Matthew Kuner and Christine Zhang and Aaron Kaplan and Daryl Chrzan and Samuel M Blau and Aditi S. Krishnapriyan and Mark Asta},
    booktitle={AI for Accelerated Materials Design - ICLR 2025},
    year={2025},
    url={https://openreview.net/forum?id=ysKfIavYQE}
}
