Skip to main content

An open benchmark suite and reference baselines for high-entropy alloy phase prediction.

Project description

hea-bench

An open, reproducible benchmark suite and reference baselines for high-entropy alloy (HEA) phase prediction.

DOI License: MIT tests: 156 coverage: 86.7%

TL;DR

  • A consolidated, deduplicated open dataset of 7,784 experimentally characterized multi-principal element alloys, merged from three primary sources (Borg 2020, Pei 2020, Peivaste 2023) with per-row source provenance.
  • Reference baseline implementations of the four canonical empirical phase-prediction rules (Yeh ΔSmix, Zhang δ, Guo-Liu VEC, Yang-Zhang Ω), wrapped as proper diagnostic classifiers with sensitivity / specificity / Wilson 95% CIs.
  • A clean, dependency-free Python API (pip install hea-bench) and an in-browser frontend that runs the same library via Pyodide/WebAssembly — no install, no server, no JavaScript re-implementation.

Using an AI coding agent to integrate this? See AGENTS.md for a machine-oriented guide to the API, exact return types and units, the fastest path to each task, and the mistakes to avoid.

Headline benchmark numbers (v0.1.0)

Running the four canonical rules against the consolidated benchmark produces the reference baselines below. These are pinned in tests so any drift in dataset, descriptor code, or rule thresholds surfaces as a test failure.

Rule n_eval Accuracy Sens (single-phase) Spec (multi-phase) Youden's J
Zhang δ < 6.5% 6,651 56.7% 99.0% 8.5% 0.075
Yang Ω > 1.1 6,651 54.4% 95.8% 7.4% 0.032

The Guo–Liu VEC rule predicts crystal structure rather than single-vs-multi, so it's evaluated stratified to single-phase observations (BCC|FCC only):

Rule n_eval Accuracy FCC sensitivity BCC sensitivity
Guo–Liu VEC (FCC if VEC ≥ 8.0, BCC if VEC < 6.87) 3,463 66.9% 92.4% 48.3%

Yeh ΔSmix is descriptive (no phase-prediction claim attached) — 47% of the consolidated benchmark passes the 1.5R HEA-class threshold, 37% sits in the MEA bin, 16% is dilute.

The publishable observation: on a consolidated benchmark drawn from three independent open sources, both binary rules collapse to "predict single-phase almost always" (Youden's J ~ 0.03–0.08), and the VEC rule misses about half of observed BCC alloys despite catching 92% of FCC alloys. The canonical rules generalize poorly.

Quick start (Python)

pip install hea-bench
import hea_bench

cantor = {"Co": 0.2, "Cr": 0.2, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}

hea_bench.smix(cantor)               # 13.381 J/(mol·K)  = R · ln 5
hea_bench.delta(cantor)              # 3.164 % atomic-size mismatch
hea_bench.vec(cantor)                # 8.0 valence electrons
hea_bench.mixing_enthalpy(cantor)    # -4.16 kJ/mol  (Miedema)
hea_bench.omega(cantor)              # 5.79  (Yang-Zhang)

# Apply the canonical rules
from hea_bench.rules import zhang_delta, yang_omega, guo_vec
zhang_delta.predict(cantor)          # 'single-phase'
yang_omega.predict(cantor)           # 'single-phase'
guo_vec.predict(cantor)              # 'FCC'

# Run the full rule benchmark against the consolidated v0.1.0 dataset
from hea_bench.evaluate import build_report
report = build_report()
print(report["rules"]["zhang_delta_6_5"]["accuracy"])  # 0.5670

Quick start (CLI)

hea-bench --version
python -m hea_bench.evaluate           # run all 4 rules on v0.1.0
python -m hea_bench.benchmark.coverage # coverage analysis on v0.1.0

Quick start (browser, no install)

The same Python library runs in a browser tab via Pyodide. After cloning:

python -m http.server 8000 --directory web
# open http://localhost:8000

First load downloads the Pyodide runtime (~10 MB, cached after) and the hea_bench wheel. Then the in-page UI computes descriptors locally — same code, same numerics, no server.

Architecture

                                ┌────────────────────────────┐
                                │  data/consolidated/v0.1.0/ │
                                │  - consolidated.csv        │
                                │  - rule_baselines.json     │
                                │  - coverage_report.json    │
                                │  - manifest.json           │
                                └─────────────▲──────────────┘
                                              │
                                              │ produced by
                                              │
   ┌─────────────────────┐    ┌───────────────┴───────────────┐
   │  data/raw/          │    │  src/hea_bench/               │
   │  - borg2020/        │───►│  - benchmark/                 │
   │  - pei2020/         │    │      consolidate.py           │
   │  - peivaste/        │    │      coverage.py              │
   │  (per-source READMEs│    │      loaders/{borg,pei,...}.py│
   │   + provenance)     │    │  - descriptors/{size, vec,    │
   └─────────────────────┘    │      melting, miedema, omega} │
                              │  - rules/{yeh, zhang,         │
                              │      guo, yang}               │
                              │  - classifiers/               │
                              │      diagnostic_stats.py      │
                              │  - evaluate.py                │
                              └──────────────┬────────────────┘
                                             │
                                             │ also runs in
                                             ▼
                              ┌──────────────────────────────┐
                              │  web/   (Pyodide front-end)  │
                              │  - index.html                │
                              │  - dist/hea_bench-*.whl      │
                              └──────────────────────────────┘

What's in the benchmark

data/consolidated/v0.1.0/consolidated.csv7,784 unique compositions × 14 columns:

  • composition_key — alphabetically sorted element symbols + 4-decimal mole fractions, the canonical join key
  • n_elements, sources (semicolon-separated)
  • canonical_phase — one of BCC / FCC / HCP / multi-phase (blank when the contributing sources disagree)
  • has_conflict — 1 when the canonical_phase is blank because of a source-label disagreement
  • Per-source canonical and raw labels preserved verbatim
  • borg_processing, borg_doi, source_row_ids for provenance

100 of the 7,784 compositions are cross-source label conflicts — flagged for downstream resolution rather than silently picked. The sources are: Borg 2020 (740 alloys), Pei 2020 (1,209 alloys), Peivaste 2023 (7,747 alloys).

See data/consolidated/v0.1.0/README.md for the full schema, per-source attribution, and a complete description of the consolidation rules. See data/raw/ for per-source provenance, licenses, and SHA-256s.

What's covered

  • 86.7% of the 7,784 compositions are scorable by every descriptor (δ, VEC, T_m, ΔS_mix, ΔH_mix, Ω) with the current 24-element ELEMENTAL_DATA table
  • 99.6% are scorable for Miedema-based descriptors only (the vendored matminer pair table covers 75 elements)
  • Top elements whose addition would lift coverage to ~95%: Mg, C, Zn, B, Sn, Re (all already in the matminer pair table — pending v0.2.0 data release)

Re-run the coverage analysis on your own version of the dataset with:

python -m hea_bench.benchmark.coverage

Sources and attribution

Every primary source is cited per-row in the consolidated CSV. The data files in data/raw/ carry per-source READMEs with DOIs, licenses, and acquisition SHA-256s.

Source Citation License Status
Borg 2020 Sci. Data 7, 430 (doi:10.1038/s41597-020-00768-9) CC-BY-4.0 Mirrored
Pei 2020 npj Comput. Mater. 6, 50 (doi:10.1038/s41524-020-0308-7) CC-BY-4.0 Mirrored
Peivaste 2023 Sci. Rep. 13, 22556 + GitHub none on data Pointer-only (fetch.py)
Miedema pair enthalpies matminer MiedemaLiquidDeltaHf.tsv BSD-3-Clause Vendored (see descriptors/data/)

Project layout

hea-bench/
├── data/
│   ├── raw/             per-source data with READMEs, licenses, SHAs
│   └── consolidated/    versioned benchmark releases (v0.1.0 here)
├── src/hea_bench/
│   ├── benchmark/       loaders, consolidator, coverage analysis
│   ├── descriptors/     ΔS_mix, δ, VEC, T_m, ΔH_mix, Ω + data tables
│   ├── rules/           four canonical empirical rules as classifiers
│   ├── classifiers/     diagnostic-stats machinery
│   ├── composition.py   formula parser, normalizer
│   ├── constants.py     R = 8.314
│   ├── evaluate.py      orchestrator: rules vs benchmark → headline stats
│   └── cli.py           command-line entry point
├── tests/               155 tests, all passing
├── web/                 Pyodide browser frontend
└── pyproject.toml

Development

git clone <repo>
cd hea-bench
pip install -e ".[dev,data]"
python -m pytest tests/ -q

After modifying any descriptor code or vendored data, rebuild the Pyodide wheel:

python -m pip wheel . --no-deps -w web/dist

Contributing and support

Contributions, bug reports, and dataset additions are welcome. See CONTRIBUTING.md for development setup, the testing convention, and the data-provenance policy. To report a bug or ask a question, open a GitHub issue; for direct contact, email the maintainer at davjfies@gmail.com. Participation is governed by the Code of Conduct.

License

MIT. The vendored matminer Miedema data files remain under their upstream BSD-3-Clause license, preserved at descriptors/data/LICENSE.matminer.txt.

Citation

Citation metadata in CITATION.cff. When citing hea-bench, please also cite the original source datasets (Borg, Pei, Peivaste) and matminer — see data/raw/<source>/README.md for each source's preferred citation.

hea-bench is archived on Zenodo. The concept DOI 10.5281/zenodo.20346287 always resolves to the latest version; v0.1.0 specifically is 10.5281/zenodo.20346288.

Acknowledgements

All numerical parameters, formulas, threshold values, and benchmark numbers are derived from cited primary sources or computed in this codebase from documented inputs; the author verified outputs against the cited literature.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

hea_bench-0.1.0.tar.gz (328.6 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

hea_bench-0.1.0-py3-none-any.whl (63.0 kB view details)

Uploaded Python 3

File details

Details for the file hea_bench-0.1.0.tar.gz.

File metadata

  • Download URL: hea_bench-0.1.0.tar.gz
  • Upload date:
  • Size: 328.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.1

File hashes

Hashes for hea_bench-0.1.0.tar.gz
Algorithm Hash digest
SHA256 85401203f34c184a8a008991d92f657da8ad14b5c59541fe5fbfc0b1e2bc5617
MD5 cce2d51117d1063a00995a9cbea6d618
BLAKE2b-256 e41fed46a1ed5c64426115beb66b619d07ef54f0deb789cce83cbd01da59a6e9

See more details on using hashes here.

File details

Details for the file hea_bench-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: hea_bench-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 63.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.1

File hashes

Hashes for hea_bench-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 daffb0970fb050b52e936bc1220d14849743b464006de84dc26c4069f2e395d5
MD5 0d0a6b23b143db8f11096026a2bd0fb6
BLAKE2b-256 9fa70803807ecb9ad329b60c051a731317e4870eaa588e8c02607ba4b0152e14

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page