hea-bench
An open, reproducible benchmark suite and reference baselines for high-entropy alloy (HEA) phase prediction.
TL;DR
- A consolidated, deduplicated open dataset of 7,784 experimentally characterized multi-principal element alloys, merged from three primary sources (Borg 2020, Pei 2020, Peivaste 2023) with per-row source provenance.
- Reference baseline implementations of the four canonical empirical phase-prediction rules (Yeh ΔSmix, Zhang δ, Guo-Liu VEC, Yang-Zhang Ω), wrapped as proper diagnostic classifiers with sensitivity / specificity / Wilson 95% CIs.
- A clean, dependency-free Python API (pip install hea-bench) and an in-browser frontend that runs the same library via Pyodide/WebAssembly: no install, no server, no JavaScript re-implementation.
Using an AI coding agent to integrate this? See AGENTS.md for a machine-oriented guide to the API, exact return types and units, the fastest path to each task, and the mistakes to avoid.
Headline benchmark numbers (v0.1.0)
Running the four canonical rules against the consolidated benchmark produces the reference baselines below. These are pinned in tests so any drift in dataset, descriptor code, or rule thresholds surfaces as a test failure.
| Rule | n_eval | Accuracy | Sens (single-phase) | Spec (multi-phase) | Youden's J |
|---|---|---|---|---|---|
| Zhang δ < 6.5% | 6,651 | 56.7% | 99.0% | 8.5% | 0.075 |
| Yang Ω > 1.1 | 6,651 | 54.4% | 95.8% | 7.4% | 0.032 |
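The pinning described above can be pictured as a small regression check. This is only a sketch: the report layout (`report["rules"][...]["accuracy"]`) and the key `zhang_delta_6_5` follow the quick-start snippet in this README, while `find_drift`, `PINNED_ACCURACY`, and the tolerance are hypothetical names and values chosen for illustration, not the project's actual test code.

```python
import math

# Pinned v0.1.0 accuracy, copied from the table above.
# Only the key shown in the quick start is used; other rule keys are not guessed.
PINNED_ACCURACY = {
    "zhang_delta_6_5": 0.567,
}


def find_drift(report, pinned=PINNED_ACCURACY, abs_tol=5e-4):
    """Return {rule: (expected, actual)} for every pinned rule whose accuracy moved."""
    drift = {}
    for rule, expected in pinned.items():
        actual = report["rules"][rule]["accuracy"]
        if not math.isclose(actual, expected, abs_tol=abs_tol):
            drift[rule] = (expected, actual)
    return drift


# A hand-built report standing in for the output of build_report().
example_report = {"rules": {"zhang_delta_6_5": {"accuracy": 0.5670}}}
print(find_drift(example_report))  # an empty dict means no drift
```

In a real test suite the dictionary returned by `find_drift` would presumably be asserted empty, so that a change in the dataset, descriptor code, or thresholds fails loudly.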
The Guo–Liu VEC rule predicts crystal structure rather than single- versus multi-phase, so it is evaluated only on single-phase observations labeled BCC or FCC:
| Rule | n_eval | Accuracy | FCC sensitivity | BCC sensitivity |
|---|---|---|---|---|
| Guo–Liu VEC (FCC if VEC ≥ 8.0, BCC if VEC < 6.87) | 3,463 | 66.9% | 92.4% | 48.3% |
Yeh ΔSmix is descriptive (no phase-prediction claim attached) — 47% of the consolidated benchmark passes the 1.5R HEA-class threshold, 37% sits in the MEA bin, 16% is dilute.
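The thresholds quoted in the tables and in the paragraph above are simple enough to write as plain functions. The sketch below is independent of the hea_bench API, and the function names are hypothetical. Three details are assumptions not stated in this README: the label returned for VEC values between 6.87 and 8.0, the 1.0R lower bound of the MEA bin, and whether the 1.5R threshold is inclusive.

```python
R = 8.314  # gas constant in J/(mol·K), the value listed for constants.py


def zhang_delta_rule(delta_pct):
    """Zhang: single-phase if the atomic-size mismatch is below 6.5 %."""
    return "single-phase" if delta_pct < 6.5 else "multi-phase"


def yang_omega_rule(omega):
    """Yang-Zhang: single-phase if Omega exceeds 1.1."""
    return "single-phase" if omega > 1.1 else "multi-phase"


def guo_vec_rule(vec):
    """Guo-Liu: FCC if VEC >= 8.0, BCC if VEC < 6.87."""
    if vec >= 8.0:
        return "FCC"
    if vec < 6.87:
        return "BCC"
    # Assumption: the intermediate window is reported as a mixed structure.
    return "FCC+BCC"


def yeh_entropy_class(smix):
    """Yeh: descriptive entropy class from the mixing entropy in J/(mol·K)."""
    if smix >= 1.5 * R:      # assumption: threshold treated as inclusive
        return "HEA"
    if smix >= 1.0 * R:      # assumption: MEA bin starts at 1.0R
        return "MEA"
    return "dilute"


# Descriptor values for the Cantor alloy, taken from the quick start.
print(zhang_delta_rule(3.164), yang_omega_rule(5.79), guo_vec_rule(8.0))
```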
The publishable observation: on a consolidated benchmark drawn from three independent open sources, both binary rules collapse to "predict single-phase almost always" (Youden's J ~ 0.03–0.08), and the VEC rule misses about half of observed BCC alloys despite catching 92% of FCC alloys. The canonical rules generalize poorly.
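To make the "predict single-phase almost always" reading concrete, here is a minimal sketch of the diagnostic statistics involved: sensitivity, specificity, Youden's J = sensitivity + specificity − 1, and a Wilson score interval. It is not the code in classifiers/diagnostic_stats.py; the function names are hypothetical and the confusion-matrix counts in the example are invented so that the rates match the Zhang δ row.

```python
import math


def wilson_interval(successes, n, z=1.959964):
    """Wilson score interval for a binomial proportion (z=1.96 gives ~95 %)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = successes / n
    denom = 1.0 + z * z / n
    center = (p + z * z / (2.0 * n)) / denom
    half = z * math.sqrt(p * (1.0 - p) / n + z * z / (4.0 * n * n)) / denom
    return center - half, center + half


def diagnostic_stats(tp, fn, tn, fp):
    """Treat 'single-phase' as the positive class and summarize a binary rule."""
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return {
        "accuracy": (tp + tn) / (tp + fn + tn + fp),
        "sensitivity": sensitivity,
        "specificity": specificity,
        "youden_j": sensitivity + specificity - 1.0,
        "sensitivity_ci": wilson_interval(tp, tp + fn),
        "specificity_ci": wilson_interval(tn, tn + fp),
    }


# Invented counts: 99 % sensitivity, 8.5 % specificity.
stats = diagnostic_stats(tp=99, fn=1, tn=17, fp=183)
print(round(stats["youden_j"], 3))
```

A rule that labels nearly everything single-phase scores close to 1 on sensitivity and close to 0 on specificity, so J lands near zero regardless of how high the sensitivity looks in isolation.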
Quick start (Python)
pip install hea-bench
import hea_bench
cantor = {"Co": 0.2, "Cr": 0.2, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}
hea_bench.smix(cantor) # 13.381 J/(mol·K) = R · ln 5
hea_bench.delta(cantor) # 3.164 % atomic-size mismatch
hea_bench.vec(cantor) # 8.0 valence electrons
hea_bench.mixing_enthalpy(cantor) # -4.16 kJ/mol (Miedema)
hea_bench.omega(cantor) # 5.79 (Yang-Zhang)
# Apply the canonical rules
from hea_bench.rules import zhang_delta, yang_omega, guo_vec
zhang_delta.predict(cantor) # 'single-phase'
yang_omega.predict(cantor) # 'single-phase'
guo_vec.predict(cantor) # 'FCC'
# Run the full rule benchmark against the consolidated v0.1.0 dataset
from hea_bench.evaluate import build_report
report = build_report()
print(report["rules"]["zhang_delta_6_5"]["accuracy"]) # 0.5670
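Three of the Cantor-alloy values quoted in the quick start can be checked by hand with the textbook definitions: ΔS_mix = −R Σ c_i ln c_i, VEC = Σ c_i VEC_i, and Ω = T_m ΔS_mix / |ΔH_mix| with T_m the composition-weighted melting point. The sketch below does not use hea_bench. The per-element valence counts and melting points are standard reference values and are an assumption about what the library's own tables contain; ΔH_mix is taken from the quick-start output rather than recomputed.

```python
import math

R = 8.314  # J/(mol·K)

# Assumed per-element inputs (standard reference values, not read from hea_bench).
VALENCE_ELECTRONS = {"Co": 9, "Cr": 6, "Fe": 8, "Mn": 7, "Ni": 10}
MELTING_POINT_K = {"Co": 1768.0, "Cr": 2180.0, "Fe": 1811.0, "Mn": 1519.0, "Ni": 1728.0}


def mixing_entropy(comp):
    """Ideal configurational entropy of mixing in J/(mol·K)."""
    return -R * sum(c * math.log(c) for c in comp.values() if c > 0)


def mean_vec(comp):
    """Composition-weighted valence electron concentration."""
    return sum(c * VALENCE_ELECTRONS[el] for el, c in comp.items())


def omega_parameter(comp, mixing_enthalpy_kj_mol):
    """Yang-Zhang Omega = T_m * dS_mix / |dH_mix|, with dH_mix given in kJ/mol."""
    t_m = sum(c * MELTING_POINT_K[el] for el, c in comp.items())
    return t_m * mixing_entropy(comp) / (abs(mixing_enthalpy_kj_mol) * 1000.0)


cantor = {"Co": 0.2, "Cr": 0.2, "Fe": 0.2, "Mn": 0.2, "Ni": 0.2}
print(round(mixing_entropy(cantor), 3))
print(round(mean_vec(cantor), 3))
print(round(omega_parameter(cantor, -4.16), 2))
```

Under these assumptions the three printed values agree with the quick-start comments (13.381, 8.0, and about 5.79). δ and ΔH_mix are left out because they depend on the atomic-radius and Miedema tables shipped with the package.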
Quick start (CLI)
hea-bench --version
python -m hea_bench.evaluate # run all 4 rules on v0.1.0
python -m hea_bench.benchmark.coverage # coverage analysis on v0.1.0
Quick start (browser, no install)
The same Python library runs in a browser tab via Pyodide. After cloning:
python -m http.server 8000 --directory web
# open http://localhost:8000
First load downloads the Pyodide runtime (~10 MB, cached after) and
the hea_bench wheel. Then the in-page UI computes descriptors
locally — same code, same numerics, no server.
Architecture
                             ┌────────────────────────────┐
                             │ data/consolidated/v0.1.0/  │
                             │  - consolidated.csv        │
                             │  - rule_baselines.json     │
                             │  - coverage_report.json    │
                             │  - manifest.json           │
                             └─────────────▲──────────────┘
                                           │
                                           │ produced by
                                           │
┌─────────────────────┐    ┌───────────────┴───────────────┐
│ data/raw/           │    │ src/hea_bench/                │
│  - borg2020/        │───►│  - benchmark/                 │
│  - pei2020/         │    │      consolidate.py           │
│  - peivaste/        │    │      coverage.py              │
│  (per-source READMEs│    │      loaders/{borg,pei,...}.py│
│   + provenance)     │    │  - descriptors/{size, vec,    │
└─────────────────────┘    │      melting, miedema, omega} │
                           │  - rules/{yeh, zhang,         │
                           │      guo, yang}               │
                           │  - classifiers/               │
                           │      diagnostic_stats.py      │
                           │  - evaluate.py                │
                           └──────────────┬────────────────┘
                                          │
                                          │ also runs in
                                          ▼
                           ┌──────────────────────────────┐
                           │ web/ (Pyodide front-end)     │
                           │  - index.html                │
                           │  - dist/hea_bench-*.whl      │
                           └──────────────────────────────┘
What's in the benchmark
data/consolidated/v0.1.0/consolidated.csv holds 7,784 unique compositions × 14 columns:
- composition_key: alphabetically sorted element symbols + 4-decimal mole fractions, the canonical join key
- n_elements, sources (semicolon-separated)
- canonical_phase: one of BCC / FCC / HCP / multi-phase (blank when the contributing sources disagree)
- has_conflict: 1 when the canonical_phase is blank because of a source-label disagreement
- Per-source canonical and raw labels preserved verbatim
- borg_processing, borg_doi, source_row_ids for provenance
100 of the 7,784 compositions are cross-source label conflicts — flagged for downstream resolution rather than silently picked. The sources are: Borg 2020 (740 alloys), Pei 2020 (1,209 alloys), Peivaste 2023 (7,747 alloys).
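The join key and the conflict flag described above can be sketched as follows. This is an illustration, not consolidate.py: the function names are hypothetical, the `-` separator and the renormalization step are assumptions about the key format, and only the "sorted symbols + 4-decimal fractions" and "blank on disagreement" behaviors come from this README.

```python
def composition_key(fractions, sep="-"):
    """Build a canonical key: alphabetical element order, 4-decimal mole fractions.

    Assumptions: fractions are renormalized to sum to 1 and parts are joined
    with `sep`; the real key format may differ.
    """
    total = sum(fractions.values())
    if total <= 0:
        raise ValueError("composition must have a positive total")
    return sep.join(f"{el}{fractions[el] / total:.4f}" for el in sorted(fractions))


def merge_phase_labels(labels):
    """Return (canonical_phase, has_conflict) from per-source labels.

    Missing labels (None or '') are ignored; disagreement yields a blank
    canonical phase and has_conflict = 1 instead of silently picking one.
    """
    distinct = {label for label in labels if label}
    if len(distinct) == 1:
        return distinct.pop(), 0
    if not distinct:
        return "", 0
    return "", 1


print(composition_key({"Ni": 1, "Co": 1}))
print(merge_phase_labels(["FCC", None, "FCC"]))
print(merge_phase_labels(["FCC", "BCC"]))
```

Keying on a normalized, order-independent string is what lets rows from different sources collapse onto one composition before their labels are compared.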
See data/consolidated/v0.1.0/README.md
for the full schema, per-source attribution, and a complete
description of the consolidation rules. See
data/raw/ for per-source provenance, licenses, and
SHA-256s.
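For downstream use, a reader will often want only the rows with an unambiguous label. A minimal standard-library sketch, assuming the column names listed above (canonical_phase, has_conflict) and that has_conflict is stored as the text `1`; the helper name and the two sample rows are made up for illustration:

```python
import csv
import io


def load_unambiguous_rows(csv_file):
    """Yield rows that carry a canonical phase and no cross-source conflict."""
    for row in csv.DictReader(csv_file):
        if row["has_conflict"] == "1" or not row["canonical_phase"]:
            continue
        yield row


# In-memory stand-in with invented rows; the real file has 14 columns.
sample = io.StringIO(
    "composition_key,canonical_phase,has_conflict\n"
    "Co0.5000-Ni0.5000,FCC,0\n"
    "Al0.5000-Fe0.5000,,1\n"
)
rows = list(load_unambiguous_rows(sample))
print(len(rows), rows[0]["canonical_phase"])
```

Against the real file, the same function would be called on `open("data/consolidated/v0.1.0/consolidated.csv", newline="")` from a repository checkout.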
What's covered
- 86.7% of the 7,784 compositions are scorable by every descriptor (δ, VEC, T_m, ΔS_mix, ΔH_mix, Ω) with the current 24-element ELEMENTAL_DATA table
- 99.6% are scorable when only the Miedema-based descriptors are required (the vendored matminer pair table covers 75 elements)
- Top elements whose addition would lift coverage to ~95%: Mg, C, Zn, B, Sn, Re (all already in the matminer pair table; pending v0.2.0 data release)
Re-run the coverage analysis on your own version of the dataset with:
python -m hea_bench.benchmark.coverage
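The idea behind a coverage figure can be sketched in a few lines: a composition is scorable only if every one of its elements appears in the supporting table, and the elements that block the most compositions are the natural candidates to add next. This is not the logic of hea_bench.benchmark.coverage; the function name, the toy compositions, and the three-element "table" are hypothetical, and ranking by blocked-composition count is only a rough proxy for the coverage lift an element would bring.

```python
from collections import Counter


def coverage(compositions, supported):
    """Return (scorable fraction, [(element, blocked_count), ...]).

    `compositions` is a list of element collections; `supported` is the set
    of elements the descriptor table knows about.
    """
    scorable = 0
    blocking = Counter()
    for elements in compositions:
        unsupported = set(elements) - supported
        if unsupported:
            blocking.update(unsupported)
        else:
            scorable += 1
    fraction = scorable / len(compositions) if compositions else 0.0
    return fraction, blocking.most_common()


# Toy inputs, not taken from the benchmark.
toy = [{"Co", "Ni"}, {"Co", "Mg"}, {"Mg", "Zn"}, {"Fe"}]
fraction, blockers = coverage(toy, supported={"Co", "Ni", "Fe"})
print(fraction, blockers[0])
```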
Sources and attribution
Every primary source is cited per-row in the consolidated CSV. The
data files in data/raw/ carry per-source READMEs with
DOIs, licenses, and acquisition SHA-256s.
| Source | Citation | License | Status |
|---|---|---|---|
| Borg 2020 | Sci. Data 7, 430 (doi:10.1038/s41597-020-00768-9) | CC-BY-4.0 | Mirrored |
| Pei 2020 | npj Comput. Mater. 6, 50 (doi:10.1038/s41524-020-0308-7) | CC-BY-4.0 | Mirrored |
| Peivaste 2023 | Sci. Rep. 13, 22556 + GitHub | none on data | Pointer-only (fetch.py) |
| Miedema pair enthalpies | matminer MiedemaLiquidDeltaHf.tsv | BSD-3-Clause | Vendored (see descriptors/data/) |
Project layout
hea-bench/
├── data/
│   ├── raw/             per-source data with READMEs, licenses, SHAs
│   └── consolidated/    versioned benchmark releases (v0.1.0 here)
├── src/hea_bench/
│   ├── benchmark/       loaders, consolidator, coverage analysis
│   ├── descriptors/     ΔS_mix, δ, VEC, T_m, ΔH_mix, Ω + data tables
│   ├── rules/           four canonical empirical rules as classifiers
│   ├── classifiers/     diagnostic-stats machinery
│   ├── composition.py   formula parser, normalizer
│   ├── constants.py     R = 8.314
│   ├── evaluate.py      orchestrator: rules vs benchmark → headline stats
│   └── cli.py           command-line entry point
├── tests/               155 tests, all passing
├── web/                 Pyodide browser frontend
└── pyproject.toml
Development
git clone <repo>
cd hea-bench
pip install -e ".[dev,data]"
python -m pytest tests/ -q
After modifying any descriptor code or vendored data, rebuild the Pyodide wheel:
python -m pip wheel . --no-deps -w web/dist
Contributing and support
Contributions, bug reports, and dataset additions are welcome. See
CONTRIBUTING.md for development setup, the
testing convention, and the data-provenance policy. To report a bug
or ask a question, open a GitHub issue; for direct contact, email
the maintainer at davjfies@gmail.com. Participation is governed by
the Code of Conduct.
License
MIT. The vendored
matminer Miedema data files remain
under their upstream BSD-3-Clause license, preserved at
descriptors/data/LICENSE.matminer.txt.
Citation
Citation metadata is in CITATION.cff. When citing
hea-bench, please also cite the original source datasets (Borg, Pei,
Peivaste) and matminer — see data/raw/<source>/README.md for each
source's preferred citation.
hea-bench is archived on Zenodo. The concept DOI 10.5281/zenodo.20346287 always resolves to the latest version; v0.1.0 specifically is 10.5281/zenodo.20346288.
Acknowledgements
All numerical parameters, formulas, threshold values, and benchmark numbers are derived from cited primary sources or computed in this codebase from documented inputs; the author verified outputs against the cited literature.