Atom-HiFi: atomistic high-fidelity representative-set selection framework


Atom-HiFi

Atomistic High-Fidelity representative-set selection framework.

Applications include:

MLIP training-set curation and active-learning loops
Chemical motif identification and distribution analysis
Diversity-aware structure sampling from large databases

What is Atom-HiFi?

Atom-HiFi finds the smallest subset S of a structure library that achieves high Fidelity, meaning that S covers the library's atomic-environment diversity efficiently and without redundancy. The selection is agnostic to the downstream task.

Key concepts

Fidelity = L / R

Fidelity is the single optimisation objective. Like a HiFi audio system, it has two channels — L (Left) and R (Right) — whose ratio is maximised. High Fidelity means the selection is both faithful to the library distribution (high Likeness) and compact (low Redundancy).

L (Likeness) measures how faithfully S reproduces the library's atomic-environment distribution. Each atom is assigned to a microstate (a k-means Voronoi cell in whitened descriptor space), and L is the ratio of the Shannon entropies of the microstate populations in S and in the library:

L = H(sub) / H(lib)      H = -Σ p_i ln p_i

Shannon entropy H measures distributional diversity — how evenly the population is spread across microstates. L = 1: S perfectly reproduces the library's diversity. L < 1: some environments are under-represented; e.g. L = 0.95 means S retains 95% of the library's distributional diversity.
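A minimal sketch of the Likeness calculation, assuming every atom already carries an integer microstate label (in Atom-HiFi these come from the clustering step; the labels below are made up). `shannon_entropy` and `likeness` are illustrative helpers, not part of the atom_hifi API.

```python
import math
from collections import Counter


def shannon_entropy(labels):
    """H = -sum_i p_i ln p_i over the microstate populations of `labels`."""
    counts = Counter(labels)
    n = sum(counts.values())
    return -sum((c / n) * math.log(c / n) for c in counts.values())


def likeness(sub_labels, lib_labels):
    """L = H(sub) / H(lib); undefined if the library occupies a single microstate (H(lib) = 0)."""
    return shannon_entropy(sub_labels) / shannon_entropy(lib_labels)


# Hypothetical per-atom microstate labels for the library and a selected subset.
lib = [0, 0, 0, 0, 1, 1, 2, 2]
sub = [0, 0, 1, 2]                             # half the atoms, same population ratios
print(likeness(sub, lib))                      # → 1.0
print(round(likeness([0, 0, 0, 1], lib), 3))   # microstate 2 missing → 0.541
```

The second call shows the under-represented case: dropping microstate 2 and skewing the remaining populations pulls L well below the default tolerance of 0.90.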

R (Redundancy) measures how many atoms are packed per occupied microstate, relative to the full library:

R = (N_sub / k_occ^sub) / (N_lib / k_occ^lib)

R = 1: same atoms-per-microstate density as the full library (no compression). R < 1: redundancy has been removed; e.g. if every occupied microstate is retained, R = 0.4 means S keeps 40% of the library's atoms, i.e. 60% of the atoms were removed as redundant.
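The R = 0.4 example can be checked with a short sketch. The labels are hypothetical and `redundancy` is an illustrative helper, not the atom_hifi API; the subset keeps all three occupied microstates but only 40 of 100 atoms.

```python
def redundancy(sub_labels, lib_labels):
    """R = (N_sub / k_occ^sub) / (N_lib / k_occ^lib), k_occ = number of occupied microstates."""
    k_sub = len(set(sub_labels))
    k_lib = len(set(lib_labels))
    return (len(sub_labels) / k_sub) / (len(lib_labels) / k_lib)


# Hypothetical labels: 100 library atoms over 3 microstates, 40 selected atoms covering all 3.
lib = [0] * 40 + [1] * 30 + [2] * 30
sub = [0] * 16 + [1] * 12 + [2] * 12
print(round(redundancy(sub, lib), 3))  # → 0.4
```

Because k_occ is unchanged here, R reduces to N_sub / N_lib. If the subset also dropped microstates, k_occ^sub would shrink and R would exceed the retained atom fraction.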

The scan sweeps a bandwidth c (a scaling factor on the per-element ε_noise) and finds the c* that maximises Fidelity subject to L ≥ L_TOL (default 0.90; scan.l_tol in config.yaml). The optimal c* sits at the elbow of the L/R curve: the point where further reducing redundancy begins to cost meaningful distributional diversity.
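Given (c, L, R) values from a scan, the constrained choice of c* can be sketched as below. The scan values are invented for illustration and `pick_c_star` is a hypothetical helper; the actual coarse and fine scans are run by Atom-HiFi itself.

```python
def pick_c_star(scan, l_tol=0.90):
    """Return the (c, L, R) point with the highest F = L / R among points with L >= l_tol."""
    feasible = [(c, L, R) for c, L, R in scan if L >= l_tol]
    if not feasible:
        raise ValueError("no scan point satisfies L >= l_tol")
    return max(feasible, key=lambda point: point[1] / point[2])


# Illustrative (c, L, R) values, not real Atom-HiFi output.
scan = [(0.5, 0.99, 0.90), (1.0, 0.97, 0.60), (1.5, 0.93, 0.40), (2.0, 0.85, 0.25)]
c_star, L, R = pick_c_star(scan)
print(c_star, round(L / R, 3))  # → 1.5 2.325
```

The point at c = 2.0 has the highest raw F (3.4) but is excluded by L ≥ 0.90; without the tolerance, maximising L/R alone would keep trading distributional diversity for compression.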

ED-SOAP descriptor

Embedded Double SOAP (ED-SOAP): two concatenated SOAP power-spectrum vectors per atom, one short-range (bonding geometry) and one long-range (coordination shell), normalised by a system-specific lengthscale. No GPU is required. The full parameter set is exposed in hifi_workflow_tutorial.py under the EDS_* variables and in config.yaml under descriptor.eds.

Installation

Step 1: install decaf (DECAF, Descriptor Embedding and Clustering for Atomistic-environment Framework), the clustering backend; it is not on PyPI:

pip install git+https://gitlab.mpcdf.mpg.de/klai/decaf.git

Step 2: install Atom-HiFi:

pip install atom-hifi

Python ≥ 3.9 required.

Quick start

pip install atom-hifi installs the atom-hifi command. Write a starter config, edit it, and run:

atom-hifi init                 # writes a commented config.yaml
# edit config.yaml (at minimum: paths.lib_path, paths.focus_elements)
atom-hifi run config.yaml 2>&1 | tee run.out

The generated config.yaml documents every setting inline. A minimal configuration looks like this:

paths:
  lib_path: train_structs.xyz   # ASE-readable structure library
  focus_elements: [Ni, O]       # elements to cluster on
  output_dir: fr_results
descriptor:
  kind: eds                     # 'eds' or 'ace'

Python API / custom descriptors

The CLI supports the eds and ace descriptors. A custom descriptor is a Python callable and is supplied via the Python API. hifi_workflow_tutorial.py is the annotated example (included in the repo; pip-only users can fetch it):

curl -O https://gitlab.mpcdf.mpg.de/yhsong/atom-hifi/-/raw/main/hifi_workflow_tutorial.py

Edit its top-level variables (including DESCRIPTOR_FN) and run python hifi_workflow_tutorial.py, or call the runner directly:

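from atom_hifi.runner import run

# my_descriptor_fn is your own descriptor callable; see DESCRIPTOR_FN in
# hifi_workflow_tutorial.py for the signature it must follow.
run({'paths': {'lib_path': 'train_structs.xyz', 'focus_elements': ['Ni', 'O']},
     'descriptor': {'kind': 'custom', 'custom_fn': my_descriptor_fn}})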

Output files

File	Description
`representatives.xyz`	Selected representative structures
`fine_scan.out`	L, R, F (= L/R), |S| and atom count for every fine-scan point
`hifi_final.png`	Coarse + fine Fidelity (F = L/R) scan diagnostic plot
`learning_curve.png`	AL loop convergence (only with `RUN_LOOP=True`)
`eps_noise_raw.npz`	Cached per-element ε_noise values
`desc_lib.pkl`	Cached per-structure descriptors
`surroundings_{el}.xyz`	Per-group coordination spheres (`EXTRACT_SURROUNDINGS=True`)

Configuration reference

All settings live in config.yaml (run atom-hifi init to generate a fully commented template). Keys are grouped as follows:

Group	Keys
paths	`lib_path`, `patient_path`, `focus_elements`, `output_dir`
descriptor	`kind`, `eds.{lengthscale, s_cut, s_nmax, s_lmax, l_cut, l_nmax, l_lmax, periodic, r_cut}`, `ace.{model_path, device, r_cut}`
selection	`method` (`mu_tiebreak` recommended)
scan	`l_tol`, `n_coarse`, `n_fine`, `n_jobs`, `c_factor_range`
eps_noise	`per_species`, `temperature` (K; sets σ_thermal ∝ √T/√mass for ε_noise calibration)
loop / grid / nsga2	`run` + per-stage tuning
refit	`delta`, `grid_point`
output	`delta_pick`, `extract_surroundings`
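The temperature setting fixes σ_thermal only up to the proportionality σ_thermal ∝ √T/√mass; the prefactor and the mapping from σ_thermal to ε_noise are not given here. The sketch below therefore computes ratios only, using standard atomic masses for the Ni/O example (`relative_sigma_thermal` is a hypothetical helper).

```python
import math

# Standard atomic masses in u (only the two elements used in this example).
MASSES = {"Ni": 58.6934, "O": 15.999}


def relative_sigma_thermal(element, temperature, ref_element="Ni", ref_temperature=300.0):
    """sigma_thermal(element, T) / sigma_thermal(ref_element, T_ref), from sigma ∝ sqrt(T / m)."""
    return math.sqrt((temperature / MASSES[element]) / (ref_temperature / MASSES[ref_element]))


print(round(relative_sigma_thermal("O", 300.0), 3))    # → 1.915 (lighter O gets a larger sigma)
print(round(relative_sigma_thermal("Ni", 1200.0), 3))  # → 2.0 (4x the temperature doubles sigma)
```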

Unknown keys are rejected. The same configuration can be passed as a nested dict to atom_hifi.runner.run(...); hifi_workflow_tutorial.py is the annotated Python-API equivalent.
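Since unknown keys are rejected, it can be handy to check a nested dict before passing it to atom_hifi.runner.run(...). The hypothetical pre-flight check below is not Atom-HiFi's own validator: its allowed keys are transcribed from the table above (plus custom_fn from the Python API example), and the loop/grid/nsga2 sub-keys, which the table does not list, are left unchecked.

```python
# Second-level keys per group, transcribed from the configuration table;
# None means "not listed there, so not checked by this sketch".
ALLOWED = {
    "paths": {"lib_path", "patient_path", "focus_elements", "output_dir"},
    "descriptor": {"kind", "eds", "ace", "custom_fn"},
    "selection": {"method"},
    "scan": {"l_tol", "n_coarse", "n_fine", "n_jobs", "c_factor_range"},
    "eps_noise": {"per_species", "temperature"},
    "loop": None,
    "grid": None,
    "nsga2": None,
    "refit": {"delta", "grid_point"},
    "output": {"delta_pick", "extract_surroundings"},
}


def unknown_keys(config):
    """Return dotted names of top-level and second-level keys not in ALLOWED."""
    bad = []
    for group, body in config.items():
        if group not in ALLOWED:
            bad.append(group)
        elif ALLOWED[group] is not None and isinstance(body, dict):
            bad.extend(f"{group}.{key}" for key in body if key not in ALLOWED[group])
    return bad


cfg = {"paths": {"lib_path": "train_structs.xyz", "focus_elemnts": ["Ni", "O"]},
       "descriptor": {"kind": "eds"}}
print(unknown_keys(cfg))  # → ['paths.focus_elemnts']
```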

Advanced usage

Active-learning loop (RUN_LOOP=True)

Iteratively expands the training pool by sampling batches from the full library. Inner iterations use a coarse scan only; one final fine scan runs at the end. Set INITIAL_SAMPLE and LOOP_SKIP_FINE_SCAN to control the initial pool size and inner-scan resolution.
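The loop structure can be sketched as follows, as a schematic under stated assumptions: `coarse_select` and `fine_select` are hypothetical callables standing in for the coarse and fine Fidelity scans, new structures are drawn uniformly at random, the pool keeps every sampled structure, and the loop stops after a fixed number of iterations or when the library is exhausted. Atom-HiFi's actual batch sampling and convergence criteria may differ.

```python
import random


def active_learning_loop(library, initial_sample, batch_size, n_iter,
                         coarse_select, fine_select, seed=0):
    """Grow a training pool from `library` in random batches (schematic only).

    coarse_select(pool) and fine_select(pool) are hypothetical stand-ins for the
    coarse and fine Fidelity scans; each returns the selected structures.
    """
    rng = random.Random(seed)
    remaining = list(library)
    rng.shuffle(remaining)
    pool, remaining = remaining[:initial_sample], remaining[initial_sample:]
    history = []  # (pool size, number selected) per inner iteration
    for _ in range(n_iter):
        selected = coarse_select(pool)         # inner iterations: coarse scan only
        history.append((len(pool), len(selected)))
        if not remaining:                      # library exhausted
            break
        pool.extend(remaining[:batch_size])    # add the next random batch
        remaining = remaining[batch_size:]
    return fine_select(pool), history          # one final fine scan at the end


library = list(range(100))  # stand-in for 100 structures
final, history = active_learning_loop(
    library, initial_sample=10, batch_size=20, n_iter=10,
    coarse_select=lambda pool: pool[: len(pool) // 2],  # toy "selection"
    fine_select=sorted,
)
print([n_pool for n_pool, _ in history])  # → [10, 30, 50, 70, 90, 100]
```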

Per-element ND grid scan (RUN_GRID_SCAN=True)

Sweeps independent c-factors per focus element on a Cartesian grid, reusing the cached per-element DECAF fits from the 1-D scan. With n c values per element, the cost is therefore O(n^N_el) cover evaluations rather than O(n^N_el × N_el) DECAF fits, which keeps the scan tractable for N_el ≤ 3–4. Results are written to scan_grid.csv and scan_grid_report.png.
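The saving comes from fitting each element once per c value and then only recombining those fits at each grid point. A schematic with hypothetical `fit` and `evaluate` callables, standing in for a DECAF fit and a cover evaluation (neither is the atom_hifi API):

```python
from itertools import product


def grid_scan(c_values, elements, fit, evaluate):
    """Evaluate every per-element c-factor combination, reusing one fit per (element, c)."""
    # n * N_el fits in total, the same fits a 1-D scan over c_values would produce.
    cache = {(el, c): fit(el, c) for el in elements for c in c_values}
    results = {}
    # n ** N_el grid points, each only recombining cached fits.
    for combo in product(c_values, repeat=len(elements)):
        fits = {el: cache[(el, c)] for el, c in zip(elements, combo)}
        results[combo] = evaluate(fits)
    return results, len(cache)


# Toy stand-ins: a "fit" is just c, and a grid point's score is the sum over elements.
results, n_fits = grid_scan([0.5, 1.0, 1.5], ["Ni", "O"],
                            fit=lambda el, c: c,
                            evaluate=lambda fits: sum(fits.values()))
print(len(results), n_fits)  # → 9 6
```

Adding a third element would raise the grid to 27 points while the fit count grows only to 9, which is why the evaluation count, not fitting, limits this mode to small N_el.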

NSGA-II Pareto optimisation (RUN_NSGA2=True)

Stochastic multi-objective optimisation of per-element c-factors via NSGA-II (requires pymoo). Use when the grid is too large (N_el ≥ 4) or you want a continuous Pareto front. Results in pareto_front.csv and three diagnostic PNGs.
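For intuition about the Pareto front that NSGA-II approximates: given already-evaluated candidates, the non-dominated set can be found by brute force. This sketch assumes the two objectives are maximising L and minimising R (the two channels of Fidelity); the objectives actually used by the nsga2 stage are not spelled out here. The filter is not NSGA-II, which instead searches the c-factor space stochastically via pymoo.

```python
def pareto_front(points):
    """Return the (L, R) points not dominated by any other (maximise L, minimise R)."""
    def dominates(a, b):
        # a is at least as good in both objectives and differs from b, so strictly better in one.
        return a[0] >= b[0] and a[1] <= b[1] and a != b

    return [p for p in points if not any(dominates(q, p) for q in points)]


# Illustrative (L, R) values for five candidate c-factor vectors.
candidates = [(0.99, 0.90), (0.97, 0.60), (0.95, 0.70), (0.93, 0.40), (0.90, 0.45)]
print(pareto_front(candidates))  # → [(0.99, 0.9), (0.97, 0.6), (0.93, 0.4)]
```

The pairwise check is O(n²) in the number of candidates, which is fine for a results table but is exactly the exhaustive evaluation NSGA-II avoids when the c-factor space is large.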

Representative environment extraction (EXTRACT_SURROUNDINGS=True)

Exports the local coordination sphere around the atom closest to the centroid of each DECAF group. Two modes are available: 'sphere' (a non-periodic ASE Atoms cluster) and 'full_structure' (the original cell with center/neighbour/rest tags). Output: surroundings_{el}.xyz per focus element.
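Picking the representative atom of each group can be sketched as a nearest-to-centroid search. The sketch assumes Euclidean distance in descriptor space and takes plain per-atom descriptor lists and group labels (both hypothetical here); cutting out the coordination sphere or tagging the full structure with ASE is omitted.

```python
import math
from collections import defaultdict


def centroid_closest(descriptors, labels):
    """For each group label, return the index of the atom nearest the group's centroid."""
    groups = defaultdict(list)
    for i, lab in enumerate(labels):
        groups[lab].append(i)
    closest = {}
    for lab, idx in groups.items():
        dim = len(descriptors[idx[0]])
        centroid = [sum(descriptors[i][d] for i in idx) / len(idx) for d in range(dim)]
        closest[lab] = min(idx, key=lambda i: math.dist(descriptors[i], centroid))
    return closest


# Hypothetical 2-D descriptors for six atoms in two groups.
descriptors = [[0.0, 0.0], [1.0, 0.0], [0.4, 0.1], [5.0, 5.0], [6.0, 5.0], [5.4, 5.2]]
labels = [0, 0, 0, 1, 1, 1]
print(centroid_closest(descriptors, labels))  # → {0: 2, 1: 5}
```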

Citation

If you use Atom-HiFi in your research, please cite:

[paper in preparation — citation will be added upon publication]

Project details


Release history

0.6.0 (this version): Jun 1, 2026

0.5.1 (yanked): Jun 1, 2026

Download files


Source Distribution

atom_hifi-0.6.0.tar.gz (91.1 kB)

Uploaded Jun 1, 2026 Source

Built Distribution

atom_hifi-0.6.0-py3-none-any.whl (79.4 kB)

Uploaded Jun 1, 2026 Python 3

File details

Details for the file atom_hifi-0.6.0.tar.gz.

File metadata

Download URL: atom_hifi-0.6.0.tar.gz
Upload date: Jun 1, 2026
Size: 91.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for atom_hifi-0.6.0.tar.gz
Algorithm	Hash digest
SHA256	`a00f57944fac4c533718f83b4130ad3215c78d8421c97b359590026d50c9f48c`
MD5	`3577a707d6909fbd687301f492fe2c2e`
BLAKE2b-256	`49bae88320f4c242190f44b1b330ee17d749aeb92f61effded6ea2bc643c0ffe`


File details

Details for the file atom_hifi-0.6.0-py3-none-any.whl.

File metadata

Download URL: atom_hifi-0.6.0-py3-none-any.whl
Upload date: Jun 1, 2026
Size: 79.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.6

File hashes

Hashes for atom_hifi-0.6.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`c59f6225a4f628ae63d762fbb3ee3db49c009da83cf0a59f370b74bcce25b4da`
MD5	`ec8d5fc42ecc5b31906f40331693215a`
BLAKE2b-256	`d2968f8eab0d6bffee53030c9b7b2dabac0f38010715c11c4d81c8f527ce5039`
