Pointless Atom STructure with Entropy Diagnostics — structure fuzzer for QC/ML-potential codes
PASTED
Pointless Atom STructure with Entropy Diagnostics
PASTED is a structure fuzzer for quantum chemistry (QC) and machine-learning potential (MLP) codes. It generates intentionally random, physically meaningless atomic structures and quantifies their disorder through a suite of 13 structural metrics. It is useful for stress-testing structure optimizers, generating worst-case inputs for QC codes, or exploring what "maximum chaos" looks like in structural space.
Features
- Four placement modes: random gas (`gas`), chain-growth (`chain`), coordination-complex-like (`shell`), and maximum-entropy (`maxent`)
- 13 disorder metrics computed per structure, all usable as output filters
- Element pool specified by atomic number (Z = 1–106); composition sampled randomly per structure
- Guaranteed atom count: post-placement L-BFGS repulsion relaxation ensures `--n-atoms` atoms are always delivered regardless of initial density
- Auto-scaled `--cutoff`: defaults to `cov_scale × 1.5 × median(r_i + r_j)` over the element pool; all graph, Steinhardt, ring, and charge metrics share this single cutoff
- Structure optimizer: `StructureOptimizer` runs simulated annealing or basin-hopping on an existing structure to maximize a user-defined disorder objective
- Charge/multiplicity parity validation, reproducible via `--seed`, incremental output via `stream()`
Requirements
Python >= 3.10
numpy
scipy
A C++17 compiler is required to build the optional acceleration extensions
(~25x speedup for compute_all_metrics at N=1000). If no compiler is
available, the package falls back to pure Python/NumPy transparently.
Installation
pip install pasted
or from source:
git clone https://github.com/ss0832/pasted.git
cd pasted
pip install -e .
# or run directly without installing:
python pasted.py --help
Verify that the C++ extensions compiled successfully:
from pasted._ext import HAS_RELAX, HAS_MAXENT, HAS_STEINHARDT, HAS_GRAPH
print(HAS_RELAX, HAS_MAXENT, HAS_STEINHARDT, HAS_GRAPH)
# True True True True -> all acceleration active
Quick Start
# 10 atoms drawn from H-Zn, placed randomly in a sphere
pasted --n-atoms 10 --elements 1-30 --charge 0 --mult 1 \
--mode gas --region sphere:8
# Chain structure (C/N/O), 20 samples, filter by disorder
pasted --n-atoms 15 --elements 6,7,8 --charge 0 --mult 1 \
--mode chain --branch-prob 0.4 --n-samples 20 \
--filter "H_total:2.0:-" -o organic_junk.xyz
# Coordination-complex-like structure with Fe center
pasted --n-atoms 12 --elements 6,7,8,26 --charge 0 --mult 1 \
--mode shell --center-z 26 --coord-range 4:6 --n-samples 10
# Stop as soon as 10 disordered structures are found
pasted --n-atoms 15 --elements 1-30 --charge 0 --mult 1 \
--mode gas --region sphere:8 \
--filter "H_total:2.0:-" --n-success 10 --n-samples 500 \
-o disordered.xyz
# Select spatially random electronegativity arrangements (Moran's I near 0)
pasted --n-atoms 50 --elements 1-30 --charge 0 --mult 1 \
--mode gas --region sphere:12 --n-samples 200 \
--filter "moran_I_chi:-0.1:0.1" -o random_en.xyz
Placement Modes
gas (default)
Atoms placed uniformly at random inside a sphere or box. No clash checking at placement time — repulsion relaxation resolves all violations afterward.
--region sphere:R sphere of radius R Angstrom
--region box:L cube of side L Angstrom
--region box:LX,LY,LZ orthorhombic box
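Uniform placement inside a sphere can be sketched as follows. This is a minimal illustration, not PASTED's own code: the names `random_point_in_sphere` and `place_gas` are hypothetical, and the sampling scheme (Gaussian direction plus cube-root radius) is one standard way to get a uniform volume density.

```python
import math
import random


def random_point_in_sphere(radius, rng):
    """Return one point drawn uniformly from the interior of a sphere."""
    # A normalised Gaussian vector gives an isotropic direction.
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            break
    # The cube root makes the density uniform in volume rather than in radius.
    r = radius * rng.random() ** (1.0 / 3.0)
    return tuple(r * c / norm for c in v)


def place_gas(n_atoms, radius, seed=None):
    """Place n_atoms points with no clash checking (hypothetical helper)."""
    rng = random.Random(seed)
    return [random_point_in_sphere(radius, rng) for _ in range(n_atoms)]


points = place_gas(10, 8.0, seed=42)
```

Sampling the radius directly from a uniform distribution would over-populate the center, which is why the cube root is applied.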
chain
Atoms grow one by one from a seed via a random walk with directional persistence. Produces elongated, tree-like structures.
--branch-prob FLOAT branching probability (default: 0.3)
--chain-persist FLOAT directional persistence 0.0-1.0 (default: 0.5)
--chain-bias FLOAT global axis drift; higher -> more rod-like (default: 0.0)
--bond-range LO:HI bond length range Angstrom (default: 1.2:1.6)
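The chain-growth idea can be illustrated with a short random-walk sketch. Everything here is an assumption made for illustration: the function `grow_chain`, the linear blend used for directional persistence, and the rule for choosing a branch point are not taken from PASTED, and `--chain-bias` is not modeled.

```python
import math
import random


def _unit(v):
    """Scale a 3-vector to unit length."""
    n = math.sqrt(sum(c * c for c in v))
    return tuple(c / n for c in v)


def _random_unit(rng):
    """Draw an isotropic random direction from a normalised Gaussian vector."""
    while True:
        v = tuple(rng.gauss(0.0, 1.0) for _ in range(3))
        if sum(c * c for c in v) > 1e-24:
            return _unit(v)


def grow_chain(n_atoms, branch_prob=0.3, persist=0.5,
               bond_range=(1.2, 1.6), seed=None):
    """Grow atoms one by one; returns (positions, parent index per atom)."""
    rng = random.Random(seed)
    positions = [(0.0, 0.0, 0.0)]
    directions = [_random_unit(rng)]  # growth direction stored per atom
    parents = [None]
    tip = 0
    while len(positions) < n_atoms:
        # Branch from a random existing atom, otherwise extend the current tip.
        if rng.random() < branch_prob:
            parent = rng.randrange(len(positions))
        else:
            parent = tip
        fresh = _random_unit(rng)
        # Assumed persistence rule: blend the parent's direction with a fresh one.
        mixed = tuple(persist * p + (1.0 - persist) * f
                      for p, f in zip(directions[parent], fresh))
        if sum(c * c for c in mixed) < 1e-24:
            mixed = fresh
        step_dir = _unit(mixed)
        length = rng.uniform(*bond_range)
        base = positions[parent]
        positions.append(tuple(b + length * d for b, d in zip(base, step_dir)))
        directions.append(step_dir)
        parents.append(parent)
        tip = len(positions) - 1
    return positions, parents


positions, parents = grow_chain(15, branch_prob=0.4, seed=1)
```

With `persist` close to 1.0 each new bond nearly continues its parent's direction; with 0.0 every step is an independent random direction.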
shell
One center atom surrounded by a coordination shell, plus tail atoms grown from shell members. Resembles coordination complexes.
--center-z Z atomic number of center atom (default: random)
--coord-range MIN:MAX coordination number range (default: 4:8)
--shell-radius LO:HI shell radius range Angstrom (default: 1.8:2.5)
--bond-range LO:HI tail bond length range Angstrom (default: 1.2:1.6)
maxent
Atoms start from a random gas placement and are repositioned by gradient descent on an angular repulsion potential, spreading neighbor directions as uniformly over the sphere as the distance constraints allow.
--region SPEC same as gas mode (required)
--maxent-steps N gradient-descent iterations (default: 300)
--maxent-lr LR learning rate (default: 0.05)
--maxent-cutoff-scale S neighbour cutoff scale factor (default: 2.5)
Element Pool
--elements SPEC
| Syntax | Meaning |
|---|---|
| `1-30` | Z = 1 through 30 (H to Zn) |
| `6,7,8` | Z = 6, 7, 8 (C, N, O) |
| `1-10,26,28` | Z = 1-10 plus Fe(26) and Ni(28) |
| (omitted) | all Z = 1-106 |
If H (Z = 1) is in the pool and the sampled composition contains no hydrogen,
a random number of H atoms is automatically appended. Disable with
--no-add-hydrogen.
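The spec syntax above is simple enough to expand by hand. The sketch below shows one way to turn a spec string into a sorted list of atomic numbers; `parse_element_spec` is a hypothetical helper written for this example, not a function exported by PASTED.

```python
def parse_element_spec(spec=None, z_max=106):
    """Expand a spec such as '1-10,26,28' into a sorted list of atomic numbers."""
    if spec is None:
        # An omitted spec means the full pool, Z = 1 through z_max.
        return list(range(1, z_max + 1))
    pool = set()
    for token in spec.split(","):
        token = token.strip()
        if "-" in token:
            lo_text, hi_text = token.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
        else:
            lo = hi = int(token)
        if not 1 <= lo <= hi <= z_max:
            raise ValueError(f"invalid element range: {token!r}")
        pool.update(range(lo, hi + 1))
    return sorted(pool)


print(parse_element_spec("1-10,26,28"))
```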
Interatomic Distance Control
PASTED enforces a minimum interatomic distance using Pyykkoe single-bond covalent radii (Pyykkoe & Atsumi, Chem. Eur. J. 15, 186-197, 2009):
d_min(i, j) = cov_scale x (r_i + r_j)
The default is --cov-scale 1.0. Post-placement relaxation uses L-BFGS to minimize
a harmonic penalty energy until all violations are resolved (or
--relax-cycles is exhausted).
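To make the distance rule concrete, here is a sketch of the minimum-distance formula and of a harmonic overlap penalty that is zero exactly when no pair violates it. The exact energy expression PASTED minimizes is not given here, so the `0.5 * overlap ** 2` form is an assumption, and the radii in `RADII` are approximate illustrative values rather than numbers read from the package.

```python
import math

# Approximate single-bond covalent radii in Angstrom (illustrative values only).
RADII = {"H": 0.32, "C": 0.75, "N": 0.71, "O": 0.63}


def d_min(a, b, cov_scale=1.0):
    """Minimum allowed distance between elements a and b."""
    return cov_scale * (RADII[a] + RADII[b])


def penalty_energy(atoms, positions, cov_scale=1.0):
    """Assumed harmonic penalty: sum of 0.5 * overlap^2 over violating pairs."""
    energy = 0.0
    for i in range(len(atoms)):
        for j in range(i + 1, len(atoms)):
            d = math.dist(positions[i], positions[j])
            overlap = d_min(atoms[i], atoms[j], cov_scale) - d
            if overlap > 0.0:
                energy += 0.5 * overlap ** 2
    return energy


too_close = penalty_energy(["C", "O"], [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)])
far_apart = penalty_energy(["C", "O"], [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)])
```

A relaxation step would move atoms along the negative gradient of this energy until it reaches zero or the iteration budget runs out.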
Disorder Metrics
All 13 metrics are computed for every structure and embedded in the XYZ
comment line. All are usable in --filter.
| Metric | Description | Range |
|---|---|---|
| `H_atom` | Shannon entropy of element composition | >= 0 |
| `H_spatial` | Shannon entropy of pairwise-distance histogram | >= 0 |
| `H_total` | `w_atom * H_atom + w_spatial * H_spatial` | >= 0 |
| `RDF_dev` | RMS deviation of empirical g(r) from ideal-gas baseline | >= 0 |
| `shape_aniso` | Relative shape anisotropy from gyration tensor | [0, 1] |
| `Q4`, `Q6`, `Q8` | Steinhardt bond-order parameters | [0, 1] |
| `graph_lcc` | Largest connected-component fraction at `cutoff` | [0, 1] |
| `graph_cc` | Mean clustering coefficient at `cutoff` | [0, 1] |
| `ring_fraction` | Fraction of atoms in at least one cycle in the cutoff-adjacency graph | [0, 1] |
| `charge_frustration` | Variance of \|delta-chi\| across adjacent pairs | >= 0 |
| `moran_I_chi` | Moran's I spatial autocorrelation for Pauling electronegativity | unbounded |
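As an illustration of the entropy metrics, the sketch below computes a composition entropy and the weighted total. The natural logarithm is an assumption (the log base is not stated here), and the helper names are hypothetical; `H_spatial` is passed in as a number because its histogram details are not specified.

```python
import math
from collections import Counter


def shannon_entropy(counts):
    """Shannon entropy (natural log, assumed) of a list of non-negative counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def h_atom(atoms):
    """Entropy of the element composition of a structure."""
    return shannon_entropy(list(Counter(atoms).values()))


def h_total(h_atom_value, h_spatial_value, w_atom=0.5, w_spatial=0.5):
    """Weighted sum matching the H_total definition in the table."""
    return w_atom * h_atom_value + w_spatial * h_spatial_value


equal_mix = h_atom(["C", "N", "O"])       # three equally likely elements
single = h_atom(["C", "C", "C", "C"])     # one element only
```

An equal mix of k elements gives the maximum value ln(k), while a single-element structure gives zero.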
Unified cutoff
Five metrics share a single adjacency definition: a pair (i, j) is
"adjacent" when d_ij <= cutoff. These are graph_lcc, graph_cc,
ring_fraction, charge_frustration, and moran_I_chi. Using a unified
cutoff prevents the zero-value pathology that occurs when a covalent-radius
threshold is used for bond detection in relaxed structures
(relax_positions guarantees d_ij >= cov_scale * (r_i + r_j)).
The auto cutoff is printed to stderr:
[cutoff] 2.130 Ang (auto: cov_scale=1.0 x 1.5 x median(r_i+r_j)=1.420 Ang)
Override with --cutoff FLOAT when needed.
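The auto-cutoff arithmetic can be reproduced in a few lines. This sketch assumes the median is taken over all unordered pairs of pool radii including self-pairs; that pairing convention and the function name `auto_cutoff` are assumptions, not documented behavior.

```python
from itertools import combinations_with_replacement
from statistics import median


def auto_cutoff(radii, cov_scale=1.0, factor=1.5):
    """cov_scale x factor x median(r_i + r_j) over the element pool."""
    # Assumption: every unordered pair is counted once, self-pairs included.
    pair_sums = [ri + rj for ri, rj in combinations_with_replacement(radii, 2)]
    return cov_scale * factor * median(pair_sums)


# A pool whose median pair sum is 1.42 Ang gives 1.5 x 1.42 = 2.13 Ang,
# the same arithmetic as in the stderr line above.
print(f"{auto_cutoff([0.71]):.3f}")
```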
Moran's I interpretation
moran_I_chi measures how randomly Pauling electronegativity is distributed
in space:
| Value | Meaning |
|---|---|
| I near 0 | Random spatial arrangement — the target for disordered structures |
| I > 0 | Atoms of similar electronegativity cluster spatially (phase separation) |
| I < 0 | Alternating high/low electronegativity (NaCl-like ionic order) |
Note: Moran's I is not bounded to [-1, 1] for sparse weight matrices.
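The sign convention in the table can be checked with a small sketch of Moran's I using binary weights (1 for pairs within the cutoff, 0 otherwise). The binary weighting and the zero fallback for degenerate inputs are assumptions for this example; `morans_i` is a hypothetical name.

```python
import math


def morans_i(values, positions, cutoff):
    """Moran's I with binary cutoff weights (assumed weighting scheme)."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num = 0.0
    w_sum = 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(positions[i], positions[j]) <= cutoff:
                num += dev[i] * dev[j]
                w_sum += 1.0
    if denom == 0.0 or w_sum == 0.0:
        return 0.0  # assumption: no variance or no adjacent pairs
    return (n / w_sum) * (num / denom)


line = [(float(x), 0.0, 0.0) for x in range(4)]
alternating = morans_i([1.0, 3.0, 1.0, 3.0], line, cutoff=1.1)
clustered = morans_i([1.0, 1.0, 3.0, 3.0], line, cutoff=1.1)
```

On this four-atom line the alternating pattern gives a negative value and the clustered pattern a positive one, matching the interpretation above.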
ring_fraction and charge_frustration
ring_fraction counts the fraction of atoms that belong to at least one
cycle in the cutoff-adjacency graph (detected via Union-Find spanning tree).
charge_frustration measures the variance of |delta-chi| across all
adjacent pairs — high values indicate strongly heterogeneous electrostatic
environments.
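The charge-frustration definition reduces to a variance over adjacent pairs, sketched below. The use of the population variance and the zero result for a structure with no adjacent pairs are assumptions, and the electronegativity inputs are supplied by the caller rather than read from PASTED.

```python
import math


def charge_frustration(chi, positions, cutoff):
    """Variance of |chi_i - chi_j| over pairs with d_ij <= cutoff."""
    diffs = []
    for i in range(len(chi)):
        for j in range(i + 1, len(chi)):
            if math.dist(positions[i], positions[j]) <= cutoff:
                diffs.append(abs(chi[i] - chi[j]))
    if not diffs:
        return 0.0  # assumption: no adjacent pairs means no frustration
    mean = sum(diffs) / len(diffs)
    # Population variance (assumed; a sample variance would divide by n - 1).
    return sum((d - mean) ** 2 for d in diffs) / len(diffs)


# Pauling electronegativities for C, O, H on a line with 1 Ang spacing.
value = charge_frustration(
    [2.55, 3.44, 2.20],
    [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)],
    cutoff=1.1,
)
```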
Filtering
--filter METRIC:MIN:MAX
Use - for an open bound. Multiple flags are ANDed together.
--filter "H_total:2.0:-" # H_total >= 2.0
--filter "Q6:-:0.3" # Q6 <= 0.3
--filter "shape_aniso:0.5:-" # rod-like structures
--filter "graph_lcc:0.8:-" # well-connected graph
--filter "moran_I_chi:-0.1:0.1" # spatially random electronegativity
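The filter syntax can be modeled with a short parser. This is a sketch of the semantics described above (open bounds written as `-`, multiple filters ANDed); `parse_filter` and `passes` are hypothetical helpers, not part of the PASTED API.

```python
def parse_filter(spec):
    """Split 'METRIC:MIN:MAX' into (metric, lower bound, upper bound)."""
    metric, lo, hi = spec.split(":")
    # A bare '-' is an open bound; '-0.1' is still parsed as a number.
    lo_val = float("-inf") if lo == "-" else float(lo)
    hi_val = float("inf") if hi == "-" else float(hi)
    return metric, lo_val, hi_val


def passes(metrics, filter_specs):
    """Return True only if every filter accepts the metrics dict (logical AND)."""
    for spec in filter_specs:
        metric, lo, hi = parse_filter(spec)
        if not lo <= metrics[metric] <= hi:
            return False
    return True


ok = passes({"H_total": 2.3, "Q6": 0.2}, ["H_total:2.0:-", "Q6:-:0.3"])
```

Note that a negative lower bound such as `-0.1` is distinguished from the open-bound marker only because the marker is a lone hyphen.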
Output Format
12
sample=3 mode=chain charge=+0 mult=1 comp=[C:4,N:5,O:3] H_atom=1.0986 ... moran_I_chi=-0.0312
C 1.234567 -0.987654 2.345678
N -1.456789 3.210987 -0.123456
...
pasted ... -o out.xyz # XYZ to file, progress to terminal
pasted ... 2>/dev/null | tool # pipe XYZ, discard progress
pasted ... -o /dev/null # progress only (check filter hit rate)
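Because the metrics live in the XYZ comment line as `key=value` tokens, downstream tools can recover them without PASTED installed. The parser below is a sketch that assumes tokens are whitespace-separated and that values contain no spaces; `parse_comment` is a hypothetical name.

```python
def parse_comment(line):
    """Turn 'key=value key=value ...' into a dict, converting numbers."""
    record = {}
    for token in line.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # skip tokens without '=', such as a literal '...'
        for cast in (int, float):
            try:
                record[key] = cast(value)
                break
            except ValueError:
                continue
        else:
            record[key] = value  # keep non-numeric values as strings
    return record


info = parse_comment(
    "sample=3 mode=chain charge=+0 mult=1 comp=[C:4,N:5,O:3] "
    "H_atom=1.0986 moran_I_chi=-0.0312"
)
```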
Python API
Functional API
from pasted import generate
structures = generate(
    n_atoms=12, charge=0, mult=1,
    mode="gas", region="sphere:9",
    elements="1-30", n_samples=50, seed=42,
    filters=["H_total:2.0:-"],
)
for s in structures:
    print(s)  # Structure(n=14, comp='C2H8N2O2', mode='gas', H_total=2.341)
    print(s.to_xyz())
Class API with n_success
from pasted import StructureGenerator
gen = StructureGenerator(
    n_atoms=15, charge=0, mult=1,
    mode="gas", region="sphere:8",
    elements="1-30",
    n_success=10,   # stop when 10 structures pass
    n_samples=500,  # give up after 500 attempts
    filters=["H_total:2.0:-"],
    seed=42,
)
structures = gen.generate()
Streaming output
for s in gen.stream():
    s.write_xyz("out.xyz")  # written immediately on each PASS
Structure attributes
s = structures[0]
s.atoms # ['C', 'N', 'H', ...]
s.positions # [(x, y, z), ...]
s.metrics # {'H_atom': 1.09, 'moran_I_chi': -0.03, ...}
s.charge # 0
s.mult # 1
s.mode # 'gas'
s.sample_index # 1
len(s) # 12
Structure optimizer
from pasted import StructureOptimizer
opt = StructureOptimizer(
    n_atoms=50, charge=0, mult=1,
    objective={"H_total": 1.0, "Q6": -2.0},
    elements="24,25,26,27,28",  # Cantor alloy
    method="annealing",
    max_steps=5000,
    lcc_threshold=0.8,
    seed=42,
)
best = opt.run()
Accessing metrics
s = structures[0]
print(s.metrics["H_total"])
print(s.metrics["moran_I_chi"]) # new in v0.1.12
print(s.metrics["ring_fraction"]) # non-zero in v0.1.13+
Full Option Reference
required:
--n-atoms N number of atoms per structure
--charge INT total system charge
--mult INT spin multiplicity 2S+1
placement mode:
--mode {gas,chain,shell,maxent}
--region SPEC [gas/maxent] sphere:R | box:L | box:LX,LY,LZ
--branch-prob FLOAT [chain] branching probability (default: 0.3)
--chain-persist FLOAT [chain] directional persistence 0.0-1.0 (default: 0.5)
--chain-bias FLOAT [chain] global axis drift (default: 0.0)
--bond-range LO:HI [chain/shell] bond length range Ang (default: 1.2:1.6)
--center-z Z [shell] fix center atom by atomic number
--coord-range MIN:MAX [shell] coordination number range (default: 4:8)
--shell-radius LO:HI [shell] shell radius range Ang (default: 1.8:2.5)
--maxent-steps N [maxent] gradient-descent iterations (default: 300)
--maxent-lr LR [maxent] learning rate (default: 0.05)
--maxent-cutoff-scale S [maxent] neighbour cutoff scale (default: 2.5)
elements:
--elements SPEC atomic-number spec (default: all Z=1-106)
physical constraints:
--cov-scale FLOAT d_min = cov_scale x (r_i + r_j) (default: 1.0)
--relax-cycles INT max L-BFGS iterations for repulsion relaxation (default: 1500)
--no-add-hydrogen disable automatic H augmentation
sampling:
--n-samples INT number of structures to attempt (default: 1)
--n-success INT stop after this many passing structures
--seed INT random seed
metrics:
--n-bins INT histogram bins for H_spatial and RDF_dev (default: 20)
--w-atom FLOAT H_atom weight in H_total (default: 0.5)
--w-spatial FLOAT H_spatial weight in H_total (default: 0.5)
--cutoff FLOAT unified adjacency cutoff Ang for all five cutoff-based
metrics (default: auto = cov_scale x 1.5 x median(r_i+r_j))
filtering:
--filter METRIC:MIN:MAX repeatable; use - for open bound
output:
--validate check charge/mult against one random composition, then exit
-o / --output FILE XYZ output file (default: stdout)
--verbose print per-sample metrics to stderr
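The charge/multiplicity check behind `--validate` rests on a parity rule: the electron count and the number of unpaired electrons must have the same parity. The sketch below states that rule under the usual convention that the electron count is the sum of atomic numbers minus the charge; PASTED's exact validation logic may differ, and `parity_ok` is a hypothetical name.

```python
def parity_ok(atomic_numbers, charge, mult):
    """Check that charge and multiplicity are compatible with a composition."""
    n_electrons = sum(atomic_numbers) - charge
    n_unpaired = mult - 1  # multiplicity is 2S + 1
    if n_electrons < 0 or mult < 1 or n_unpaired > n_electrons:
        return False
    # The remaining electrons must pair up, so the difference must be even.
    return (n_electrons - n_unpaired) % 2 == 0


water_singlet = parity_ok([8, 1, 1], charge=0, mult=1)  # 10 electrons
water_doublet = parity_ok([8, 1, 1], charge=0, mult=2)
```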
Notes and Limitations
- Repulsion relaxation uses L-BFGS (harmonic penalty energy, convergence criterion E < 1e-12). If `[warn] relax_positions did not converge` appears, the structure may contain marginal distance violations. Increase `--relax-cycles`.
- Unified cutoff: the five cutoff-based metrics all use the same `cutoff` parameter. Ring detection and charge frustration are computed on the cutoff-adjacency graph, not the covalent-radius bond graph, so they yield informative non-zero values in relaxed structures.
- Moran's I range: not bounded to [-1, 1] for sparse weight matrices. Use it as a relative indicator.
- Pyykkoe radii: for Z > 86 (Fr through Sg), same-group proxies are used.
- Noble gas EN: He/Ne/Ar/Rn = 4.0; Kr = 3.0; Xe = 2.6 (literature estimates from Allen/Allred-Rochow scale).
License
MIT License. See LICENSE.