Pointless Atom STructure with Entropy Diagnostics — structure fuzzer for QC/ML-potential codes
PASTED
Pointless Atom STructure with Entropy Diagnostics
PASTED is a structure fuzzer for quantum chemistry and machine learning potential codes.
It is a CLI tool that generates intentionally random, physically meaningless atomic structures and evaluates their degree of disorder through a suite of structural metrics. It is useful for stress-testing structure optimizers, generating worst-case inputs for quantum chemistry codes, or studying what "maximum chaos" looks like in structural space.
Features
- Three placement modes: fully random (gas), chain-growth (chain), coordination-complex-like (shell)
- 10 disorder metrics computed per structure, all usable as output filters
- Element pool specified by atomic number (Z = 1–106, H through Sg); composition sampled randomly per structure
- Always outputs --n-atoms atoms: placement is unrestricted; Pyykkö covalent radii are enforced by mandatory post-placement repulsion relaxation
- Auto-scaled --cutoff: defaults to cov_scale × 1.5 × median(r_i + r_j) over the element pool, so graph and Steinhardt metrics are meaningful regardless of which elements are used
- Charge/multiplicity parity validation before any geometry is generated
- Multi-structure batch generation with --n-samples; per-structure progress on stderr, XYZ on stdout
- Reproducible runs via --seed
Requirements
Python >= 3.10
numpy
scipy
pip install numpy scipy
Installation
git clone https://github.com/ss0832/pasted.git
cd pasted
# no build step required; run directly
python pasted.py --help
or
pip install pasted
pasted --help
Quick Start
# 10 atoms drawn from H–Zn, placed randomly in a sphere of radius 8 Å
python pasted.py --n-atoms 10 --elements 1-30 --charge 0 --mult 1 \
--mode gas --region sphere:8
# If you installed PASTED with pip, use the pasted entry point instead:
pasted --n-atoms 10 --elements 1-30 --charge 0 --mult 1 \
--mode gas --region sphere:8
# Organic-looking chain structure (C/N/O only)
python pasted.py --n-atoms 15 --elements 6,7,8 --charge 0 --mult 1 \
--mode chain --branch-prob 0.4 --n-samples 20 -o organic_junk.xyz
# Coordination-complex-like structure with Fe as center
python pasted.py --n-atoms 12 --elements 6,7,8,26 --charge 0 --mult 1 \
--mode shell --center-z 26 --coord-range 4:6 --n-samples 10
# Generate 100 structures, keep only the most disordered ones
python pasted.py --n-atoms 12 --elements 1-30 --charge 0 --mult 1 \
--mode gas --region sphere:9 --n-samples 100 \
--filter H_total:2.0:- --filter shape_aniso:0.3:- -o disordered.xyz
Placement Modes
gas (default)
Atoms are placed independently and uniformly at random inside the specified region.
Closest to true spatial randomness; highest expected H_spatial.
--region sphere:R sphere of radius R Å
--region box:L cube of side L Å
--region box:LX,LY,LZ orthorhombic box
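As a rough illustration of what "independently and uniformly at random inside the region" means, the sketch below samples points in a sphere and in a box using only the standard library. The function names `sample_in_sphere` and `sample_in_box` are hypothetical, and centring both regions on the origin is an assumption; neither is taken from the PASTED source.

```python
import math
import random


def sample_in_sphere(radius: float, rng: random.Random) -> tuple[float, float, float]:
    """Return one point uniformly distributed inside a sphere of the given radius.

    Assumption: the sphere is centred on the origin.
    """
    # Isotropic direction: normalise a vector of three Gaussian deviates.
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            break
    # Uniform volume density requires r = R * u^(1/3), not r = R * u.
    r = radius * rng.random() ** (1.0 / 3.0)
    return (r * v[0] / norm, r * v[1] / norm, r * v[2] / norm)


def sample_in_box(sides: tuple[float, float, float], rng: random.Random) -> tuple[float, ...]:
    """Return one point uniformly distributed inside an origin-centred box."""
    return tuple(rng.uniform(-s / 2.0, s / 2.0) for s in sides)


rng = random.Random(0)
sphere_points = [sample_in_sphere(8.0, rng) for _ in range(1000)]
box_points = [sample_in_box((4.0, 6.0, 8.0), rng) for _ in range(1000)]
```

Taking the cube root of the uniform deviate is what keeps the density uniform in volume; using the deviate directly would crowd points near the centre.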
chain
Atoms grow one by one from a seed atom via a random walk with directional persistence.
At each step, a random active tip is selected and the new atom is placed at a random bond length.
The direction of each step is constrained by --chain-persist to avoid self-tangling.
A branching probability controls whether the old tip is kept (branch) or replaced (linear advance).
Produces elongated, tree-like structures.
--branch-prob FLOAT branching probability (default: 0.3)
--chain-persist FLOAT directional persistence 0.0–1.0 (default: 0.5)
0.0 = fully random (may self-tangle)
0.5 = rear 120° cone excluded
1.0 = front hemisphere only, nearly straight
--bond-range LO:HI bond length range in Å (default: 1.2:1.6)
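One way to read the three documented --chain-persist values is as a lower bound on the cosine between the new step and the previous one: cos_min = persist − 1 gives −1 (no restriction) at 0.0, −0.5 (rear 120° cone excluded) at 0.5, and 0 (front hemisphere) at 1.0. The sketch below implements that reading by rejection sampling. The linear mapping and the name `persistent_step` are assumptions chosen to match those three points, not the tool's confirmed implementation.

```python
import math
import random


def random_unit_vector(rng: random.Random) -> tuple[float, float, float]:
    """Draw an isotropically distributed unit vector."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return (v[0] / norm, v[1] / norm, v[2] / norm)


def persistent_step(prev_dir, persist: float, rng: random.Random, max_tries: int = 1000):
    """Draw a unit step direction d with dot(d, prev_dir) >= persist - 1.

    Assumption: the persistence value maps linearly onto the cosine bound.
    """
    cos_min = persist - 1.0
    for _ in range(max_tries):
        d = random_unit_vector(rng)
        if sum(a * b for a, b in zip(d, prev_dir)) >= cos_min:
            return d
    raise RuntimeError("no acceptable direction found")


rng = random.Random(1)
forward = (0.0, 0.0, 1.0)
step = persistent_step(forward, 0.5, rng)
```

Under this reading at least half of all candidate directions are accepted for any persistence in 0.0–1.0, so the rejection loop terminates quickly.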
shell
One atom is placed at the origin as the "center", surrounded by a coordination shell at a random radius, followed by tail atoms growing from shell members. Produces structures that superficially resemble coordination complexes.
--center-z Z atomic number of center atom
(default: random from the sample's composition)
--coord-range MIN:MAX coordination number range (default: 4:8)
--shell-radius LO:HI shell radius range in Å (default: 1.8:2.5)
--bond-range LO:HI tail bond length range in Å (default: 1.2:1.6)
The center atom and its Z are recorded in the XYZ comment line as center=Fe(Z=26).
Element Pool
--elements SPEC
Elements are specified by atomic number. Omit to use all supported elements (Z = 1–106).
| Syntax | Meaning |
|---|---|
| 1-30 | Z = 1 through 30 (H to Zn) |
| 6,7,8 | Z = 6, 7, 8 (C, N, O) |
| 1-10,26,28 | Z = 1–10 plus Fe(26) and Ni(28) |
| 72-80 | 5d metals Hf through Hg |
| (omitted) | all Z = 1–106 |
For each structure, --n-atoms elements are drawn independently and uniformly from this pool.
The resulting composition varies per sample.
If H (Z = 1) is in the pool and the sampled composition contains no hydrogen, a random number of H atoms is automatically appended (approximately 1 + uniform(0,1) × n_atoms × 1.2). This can be disabled with --no-add-hydrogen.
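The spec syntax above can be expanded into a list of atomic numbers with a few lines of Python. This is a minimal sketch: `parse_element_spec` and `sample_composition` are hypothetical helper names, and the error handling is an assumption rather than the behaviour of the PASTED CLI. Hydrogen augmentation is not modelled.

```python
import random


def parse_element_spec(spec: str, z_max: int = 106) -> list[int]:
    """Expand a spec such as '1-10,26,28' into a sorted list of atomic numbers."""
    zs: set[int] = set()
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        if "-" in part:
            lo_text, hi_text = part.split("-", 1)
            lo, hi = int(lo_text), int(hi_text)
        else:
            lo = hi = int(part)
        if not 1 <= lo <= hi <= z_max:
            raise ValueError(f"invalid element range: {part!r}")
        zs.update(range(lo, hi + 1))
    return sorted(zs)


def sample_composition(pool: list[int], n_atoms: int, rng: random.Random) -> list[int]:
    """Draw n_atoms atomic numbers independently and uniformly from the pool."""
    return rng.choices(pool, k=n_atoms)


pool = parse_element_spec("1-10,26,28")
composition = sample_composition(pool, 10, random.Random(42))
```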
Charge and Multiplicity
--charge and --mult are required and apply to every generated structure.
Before placement, PASTED checks two conditions against the randomly sampled composition:
- Total electron count: N_e = Σ Z − charge > 0
- Parity: N_e % 2 == (mult − 1) % 2
Structures that fail either check are logged as [invalid] and skipped.
High-spin vs. low-spin selection is not enforced; that is the user's responsibility.
Because composition is random, parity failures are common when the element pool contains many odd-Z elements and mult=1 is specified. Increasing --n-samples or using --mult 2 reduces this.
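The two conditions above translate directly into code. The sketch below applies them to a list of atomic numbers; the function name `check_charge_mult` and the returned message strings are illustrative assumptions.

```python
def check_charge_mult(atomic_numbers: list[int], charge: int, mult: int) -> tuple[bool, str]:
    """Apply the electron-count and parity checks to one composition."""
    n_electrons = sum(atomic_numbers) - charge
    if n_electrons <= 0:
        return False, f"no electrons left (N_e = {n_electrons})"
    # A singlet (mult = 1) needs an even electron count, a doublet an odd one, etc.
    if n_electrons % 2 != (mult - 1) % 2:
        return False, f"parity mismatch (N_e = {n_electrons}, mult = {mult})"
    return True, "ok"


water = [8, 1, 1]    # O, H, H: 10 electrons
hydroxyl = [8, 1]    # O, H: 9 electrons
water_ok, _ = check_charge_mult(water, charge=0, mult=1)
hydroxyl_singlet_ok, _ = check_charge_mult(hydroxyl, charge=0, mult=1)
hydroxyl_doublet_ok, _ = check_charge_mult(hydroxyl, charge=0, mult=2)
```

Here the neutral water composition passes as a singlet, while the nine-electron hydroxyl composition fails as a singlet and passes as a doublet.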
Interatomic Distance Control
PASTED enforces a minimum interatomic distance using Pyykkö single-bond covalent radii (Pyykkö & Atsumi, Chem. Eur. J. 15, 186–197, 2009).
The threshold for each atom pair (i, j) is:
d_min(i, j) = cov_scale × (r_i + r_j)
- Default --cov-scale 1.0 = exact sum of covalent radii.
- Values below 1.0 allow closer contacts; values above 1.0 enforce additional clearance.
- Z > 86 (Fr through Sg): no single-bond literature values are available. PASTED uses the same-group nearest lighter element as a proxy (e.g. Fr → Cs, U → Nd, Rf → Hf).
Post-placement repulsion relaxation
Placement does not check for distance violations — atoms are placed freely in the requested geometry (region/chain/shell). After placement, a mandatory repulsion relaxation step resolves all violations iteratively: for each pair below the threshold, both atoms are pushed apart along their connecting vector by half the deficit. This repeats until no violations remain or --relax-cycles is exhausted.
This design guarantees that --n-atoms atoms are always placed, regardless of how crowded the initial configuration is. If relaxation does not converge within --relax-cycles, a [warn] line is printed to stderr and the structure is output as-is.
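The relaxation loop described above can be sketched in plain Python as follows. This is not the PASTED implementation: the name `relax_positions` is borrowed from the warning message, while the sequential pair updates, the tolerance `tol`, and the handling of exactly coincident atoms are assumptions made for the sketch.

```python
import math


def relax_positions(positions, radii, cov_scale=1.0, max_cycles=1500, tol=1e-9):
    """Push apart every pair closer than cov_scale * (r_i + r_j).

    Returns (new_positions, converged).
    """
    pos = [list(p) for p in positions]
    n = len(pos)
    for _ in range(max_cycles):
        violated = False
        for i in range(n):
            for j in range(i + 1, n):
                d_min = cov_scale * (radii[i] + radii[j])
                delta = [pos[j][k] - pos[i][k] for k in range(3)]
                dist = math.sqrt(sum(c * c for c in delta))
                if dist >= d_min - tol:
                    continue
                violated = True
                # Assumption: coincident atoms are separated along the x axis.
                unit = [c / dist for c in delta] if dist > 1e-12 else [1.0, 0.0, 0.0]
                shift = 0.5 * (d_min - dist)  # each atom moves half the deficit
                for k in range(3):
                    pos[i][k] -= shift * unit[k]
                    pos[j][k] += shift * unit[k]
        if not violated:
            return [tuple(p) for p in pos], True
    return [tuple(p) for p in pos], False


start = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]
relaxed, converged = relax_positions(start, radii=[0.75, 0.75])
```

In the two-atom example the pair starts 0.5 Å apart with a threshold of 1.5 Å, so each atom is moved 0.5 Å outward and the next cycle finds no violation. With more atoms, resolving one pair can create a new violation elsewhere, which is why the loop is bounded by a cycle limit.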
Disorder Metrics
All metrics are computed for every structure and embedded in the XYZ comment line.
All are usable in --filter.
| Metric | Description | Range |
|---|---|---|
| H_atom | Shannon entropy of element composition | 0 (single element) to ln(k) |
| H_spatial | Shannon entropy of the pairwise-distance histogram | higher = more uniform distances |
| H_total | Weighted sum: w_atom · H_atom + w_spatial · H_spatial | - |
| RDF_dev | RMS deviation of empirical g(r) from ideal-gas baseline | 0 = perfectly random |
| shape_aniso | Relative shape anisotropy from the gyration tensor | 0 = spherical, 1 = rod-like |
| Q4, Q6, Q8 | Steinhardt bond-orientational order parameters (averaged over atoms) | 0 = disordered |
| graph_lcc | Fraction of atoms in the largest connected component at --cutoff | 0–1 |
| graph_cc | Mean clustering coefficient at --cutoff | 0–1 |
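To make the shape_aniso range concrete, the sketch below computes the standard relative shape anisotropy κ² = 1 − 3(λ1λ2 + λ2λ3 + λ3λ1)/(λ1 + λ2 + λ3)², where λ are the eigenvalues of the gyration tensor. It assumes PASTED uses this textbook definition with unweighted atoms, which the table suggests but does not state. The eigenvalue sums are taken from the tensor's trace and principal minors, so no eigen-decomposition is needed; `shape_anisotropy` is a hypothetical name.

```python
def shape_anisotropy(positions) -> float:
    """Relative shape anisotropy from the unweighted gyration tensor."""
    n = len(positions)
    centroid = [sum(p[k] for p in positions) / n for k in range(3)]
    # Gyration tensor S_ab = (1/n) * sum over atoms of (x_a - c_a)(x_b - c_b).
    s = [
        [sum((p[a] - centroid[a]) * (p[b] - centroid[b]) for p in positions) / n
         for b in range(3)]
        for a in range(3)
    ]
    trace = s[0][0] + s[1][1] + s[2][2]  # equals l1 + l2 + l3
    if trace <= 0.0:
        return 0.0  # all atoms coincide; anisotropy is undefined, report 0
    # Sum of principal 2x2 minors equals l1*l2 + l2*l3 + l3*l1.
    minors = (
        s[0][0] * s[1][1] - s[0][1] ** 2
        + s[1][1] * s[2][2] - s[1][2] ** 2
        + s[0][0] * s[2][2] - s[0][2] ** 2
    )
    return 1.0 - 3.0 * minors / trace ** 2


rod = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
octahedron = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]
rod_aniso = shape_anisotropy(rod)
octahedron_aniso = shape_anisotropy(octahedron)
```

The collinear example sits at the rod-like end of the range (1) and the octahedron at the spherical end (0).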
Distance cutoff for graph and Steinhardt metrics
The --cutoff parameter determines which atom pairs are considered "connected" for graph_lcc, graph_cc, and Q4/Q6/Q8. Setting it too small relative to the actual interatomic distances causes these metrics to collapse to zero (no neighbours found); setting it too large makes every atom a neighbour of every other, so the metrics no longer discriminate between structures.
By default, --cutoff is set automatically to:
cutoff = cov_scale × 1.5 × median(r_i + r_j) over all element-pool pairs
This scales with the element pool: light-element pools (e.g. C/N/O) get a cutoff around 2.1 Å; 5d-metal pools get around 3.8 Å. The auto value is printed to stderr at startup:
[cutoff] 2.130 Å (auto: cov_scale=1.0 × 1.5 × median(r_i+r_j)=1.420 Å)
Override with --cutoff FLOAT when needed.
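The auto-cutoff formula can be reproduced by hand for a small pool. The sketch below makes two assumptions: the radii are the single-bond values for C, N and O as recalled from the Pyykkö table (check them against the cited paper), and the median is taken over unordered pairs including like pairs. PASTED may enumerate pairs differently, so the result can differ slightly from the value the tool prints. `auto_cutoff` is a hypothetical name.

```python
from itertools import combinations_with_replacement
from statistics import median

# Assumed single-bond covalent radii in Å, keyed by atomic number (C, N, O).
ILLUSTRATIVE_RADII = {6: 0.75, 7: 0.71, 8: 0.63}


def auto_cutoff(pool, radii, cov_scale=1.0):
    """cov_scale * 1.5 * median(r_i + r_j) over unordered element-pool pairs."""
    pair_sums = [
        radii[a] + radii[b] for a, b in combinations_with_replacement(pool, 2)
    ]
    return cov_scale * 1.5 * median(pair_sums)


cutoff = auto_cutoff([6, 7, 8], ILLUSTRATIVE_RADII)
```

With these inputs the six pair sums have a median of 1.40 Å, giving a cutoff of about 2.10 Å, in line with the "around 2.1 Å" figure quoted for light-element pools.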
Other metric tuning
--n-bins N histogram bins for H_spatial and RDF_dev (default: 20)
--w-atom FLOAT weight of H_atom in H_total (default: 0.5)
--w-spatial FLOAT weight of H_spatial in H_total (default: 0.5)
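The three entropy metrics and their tuning options can be sketched as below. Natural logarithms follow from the ln(k) upper bound given for H_atom. Binning the pairwise distances between their minimum and maximum is an assumption, and the tool may use a different histogram range. All function names are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def shannon_entropy(counts) -> float:
    """Shannon entropy (natural log) of a collection of non-negative counts."""
    counts = list(counts)
    total = sum(counts)
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


def h_atom(symbols) -> float:
    """Entropy of the element composition."""
    return shannon_entropy(Counter(symbols).values())


def h_spatial(positions, n_bins: int = 20) -> float:
    """Entropy of the pairwise-distance histogram (range: min to max distance)."""
    dists = [math.dist(a, b) for a, b in combinations(positions, 2)]
    if not dists or max(dists) == min(dists):
        return 0.0
    lo, hi = min(dists), max(dists)
    counts = [0] * n_bins
    for d in dists:
        # The maximum distance falls into the last bin rather than overflowing.
        counts[min(int((d - lo) / (hi - lo) * n_bins), n_bins - 1)] += 1
    return shannon_entropy(counts)


def h_total(h_a: float, h_s: float, w_atom: float = 0.5, w_spatial: float = 0.5) -> float:
    """Weighted sum used for the combined metric."""
    return w_atom * h_a + w_spatial * h_s


combined = h_total(1.0986, 2.7812)  # H_atom and H_spatial from the sample comment line
```

With the default weights, the H_atom and H_spatial values from the sample comment line in Output Format combine to the H_total shown there (1.9399).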
Filtering
--filter METRIC:MIN:MAX
Only structures whose metric falls in [MIN, MAX] are written to output.
Use - for an open bound.
The flag is repeatable; all conditions must be satisfied simultaneously.
# Keep structures with high total entropy
--filter H_total:2.0:-
# Keep elongated structures (rod-like)
--filter shape_aniso:0.5:-
# Keep well-connected chains
--filter graph_lcc:0.8:- --filter graph_cc:0.4:-
# Keep structures with low local order (no accidental crystallinity)
--filter Q6:-:0.4
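The METRIC:MIN:MAX syntax is straightforward to parse and apply to a metrics dictionary. The sketch below is an illustration of the semantics described above (closed interval, - for an open bound, all conditions combined with AND); `parse_filter` and `passes_filters` are hypothetical names, not functions exported by PASTED.

```python
import math


def parse_filter(spec: str) -> tuple[str, float, float]:
    """Turn 'METRIC:MIN:MAX' into (metric, lower, upper); '-' means unbounded."""
    metric, lo_text, hi_text = spec.split(":")
    lower = -math.inf if lo_text == "-" else float(lo_text)
    upper = math.inf if hi_text == "-" else float(hi_text)
    return metric, lower, upper


def passes_filters(metrics: dict, filters) -> bool:
    """True only if every filtered metric lies in its closed [lower, upper] range."""
    return all(lower <= metrics[name] <= upper for name, lower, upper in filters)


filters = [parse_filter("H_total:2.0:-"), parse_filter("Q6:-:0.4")]
kept = passes_filters({"H_total": 2.3, "Q6": 0.25}, filters)
dropped = passes_filters({"H_total": 1.9, "Q6": 0.25}, filters)
```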
Output Format
Structures are written as a concatenated multi-structure XYZ file.
Progress and statistics are written to stderr; the XYZ data goes to stdout (or --output).
12
sample=3 mode=chain charge=+0 mult=1 comp=[C:4,N:5,O:3] H_atom=1.0986 H_spatial=2.7812 H_total=1.9399 RDF_dev=3.2451 shape_aniso=0.5123 Q4=0.5210 Q6=0.5880 Q8=0.6014 graph_lcc=1.0000 graph_cc=0.5714
C 1.234567 -0.987654 2.345678
N -1.456789 3.210987 -0.123456
...
# XYZ to file, progress to terminal
python pasted.py ... -o out.xyz
# pipe XYZ, discard progress
python pasted.py ... 2>/dev/null | downstream_tool
# progress only (dry run to check filter hit rate)
python pasted.py ... -o /dev/null
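Because the metrics are embedded in the comment line as whitespace-separated key=value tokens, downstream scripts can recover them without importing PASTED. The sketch below parses the sample comment line shown above; `parse_comment_line` is a hypothetical helper, and converting every numeric-looking value to float is a simplifying assumption.

```python
def parse_comment_line(line: str) -> dict:
    """Split an XYZ comment line into a dict; numeric values become floats."""
    fields = {}
    for token in line.split():
        key, sep, value = token.partition("=")  # split on the first '=' only
        if not sep:
            continue
        try:
            fields[key] = float(value)
        except ValueError:
            fields[key] = value  # e.g. mode=chain, comp=[...], center=Fe(Z=26)
    return fields


comment = (
    "sample=3 mode=chain charge=+0 mult=1 comp=[C:4,N:5,O:3] "
    "H_atom=1.0986 H_spatial=2.7812 H_total=1.9399 RDF_dev=3.2451 "
    "shape_aniso=0.5123 Q4=0.5210 Q6=0.5880 Q8=0.6014 "
    "graph_lcc=1.0000 graph_cc=0.5714"
)
fields = parse_comment_line(comment)
```

Splitting on the first = only keeps a shell-mode token such as center=Fe(Z=26) intact as a single string value.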
Full Option Reference
required:
--n-atoms N number of atoms per structure
--charge INT total system charge
--mult INT spin multiplicity 2S+1
placement mode:
--mode {gas,chain,shell}
--region SPEC [gas] sphere:R | box:L | box:LX,LY,LZ
--branch-prob FLOAT [chain] branching probability (default: 0.3)
--chain-persist FLOAT [chain] directional persistence 0.0–1.0 (default: 0.5)
--bond-range LO:HI [chain/shell] bond length range Å (default: 1.2:1.6)
--center-z Z [shell] fix center atom by atomic number
--coord-range MIN:MAX [shell] coordination number range (default: 4:8)
--shell-radius LO:HI [shell] shell radius range Å (default: 1.8:2.5)
elements:
--elements SPEC atomic-number spec (default: all Z=1-106)
placement:
--cov-scale FLOAT min dist = cov_scale × (r_i + r_j), Pyykkö radii (default: 1.0)
--relax-cycles INT max cycles for post-placement repulsion relaxation (default: 1500)
--no-add-hydrogen disable automatic H augmentation
sampling:
--n-samples INT number of structures to attempt (default: 1)
--seed INT random seed
metrics:
--n-bins INT histogram bins (default: 20)
--w-atom FLOAT H_atom weight in H_total (default: 0.5)
--w-spatial FLOAT H_spatial weight in H_total (default: 0.5)
--cutoff FLOAT distance cutoff Å for Q_l / graph_*
(default: auto = cov_scale × 1.5 × median(r_i+r_j))
filtering:
--filter METRIC:MIN:MAX repeatable; use - for open bound
output:
--validate check charge/mult against one random composition, then exit
-o / --output FILE XYZ output file (default: stdout)
Notes and Limitations
- Interatomic distances use Pyykkö (2009) single-bond covalent radii. For Z > 86 (Fr through Sg), same-group proxies are used (e.g. Fr → Cs, U → Nd, Rf → Hf).
- Repulsion relaxation guarantees that no pair falls below cov_scale × (r_i + r_j) when it converges. If [warn] relax_positions did not converge appears, the structure may contain marginal violations but is still output. Increase --relax-cycles if convergence is important.
- Auto cutoff is computed from the element pool before any structures are generated and is fixed for the entire run. If the actual composition drawn per sample is much lighter or heavier than the pool median, the effective neighbour count may still be low or high. Use --cutoff to override when needed.
- RDF_dev is a finite-system approximation; treat it as a relative indicator.
- Charge/mult parity failures are common with large element pools and mult=1. Increase --n-samples or use --mult 2 to compensate.
Retrieving Structures from the Class API
Basic usage: receive a list via generate()
from pasted import StructureGenerator
gen = StructureGenerator(
n_atoms=12, charge=0, mult=1,
mode="gas", region="sphere:9",
elements="1-30", n_samples=50, seed=42,
)
structures = gen.generate() # list[Structure]
Attributes available on a Structure object
s = structures[0]
# Element symbols
s.atoms # ['C', 'N', 'H', 'O', ...]
# Cartesian coordinates (Å)
s.positions # [(x, y, z), (x, y, z), ...]
# Disorder metrics
s.metrics # {'H_atom': 1.09, 'H_spatial': 2.73, ...}
# Charge and spin multiplicity
s.charge # 0
s.mult # 1
# Placement mode / sequential index / seed
s.mode # 'gas'
s.sample_index # 1
s.seed # 42
# Shell mode only
s.center_sym # 'Fe'
# Number of atoms
len(s) # 12
XYZ output
# As a string
xyz_str = s.to_xyz()
print(xyz_str)
# Write to file (append multiple structures)
for i, s in enumerate(structures):
s.write_xyz("out.xyz", append=(i > 0))
Converting positions to a numpy array
positions is a list of tuples. To obtain a numpy array:
import numpy as np
coords = np.array(s.positions) # shape (n_atoms, 3)
Iterating directly over the generator
for s in gen: # __iter__ calls generate() internally
print(s)
print(s.metrics["H_total"])
Filtering structures by metric
gen = StructureGenerator(
n_atoms=12, charge=0, mult=1,
mode="gas", region="sphere:9",
elements="6,7,8", n_samples=100,
filters=["H_total:2.0:-", "shape_aniso:0.5:-"],
)
disordered = gen.generate() # contains only structures that passed all filters
License
MIT License. See LICENSE.