Compact, lazy-readable HDF5 trajectories with incremental atomistic property storage.

dumpDUCK

dumpDUCK stores atomistic trajectories as compact, lazy-readable HDF5 files. It is designed for large MD trajectories where you want to read one frame at a time, and for incremental labelling workflows where new properties are added after the trajectory already exists.

Installation

From PyPI:

pip install dumpduck

From a source checkout (editable install):

pip install -e .

Optional Zstandard/Blosc compression:

pip install -e '.[compression]'

Convert a trajectory

LAMMPS dump:

dumpduck convert 4-azif_hda_512FUs_seed4-quenched_1500K_to_300K_rate0.1Kps-equilibrated300K_5ns.dump 4-azif_hda_512FUs_seed4-quenched_1500K_to_300K_rate0.1Kps-equilibrated300K_5ns.h5 \
  --format lammpstrj \
  --type-map '1:C,2:H,3:N,4:Zn' \
  --compression gzip \
  --compression-level 6 \
  --float-dtype float32 \
  --chunk-frames 16

dumpduck convert azif_rmc_2010_nmr_300K_10fs.lammpstrj azif_rmc_2010_nmr_nvt_300K_10fs.h5 \
  --format lammpstrj \
  --type-map '1:C,2:H,3:N,4:Zn' \
  --compression blosc-zstd \
  --compression-level 9 \
  --float-dtype float32 \
  --chunk-frames 64

TYPE_MAP='1:Zn,2:Zn'
TYPE_MAP="${TYPE_MAP},3:H,4:H,5:H,6:H,7:H,8:H,9:H,10:H,11:H,12:H,13:H,14:H"
TYPE_MAP="${TYPE_MAP},15:C,16:C,17:C,18:C,19:C,20:C,21:C,22:C,23:C,24:C,25:C,26:C"
TYPE_MAP="${TYPE_MAP},27:N,28:N,29:N,30:N,31:N,32:N,33:N,34:N"

dumpduck convert \
  zif4_2x2x2_300K_nvt_nmr_nvt_300K_1ns_10fs.lammpstrj \
  zif4_2x2x2_300K_nvt_nmr_nvt_300K_1ns_10fs.h5 \
  --format lammpstrj \
  --type-map "${TYPE_MAP}" \
  --compression blosc-zstd \
  --compression-level 7 \
  --float-dtype float32 \
  --chunk-frames 100 \
  --n-frames 100000

dumpduck info zif4_2x2x2_300K_nvt_nmr_nvt_300K_1ns_10fs.h5
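When many LAMMPS types map to the same element, the `--type-map` string can be generated rather than typed by hand. The following is a minimal sketch; `build_type_map` is a hypothetical helper, not part of dumpDUCK, and it assumes only the `type:element` comma-separated format shown in the commands above.

```python
def build_type_map(groups):
    """Build a '1:Zn,2:Zn,...' string from (element, first_type, last_type) groups."""
    pairs = []
    for element, first, last in groups:
        # Both ends of each range are inclusive.
        for lammps_type in range(first, last + 1):
            pairs.append(f"{lammps_type}:{element}")
    return ",".join(pairs)


# Same grouping as the shell TYPE_MAP above: 2 Zn, 12 H, 12 C and 8 N types.
groups = [("Zn", 1, 2), ("H", 3, 14), ("C", 15, 26), ("N", 27, 34)]
type_map = build_type_map(groups)
print(type_map)
```

The printed string can be passed directly as the value of `--type-map`.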

ASE-readable trajectory:

dumpduck convert trajectory.xyz trajectory.h5 --chunk-frames 16

Inspect a file

dumpduck info trajectory.h5

Example output:

file: trajectory.h5
format: dumpduck-hdf5
version: 0.2.0
frames: 100001
atoms: 4352

core datasets:
  positions        shape=(100001, 4352, 3) dtype=float32 chunks=(16, 4352, 3) compression=gzip

properties:
  atomic/shielding_tensors
    shape: (100001, 4352, 3, 3)
    dtype: float32
    valid frames: 2183 / 100001
    units: ppm

Lazy reading

from dump_duck import H5Trajectory

with H5Trajectory('trajectory.h5') as traj:
    atoms = traj[0]

    for atoms in traj.iter_frames(start=0, stop=1000, step=10):
        print(atoms.info['timestep'], atoms.positions.shape)

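Only the requested frames are loaded into memory. Because HDF5 reads and decompresses whole chunks, a single-frame read still touches every frame stored in the same chunk, so --chunk-frames sets the cost of random access.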

Incremental properties

Properties live under /properties/atomic/<name> or /properties/frame/<name>. Each property has:

data   # actual data
valid  # bool mask saying which frames have been written

This allows sparse labelling: the property can exist for all frames, while only a subset has been computed.
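The data/valid pairing can be illustrated with a small in-memory model. This is only a sketch of the idea, not dumpDUCK's implementation: `SparseProperty` is a hypothetical class that uses plain Python lists where the real file uses HDF5 datasets.

```python
class SparseProperty:
    """In-memory stand-in for a property group: a data array plus a valid mask."""

    def __init__(self, n_frames):
        self.data = [None] * n_frames    # placeholder until a frame is written
        self.valid = [False] * n_frames  # True once a frame has been written

    def write(self, frame, value):
        """Store a value for one frame and mark that frame as valid."""
        self.data[frame] = value
        self.valid[frame] = True

    def pending(self):
        """Return the frame indices that still need to be computed."""
        return [i for i, done in enumerate(self.valid) if not done]


energy = SparseProperty(n_frames=6)
energy.write(0, -1.25)
energy.write(3, -1.31)

print(energy.valid)      # [True, False, False, True, False, False]
print(energy.pending())  # [1, 2, 4, 5]
```

Keeping the mask separate from the data means an unwritten frame is never confused with a frame whose computed value happens to be zero.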

NMR shielding tensors, one frame at a time

from dump_duck import H5Trajectory

with H5Trajectory('trajectory.h5', mode='r+') as traj:
    if not traj.has_property('shielding_tensors', kind='atomic'):
        traj.create_property(
            'shielding_tensors',
            kind='atomic',
            frame_shape=(3, 3),
            dtype='float32',
            units='ppm',
            description='Per-atom NMR shielding tensors',
            compression='gzip',
            compression_level=6,
            chunk_frames=1,
        )

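    # `calculator` stands for a user-supplied model; it is not defined in this snippet.
    for i, atoms in enumerate(traj.iter_frames()):
        # Skip frames that were labelled in an earlier run, so the loop can be resumed.
        if traj.property_valid('shielding_tensors', i, kind='atomic'):
            continue

        shielding = calculator.predict_shielding_tensors(atoms)  # shape: (n_atoms, 3, 3)
        traj.write_property('shielding_tensors', i, shielding, kind='atomic')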

Chemical shifts

from dump_duck import H5Trajectory

with H5Trajectory('trajectory.h5', mode='r+') as traj:
    traj.create_property(
        'chemical_shifts',
        kind='atomic',
        frame_shape=(),
        dtype='float32',
        units='ppm',
        description='Per-atom NMR chemical shifts',
    )

    # `shifts` is a user-supplied array with one value per atom.
    traj.write_property('chemical_shifts', 0, shifts, kind='atomic')  # shape: (n_atoms,)

Frame-wise energies

from dump_duck import H5Trajectory

with H5Trajectory('trajectory.h5', mode='r+') as traj:
    traj.create_property('energy', kind='frame', dtype='float64', units='eV')
    traj.write_property('energy', 0, 123.4, kind='frame')

Extract frames

dumpduck extract trajectory.h5 frame_1000.xyz --index 1000

With valid properties included as ASE arrays/info:

dumpduck extract trajectory.h5 labelled.xyz --start 0 --stop 100 --step 10 --include-properties

Compression notes

Portable built-in options:

none, lzf, gzip

Optional plugin options with dumpduck[compression]:

zstd, blosc-zstd

For MD trajectories, a good default is:

gzip level 6, float32, chunk_frames 16

For single-frame random access, use smaller chunks. For better compression and sequential reading, use larger chunks such as 32 or 64.
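The trade-off can be estimated with some quick arithmetic. The sketch below uses the figures from the example output above (100001 frames, 4352 atoms, float32 positions) and reports the uncompressed size of one chunk, which is roughly what a single-frame read has to decompress. `chunk_stats` is a hypothetical helper, not a dumpDUCK function.

```python
import math


def chunk_stats(n_frames, n_atoms, chunk_frames, itemsize=4):
    """Uncompressed byte counts for a positions dataset of shape (n_frames, n_atoms, 3)."""
    frame_bytes = n_atoms * 3 * itemsize  # itemsize=4 corresponds to float32
    return {
        "frame_bytes": frame_bytes,
        "chunk_bytes": frame_bytes * chunk_frames,
        "n_chunks": math.ceil(n_frames / chunk_frames),
    }


for chunk_frames in (1, 16, 64):
    stats = chunk_stats(100001, 4352, chunk_frames)
    print(chunk_frames, stats["chunk_bytes"], stats["n_chunks"])
# 1 52224 100001
# 16 835584 6251
# 64 3342336 1563
```

With 16 frames per chunk, reading one frame means decompressing about 0.8 MB of positions; with 64 it is about 3.3 MB, in exchange for fewer chunks and usually a better compression ratio.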

HDF5 layout

/
  atomic_numbers        (n_atoms,)
  ids                   (n_atoms,)
  lammps_types          optional, (n_atoms,)
  mol_ids               optional, (n_atoms,)

  positions             (n_frames, n_atoms, 3)
  cells                 (n_frames, 3, 3)
  pbc                   (n_frames, 3)
  timesteps             (n_frames,)

  properties/
    atomic/
      <name>/
        data            (n_frames, n_atoms, *frame_shape)
        valid           (n_frames,)
    frame/
      <name>/
        data            (n_frames, *frame_shape)
        valid           (n_frames,)
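The shape rules in the layout above can be written as a small function. This is a sketch for checking expectations against `dumpduck info` output; `property_shapes` is a hypothetical helper and does not read or write any file.

```python
def property_shapes(kind, n_frames, n_atoms, frame_shape=()):
    """Return the expected shapes of the data and valid datasets of a property."""
    if kind == "atomic":
        # One entry of shape frame_shape per atom, per frame.
        data_shape = (n_frames, n_atoms, *frame_shape)
    elif kind == "frame":
        # One entry of shape frame_shape per frame.
        data_shape = (n_frames, *frame_shape)
    else:
        raise ValueError(f"kind must be 'atomic' or 'frame', got {kind!r}")
    return {"data": data_shape, "valid": (n_frames,)}


print(property_shapes("atomic", 100001, 4352, frame_shape=(3, 3)))
# {'data': (100001, 4352, 3, 3), 'valid': (100001,)}
print(property_shapes("atomic", 100001, 4352))
# {'data': (100001, 4352), 'valid': (100001,)}
print(property_shapes("frame", 100001, 4352))
# {'data': (100001,), 'valid': (100001,)}
```

The three calls correspond to the shielding tensor, chemical shift, and energy examples: an empty frame_shape gives one scalar per atom or per frame.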
