Compact, lazy-readable HDF5 trajectories with incremental atomistic property storage.

These details have not been verified by PyPI

Project description

dumpDUCK

dumpDUCK stores atomistic trajectories as compact, lazy-readable HDF5 files. It is designed for large MD trajectories where you want to read one frame at a time, and for incremental labelling workflows where new properties are added after the trajectory already exists.

Installation

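From a source checkout:

pip install -e .

Optional Zstandard/Blosc compression:

pip install -e '.[compression]'

From PyPI, the equivalent commands are:

pip install dumpduck
pip install 'dumpduck[compression]'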

Convert a trajectory

LAMMPS dump:

dumpduck convert 4-azif_hda_512FUs_seed4-quenched_1500K_to_300K_rate0.1Kps-equilibrated300K_5ns.dump 4-azif_hda_512FUs_seed4-quenched_1500K_to_300K_rate0.1Kps-equilibrated300K_5ns.h5 \
  --format lammpstrj \
  --type-map '1:C,2:H,3:N,4:Zn' \
  --compression gzip \
  --compression-level 6 \
  --float-dtype float32 \
  --chunk-frames 16

With Blosc/Zstandard compression (requires the optional compression extra) and larger chunks:

dumpduck convert azif_rmc_2010_nmr_300K_10fs.lammpstrj azif_rmc_2010_nmr_nvt_300K_10fs.h5 \
  --format lammpstrj \
  --type-map '1:C,2:H,3:N,4:Zn' \
  --compression blosc-zstd \
  --compression-level 9 \
  --float-dtype float32 \
  --chunk-frames 64

A long type map can be assembled in a shell variable before converting:

TYPE_MAP='1:Zn,2:Zn'
TYPE_MAP="${TYPE_MAP},3:H,4:H,5:H,6:H,7:H,8:H,9:H,10:H,11:H,12:H,13:H,14:H"
TYPE_MAP="${TYPE_MAP},15:C,16:C,17:C,18:C,19:C,20:C,21:C,22:C,23:C,24:C,25:C,26:C"
TYPE_MAP="${TYPE_MAP},27:N,28:N,29:N,30:N,31:N,32:N,33:N,34:N"

dumpduck convert \
  zif4_2x2x2_300K_nvt_nmr_nvt_300K_1ns_10fs.lammpstrj \
  zif4_2x2x2_300K_nvt_nmr_nvt_300K_1ns_10fs.h5 \
  --format lammpstrj \
  --type-map "${TYPE_MAP}" \
  --compression blosc-zstd \
  --compression-level 7 \
  --float-dtype float32 \
  --chunk-frames 100 \
  --n-frames 100000

dumpduck info zif4_2x2x2_300K_nvt_nmr_nvt_300K_1ns_10fs.h5
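
When many consecutive LAMMPS types share an element, the type-map string can also be generated rather than typed by hand. The sketch below is not part of dumpDUCK; `build_type_map` is a hypothetical helper that only reproduces the `type:element` string format used by `--type-map` above.

```python
def build_type_map(element_ranges):
    """Return a 'type:element' string for runs of consecutive LAMMPS type IDs.

    element_ranges: list of (element_symbol, first_type, last_type) tuples,
    with both type IDs inclusive.
    """
    pairs = []
    for symbol, first, last in element_ranges:
        for type_id in range(first, last + 1):
            pairs.append(f"{type_id}:{symbol}")
    return ",".join(pairs)


# Same mapping as the TYPE_MAP shell variable above.
type_map = build_type_map([
    ("Zn", 1, 2),
    ("H", 3, 14),
    ("C", 15, 26),
    ("N", 27, 34),
])
print(type_map)
```

The printed string can be pasted into `--type-map` or exported as a shell variable.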

ASE-readable trajectory:

dumpduck convert trajectory.xyz trajectory.h5 --chunk-frames 16

Inspect a file

dumpduck info trajectory.h5

Example output:

file: trajectory.h5
format: dumpduck-hdf5
version: 0.2.0
frames: 100001
atoms: 4352

core datasets:
  positions        shape=(100001, 4352, 3) dtype=float32 chunks=(16, 4352, 3) compression=gzip

properties:
  atomic/shielding_tensors
    shape: (100001, 4352, 3, 3)
    dtype: float32
    valid frames: 2183 / 100001
    units: ppm
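
The shapes in this output give a feel for the data volumes involved. The arithmetic below uses only the numbers printed above and computes logical, uncompressed sizes; actual file sizes depend on compression and on how much of each dataset has been written.

```python
n_frames, n_atoms = 100001, 4352
bytes_per_float32 = 4

# positions: (n_frames, n_atoms, 3) float32
frame_bytes = n_atoms * 3 * bytes_per_float32
positions_bytes = n_frames * frame_bytes

# One HDF5 chunk of positions holds 16 frames: chunks=(16, 4352, 3)
chunk_bytes = 16 * frame_bytes

# shielding_tensors: (n_frames, n_atoms, 3, 3) float32
tensor_bytes = n_frames * n_atoms * 3 * 3 * bytes_per_float32

# Fraction of frames that carry a computed shielding tensor
valid_fraction = 2183 / n_frames

print(f"one frame of positions:   {frame_bytes} bytes")
print(f"positions, uncompressed:  {positions_bytes / 1e9:.2f} GB")
print(f"one positions chunk:      {chunk_bytes / 1024:.0f} KiB")
print(f"tensors, if fully filled: {tensor_bytes / 1e9:.2f} GB")
print(f"labelled frames:          {valid_fraction:.2%}")
```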

Lazy reading

from dump_duck import H5Trajectory

with H5Trajectory('trajectory.h5') as traj:
    atoms = traj[0]

    for atoms in traj.iter_frames(start=0, stop=1000, step=10):
        print(atoms.info['timestep'], atoms.positions.shape)

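Frames are loaded on demand: indexing or iterating brings only the requested frames into memory, never the whole trajectory. HDF5 still reads and decompresses every chunk that contains a requested frame, so the chunk size sets the real I/O cost of a single-frame read.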

Incremental properties

Properties live under /properties/atomic/<name> or /properties/frame/<name>. Each property has:

data   # actual data
valid  # bool mask saying which frames have been written

This allows sparse labelling: the property can exist for all frames, while only a subset has been computed.
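
The data/valid pairing can be pictured with a small in-memory model. This is only an illustration of the idea, not dumpDUCK's implementation; the `SparseProperty` class and its methods are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class SparseProperty:
    """In-memory stand-in for one property: per-frame data plus a valid mask."""

    n_frames: int
    data: list = field(init=False)
    valid: list = field(init=False)

    def __post_init__(self):
        # Storage exists for every frame, but nothing is marked valid yet.
        self.data = [None] * self.n_frames
        self.valid = [False] * self.n_frames

    def write(self, frame, value):
        """Store a value and mark that frame as written."""
        self.data[frame] = value
        self.valid[frame] = True

    def pending(self):
        """Frame indices that still need to be computed."""
        return [i for i, done in enumerate(self.valid) if not done]


energy = SparseProperty(n_frames=5)
energy.write(1, -12.5)
energy.write(3, -12.1)

print(energy.valid)
print(energy.pending())
```

Because the mask is stored alongside the data, an interrupted labelling run can resume by processing only the pending frames.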

NMR shielding tensors, one frame at a time

from dump_duck import H5Trajectory

with H5Trajectory('trajectory.h5', mode='r+') as traj:
    if not traj.has_property('shielding_tensors', kind='atomic'):
        traj.create_property(
            'shielding_tensors',
            kind='atomic',
            frame_shape=(3, 3),
            dtype='float32',
            units='ppm',
            description='Per-atom NMR shielding tensors',
            compression='gzip',
            compression_level=6,
            chunk_frames=1,
        )

    for i, atoms in enumerate(traj.iter_frames()):
        if traj.property_valid('shielding_tensors', i, kind='atomic'):
            continue

        # `calculator` is your own model or code; it is not provided by dumpDUCK.
        shielding = calculator.predict_shielding_tensors(atoms)  # shape: (n_atoms, 3, 3)
        traj.write_property('shielding_tensors', i, shielding, kind='atomic')

Chemical shifts

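from dump_duck import H5Trajectory

with H5Trajectory('trajectory.h5', mode='r+') as traj:
    # Guard the creation so the snippet can be re-run on the same file.
    if not traj.has_property('chemical_shifts', kind='atomic'):
        traj.create_property(
            'chemical_shifts',
            kind='atomic',
            frame_shape=(),
            dtype='float32',
            units='ppm',
            description='Per-atom NMR chemical shifts',
        )

    # `shifts` is a user-supplied array of shape (n_atoms,) for frame 0.
    traj.write_property('chemical_shifts', 0, shifts, kind='atomic')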
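
dumpDUCK does not prescribe where the shift values come from. One common route, sketched here under stated assumptions, is to take the isotropic shielding (one third of the tensor trace) and subtract it from a reference shielding. The function names and the `sigma_ref` value are illustrative only, and the simple difference ignores the small (1 - sigma_ref) denominator in the exact definition.

```python
def isotropic_shielding(tensor):
    """One third of the trace of a 3x3 shielding tensor (ppm)."""
    return (tensor[0][0] + tensor[1][1] + tensor[2][2]) / 3.0


def chemical_shifts(tensors, sigma_ref):
    """Approximate shifts: delta = sigma_ref - sigma_iso for each atom."""
    return [sigma_ref - isotropic_shielding(t) for t in tensors]


# Two made-up per-atom tensors, i.e. an (n_atoms, 3, 3) array with n_atoms = 2.
tensors = [
    [[30.0, 0.0, 0.0], [0.0, 28.0, 0.0], [0.0, 0.0, 26.0]],
    [[25.0, 1.0, 0.0], [1.0, 25.0, 0.0], [0.0, 0.0, 25.0]],
]

# sigma_ref is a hypothetical reference shielding; in practice it depends on
# the nucleus and on the method used to compute the tensors.
shifts = chemical_shifts(tensors, sigma_ref=31.0)
print(shifts)
```

The resulting list has one value per atom, matching the `(n_atoms,)` shape expected for a property created with `frame_shape=()`.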

Frame-wise energies

from dump_duck import H5Trajectory

with H5Trajectory('trajectory.h5', mode='r+') as traj:
    traj.create_property('energy', kind='frame', dtype='float64', units='eV')
    traj.write_property('energy', 0, 123.4, kind='frame')  # one scalar per frame

Extract frames

dumpduck extract trajectory.h5 frame_1000.xyz --index 1000

With valid properties included as ASE arrays/info:

dumpduck extract trajectory.h5 labelled.xyz --start 0 --stop 100 --step 10 --include-properties

Compression notes

Portable built-in options:

none, lzf, gzip

Optional plugin options with dumpduck[compression]:

zstd, blosc-zstd

For MD trajectories, a good default is:

gzip level 6, float32, chunk_frames 16

For single-frame random access, use smaller chunks. For better compression and sequential reading, use larger chunks such as 32 or 64.
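
The trade-off can be made concrete by counting how many chunks a read touches. The sketch below assumes that HDF5 reads and decompresses whole chunks, ignores the chunk cache and a possibly shorter final chunk, and uses hypothetical helper names.

```python
def chunks_touched(frames, chunk_frames):
    """Sorted indices of the chunks that contain the requested frames."""
    return sorted({frame // chunk_frames for frame in frames})


def read_amplification(frames, chunk_frames):
    """Frames decompressed per frame actually requested."""
    frames = list(frames)
    n_chunks = len(chunks_touched(frames, chunk_frames))
    return n_chunks * chunk_frames / len(frames)


# Random access to a single frame: the whole 16-frame chunk is decompressed.
print(chunks_touched([1000], 16), read_amplification([1000], 16))

# The same read with chunk_frames=1 touches exactly one frame.
print(read_amplification([1000], 1))

# Sequential reading of 64 frames with chunk_frames=64 wastes nothing.
print(read_amplification(range(64), 64))

# Strided reading (every 10th frame) with chunk_frames=16 still touches
# every chunk in the range.
print(read_amplification(range(0, 1000, 10), 16))
```

An amplification close to 1 means little wasted work; large values suggest the chunk size is too big for the access pattern.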

HDF5 layout

/
  atomic_numbers        (n_atoms,)
  ids                   (n_atoms,)
  lammps_types          optional, (n_atoms,)
  mol_ids               optional, (n_atoms,)

  positions             (n_frames, n_atoms, 3)
  cells                 (n_frames, 3, 3)
  pbc                   (n_frames, 3)
  timesteps             (n_frames,)

  properties/
    atomic/
      <name>/
        data            (n_frames, n_atoms, *frame_shape)
        valid           (n_frames,)
    frame/
      <name>/
        data            (n_frames, *frame_shape)
        valid           (n_frames,)
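
Because the layout is plain HDF5, files can also be inspected with h5py directly. The sketch below writes a toy file that mimics the layout above (it is not produced by dumpDUCK, and the file name and values are made up) and then reads one frame and one property back. It assumes h5py and NumPy are installed.

```python
import os
import tempfile

import h5py
import numpy as np

n_frames, n_atoms = 4, 3
path = os.path.join(tempfile.mkdtemp(), "toy.h5")

# Write a toy file following the documented dataset names and shapes.
with h5py.File(path, "w") as f:
    f.create_dataset("atomic_numbers", data=np.array([30, 7, 6]))
    f.create_dataset(
        "positions",
        shape=(n_frames, n_atoms, 3),
        dtype="float32",
        chunks=(1, n_atoms, 3),  # one frame per chunk
        compression="gzip",
    )
    f["positions"][2] = np.ones((n_atoms, 3), dtype="float32")

    # A frame-wise property: data plus a boolean valid mask.
    group = f.create_group("properties/frame/energy")
    group.create_dataset("data", shape=(n_frames,), dtype="float64")
    group.create_dataset("valid", shape=(n_frames,), dtype=bool)
    group["data"][2] = -1.5
    group["valid"][2] = True

# Read back lazily: slicing a dataset loads only the selected part.
with h5py.File(path, "r") as f:
    frame = f["positions"][2]                       # shape (n_atoms, 3)
    valid = f["properties/frame/energy/valid"][:]   # shape (n_frames,)
    labelled = np.flatnonzero(valid)                # frames with an energy
    energies = f["properties/frame/energy/data"][:][valid]

print(frame.shape, labelled, energies)
```

Files written with blosc-zstd or zstd need the corresponding HDF5 filter plugin to be available before h5py can read them.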

Project details

Release history

0.2.3 (May 11, 2026)
0.2.2 (May 11, 2026), this version
0.2.0 (May 11, 2026)

Download files

Source Distribution

dumpduck-0.2.2.tar.gz (29.8 MB)

Uploaded May 11, 2026 Source

Built Distribution

dumpduck-0.2.2-py3-none-any.whl (20.2 kB)

Uploaded May 11, 2026 Python 3

File details

Details for the file dumpduck-0.2.2.tar.gz.

File metadata

Download URL: dumpduck-0.2.2.tar.gz
Upload date: May 11, 2026
Size: 29.8 MB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dumpduck-0.2.2.tar.gz
Algorithm	Hash digest
SHA256	`ac53d691ac88222db7f6d4f848f3c3d3593b52f7572e5e800ec8aac39ef368fb`
MD5	`58b9e7fb74777eadae78a6491b7c4587`
BLAKE2b-256	`3a740aa21b42f7664108e06c8e3d38fa77858d562a47228982a52c02398bd961`

Provenance

The following attestation bundles were made for dumpduck-0.2.2.tar.gz:

Publisher: publish.yaml on tcnicholas/dump-duck

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dumpduck-0.2.2.tar.gz
- Subject digest: ac53d691ac88222db7f6d4f848f3c3d3593b52f7572e5e800ec8aac39ef368fb
- Sigstore transparency entry: 1507777294
- Sigstore integration time: May 11, 2026
Source repository:
- Permalink: tcnicholas/dump-duck@6153104666688027624e5f77893097030d9fa5c6
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/tcnicholas
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@6153104666688027624e5f77893097030d9fa5c6
- Trigger Event: release

File details

Details for the file dumpduck-0.2.2-py3-none-any.whl.

File metadata

Download URL: dumpduck-0.2.2-py3-none-any.whl
Upload date: May 11, 2026
Size: 20.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for dumpduck-0.2.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`849a54a7f569abe4baec0ab95acea9fe12d16d14c33ed4c7bf52578dd7ba0de3`
MD5	`0b52fe75c6d0d97f9aaad20991f2865f`
BLAKE2b-256	`b83e15fa44bd571e8ffa7c01ccedd1b687c8d9431e95f541bfcd72cd9fd64f2b`

Provenance

The following attestation bundles were made for dumpduck-0.2.2-py3-none-any.whl:

Publisher: publish.yaml on tcnicholas/dump-duck

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: dumpduck-0.2.2-py3-none-any.whl
- Subject digest: 849a54a7f569abe4baec0ab95acea9fe12d16d14c33ed4c7bf52578dd7ba0de3
- Sigstore transparency entry: 1507777415
- Sigstore integration time: May 11, 2026
Source repository:
- Permalink: tcnicholas/dump-duck@6153104666688027624e5f77893097030d9fa5c6
- Branch / Tag: refs/tags/v0.2.2
- Owner: https://github.com/tcnicholas
- Access: private
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yaml@6153104666688027624e5f77893097030d9fa5c6
- Trigger Event: release

dumpduck 0.2.2
