LMDB-based storage for ASE.

asebytes

Storage-agnostic, lazy-loading interface for ASE Atoms objects. Pluggable backends (LMDB, Zarr, HDF5/H5MD, HuggingFace Datasets, ASE file formats) behind a single list-like API with pandas-style column views.

pip install asebytes[lmdb]      # LMDB backend (recommended)
pip install asebytes[zarr]      # Zarr backend (fast compression)
pip install asebytes[h5md]      # HDF5/H5MD backend
pip install asebytes[hf]        # HuggingFace Datasets backend

Quick Start

from asebytes import ASEIO

# Write
db = ASEIO("data.lmdb")
db.extend(atoms_list)           # bulk append
db[0] = new_atoms               # replace row
db.update(0, calc={"energy": -10.5})  # partial update

# Read
atoms = db[0]                   # ase.Atoms
atoms = db[-1]                  # negative indexing

Backend is auto-detected from the file extension:

Extension	Backend	Install extra
`*.lmdb`	`LMDBBackend`	`asebytes[lmdb]`
`*.zarr`	`ZarrBackend`	`asebytes[zarr]`
`*.h5` / `*.h5md`	`H5MDBackend`	`asebytes[h5md]`
`*.xyz` / `*.extxyz` / `*.traj`	`ASEReadOnlyBackend`	(none)
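
The extension table can be pictured as a simple suffix lookup. The sketch below is illustrative only: `detect_backend` and `EXTENSION_TO_BACKEND` are hypothetical names, not part of asebytes, and the library's actual detection logic may differ.

```python
from pathlib import Path

# Hypothetical lookup table mirroring the extension table above.
EXTENSION_TO_BACKEND = {
    ".lmdb": "LMDBBackend",
    ".zarr": "ZarrBackend",
    ".h5": "H5MDBackend",
    ".h5md": "H5MDBackend",
    ".xyz": "ASEReadOnlyBackend",
    ".extxyz": "ASEReadOnlyBackend",
    ".traj": "ASEReadOnlyBackend",
}


def detect_backend(path: str) -> str:
    """Return the backend name for a path, based on its final suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return EXTENSION_TO_BACKEND[suffix]
    except KeyError:
        # Unknown extensions are rejected rather than guessed.
        raise ValueError(f"no backend registered for extension {suffix!r}") from None
```

With a lookup like this, an unknown extension fails early with a clear error instead of silently falling back to a default backend.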

Lazy Views

Indexing with slices, lists, or strings returns lazy views that load data on demand.

# Row views — lazy, stream one frame at a time
view = db[5:100]                # slice → RowView (nothing loaded yet)
view = db[[0, 42, 99]]         # list of indices → RowView
for atoms in view:
    process(atoms)

# Chunked iteration — loads N rows per batch for throughput
for atoms in db[:].chunked(1000):
    process(atoms)

# Column views — avoid constructing full Atoms objects
energies = db["calc.energy"].to_list()
cols = db[["calc.energy", "calc.forces"]].to_dict()
# → {"calc.energy": [...], "calc.forces": [...]}

# Chaining — slice rows, then select columns
db[0:500]["calc.energy"].to_list()
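
To make "nothing loaded yet" concrete, here is a minimal sketch of how a lazy row view could work. `LazyRowView` and `load_row` are hypothetical stand-ins written for this illustration; they are not the asebytes implementation, and rows are plain dicts rather than `ase.Atoms`.

```python
from typing import Callable, Iterator, List, Sequence


class LazyRowView:
    """Hypothetical lazy view: stores indices, loads rows only on iteration."""

    def __init__(self, load_row: Callable[[int], dict], indices: Sequence[int]):
        self._load_row = load_row
        self._indices = list(indices)

    def __len__(self) -> int:
        return len(self._indices)

    def __iter__(self) -> Iterator[dict]:
        # One row is loaded per step; nothing is read before iteration starts.
        for i in self._indices:
            yield self._load_row(i)

    def chunked(self, size: int) -> Iterator[dict]:
        # Load `size` rows per batch, then yield them one at a time.
        for start in range(0, len(self._indices), size):
            batch = [self._load_row(i) for i in self._indices[start:start + size]]
            yield from batch


loads: List[int] = []  # records which rows were actually read


def load_row(i: int) -> dict:
    loads.append(i)
    return {"index": i, "calc.energy": -1.0 * i}


view = LazyRowView(load_row, range(5, 100))  # nothing loaded yet
assert loads == []
first = next(iter(view))  # loads exactly one row
```

Creating the view costs nothing beyond storing the indices; the read cost is paid only for rows that are actually iterated.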

Persistent Read-Through Cache

For slow or remote sources, cache_to creates a persistent local cache. The first pass reads from the source and fills the cache; all subsequent reads are served from the cache.

db = ASEIO("colabfit://dataset", split="train", cache_to="cache.lmdb")

for atoms in db:    # epoch 1: reads source, populates cache
    train(atoms)
for atoms in db:    # epoch 2+: all reads from local cache
    train(atoms)

cache_to accepts a file path (the backend is created automatically) or any WritableBackend instance. The cache is never invalidated; delete the cache file to reset it.
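
The read-through pattern itself is small. The following sketch uses an in-memory dict in place of the on-disk cache; `ReadThroughCache` and `slow_source` are hypothetical names for illustration and do not reflect how asebytes implements cache_to internally.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Hypothetical read-through cache keyed by row index (no invalidation)."""

    def __init__(self, read_source: Callable[[int], dict]):
        self._read_source = read_source
        self._cache: Dict[int, dict] = {}  # stand-in for the on-disk cache file
        self.source_reads = 0

    def __getitem__(self, index: int) -> dict:
        if index not in self._cache:
            # Cache miss: read from the slow source and store the result.
            self._cache[index] = self._read_source(index)
            self.source_reads += 1
        return self._cache[index]


def slow_source(index: int) -> dict:
    return {"calc.energy": -0.5 * index}


db = ReadThroughCache(slow_source)
epoch1 = [db[i] for i in range(3)]  # fills the cache
epoch2 = [db[i] for i in range(3)]  # served from the cache
```

Because entries are never evicted or refreshed, a change in the source is not seen until the cache is discarded, which matches the "delete to reset" behaviour described above.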

HuggingFace Datasets

Stream or download datasets from the HuggingFace Hub via URI schemes.

# ColabFit (auto-selects column mapping, streams by default)
db = ASEIO("colabfit://mlearn_Cu_train", split="train")

# OPTIMADE (e.g. LeMaterial)
db = ASEIO("optimade://LeMaterial/LeMat-Bulk", split="train", name="compatible_pbe")

# Generic HuggingFace (requires explicit column mapping)
from asebytes import ColumnMapping
mapping = ColumnMapping(
    positions="pos", numbers="nums",
    calc={"energy": "total_energy"},
)
db = ASEIO("hf://user/dataset", mapping=mapping, split="train")

# Downloaded mode for faster access
db = ASEIO("colabfit://dataset", split="train", streaming=False)

Zarr

The Zarr backend uses a flat layout with Blosc/LZ4 compression, giving compact files and fast reads. It supports variable particle counts via NaN padding, and writes are append-only.

db = ASEIO("trajectory.zarr")
db.extend(atoms_list)

# Custom compression
from asebytes import ZarrBackend
db = ASEIO(ZarrBackend("data.zarr", compressor="zstd", clevel=9))
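
NaN padding lets frames with different atom counts share one rectangular array. The sketch below shows the idea with plain Python lists; `pad_frames` and `unpad_frame` are hypothetical helpers, and the exact layout asebytes writes to Zarr is an assumption not confirmed here.

```python
import math
from typing import List

Position = List[float]


def pad_frames(frames: List[List[Position]]) -> List[List[Position]]:
    """Pad every frame with NaN rows up to the largest atom count."""
    max_atoms = max(len(frame) for frame in frames)
    return [
        frame + [[math.nan, math.nan, math.nan] for _ in range(max_atoms - len(frame))]
        for frame in frames
    ]


def unpad_frame(frame: List[Position]) -> List[Position]:
    """Drop the NaN padding rows to recover the original atoms."""
    return [row for row in frame if not any(math.isnan(x) for x in row)]


frames = [
    [[0.0, 0.0, 0.0], [0.0, 0.0, 1.1]],  # two atoms
    [[0.0, 0.0, 0.0]],                   # one atom
]
padded = pad_frames(frames)
restored = unpad_frame(padded[1])
```

The padded frames all have the same length, so they can be stored as one fixed-shape array, and the padding rows are dropped again on read.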

HDF5 / H5MD

H5MD-standard files with support for variable particle counts, per-frame PBC, and bond connectivity.

db = ASEIO("trajectory.h5", author_name="Jane Doe", compression="gzip")
db.extend(atoms_list)

# Multi-group files
from asebytes import H5MDBackend
groups = H5MDBackend.list_groups("multi.h5")
db = ASEIO("multi.h5", particles_group="solvent")

Key Convention

All data follows a flat namespace:

Prefix	Content	Examples
`arrays.*`	Per-atom arrays	`arrays.positions`, `arrays.numbers`, `arrays.forces`
`calc.*`	Calculator results	`calc.energy`, `calc.stress`
`info.*`	Frame metadata	`info.smiles`, `info.label`
(top-level)	Structure-level fields	`cell`, `pbc`, `constraints`

from asebytes import atoms_to_dict, dict_to_atoms

d = atoms_to_dict(atoms)   # Atoms → flat dict (~5x faster than encode/decode)
atoms = dict_to_atoms(d)   # flat dict → Atoms
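
The flat namespace can be illustrated without ASE by flattening a nested dict into dotted keys and back. `flatten_frame` and `unflatten_frame` are hypothetical helpers for this sketch; they are not `atoms_to_dict` / `dict_to_atoms`, whose exact output may differ.

```python
from typing import Any, Dict

PREFIXES = ("arrays", "calc", "info")


def flatten_frame(frame: Dict[str, Any]) -> Dict[str, Any]:
    """Flatten {"arrays": {...}, "calc": {...}, "info": {...}} into dotted keys."""
    flat: Dict[str, Any] = {}
    for key, value in frame.items():
        if key in PREFIXES:
            for name, item in value.items():
                flat[f"{key}.{name}"] = item
        else:
            flat[key] = value  # top-level fields such as cell, pbc, constraints
    return flat


def unflatten_frame(flat: Dict[str, Any]) -> Dict[str, Any]:
    """Rebuild the nested form from dotted keys."""
    frame: Dict[str, Any] = {}
    for key, value in flat.items():
        prefix, dot, name = key.partition(".")
        if dot and prefix in PREFIXES:
            frame.setdefault(prefix, {})[name] = value
        else:
            frame[key] = value
    return frame


frame = {
    "arrays": {"numbers": [1, 1], "positions": [[0.0, 0.0, 0.0], [0.0, 0.0, 0.74]]},
    "calc": {"energy": -10.5},
    "info": {"label": "H2"},
    "pbc": [False, False, False],
}
flat = flatten_frame(frame)
```

A flat key such as `calc.energy` can then be read as a single column without touching the per-atom arrays.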

Custom Backends

Implement ReadableBackend for read-only or WritableBackend for read-write:

from asebytes import ASEIO, ReadableBackend

class MyBackend(ReadableBackend):
    def __len__(self): ...
    def columns(self, index=0): ...
    def read_row(self, index, keys=None): ...

db = ASEIO(MyBackend())
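
As a fuller illustration of the three methods, here is an in-memory sketch. It is duck-typed so that it runs without asebytes installed; a real backend would subclass `ReadableBackend`. The return types (a list of key names from `columns`, a flat dict from `read_row`) are assumptions based on the method names, not confirmed by the source.

```python
from typing import Any, Dict, List, Optional, Sequence


class InMemoryBackend:
    """Duck-typed stand-in; a real backend would subclass asebytes.ReadableBackend."""

    def __init__(self, rows: Sequence[Dict[str, Any]]):
        self._rows = list(rows)

    def __len__(self) -> int:
        return len(self._rows)

    def columns(self, index: int = 0) -> List[str]:
        # Assumed meaning: the keys available for the row at `index`.
        return list(self._rows[index].keys())

    def read_row(self, index: int, keys: Optional[Sequence[str]] = None) -> Dict[str, Any]:
        # Assumed meaning: return the whole row, or only the requested keys.
        row = self._rows[index]
        if keys is None:
            return dict(row)
        return {k: row[k] for k in keys}


backend = InMemoryBackend([
    {"arrays.numbers": [8, 1, 1], "calc.energy": -14.2},
    {"arrays.numbers": [1, 1], "calc.energy": -1.1},
])
```

Accepting `keys` in `read_row` is what allows column views to skip data they do not need.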

Benchmarks

Benchmarks use 1000 frames from each of two datasets: ethanol conformers (small molecules, fixed size) and LeMat-Traj (periodic structures, variable atom counts). All frames include energy, forces, and stress. asebytes is compared against aselmdb, znh5md, extxyz, and SQLite.

# LeMat-Traj benchmark data
lemat = list(ASEIO("optimade://LeMaterial/LeMat-Traj", split="train", name="compatible_pbe")[:1000])

Note: HDF5 performance is heavily influenced by compression and chunking settings. Both asebytes H5MD and znh5md use gzip compression by default, which reduces file size at the cost of read/write speed. The Zarr backend uses Blosc/LZ4 compression, which achieves compact file sizes with faster decompression than gzip.

Write

[Figure: Write Performance]

Sequential Read

[Figure: Read Performance]

Random Access

[Figure: Random Access Performance]

Column Access

[Figure: Column Access Performance]

File Size

asebytes 0.2.0
