I/O functions for Python and LQCD file formats

Lyncs IO offers two high-level functions, load and save (with dump as an alias of save).

The main features of this module are:

  • Seamless IO, reading and writing made simple. In most cases, after saving with save(obj, filename), loading with obj = load(filename) returns the original Python object. Formats like pickle guarantee this by design, and we aim to ensure it as far as possible for the other formats too.

  • Many formats supported. The file format can be specified either via the filename's extension or with the option format passed to load/save. The structure of the package is flexible enough to easily accommodate new or customized file formats as they arise. See Adding a file format below for guidelines.

  • Support for archives. For archives, e.g. HDF5, zip, etc., the content can be accessed directly by specifying it in the path. For instance, in directory/file.h5/content, directory/file.h5 is the file path and the remainder is the content to be searched inside the file (this is inspired by h5py).

  • Support for parallel IO. Where possible and implemented, parallel IO is supported. It is enabled either via MPI, by providing a valid communicator with the option comm, or via Dask, by providing the option chunks (see Dask's Array).

  • Omission of extension. When saving, if the extension is omitted, the optimal file format is deduced from the data type and the extension is added to the filename. When loading, any extension is considered, i.e. filename.*, and if exactly one match is found, the file is loaded, as sketched below.
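
For instance, the omission of extension works as follows (a minimal sketch; which format is chosen depends on the data type, e.g. a numpy array may be stored as .npy):

import numpy as np
import lyncs_io as io

arr = np.random.rand(4, 4)
io.save(arr, "mydata")    # extension deduced from the data type and appended
arr2 = io.load("mydata")  # searches mydata.* and loads the single match
assert (arr == arr2).all()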

Installation

The package can be installed via pip:

pip install [--user] lyncs_io

NOTE: to enable parallel IO, lyncs_io requires a working MPI installation. This can be installed via apt-get:

sudo apt-get install libopenmpi-dev openmpi-bin

OR using conda:

conda install -c anaconda mpi4py

Parallel IO can then be enabled via

pip install [--user] lyncs_io[mpi]
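
To check that MPI and mpi4py are correctly set up (a quick sanity check, independent of lyncs_io):

mpiexec -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.rank)"

If everything is in place, this prints the ranks 0 and 1.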

Documentation

The package provides three high-level functions:

  • head: loads the metadata of a file (e.g. shape, dtype, etc.)
  • load: loads the content of a file
  • save or dump: saves data to file

import numpy as np
import lyncs_io as io

arr1 = np.random.rand(10,10,10)
io.save(arr1, "data.npy")

arr2 = io.load("data.npy")

assert (arr1 == arr2).all()
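
Before loading, the metadata of a file can be inspected with head (a minimal sketch; the exact fields returned depend on the format, but for .npy they describe the stored array, e.g. shape and dtype):

meta = io.head("data.npy")
print(meta)  # metadata of the stored array, e.g. shape and dtype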

NOTE: for save we use the order data, filename. This is the opposite of numpy's convention but consistent with pickle's dump. This order is preferred because the function can then be used directly as a method of a class, since self, i.e. the data, is passed as the first argument of save.

Supported file formats

Format    Extensions   Binary   Archive   Parallel MPI   Parallel Dask
pickle    pkl          yes      no        no             no
dill      dll          yes      no        no             no
JSON      json         no       no        no             no
ASCII     txt          no       no        no             no
Numpy     npy          yes      no        yes            yes
Numpyz    npz          yes      yes       TODO           TODO
HDF5      hdf5, h5     yes      yes       yes            TODO
lime      lime         yes      TODO      yes            yes
Tar       tar, tar.*   -        yes       yes            no
openqcd   oqcd         yes      no        TODO           TODO
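
When the extension does not identify the format, it can be given explicitly via the format option (a minimal sketch; the format names are assumed to be the lowercase registration names, e.g. "numpy", in line with the register example below):

import numpy as np
import lyncs_io as io

arr = np.random.rand(4, 4)
io.save(arr, "data.bin", format="numpy")
arr2 = io.load("data.bin", format="numpy")
assert (arr == arr2).all()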

IO with HDF5

import numpy as np
import lyncs_io as io

arr1 = np.random.rand(10,10,10)
io.save(arr1, "data.h5/random")

arr2 = np.zeros_like(arr1)
io.save(arr2, "data.h5/zeros")

arrs = io.load("data.h5")
assert (arr1 == arrs["random"]).all()
assert (arr2 == arrs["zeros"]).all()
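
A single dataset can also be read back directly by specifying it in the path, as described above:

arr = io.load("data.h5/random")
assert (arr1 == arr).all()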

The content of a nested dictionary can also be stored with HDF5:

import numpy as np
import lyncs_io as io

mydict = {
    "random": {
        "arr0": np.random.rand(10,10,10),
        "arr1": np.random.rand(5,5),
    },
    "zeros": np.zeros((4, 4, 4, 4)),
}
# when a dictionary (or mapping) is passed, it is written as a group
io.save(mydict, "data.h5")

# with all_data=True, all the datasets in the file are loaded at once
loaded_dict = io.load("data.h5", all_data=True)

assert (mydict["random"]["arr0"] == loaded_dict["random"]["arr0"]).all()
assert (mydict["random"]["arr1"] == loaded_dict["random"]["arr1"]).all()
assert (mydict["zeros"] == loaded_dict["zeros"]).all()

Parallel IO via MPI can be enabled with a parallel installation of HDF5. To do so, please refer to https://docs.h5py.org/en/stable/mpi.html.

IO with MPI

import numpy as np
import lyncs_io as io
from mpi4py import MPI

# Assume 2D cartesian topology
comm = MPI.COMM_WORLD
dims = (2,2) # e.g. 4 procs
cartesian2d = comm.Create_cart(dims=dims)

oarr = np.random.rand(6, 4, 2, 2)
io.save(oarr, "pario.npy", comm=cartesian2d)
iarr = io.load("pario.npy", comm=cartesian2d)

assert (iarr == oarr).all()

NOTE: Parallel IO is enabled once a valid cartesian communicator is passed to the load or save routines; otherwise serial IO is performed. See the Parallel MPI column of the table above for the formats supporting this functionality.

IO with Dask

import lyncs_io as io
import dask.array as da
from distributed import Client

client = Client(n_workers=2, threads_per_worker=1)

x = da.arange(0, 128).reshape((16, 8)).rechunk(chunks=(8, 4))

# the save is lazy for Dask arrays (hence the name xout_lazy);
# compute() triggers the actual write before we read the file back
xout_lazy = io.save(x, "pario.npy")
xout_lazy.compute()

xin_lazy = io.load("pario.npy", chunks=(8, 4))

assert (x.compute() == xin_lazy.compute()).all()
client.shutdown()

NOTE: Parallel IO with Dask is enabled once a valid chunk size is passed to the load routine via the chunks parameter. For the save routine, DaskIO is enabled only if the array passed is a Dask array. See the Parallel Dask column of the table above for the formats supporting this functionality.

IO with Tar

import numpy as np
import lyncs_io as io

arr1 = np.random.rand(10,10,10)
io.save(arr1, "data.tar/random.npy")

arr2 = np.zeros_like(arr1)
io.save(arr2, "data.tar/zeros.npy")

arrs = io.load("data.tar")

assert (arr1 == arrs["random.npy"]).all()
assert (arr2 == arrs["zeros.npy"]).all()
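
As with HDF5, a single member of the archive can be read back directly via its path:

arr = io.load("data.tar/random.npy")
assert (arr1 == arr).all()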

The content of a nested dictionary can also be stored with Tar:

import numpy as np
import lyncs_io as io

mydict = {
    "random": {
        "arr0.npy": np.random.rand(10,10,10),
        "arr1.npy": np.random.rand(5,5),
    },
    "zeros.npy": np.zeros((4, 4, 4, 4)),
}

io.save(mydict, "data.tar")

loaded_dict = io.load("data.tar", all_data=True)

assert (mydict["random"]["arr0.npy"] == loaded_dict["random"]["arr0.npy"]).all()
assert (mydict["random"]["arr1.npy"] == loaded_dict["random"]["arr1.npy"]).all()
assert (mydict["zeros.npy"] == loaded_dict["zeros.npy"]).all()

Note:

  • Some formats inside a Tar are not currently supported (see Issues).
  • When loading/saving in serial, the operation is performed directly in memory. When in parallel, files are first written to secondary storage before being saved/loaded.

Adding a file format

New file formats can be registered by providing, where available, the respective head, load and save functions. For example, the pickle file format can be registered as follows:

import pickle
from lyncs_io.formats import register

register(
    "pickle",                         # Name of the format
    extensions=["pkl"],               # List of extensions
    head=None,                        # Function called by head (omitted)
    load=pickle.load,                 # Function called by load
    save=pickle.dump,                 # Function called by save
    description="Pickle file format", # Short description
)
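
Once registered, the format takes part in the usual selection mechanisms, via either the extension or the format option. For instance (pickle is in fact already supported out of the box):

import lyncs_io as io

obj = {"a": 1}
io.save(obj, "data.pkl")  # format selected via the extension
assert io.load("data.pkl") == obj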

Acknowledgments

Authors

  • Simone Bacchio (sbacchio)
  • Christodoulos Stylianou (cstyl)
  • Alexandros Angeli (alexandrosangeli)

Funding

  • PRACE-6IP, Grant agreement ID: 823767, Project name: LyNcs.
