
Spatial loading for AREPO.


SpAREPO


Spatial hashtable building for AREPO. Built to be extremely lightweight and low-code, with an easily understandable hashtable file.

Purpose

The AREPO snapshots are stored by halo, rather than by spatial region. This makes it complex to load data spatially, for instance getting all particles within a 10 Mpc radius of a given point. Doing so typically requires loading all data in all snapshot chunks and rejecting the vast majority of it. It also means that even if you just want a single particle property (e.g. the internal energies), you still need to load all of the co-ordinates.

sparepo solves this by building a 'hashtable' file, which coarse-grains the particles onto a top-level grid by iterating through the snapshot once. Then, at any point afterwards, you can request any set of cells to be read out of the snapshot, already 'knowing' which files (and which positions in those files!) the particles are in.

sparepo provides utilities to build the hashtables, and also to read them in python. The hashtable file format, described below, is simple so that it can be used in other implementations if required.

Requirements

It is recommended that you use a recent version of python for sparepo, at least 3.8 (the lowest version that we test on); no effort is made to maintain compatibility with any earlier versions of python. sparepo requires:

  • numba (and hence llvmlite)
  • numpy
  • h5py
  • attrs

Code formatting is taken care of by black and isort.

File Format

The spatial hashtable that is created has the following file structure:

Header/
    Attrs: {
        BoxSize: The box size in given units including h factors.
        NumberOfChunks: M
        Units: Units that length scales are in
        HubbleParam: Hubble parameter
        HubbleParamScaling: For the length units, the exponent of h
    }
Cells/
    Centers: Nx3 Array of Cell Centers
    Counts/
        PartTypeX: NxM array of total counts, with M the number of chunks.
    Attrs: {
        Size: 1D size of the cells
        NumberOfCells: N, Total number of cells
        CellsPerAxis: cbrt(N), number of cells per axis.
    }
PartTypeX/
    CellY/
        FileZ: Length O array of indices.
            Attrs: {
                FileName: Pathless filename of this file.
                FileNumber: Integer file number of this file, helpful
                            for indexing the arrays.
            }

with the indexing into the cell array being specified as:

x_cell * (number_of_cells)**2 + y_cell * number_of_cells + z_cell
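
For illustration, the flattened index for a given particle position can be computed as follows (a minimal sketch; the cell_index helper and the example box size are hypothetical, not part of sparepo's API):

import numpy as np

def cell_index(position, box_size, cells_per_axis):
    # Map the position onto the top-level grid and flatten it using the
    # x * N^2 + y * N + z convention described above.
    cell = np.floor(np.asarray(position) / box_size * cells_per_axis).astype(int)
    # Guard against positions sitting exactly on the upper box edge.
    cell = np.clip(cell, 0, cells_per_axis - 1)
    x_cell, y_cell, z_cell = cell
    return x_cell * cells_per_axis**2 + y_cell * cells_per_axis + z_cell

# A particle at the centre of a (hypothetical) 32000.0 box with 14 cells per axis:
print(cell_index([16000.0, 16000.0, 16000.0], box_size=32000.0, cells_per_axis=14))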

Hashtable Creation

Creating a hashtable can be done using the create_hashtable function,

from sparepo import create_hashtable

create_hashtable(
    snapshot="snapdir_099/snapshot_099.0.hdf5",
    cells_per_axis=14,
    hashtable="snapdir_099/spatial_hashtable_099.hdf5"
)

This may take some time, as you might expect. For a 240^3 box, it takes a few seconds and should in principle scale linearly.
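
Since the hashtable is a plain HDF5 file with the structure described above, it can also be inspected directly with h5py. A short sketch, assuming the attribute names listed in the File Format section:

import h5py

with h5py.File("snapdir_099/spatial_hashtable_099.hdf5", "r") as handle:
    header = handle["Header"].attrs
    cells = handle["Cells"].attrs

    # Print the top-level metadata describing the grid.
    print("Box size:", header["BoxSize"], header["Units"])
    print("Number of chunks:", header["NumberOfChunks"])
    print("Cells per axis:", cells["CellsPerAxis"])
    print("Total number of cells:", cells["NumberOfCells"])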

Hashtable Reading

Reading from the hashtable is again designed to be simple. Currently two loading strategies are implemented:

  • CartesianSpatialRegion(x=[low, high], y=[low, high], z=[low, high])
  • SphericalSpatialRegion(center=[x, y, z], radius=r)

These can then be used with the SpatialLoader object to load particles from file. Note that the majority of the data (the post-processed hashtables read from file) are stored in the region objects.

from sparepo import SphericalSpatialRegion, SpatialLoader, ParticleType

region = SphericalSpatialRegion(
    center=[16000.0, 16000.0, 16000.0],
    radius=6000.0
)

loader = SpatialLoader(
    hashtable="snapdir_099/spatial_hashtable_099.hdf5",
    snapshot="snapdir_099/snapshot_099.0.hdf5",
)

x, y, z = loader.read_dataset(
    ParticleType.GAS, field_name="Coordinates", region=region
).T

This will load all cells required to contain every particle within a sphere centered on [16000.0, 16000.0, 16000.0] with radius 6000.0. Additional particles outside the sphere will definitely be loaded, as the loading is cell-by-cell rather than particle-by-particle for performance reasons. If you require a strict mask, we encourage you to apply it by post-processing the co-ordinates, as shown below.
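
For example, a strict radial mask can be applied afterwards with numpy (a sketch, continuing from the coordinates loaded above and ignoring periodic wrapping):

import numpy as np

center = np.array([16000.0, 16000.0, 16000.0])
radius = 6000.0

# Keep only particles that actually fall inside the requested sphere.
r_squared = (x - center[0])**2 + (y - center[1])**2 + (z - center[2])**2
mask = r_squared <= radius**2

x, y, z = x[mask], y[mask], z[mask]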

The main thing to note here is that particle types are accessed through the ParticleType enum, rather than the usual passing of 'magical' integers.

The second thing to note is that the first time read_dataset is called, it builds a compressed hashtable and reads the required data from the hashtable file (which is then cached). The time to do this is typically shorter than the time required to read the particle data itself from file.

Note that the reading performance here is limited by the loop over indices (and having to call an h5py read for each of them). Contiguous ranges are read together, which improves performance significantly, so the read performance is effectively limited by data locality. More complex reading schemes may be able to vastly improve the speed of data loading.
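
For completeness, the CartesianSpatialRegion follows the same pattern; a minimal sketch with illustrative bounds:

from sparepo import CartesianSpatialRegion, SpatialLoader, ParticleType

region = CartesianSpatialRegion(
    x=[10000.0, 22000.0],
    y=[10000.0, 22000.0],
    z=[10000.0, 22000.0],
)

loader = SpatialLoader(
    hashtable="snapdir_099/spatial_hashtable_099.hdf5",
    snapshot="snapdir_099/snapshot_099.0.hdf5",
)

coordinates = loader.read_dataset(
    ParticleType.GAS, field_name="Coordinates", region=region
)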
