packingcubes
Compact octree implementation used for Socket
packingcubes aims to provide a fast, minimal-memory-usage octree
implementation, specialized for use in astronomical/astrophysical contexts.
It's written in pure Python, with Numba-based
acceleration of the critical code paths.
View the documentation at packingcubes.readthedocs.io!
Requirements
A supported range of Python versions is required. Python versions outside this
range may work, but they are not supported, and at least some features require
Python > 3.12.
Python packages
- numpy - required for core routines
- numba - required for core routines (and most of the speed)
- h5py - required for reading snapshot data and saving Cubes and PackedTrees
- xxhash - required for core routines (packed metadata format)
Optional packages
Visualization (the viz group):
- matplotlib - to use basic octree visualization and plotting (see Basic_Usage in the Examples)
- pygfx - to do interactive octree visualization (see Example_PackedTree in the Examples)
- rendercanvas - to do interactive octree visualization
- pyside6 - performant interactive octree visualization (uses Qt; other options include wgpu-py and pyodide. See the rendercanvas backends documentation for more details.)
Jupyter (the jupyter group):
- jupyter-rfb - for interactive octree visualization in a notebook (see the Visualization section, above)
Benchmark (the benchmark group):
- scipy - we benchmark against scipy's KDTree
- unyt - for unit-aware timing purposes
- matplotlib - for benchmark visualization
The all group combines all of the above.
Basic Usage
Installation
We're on PyPI, so installation is as simple as
pip install packingcubes
or
uv pip install packingcubes
or
pixi add packingcubes --pypi
Additional package requirements can be installed via optional dependencies (see the requirements section for the lists). Examples:
pip install "packingcubes[viz, jupyter]"
or
pixi add "packingcubes[all]" --pypi
Construction
From a snapshot on disk
From the command line:
packcubes SNAPSHOT OUTPUT
will generate a Cubes data structure and store it in the OUTPUT HDF5 file,
along with the sorted positions and shuffle list.
Alternatively:
import packingcubes
cubes = packingcubes.Cubes("path/to/snapshot.hdf5")
cubes.save("path/to/output.hdf5")
# sorted positions/indices are stored in .snapshot_sorted.hdf5 by default
# to save elsewhere, use
dataset = packingcubes.HDF5Dataset(
    "path/to/snapshot.hdf5", sorted_filepath="path/to/output.hdf5"
)
cubes = packingcubes.Cubes(dataset)
From positions in memory
If you already have positions_data as an Nx3 matrix, you can use
cubes = packingcubes.Cubes(positions_data)
Note: this data will be sorted in place! You may want to make a copy first.
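For example, assuming positions_data is a NumPy array, you can pass a copy so the in-place sort leaves the original untouched:
import packingcubes

# Pass a copy so the in-place sort does not modify positions_data.
cubes = packingcubes.Cubes(positions_data.copy())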
You can also do the following for easy saving. By default, the data will still not be copied (and so will still be sorted in place):
dataset = packingcubes.InMemory(positions=positions_data)
cubes = packingcubes.Cubes(dataset)
dataset.save("path/to/output.hdf5")
Several configuration options are available, see packcubes --help or help(packingcubes.Cubes) for more information.
Loading
You can load the saved Cubes with
import packingcubes
cubes = packingcubes.Cubes("path/to/saved_cubes.hdf5")
Searching
Currently, packingcubes provides multiple public methods for searching your
dataset:
indices = cubes.get_indices_in_sphere(center, radius)
# particle_types is a string or Sequence[str] that maps to a particle type
# in the snapshot
# center is anything that can be converted by numpy's array method to a (3,)
# array, and is the sphere's center
# radius is a float
indices = cubes.get_indices_in_box(box)
# particle_types is a string or Sequence[str] that maps to a particle type
# in the snapshot
# box is anything that can be converted by numpy's array method to a (6,)
# float array where the first 3 elements are the front-left-bottom corner,
# and the last 3 elements are the box width, depth, and height
# (aka [x, y, z, dx, dy, dz])
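For instance, to select a 10-unit cube anchored at the origin:
import numpy as np

# [x, y, z, dx, dy, dz]: front-left-bottom corner at the origin,
# extending 10 units along each axis
box = np.array([0.0, 0.0, 0.0, 10.0, 10.0, 10.0])
indices = cubes.get_indices_in_box(box)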
For both methods, the returned object is an array with 3 columns. Each row can
be considered a chunk of data, which looks like [start, stop, partial],
representing the start and stop indices of contiguous data in the
sorted dataset. The third column, partial, denotes whether the chunk
was partially (1) or entirely (0) contained within the sphere/box.
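Since partial chunks may include rows that fall outside the search region, the flag can drive an exact follow-up test. Here is a minimal sketch (the refine_sphere_chunks helper is hypothetical, not part of the packingcubes API), assuming positions is the sorted (N, 3) array the tree was built on:
import numpy as np

def refine_sphere_chunks(indices, positions, center, radius):
    # indices: the (n_chunks, 3) [start, stop, partial] array returned
    # by get_indices_in_sphere
    center = np.asarray(center, dtype=float)
    selected = []
    for start, stop, partial in indices:
        rows = np.arange(start, stop)
        if partial:
            # Chunk only overlaps the sphere: keep rows that pass an
            # exact distance check.
            d2 = ((positions[start:stop] - center) ** 2).sum(axis=1)
            rows = rows[d2 <= radius**2]
        # partial == 0 chunks lie entirely inside, so keep every row.
        selected.append(rows)
    return np.concatenate(selected) if selected else np.empty(0, dtype=np.int64)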
So the search pipeline might go something like:
import h5py
import numpy as np

# cubes is associated with the HDF5Dataset `dataset`, created from the
# snapshot at orig_dataset_file
indices = cubes.get_indices_in_sphere(center, radius)
velocities_list = []
with h5py.File(orig_dataset_file) as orig_dataset:
    for start, stop, _ in indices:
        shuffle = dataset.index[start:stop]
        # WARNING: the following could be very slow if shuffle gets large
        v = orig_dataset["PartType0/Velocity"][shuffle, :]
        velocities_list.extend(v)
velocities = np.array(velocities_list)
Warning: this can become slow if the shuffle sections get large
(len(shuffle) > 1000)! This is because loading from HDF5 with a fancy index
is inefficient (the v = line), not because of the search (the indices =
and shuffle = lines).
The sorting has already been applied to the positions (it was
necessary to construct the tree), but packingcubes does not apply the sort to
the other fields in the snapshot.
So if you only need positional information,
indices = cubes.get_indices_in_sphere(center, radius)
positions_list = []
# We don't need to open the orig_dataset file
dataset.particle_type = "PartType0"
for start, stop, _ in indices:
    positions_list.extend(dataset.positions[start:stop])
positions = np.array(positions_list)
should be very fast.
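Building on that, here is a sketch that accumulates a mean position chunk by chunk (assuming, as above, that dataset.positions behaves like the sorted (N, 3) array), avoiding one large concatenation:
import numpy as np

indices = cubes.get_indices_in_sphere(center, radius)
dataset.particle_type = "PartType0"

# Accumulate per-chunk sums rather than materializing all positions.
total = np.zeros(3)
count = 0
for start, stop, _ in indices:
    pos = np.asarray(dataset.positions[start:stop])
    total += pos.sum(axis=0)
    count += len(pos)
mean_position = total / count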
If you need a different field (like the velocities, as above), we recommend
preloading the entire field, using the shuffle list to presort it,
and then treating it analogously to the dataset.positions field, or saving it
back out:
import h5py
import numpy as np

indices = cubes.get_indices_in_sphere(center, radius)
with h5py.File(orig_dataset_path) as orig_dataset:
    # preload the entire velocities array
    loaded_velocities = orig_dataset["PartType0/Velocity"][:]
# Just to make sure we're looking at the correct particle type
dataset.particle_type = "PartType0"
# presort using the shuffle list
loaded_velocities = loaded_velocities[dataset.index, :]

# use directly
velocities_list = []
for start, stop, _ in indices:
    velocities_list.extend(loaded_velocities[start:stop])

# or save it back out
with h5py.File(sorted_velocity_path, "a") as outfile:
    outfile["PartType0/Velocity"] = loaded_velocities
...
velocities_list = []
with h5py.File(sorted_velocity_path) as vel_dataset:
    for start, stop, _ in indices:
        velocities_list.extend(
            vel_dataset["PartType0/Velocity"][start:stop, :]
        )
velocities = np.array(velocities_list)
If this seems like a hassle, consider the GUSTEAU project, which, among many other improvements, will do all of this additional field sorting (and the original cubing) for you!
KDTree
We have also reimplemented some of the API from the KDTree in scipy.spatial,
notably the query_ball_point method (this is also what the benchmarks compare
against). Example usage, modified from the scipy.spatial.KDTree documentation:
>>> import numpy as np
>>> from packingcubes import OpTree as KDTree # this is the only change you need to make
>>> x, y = np.mgrid[0:5, 0:5]
>>> points = np.c_[x.ravel(), y.ravel()]
>>> tree = KDTree(points)
>>> sorted(tree.query_ball_point([2, 0], 1))
[5, 10, 11, 15]
Caveats:
- The provided dataset may be sorted in place. See the OpTree constructor's data and copy_data arguments for more information on when that occurs.
- Only query and query_ball_point are fully supported. query_ball_tree, query_pairs, and count_neighbors are a work in progress and will raise NotImplementedError until fully implemented. sparse_distance_matrix is not planned and will always raise NotImplementedError.
- Only query_ball_point has been benchmarked and is guaranteed to be performant. We are currently working on query's performance, but do not plan on beating scipy.
- A number of optional constructor and method arguments are not supported (for example, setting the distance metric p to a number other than 2). Some of these may be supported in the future (like p=3); others will not (like balanced_tree). For the most part, we emit warnings if an argument or its value does not make sense in the packingcubes context, and try to only raise errors if there is no possible analog and/or a significant change in behavior.
- Only 1-, 2-, and 3-dimensional data is supported. PackedTrees specifically expect 3D data (since this is an octree implementation), so 1D and 2D data will be copied and padded with 0s (e.g. [[1, 2], [3, 4]] becomes [[1, 2, 0], [3, 4, 0]]). This should not significantly impact memory usage (beyond the copy), and the PackedTree will simply function as a binary tree or quadtree.
- For performance reasons, some of the packingcubes default output formats may not match the scipy output formats (arrays instead of lists, for example). For those methods where that's a possibility, additional arguments can be provided to enforce the "proper" format (e.g. by setting return_lists=True) at a small performance penalty.
- Likewise, for performance reasons, some of the default packingcubes output indices will be in terms of the sorted dataset (remember, the dataset is sorted in place unless specified otherwise). For those methods, you may be able to specify return_data_inds=False to get indices into the unsorted dataset. Alternatively, reference the sort_index property.
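As an illustration of that last caveat, here is a sketch (assuming, per the description above, that results index the sorted dataset by default and that sort_index maps sorted rows back to the original ordering):
import numpy as np
from packingcubes import OpTree as KDTree

rng = np.random.default_rng(0)
points = rng.random((1000, 3))

tree = KDTree(points)  # note: may sort `points` in place (see copy_data)
hits = tree.query_ball_point([0.5, 0.5, 0.5], 0.1)

# hits indexes the sorted dataset; map back to the original row numbers.
original_rows = np.asarray(tree.sort_index)[np.asarray(hits)]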
Development requirements
uv
To get ready for development, create a virtual environment and install the package:
uv venv --python=3.12
source .venv/bin/activate
uv pip install -e ".[dev]"
pre-commit install
We use ruff for formatting. When you go to commit your code, it will automatically be formatted thanks to the pre-commit hook.
Tests are performed using pytest.
pixi
Working with pixi is pretty easy; simply
pixi shell
To look at visualizations, run tests, or develop, simply specify the corresponding environment
# visualizations
pixi shell -e viz
# testing (also includes viz)
pixi shell -e test
# developing (also includes viz & test)
pixi shell -e dev
pre-commit install
and, e.g., to run tests, say
pixi run test
which runs pytest --cov=packingcubes in the dev environment.