
A Python client for Active Storage




PyActiveStorage

Create virtual environment

Use the Miniconda3 package manager (download the Linux installer), then create and activate the environment:

(base) conda install -c conda-forge mamba
(base) mamba env create -n activestorage -f environment.yml
(base) conda activate activestorage

Install with pip

pip install -e .

Run tests

pytest -n 2

Main dependencies

  • Python versions supported: 3.10, 3.11, 3.12, 3.13. Fully compatible with numpy >=2.0.0.
  • Pyfive needs to be pinned >=0.5.0 (first fully upgraded Pyfive version).

Active Storage Data Interface

This package provides

  1. the Active class, a shim to netCDF4 (and HDF5) data via a Pyfive.File file object;
  2. the actual reads, done in the methods of storage.py or reductionist.py, which are called from within Active.__getitem__.

Example usage can be found in the test files, depending on the case, but it's basically this simple:

active = Active(source, ncvar="some_var")  # source: a file path or a Pyfive.Dataset
active._version = 2
result = active.mean[0:2, 4:6, 7:9]

where result will be the mean of the corresponding slice of the some_var variable data.
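Conceptually, the reduction is computed per storage chunk and the partial results are combined afterwards; a minimal NumPy sketch of that idea (an illustration of the principle, not PyActiveStorage's actual internals) looks like this:

```python
import numpy as np

def chunked_mean(data, chunk_size):
    """Compute a mean by reducing one chunk at a time and combining
    per-chunk (sum, count) pairs -- the same idea Active relies on,
    where each chunk reduction may happen near the storage instead
    of locally."""
    total, count = 0.0, 0
    for start in range(0, data.shape[0], chunk_size):
        chunk = data[start:start + chunk_size]  # one storage chunk
        total += chunk.sum()                    # partial reduction
        count += chunk.size
    return total / count

data = np.arange(100, dtype="float64")
# combining partial reductions gives the same answer as a direct mean
assert np.isclose(chunked_mean(data, 7), data.mean())
```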

There are some (now largely obsolete) documents from our exploration of Zarr internals in docs4understanding, but they are not germane to the usage of the Active class.

Storage types

PyActiveStorage is designed to interact with various storage backends. The storage backend is automatically detected, but can still be specified using the interface_type argument to the Active constructor. There are two main integration points for a storage backend:

  1. Load netCDF metadata.
  2. Perform a reduction on a storage chunk (the reduce_chunk function).

Local file

The default storage backend is a local file. To use a local file, use an interface_type of None (its default value). netCDF metadata is loaded using the netCDF4 library, and the chunk reductions are implemented in activestorage.storage using NumPy.
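As a rough illustration of what a NumPy-backed chunk reduction involves, a simplified stand-in might look like the following (the real reduce_chunk in activestorage.storage also handles compression, filters and missing data; the signature below is an assumption for illustration):

```python
import numpy as np

def reduce_chunk(raw_bytes, dtype, shape, selection, method):
    """Decode one raw storage chunk into an array and apply a
    reduction to the selected part -- a simplified sketch of the
    role played by activestorage.storage.reduce_chunk."""
    chunk = np.frombuffer(raw_bytes, dtype=dtype).reshape(shape)
    return method(chunk[selection])

# e.g. the mean of the top-left 2x2 corner of a 4x4 chunk
chunk_bytes = np.arange(16, dtype="float64").tobytes()
result = reduce_chunk(chunk_bytes, "float64", (4, 4),
                      (slice(0, 2), slice(0, 2)), np.mean)
# result is 2.5: mean of [[0, 1], [4, 5]]
```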

S3-compatible object store

We now support Active runs with netCDF4 files on S3 (added in PR 89). To achieve this we integrate with Reductionist, an S3 Active Storage server. Reductionist is typically deployed "near" an S3-compatible object store and provides an API to perform numerical reductions on object data. To use Reductionist, use an interface_type of s3.

To load metadata, netCDF files are opened using s3fs, with h5netcdf used to wrap the open file (which is nothing more than a memory view of the netCDF file) in an HDF5/netCDF-like object format. Chunk reductions are implemented in activestorage.reductionist, with each operation resulting in an API request to the Reductionist server. From there on, Active works as normal.
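Each chunk reduction thus becomes one API request describing where the chunk lives and what to compute. A sketch of assembling such a request body is below; the field names and structure are illustrative assumptions, not Reductionist's actual schema:

```python
import json

def build_reduction_request(bucket, obj, dtype, shape, selection,
                            offset, size):
    """Assemble a JSON body describing one chunk reduction for an
    active-storage server. All field names here are hypothetical."""
    return json.dumps({
        "source": {"bucket": bucket, "object": obj},
        "dtype": dtype,
        "shape": shape,
        "selection": selection,                  # [start, stop] per dimension
        "byte_range": [offset, offset + size],   # chunk location in the object
    })

body = build_reduction_request("mybucket", "data.nc", "float64",
                               [500, 500, 500], [[0, 2], [4, 6], [7, 9]],
                               offset=1024, size=4096)
```

The server would decode the byte range, apply the selection, and return only the reduced value, so the full chunk never crosses the network to the client.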

HTTPS-compatible on an NGINX server

The same infrastructure as for S3, but the file is passed in as an https URI.

Testing overview

We have written unit and integration tests, and employ a coverage measurement tool (Codecov; see PyActiveStorage test coverage, currently at 87%). Our Continuous Integration (CI) testing is deployed on GitHub Actions, and nightly runs of the entire test suite let us detect any issues introduced by updated versions of our dependencies. The GitHub Actions (GA) tests also cover the integration of the storage types we currently support; in particular, dedicated tests exercise Active Storage with S3 storage by creating and running a MinIO client from within the test, shipping data to it, and running PyActiveStorage against that data.

Of particular interest are performance tests: we have started using tests that measure system run time and resident memory (RES), via pytest-monitor, inside the GA CI testing environment. So far, performance testing has shown that HDF5 chunking is paramount: a large number of small HDF5 chunks leads to very long run times and high memory consumption, whereas larger HDF5 chunks improve performance significantly. As an example, running PyActiveStorage on an uncompressed netCDF4 file of 1 GB on disk (500x500x500 float64 elements) with favourable HDF5 chunking (e.g. 75 data elements per chunk along each axis) takes of order 0.1 s for local POSIX storage and 0.3 s when the file is on an S3 server, and needs only about 100 MB of RES memory in either case (see test result). The same runs with much smaller HDF5 chunks (e.g. 20x smaller) need of order a factor of 300 more time to complete, and a few GB of RES memory.
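The chunk counts behind these numbers are easy to check: for a (500, 500, 500) array, the number of HDF5 chunks grows cubically as the per-axis chunk size shrinks, which is why small chunks are so costly:

```python
import math

def n_chunks(dim=500, chunk=75):
    """Total number of HDF5 chunks for a cubic (dim, dim, dim)
    array chunked as (chunk, chunk, chunk)."""
    return math.ceil(dim / chunk) ** 3

# per-axis chunk sizes used in the tests below
counts = {c: n_chunks(chunk=c) for c in (75, 25, 8)}
# 75 -> 343 chunks; 25 -> 8000 chunks; 8 -> 250047 chunks
```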

Testing HDF5 chunking

Test No. 1 specs

  • netCDF4 1.1GB file (on disk, local)
  • no compression, no filters
  • data shape = (500, 500, 500)
  • chunks = (75, 75, 75)

Ran a null test

(test module only, exercising imports and fixtures)

Ran 30 instances = 101-102M max RES

Ran kerchunk's translator to JSON

Ran 30 instances = 103M max RES

Ran an Active v1 test

30 tests = 107-108M max RES

So kerchunking only adds 1-2 MB of RES memory, and Active ~7 MB in total!

Test No. 2 specs

  • netCDF4 1.1GB file (on disk, local)
  • no compression, no filters
  • data shape = (500, 500, 500)
  • chunks = (25, 25, 25)

Ran kerchunk's translator to JSON

Ran 30 instances = 111M max RES

Ran an Active v1 test

30 tests = 114-115M max RES

Kerchunking needs 9 MB, and Active v1 a total of 13-14 MB, of max RES memory

Test No. 3 specs

  • netCDF4 1.1GB file (on disk, local)
  • no compression, no filters
  • data shape = (500, 500, 500)
  • chunks = (8, 8, 8)

Ran kerchunk's translator to JSON

Ran 30 instances = 306M max RES

Ran an Active v1 test

30 tests = 307M max RES

Kerchunking needs ~200 MB, about the same as Active in total - kerchunking is memory-dominant in the case of tiny HDF5 chunks.

Some conclusions

  • HDF5 chunking is make or break.
  • Memory appears to grow rapidly, roughly of the form F(M) = M0 + C x M ^ b, where M0 is the startup memory (module imports, test fixtures etc. - here, about 100 MB RES), C is a constant (probably close to 1), and b is the factor by which chunks shrink along one axis (e.g. 3 here).

Documentation

See the available Sphinx documentation. To build the documentation locally, run:

sphinx-build -Ea doc doc/build

Documentation builds are triggered by webhooks on pull requests and pushes.

Code coverage (test coverage)

We monitor test coverage via the Codecov app and employ a bot that reports the coverage changes introduced by every PR: the bot posts a comment directly to the PR, displaying the coverage variations introduced by the proposed code changes.

Download files

Download the file for your platform.

Source Distribution

pyactivestorage-0.6.0.tar.gz (8.7 MB)

Uploaded Source

Built Distribution


pyactivestorage-0.6.0-py3-none-any.whl (23.6 kB)

Uploaded Python 3

File details

Details for the file pyactivestorage-0.6.0.tar.gz.

File metadata

  • Download URL: pyactivestorage-0.6.0.tar.gz
  • Upload date:
  • Size: 8.7 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for pyactivestorage-0.6.0.tar.gz:

  • SHA256: 6d280ae29a0923e9d959c419ad4dcfabe76dd0fc28af7116e0aab0e3cfb4300f
  • MD5: 9000ca126194e846f2d5469cb27847a7
  • BLAKE2b-256: cb236953fa6c65f03477c8de5b9b4c72b70a36531950580d97acf8bcb494a1eb
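To verify a downloaded distribution against the published digests before installing, something like the following works (the streaming helper below is our own sketch, not part of PyActiveStorage):

```python
import hashlib

def sha256_of(path, block_size=65536):
    """Stream a file through SHA-256 in blocks, so large archives
    never need to fit in memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for block in iter(lambda: fh.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()

# compare against the published SHA256 before installing, e.g.:
# sha256_of("pyactivestorage-0.6.0.tar.gz") == \
#     "6d280ae29a0923e9d959c419ad4dcfabe76dd0fc28af7116e0aab0e3cfb4300f"
```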


Provenance

The following attestation bundles were made for pyactivestorage-0.6.0.tar.gz:

Publisher: build-and-deploy-on-pypi.yml on NCAS-CMS/PyActiveStorage

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pyactivestorage-0.6.0-py3-none-any.whl.

File metadata

File hashes

Hashes for pyactivestorage-0.6.0-py3-none-any.whl:

  • SHA256: 58fb157d1227129036c912e98f355be9ff7c3d88b4dad2846fb0663bb61ca2dc
  • MD5: 903b3a98e3c17e31e2f76605bf79f4d2
  • BLAKE2b-256: 7ab356267941d7d089d7bf84c27303c344ad36c4b5405c47c0488688efb4b486


Provenance

The following attestation bundles were made for pyactivestorage-0.6.0-py3-none-any.whl:

Publisher: build-and-deploy-on-pypi.yml on NCAS-CMS/PyActiveStorage

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
