bagofholding - browsable, partially-reloadable serialization for pickleable python objects.

bagofholding

bagofholding is designed to be an easy stand-in for pickle serialization of python objects that is transparent, flexible, and suitable for long-term storage.

Advantages

Drop-in replacement

bagofholding stores (almost) any pickle-able python object, and can be easily used as a drop-in replacement for pickle serialization:

>>> import bagofholding as boh
>>>
>>> boh.H5Bag.save(42, "file.h5")
>>> print(boh.H5Bag("file.h5").load())
42
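
For comparison, here is the same round trip with the standard library's pickle, which is the workflow the two calls above stand in for. This is only a minimal sketch; the temporary directory and the "file.pkl" name are arbitrary choices for illustration.

```python
import pickle
import tempfile
from pathlib import Path

# Write to a throwaway directory so the example leaves nothing behind.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "file.pkl"

    # Save: pickle needs an open binary file handle.
    with path.open("wb") as f:
        pickle.dump(42, f)

    # Load: the whole object is always re-instantiated at once.
    with path.open("rb") as f:
        loaded = pickle.load(f)

print(loaded)
```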

Browsable

The contents of stored objects can be browsed without actually re-instantiating any of the stored data. In the example above, we saw that saving is a class-method, while loading is an instance method. We can grab the "bag" instance and use it to peek at what's inside!

Let's use a slightly more complex object. Readers familiar with pickle will be able to see that the "reduced" structure of the object is captured in the structure of the storage itself:

>>> class MyThing:
...     def __init__(self, answer: int, question: str):
...         self.answer = answer
...         self.question = question
>>>
>>> something = MyThing(42, "still computing...")
>>> boh.H5Bag.save(something, "something.h5")
>>> bag = boh.H5Bag("something.h5")
>>> bag.list_paths()
['object', 'object/args', 'object/args/i0', 'object/constructor', 'object/item_iterator', 'object/kv_iterator', 'object/state', 'object/state/answer', 'object/state/question']
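
To see where that structure comes from, the sketch below asks the same class for its "reduced" form using only the standard library. The mapping of the five tuple entries onto the stored path names (constructor, args, state, item_iterator, kv_iterator) is an inference from the listing above, not a statement about bagofholding's internals.

```python
import copyreg


class MyThing:
    def __init__(self, answer: int, question: str):
        self.answer = answer
        self.question = question


something = MyThing(42, "still computing...")

# For pickle protocol 2 and above, a plain instance reduces to a 5-tuple:
# (callable, args for the callable, state, list-item iterator, dict-item iterator).
constructor, args, state, item_iterator, kv_iterator = something.__reduce_ex__(4)

print(constructor is copyreg.__newobj__)  # the callable that rebuilds the instance
print(args == (MyThing,))                 # a single argument: the class itself
print(state)                              # the instance __dict__
print(item_iterator, kv_iterator)         # unused for a plain object
```

The single entry in args would correspond to 'object/args/i0', and the two keys of state to 'object/state/answer' and 'object/state/question'.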

Item-access on the bag object gives access to metadata stored alongside the actual serialized information:

>>> bag["object"]
Metadata(content_type='bagofholding.content.Reducible', qualname='MyThing', module='__main__', version=None, meta=None)

For Jupyter users, we power up browsing capabilities with a widget under bag.browse() which lets you navigate the tree and see both metadata values and stored types.

Partial-loading

Stored objects can also be re-instantiated in part by leveraging their storage path:

>>> bag.load("object/state/answer")
42

Note that we didn't re-instantiate any part of the object other than this one integer!

This feature is incredibly useful for long-term storage and data transferability, as the loading environment does not need to fully match the saving environment: it only needs to provide what is required to load the desired piece of data. Consider some complex object which ultimately contains important or expensive-to-calculate numeric data, e.g. in the form of a numpy array. With bagofholding, you can pass this data to a colleague running a different python environment, or come back to it years later. With only bagofholding and numpy installed, the end user can browse through the stored object and load only the valuable numeric data without re-installing the entire original environment.
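
The problem this avoids can be reproduced with plain pickle. The sketch below uses only the standard library; the module name "original_env" and the class "Wrapper" are hypothetical stand-ins for a package that exists where the data was saved but not where it is loaded.

```python
import pickle
import sys
import types

# Build a throwaway module standing in for a package from the saving environment.
# "original_env" and "Wrapper" are hypothetical names used only for illustration.
module = types.ModuleType("original_env")
exec(
    "class Wrapper:\n"
    "    def __init__(self, data):\n"
    "        self.data = data\n",
    module.__dict__,
)
sys.modules["original_env"] = module

# Pickle an instance whose only valuable content is a list of numbers.
payload = pickle.dumps(module.Wrapper([1.0, 2.0, 3.0]))

# Simulate a loading environment where that package is not installed.
del sys.modules["original_env"]

try:
    pickle.loads(payload)
    load_error = None
except ModuleNotFoundError as err:
    # The numbers are unreachable because the wrapper class cannot be imported.
    load_error = err

print(type(load_error).__name__)
```

With plain pickle the load is all-or-nothing, so a missing wrapper class blocks access to the data it holds.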

Version control

In the examples above, we saw that version (and of course package) information is part of the stored metadata. This is useful post-facto for knowing what packages need to be installed to properly load your serialized data. You can also specify at load-time how strict or relaxed bagofholding should be in re-instantiating data if a stored version does not match the currently installed version, thus protecting you from flawed re-instantiations.
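
As a rough illustration of what a strict check involves, the sketch below compares a stored version string against the currently installed one using importlib.metadata. The helpers installed_version and version_matches are hypothetical names for this example; they are not part of bagofholding, whose own load-time options are not shown here.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def version_matches(package: str, stored_version: Optional[str]) -> bool:
    """Hypothetical strict check: stored and installed versions must be equal."""
    current = installed_version(package)
    return current is not None and current == stored_version


# A package missing from the current environment can never match.
print(version_matches("package-that-is-not-installed", "1.0.0"))
```

A more relaxed policy could, for example, compare only the major version; which policies are available in bagofholding is a question for its own documentation.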

bagofholding also provides tools to act on this data a priori. To increase the likelihood that stored data will be accessible in the future, you can outlaw any (sub)objects coming from particular modules:

>>> import bagofholding.exception
>>> try:
...     boh.H5Bag.save(something, "will_fail.h5", forbidden_modules=("__main__",))
... except bagofholding.exception.ModuleForbiddenError as e:
...     print(e)
Module '__main__' is forbidden as a source of stored objects. Change the `forbidden_modules` or move this object to an allowed module.

And/or demand that all objects have an identifiable version:

>>> import bagofholding.exception
>>> try:
...     boh.H5Bag.save(something, "will_fail.h5", require_versions=True)
... except bagofholding.exception.NoVersionError as e:
...     print(e)
Could not find a version for __main__. Either disable `require_versions`, use `version_scraping` to find an existing version for this package, or add versioning to the unversioned package.

Of course, metadata for the bag itself is also stored. We saw this in the GUI snapshot above, but it can also be accessed directly by code:

>>> boh.H5Bag.get_bag_info()
H5Info(qualname='H5Bag', module='bagofholding.h5.bag', version='...', libver_str='latest')

(In reality you will see a version code; it is omitted here because this example is executed automatically in the test suite.)

Going further

For a more in-depth look at the above features and to explore other aspects of bagofholding, check out the tutorial notebook.

Finally, bagofholding prioritizes transparency in what is stored and ease of use for both savers and loaders/browsers. As such, the current hdf5-based implementation is likely to be significantly less performant than raw pickling, because it creates many small datasets so that the h5 file can directly replicate the underlying structure of the python objects being saved. For objects which contain large numpy arrays, this disadvantage is significantly alleviated, as we benefit from the very efficient treatment of such arrays in hdf5 and h5py. For all other objects, the current bagofholding.H5Bag is still an appropriate choice when the robustness of long-term storage is more pressing than optimizing storage space. Other bag types may be available in the future.
