A library for reading and writing hierarchical data files

richfile

A more natural approach to saving hierarchical data structures.

richfile saves any Python object using directory structures on disk, and loads them back again into the same Python objects.

richfile can save any atomic Python object, including instances of custom classes, as long as you can write a function to save it and a function to load it. It is intended as a replacement for tools and formats such as pickle, json, yaml, HDF5, Parquet, netCDF, zarr, and numpy files when you want to save a complex data structure in a human-readable and editable format. We find the richfile format ideal for data processing pipelines whose intermediate results need a format that allows custom data types, is insensitive to version changes (a common problem with pickling), is easy to debug, and is human readable.

It is easy to use, the code is simple and written in pure Python, and its operations are designed to follow ACID principles (see Considerations and Limitations).

Installation

pip install richfile

Examples

Try out the examples in the demo_notebook.ipynb file.

Usage

Saving and loading data is simple:

## Given some complex data structure
data = {
    "name": "John Doe",
    "age": 25,
    "address": {
        "street": "1234 Elm St",
        "zip": None
    },
    "siblings": [
        "Jane",
        "Jim"
    ],
    "data": [1,2,3],
    (1,2,3): "complex key",
}

## Save it
import richfile as rf
r = rf.RichFile("path/to/data.richfile").save(data)

## Load it back
data = rf.RichFile("path/to/data.richfile").load()

You can also load just a part of the data:

r = rf.RichFile("path/to/data.richfile")
first_sibling = r["siblings"][0].load()  ## Lazily load a single item using pythonic indexing
print(f"First sibling: {first_sibling}")

>>> First sibling: Jane

View the contents of a richfile directory without loading it:

r.view_directory_structure()

Output:

Directory structure:
Viewing tree structure of richfile at path: ~/path/data.richfile (dict)
├── name.dict_item (dict_item)
|   ├── key.json (str)
|   ├── value.json (str)
|   
├── age.dict_item (dict_item)
|   ├── key.json (str)
|   ├── value.json (int)
|   
├── address.dict_item (dict_item)
|   ├── key.json (str)
|   ├── value.dict (dict)
|   |   ├── street.dict_item (dict_item)
|   |   |   ├── key.json (str)
|   |   |   ├── value.json (str)
|   |   |   
|   |   ├── zip.dict_item (dict_item)
|   |   |   ├── key.json (str)
|   |   |   ├── value.json (None)
|   |   |   
|   |   
|   
├── siblings.dict_item (dict_item)
|   ├── key.json (str)
|   ├── value.list (list)
|   |   ├── 0.json (str)
|   |   ├── 1.json (str)
|   |   
|   
├── data.dict_item (dict_item)
|   ├── key.json (str)
|   ├── value.list (list)
|   |   ├── 0.json (int)
|   |   ├── 1.json (int)
|   |   ├── 2.json (int)
|   |   
|   
├── 5.dict_item (dict_item)
|   ├── key.tuple (tuple)
|   |   ├── 0.json (int)
|   |   ├── 1.json (int)
|   |   ├── 2.json (int)
|   |   
|   ├── value.json (str)
|   
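
Because the on-disk layout is made of ordinary directories and files, it can also be inspected with standard tools. The sketch below is an illustration only: it builds a mock directory that copies just the names shown in the tree above and assumes that each `.json` leaf holds a plain JSON value. A real richfile directory may contain additional files, and `read_json_leaves` is a hypothetical helper, not part of richfile.

```python
import json
import tempfile
from pathlib import Path


def read_json_leaves(root):
    """Map each *.json file under root (by relative path) to its parsed value."""
    root = Path(root)
    return {
        p.relative_to(root).as_posix(): json.loads(p.read_text())
        for p in sorted(root.rglob("*.json"))
    }


with tempfile.TemporaryDirectory() as tmp:
    # Mock layout: mimics only the file names from the tree above (assumption).
    item = Path(tmp) / "data.richfile" / "siblings.dict_item"
    (item / "value.list").mkdir(parents=True)
    (item / "key.json").write_text(json.dumps("siblings"))
    (item / "value.list" / "0.json").write_text(json.dumps("Jane"))
    (item / "value.list" / "1.json").write_text(json.dumps("Jim"))

    leaves = read_json_leaves(Path(tmp) / "data.richfile")

# Print every leaf path next to the value stored in it.
for rel_path, value in leaves.items():
    print(rel_path, "->", value)
```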

You can also add new data types easily:

## Add type to a RichFile object
import numpy as np
import richfile as rf

r = rf.RichFile("path/to/data.richfile")
r.register_type(
    type_name='numpy_array',
    function_load=lambda path: np.load(path),
    function_save=lambda path, obj: np.save(path, obj),
    object_class=np.ndarray,
    library='numpy',
    suffix='npy',
)

## OR
## Add type to the global workspace / kernel so that all new RichFile objects can use it
rf.functions.register_type(
    type_name='numpy_array',
    function_load=lambda path: np.load(path),
    function_save=lambda path, obj: np.save(path, obj),
    object_class=np.ndarray,
    library='numpy',
    suffix='npy',
)
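
The same pattern should extend to your own classes. As a minimal sketch, the pair of functions below has the `(path, obj)` and `(path)` call shapes used for `function_save` and `function_load` in the examples above. The `Point` class, `save_point`, and `load_point` are hypothetical names invented for illustration, and the sketch does not call richfile itself; it only checks that the pair round-trips before you would register it.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path


@dataclass(frozen=True)
class Point:
    """Hypothetical custom class, used only for illustration."""
    x: float
    y: float
    tags: tuple  # JSON has no tuple type, so loading must convert back


def save_point(path, obj):
    """Save a Point as JSON; same (path, obj) call shape as function_save."""
    Path(path).write_text(json.dumps(asdict(obj)))


def load_point(path):
    """Load a Point from JSON; same (path) call shape as function_load."""
    raw = json.loads(Path(path).read_text())
    # json.loads returns a list for "tags"; restore the original tuple type.
    return Point(x=raw["x"], y=raw["y"], tags=tuple(raw["tags"]))


# Round-trip check: loading what was saved should give back an equal object.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "point.json"
    original = Point(x=1.5, y=-2.0, tags=("a", "b"))
    save_point(target, original)
    restored = load_point(target)

round_trip_ok = restored == original
print("round trip ok:", round_trip_ok)
```

Without the `tuple(...)` conversion in `load_point`, the restored object would carry a list where the original had a tuple, so the equality check would fail.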

Installation from source

git clone https://github.com/RichieHakim/richfile
cd richfile
pip install -e .

Considerations and Limitations

  • Invertibility: When creating custom data types, check that the saving and loading operations are exact inverses, so that loading a saved object returns an object identical to the original.
  • ACID: The library follows ACID principles reasonably well by using temporary files, file locks, and atomic operations. However, it is not a database and cannot guarantee database-level ACID compliance. In addition, atomically replacing an existing non-empty directory requires two operations, which weakens atomicity.
  • Performance: Data structures with many branches require many files and file operations, which can become slow. Consider packaging highly branched data structures into a single file in a format that supports hierarchical data (for example JSON, HDF5, Parquet, netCDF, zarr, or numpy) and making a custom data type for it.
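
The temporary-file technique mentioned in the ACID item can be sketched for a single file as follows. This is a generic illustration of the pattern using only the standard library, not richfile's actual implementation; `atomic_write_text` is a hypothetical helper. It writes to a temporary file in the same directory and then swaps it into place with `os.replace`, so a reader sees either the old content or the new content, never a partial write.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_text(path, text):
    """Write text to path via a temporary file and a single rename."""
    path = Path(path)
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(
        dir=path.parent, prefix=path.name + ".", suffix=".tmp"
    )
    try:
        with os.fdopen(fd, "w") as handle:
            handle.write(text)
            handle.flush()
            os.fsync(handle.fileno())  # push the data to disk before renaming
        os.replace(tmp_name, path)  # swap the finished file into place
    except BaseException:
        # On failure, remove the temp file and leave the old target untouched.
        if os.path.exists(tmp_name):
            os.remove(tmp_name)
        raise


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "value.json"
    atomic_write_text(target, "1")
    atomic_write_text(target, "2")  # overwrites the first version in one step
    final_text = target.read_text()
    leftovers = sorted(p.name for p in Path(tmp).iterdir())

print(final_text, leftovers)
```

A single rename like this works for files. Swapping out an existing non-empty directory generally cannot be done in one rename, which is consistent with the two-operation limitation noted above.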

TODO:

  • Tests
  • Documentation
  • Examples
  • Readme
  • License
  • PyPi
  • Hashing
  • Item assignment (safely)
  • Custom saving/loading functions
  • Put the library imports in the function calls
  • Add handling for data without a known type
  • Change name of library to something more descriptive
  • Test out memmap stuff
  • Make it a .zip type
  • Add mutability
