
A library for reading and writing hierarchical data files


richfile

A more natural approach to saving hierarchical data structures.

richfile saves any Python object to disk and loads it back into the same Python object.

Four backends are available:

  • backend="directory": classic richfile directory trees (default).
  • backend="sqlar": single-file SQLite archive (.sqlar) with no compression.
  • backend="zip": single-file ZIP archive (.zip) in stored mode (no compression).
  • backend="tar": single-file plain TAR archive (.tar) with no compression.

richfile can save any atomic Python object, including custom classes, so long as you can write functions to save and load it. It is intended as a replacement for formats like pickle, JSON, YAML, HDF5, Parquet, netCDF, Zarr, and NumPy files when you want to save a complex data structure in a human-readable and editable form. We find the richfile format ideal when you are building a data processing pipeline and want to store intermediate results in a format that supports custom data types, is insensitive to version changes (no pickling issues), allows for easy debugging, and is human readable.

It is easy to use, the code is simple and pure Python, and the operations follow ACID principles.
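The ACID-style durability described here rests on the classic temporary-file-plus-atomic-replace pattern: write to a temporary file in the same directory, then swap it into place in a single operation. A minimal sketch of that pattern using only the standard library (illustrative only, not richfile's actual internals; the file name `example.bin` is made up):

```python
import os
import tempfile

def atomic_write_bytes(path, data):
    """Write `data` to `path` so readers never observe a partial file."""
    dirname = os.path.dirname(os.path.abspath(path))
    ## Stage the bytes in a temp file on the same filesystem as the target.
    fd, tmp_path = tempfile.mkstemp(dir=dirname)
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())  ## Force bytes to disk before the swap
        ## os.replace is atomic on both POSIX and Windows.
        os.replace(tmp_path, path)
    except BaseException:
        os.unlink(tmp_path)
        raise

atomic_write_bytes("example.bin", b"hello")
```

Readers of `example.bin` see either the old contents or the new contents, never a mix, because the rename replaces the whole file at once.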

Installation

pip install richfile

Examples

Try out the examples in the demo_notebook.ipynb file.

Usage

Saving and loading data is simple:

## Given some complex data structure
data = {
    "name": "John Doe",
    "age": 25,
    "address": {
        "street": "1234 Elm St",
        "zip": None
    },
    "siblings": [
        "Jane",
        "Jim"
    ],
    "data": [1,2,3],
    (1,2,3): "complex key",
}

## Save it
import richfile as rf
r = rf.RichFile("path/to/data.richfile").save(data)

## Load it back
data = rf.RichFile("path/to/data.richfile").load()

Backends

By default, richfile will use the 'directory' backend. However, you can use other backends:

  • 'directory': places the contents into a directory. Slow saving, fast loading. Unwieldy when there are many leaf elements.
  • 'sqlar': single-file SQLite archive (.sqlar). Best general use choice. Fast and allows for random access, but does not allow easy navigating in a file browser.
  • 'zip': single-file ZIP archive (.zip) in stored mode (no compression). Slower than sqlar, but still performant, and easy to handle.
  • 'tar': single-file plain TAR archive (.tar) with no compression. Slower than sqlar, but still performant, and fairly easy to handle.
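The SQLAR container itself is a public SQLite convention: a single table `sqlar(name, mode, mtime, sz, data)`, where `data` holds the raw bytes when no compression is used. The sketch below writes and randomly reads one member of a hand-built uncompressed SQLAR file with the standard library, to show why lookups are fast. It illustrates the container format only, not richfile's internal layout, and the file name `toy.sqlar` is made up:

```python
import sqlite3
import time

## Create an uncompressed SQLAR archive by hand.
conn = sqlite3.connect("toy.sqlar")
conn.execute(
    "CREATE TABLE IF NOT EXISTS sqlar("
    "name TEXT PRIMARY KEY, mode INT, mtime INT, sz INT, data BLOB)"
)
payload = b'{"value": 42}'
conn.execute(
    "INSERT OR REPLACE INTO sqlar VALUES (?, ?, ?, ?, ?)",
    ("value.json", 0o644, int(time.time()), len(payload), payload),
)
conn.commit()

## Random access: the PRIMARY KEY index fetches one member
## without scanning the rest of the archive.
(blob,) = conn.execute(
    "SELECT data FROM sqlar WHERE name = ?", ("value.json",)
).fetchone()
conn.close()
```

Because each member is a row keyed by name, reading one item is an indexed lookup rather than a sequential scan of the archive.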

Save and load using the SQLAR backend:

import richfile as rf

rf.RichFile("path/to/data.sqlar", backend="sqlar").save(data)
data = rf.RichFile("path/to/data.sqlar", backend="sqlar").load()

Convert between backends (raw byte-preserving conversion):

import richfile as rf

## Archive -> directory-style richfile
rf.extract_backend_to_directory(
    path_source="path/to/data.zip",
    backend_source="zip",
    path_directory_out="path/to/data.richfile",
    overwrite=True,
)

## Directory-style richfile -> archive backend
rf.pack_directory_to_backend(
    path_directory_in="path/to/data.richfile",
    backend_target="sqlar",
    path_target="path/to/data.sqlar",
    overwrite=True,
)

## Generic backend -> backend conversion
rf.convert_backend(
    path_source="path/to/data.sqlar",
    backend_source="sqlar",
    path_target="path/to/data.tar",
    backend_target="tar",
    mode="raw",        ## "raw" (byte-preserving) or "semantic" (load/save)
    overwrite=True,
)
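Raw-mode conversion is conceptually simple: iterate over the entries of one container and copy their bytes into another, without deserializing anything. A stdlib-only sketch of the idea for ZIP to TAR (illustrative, not richfile's implementation; the archive names are made up):

```python
import io
import tarfile
import zipfile

def zip_to_tar_raw(path_zip, path_tar):
    """Copy each ZIP member's bytes into a plain TAR, unchanged."""
    with zipfile.ZipFile(path_zip) as zf, tarfile.open(path_tar, "w") as tf:
        for info in zf.infolist():
            if info.is_dir():
                continue
            data = zf.read(info.filename)
            ti = tarfile.TarInfo(name=info.filename)
            ti.size = len(data)
            tf.addfile(ti, io.BytesIO(data))

## Build a tiny archive to convert.
with zipfile.ZipFile("demo.zip", "w") as zf:
    zf.writestr("value.json", '{"x": 1}')
zip_to_tar_raw("demo.zip", "demo.tar")
```

Because no entry is parsed as a Python object, this kind of conversion needs no type registrations, which is why the "semantic" mode (load then save) is the only mode that does.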

You can also load just a part of the data:

r = rf.RichFile("path/to/data.richfile")  ## Path to an existing richfile
first_sibling = r["siblings"][0].load()  ## Lazily load a single item using pythonic indexing
print(f"First sibling: {first_sibling}")

>>> First sibling: Jane
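Lazy loading is cheap in the directory backend because every leaf lives in its own file, so reading one item touches only that file. A rough illustration of the idea, mimicking a fragment of the layout shown in the tree below (richfile's real path resolution is more involved; the `toy.richfile` path is made up):

```python
import json
import os

## Mimic a fragment of a directory-backend layout:
## siblings.dict_item/value.list/{0,1}.json
base = "toy.richfile/siblings.dict_item/value.list"
os.makedirs(base, exist_ok=True)
with open(os.path.join(base, "0.json"), "w") as f:
    json.dump("Jane", f)
with open(os.path.join(base, "1.json"), "w") as f:
    json.dump("Jim", f)

## Loading item 0 reads exactly one small file;
## the sibling at index 1 is never opened.
with open(os.path.join(base, "0.json")) as f:
    first_sibling = json.load(f)
```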

View the contents of a richfile directory without loading it:

r.view_directory_tree()

Output:

Directory structure:
Viewing tree structure of richfile at path: ~/path/data.richfile (dict)
├── name.dict_item (dict_item)
|   ├── key.json (str)
|   ├── value.json (str)
|   
├── age.dict_item (dict_item)
|   ├── key.json (str)
|   ├── value.json (int)
|   
├── address.dict_item (dict_item)
|   ├── key.json (str)
|   ├── value.dict (dict)
|   |   ├── street.dict_item (dict_item)
|   |   |   ├── key.json (str)
|   |   |   ├── value.json (str)
|   |   |   
|   |   ├── zip.dict_item (dict_item)
|   |   |   ├── key.json (str)
|   |   |   ├── value.json (None)
|   |   |   
|   |   
|   
├── siblings.dict_item (dict_item)
|   ├── key.json (str)
|   ├── value.list (list)
|   |   ├── 0.json (str)
|   |   ├── 1.json (str)
|   |   
|   
├── data.dict_item (dict_item)
|   ├── key.json (str)
|   ├── value.list (list)
|   |   ├── 0.json (int)
|   |   ├── 1.json (int)
|   |   ├── 2.json (int)
|   |   
|   
├── 5.dict_item (dict_item)
|   ├── key.tuple (tuple)
|   |   ├── 0.json (int)
|   |   ├── 1.json (int)
|   |   ├── 2.json (int)
|   |   
|   ├── value.json (str)
|   

You can also add new data types easily:

## Add type to a RichFile object
import numpy as np

r = rf.RichFile("path/to/data.richfile")
r.register_type(
    type_name='numpy_array',
    function_load=lambda path: np.load(path),
    function_save=lambda path, obj: np.save(path, obj),
    object_class=np.ndarray,
    library='numpy',
    suffix='npy',
)

## OR
## Add type to the global workspace / kernel so that all new RichFile objects can use it
rf.functions.register_type(
    type_name='numpy_array',
    function_load=lambda path: np.load(path),
    function_save=lambda path, obj: np.save(path, obj),
    object_class=np.ndarray,
    library='numpy',
    suffix='npy',
)
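Conceptually, a registration like this adds an entry to a dispatch table keyed by type name, and saving picks the handler whose class matches the object. A stripped-down sketch of that pattern (not richfile's actual code; the names `REGISTRY` and `save_obj` are made up for illustration):

```python
import json

## type_name -> (object_class, function_save, function_load, suffix)
REGISTRY = {}

def register_type(type_name, object_class, function_save, function_load, suffix):
    REGISTRY[type_name] = (object_class, function_save, function_load, suffix)

def save_obj(path_stem, obj):
    """Dispatch to the registered handler whose class matches `obj`."""
    for name, (cls, fn_save, _fn_load, suffix) in REGISTRY.items():
        if isinstance(obj, cls):
            path = f"{path_stem}.{suffix}"
            fn_save(path, obj)
            return path
    raise TypeError(f"No registered type for {type(obj).__name__}")

register_type(
    type_name="json_list",
    object_class=list,
    function_save=lambda path, obj: json.dump(obj, open(path, "w")),
    function_load=lambda path: json.load(open(path)),
    suffix="json",
)
saved_path = save_obj("numbers", [1, 2, 3])
```

The suffix recorded at registration is what lets the loader pick the matching `function_load` later just from the file name.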

Installation from source

git clone https://github.com/RichieHakim/richfile
cd richfile
pip install -e .

Considerations and Limitations

  • Invertibility: When creating custom data types, consider whether the save and load operations are exactly reversible.
  • ACID principles are reasonably followed via the use of temporary files, file locks, and atomic operations. However, the library is not a database, and therefore cannot guarantee the same level of ACID compliance as a database. In addition, atomic replacements of existing non-empty directories require two operations, which reduces atomicity.
  • Backend selection: You can pass backend explicitly ("directory", "sqlar", "zip", "tar"), or rely on backend="auto" for loading from existing paths. Path suffixes remain informational only.
  • Archive performance tradeoff: SQLAR/ZIP/TAR store raw bytes without compression in v1 for faster save behavior. This can increase on-disk size compared with compressed formats.
  • Archive scope in v1: SQLAR/ZIP/TAR currently support root-object save and lazy/query load behavior. Nested path mutation/append APIs are intentionally deferred.
  • TAR scope in v1: TAR backend writes plain .tar only (no .tar.gz, .tgz, .tar.bz2, or .tar.xz output modes).
  • Custom type compatibility in archive backends: Custom function_save(path, ...) and function_load(path, ...) callbacks are supported via a selective temporary-path bridge when needed.
  • Backend conversion: convert_backend(..., mode="raw") performs byte-preserving layout conversion and does not deserialize objects. mode="semantic" performs load() + save() and requires matching type registrations.

TODO:

  • Tests
  • Documentation
  • Examples
  • Readme
  • License
  • PyPi
  • Hashing
  • Item assignment (safely)
  • Custom saving/loading functions
  • Put the library imports in the function calls
  • Add handling for data without a known type
  • Change name of library to something more descriptive
  • Test out memmap stuff
  • Make it a .zip type
  • Add mutability
  • Archive packing



Download files

Download the file for your platform.

Source Distribution

richfile-0.6.2.tar.gz (56.5 kB)

Uploaded Source

Built Distribution


richfile-0.6.2-py3-none-any.whl (50.6 kB)

Uploaded Python 3

File details

Details for the file richfile-0.6.2.tar.gz.

File metadata

  • Download URL: richfile-0.6.2.tar.gz
  • Upload date:
  • Size: 56.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for richfile-0.6.2.tar.gz:

  • SHA256: 8441d07b9650e14052c8807f3835e93ed0394d11c9ce0474afa68a5cfca9ccfa
  • MD5: 911390dbaae471edf853e9fb5fa2aed0
  • BLAKE2b-256: 47b91a92c05aaa5e4a8daebac3d54acdcc0beede0f972c433e1778a8d734c104


Provenance

The following attestation bundles were made for richfile-0.6.2.tar.gz:

Publisher: pypi_release.yml on RichieHakim/richfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file richfile-0.6.2-py3-none-any.whl.

File metadata

  • Download URL: richfile-0.6.2-py3-none-any.whl
  • Upload date:
  • Size: 50.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for richfile-0.6.2-py3-none-any.whl:

  • SHA256: 131f9cab7d9b1b39fa21226e92ba32ee88ab52b1ce3ac7a86c7031c3c2543374
  • MD5: ab0394294365042eb2407946b61762ca
  • BLAKE2b-256: 616129a27e16ddec5bc33367709b50a99156097f07e772489ddb0e297633bfc6


Provenance

The following attestation bundles were made for richfile-0.6.2-py3-none-any.whl:

Publisher: pypi_release.yml on RichieHakim/richfile

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
