Epic serialize - Easy Python objects serialization

What is it?

The epic-serialize Python library provides an easy and efficient way to serialize multiple Python objects into a single file, and then deserialize them with random-access-style indexing and slicing.

Here's a quick example:

from epic.serialize import SpocReader, SpocWriter

with SpocWriter("myfile.spoc") as spocw:
    spocw.write(map(str, range(10)))
    spocw.write("--done--")

assert list(SpocReader("myfile.spoc")[-3:]) == ["8", "9", "--done--"]

The SPOC format

Spoc, or Serialized Python Objects Chunks, is a multi-object serialization file format. It makes it easy to serialize many Python objects into one file that can later be read and sliced quickly.

Consider the case where a dict with 100M items is to be serialized to disk. With a plain pickle.dump, for example, the entire serialization is done in memory first and only then written to the file, which may be slow and may cause out-of-memory errors. It is also sometimes desirable to serialize objects one by one (for example, while traversing an iterator).
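To illustrate the one-by-one idea with only the standard library, the sketch below pickles items from an iterator into a file one at a time and reads them back lazily. It is not how epic-serialize is implemented; the helper names dump_stream and load_stream are made up for illustration.

```python
import os
import pickle
import tempfile
from typing import Any, Iterable, Iterator


def dump_stream(path: str, items: Iterable[Any]) -> int:
    """Pickle items one at a time, so only one item is held in memory at once."""
    count = 0
    with open(path, "wb") as f:
        for item in items:
            pickle.dump(item, f)
            count += 1
    return count


def load_stream(path: str) -> Iterator[Any]:
    """Yield the pickled items back in order until the file is exhausted."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                return


# Serialize (key, value) pairs straight from a generator, then rebuild a dict.
path = os.path.join(tempfile.mkdtemp(), "stream.pkl")
written = dump_stream(path, ((k, k * k) for k in range(1000)))
restored = dict(load_stream(path))
```

A plain stream like this still has to be read from the start to reach a given item, and it applies no compression; those are the gaps an indexed, chunked format is meant to close.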

In addition to serializing objects (in "chunks", as the name implies), Spoc also applies compression before writing to disk, thus potentially saving a lot of space.
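The general idea of chunking plus compression can be sketched with pickle and zlib alone. The layout below (fixed-size chunks and an in-memory list of byte offsets) is an assumption made for illustration and is not the actual SPOC on-disk format; write_chunks and read_item are hypothetical helpers.

```python
import io
import pickle
import zlib
from itertools import islice
from typing import Any, BinaryIO, Iterable, List, Tuple


def write_chunks(f: BinaryIO, items: Iterable[Any], chunk_size: int) -> List[Tuple[int, int]]:
    """Write items in compressed chunks; return one (offset, length) pair per chunk."""
    index = []
    it = iter(items)
    while True:
        chunk = list(islice(it, chunk_size))  # at most chunk_size items in memory
        if not chunk:
            break
        blob = zlib.compress(pickle.dumps(chunk))
        index.append((f.tell(), len(blob)))
        f.write(blob)
    return index


def read_item(f: BinaryIO, index: List[Tuple[int, int]], chunk_size: int, i: int) -> Any:
    """Decompress only the chunk that holds item i, then pick the item out of it."""
    offset, length = index[i // chunk_size]
    f.seek(offset)
    chunk = pickle.loads(zlib.decompress(f.read(length)))
    return chunk[i % chunk_size]


buf = io.BytesIO()  # stands in for a file on disk
index = write_chunks(buf, (str(n) for n in range(10_000)), chunk_size=256)
item = read_item(buf, index, 256, 7321)
```

Because each chunk is compressed independently, reading a single item costs one seek and one chunk decompression rather than a pass over the whole file.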

The two main classes are SpocWriter, which can either create a new file or append to an existing one, and SpocReader, which allows reading, iteration and random access via index or slice.

SpocReader slices may also be passed to other processes (they are picklable), which is useful when dividing jobs over multiple slices of the same Spoc file.
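One way to divide such work is to compute contiguous index ranges, one per worker. The helper below is a hypothetical sketch using only the standard library; it computes the boundaries and does not touch a Spoc file itself.

```python
from typing import List, Tuple


def split_ranges(total: int, n_parts: int) -> List[Tuple[int, int]]:
    """Split range(total) into n_parts contiguous (start, stop) pairs.

    Part sizes differ by at most one item; earlier parts get the extras.
    """
    if n_parts <= 0:
        raise ValueError("n_parts must be positive")
    base, extra = divmod(total, n_parts)
    ranges = []
    start = 0
    for part in range(n_parts):
        stop = start + base + (1 if part < extra else 0)
        ranges.append((start, stop))
        start = stop
    return ranges


ranges = split_ranges(10, 3)
```

Each (start, stop) pair could then be turned into a slice such as SpocReader(filename)[start:stop] and handed to a worker process. How the total item count is obtained from the reader is not covered here and should be checked against the library.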

The SpocReader/SpocWriter classes fully support the 'with' statement, and this is the preferred syntax.

Currently, the following serialization schemes are supported:

pickle (builtin)
dill (requires installation)

Currently, the following compression algorithms are supported:

zlib (builtin)
bz2 (builtin)
gzip (builtin)
lzma (builtin)
lz4 (requires installation)

These parameters can be passed (by name or class) to the SpocWriter, but will only be used if the file is new or overwritten (not appended to). When reading, or when appending to an existing file, SpocReader and SpocWriter take the serializer and compressor from the file's header.
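The idea of recording the codec choice in a header can be illustrated with the standard library. In the sketch below the header is a single JSON line, which is an assumed layout and not the real SPOC header; COMPRESSORS, write_blob and read_blob are hypothetical names. It relies on the fact that zlib, bz2, gzip and lzma all provide one-shot compress and decompress functions.

```python
import bz2
import gzip
import io
import json
import lzma
import pickle
import zlib
from typing import Any, BinaryIO

# Name -> module lookup for the builtin compressors (each has compress/decompress).
COMPRESSORS = {"zlib": zlib, "bz2": bz2, "gzip": gzip, "lzma": lzma}


def write_blob(f: BinaryIO, obj: Any, compression: str) -> None:
    """Write a one-line JSON header naming the compressor, then the payload."""
    header = json.dumps({"compression": compression}).encode("ascii")
    f.write(header + b"\n")
    f.write(COMPRESSORS[compression].compress(pickle.dumps(obj)))


def read_blob(f: BinaryIO) -> Any:
    """Read the header first, so the caller never has to name the compressor."""
    header = json.loads(f.readline())
    payload = f.read()
    return pickle.loads(COMPRESSORS[header["compression"]].decompress(payload))


# Round-trip the same kind of object through every builtin compressor.
results = {}
for name in COMPRESSORS:
    buf = io.BytesIO()
    write_blob(buf, {"name": name, "values": list(range(100))}, name)
    buf.seek(0)
    results[name] = read_blob(buf)
```

Storing the choice in the file is what lets a reader, or a writer in append mode, ignore any codec arguments and stay consistent with what is already on disk.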

Usage examples

Create new file with explicit serialization/compression:

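from epic.serialize import SpocReader, SpocWriter

# `filename` and `items` are placeholders for your own path and iterable.
with SpocWriter(filename, serialization="pickle", compression="gzip") as spocw:
    for item in items:
        spocw.write(item)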

Open an existing file for appending:

with SpocWriter(filename, append=True) as spocw:
    spocw.write(more_items)

Read all items:

with SpocReader(filename) as spocr:
    read_items = list(spocr)

Read sliced items:

# Items from index 10 up to, but not including, the last item.
sliced_items = list(SpocReader(filename)[10:-1])
# A single item by index.
sliced_item = SpocReader(filename)[1000]

Project details

Development Status
- 4 - Beta
Intended Audience
- Developers
Operating System
Programming Language
- Python :: 3 :: Only
- Python :: 3.10

Release history

This version

1.0

Nov 30, 2023

Download files

Source Distribution

epic-serialize-1.0.tar.gz (32.9 kB view details)

Uploaded Nov 30, 2023 Source

File details

Details for the file epic-serialize-1.0.tar.gz.

File metadata

Download URL: epic-serialize-1.0.tar.gz
Upload date: Nov 30, 2023
Size: 32.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.10.0

File hashes

Hashes for epic-serialize-1.0.tar.gz
Algorithm	Hash digest
SHA256	`f108d172c8fdcdab14da6ae04309161a2eae93b1a012cf44ecc5d2360aa33cc0`
MD5	`966bd9a93d01cd92b49ff3ee95f279e4`
BLAKE2b-256	`b107b3167f85a0e5512b3f372645fb9f007c175446c6b05e3be71b8919edf16b`