Skip to main content

Easy serialization of Python objects with multiprocessing support

Project description

Epic serialize - Easy Python objects serialization

Epic-serialize CI

What is it?

The epic-serialize Python library provides a method for easily and efficiently serializing multiple Python objects into a single file, and then deserializing them back in random-like access.

Here's a quick example:

from epic.serialize import SpocReader, SpocWriter

with SpocWriter("myfile.spoc") as spocw:
    spocw.write(map(str, range(10)))
    spocw.write("--done--")

assert list(SpocReader("myfile.spoc")[-3:]) == ["8", "9", "--done--"]

The SPOC format

Spoc, or Serialized Python Objects Chunks, is a multi-serialization file format that allows to easily serialize multiple Python objects into a file which later allows quick reading and slicing.

Consider the case where a dict with 100M items is to be serialized to disk. By simply using pickle.dump, for example, the entire serialization will be done first in memory, and only then written to file, which may be slow and cause out-of-memory errors. In addition, it's sometimes desirable to serialize objects one-by-one (for example, while traversing an iterator).

In addition to serializing objects (in "chunks", as the name implies), Spoc also applies compression before writing to disk, thus potentially saving a lot of space.

The two main classes are SpocWriter, which can either create a new file or append to an existing one, and SpocReader, which allows reading, iteration and random access via index or slice.

SpocReader slices may also be passed to other process (they are picklable), which is useful when dividing jobs over multiple slices of the same Spoc file.

The SpocReader/SpocWriter classes fully support the 'with' statement, and this is the preferred syntax.

Currently, the following serialization schemes are supported:

  • pickle (builtin)
  • dill (requires installation)

Currently, the following compression algorithms are supported:

  • zlib (builtin)
  • bz2 (builtin)
  • gzip (builtin)
  • lzma (builtin)
  • lz4 (requires installation)

These parameters can be passed (by name or class) to the SpocWriter, but will only be used if the file is new or overwritten (not appended to). SpocReader as well as SpocWriter, when used to append to an existing file, interpret the serializer and compressor from the file's header.

Usage examples

Create new file with explicit serialization/compression:

with SpocWriter(filename, serialization="pickle", compression="gzip") as spocw:
    for item in items:
        spocw.write(item)

Open an existing file for appending:

with SpocWriter(filename, append=True) as spocw:
    spocw.write(more_items)

Read all items:

with SpocReader(filename) as spocr:
    read_items = list(spocr)

Read sliced items:

sliced_items = list(SpocReader(filename)[10:-1])
sliced_item = SpocReader(filename)[1000]

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

epic_serialize-1.1.1.tar.gz (9.9 kB view details)

Uploaded Source

File details

Details for the file epic_serialize-1.1.1.tar.gz.

File metadata

  • Download URL: epic_serialize-1.1.1.tar.gz
  • Upload date:
  • Size: 9.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for epic_serialize-1.1.1.tar.gz
Algorithm Hash digest
SHA256 4711719ca44c109102d1d93c57b0a1b05f614e1169fadd0422056a649449d203
MD5 35e6618be461b3a4e8d0853da410039a
BLAKE2b-256 f2c96df246e7b7227d99c1d18bf3e1e7690c4006fdb7d129851b891494f1726d

See more details on using hashes here.

Provenance

The following attestation bundles were made for epic_serialize-1.1.1.tar.gz:

Publisher: release.yml on epic-framework/epic-serialize

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page