Robust serialization support for NamedTuple & @dataclass data types.
pywise
Contains functions that provide general utility and build upon the Python 3 standard library. It has no external dependencies.
serialization: serialization & deserialization for NamedTuple-deriving & @dataclass-decorated classes
archives: uncompress tar archives
common: utilities
schema: obtain a dict-like structure describing the fields & types for any serializable type (helpful to view as JSON)
This project's most notable functionality is the pair of serialize and deserialize functions in core_utils.serialization.
Take a look at the end of this document for example use.
Development Setup
This project uses poetry for virtualenv and dependency management. We recommend using brew to install poetry system-wide.
To install the project's dependencies, perform:
poetry install
Every command must be run within the poetry-managed environment.
For instance, to open a Python shell, you would execute:
poetry run python
Alternatively, you may activate the environment by running poetry shell and then invoke Python programs directly.
Testing
To run tests, execute:
poetry run pytest -v
To run tests against all supported environments, use tox:
poetry run tox -p
NOTE: To run tox, you must have all necessary Python interpreters available. We recommend using pyenv to manage your Python versions.
Dev Tools
This project uses black for code formatting, flake8 for linting, and mypy for type checking. Use the following commands to ensure code quality:
# formats all code in-place
black .
# typechecks
mypy --ignore-missing-imports --follow-imports=silent --show-column-numbers --warn-unreachable .
# lints code
flake8 --max-line-length=100 --ignore=E501,W293,E303,W291,W503,E203,E731,E231,E721,E722,E741 .
Documentation via Examples
Nested @dataclass and NamedTuple
Let's say you have an address book that you want to write to, and read back from, JSON.
We'll define the data types for our AddressBook:
from typing import NamedTuple, Optional, Sequence, Union
from dataclasses import dataclass
from enum import Enum, auto

@dataclass(frozen=True)
class Name:
    first: str
    last: str
    middle: Optional[str] = None

class PhoneNumber(NamedTuple):
    area_code: int
    number: int
    extension: Optional[int] = None

@dataclass(frozen=True)
class EmailAddress:
    name: str
    domain: str

class ContactType(Enum):
    personal, professional = auto(), auto()

class Emergency(NamedTuple):
    full_name: str
    # an emergency contact is reachable by phone or by email
    contact: Union[PhoneNumber, EmailAddress]

@dataclass(frozen=True)
class Entry:
    name: Name
    number: PhoneNumber
    email: EmailAddress
    contact_type: ContactType
    emergency_contact: Emergency

@dataclass(frozen=True)
class AddressBook:
    entries: Sequence[Entry]
For illustration, let's consider the following instantiated AddressBook:
ab = AddressBook([
    Entry(
        Name('Malcolm', 'Greaves', middle='W'),
        PhoneNumber(510, 3452113),
        EmailAddress('malcolm', 'world.com'),
        contact_type=ContactType.professional,
        emergency_contact=Emergency('Superman', PhoneNumber(262, 1249865, extension=1)),
    ),
])
We can convert our AddressBook into a JSON-formatted string by calling serialize and passing the result to json.dumps:
from core_utils.serialization import serialize
import json
s = serialize(ab)
j = json.dumps(s, indent=2)
print(j)
And we can easily convert the JSON string back into a newly instantiated AddressBook using deserialize:
from core_utils.serialization import deserialize
d = json.loads(j)
new_ab = deserialize(AddressBook, d)
print(ab == new_ab)
# NOTE: @dataclass generates __eq__ by default, so frozen=True is not required for
# this equality check. Any @dataclass-decorated type is serializable.
Note that the deserialize function needs the type to deserialize the data into. The deserialization type-matching is structural: it will work so long as the data type's structure (its field names and associated types) is compatible with the supplied data.
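To make the idea of structural matching concrete, here is a minimal sketch that uses only the standard library. It is not pywise's implementation: the helper from_dict_sketch and the Point and Segment types are hypothetical names introduced for illustration, and the sketch only handles nested @dataclass types and plain values.

```python
from dataclasses import dataclass, fields, is_dataclass
from typing import Any, Type, TypeVar, get_type_hints

T = TypeVar("T")


def from_dict_sketch(tp: Type[T], data: Any) -> T:
    """Hypothetical helper: rebuild `tp` from plain data by matching field names."""
    if not is_dataclass(tp):
        # Plain values (int, str, ...) pass through unchanged in this sketch.
        return data
    hints = get_type_hints(tp)  # maps each field name to its declared type
    kwargs = {}
    for f in fields(tp):
        if f.name in data:
            # Recurse so nested dataclasses are rebuilt from nested dicts.
            kwargs[f.name] = from_dict_sketch(hints[f.name], data[f.name])
    # Absent fields fall back to their defaults; a missing required
    # field makes the constructor raise TypeError.
    return tp(**kwargs)


@dataclass(frozen=True)
class Point:
    x: int
    y: int = 0


@dataclass(frozen=True)
class Segment:
    start: Point
    end: Point


# The dict never names Segment or Point; only its shape has to line up.
seg = from_dict_sketch(Segment, {"start": {"x": 1, "y": 2}, "end": {"x": 3}})
print(seg == Segment(Point(1, 2), Point(3)))
```

The supplied type drives the reconstruction, which is why a deserializer of this kind needs the target type as an argument.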
Custom Serialization
To use serialize and deserialize with data types from third-party libraries (e.g. numpy arrays), or with custom-defined classes that are neither decorated with @dataclass nor derived from NamedTuple, one may supply a CustomFormat.
CustomFormat is a mapping that associates precise types with custom serialization functions. When supplied to serialize, each value in the mapping accepts an instance of the exact type and produces a serializable representation. When supplied to deserialize, each value converts such a serialized representation into a bona fide instance of the type.
To illustrate their use, we'll define CustomFormat dicts that allow us to serialize numpy multi-dimensional arrays:
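import numpy as np
from core_utils.serialization import CustomFormat, serialize, deserialize

# ndarray -> nested lists, which are JSON-compatible
custom_serialization: CustomFormat = {
    np.ndarray: lambda arr: arr.tolist()
}
# nested lists -> ndarray
custom_deserialization: CustomFormat = {
    np.ndarray: lambda lst: np.array(lst)
}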
Now, we may supply custom_{serialization,deserialization} to our functions. We'll use them to perform a "round-trip" serialization of a four-dimensional array of floating point numbers to and from a JSON-formatted str:
import json
v_original = np.random.random((1,2,3,4))
s = serialize(v_original, custom=custom_serialization)
j = json.dumps(s)
d = json.loads(j)
v_deser = deserialize(np.ndarray, d, custom=custom_deserialization)
print((v_original == v_deser).all())
It's important to note that, when supplying a CustomFormat, the custom serialization functions take priority over the default behavior (except for Any, as it is always considered a pass-through). Moreover, types must match the keys in the mapping exactly. Thus, if using a generic type, you must supply separate key-value entries for each distinct type parameterization.
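The exact-match rule for generic types can be illustrated with the standard library alone. In this sketch, the mapping custom_sketch and its conversion functions are hypothetical stand-ins rather than part of pywise; the point is only that each parameterization of a generic type is a distinct dictionary key.

```python
from typing import Callable, Dict, List

# Hypothetical mapping keyed by exact types, in the spirit of a CustomFormat.
custom_sketch: Dict[object, Callable] = {
    List[int]: lambda xs: [int(x) for x in xs],
    List[str]: lambda xs: [str(x) for x in xs],
}

# Each parameterization is its own key, so these two lookups succeed...
print(List[int] in custom_sketch)    # True
print(List[str] in custom_sketch)    # True

# ...while a parameterization that was never registered is not found,
# and neither is the unparameterized generic.
print(List[float] in custom_sketch)  # False
print(List in custom_sketch)         # False
```

Under this rule, supporting List[float] as well means adding a third entry rather than relying on the List[int] one.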