Up-and-up :rocket:
`upandup` is a simple schema versioning system for Python dataclasses.
Why?
In Python, dataclasses are a great way to define data schemas. However, when the schema changes, you need to be able to update the old data to the latest version, or risk breaking the ability to load old data from JSON, YAML, or other formats.

`upandup` provides a simple way to define how to update between different versions of a schema, and then load the latest version of the schema from old data.
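To make the mechanism concrete, here is a rough sketch of the idea in plain Python (this is *not* the `upandup` API — all names here are illustrative): pick the schema version whose fields match the data, then walk a chain of update functions up to the latest class.

```python
from dataclasses import dataclass, fields

# Two hypothetical schema versions and one update function between them.
@dataclass
class V1:
    x: int

@dataclass
class V2:
    x: int
    label: str

def v1_to_v2(obj):
    # Fill the new required field from the old data.
    return V2(x=obj.x, label=str(obj.x))

VERSIONS = [V1, V2]        # oldest to newest
UPDATES = {V1: v1_to_v2}   # start class -> update function

def load_latest(data: dict):
    # Find the first version whose field names match the data exactly.
    for cls in VERSIONS:
        if {f.name for f in fields(cls)} == set(data):
            obj = cls(**data)
            break
    else:
        raise ValueError("no matching schema version")
    # Walk the update chain until no further update is registered.
    while type(obj) in UPDATES:
        obj = UPDATES[type(obj)](obj)
    return obj

print(load_latest({"x": 1}))  # V2(x=1, label='1')
```

`upandup` packages this pattern up behind `register_updates` and `load`, as shown below.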
Let's say you have a dataclass like this:

```python
from dataclasses import dataclass

@dataclass
class DataSchemaV1:
    x: int
```
Users might end up serializing this to JSON:

```json
{
    "x": 1
}
```
Later, you decide to add a new field `x_str`:

```python
@dataclass
class DataSchema:
    x: int
    x_str: str
```
Now, users can no longer load the old data, because the schema has changed. You need to define how to update the old data to the new schema. `upandup` provides a way to do this.
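To see the failure concretely: constructing the new `DataSchema` directly from the old data raises a `TypeError`, because the required `x_str` field is missing (a minimal sketch, independent of any serialization library):

```python
from dataclasses import dataclass

@dataclass
class DataSchema:
    x: int
    x_str: str

old_data = {"x": 1}  # serialized under the old schema
try:
    DataSchema(**old_data)
except TypeError as exc:
    # The error message names the missing required field 'x_str'.
    print(f"load failed: {exc}")
```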
```python
import upandup as upup

update = lambda cls_start, cls_end, obj_start: cls_end(x=obj_start.x, x_str="the value is: %d" % obj_start.x)

# Register the update
upup.register_updates("DataSchema", DataSchemaV1, DataSchema, fn_update=update)
```
In the end, `upandup` exposes a `load` method that users can call to always load the latest version of the schema from old data.
```python
data = {"x": 1}
obj = upup.load("DataSchema", data)
print(obj.x_str)  # the value is: 1
```
This `load` call can also be wrapped in a dedicated helper function for your schema:

```python
# In your package:
load_data_schema = upup.make_load_fn("DataSchema")

# In scripts using your package:
data = {"x": 1}
obj = load_data_schema(data)
```
Serialization formats supported

- Dictionary - define `to_dict` and `from_dict` methods on your dataclasses.
- JSON - define `to_json` and `from_json` methods on your dataclasses.
- YAML - define `to_yaml` and `from_yaml` methods on your dataclasses.
- TOML - define `to_toml` and `from_toml` methods on your dataclasses.
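For the dictionary format, for example, the two methods can be written by hand with `dataclasses.asdict` (a minimal sketch; mixins such as mashumaro's `DataClassDictMixin` generate equivalent methods for you):

```python
from dataclasses import dataclass, asdict

@dataclass
class DataSchemaV1:
    x: int

    def to_dict(self) -> dict:
        # Convert the dataclass instance to a plain dict.
        return asdict(self)

    @classmethod
    def from_dict(cls, data: dict) -> "DataSchemaV1":
        # Reconstruct an instance from a plain dict.
        return cls(**data)

obj = DataSchemaV1.from_dict({"x": 1})
print(obj.to_dict())  # {'x': 1}
```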
Example
First, define some dataclasses. Let's say you have a `DataSchema` dataclass, which is the latest version, but also two older versions called `DataSchemaV1` and `DataSchemaV2`. These classes have `to_dict` and `from_dict` methods defined via the mashumaro package by inheriting from `DataClassDictMixin` (they could also be defined manually).
```python
from dataclasses import dataclass

from mashumaro import DataClassDictMixin

@dataclass
class DataSchemaV1(DataClassDictMixin):
    x: int

@dataclass
class DataSchemaV2(DataClassDictMixin):
    x: int
    y: int
    z: int = 0

@dataclass
class DataSchema(DataClassDictMixin):
    x: int
    name: str
```
Here, the first version `DataSchemaV1` has only one field `x`, and the second version `DataSchemaV2` has three fields `x`, `y`, and `z`. The field `y` has no default, so we will have to define how to update it. The field `z` already has a default in the definition. In the final version, the fields `y` and `z` have been removed again, and a new field `name` has been added.
Now, we can define how to update between the versions. We can use the `upandup` package to do this.
```python
import upandup as upup

# Define the functions to update between the versions.
# Each function takes the start class, the end class, and the object to update,
# and returns an object of the end class. They can be lambdas or regular functions.

# For the first update, we need to add a default value for the new field `y` (`z` already has a default).
update_1_to_2 = lambda cls_start, cls_end, obj_start: cls_end(x=obj_start.x, y=0)

# For the second update, we exclude the fields `y` and `z`, and add the new field `name` with a default value.
update_2_to_latest = lambda cls_start, cls_end, obj_start: cls_end(x=obj_start.x, name="default")

# Register the updates under the label `DataSchema`
upup.register_updates("DataSchema", DataSchemaV1, DataSchemaV2, fn_update=update_1_to_2)
upup.register_updates("DataSchema", DataSchemaV2, DataSchema, fn_update=update_2_to_latest)

# Expose a helper function to load the latest version of the schema.
# This makes a thin wrapper around upup.load.
load_data_schema = upup.make_load_fn("DataSchema")
```
Finally, we can test the update.

```python
# Test the update
data = {"x": 1}
obj = load_data_schema(data, options=upup.LoadOptions())
print("Result:")
print(f"Loaded object: {obj} of type {type(obj).__name__}")  # Loaded object: DataSchema(x=1, name='default') of type DataSchema
```
Advanced
Write intermediate versions
By default, the intermediate versions produced while updating to the latest are not written to the output. If you want to write them, you can set the `write_versions` option to `True`.
```python
data = {"x": 1}
options = upup.LoadOptions(write_versions=True, write_version_prefix="version", write_versions_dir=".")
obj = upup.load("DataSchema", data, options=options)
```
This will write the files:

```
version_DataSchema.json
version_DataSchemaV1.json
version_DataSchemaV2.json
```
Example in a package
We can organize the same example above to demonstrate how to use it in a package. Create the following files:

```
setup.py
mypackage/
    __init__.py
    data_latest.py
    data_v1.py
    data_v2.py
    register_updates.py
run_example.py
```
The data schemas are defined by `data_v1.py`, `data_v2.py`, and `data_latest.py`. The update functions between them are defined in `register_updates.py`.

The package is installed by the `setup.py` file:
```python
from setuptools import setup, find_packages

setup(
    name='mypackage',
    version='0.1.0',
    description='An example package',
    packages=find_packages(),
    install_requires=[
        "loguru",
        "mashumaro",
        "setuptools",
        "upandup",
    ],
    python_requires='>=3.11',
)
```
The contents of `data_v1.py` are:

```python
from dataclasses import dataclass

from mashumaro import DataClassDictMixin

@dataclass
class DataSchemaV1(DataClassDictMixin):
    x: int
```
The contents of `data_v2.py` are:

```python
from dataclasses import dataclass

from mashumaro import DataClassDictMixin

@dataclass
class DataSchemaV2(DataClassDictMixin):
    x: int
    y: int
    z: int = 0
```
The contents of `data_latest.py` are:

```python
from dataclasses import dataclass

from mashumaro import DataClassDictMixin

@dataclass
class DataSchema(DataClassDictMixin):
    x: int
    name: str
```
The `__init__.py` exposes only the latest version of the schema:

```python
from .data_latest import DataSchema
from .register_updates import load_data_schema, Options
```
The `register_updates.py` contains the update functions:

```python
import upandup as upup

from mypackage.data_v1 import DataSchemaV1
from mypackage.data_v2 import DataSchemaV2
from mypackage.data_latest import DataSchema

update_1_to_2 = lambda cls_start, cls_end, obj_start: cls_end(x=obj_start.x, y=0)
update_2_to_latest = lambda cls_start, cls_end, obj_start: cls_end(x=obj_start.x, name="default")

# Register the updates
upup.register_updates("DataSchema", DataSchemaV1, DataSchemaV2, fn_update=update_1_to_2)
upup.register_updates("DataSchema", DataSchemaV2, DataSchema, fn_update=update_2_to_latest)

# Expose the load function and options in a nicer way
load_data_schema = upup.make_load_fn("DataSchema")
Options = upup.LoadOptions
```
As shown in the `__init__.py`, we also expose `load_data_schema` and `Options` from `register_updates.py`. This lets users easily load the latest version of the schema from any old version.
Finally, `run_example.py` contains the test code:

```python
import mypackage as mp

# Test the update
data = {"x": 1}
obj = mp.load_data_schema(data, mp.Options())
print("Result:")
print(f"Loaded object: {obj} of type {type(obj).__name__}")  # Loaded object: DataSchema(x=1, name='default') of type DataSchema
```
Note that the `upandup` package itself never has to be imported by the end user.
Tests
Tests are included in the `tests` directory and built on `pytest`. From the root directory, run:

```shell
pytest
```