Type-safe data interchange for Python data classes
Type-safe data interchange for Python
JSON is a popular message interchange format employed in API design for its simplicity, readability, flexibility and wide support. However, json.dump and json.load offer no direct support when working with Python data classes employing type annotations. This package offers services for working with strongly-typed Python classes: serializing objects to JSON, deserializing JSON to objects, and producing a JSON schema that matches the data class, e.g. to be used in an OpenAPI specification.
Unlike orjson, this package supports both serializing and deserializing complex types such as data classes, UUIDs, decimals, etc., and allows specifying custom serialization and deserialization hooks. It doesn't require introducing custom classes in your class inheritance chain (such as BaseModel in pydantic dataclasses), making it suitable for operating on classes defined in third-party modules.
This package offers the following services:
- JSON serialization and de-serialization
  - Generate a JSON object from a Python object (object_to_json)
  - Parse a JSON object into a Python object (json_to_object)
- JSON schema
  - Generate a JSON schema from a Python type (classdef_to_schema)
  - Validate a JSON object against a Python type
- Type information
  - Extract documentation strings (a.k.a. docstring) from types
  - Inspect types, including generics
These services come with full support for complex types like data classes, named tuples and generics.
In the context of this package, a JSON object is the (intermediate) Python object representation produced by json.loads from a JSON string. In contrast, a JSON string is the string representation generated by json.dumps from the (intermediate) Python object representation.
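The distinction between a JSON string and a JSON object can be illustrated with the standard library alone. The sample payload below is made up for illustration:

```python
import json

# A JSON string: the text as it travels over the wire or sits in a file.
json_str = '{"name": "example", "values": [1, 2, 3]}'

# A JSON object in the sense used here: the intermediate Python representation.
json_obj = json.loads(json_str)

# Converting the intermediate representation back yields a JSON string again.
round_trip = json.dumps(json_obj)
```

Here json_obj is a plain dict holding a str and a list, while round_trip is a str.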
Use cases
- Writing a cloud function (lambda) that communicates with JSON messages received as HTTP payload or websocket text messages
- Verifying if an API endpoint receives well-formed input
- Generating a type schema for an OpenAPI specification to impose constraints on what messages an API can receive (see python-openapi)
- Parsing JSON configuration files into a Python object
Consider the following class definition:
import datetime
import uuid
from dataclasses import dataclass

@dataclass
class Example:
    "A simple data class with multiple properties."

    bool_value: bool = True
    int_value: int = 23
    float_value: float = 4.5
    str_value: str = "string"
    datetime_value: datetime.datetime = datetime.datetime(1989, 10, 23, 1, 45, 50)
    guid_value: uuid.UUID = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")
First, we serialize the object to JSON with
source = Example()
json_obj = object_to_json(source)
Here, the variable json_obj has the value:
{
    "bool_value": True,
    "int_value": 23,
    "float_value": 4.5,
    "str_value": "string",
    "datetime_value": "1989-10-23T01:45:50",
    "guid_value": "f81d4fae-7dec-11d0-a765-00a0c91e6bf6",
}
Next, we restore the object from JSON with
target = json_to_object(Example, json_obj)
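Here, target holds the restored data class object:
Example(
    bool_value=True,
    int_value=23,
    float_value=4.5,
    str_value="string",
    datetime_value=datetime.datetime(1989, 10, 23, 1, 45, 50),
    guid_value=uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6"),
)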
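To make the round trip less of a black box, here is a minimal sketch of how such a conversion could be written with the standard library only. It is not the package's implementation: to_json_sketch, from_json_sketch and the Event class are hypothetical names introduced for illustration, and only a handful of types are handled.

```python
import dataclasses
import datetime
import uuid
from typing import Any, Type, TypeVar, get_type_hints

T = TypeVar("T")


def to_json_sketch(obj: Any) -> Any:
    "Hypothetical helper: converts a value into a JSON object (plain Python data)."

    if obj is None or isinstance(obj, (bool, int, float, str)):
        return obj
    if isinstance(obj, datetime.datetime):
        return obj.isoformat()  # ISO 8601 text
    if isinstance(obj, uuid.UUID):
        return str(obj)
    if dataclasses.is_dataclass(obj) and not isinstance(obj, type):
        # Recurse over the declared fields of a data class instance.
        return {
            field.name: to_json_sketch(getattr(obj, field.name))
            for field in dataclasses.fields(obj)
        }
    raise TypeError(f"unsupported type: {type(obj).__name__}")


def from_json_sketch(typ: Type[T], data: Any) -> T:
    "Hypothetical helper: rebuilds a value of type `typ` from a JSON object."

    if typ is datetime.datetime:
        return datetime.datetime.fromisoformat(data)
    if typ is uuid.UUID:
        return uuid.UUID(data)
    if dataclasses.is_dataclass(typ):
        # The type annotations decide how each property is parsed.
        hints = get_type_hints(typ)
        kwargs = {
            field.name: from_json_sketch(hints[field.name], data[field.name])
            for field in dataclasses.fields(typ)
            if field.name in data
        }
        return typ(**kwargs)
    if typ in (bool, int, float, str):
        if not isinstance(data, typ):
            raise TypeError(f"expected {typ.__name__}, got {type(data).__name__}")
        return data
    raise TypeError(f"unsupported type: {typ!r}")


@dataclasses.dataclass
class Event:
    name: str = "launch"
    when: datetime.datetime = datetime.datetime(1989, 10, 23, 1, 45, 50)
    key: uuid.UUID = uuid.UUID("f81d4fae-7dec-11d0-a765-00a0c91e6bf6")


json_obj = to_json_sketch(Event())
restored = from_json_sketch(Event, json_obj)
```

The key design point is that serialization is driven by the runtime value, whereas de-serialization is driven by the declared type, because a JSON string alone does not say whether it encodes a date, a UUID or plain text.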
We can also produce the JSON schema corresponding to the Python class:
json_schema = json.dumps(classdef_to_schema(Example), indent=4)
which yields
{
    "$schema": "https://json-schema.org/draft/2020-12/schema",
    "type": "object",
    "properties": {
        "bool_value": {
            "type": "boolean",
            "default": true
        },
        "int_value": {
            "type": "integer",
            "default": 23
        },
        "float_value": {
            "type": "number",
            "default": 4.5
        },
        "str_value": {
            "type": "string",
            "default": "string"
        },
        "datetime_value": {
            "type": "string",
            "format": "date-time",
            "default": "1989-10-23T01:45:50"
        },
        "guid_value": {
            "type": "string",
            "format": "uuid"
        }
    },
    "additionalProperties": false,
    "required": [...],
    "title": "A simple data class with multiple properties."
}
If a type has a Python docstring, then the title and description fields in the JSON schema are populated from the text in the documentation string.
For producing a JSON schema, the following JSON schema standards are supported:
Conversion table
The following table shows the conversion types the package employs:
Python type | JSON schema type | Behavior |
None | null | |
bool | boolean | |
int | integer | |
float | number | |
str | string | |
decimal.Decimal | number | |
bytes | string | represented with Base64 content encoding |
datetime | string | constrained to match ISO 8601 format 2018-11-13T20:20:39+00:00 |
date | string | constrained to match ISO 8601 format 2018-11-13 |
time | string | constrained to match ISO 8601 format 20:20:39+00:00 |
UUID | string | constrained to match UUID format f81d4fae-7dec-11d0-a765-00a0c91e6bf6 |
Enum | value type | stores the enumeration value type (typically integer or string) |
Optional[T] | depends on inner type | reads and writes T if present |
Union[T1, T2, ...] | depends on concrete type | serializes to the appropriate inner type; deserializes from the first matching type |
List[T] | array | recursive in T |
Dict[K, V] | object | recursive in V, keys are coerced into string |
Dict[Enum, V] | object | recursive in V, keys are of enumeration value type and coerced into string |
Set[T] | array | recursive in T, container has uniqueness constraint |
Tuple[T1, T2, ...] | array | array has fixed length, each element has specific type |
Literal[const] | type matching const | exports the literal value as a constant value |
data class | object | iterates over fields of data class |
named tuple | object | iterates over fields of named tuple |
regular class | object | iterates over dir(obj) |
JsonArray | array | untyped JSON array |
JsonObject | object | untyped JSON object |
Any | oneOf | a union of all basic JSON schema types |
Annotated[T, ...] | depends on T | outputs value for T, applies constraints and format based on auxiliary type information |
JSON schema examples
Simple basic types
Python type | JSON schema |
bool | {"type": "boolean"} |
int | {"type": "integer"} |
float | {"type": "number"} |
str | {"type": "string"} |
bytes | {"type": "string", "contentEncoding": "base64"} |
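The Base64 content encoding used for bytes can be reproduced with the standard library. This is only an illustration of the encoding itself, not of how the package invokes it:

```python
import base64

payload = b"hello"

# bytes -> Base64 text suitable for a JSON string value
encoded = base64.b64encode(payload).decode("ascii")

# Base64 text -> original bytes
decoded = base64.b64decode(encoded)
```

For the sample payload, encoded is the string "aGVsbG8=" and decoded equals the original bytes.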
Simple built-in types
Python type | JSON schema |
decimal.Decimal | {"type": "number"} |
datetime.date | {"type": "string", "format": "date"} |
uuid.UUID | {"type": "string", "format": "uuid"} |
Enumeration types
class Side(enum.Enum):
    LEFT = "L"
    RIGHT = "R"

{"enum": ["L", "R"], "type": "string"}
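One way a schema of this shape could be derived from an enumeration is sketched below. The helper enum_to_schema_sketch and the Color class are hypothetical, and requiring all values to share one type is an assumption of the sketch rather than a documented rule of the package.

```python
import enum
from typing import Any, Dict, Type

# JSON schema type names for the basic value types an enumeration may hold.
_JSON_TYPE_NAMES = {bool: "boolean", int: "integer", float: "number", str: "string"}


def enum_to_schema_sketch(enum_type: Type[enum.Enum]) -> Dict[str, Any]:
    "Hypothetical helper: builds a JSON schema from the values of an enumeration."

    values = [member.value for member in enum_type]
    value_types = {type(value) for value in values}
    if len(value_types) != 1:
        raise TypeError("enumeration values must share a single type")
    (value_type,) = value_types
    return {"enum": values, "type": _JSON_TYPE_NAMES[value_type]}


class Color(enum.Enum):
    RED = "r"
    GREEN = "g"


color_schema = enum_to_schema_sketch(Color)
```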
Container types
Python type | JSON schema |
List[int] | {"type": "array", "items": {"type": "integer"}} |
Dict[str, int] | {"type": "object", "additionalProperties": {"type": "integer"}} |
Set[int] | {"type": "array", "items": {"type": "integer"}, "uniqueItems": true} |
Tuple[int, str] | {"type": "array", "minItems": 2, "maxItems": 2, "prefixItems": [{"type": "integer"}, {"type": "string"}]} |
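The container rows above follow a recursive pattern: the container decides the outer schema, and the type arguments are translated in turn. A minimal sketch using typing.get_origin and typing.get_args follows; type_to_schema_sketch is a hypothetical helper, and variadic tuples such as Tuple[int, ...] are deliberately left out.

```python
from typing import Any, Dict, List, Set, Tuple, get_args, get_origin

_BASIC = {bool: "boolean", int: "integer", float: "number", str: "string"}


def type_to_schema_sketch(typ: Any) -> Dict[str, Any]:
    "Hypothetical helper: maps basic and container types to a JSON schema."

    if typ in _BASIC:
        return {"type": _BASIC[typ]}

    origin = get_origin(typ)  # e.g. list for List[int]
    args = get_args(typ)  # e.g. (int,) for List[int]

    if origin is list:
        return {"type": "array", "items": type_to_schema_sketch(args[0])}
    if origin is set:
        return {
            "type": "array",
            "items": type_to_schema_sketch(args[0]),
            "uniqueItems": True,
        }
    if origin is dict:
        # Keys become JSON property names; only the value type is described.
        return {
            "type": "object",
            "additionalProperties": type_to_schema_sketch(args[1]),
        }
    if origin is tuple:
        return {
            "type": "array",
            "minItems": len(args),
            "maxItems": len(args),
            "prefixItems": [type_to_schema_sketch(arg) for arg in args],
        }
    raise TypeError(f"unsupported type: {typ!r}")


nested_schema = type_to_schema_sketch(Dict[str, List[int]])
```

Because the function calls itself on the type arguments, nested containers such as Dict[str, List[int]] need no extra handling.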
Annotated types
Annotated[int, IntegerRange(23, 82)]
{
    "type": "integer",
    "minimum": 23,
    "maximum": 82,
}
Annotated[decimal.Decimal, Precision(9, 6)]
{
    "type": "number",
    "multipleOf": 0.000001,
    "exclusiveMinimum": -1000,
    "exclusiveMaximum": 1000,
}
Fixed-width types
Fixed-width integer (e.g. uint64) and floating-point (e.g. float32) types are annotated types defined in the package strong_typing.auxiliary. Their signature is recognized when generating a schema, and a format property is written instead of minimum and maximum constraints.
int32 = Annotated[int, Signed(True), Storage(4), IntegerRange(-2147483648, 2147483647)]
{"format": "int32", "type": "integer"}
uint64 = Annotated[int, Signed(False), Storage(8), IntegerRange(0, 18446744073709551615)]
{"format": "uint64", "type": "integer"}
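The numbers in the annotated and fixed-width examples can be checked with a short sketch. RangeSketch and PrecisionSketch are hypothetical stand-ins defined here, not the package's IntegerRange and Precision classes, and the reading of Precision(9, 6) as nine significant digits with six decimal places is an assumption inferred from the schema shown earlier.

```python
import dataclasses
import decimal
from typing import Annotated, Any, Dict, Tuple, get_args


@dataclasses.dataclass(frozen=True)
class RangeSketch:
    "Hypothetical stand-in for an integer range constraint."

    minimum: int
    maximum: int


@dataclasses.dataclass(frozen=True)
class PrecisionSketch:
    "Hypothetical stand-in for a decimal precision constraint."

    significant_digits: int
    decimal_digits: int


def fixed_width_range(signed: bool, storage_bytes: int) -> Tuple[int, int]:
    "Computes the value range of a two's-complement or unsigned integer."

    bits = 8 * storage_bytes
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2**bits - 1


def annotated_to_schema_sketch(typ: Any) -> Dict[str, Any]:
    "Hypothetical helper: turns Annotated metadata into schema constraints."

    # get_args yields the underlying type followed by the metadata items.
    base, *metadata = get_args(typ)
    schema: Dict[str, Any] = {"type": "integer" if base is int else "number"}
    for item in metadata:
        if isinstance(item, RangeSketch):
            schema["minimum"] = item.minimum
            schema["maximum"] = item.maximum
        elif isinstance(item, PrecisionSketch):
            integer_digits = item.significant_digits - item.decimal_digits
            schema["multipleOf"] = 10 ** -item.decimal_digits
            schema["exclusiveMinimum"] = -(10**integer_digits)
            schema["exclusiveMaximum"] = 10**integer_digits
    return schema


range_schema = annotated_to_schema_sketch(Annotated[int, RangeSketch(23, 82)])
precision_schema = annotated_to_schema_sketch(
    Annotated[decimal.Decimal, PrecisionSketch(9, 6)]
)
```

With four bytes of signed storage, fixed_width_range reproduces the bounds listed for int32, and with eight bytes of unsigned storage it reproduces those listed for uint64, which is why a single format property can replace the explicit minimum and maximum.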
Any type
{
    "oneOf": [
        {"type": "null"},
        {"type": "boolean"},
        {"type": "number"},
        {"type": "string"},
        {"type": "array"},
        {"type": "object"},
    ]
}
Custom serialization and de-serialization
If a composite object (e.g. a dataclass or a plain Python class) has a to_json member function, then this function is invoked to produce a JSON object representation from an instance.
If a composite object has a from_json class function (a.k.a. @classmethod), then this function is invoked, passing the JSON object as an argument, to produce an instance of the corresponding type.
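As an illustration of the hook convention described above, the following sketch defines a class with both functions and a pair of simplified dispatchers. The Temperature class, its text format and the two dispatcher functions are hypothetical; the package's own dispatch logic is not shown in this document and may differ.

```python
from typing import Any


class Temperature:
    "Hypothetical class that controls its own JSON representation."

    def __init__(self, kelvin: float) -> None:
        self.kelvin = kelvin

    def to_json(self) -> str:
        # Custom representation: a single string instead of an object.
        return f"{self.kelvin} K"

    @classmethod
    def from_json(cls, data: str) -> "Temperature":
        value, _unit = data.split(" ")
        return cls(float(value))


def serialize_sketch(obj: Any) -> Any:
    "Simplified dispatcher: prefers a to_json member function when present."

    hook = getattr(obj, "to_json", None)
    if callable(hook):
        return hook()
    return obj  # basic values pass through unchanged in this sketch


def deserialize_sketch(typ: Any, data: Any) -> Any:
    "Simplified dispatcher: prefers a from_json class function when present."

    hook = getattr(typ, "from_json", None)
    if callable(hook):
        return hook(data)
    return typ(data)


as_json = serialize_sketch(Temperature(273.15))
restored = deserialize_sketch(Temperature, as_json)
```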
Custom types
It is possible to declare custom types when generating a JSON schema. For example, the following class definition has the annotation @json_schema_type, which will register a JSON schema subtype definition under the path #/definitions/AzureBlob, which will be referenced later with $ref:
_regexp_azure_url = re.compile(...)

@json_schema_type(
    schema={
        "type": "object",
        "properties": {
            "mimeType": {"type": "string"},
            "blob": {
                "type": "string",
                "pattern": _regexp_azure_url.pattern,
            },
        },
        "required": ["mimeType", "blob"],
        "additionalProperties": False,
    }
)
class AzureBlob(Blob):
    ...
You can use @json_schema_type without the schema parameter to register the type name but have the schema definition automatically derived from the Python type. This is useful if the type is reused across the type hierarchy:
@json_schema_type
class Image:
    ...

class Study:
    left: Image
    right: Image
Here, the two properties of Study (left and right) will refer to the same subtype #/definitions/Image.
Union types
Serializing a union type entails serializing the active member type.
De-serializing discriminated (tagged) union types is based on a disjoint set of property values with type annotation Literal[...]. Consider the following example:
@dataclass
class ClassA:
    name: Literal["A", "a"]
    value: str

@dataclass
class ClassB:
    name: Literal["B", "b"]
    value: str
Here, JSON representations of ClassA and ClassB are indistinguishable based on property names alone. However, the property name for ClassA can only take values "A" and "a", and property name for ClassB can only take values "B" and "b", hence a JSON object such as
{ "name": "A", "value": "string" }
uniquely identifies ClassA, and can never match ClassB. The de-serializer can instantiate the appropriate class, and populate properties of the newly created instance.
Tagged union types must have at least one property of a literal type, and the sets of values allowed for that property must be disjoint across the union members.
When de-serializing regular union types that have no type tags, the first successfully matching type is selected. It is a parse error if all union member types have been exhausted without finding a match.
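The selection of a tagged union member can be sketched with typing introspection. The classes Circle and Square and the helper pick_union_member_sketch are hypothetical and only illustrate the idea of matching on Literal-typed properties; the package's actual algorithm may differ.

```python
import dataclasses
from typing import Any, Dict, Literal, Union, get_args, get_origin, get_type_hints


@dataclasses.dataclass
class Circle:
    kind: Literal["circle"]
    size: float


@dataclasses.dataclass
class Square:
    kind: Literal["square", "box"]
    size: float


def pick_union_member_sketch(union_type: Any, data: Dict[str, Any]) -> Any:
    "Hypothetical helper: instantiates the member whose literal tags match."

    for member in get_args(union_type):
        hints = get_type_hints(member)
        # Collect the allowed values of every Literal-typed property.
        tags = {
            name: get_args(hint)
            for name, hint in hints.items()
            if get_origin(hint) is Literal
        }
        if tags and all(data.get(name) in allowed for name, allowed in tags.items()):
            return member(**data)
    raise ValueError("no union member matches the JSON object")


shape = pick_union_member_sketch(
    Union[Circle, Square], {"kind": "box", "size": 2.0}
)
```

Both classes have the same property names, so only the value of kind decides which class is instantiated. A JSON object whose tag matches no member raises an error, mirroring the parse error described above.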
Name mangling
If a Python class has a property whose name ends with an underscore (_) as per PEP 8 to avoid conflict with a Python keyword (e.g. for or in), the underscore is removed when reading from or writing to JSON.
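A minimal sketch of such a name mapping is shown below. Both helper functions are hypothetical, and restricting the rule to names that would otherwise collide with a keyword is an assumption of this sketch; the package may apply the rule differently.

```python
import keyword


def python_to_json_name_sketch(name: str) -> str:
    "Drops the trailing underscore when the remainder is a Python keyword."

    if name.endswith("_") and keyword.iskeyword(name[:-1]):
        return name[:-1]
    return name


def json_to_python_name_sketch(name: str) -> str:
    "Appends an underscore when the JSON property name is a Python keyword."

    return name + "_" if keyword.iskeyword(name) else name


json_name = python_to_json_name_sketch("for_")
python_name = json_to_python_name_sketch("in")
```

With this mapping, a class attribute named for_ is written to JSON as for, and a JSON property named in is read back into an attribute named in_.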