Simple creation of dataclasses from JSON
dataglasses
A small package to simplify creating dataclasses from JSON and validating that JSON.
Installation
$ pip install dataglasses
Requirements
Requires Python 3.10 or later.
If you wish to validate arbitrary JSON data against the generated JSON schemas in Python, consider installing jsonschema. This is unnecessary if you only use dataglasses to convert JSON into dataclasses.
Quick start
>>> from dataclasses import dataclass
>>> from dataglasses import from_dict, to_json_schema
>>> from json import dumps
>>> @dataclass
... class InventoryItem:
...     name: str
...     unit_price: float
...     quantity_on_hand: int = 0
>>> from_dict(InventoryItem, { "name": "widget", "unit_price": 3.0})
InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=0)
>>> print(dumps(to_json_schema(InventoryItem), indent=2))
print output...
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/InventoryItem",
"$defs": {
"InventoryItem": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"unit_price": {
"type": "number"
},
"quantity_on_hand": {
"type": "integer",
"default": 0
}
},
"required": [
"name",
"unit_price"
]
}
}
}
Objective
The purpose of this library is to speed up development by making it trivial to populate type-annotated dataclasses with dictionary data extracted from JSON, and to perform basic validation on that data. The library consists of just one file and two functions, so it can even be copied directly into a project.
It is not intended for complex validation or high performance. For those, consider using pydantic.
Usage
The package contains just two functions:
def from_dict(
    cls: type[T],
    value: Any,
    *,
    strict: bool = False,
    transform: Optional[TransformRules] = None,
) -> T
This converts a nested dictionary value of input data into the given dataclass type cls, raising an exception if the conversion is not possible. (The optional keyword arguments are described further down.)
def to_json_schema(
    cls: type,
    *,
    strict: bool = False,
    transform: Optional[TransformRules] = None,
) -> dict[str, Any]
This generates a JSON schema representing valid inputs for the dataclass type cls, raising an exception if the class cannot be represented in JSON. (Again, the optional keyword arguments are described further down.)
Below is a summary of the different supported use cases:
Nested structures
Dataclasses can be nested, using either global or local definitions.
>>> @dataclass
... class TrackedItem:
...
...     @dataclass
...     class GPS:
...         lat: float
...         long: float
...
...     item: InventoryItem
...     location: GPS
>>> from_dict(TrackedItem, {
... "item": { "name": "pie", "unit_price": 42},
... "location": { "lat": 52.2, "long": 0.1 } })
TrackedItem(item=InventoryItem(name='pie', unit_price=42, quantity_on_hand=0),
location=TrackedItem.GPS(lat=52.2, long=0.1))
>>> print(dumps(to_json_schema(TrackedItem), indent=2))
print output...
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/TrackedItem",
"$defs": {
"TrackedItem": {
"type": "object",
"properties": {
"item": {
"$ref": "#/$defs/InventoryItem"
},
"location": {
"$ref": "#/$defs/TrackedItem.GPS"
}
},
"required": [
"item",
"location"
]
},
"InventoryItem": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"unit_price": {
"type": "number"
},
"quantity_on_hand": {
"type": "integer",
"default": 0
}
},
"required": [
"name",
"unit_price"
]
},
"TrackedItem.GPS": {
"type": "object",
"properties": {
"lat": {
"type": "number"
},
"long": {
"type": "number"
}
},
"required": [
"lat",
"long"
]
}
}
}
Collection types
There is automatic support for the generic collection types most compatible with JSON: list[T], tuple[...] and Sequence[T] (encoded as arrays), and dict[str, T] and Mapping[str, T] (encoded as objects).
>>> from collections.abc import Mapping, Sequence
>>> @dataclass
... class Catalog:
...     items: Sequence[InventoryItem]
...     publisher: tuple[str, int]
...     purchases: Mapping[str, int]
>>> from_dict(Catalog, {
... "items": [{ "name": "widget", "unit_price": 3.0}],
... "publisher": ["ACME", 1982],
... "purchases": { "Wile E. Coyote": 52}})
Catalog(items=[InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=0)],
publisher=('ACME', 1982), purchases={'Wile E. Coyote': 52})
>>> print(dumps(to_json_schema(Catalog), indent=2))
print output...
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/Catalog",
"$defs": {
"Catalog": {
"type": "object",
"properties": {
"items": {
"type": "array",
"items": {
"$ref": "#/$defs/InventoryItem"
}
},
"publisher": {
"type": "array",
"prefixItems": [
{
"type": "string"
},
{
"type": "integer"
}
],
"minItems": 2,
"maxItems": 2
},
"purchases": {
"type": "object",
"patternProperties": {
"^.*$": {
"type": "integer"
}
}
}
},
"required": [
"items",
"publisher",
"purchases"
]
},
"InventoryItem": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"unit_price": {
"type": "number"
},
"quantity_on_hand": {
"type": "integer",
"default": 0
}
},
"required": [
"name",
"unit_price"
]
}
}
}
Unrestricted types like list or dict (or set or Any) and mappings with non-str keys can be used with from_dict but not with to_json_schema. Alternatively, these, alongside unsupported generic types like set[T], can be used with both from_dict and to_json_schema by defining an appropriate encoding transformation (see the Transformations section below).
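To see why a parameterised annotation such as list[int] can be described in a schema while a bare list cannot, it helps to look at what the standard typing helpers report for each. The sketch below uses only the standard library; the json_kind helper is a hypothetical name for illustration and is not how dataglasses is necessarily implemented.

```python
from collections.abc import Mapping, Sequence
from typing import Any, Optional, get_args, get_origin


def json_kind(annotation: Any) -> Optional[str]:
    """Guess the JSON container kind for a parameterised annotation.

    Returns "array" or "object", or None when the annotation carries no
    element type information (for example a bare list) or is not one of
    the collection types listed above.
    """
    origin, args = get_origin(annotation), get_args(annotation)
    if origin in (list, tuple, Sequence):
        return "array"
    # Only string-keyed mappings correspond to JSON objects.
    if origin in (dict, Mapping) and args and args[0] is str:
        return "object"
    return None


print(json_kind(list[int]))          # array
print(json_kind(Mapping[str, int]))  # object
print(json_kind(list))               # None
print(json_kind(dict[int, str]))     # None
```

A bare list has no origin or arguments to inspect, so there is nothing from which to derive an "items" schema.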
Optional and Union types
Union types (S | T or Union[S, T, ...]) are matched against all their permitted subtypes in order, returning the first successful match, or raising an exception if there are none. Optional types (T | None or Optional[T]) are handled similarly. Note that an optional type is not the same as an optional field (i.e. one with a default): a field with an optional type is still a required field unless it has a default value (which could be None but could also be something else).
>>> from typing import Optional
>>> @dataclass
... class ItemPurchase:
...     items: Sequence[InventoryItem | TrackedItem]
...     invoice: Optional[int] = None
>>> from_dict(ItemPurchase, {
... "items": [{
... "item": { "name": "pie", "unit_price": 42},
... "location": { "lat": 52.2, "long": 0.1 } }],
... "invoice": 1234})
ItemPurchase(items=[TrackedItem(item=
InventoryItem(name='pie', unit_price=42, quantity_on_hand=0),
location=TrackedItem.GPS(lat=52.2, long=0.1))], invoice=1234)
>>> print(dumps(to_json_schema(ItemPurchase), indent=2))
print output...
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/ItemPurchase",
"$defs": {
"ItemPurchase": {
"type": "object",
"properties": {
"items": {
"type": "array",
"items": {
"anyOf": [
{
"$ref": "#/$defs/InventoryItem"
},
{
"$ref": "#/$defs/TrackedItem"
}
]
}
},
"invoice": {
"anyOf": [
{
"type": "integer"
},
{
"type": "null"
}
],
"default": null
}
},
"required": [
"items"
]
},
"InventoryItem": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"unit_price": {
"type": "number"
},
"quantity_on_hand": {
"type": "integer",
"default": 0
}
},
"required": [
"name",
"unit_price"
]
},
"TrackedItem": {
"type": "object",
"properties": {
"item": {
"$ref": "#/$defs/InventoryItem"
},
"location": {
"$ref": "#/$defs/TrackedItem.GPS"
}
},
"required": [
"item",
"location"
]
},
"TrackedItem.GPS": {
"type": "object",
"properties": {
"lat": {
"type": "number"
},
"long": {
"type": "number"
}
},
"required": [
"lat",
"long"
]
}
}
}
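The difference between an optional type and an optional field can be checked with the standard library alone. The following is a minimal sketch: the Order class and the required_fields helper are hypothetical names used for illustration and are not part of dataglasses.

```python
from dataclasses import MISSING, dataclass, fields
from typing import Optional


@dataclass
class Order:
    # Optional *type* but no default: the key must still be supplied.
    coupon: Optional[str]
    # Optional *field*: it has a default, so the key may be omitted.
    invoice: Optional[int] = None
    quantity: int = 1


def required_fields(cls: type) -> list[str]:
    """Return the names of dataclass fields that have no default."""
    return [
        f.name
        for f in fields(cls)
        if f.default is MISSING and f.default_factory is MISSING
    ]


print(required_fields(Order))  # ['coupon']
```

Here coupon accepts None as a value but must still be present, which matches how the "required" list in the schema above is driven by defaults rather than by types.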
Enum and Literal types
Both Enum and Literal types can be used to match explicit enumerations. By default, Enum types match both the values and symbolic names (preferring the former in case of a clash). This behaviour can be overridden using a transformation if desired (see section below).
>>> from enum import auto, StrEnum
>>> from typing import Literal
>>> class BuildType(StrEnum):
...     DEBUG = auto()
...     OPTIMIZED = auto()
>>> @dataclass
... class Release:
...     build: BuildType
...     approved: Literal["Yes", "No"]
>>> from_dict(Release, {"build": "debug", "approved": "Yes"})
Release(build=<BuildType.DEBUG: 'debug'>, approved='Yes')
>>> print(dumps(to_json_schema(Release), indent=2))
print output...
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/Release",
"$defs": {
"Release": {
"type": "object",
"properties": {
"build": {
"enum": [
"debug",
"optimized",
"DEBUG",
"OPTIMIZED"
]
},
"approved": {
"enum": [
"Yes",
"No"
]
}
},
"required": [
"build",
"approved"
]
}
}
}
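The "values first, then names" matching order described above can be sketched with the standard library as follows. This is an illustration of the documented behaviour under that assumption, not the library's actual code; match_enum is a hypothetical helper, and a plain Enum is used so the sketch does not depend on StrEnum.

```python
from enum import Enum


class BuildType(Enum):
    DEBUG = "debug"
    OPTIMIZED = "optimized"


def match_enum(enum_cls: type[Enum], raw: object) -> Enum:
    """Match raw against member values first, then symbolic names."""
    try:
        return enum_cls(raw)  # lookup by value
    except ValueError:
        pass
    if isinstance(raw, str) and raw in enum_cls.__members__:
        return enum_cls[raw]  # lookup by symbolic name
    raise ValueError(f"{raw!r} is not a valid {enum_cls.__name__}")


print(match_enum(BuildType, "debug"))      # BuildType.DEBUG
print(match_enum(BuildType, "OPTIMIZED"))  # BuildType.OPTIMIZED
```

Trying the value lookup first is what gives values precedence when a value and a name happen to coincide.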
Annotated types
Annotated types can be used to populate the property "description" annotations in the JSON schema.
>>> from typing import Annotated
>>> @dataclass
... class InventoryItem:
...     name: Annotated[str, "item name"]
...     unit_price: Annotated[float, "unit price"]
...     quantity_on_hand: Annotated[int, "quantity on hand"] = 0
>>> from_dict(InventoryItem, { "name": "widget", "unit_price": 3.0})
InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=0)
>>> print(dumps(to_json_schema(InventoryItem), indent=2))
print output...
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/InventoryItem",
"$defs": {
"InventoryItem": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "item name"
},
"unit_price": {
"type": "number",
"description": "unit price"
},
"quantity_on_hand": {
"type": "integer",
"description": "quantity on hand",
"default": 0
}
},
"required": [
"name",
"unit_price"
]
}
}
}
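For readers curious where such descriptions come from, Annotated metadata can be read back with the standard typing helpers. The sketch below assumes the first string in the metadata is the description; that rule, the Item class and the field_descriptions helper are all assumptions made for illustration rather than a statement of how dataglasses works.

```python
from dataclasses import dataclass
from typing import Annotated, get_args, get_origin, get_type_hints


@dataclass
class Item:
    name: Annotated[str, "item name"]
    quantity: int = 0


def field_descriptions(cls: type) -> dict[str, str]:
    """Collect the first string found in each field's Annotated metadata."""
    descriptions: dict[str, str] = {}
    # include_extras=True keeps the Annotated wrapper instead of stripping it.
    hints = get_type_hints(cls, include_extras=True)
    for name, hint in hints.items():
        if get_origin(hint) is Annotated:
            # get_args returns (underlying_type, *metadata).
            for meta in get_args(hint)[1:]:
                if isinstance(meta, str):
                    descriptions[name] = meta
                    break
    return descriptions


print(field_descriptions(Item))  # {'name': 'item name'}
```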
Forward references
Forward reference types (written as string literals or ForwardRef objects) are handled automatically, permitting recursive dataclasses. Both global and local references are supported.
>>> @dataclass
... class Cons:
...     head: int
...     tail: Optional["Cons"] = None
...
...     def __repr__(self):
...         current, rep = self, []
...         while isinstance(current, Cons):
...             rep.append(str(current.head))
...             current = current.tail
...         return "(" + ",".join(rep) + ")"
>>> from_dict(Cons, { "head": 1, "tail": { "head": 2 } })
(1,2)
>>> print(dumps(to_json_schema(Cons), indent=2))
print output...
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/Cons",
"$defs": {
"Cons": {
"type": "object",
"properties": {
"head": {
"type": "integer"
},
"tail": {
"anyOf": [
{
"$ref": "#/$defs/Cons"
},
{
"type": "null"
}
],
"default": null
}
},
"required": [
"head"
]
}
}
}
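As background, a string annotation stays unresolved until something evaluates it against a namespace. The standard library's typing.get_type_hints does this, as the sketch below shows; the Node class is a hypothetical example, and passing localns explicitly is just one way to supply the name being referenced, not a description of the library's internals.

```python
from dataclasses import dataclass
from typing import Optional, get_type_hints


@dataclass
class Node:
    value: int
    next: Optional["Node"] = None


# The raw annotation still holds an unresolved forward reference.
raw = Node.__annotations__["next"]

# get_type_hints evaluates the reference; the namespace containing
# "Node" is supplied explicitly so that the lookup does not depend on
# where the class happens to be defined.
resolved = get_type_hints(Node, localns={"Node": Node})

print(raw == Optional[Node])               # False
print(resolved["next"] == Optional[Node])  # True
```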
Strict mode
Both from_dict and to_json_schema default to ignoring additional properties that are not part of a dataclass (similar to additionalProperties defaulting to true in JSON schemas). This can be disabled with the strict keyword.
>>> value = { "name": "widget", "unit_price": 4.0, "comment": "too expensive"}
>>> from_dict(InventoryItem, value)
InventoryItem(name='widget', unit_price=4.0, quantity_on_hand=0)
>>> from_dict(InventoryItem, value, strict=True)
TypeError: Unexpected <class '__main__.InventoryItem'> fields {'comment'}
>>> print(dumps(to_json_schema(InventoryItem, strict=True), indent=2))
print output...
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/InventoryItem",
"$defs": {
"InventoryItem": {
"type": "object",
"properties": {
"name": {
"type": "string",
"description": "item name"
},
"unit_price": {
"type": "number",
"description": "unit price"
},
"quantity_on_hand": {
"type": "integer",
"description": "quantity on hand",
"default": 0
}
},
"required": [
"name",
"unit_price"
],
"additionalProperties": false
}
}
}
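Conceptually, the strict check amounts to comparing the input keys with the dataclass field names. The following standard-library sketch illustrates the idea; the Item class and the unexpected_keys helper are hypothetical names, and the real library may perform the check differently.

```python
from collections.abc import Mapping
from dataclasses import dataclass, fields
from typing import Any


@dataclass
class Item:
    name: str
    unit_price: float
    quantity_on_hand: int = 0


def unexpected_keys(cls: type, value: Mapping[str, Any]) -> set[str]:
    """Return the keys of value that do not name a field of cls."""
    return set(value) - {f.name for f in fields(cls)}


value = {"name": "widget", "unit_price": 4.0, "comment": "too expensive"}
print(unexpected_keys(Item, value))  # {'comment'}
```

An empty result means the input would be accepted even in strict mode, at least as far as extra keys are concerned.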
Transformations
Transformations allow you to override the handling of specific types or dataclass fields, and can be used to normalise inputs or convert them into different types, including ones that aren't normally supported. Transformations are specified with the transform keyword, using a mapping:
- the mapping keys are either:
  - a type used somewhere in the output dataclass: e.g. str or set[int]
  - a dataclass field specified by a class-name tuple: e.g. (InventoryItem, "name") or (Cons, "head")
- the mapping values are a tuple consisting of:
  - the JSON-serialisable input type that we want to represent this output type or field
  - a callable function to convert from that input type to the output type
Note that the input type can be the same as the output type. Also note that transformations only apply when reading input: they don't help with serialising the dataclasses back into JSON from non-serialisable types.
>>> @dataclass
... class Person:
...     name: str
...     aliases: set[str]
>>> transform = {
...     str: (str, str.title),
...     set[str]: (list[str], set),
...     (Person, "name"): (str, lambda s: s + "!")}
>>> from_dict(Person, {"name": "robert", "aliases": ["bob", "bobby"]}, transform=transform)
Person(name='Robert!', aliases={'Bobby', 'Bob'})
>>> print(dumps(to_json_schema(Person, transform=transform), indent=2))
print output...
{
"$schema": "https://json-schema.org/draft/2020-12/schema",
"$ref": "#/$defs/Person",
"$defs": {
"Person": {
"type": "object",
"properties": {
"name": {
"type": "string"
},
"aliases": {
"type": "array",
"items": {
"type": "string"
}
}
},
"required": [
"name",
"aliases"
]
}
}
}
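Since transformations only cover the input direction, writing a dataclass with a non-serialisable field (such as a set) back to JSON needs separate handling. One possible approach, sketched here with the standard library only, is to pass a default hook to json.dumps; the encode_extra function is a hypothetical helper and is not provided by dataglasses.

```python
import json
from dataclasses import asdict, dataclass
from typing import Any


@dataclass
class Person:
    name: str
    aliases: set[str]


def encode_extra(obj: Any) -> Any:
    """Fallback encoder for values the json module cannot serialise."""
    if isinstance(obj, (set, frozenset)):
        # Sort so that the output is deterministic.
        return sorted(obj)
    raise TypeError(f"Cannot serialise {type(obj).__name__}")


person = Person(name="Robert", aliases={"Bob", "Bobby"})
encoded = json.dumps(asdict(person), default=encode_extra)
print(encoded)  # {"name": "Robert", "aliases": ["Bob", "Bobby"]}
```

The resulting list can then be read back into a set using a transformation like the one shown above.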
Contributions
Bug reports, feature requests and contributions are very welcome. Note that PRs must include tests with 100% code coverage and pass the quality checks defined here. More development details will be added shortly, once the project has stabilised...