Skip to main content

Simple creation of dataclasses from JSON

Project description

dataglasses

PyPi Python License Actions status Codecov Ruff

A small package to simplify creating dataclasses from JSON and validating that JSON.

Installation

$ pip install dataglasses

Requirements

Requires Python 3.10 or later.

If you wish to validate arbitrary JSON data against the generated JSON schemas in Python, consider installing jsonschema, though this is unnecessary when using dataglasses to convert JSON into dataclasses.

Quick start

>>> from dataclasses import dataclass
>>> from dataglasses import from_dict, to_json_schema
>>> from json import dumps

>>> @dataclass
... class InventoryItem:
...     name: str
...     unit_price: float
...     quantity_on_hand: int = 0

>>> from_dict(InventoryItem, { "name": "widget", "unit_price": 3.0})
InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=0)

>>> print(dumps(to_json_schema(InventoryItem), indent=2))
print output...
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "#/$defs/InventoryItem",
  "$defs": {
    "InventoryItem": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "unit_price": {
          "type": "number"
        },
        "quantity_on_hand": {
          "type": "integer",
          "default": 0
        }
      },
      "required": [
        "name",
        "unit_price"
      ]
    }
  }
}

Objective

The purpose of this library is to speed up rapid development by making it trivial to populate dataclasses with dictionary data extracted from JSON (or elsewhere), as well as to perform basic validation on that data. The library contains just one file and two functions, so can even be directly copied into a project.

It is not intended for complex validation or high performance. For those, consider using pydantic.

Usage

The package contains just two functions:

def from_dict(
    cls: type[T],
    value: Any,
    *,
    strict: bool = False,
    transform: Optional[TransformRules] = None,
    local_refs: Optional[set[type]] = None,
) -> T

This converts a nested dictionary value of input data into the given dataclass type cls, raising an exception if the conversion is not possible. (The optional keyword arguments are described further down.)

def to_json_schema(
    cls: type,
    *,
    strict: bool = False,
    transform: Optional[TransformRules] = None,
    local_refs: Optional[set[type]] = None,
) -> dict[str, Any]:

This generates a 2020-12 JSON schema representing valid inputs for the dataclass type cls, raising an exception if the class cannot be represented in JSON. (Again, the optional keyword arguments are described further down.)

Below is a summary of the different supported use cases:

Nested structures

Dataclasses can be nested, using either global or local definitions.

>>> @dataclass
... class TrackedItem:
... 
...     @dataclass
...     class GPS:
...         lat: float
...         long: float
...         
...     item: InventoryItem
...     location: GPS

>>> from_dict(TrackedItem, {
...     "item": { "name": "pie", "unit_price": 42},
...     "location": { "lat": 52.2, "long": 0.1 } })
TrackedItem(item=InventoryItem(name='pie', unit_price=42, quantity_on_hand=0),
location=TrackedItem.GPS(lat=52.2, long=0.1))

>>> print(dumps(to_json_schema(TrackedItem), indent=2))
print output...
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "#/$defs/TrackedItem",
  "$defs": {
    "TrackedItem": {
      "type": "object",
      "properties": {
        "item": {
          "$ref": "#/$defs/InventoryItem"
        },
        "location": {
          "$ref": "#/$defs/TrackedItem.GPS"
        }
      },
      "required": [
        "item",
        "location"
      ]
    },
    "InventoryItem": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "unit_price": {
          "type": "number"
        },
        "quantity_on_hand": {
          "type": "integer",
          "default": 0
        }
      },
      "required": [
        "name",
        "unit_price"
      ]
    },
    "TrackedItem.GPS": {
      "type": "object",
      "properties": {
        "lat": {
          "type": "number"
        },
        "long": {
          "type": "number"
        }
      },
      "required": [
        "lat",
        "long"
      ]
    }
  }
}

Collection types

There is automatic support for the generic collection types most compatible with JSON: list[T], tuple[...] and Sequence[T] (encoded as arrays) and dict[str, T] and Mapping[str, T] (encoded as objects).

>>> from collections.abc import Mapping, Sequence

>>> @dataclass
... class Catalog:
...     items: Sequence[InventoryItem]
...     publisher: tuple[str, int]
...     purchases: Mapping[str, int]
    
>>> from_dict(Catalog, {
...     "items": [{ "name": "widget", "unit_price": 3.0}],
...     "publisher": ["ACME", 1982],
...     "purchases": { "Wile E. Coyote": 52}})
Catalog(items=[InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=0)],
publisher=('ACME', 1982), purchases={'Wile E. Coyote': 52})    

>>> print(dumps(to_json_schema(Catalog), indent=2))
print output...
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "#/$defs/Catalog",
  "$defs": {
    "Catalog": {
      "type": "object",
      "properties": {
        "items": {
          "type": "array",
          "items": {
            "$ref": "#/$defs/InventoryItem"
          }
        },
        "publisher": {
          "type": "array",
          "prefixItems": [
            {
              "type": "string"
            },
            {
              "type": "integer"
            }
          ],
          "minItems": 2,
          "maxItems": 2
        },
        "purchases": {
          "type": "object",
          "patternProperties": {
            "^.*$": {
              "type": "integer"
            }
          }
        }
      },
      "required": [
        "items",
        "publisher",
        "purchases"
      ]
    },
    "InventoryItem": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "unit_price": {
          "type": "number"
        },
        "quantity_on_hand": {
          "type": "integer",
          "default": 0
        }
      },
      "required": [
        "name",
        "unit_price"
      ]
    }
  }
}

Unrestricted types like list or dict (or set or Any) and mappings with non-str keys can be used with from_dict but not with to_json_schema. Alternatively, these, alongside unsupported generic types like set[T], can be used with both from_dict and to_json_schema by defining an appropriate encoding transformation (see section below).

Optional and Union types

Union types (S | T or Union[S, T, ...]) are matched against all their permitted subtypes in order, returning the first successful match, or raising an exception if there are none. Optional types (T | None or Optional[T]) are handled similarly. Note that an optional type is not the same as an optional field (i.e. one with a default): a field with an optional type is still a required field unless it has a default value (which could be None but could also be something else).

>>> from typing import Optional

>>> @dataclass
... class ItemPurchase:
...     items: Sequence[InventoryItem | TrackedItem]
...     invoice: Optional[int] = None
    
>>> from_dict(ItemPurchase, {
...     "items": [{
...         "item": { "name": "pie", "unit_price": 42},
...         "location": { "lat": 52.2, "long": 0.1 } }],
...     "invoice": 1234})
ItemPurchase(items=[TrackedItem(item=
InventoryItem(name='pie', unit_price=42, quantity_on_hand=0),
location=TrackedItem.GPS(lat=52.2, long=0.1))], invoice=1234)

>>> print(dumps(to_json_schema(ItemPurchase), indent=2))
print output...
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "#/$defs/ItemPurchase",
  "$defs": {
    "ItemPurchase": {
      "type": "object",
      "properties": {
        "items": {
          "type": "array",
          "items": {
            "anyOf": [
              {
                "$ref": "#/$defs/InventoryItem"
              },
              {
                "$ref": "#/$defs/TrackedItem"
              }
            ]
          }
        },
        "invoice": {
          "anyOf": [
            {
              "type": "integer"
            },
            {
              "type": "null"
            }
          ],
          "default": null
        }
      },
      "required": [
        "items"
      ]
    },
    "InventoryItem": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "unit_price": {
          "type": "number"
        },
        "quantity_on_hand": {
          "type": "integer",
          "default": 0
        }
      },
      "required": [
        "name",
        "unit_price"
      ]
    },
    "TrackedItem": {
      "type": "object",
      "properties": {
        "item": {
          "$ref": "#/$defs/InventoryItem"
        },
        "location": {
          "$ref": "#/$defs/TrackedItem.GPS"
        }
      },
      "required": [
        "item",
        "location"
      ]
    },
    "TrackedItem.GPS": {
      "type": "object",
      "properties": {
        "lat": {
          "type": "number"
        },
        "long": {
          "type": "number"
        }
      },
      "required": [
        "lat",
        "long"
      ]
    }
  }
}

Enum and Literal types

Both Enum and Literal types can be used to match explicit enumerations. By default, Enum types match both the values and symbolic names (preferring the former in case of a clash). This behaviour can be overridden using a transformation if desired (see section below).

>>> from enum import auto, StrEnum
>>> from typing import Literal

>>> class BuildType(StrEnum):
...     DEBUG = auto()
...     OPTIMIZED = auto()
    
>>> @dataclass
... class Release:
...     build: BuildType
...     approved: Literal["Yes", "No"]
    
>>> from_dict(Release, {"build": "debug", "confirmed": "Yes"})
Release(build=<Build.DEBUG: 'debug'>, approved='Yes')

>>> print(dumps(to_json_schema(Release), indent=2))
print output...
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "#/$defs/Release",
  "$defs": {
    "Release": {
      "type": "object",
      "properties": {
        "build": {
          "enum": [
            "debug",
            "optimized",
            "DEBUG",
            "OPTIMIZED"
          ]
        },
        "approved": {
          "enum": [
            "Yes",
            "No"
          ]
        }
      },
      "required": [
        "build",
        "confirmed"
      ]
    }
  }
}

Annotated types

Annotated types can be used to populate the property "description" annotations in the JSON schema.

>>> from typing import Annotated

>>> @dataclass
... class InventoryItem:
...     name: Annotated[str, "item name"]
...     unit_price: Annotated[float, "unit price"]
...     quantity_on_hand: Annotated[int, "quantity on hand"] = 0

>>> from_dict(InventoryItem, { "name": "widget", "unit_price": 3.0})
InventoryItem(name='widget', unit_price=3.0, quantity_on_hand=0)

>>> print(dumps(to_json_schema(InventoryItem), indent=2))
print output...
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "#/$defs/InventoryItem",
  "$defs": {
    "InventoryItem": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "item name"
        },
        "unit_price": {
          "type": "number",
          "description": "unit price"
        },
        "quantity_on_hand": {
          "type": "integer",
          "description": "quantity on hand",
          "default": 0
        }
      },
      "required": [
        "name",
        "unit_price"
      ]
    }
  }
}

Forward references

Forward reference types (written as string literals or ForwardRef objects) are supported, permitting recursive dataclasses. Global and class-scoped references are handled automatically:

>>> @dataclass
... class Cons:
...     head: "Head"
...     tail: Optional["Cons"] = None
...     
...     @dataclass
...     class Head:
...         v: int
...         
...     def __repr__(self):
...         return f"{self.head.v}::{self.tail}"

>>> from_dict(Cons, {"head": {"v": 1}, "tail": {"head": {"v": 2}}})
1::2::None

>> print(dumps(to_json_schema(Cons), indent=2))
print output...
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "#/$defs/Cons",
  "$defs": {
    "Cons": {
      "type": "object",
      "properties": {
        "head": {
          "$ref": "#/$defs/Cons.Head"
        },
        "tail": {
          "anyOf": [
            {
              "$ref": "#/$defs/Cons"
            },
            {
              "type": "null"
            }
          ],
          "default": null
        }
      },
      "required": [
        "head"
      ]
    },
    "Cons.Head": {
      "type": "object",
      "properties": {
        "v": {
          "type": "integer"
        }
      },
      "required": [
        "v"
      ]
    }
  }
}

Locally-scoped references, however, must be specified using the local_refs keyword:

>>> def reverse_cons(seq):
... 
...     @dataclass
...     class Cons:
...         head: int
...         tail: Optional["Cons"] = None
... 
...         def __repr__(self):
...             return f"{self.head}::{self.tail}"
... 
...     value = None
...     for x in seq: value = { "head": x, "tail": value }
...     return from_dict(Cons, value, local_refs={Cons})

>>> reverse_cons([1,2,3])
3::2::1::None

Strict mode

Both from_dict and to_json_schema default to ignoring additional properties that are not part of a dataclass (similar to additionalProperties defaulting to true in JSON schemas). This can be disabled with the strict keyword.

>>> value = { "name": "widget", "unit_price": 4.0, "comment": "too expensive"}

>>> from_dict(InventoryItem, value)
InventoryItem(name='widget', unit_price=4.0, quantity_on_hand=0)
>>> from_dict(InventoryItem, value, strict=True)
TypeError: Unexpected <class '__main__.InventoryItem'> fields {'comment'}

>>> print(dumps(to_json_schema(InventoryItem, strict=True), indent=2))
print output...
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "#/$defs/InventoryItem",
  "$defs": {
    "InventoryItem": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string",
          "description": "item name"
        },
        "unit_price": {
          "type": "number",
          "description": "unit price"
        },
        "quantity_on_hand": {
          "type": "integer",
          "description": "quantity on hand",
          "default": 0
        }
      },
      "required": [
        "name",
        "unit_price"
      ],
      "additionalProperties": false
    }
  }
}

Transformations

Transformations allow you to override the handling of specific types or dataclass fields, and can be used to normalise inputs or convert them into different types, including ones that aren't normally supported. Transformations are specified with the transform keyword, using a mapping:

  • the mapping keys are either:
    • a type used somewhere in the output dataclass: e.g. str or set[int]
    • a dataclass field specified by a class-name tuple: e.g. (InventoryItem, "name") or (Cons, "head")
  • the mapping values are a tuple consisting of:
    • the JSON-serialisable input type that we want to represent this output type or field
    • a callable function to convert from that input type to the output type

Note that the input type can be the same as the output type. Conversely, note that transformations don't help with serialising the dataclasses back into JSON from non-serialisable types.

>>> @dataclass
... class Person:
...     name : str
...     aliases: set[str]

>>> transform = {
...     str: (str, str.title),
...     set[str]: (list[str], set),
...     (Person, "name"): (str, lambda s: s + "!")}
    
>>> from_dict(Person, {"name": "robert", "aliases": ["bob", "bobby"]}, transform=transform)
Person(name='Robert!', aliases={'Bobby', 'Bob'})
    
>>> print(dumps(to_json_schema(Person, transform=transform), indent=2))
print output...
{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "$ref": "#/$defs/Person",
  "$defs": {
    "Person": {
      "type": "object",
      "properties": {
        "name": {
          "type": "string"
        },
        "aliases": {
          "type": "array",
          "items": {
            "type": "string"
          }
        }
      },
      "required": [
        "name",
        "aliases"
      ]
    }
  }
}

Contributions

Bug reports, feature requests and contributions are very welcome. Note that PRs must include tests with 100% code coverage and pass the necessary quality checks before they can be merged.

To run the tests, make sure you have uv installed, then type:

$ uv run task tests

To perform the formatting and linting checks, type:

$ uv run task check

To automatically resolve automatically fixable formatting and linting issues, type:

$ uv run task format

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dataglasses-0.8.0.tar.gz (33.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

dataglasses-0.8.0-py3-none-any.whl (10.1 kB view details)

Uploaded Python 3

File details

Details for the file dataglasses-0.8.0.tar.gz.

File metadata

  • Download URL: dataglasses-0.8.0.tar.gz
  • Upload date:
  • Size: 33.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dataglasses-0.8.0.tar.gz
Algorithm Hash digest
SHA256 eca0b94a3e2444a2ec786a6e442de9472f27ac8654be9cd58b0b0a6a5610a946
MD5 85c0d2514741058781dd83525a0b516d
BLAKE2b-256 73a88c7a367d80fb4c0569bdb399e0e5863a666fb678cd20c356b3a123c0c880

See more details on using hashes here.

Provenance

The following attestation bundles were made for dataglasses-0.8.0.tar.gz:

Publisher: deploy.yaml on Udzu/dataglasses

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dataglasses-0.8.0-py3-none-any.whl.

File metadata

  • Download URL: dataglasses-0.8.0-py3-none-any.whl
  • Upload date:
  • Size: 10.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dataglasses-0.8.0-py3-none-any.whl
Algorithm Hash digest
SHA256 019a4dd11d2fedc805971d4a4d7c45fb8d1888676c558717b819bb1fee27937c
MD5 235e828457b497572367e5dd87a7c869
BLAKE2b-256 79a7a59c29eea43d49c0da1ca468684ffd4a081c5e5d08df40e842852696910f

See more details on using hashes here.

Provenance

The following attestation bundles were made for dataglasses-0.8.0-py3-none-any.whl:

Publisher: deploy.yaml on Udzu/dataglasses

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page