Deep dataclasses with nested structures and validation

deep_dataclasses

Define nested dataclass hierarchies as clean, readable schemas — no boilerplate, no dependencies.


The Problem

Python's @dataclass requires you to define each level of a nested hierarchy separately, then wire them together manually. Each inner class name must appear three times: once in the class definition, once as the field type hint, and once in field(default_factory=...):

from dataclasses import dataclass, asdict, field

@dataclass
class NestedParent:
    @dataclass
    class Child:
        @dataclass
        class GrandChild:
            grandchild_str: str = "grandchild1"
            grandchild_num: int = 1

        grandchild: GrandChild = field(default_factory=GrandChild)
        child_str: str = "child"

    child: Child = field(default_factory=Child)
    parent_str: str = "parent"

And even after all that boilerplate, the asdict round-trip is broken:

NestedParent(**asdict(NestedParent())) == NestedParent()  # False

This is because asdict serialises nested instances to plain dicts, but @dataclass does not coerce them back on construction — unlike flat dataclasses where this works naturally.
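
The difference between the flat and nested cases can be reproduced with the standard library alone. The sketch below uses small stand-in classes (`Flat`, `Inner`, `Outer` are illustrative names, not part of this package) to show that the nested field comes back as a plain dict:

```python
from dataclasses import dataclass, asdict, field


@dataclass
class Flat:
    a: int = 1
    b: str = "x"


@dataclass
class Inner:
    value: int = 1


@dataclass
class Outer:
    inner: Inner = field(default_factory=Inner)
    name: str = "outer"


# Flat dataclasses survive the round-trip because every value is a scalar.
flat_ok = Flat(**asdict(Flat())) == Flat()

# Nested dataclasses do not: the constructor stores the dict it was given.
rebuilt = Outer(**asdict(Outer()))
nested_ok = rebuilt == Outer()
inner_type = type(rebuilt.inner).__name__

print(flat_ok, nested_ok, inner_type)  # True False dict
```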


The Solution

@deep_dataclass lets you express the same hierarchy as a natural nested schema. The decorator infers the class name, type hint, and default_factory from the nested block — no repetition:

from deep_dataclasses import deep_dataclass

@deep_dataclass(autosnake=True)
class DeepParent:
    class Child:
        class Grandchild:
            grandchild_str: str = "grandchild1"
            grandchild_num: int = 1
        child_str: str = "child"
    parent_str: str = "parent"

print(DeepParent().child.grandchild)
# Grandchild(grandchild_str='grandchild1', grandchild_num=1)

The autosnake=True option converts PascalCase inner class names to snake_case field names. Without it the field name matches the class name exactly.
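
As a rough illustration of the PascalCase to snake_case conversion, here is a minimal sketch using `re`. The helper name `pascal_to_snake` and the regular expression are assumptions made for this example; the rule the package actually applies (for instance to acronyms) may differ.

```python
import re


def pascal_to_snake(name: str) -> str:
    """Hypothetical helper: insert '_' before each capital except the first, then lowercase."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


print(pascal_to_snake("Child"))       # child
print(pascal_to_snake("Grandchild"))  # grandchild
print(pascal_to_snake("GrandChild"))  # grand_child
```

Under this assumed rule, the spelling of the inner class decides the field name: `Grandchild` yields `grandchild`, whereas `GrandChild` would yield `grand_child`.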


Fully Compatible with dataclasses

@deep_dataclass produces standard dataclass instances — all stdlib tools work as expected, and the asdict round-trip is fixed:

d1 = NestedParent()  # vanilla dataclass hierarchy
d2 = DeepParent()    # deep_dataclass equivalent

asdict(d1) == asdict(d2)        # True — identical structure
DeepParent(**asdict(d2)) == d2  # True — coercion works
NestedParent(**asdict(d1)) == d1  # False — vanilla @dataclass doesn't coerce nested dicts
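
For comparison, recursive coercion can be written by hand on top of the standard library. The following is only a sketch of the general technique, not the package's implementation: `coerce`, `Leaf`, `Branch`, and `Root` are hypothetical names, and the helper assumes that type hints are real classes rather than strings.

```python
from dataclasses import dataclass, asdict, field, fields, is_dataclass


def coerce(cls, data):
    """Hypothetical helper: build cls from a dict, recursing into dataclass-typed fields."""
    kwargs = {}
    for f in fields(cls):
        if f.name not in data:
            continue  # missing keys fall back to the dataclass default
        value = data[f.name]
        # Only recurse when the annotation is a dataclass and the value is a dict.
        if isinstance(f.type, type) and is_dataclass(f.type) and isinstance(value, dict):
            value = coerce(f.type, value)
        kwargs[f.name] = value
    return cls(**kwargs)


@dataclass
class Leaf:
    num: int = 1


@dataclass
class Branch:
    leaf: Leaf = field(default_factory=Leaf)
    label: str = "branch"


@dataclass
class Root:
    branch: Branch = field(default_factory=Branch)
    name: str = "root"


restored = coerce(Root, asdict(Root()))
partial = coerce(Root, {"branch": {"leaf": {"num": 2}}})

print(restored == Root())       # True
print(partial.branch.leaf.num)  # 2
```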

Validation and Config Loading

to_json_schema exports any @deep_dataclass schema for use with third-party validators. Because @deep_dataclass coerces nested dicts at construction time, the validate-then-construct pattern works at all depths:

import json

import jsonschema
from deep_dataclasses import to_json_schema

raw = json.loads('{"child": {"grandchild": {"grandchild_num": 2}}}')
jsonschema.validate(raw, to_json_schema(DeepParent))  # validate first
cfg = DeepParent(**raw)                               # then construct — fully typed
assert isinstance(cfg.child, DeepParent.child)        # True

Validation catches type violations at any nesting depth:

data = asdict(DeepParent())
data['child']['child_str'] = 3                         # inject a type error
jsonschema.validate(data, to_json_schema(DeepParent))  # raises ValidationError

The resulting error message includes the schema path and the offending value:

Failed validating 'type' in schema['properties']['child']['properties']['child_str']:
    {'type': 'string', 'default': 'child'}

On instance['child']['child_str']:
    3
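
The error output above suggests that each field is exported with a JSON Schema `type` and its `default`, nested under `properties`. A minimal stdlib-only sketch of such a mapping is shown below. `sketch_schema` and the type table are assumptions for illustration; the schema that `to_json_schema` really emits may contain additional keys.

```python
from dataclasses import MISSING, dataclass, field, fields, is_dataclass

# Assumed mapping from Python scalar types to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_schema(cls):
    """Hypothetical helper: describe a dataclass as a nested JSON Schema object."""
    props = {}
    for f in fields(cls):
        if isinstance(f.type, type) and is_dataclass(f.type):
            props[f.name] = sketch_schema(f.type)  # recurse into nested dataclasses
            continue
        entry = {"type": _JSON_TYPES[f.type]}  # KeyError for unsupported hints
        if f.default is not MISSING:
            entry["default"] = f.default
        props[f.name] = entry
    return {"type": "object", "properties": props}


@dataclass
class Child:
    child_str: str = "child"


@dataclass
class Parent:
    child: Child = field(default_factory=Child)
    parent_str: str = "parent"


schema = sketch_schema(Parent)
print(schema["properties"]["child"]["properties"]["child_str"])
# {'type': 'string', 'default': 'child'}
```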

Data Modelling with Type Hints

@deep_dataclass works with the full range of typing annotations. The @auxiliary decorator marks an inner class as a type-only helper — it won't become a standalone field, but can be referenced in Union[...], List[...], Optional[...], etc.

from dataclasses import field, asdict
from typing import Literal, List, Union
from deep_dataclasses import deep_dataclass, auxiliary, to_json_schema
import jsonschema

@deep_dataclass
class Config:
    @auxiliary
    class TrainMode:
        lr: float = 0.001
        pseudo_batch_size: int = 32
    @auxiliary
    class TestMode:
        metric: str = "accuracy"
        folds: int = 5
    mode: Union[TrainMode, TestMode] = field(default_factory=TrainMode)
    device: Literal["cpu", "cuda"] = "cpu"
    images: List[str] = field(default_factory=list)

When constructing from a dict, @deep_dataclass selects the Union variant whose field names best cover the keys supplied — an exact match is always preferred over a partial one:

cfg_train = Config(mode={"lr": 0.05})           # 'lr' is a TrainMode field
cfg_test  = Config(mode={"metric": "f1"})        # 'metric' is a TestMode field

assert isinstance(cfg_train.mode, Config.TrainMode)
assert isinstance(cfg_test.mode,  Config.TestMode)
assert cfg_train.mode.pseudo_batch_size == 32    # unspecified fields get defaults
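
The selection rule described above can be sketched with the standard library. The scoring here is an assumption: a variant that covers every supplied key ranks first, then the number of covered keys breaks ties. `pick_variant` is a hypothetical helper, and `EvalMode` stands in for `TestMode`.

```python
from dataclasses import dataclass, fields


@dataclass
class TrainMode:
    lr: float = 0.001
    pseudo_batch_size: int = 32


@dataclass
class EvalMode:  # stand-in for TestMode in the example above
    metric: str = "accuracy"
    folds: int = 5


def pick_variant(variants, data):
    """Hypothetical helper: choose the variant whose fields best cover the dict keys."""
    keys = set(data)

    def score(cls):
        names = {f.name for f in fields(cls)}
        covers_all = keys <= names  # every supplied key is a field of this variant
        return (covers_all, len(keys & names))

    return max(variants, key=score)


variants = [TrainMode, EvalMode]
chosen_train = pick_variant(variants, {"lr": 0.05})
chosen_eval = pick_variant(variants, {"metric": "f1"})

mode = chosen_train(lr=0.05)
print(chosen_train.__name__, chosen_eval.__name__, mode.pseudo_batch_size)
# TrainMode EvalMode 32
```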

Schema validation enforces Literal values, Union structure, and List element types:

jsonschema.validate(asdict(Config()), to_json_schema(Config))  # passes

bad = asdict(Config())
bad["device"] = "tpu"                            # not in Literal["cpu", "cuda"]
jsonschema.validate(bad, to_json_schema(Config)) # raises ValidationError
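
One common way to express a `Literal` hint in JSON Schema is an `enum` of the permitted values. The sketch below shows that translation with `typing.get_origin` and `typing.get_args`; `literal_schema` is a hypothetical helper, and whether `to_json_schema` uses exactly this representation is an assumption.

```python
from typing import Literal, get_args, get_origin


def literal_schema(hint):
    """Hypothetical helper: turn a Literal[...] hint into a JSON Schema enum."""
    if get_origin(hint) is not Literal:
        raise TypeError(f"not a Literal hint: {hint!r}")
    return {"enum": list(get_args(hint))}


device_hint = Literal["cpu", "cuda"]
device_schema = literal_schema(device_hint)

print(device_schema)                    # {'enum': ['cpu', 'cuda']}
print("tpu" in device_schema["enum"])   # False
```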

Installation

pip install deep-dataclasses

deep_dataclasses has zero mandatory dependencies — it uses only re, dataclasses, and typing from the standard library. jsonschema (used in the examples above) is an optional third-party package needed only if you want schema validation.


Comparison

| Feature | @dataclass | @deep_dataclass |
| --- | --- | --- |
| Nested hierarchy | Manual, verbose | Inline, readable |
| field(default_factory=...) | Required per field | Automatic |
| Nested dict → instance coercion | ❌ | ✅ (recursive, all depths) |
| Union variant selection from dict | ❌ | ✅ (best-match by field coverage) |
| asdict() / == / __repr__ | ✅ | ✅ |
| frozen, slots, etc. | ✅ | ✅ (tested) |
| Type validation | ❌ | Exports to jsonschema |
| Mandatory dependencies | stdlib | stdlib only |

Status

Early release. Core functionality is complete with 100% test coverage. API may evolve — feedback welcome on discuss.python.org.

Contributing

Issues and PRs welcome. See the issue tracker for known TODOs.
