Deep dataclasses with nested structures and validation

deep_dataclasses

Define nested dataclass hierarchies as clean, readable schemas — no boilerplate, no dependencies.


The Problem

Python's @dataclass requires you to define each level of a nested hierarchy separately, then wire them together manually. Each inner class name must appear three times: once in the class definition, once as the field type hint, and once in field(default_factory=...):

from dataclasses import dataclass, asdict, field

@dataclass
class NestedParent:
    @dataclass
    class Child:
        @dataclass
        class GrandChild:
            grandchild_str: str = "grandchild1"
            grandchild_num: int = 1

        grandchild: GrandChild = field(default_factory=GrandChild)
        child_str: str = "child"

    child: Child = field(default_factory=Child)
    parent_str: str = "parent"

And even after all that boilerplate, the asdict round-trip is broken:

NestedParent(**asdict(NestedParent())) == NestedParent()  # False

This is because asdict serialises nested instances to plain dicts, but @dataclass does not coerce them back on construction — unlike flat dataclasses where this works naturally.
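
The failure can be reproduced with the standard library alone. The two-level hierarchy below is a reduced stand-in for NestedParent (the class names Parent and Child are illustrative), and it shows that the rebuilt object holds a plain dict where the original holds a Child instance:

```python
from dataclasses import dataclass, field, asdict


@dataclass
class Child:
    child_str: str = "child"


@dataclass
class Parent:
    child: Child = field(default_factory=Child)
    parent_str: str = "parent"


original = Parent()
payload = asdict(original)   # the nested Child instance becomes a plain dict
rebuilt = Parent(**payload)  # the dict is stored as-is, with no coercion

print(type(original.child).__name__)  # Child
print(type(rebuilt.child).__name__)   # dict
print(rebuilt == original)            # False
```

The comparison fails because the generated __eq__ compares field values, and a dict never equals a Child instance.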


The Solution

@deep_dataclass lets you express the same hierarchy as a natural nested schema. The decorator infers the class name, type hint, and default_factory from the nested block — no repetition:

from deep_dataclasses import deep_dataclass

@deep_dataclass(autosnake=True)
class DeepParent:
    class Child:
        class Grandchild:
            grandchild_str: str = "grandchild1"
            grandchild_num: int = 1
        child_str: str = "child"
    parent_str: str = "parent"

print(DeepParent().child.grandchild)
# Grandchild(grandchild_str='grandchild1', grandchild_num=1)

The autosnake=True option converts PascalCase inner class names to snake_case field names. Without it the field name matches the class name exactly.
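
As an illustration of the naming rule, a PascalCase to snake_case conversion can be written with a single regular expression. The helper name to_snake_case and the exact pattern are assumptions made for this sketch; the library's own rule may treat acronyms or digits differently.

```python
import re


def to_snake_case(name: str) -> str:
    """Hypothetical helper: 'TrainMode' -> 'train_mode'."""
    # Insert "_" before every capital letter that is not at the start,
    # then lowercase the whole string.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


print(to_snake_case("Grandchild"))  # grandchild
print(to_snake_case("GrandChild"))  # grand_child
print(to_snake_case("TrainMode"))   # train_mode
```

Note that this simple pattern splits runs of capitals letter by letter, so an acronym-style name would need extra handling.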


Fully Compatible with dataclasses

@deep_dataclass produces standard dataclass instances — all stdlib tools work as expected, and the asdict round-trip is fixed:

d1 = NestedParent()  # vanilla dataclass hierarchy
d2 = DeepParent()    # deep_dataclass equivalent

asdict(d1) == asdict(d2)        # True — identical structure
DeepParent(**asdict(d2)) == d2  # True — coercion works
NestedParent(**asdict(d1)) == d1  # False — vanilla @dataclass doesn't coerce nested dicts
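
For readers curious how such coercion can be done with plain dataclasses, the following sketch uses __post_init__ to rebuild nested instances from dicts. CoerceNested is a hypothetical mixin, not part of deep_dataclasses, and it assumes non-frozen classes whose type hints are real class objects rather than strings.

```python
from dataclasses import asdict, dataclass, field, fields, is_dataclass


class CoerceNested:
    """Hypothetical mixin: rebuild nested dataclass fields from plain dicts."""

    def __post_init__(self):
        for f in fields(self):
            value = getattr(self, f.name)
            # Only touch fields typed as a dataclass that received a dict.
            if is_dataclass(f.type) and isinstance(value, dict):
                setattr(self, f.name, f.type(**value))  # recurses via the inner __post_init__


@dataclass
class Parent(CoerceNested):
    @dataclass
    class Child(CoerceNested):
        @dataclass
        class Grandchild(CoerceNested):
            grandchild_num: int = 1

        grandchild: Grandchild = field(default_factory=Grandchild)
        child_str: str = "child"

    child: Child = field(default_factory=Child)
    parent_str: str = "parent"


p = Parent()
print(Parent(**asdict(p)) == p)  # True
```

This is exactly the per-class boilerplate the decorator is meant to remove; the sketch only shows why the round-trip succeeds once dicts are coerced at every depth.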

Validation and Config Loading: A poor man's pydantic

to_json_schema exports any @deep_dataclass schema for use with third-party validators. Because @deep_dataclass coerces nested dicts at construction time, the validate-then-construct pattern works at all depths:

from deep_dataclasses import to_json_schema
import jsonschema, json

raw = json.loads('{"child": {"grandchild": {"grandchild_num": 2}}}')
jsonschema.validate(raw, to_json_schema(DeepParent))  # validate first
cfg = DeepParent(**raw)                               # then construct — fully typed
assert isinstance(cfg.child, DeepParent.child)        # True

Validation catches type violations at any nesting depth:

data = asdict(DeepParent())
data['child']['child_str'] = 3                         # inject a type error
jsonschema.validate(data, to_json_schema(DeepParent))  # raises ValidationError

The resulting error report pinpoints the offending field:

Failed validating 'type' in schema['properties']['child']['properties']['child_str']:
    {'type': 'string', 'default': 'child'}

On instance['child']['child_str']:
    3
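
The schema path in that message suggests nested objects with per-field type and default entries. The sketch below builds a schema of that shape from ordinary dataclasses; sketch_json_schema is a hypothetical function that handles only four scalar types, so it illustrates the idea rather than reproducing to_json_schema.

```python
from dataclasses import MISSING, dataclass, field, fields, is_dataclass

# Python scalar types mapped to JSON Schema type names.
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def sketch_json_schema(cls):
    """Hypothetical: map a dataclass to a nested JSON Schema dict."""
    properties = {}
    for f in fields(cls):
        if is_dataclass(f.type):
            properties[f.name] = sketch_json_schema(f.type)  # nested object
        else:
            prop = {"type": JSON_TYPES[f.type]}  # KeyError for unsupported types
            if f.default is not MISSING:
                prop["default"] = f.default
            properties[f.name] = prop
    return {"type": "object", "properties": properties}


@dataclass
class Child:
    child_str: str = "child"


@dataclass
class Parent:
    child: Child = field(default_factory=Child)
    parent_str: str = "parent"


schema = sketch_json_schema(Parent)
print(schema["properties"]["child"]["properties"]["child_str"])
# {'type': 'string', 'default': 'child'}
```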

Data Modelling with Type Hints

@deep_dataclass works with the full range of typing annotations. The @auxiliary decorator marks an inner class as a type-only helper — it won't become a standalone field, but can be referenced in Union[...], List[...], Optional[...], etc.

from dataclasses import field, asdict
from typing import Literal, List, Union
from deep_dataclasses import deep_dataclass, auxiliary, to_json_schema
import jsonschema

@deep_dataclass
class Config:
    @auxiliary
    class TrainMode:
        lr: float = 0.001
        pseudo_batch_size: int = 32
    @auxiliary
    class TestMode:
        metric: str = "accuracy"
        folds: int = 5
    mode: Union[TrainMode, TestMode] = field(default_factory=TrainMode)
    device: Literal["cpu", "cuda"] = "cpu"
    images: List[str] = field(default_factory=list)

When constructing from a dict, @deep_dataclass selects the Union variant whose field names best cover the keys supplied — an exact match is always preferred over a partial one:

cfg_train = Config(mode={"lr": 0.05})           # 'lr' is a TrainMode field
cfg_test  = Config(mode={"metric": "f1"})        # 'metric' is a TestMode field

assert isinstance(cfg_train.mode, Config.TrainMode)
assert isinstance(cfg_test.mode,  Config.TestMode)
assert cfg_train.mode.pseudo_batch_size == 32    # unspecified fields get defaults
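
One way to read "best cover" is as a ranking over the candidate classes. The sketch below is an assumption about how such a rule could look, not the library's algorithm: pick_variant is a hypothetical helper that prefers a variant whose fields equal the supplied keys, then one that contains all of them, then the one with the largest overlap.

```python
from dataclasses import dataclass, fields


def pick_variant(variants, data):
    """Hypothetical: choose the dataclass whose fields best match data's keys."""
    keys = set(data)

    def score(variant):
        names = {f.name for f in fields(variant)}
        # Tuples compare left to right: exact match, full coverage, overlap size.
        return (keys == names, keys <= names, len(keys & names))

    return max(variants, key=score)  # ties go to the earliest variant


@dataclass
class TrainMode:
    lr: float = 0.001
    pseudo_batch_size: int = 32


@dataclass
class TestMode:
    metric: str = "accuracy"
    folds: int = 5


data = {"metric": "f1"}
mode = pick_variant([TrainMode, TestMode], data)(**data)
print(type(mode).__name__)  # TestMode
```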

Schema validation enforces Literal values, Union structure, and List element types:

jsonschema.validate(asdict(Config()), to_json_schema(Config))  # passes

bad = asdict(Config())
bad["device"] = "tpu"                            # not in Literal["cpu", "cuda"]
jsonschema.validate(bad, to_json_schema(Config)) # raises ValidationError
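
To show how typing annotations can map onto JSON Schema keywords, here is a small sketch built on typing.get_origin and typing.get_args. The function name hint_to_schema and the choice of enum, array/items, and anyOf are assumptions for illustration; the schema actually emitted by to_json_schema may be structured differently.

```python
from typing import List, Literal, Union, get_args, get_origin

SCALARS = {str: "string", int: "integer", float: "number",
           bool: "boolean", type(None): "null"}


def hint_to_schema(hint):
    """Hypothetical: translate a type hint into a JSON Schema fragment."""
    origin, args = get_origin(hint), get_args(hint)
    if origin is Literal:
        return {"enum": list(args)}                 # allowed values
    if origin is list:
        return {"type": "array", "items": hint_to_schema(args[0])}
    if origin is Union:
        return {"anyOf": [hint_to_schema(a) for a in args]}
    return {"type": SCALARS[hint]}                  # plain scalar


print(hint_to_schema(Literal["cpu", "cuda"]))  # {'enum': ['cpu', 'cuda']}
print(hint_to_schema(List[str]))               # {'type': 'array', 'items': {'type': 'string'}}
```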

Installation

pip install deep-dataclasses

deep_dataclasses has zero mandatory dependencies — it uses only re, dataclasses, and typing from the standard library. jsonschema (used in the examples above) is an optional third-party package needed only if you want schema validation.
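
Because jsonschema is optional, code that validates configs may want to check for it before importing. A minimal sketch using only the standard library (has_jsonschema is a hypothetical helper name):

```python
from importlib.util import find_spec


def has_jsonschema() -> bool:
    """Return True if the optional jsonschema package is importable."""
    return find_spec("jsonschema") is not None


if has_jsonschema():
    print("jsonschema available: schema validation can be enabled")
else:
    print("jsonschema missing: install it with 'pip install jsonschema'")
```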


Comparison

| Feature | @dataclass | @deep_dataclass |
|---|---|---|
| Nested hierarchy | Manual, verbose | Inline, readable |
| field(default_factory=...) | Required per field | Automatic |
| Nested dict → instance coercion | ❌ | ✅ (recursive, all depths) |
| Union variant selection from dict | ❌ | ✅ (best-match by field coverage) |
| asdict() / == / __repr__ | ✅ | ✅ |
| frozen, slots, etc. | ✅ | ✅ (tested) |
| Type validation | ❌ | Exports to JSON Schema for use with jsonschema |
| Mandatory dependencies | stdlib only | stdlib only |

Status

Early release. Core functionality is complete with 100% test coverage. API may evolve — feedback welcome on discuss.python.org.

Contributing

Issues and PRs welcome. See the issue tracker for known TODOs.
