Deep dataclasses with nested structures and validation
deep_dataclasses
Define nested dataclass hierarchies as clean, readable schemas — no boilerplate, no dependencies.
The Problem
Python's @dataclass requires you to define each level of a nested hierarchy separately, then wire them together manually. Each inner class name must appear three times: once in the class definition, once as the field type hint, and once in field(default_factory=...):
from dataclasses import dataclass, asdict, field

@dataclass
class NestedParent:
    @dataclass
    class Child:
        @dataclass
        class GrandChild:
            grandchild_str: str = "grandchild1"
            grandchild_num: int = 1
        grandchild: GrandChild = field(default_factory=GrandChild)
        child_str: str = "child"
    child: Child = field(default_factory=Child)
    parent_str: str = "parent"
And even after all that boilerplate, the asdict round-trip is broken:
NestedParent(**asdict(NestedParent())) == NestedParent() # False
This is because asdict serialises nested instances to plain dicts, but @dataclass does not coerce them back on construction — unlike flat dataclasses where this works naturally.
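The failure can be reproduced with the standard library alone. The sketch below uses two hypothetical classes, Inner and Outer (illustration only, not part of deep_dataclasses), to show that the rebuilt object holds a plain dict where the nested instance used to be:

```python
from dataclasses import dataclass, asdict, field

@dataclass
class Inner:
    value: int = 1

@dataclass
class Outer:
    inner: Inner = field(default_factory=Inner)
    label: str = "outer"

original = Outer()
rebuilt = Outer(**asdict(original))

# asdict() recursed into `inner`, but the generated __init__ stores
# whatever it is given, so the nested value stays a plain dict.
print(type(original.inner).__name__)  # Inner
print(type(rebuilt.inner).__name__)   # dict
print(rebuilt == original)            # False
```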
The Solution
@deep_dataclass lets you express the same hierarchy as a natural nested schema. The decorator infers the class name, type hint, and default_factory from the nested block — no repetition:
from deep_dataclasses import deep_dataclass

@deep_dataclass(autosnake=True)
class DeepParent:
    class Child:
        class Grandchild:
            grandchild_str: str = "grandchild1"
            grandchild_num: int = 1
        child_str: str = "child"
    parent_str: str = "parent"

print(DeepParent().child.grandchild)
# Grandchild(grandchild_str='grandchild1', grandchild_num=1)
The autosnake=True option converts PascalCase inner class names to snake_case field names. Without it the field name matches the class name exactly.
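As a rough illustration of what a PascalCase to snake_case conversion can look like, here is a minimal sketch using re. The helper name pascal_to_snake is hypothetical, and the rule shown is an assumption: the library's actual conversion may treat acronyms or digits differently.

```python
import re

def pascal_to_snake(name: str) -> str:
    """Hypothetical helper: 'TrainMode' -> 'train_mode'."""
    # Insert an underscore before every capital letter that is not the
    # first character, then lower-case the whole name.
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()

print(pascal_to_snake("Child"))       # child
print(pascal_to_snake("Grandchild"))  # grandchild
print(pascal_to_snake("GrandChild"))  # grand_child
```

Under a rule like this one, an inner class named GrandChild would yield the field grand_child, whereas Grandchild yields grandchild.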
Fully Compatible with dataclasses
@deep_dataclass produces standard dataclass instances — all stdlib tools work as expected, and the asdict round-trip is fixed:
d1 = NestedParent() # vanilla dataclass hierarchy
d2 = DeepParent() # deep_dataclass equivalent
asdict(d1) == asdict(d2) # True — identical structure
DeepParent(**asdict(d2)) == d2 # True — coercion works
NestedParent(**asdict(d1)) == d1 # False — vanilla @dataclass doesn't coerce nested dicts
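To make the idea of coercion concrete, the following sketch adds it to vanilla dataclasses by hand through __post_init__. It is not the library's implementation: coerce_nested, Leaf, Branch, and Root are hypothetical names, and the approach assumes annotations are real classes rather than strings (so no postponed annotations) and that the classes are not frozen.

```python
from dataclasses import dataclass, asdict, field, fields, is_dataclass

def coerce_nested(obj):
    """Replace dict values with instances of the annotated dataclass type."""
    for f in fields(obj):
        value = getattr(obj, f.name)
        if is_dataclass(f.type) and isinstance(value, dict):
            # Building the nested instance runs its own __post_init__,
            # which is what makes the coercion recursive.
            setattr(obj, f.name, f.type(**value))

@dataclass
class Leaf:
    num: int = 1

@dataclass
class Branch:
    leaf: Leaf = field(default_factory=Leaf)

    def __post_init__(self):
        coerce_nested(self)

@dataclass
class Root:
    branch: Branch = field(default_factory=Branch)
    name: str = "root"

    def __post_init__(self):
        coerce_nested(self)

root = Root()
print(Root(**asdict(root)) == root)                      # True
print(Root(branch={"leaf": {"num": 2}}).branch.leaf)     # Leaf(num=2)
```

Every class in the hierarchy has to opt in with its own __post_init__, which is the kind of repetition a decorator can take over.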
Validation and Config Loading
to_json_schema exports any @deep_dataclass schema for use with third-party validators. Because @deep_dataclass coerces nested dicts at construction time, the validate-then-construct pattern works at all depths:
from deep_dataclasses import to_json_schema
import jsonschema, json
raw = json.loads('{"child": {"grandchild": {"grandchild_num": 2}}}')
jsonschema.validate(raw, to_json_schema(DeepParent)) # validate first
cfg = DeepParent(**raw) # then construct — fully typed
assert isinstance(cfg.child, DeepParent.child) # True
Validation catches type violations at any nesting depth:
data = asdict(DeepParent())
data['child']['child_str'] = 3 # inject a type error
jsonschema.validate(data, to_json_schema(DeepParent)) # raises ValidationError
Failed validating 'type' in schema['properties']['child']['properties']['child_str']:
    {'type': 'string', 'default': 'child'}

On instance['child']['child_str']:
    3
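The error output above shows that each field maps to an entry under 'properties' carrying a 'type' and a 'default'. A simplified generator producing that shape might look like the sketch below. The function sketch_schema and the classes Parent and Child are hypothetical, only four primitive types are handled, and the schema returned by to_json_schema may contain additional keys.

```python
from dataclasses import dataclass, field, fields, is_dataclass, MISSING

# Assumed mapping from Python primitives to JSON Schema type names.
_JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def sketch_schema(cls):
    """Build a nested JSON Schema dict for a dataclass of primitives."""
    props = {}
    for f in fields(cls):
        if is_dataclass(f.type):
            # Nested dataclass: recurse to get a nested object schema.
            props[f.name] = sketch_schema(f.type)
        else:
            entry = {"type": _JSON_TYPES[f.type]}
            if f.default is not MISSING:
                entry["default"] = f.default
            props[f.name] = entry
    return {"type": "object", "properties": props}

@dataclass
class Child:
    child_str: str = "child"

@dataclass
class Parent:
    child: Child = field(default_factory=Child)
    parent_str: str = "parent"

schema = sketch_schema(Parent)
print(schema["properties"]["child"]["properties"]["child_str"])
# {'type': 'string', 'default': 'child'}
```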
Data Modelling with Type Hints
@deep_dataclass works with the full range of typing annotations. The @auxiliary decorator marks an inner class as a type-only helper — it won't become a standalone field, but can be referenced in Union[...], List[...], Optional[...], etc.
from dataclasses import field, asdict
from typing import Literal, List, Union
from deep_dataclasses import deep_dataclass, auxiliary, to_json_schema
import jsonschema

@deep_dataclass
class Config:
    @auxiliary
    class TrainMode:
        lr: float = 0.001
        pseudo_batch_size: int = 32

    @auxiliary
    class TestMode:
        metric: str = "accuracy"
        folds: int = 5

    mode: Union[TrainMode, TestMode] = field(default_factory=TrainMode)
    device: Literal["cpu", "cuda"] = "cpu"
    images: List[str] = field(default_factory=list)
When constructing from a dict, @deep_dataclass selects the Union variant whose field names best cover the keys supplied — an exact match is always preferred over a partial one:
cfg_train = Config(mode={"lr": 0.05}) # 'lr' is a TrainMode field
cfg_test = Config(mode={"metric": "f1"}) # 'metric' is a TestMode field
assert isinstance(cfg_train.mode, Config.TrainMode)
assert isinstance(cfg_test.mode, Config.TestMode)
assert cfg_train.mode.pseudo_batch_size == 32 # unspecified fields get defaults
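One way to picture best-match selection is the stdlib-only sketch below. The function pick_variant and its scoring rule are assumptions made for illustration: variants with unknown keys are rejected, an exact key match ranks above a partial one, and ties go to the variant listed first. The library's actual heuristic may differ in the details.

```python
from dataclasses import dataclass, fields

@dataclass
class TrainMode:
    lr: float = 0.001
    pseudo_batch_size: int = 32

@dataclass
class TestMode:
    metric: str = "accuracy"
    folds: int = 5

def pick_variant(variants, data):
    """Instantiate the variant whose field names best cover data's keys."""
    keys = set(data)

    def names(cls):
        return {f.name for f in fields(cls)}

    # Only variants that accept every supplied key are candidates.
    candidates = [cls for cls in variants if keys <= names(cls)]
    if not candidates:
        raise TypeError(f"no variant accepts keys {sorted(keys)}")
    # Rank exact matches first, then by number of covered keys;
    # max() keeps the earliest candidate on ties.
    best = max(candidates, key=lambda cls: (keys == names(cls), len(keys & names(cls))))
    return best(**data)

train = pick_variant((TrainMode, TestMode), {"lr": 0.05})
test_mode = pick_variant((TrainMode, TestMode), {"metric": "f1"})
print(train)      # TrainMode(lr=0.05, pseudo_batch_size=32)
print(test_mode)  # TestMode(metric='f1', folds=5)
```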
Schema validation enforces Literal values, Union structure, and List element types:
jsonschema.validate(asdict(Config()), to_json_schema(Config)) # passes
bad = asdict(Config())
bad["device"] = "tpu" # not in Literal["cpu", "cuda"]
jsonschema.validate(bad, to_json_schema(Config)) # raises ValidationError
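For intuition about why "tpu" is rejected, typing annotations can be translated into JSON Schema fragments with typing.get_origin and typing.get_args. The sketch below is an assumption about the general technique, not the output of to_json_schema: annotation_to_schema is a hypothetical helper, and mapping Literal to an "enum" keyword is one common convention.

```python
from typing import List, Literal, get_args, get_origin

# Assumed mapping from Python primitives to JSON Schema type names.
_PRIMITIVES = {str: "string", int: "integer", float: "number", bool: "boolean"}

def annotation_to_schema(tp):
    """Translate a small subset of typing annotations to JSON Schema."""
    origin = get_origin(tp)
    if origin is Literal:
        # Literal["cpu", "cuda"] becomes a closed set of allowed values.
        return {"enum": list(get_args(tp))}
    if origin is list:
        (item_type,) = get_args(tp)
        return {"type": "array", "items": annotation_to_schema(item_type)}
    return {"type": _PRIMITIVES[tp]}

device_schema = annotation_to_schema(Literal["cpu", "cuda"])
images_schema = annotation_to_schema(List[str])
print(device_schema)  # {'enum': ['cpu', 'cuda']}
print(images_schema)  # {'type': 'array', 'items': {'type': 'string'}}
print("tpu" in device_schema["enum"])  # False
```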
Installation
pip install deep-dataclasses
deep_dataclasses has zero mandatory dependencies — it uses only re, dataclasses, and typing from the standard library. jsonschema (used in the examples above) is an optional third-party package needed only if you want schema validation.
Comparison
| Feature | @dataclass | @deep_dataclass |
|---|---|---|
| Nested hierarchy | Manual, verbose | Inline, readable |
| field(default_factory=...) | Required per field | Automatic |
| Nested dict → instance coercion | ❌ | ✅ (recursive, all depths) |
| Union variant selection from dict | ❌ | ✅ (best-match by field coverage) |
| asdict() / == / __repr__ | ✅ | ✅ |
| frozen, slots, etc. | ✅ | ✅ (tested) |
| Type validation | ❌ | Exports to jsonschema |
| Mandatory dependencies | stdlib | stdlib only |
Status
Early release. Core functionality is complete with 100% test coverage. API may evolve — feedback welcome on discuss.python.org.
Contributing
Issues and PRs welcome. See the issue tracker for known TODOs.
Download files
File details
Details for the file deep_dataclasses-0.3.3.tar.gz.
File metadata
- Download URL: deep_dataclasses-0.3.3.tar.gz
- Upload date:
- Size: 24.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 72db0434661feb22fbe459b19206acf5ac72b9df9cacf1d631fb8e3b42f680c6 |
| MD5 | 3b62f19e992954fed3f764b64aa784a2 |
| BLAKE2b-256 | 5a05732eb00770ae159d1dcfb2d079b1d8130ccf762339746cfa390ce6f9a8c3 |
File details
Details for the file deep_dataclasses-0.3.3-py3-none-any.whl.
File metadata
- Download URL: deep_dataclasses-0.3.3-py3-none-any.whl
- Upload date:
- Size: 14.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 26657bb76e3de395c73751756c77f39a197e0c4922081f8df1f9e30bc77bac84 |
| MD5 | 2b83cfa61c157b4ae0a09d32c4a85dcd |
| BLAKE2b-256 | 2b7687111219e41ca3db0f0aad6daac5beffd366cf9524bea3c8d240092e2ea8 |