A Marshmallow schema generator for `attrs` classes. Inspired by `desert`.
atacama
Why
desert seems mostly unmaintained. It is also surprisingly small (kudos to the authors), which makes it
a reasonable target for forking and maintaining.
However, we think the (widespread) practice of complecting the data class definition with its serialization schema is unwise. While this is certainly DRY-er than having to rewrite the entire Schema, it's (critically) not DRY at all if you ever want to have different de/serialization patterns depending on the data source.
In particular, atacama is attempting to optimize for the space of Python applications that serve APIs
from a database. These are common situations where serialization and deserialization may need to act
differently, and there's value in being able to cleanly separate those without redefining the attrs
class itself.
cattrs is the prior art here. It defines most of its structure and unstructure operations
dynamically, and allows different Converters to be used on the same attrs classes. However, cattrs
does not bring the same level of usability as Marshmallow when it comes to various things that are
important for APIs. In particular, we prefer Marshmallow for its:
- validation, which we find to be more ergonomic in the Marshmallow-verse.
- ecosystem utilities such as OpenAPI spec generation from Marshmallow Schemas.
As of this writing, we are unaware of anything that cattrs can do that we cannot accomplish in
Marshmallow, although for performance and other reasons, there may be cases where cattrs remains a
better fit!
Thus atacama. It aims to provide fully dynamic Schema generation, while retaining 100% of the
generality offered by Marshmallow, in a form that avoids introducing complex shim APIs that no longer
look and feel like Marshmallow itself.
What
atacama takes advantage of Python keyword arguments to provide as low-boilerplate an interface as
possible. Given:
from datetime import datetime, date
import attrs

@attrs.define
class Todo:
    id: str
    owner_id: str
    created_at: datetime
    priority: float = 0.0
    due_on: None | date = None
For such a simple example, let's assume the following Schema validation rules, but only for when the data comes in via the API:
- created_at must be before the current moment
- priority must be in the range [0.0, 10.0]
- due_on, if present, must be before 2038, when the Unix epoch will roll over and all computers will die a fiery death.
from typing import Type
from atacama import neo  # neo is the recommended default SchemaGenerator
import marshmallow as ma

def before_now(dt: datetime) -> bool:
    return dt <= datetime.now()

def before_unix_death(d: date) -> bool:
    # the parameter is named `d` so that it does not shadow the imported `date` class
    return d < date(2038, 1, 19)

TodoFromApi: Type[ma.Schema] = neo(
    Todo,
    created_at=neo.field(validate=before_now),
    priority=neo.field(validate=ma.validate.Range(min=0.0, max=10.0)),
    due_on=neo.field(validate=before_unix_death),
)
TodoFromDb: Type[ma.Schema] = neo(
    Todo,
    created_at=neo.field(data_key='created_ts'),
)
# both of the generated Schemas are actually Schema _classes_,
# just like a statically defined Marshmallow class.
# In most cases, you'll want to instantiate an object of the class
# before use, e.g. `TodoFromDb().load(...)`
Note that nothing that we have done here requires
- modifying the Todo class in any way.
- repeating any information that can be derived from the Todo class (e.g. that due_on is a date, or that it is Optional with a default of None).
- complecting the data source and validation/transformation for that source with the core data type itself, which can easily be shared across both the database and the API.
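To make the separation concrete without depending on atacama or Marshmallow, here is a minimal standard-library sketch of the same idea: one data class and two independently configured loaders built from keyword arguments. It uses dataclasses as a stand-in for attrs, and make_loader and its renames parameter are hypothetical names invented for this illustration. They are not part of atacama.

```python
from dataclasses import dataclass, fields
from datetime import date, datetime
from typing import Any, Callable, Dict, Optional


@dataclass
class Todo:  # dataclasses stand in for attrs in this sketch
    id: str
    owner_id: str
    created_at: datetime
    priority: float = 0.0
    due_on: Optional[date] = None


def make_loader(
    cls: type,
    renames: Optional[Dict[str, str]] = None,
    **validators: Callable[[Any], bool],
) -> Callable[[Dict[str, Any]], Any]:
    """Hypothetical helper: build a loader for `cls` from per-attribute keyword arguments."""
    key_for = renames or {}

    def load(data: Dict[str, Any]) -> Any:
        kwargs = {}
        for f in fields(cls):
            key = key_for.get(f.name, f.name)  # per-source input key
            if key not in data:
                continue  # fall back to the default declared on the class
            value = data[key]
            check = validators.get(f.name)
            if check is not None and not check(value):
                raise ValueError(f"invalid value for {f.name!r}: {value!r}")
            kwargs[f.name] = value
        return cls(**kwargs)

    return load


# API input is validated; database rows are trusted but use a different key.
todo_from_api = make_loader(Todo, priority=lambda p: 0.0 <= p <= 10.0)
todo_from_db = make_loader(Todo, renames={"created_at": "created_ts"})
```

Neither loader touches the Todo definition, which is the property the generated Schemas above are meant to share.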
Recursive Schema and Field generation
The first example demonstrates what we want and why we want it, but does not prove generality for our approach. Classes are by nature recursively defined, and Schemas must also be.
Happily, atacama supports recursive generation and recursive customization at each layer of the
class+Schema.
There are five fundamental cases for every attribute in a class which is desired to be a Field in a
Schema. Two of these have already been demonstrated. The 5 cases are the following:
1. Completely dynamic Field and recursive Schema generation.
   - This is demonstrated by id and owner_id in our Todo example. We told atacama nothing about them, and reasonable Marshmallow Fields with correct defaults were generated for both.
2. A customized Field, with recursive Schema generation as needed.
   - This is demonstrated by created_at, priority, and due_on in our Todo example. Much information can be dynamically derived from the annotations in the Todo class, and atacama will do so. However, we also wished to add information to the generated Field, and we can trivially do so by supplying keyword arguments normally accepted by Field directly to the field method of our SchemaGenerator. These keyword arguments can even technically override the keyword arguments for Field derived by atacama itself, though that would in most cases be a violation of your contract with the readers of your class definition and is therefore not recommended. The Field type will still be chosen by atacama, so if for some reason you want more control than is being offered by atacama, that takes you to option #3:
3. Completely static Field definition.
   - In some cases, you may wish to opt out of atacama entirely, starting at a given attribute. In this case, simply provide a Marshmallow Field (which is by definition fully defined recursively), and atacama will respect your intention by placing the Field directly into the Schema at the specified point.
4. A statically defined Schema.
   - This is similar to case 2, except that, by providing a Marshmallow Schema for a nested attribute, you are confirming that you want atacama to infer the "outer" information about that attribute, including that it is a Nested Field, to perform all the standard unwrapping of Generic and Union types, and to assign the correct default based on your attrs class definition. For instance, an attribute that exhibits the definition Optional[List[YourClass]] = None would allow you to provide a nested Schema defining only how to handle YourClass, while still generating the functionality around the default value None and expecting a List of YourClass.
   - In particular, this would be an expected case when you have a need to generate a Schema for direct deserialization of a class that is also used in a parent class and Schema, but where both the parent and child Schema share all the same custom validation, etc. By generating the nested Schema and then assigning it at the proper location within the parent Schema, you can easily reuse all of the customization from the child generation.
5. A nested Schema generator.
   - The most common use case for this will be when it is desirable to customize the generated Field of a nested class. In order to provide an API that continues to privilege keyword arguments as a way of 'pathing' to the various parts of the Schema, we must first capture any keyword arguments specific to the Nested Field that will be generated, and from there on we can allow you to provide names pointing to attributes in the nested class.
   - SchemaGenerators are objects created by users who wish to customize Schema generation in particular ways. The Meta class within a Marshmallow Schema changes certain behaviors across all its fields. While atacama provides several default generators, you may wish to create your own. Regardless, the use case for providing a nested SchemaGenerator is more specifically where you wish to make Schemas with nested Schemas that follow different rules than their parents. This is no issue with atacama: if it finds a nested SchemaGenerator, it will defer nested generation from that point onward to the new SchemaGenerator as expected. Note that, of course, the Field being generated for that attribute will follow the rules of the current SchemaGenerator, just as would happen with nested Meta classes in nested Schemas.
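One way to read the five cases is as a dispatch on what, if anything, the caller supplied for each attribute. The sketch below illustrates only that decision. Every class and function in it (FieldOverride, StaticField, StaticSchema, NestedGenerator, plan_attribute, plan_schema) is a hypothetical stand-in written for this illustration, and it makes no claim about how atacama implements the dispatch internally.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Iterable, Optional


@dataclass
class FieldOverride:  # stand-in for the result of a `field(...)` call (case 2)
    kwargs: Dict[str, Any] = field(default_factory=dict)


@dataclass
class StaticField:  # stand-in for a fully defined Marshmallow Field (case 3)
    name: str


@dataclass
class StaticSchema:  # stand-in for a Marshmallow Schema class (case 4)
    name: str


@dataclass
class NestedGenerator:  # stand-in for a nested SchemaGenerator call (case 5)
    field_kwargs: Dict[str, Any] = field(default_factory=dict)
    attribute_overrides: Dict[str, Any] = field(default_factory=dict)


def plan_attribute(override: Optional[object]) -> str:
    """Return a description of how a single attribute would be handled."""
    if override is None:
        return "case 1: infer the Field (and any nested Schema) from the annotation"
    if isinstance(override, FieldOverride):
        return "case 2: infer the Field type, then apply the supplied keyword arguments"
    if isinstance(override, StaticField):
        return "case 3: use the supplied Field as-is"
    if isinstance(override, StaticSchema):
        return "case 4: infer the outer Field, nest the supplied Schema inside it"
    if isinstance(override, NestedGenerator):
        return "case 5: infer the outer Field, delegate nested generation"
    raise TypeError(f"unsupported override: {override!r}")


def plan_schema(attribute_names: Iterable[str], **overrides: object) -> Dict[str, str]:
    """Map every attribute name to its handling, defaulting to case 1."""
    return {name: plan_attribute(overrides.get(name)) for name in attribute_names}
```

Attributes that are simply omitted from the keyword arguments fall through to case 1, which is why the low-boilerplate path is also the default path.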
What does this look like in practice? See the annotated example below, which demonstrates all 5 of these
possible interactions between an attrs class and the specific Schema desired by our (potentially
somewhat sugar-high) imaginary user:
# Imports assumed by this example but not shown in the original listing.
# EnumField is assumed to come from the marshmallow_enum package, and GooeyEnum
# is assumed to be an Enum defined elsewhere in your code.
import typing as ty
import attrs
import marshmallow as ma
from marshmallow_enum import EnumField
import atacama

@attrs.define
class Mallow:
    gooeyness: GooeyEnum
    color: str = "light-brown"

@attrs.define
class Milk:
    """Just a percentage"""
    fat_pct: float

@attrs.define
class ChocolateIngredients:
    cacao_src: str
    sugar_grams: float
    milk: ty.Optional[Milk] = None

@attrs.define
class Chocolate:
    brand: str
    cacao_pct: float
    ingredients: ty.Optional[ChocolateIngredients] = None

@attrs.define
class GrahamCracker:
    brand: str

@attrs.define
class Smore:
    graham_cracker: GrahamCracker
    marshmallows: ty.List[Mallow]
    chocolate: ty.Optional[Chocolate] = None

ChocolateIngredientsFromApiSchema = atacama.neo(
    ChocolateIngredients,
    # 1. milk and sugar_grams are fully dynamically generated
    # 2. a partially-customized Field inheriting its Field type, default, etc from the attrs class definition
    cacao_src=atacama.neo.field(
        validate=ma.validate.OneOf(["Ivory Coast", "Nigeria", "Ghana", "Cameroon"])
    ),
)

class MallowSchema(ma.Schema):
    """Why are you doing this by hand?"""
    gooeyness = EnumField(GooeyEnum, by_value=True)
    color = ma.fields.Raw()

    @ma.post_load
    def pl(self, data: dict, **_kw):
        return Mallow(**data)

SmoreFromApiSchema = atacama.ordered(
    Smore,
    # 1. graham_cracker, by being omitted, will have a nested schema generated with no customizations
    # 5. In order to name/path the fields of nested elements, we plug in a nested
    # SchemaGenerator.
    #
    # Note that keyword arguments applicable to the Field surrounding the nested Schema,
    # e.g. load_only, are supplied to the `nested` method, whereas 'paths' to attributes within the nested class
    # are supplied to the returned NestedSchemaGenerator function.
    #
    # Note also that we use a different SchemaGenerator (neo) than the parent (ordered),
    # and this is perfectly fine and works as you'd expect.
    chocolate=atacama.neo.nested(load_only=True)(
        # 2. Both cacao_pct and brand have customizations but are otherwise dynamically generated.
        # Note in particular that we do not need to specify the `attrs` class itself, as that
        # is known from the type of the `chocolate` attribute.
        cacao_pct=atacama.neo.field(validate=ma.validate.Range(min=0, max=100)),
        brand=atacama.neo.field(validate=ma.validate.OneOf(["nestle", "hershey"])),
        # 4. we reuse the previously defined ChocolateIngredientsFromApiSchema
        ingredients=ChocolateIngredientsFromApiSchema,
    ),
    # 3. Here, the list of Mallows is represented by a statically defined Nested Field
    # containing a statically defined Schema.
    # Why? Who knows, but if you want to do it yourself, it's possible!
    marshmallows=ma.fields.Nested(MallowSchema(many=True)),
)
How
SchemaGenerators
All interaction with atacama is done via a top-level SchemaGenerator object. It contains some
contextual information which will be reused recursively throughout a generated Schema, including a way
to define the Meta class that is a core part of Marshmallow's configurability.
atacama currently provides two 'default' schema generators, neo and ordered.
- ordered provides no configuration other than the common specification that the generated Schema should preserve the order of the attributes as they appear in the class. While this may not matter for most runtime use cases, it is infinitely valuable for debuggability and for further ecosystem usage such as OpenAPI spec generation, which ought to follow the order defined by the attrs class.
- neo stands for "non-empty, ordered", and is the preferred generator for new Schemas, because it builds in a very opinionated but nonetheless generally useful concept of non-emptiness. For attributes of types that properly have lengths, it is in general the case that one and only one of the following should be true:
  - Your attribute has a default defined, such that it is not required to be present in input data for successful deserialization.
  - It is illegal to provide an empty, zero-length value.

  The intuition here is that a given attribute type either may have an 'essentially empty' value, or it may not. Examples of things which may never be empty include database ids (empty string would be inappropriate), lists of object 'owners' (an empty list would orphan the object, and therefore must not be permitted), etc. Whereas in many cases, an empty string or list is perfectly normal, and in those cases it is preferred that the class itself define the common-sense default value in order to make things work as expected without boilerplate.
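The non-emptiness rule can be illustrated with a small standard-library sketch. It uses dataclasses in place of attrs, and Document, has_default, and find_empty_required are hypothetical names written for this illustration. The real neo generator expresses this rule through Marshmallow validation rather than a function like this one.

```python
from dataclasses import MISSING, Field, dataclass, field, fields
from typing import Any, Dict, List


@dataclass
class Document:  # dataclasses stand in for attrs
    id: str                # no default: an empty string is rejected
    owners: List[str]      # no default: an empty list is rejected
    title: str = ""        # has a default: empty is perfectly normal
    tags: List[str] = field(default_factory=list)


def has_default(f: Field) -> bool:
    return f.default is not MISSING or f.default_factory is not MISSING


def find_empty_required(cls: type, data: Dict[str, Any]) -> Dict[str, str]:
    """Report sized values that are empty although their attribute has no default."""
    errors = {}
    for f in fields(cls):
        if has_default(f):
            continue  # the class itself says what 'empty' should look like
        value = data.get(f.name)
        # Missing keys are a separate concern (requiredness) and are ignored here.
        if hasattr(value, "__len__") and len(value) == 0:
            errors[f.name] = "must not be empty"
    return errors
```

With this sketch, an input whose id is "" and whose owners is [] is reported for both attributes, while an empty title or tags passes because the class declares defaults for them.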
FieldTransforms
The neo SchemaGenerator applies the additional 'non-empty' validation to non-defaulted Fields via
something called a FieldTransform. Any FieldTransform attached to a SchemaGenerator will be run on
every Field attached to the Schema, recursively. This includes statically-provided Fields.
A FieldTransform must accept an actual Field object and return a (presumably modified) Field
object. It is only run at the time of Schema generation, so if you wish to add validators or perform
customization to the Field that happens at load/dump time, you must compose your logic with the existing
Field. A SchemaGenerator can have multiple FieldTransforms, and they will be run in order on every
Field. A FieldTransform is, in essence, a higher-order function over Fields, which are themselves
functions over the incoming attribute data.
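The "higher-order function" framing can be sketched without Marshmallow. In the example below, SketchField is a hypothetical stand-in for a Marshmallow Field, its required flag stands in for "has no default", and require_non_empty and apply_transforms are illustrative names. None of this is atacama's actual implementation.

```python
from dataclasses import dataclass, replace
from typing import Any, Callable, Iterable, Tuple

Validator = Callable[[Any], bool]


@dataclass(frozen=True)
class SketchField:  # hypothetical stand-in for a Marshmallow Field
    required: bool
    validators: Tuple[Validator, ...] = ()

    def is_valid(self, value: Any) -> bool:
        return all(check(value) for check in self.validators)


FieldTransform = Callable[[SketchField], SketchField]


def is_non_empty(value: Any) -> bool:
    return len(value) > 0  # assumes a sized value such as a str or list


def require_non_empty(f: SketchField) -> SketchField:
    """Compose a non-empty check with the validators the field already has."""
    if not f.required:
        return f
    return replace(f, validators=f.validators + (is_non_empty,))


def apply_transforms(f: SketchField, transforms: Iterable[FieldTransform]) -> SketchField:
    """Run every transform once, in order, at generation time."""
    for transform in transforms:
        f = transform(f)
    return f
```

The transform itself runs once, but the validator it attaches runs on every later load. That is the sense in which load-time behavior has to be composed into the existing Field.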
The two default generators are provided as a convenience to the user and nothing more - it is perfectly
acceptable and indeed expected that you might define your own 'sorts' of schema generators, with your own
FieldTransforms and basic Meta definitions, depending on your needs.
Leaf type->Field mapping
Because atacama is a recursive generator, there must be known base cases where a concrete Marshmallow
Field can be automatically generated based on the type of an attribute.
Built-in mappings
The default base cases are defined in atacama/leaf.py. They are relatively comprehensive as far as
Python builtins go, covering various date/time concepts and UUID. We also specifically map
Union[int, float] to the Marshmallow Number Field. Further, we support typing_extensions.Literal
using the built-in Marshmallow validator OneOf, and we have introduced a simple Set Field that
serializes sets to sorted lists.
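The role of leaf types as recursion base cases can be shown with a short sketch that walks a type annotation. The table below is illustrative only: its values are just the names of Marshmallow Field classes, and LEAF_FIELD_NAMES and describe_field are hypothetical names, not the contents of atacama/leaf.py.

```python
import typing as ty
from datetime import date, datetime
from uuid import UUID

# Illustrative leaf table: the values are only the names of Marshmallow Field classes.
LEAF_FIELD_NAMES = {
    str: "String",
    int: "Integer",
    float: "Float",
    bool: "Boolean",
    datetime: "DateTime",
    date: "Date",
    UUID: "UUID",
    ty.Union[int, float]: "Number",
}


def describe_field(tp: ty.Any) -> str:
    """Recursively unwrap Optional and List until a leaf type is reached."""
    if tp in LEAF_FIELD_NAMES:  # base case: a known leaf type
        return LEAF_FIELD_NAMES[tp]
    origin, args = ty.get_origin(tp), ty.get_args(tp)
    if origin is ty.Union:
        non_none = [a for a in args if a is not type(None)]
        if len(non_none) == 1 and len(args) == 2:  # Optional[X]
            return f"Optional({describe_field(non_none[0])})"
    if origin is list:
        return f"List({describe_field(args[0])})"
    raise TypeError(f"no leaf mapping for {tp!r}")
```

Because the leaf table is consulted first, Union[int, float] resolves directly to a single leaf instead of being unwrapped as a generic Union.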
Custom static mappings
Nevertheless, you may find that you wish to configure a more comprehensive (or different) set of leaf
types for your SchemaGenerator. This may be configured by passing the keyword argument leaf_types to
the SchemaGenerator constructor with a mapping of those leaf types. A dict is sufficient to provide a
static LeafTypeMapping.
Custom dynamic mappings
You may also provide a more dynamic implementation of the Protocol defined in atacama/leaf.py. This
would provide functionality similar to cattrs.register_structure_hook, except that a Marshmallow
Field handles both serialization and deserialization. The included DynamicLeafTypeMapping class can
help accomplish this, though you may provide your own custom implementation of the Protocol as well.
DynamicLeafTypeMapping is recursively nestable, so you may overlay your own handlers on top of our base
handlers via:
from atacama import DynamicLeafTypeMapping, AtacamaBaseLeafTypeMapping
your_mapping = DynamicLeafTypeMapping(AtacamaBaseLeafTypeMapping, [handler_1, handler_2])
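The overlay idea, where handlers are tried first and a base mapping is the fallback, can be sketched as follows. The Protocol in atacama/leaf.py is not reproduced here, so the handler shape (a function returning a result or None) and the OverlayLeafMapping class are assumptions made purely for illustration. Field names stand in for real Field objects.

```python
import enum
from decimal import Decimal
from typing import Any, Callable, Optional, Sequence

# Assumed handler shape: return a field name for a type, or None to pass.
Handler = Callable[[Any], Optional[str]]


class OverlayLeafMapping:
    """Hypothetical stand-in: try handlers in order, then fall back to a base mapping."""

    def __init__(self, base: Any, handlers: Sequence[Handler]) -> None:
        self.base = base  # a dict or another OverlayLeafMapping
        self.handlers = list(handlers)

    def __getitem__(self, tp: Any) -> str:
        for handler in self.handlers:
            result = handler(tp)
            if result is not None:
                return result
        return self.base[tp]  # raises KeyError when nothing matches


def handle_enums(tp: Any) -> Optional[str]:
    # A dynamic rule: matches every Enum subclass, which a static dict cannot list.
    if isinstance(tp, type) and issubclass(tp, enum.Enum):
        return "Enum"
    return None


def handle_decimal(tp: Any) -> Optional[str]:
    return "Decimal" if tp is Decimal else None


class Color(enum.Enum):
    RED = 1


base = {str: "String", int: "Integer"}
with_enums = OverlayLeafMapping(base, [handle_enums])
with_decimal = OverlayLeafMapping(with_enums, [handle_decimal])  # overlays nest
```

Nesting one overlay inside another mirrors the "recursively nestable" property described above: lookups fall through each layer until a handler or the base mapping answers.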
Minor Features
require_all
You may specify at generation time that you wish to make all fields (recursively) required at the time
of load. This may be useful on its own, but is also the only way of accurately describing an 'output'
type in a JSON/OpenAPI schema, because required in that context is the only way to indicate that your
attribute will never be undefined. When dumping an attrs instance to a Python dictionary, all attributes
are always guaranteed to be present in the output, so undefined will never happen even for attributes
with defaults.
Example:
atacama.neo(Foo, config(require_all=True))
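To see why this matters for an 'output' schema, consider which attribute names would end up in a JSON/OpenAPI required list. The sketch below uses dataclasses in place of attrs, and this Foo class and the required_names function are hypothetical illustrations rather than atacama's behavior.

```python
from dataclasses import MISSING, dataclass, fields
from typing import List, Optional


@dataclass
class Foo:  # dataclasses stand in for attrs
    name: str
    nickname: Optional[str] = None


def required_names(cls: type, require_all: bool = False) -> List[str]:
    """Names that a JSON/OpenAPI schema would list under `required`."""
    return [
        f.name
        for f in fields(cls)
        if require_all or (f.default is MISSING and f.default_factory is MISSING)
    ]
```

For input, only name is required because nickname has a default. For output, both keys are always present in the dumped dictionary, which is what passing require_all=True expresses in this sketch.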
Schema name suffix
You may specify a suffix for the name of the Schema generated. This may be useful when you are trying to
generate an output JSON schema and have multiple Schemas derived from the same attrs class.
Example:
atacama.neo(Foo, config(schema_name_suffix='Input')) results in the schema having the name
your_module.FooInput rather than your_module.Foo.