Structure unstructured data for the purpose of static type checking
Structure unstructured data for the purpose of static type
checking. An opinionated wrapper for attrs and cattrs.

In many web services it is common to consume or generate JSON or some
JSON-like representation of data. JSON translates quite nicely to core
Python objects such as dicts and lists. However, if your data is
structured, it is nice to be able to work with it in a structured
manner, i.e. with Python objects. Python objects give you better code
readability, and in more recent versions of Python they are also
capable of being statically type-checked with a tool like mypy.

attrs is an excellent library for defining boilerplate-free Python
classes that are easy to work with and that make static type-checking
with mypy a breeze. You define your attributes and their types with
a very clean syntax, attrs gives you constructors and dunder methods,
and mypy brings the static type-checking.

Once you add cattrs into the mix, you can have pleasant and simple
conversions to and from unstructured data with extremely low
boilerplate as well.

typecats, and its core decorator Cat, is a thin opinionated layer on
top of these two runtime libraries (attrs and cattrs) and the
static type checker mypy. It defines an attrs class with a few
additional features. The 3 core features are:

1. Static class function struc and object method unstruc added to every class type defined as a Cat, which pass directly through to their underlying cattrs functions.

    from typecats import Cat

    @Cat
    class TestCat:
        name: str
        age: int

    # Both expressions evaluate to True.
    TestCat.struc(dict(name='Tom', age=9)) == TestCat(name='Tom', age=9)
    TestCat.struc(dict(name='Tom', age=9)).unstruc() == dict(name='Tom', age=9)

Make your code easier to read, create a common pattern for defining, structuring, and unstructuring pure data objects, and require fewer imports - just import your defined type and go! Abbreviated forms of the verbs structure and unstructure were chosen to underscore the difference from the built-in cattrs verbs and to reduce code clutter slightly for what is intended to be a common and idiomatic operation.
Note that a mypy plugin is provided to inform the type checker that these dynamically-added methods are real and provide the intended result types. Add to your mypy configuration file:

    [mypy]
    plugins = typecats.cats_mypy_plugin

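To illustrate why a static checker needs help here, the following is a minimal stdlib-only sketch of a decorator that attaches struc and unstruc at runtime. The names toy_cat and Tabby are hypothetical, the class is a plain dataclass rather than an attrs class, and this is not the typecats implementation.

```python
from dataclasses import asdict, dataclass


def toy_cat(cls):
    """Hypothetical decorator: turn cls into a dataclass and attach two methods."""
    cls = dataclass(cls)

    def struc(data):
        # Build an instance from a plain dict of keyword arguments.
        return cls(**data)

    def unstruc(self):
        # Convert the instance back into a plain dict.
        return asdict(self)

    # These attributes do not exist in the class body, so a type checker
    # reading the source cannot see them without extra information.
    cls.struc = staticmethod(struc)
    cls.unstruc = unstruc
    return cls


@toy_cat
class Tabby:
    name: str
    age: int


tom = Tabby.struc(dict(name="Tom", age=9))
```

Because struc and unstruc appear only after the decorator runs, a checker that reads the class body alone would report them as missing; a plugin is what fills that gap.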
struc and unstruc first-class functions are provided if you strongly prefer a functional approach. struc reverses the argument order of the cattrs function signature, taking the type first and the data second, to make it suitable for the common case of partial application:
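
    import functools

    from typecats import struc

    # Bind the target type once; the result takes only the raw data.
    TestCat_struc = functools.partial(struc, TestCat)
    TestCat_struc(dict(name='Tom', age=2))
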
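The benefit of the type-first argument order shows up when structuring many records at once. Below is a stdlib-only sketch: the struc function here is a hypothetical stand-in that simply calls the class constructor, and Tabby is a plain dataclass playing the role of a Cat. Neither is the typecats implementation.

```python
import functools
from dataclasses import dataclass
from typing import Any, Dict, Type, TypeVar

T = TypeVar("T")


def struc(cls: Type[T], data: Dict[str, Any]) -> T:
    """Hypothetical stand-in: type first, data second."""
    return cls(**data)


@dataclass
class Tabby:
    name: str
    age: int


# Bind the type once, then reuse the resulting one-argument function.
tabby_struc = functools.partial(struc, Tabby)

rows = [dict(name="Tom", age=2), dict(name="Felix", age=5)]
cats = list(map(tabby_struc, rows))
```

With the data-first order, the same call would need a lambda or keyword binding instead of a plain positional partial.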
2. Non-empty validators defined for all attributes with no default provided.

    from typing import Optional

    from typecats import Cat

    @Cat
    class TestCat:
        name: str
        age: int
        neutered: bool = True
        owner: Optional[Owner] = None  # Owner is another class, defined elsewhere

    # age=0 is accepted; the default for neutered is filled in.
    works = TestCat.struc(dict(name='Tom', age=0))
    assert works.neutered == True

    try:
        TestCat.struc(dict(name='', age=0))
    except ValueError as ve:
        print(ve)
        # Attribute "name" on class <class 'TestCat'> with type <class 'str'> cannot have empty value ''!

For many types of data, a default value such as an empty string, empty list/set, or missing complex type is perfectly valid, and typecats takes the approach that such attributes should have a defined default value in order to simplify the use of those objects. This has been found to be particularly useful in the context of structuring data from APIs, where the API contract may not require all keys to be provided for a given type, and where new attributes/keys may be defined later on that old clients would not know about (backwards compatibility). In these cases, not defining a default value would complicate the code by forcing developers to remember which keys need to be added to a raw data dict before structuring it.
On the other hand, there are some facets of the data that are absolutely required. A common example would be a database ID - without a defined ID, the object/data is meaningless. typecats allows you to enforce the most basic level of compliance by not defining defaults, which forces clients to provide not simply a value of the proper type, but a non-empty value of that type - for instance, the empty string would never be a valid database ID.
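The rule described above can be sketched with the standard library alone. This is an illustration under assumptions, not the typecats validator: require_non_empty and Record are hypothetical names, and the definition of "empty" used here (None or a zero-length string or container, with 0 and False treated as real values) is a guess consistent with the example above, where age=0 is accepted.

```python
from dataclasses import MISSING, dataclass, field, fields
from typing import List

# Types whose zero-length values count as "empty" in this sketch.
_SIZED = (str, bytes, list, tuple, set, frozenset, dict)


def require_non_empty(obj) -> None:
    """Raise ValueError if a field declared without a default is empty."""
    for f in fields(obj):
        has_default = f.default is not MISSING or f.default_factory is not MISSING
        if has_default:
            continue  # attributes with defaults may legitimately be empty
        value = getattr(obj, f.name)
        if value is None or (isinstance(value, _SIZED) and len(value) == 0):
            raise ValueError(
                f'Attribute "{f.name}" cannot have empty value {value!r}'
            )


@dataclass
class Record:
    id: str  # required: no default, so it must be non-empty
    tags: List[str] = field(default_factory=list)  # optional: empty is fine

    def __post_init__(self) -> None:
        require_non_empty(self)


ok = Record(id="abc123")
```

Here Record(id="") raises ValueError, while Record(id="abc123") succeeds with an empty tags list.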
3. Objects may subclass dict in order to transparently retain untyped key/value pairs for a roundtrip structure-unstructure. These are called Wildcats, since they allow a significant amount of extra functionality at the cost of not fully enforcing type-checking.

    from typecats import Cat

    @Cat
    class TestWildcat(dict):
        name: str
        age: int

    cat_from_db = dict(name='Tom', age=8, gps_tracker=True)
    wc = TestWildcat.struc(cat_from_db)

    assert wc.name == 'Tom'
    assert wc.age == 8
    assert wc['gps_tracker'] == True
    assert wc.unstruc() == cat_from_db  # `gps_tracker` survived the roundtrip

A Cat is an attrs class with a defined set of attributes that will be structured from raw data, and as of cattrs 1.0.0rc0, unexpected keys are silently dropped (rather than causing a runtime error) so that users do not need to sanitize their data before structuring. This behavior means that a structured object is not suitable for being passed between different parts of a program if there may be other parts to the data that the structuring class does not know about. This is an unfortunately common requirement, for instance when operating a roundtrip read/write transaction to/from a database.

The alternative of passing around the raw data and performing many separate structuring/unstructuring roundtrips can be prohibitively expensive. It is also arguably better software design in many cases (compare the design philosophy behind Clojure's maps, or simply duck/structural typing in general) to allow code to operate on a limited subset of attributes without preventing objects with a superset of that functionality from being used. typecats therefore provides the Wildcat functionality to mimic these more expressive and flexible type/data systems.
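The idea of keeping unknown keys alongside typed attributes can be sketched without any libraries. WildRecord below is a hypothetical, hand-written class that only imitates the roundtrip behavior; it performs no type validation and does not reflect how typecats implements Wildcats.

```python
from typing import Any, Dict


class WildRecord(dict):
    """Toy record: typed attributes plus a dict holding everything else."""

    def __init__(self, name: str, age: int, **extra: Any) -> None:
        super().__init__(extra)  # unknown keys live in the dict itself
        self.name = name         # known keys become ordinary attributes
        self.age = age

    @classmethod
    def struc(cls, data: Dict[str, Any]) -> "WildRecord":
        return cls(**data)

    def unstruc(self) -> Dict[str, Any]:
        # Merge the untyped leftovers with the typed attributes.
        return {**self, "name": self.name, "age": self.age}


raw = dict(name="Tom", age=8, gps_tracker=True)
wc = WildRecord.struc(raw)
roundtrip = wc.unstruc()
```

Code that only knows about name and age can use attribute access, while the extra gps_tracker key rides along in the dict and reappears when the object is unstructured.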
Note that, as with the rest of typecats, this is a local optimum designed for specific though arguably common use cases. You don't need to use the Wildcat functionality to take advantage of features 1 and 2, and since it is presumably quite rare to explicitly subclass dict for normal Python classes, it seems unlikely that this implementation choice to require inheritance would prevent most practical use cases of Cat even if the functionality of preserving unknown data was specifically not desirable for a given application.
A further design note on Wildcats: A non-inheriting implementation was considered and rejected (so far) for two reasons: first, that this would require major additional work in order to support mypy understanding that dict-like access was legal for these objects; and second, that providing dict-style access (such as __setitem__) without inheriting from dict would be even more likely to conflict with existing class hierarchies, since any object that already inherited from dict would appear to 'work' as a Wildcat but its underlying dict would be overlaid and inaccessible as a Wildcat.
Notes on intent, compatibility, and dependencies
typecats and Cat are explicitly intended to solve a few specific
but common uses, and though they do not intentionally override or
replace attrs or cattrs features, any complex use of those
underlying features may or may not be fully operational. If you want
to write complex validator or constructor/builder logic of your own,
this library may not be for you.

That said, it is common in our experience to register a number of
specific structure and unstructure hooks with cattrs to make certain
specific scenarios work ideally with your data, and typecats
provides convenient wrappers to allow adding your hooks to its
internal cattrs Converter instance. By defining its own converter
instance, typecats does not interfere in any way with an existing
application's usage of attrs or cattrs, and may be used in
addition to, rather than as a replacement for, those libraries.
Use register_struc_hook and register_unstruc_hook to register on
the built-in converter instance.

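For readers unfamiliar with hooks, the following stdlib-only sketch shows the general idea of a per-type hook table: a function registered for a type is used whenever a value of that type is structured or unstructured. Every name here (register_struc, register_unstruc, struc_value, unstruc_value) is hypothetical, and the hook signatures are simplified; the real cattrs and typecats signatures differ, so consult their documentation before registering hooks.

```python
from datetime import date
from typing import Any, Callable, Dict

# Hypothetical registries mapping a type to its conversion function.
_struc_hooks: Dict[type, Callable[[Any], Any]] = {}
_unstruc_hooks: Dict[type, Callable[[Any], Any]] = {}


def register_struc(tp: type, fn: Callable[[Any], Any]) -> None:
    _struc_hooks[tp] = fn


def register_unstruc(tp: type, fn: Callable[[Any], Any]) -> None:
    _unstruc_hooks[tp] = fn


def struc_value(tp: type, raw: Any) -> Any:
    # Fall back to calling the type itself, e.g. int("3").
    return _struc_hooks.get(tp, tp)(raw)


def unstruc_value(value: Any) -> Any:
    hook = _unstruc_hooks.get(type(value))
    return hook(value) if hook else value


# Teach the toy converter to move dates to and from ISO strings.
register_struc(date, date.fromisoformat)
register_unstruc(date, date.isoformat)

parsed = struc_value(date, "2020-01-31")
dumped = unstruc_value(parsed)
```

Keeping the tables on a dedicated converter instance, rather than a global shared with other code, is what lets hooks registered for one library avoid affecting another.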
typecats uses newer-style static typing within its own codebase, and
is therefore currently only compatible with Python 3.6 and up.
As core parts of the implementation, both attrs and cattrs are
runtime dependencies.

typecats has been used in production in the Vision system at XOi
Technologies for over 6 months, with no significant changes or bugs
found in the past 3 months.