A fast and flexible reimplementation of data classes
dataclassy
Your data classes just got an upgrade! dataclassy is a reimplementation of data classes in Python — an alternative to the built-in dataclasses module that avoids many of its common pitfalls. dataclassy is designed to be more flexible, less verbose, and more powerful than dataclasses, while retaining a familiar interface.
What are data classes?
Simply put, data classes are classes optimised for storing data. In this sense they are similar to record or struct types in other languages. In Python, data classes take the form of a decorator which, when applied to a class, automatically generates methods to set the class's fields from arguments to its constructor, represent it as a string, and more.
Why use dataclassy?
Data classes from dataclassy offer the following advantages over those from dataclasses:
- Cleaner code: no messy InitVar, ClassVar, field or __post_init__
- Beautiful post-init processing: just define an __init__ as you would normally
- Friendly inheritance:
  - No need to apply a decorator to each and every subclass - just once and all following classes will also be data classes
  - Complete freedom in field ordering - no headaches if a field with a default value follows a field without one
- Optional generation of:
  - __slots__, significantly improving memory efficiency and lookup performance
  - **kwargs, simplifying data class instantiation from dictionaries
  - An __iter__ method, enabling data class destructuring
- Internal fields (marked with _ or __) are excluded from __repr__ by default
Worried about performance? Benchmarks show:
- Performance is at least as good as dataclasses when slots=False
- You get an instant ~25% performance boost to initialisation when slots=True
In addition, dataclassy:
- Is tiny (around 150 lines of code)
- Has no dependencies
- Supports Python 3.6 and up
- Has 100% test coverage
Usage
Installation
Install the latest stable version from PyPI with pip:
pip install dataclassy
Or install the latest development version straight from this repository:
pip install https://github.com/biqqles/dataclassy/archive/master.zip -U
Migration
By and large, dataclassy is a drop-in replacement for dataclasses. If you use only the decorator and the other functions, you can migrate from dataclasses to dataclassy instantly by changing
from dataclasses import *
to
from dataclassy import *
This being said, there are differences. dataclassy does not try to be a "clone" of dataclasses, but rather an alternative, feature-complete implementation of the concept with its own design philosophy, yet one that remains highly familiar to those acquainted with the module in the standard library. Minimalism takes precedence over compatibility.
Similarities
dataclassy's dataclass decorator takes all of the same arguments as dataclasses', plus its own, and should therefore be a drop-in replacement.
dataclassy also implements all dataclasses' functions: is_dataclass, fields, replace, make_dataclass, asdict and astuple (the last two are aliased from as_dict and as_tuple respectively), and they should work as you expect.
Differences
dataclassy has several important differences from dataclasses, mainly reflective of its minimalistic style and implementation. These differences are enumerated below and fully expanded on in the next section.
| | dataclasses | dataclassy |
|---|---|---|
| post-initialisation processing | __post_init__ method | __init__ method |
| init-only variables | fields with type InitVar | arguments to __init__ |
| class variables | fields with type ClassVar | fields without type annotation |
| mutable defaults | a: Dict = field(default_factory=dict) | a: Dict = {} |
| field excluded from repr | b: int = field(repr=False) | Internal type wrapper or _name |
| "late init" field | c: int = field(init=False) | c: int = None |
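For reference, the dataclasses column of the table can be seen in one place in the sketch below. It uses only the standard library, so it shows the style you would be migrating from, not dataclassy itself. The class name Account and its fields are made up for illustration.

```python
from dataclasses import InitVar, dataclass, field
from typing import ClassVar, Dict


@dataclass
class Account:
    owner: str
    scale: InitVar[int]                                  # init-only variable
    registry: ClassVar[int] = 0                          # class variable
    tags: Dict[str, str] = field(default_factory=dict)   # mutable default
    secret: int = field(default=0, repr=False)           # excluded from repr
    balance: int = field(init=False)                     # "late init" field

    def __post_init__(self, scale: int) -> None:         # post-initialisation processing
        self.balance = 100 * scale


acct = Account("ada", 3)
print(acct.balance)            # 300
print("secret" in repr(acct))  # False
```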
There are a couple of minor differences, too:
- fields returns Dict[str, Type] instead of Dict[Field, Type] and has an additional parameter which filters internal fields
- Attempting to modify a frozen instance raises AttributeError with an explanation rather than FrozenInstanceError
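The exception difference is unlikely to break existing handlers: the standard library's FrozenInstanceError is a subclass of AttributeError, so code that catches AttributeError covers both. A quick check using only dataclasses (the Version class is illustrative):

```python
from dataclasses import FrozenInstanceError, dataclass


@dataclass(frozen=True)
class Version:
    major: int
    minor: int


v = Version(1, 0)
try:
    v.major = 2
except AttributeError as error:  # also catches FrozenInstanceError
    caught = type(error).__name__

print(caught)                                           # FrozenInstanceError
print(issubclass(FrozenInstanceError, AttributeError))  # True
```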
Finally, there are some quality of life improvements that, while not being directly implicated in migration, will allow you to make your code cleaner:
- @dataclass does not need to be applied to every subclass - its behaviour and options are inherited
- Unlike dataclasses, fields with defaults do not need to follow those without them. This is particularly useful when working with subclasses, where adding fields without defaults is almost impossible with dataclasses
- dataclassy adds a DataClass type annotation to represent variables that should be generic data class instances
- dataclassy has is_dataclass_instance, which is only suggested as a recipe for dataclasses, built in
- The generated comparison methods (when order=True) are compatible with supertypes and subtypes of the class. This means that heterogeneous collections of instances with the same superclass can be sorted
It is also worth noting that internally, dataclasses and dataclassy work in different ways. You can think of dataclassy as turning your class into a different type of thing (indeed, it uses a metaclass) and dataclasses as adding things to your class (it does not).
Examples
The basics
To define a data class, simply apply the @dataclass decorator to a class definition:
from dataclassy import dataclass
from typing import List

@dataclass  # with default parameters
class Pet:
    name: str
    age: int
    species: str
    foods: List[str] = []
    fluffy: bool
Without arguments to the decorator, the resulting class will behave very similarly to its equivalent from the built-in module. However, dataclassy's decorator has some additional options over dataclasses', and it is also inherited so that subclasses of data classes are automatically data classes too.
The decorator generates various methods for the class. Exactly which ones depends on the options to the decorator. For example, @dataclass(repr=False) will prevent a __repr__ method from being generated. @dataclass is equivalent to using the decorator with default parameters (i.e. @dataclass and @dataclass() are equivalent). Options to the decorator are detailed fully in the next section.
You can exclude a class attribute from dataclassy's mechanisms entirely by simply defining it without a type annotation. This can be used for class variables and constants.
Inheritance
Unlike dataclasses, dataclassy's decorator only needs to be applied once, and all subclasses will become data classes with the same options as the parent class. The decorator can still be reapplied to subclasses in order to apply new parameters.
To change the type, or to add or change the default value of a field in a subclass, simply redeclare it in the subclass.
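To illustrate why a metaclass makes this kind of inheritance possible, here is a minimal sketch in plain Python. It is not dataclassy's implementation: RecordMeta, Animal and Dog are hypothetical names, and the generated __init__ is deliberately simplistic (positional arguments only, no defaults).

```python
class RecordMeta(type):
    """Hypothetical metaclass that generates __init__ for each class it creates."""

    def __new__(mcs, name, bases, namespace):
        cls = super().__new__(mcs, name, bases, namespace)

        # Gather annotated names from the whole inheritance chain, bases first.
        names = []
        for klass in reversed(cls.__mro__):
            for field in getattr(klass, "__annotations__", {}):
                if field not in names:
                    names.append(field)

        def __init__(self, *args):
            if len(args) != len(names):
                raise TypeError(f"{name} expects {len(names)} arguments")
            for field, value in zip(names, args):
                setattr(self, field, value)

        cls.__init__ = __init__
        return cls


class Animal(metaclass=RecordMeta):
    name: str


class Dog(Animal):  # nothing is reapplied here; the metaclass is inherited
    breed: str


dog = Dog("Rex", "collie")
print(dog.name, dog.breed)  # Rex collie
```

Because a subclass is created by its parent's metaclass, the generation step runs again for Dog without any further annotation from the user.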
Post-initialisation processing
If an initialiser is requested (init=True), dataclassy automatically sets the attributes of the class upon initialisation. You can define code that should run after this happens - this is called post-init processing.
The method that contains this logic can have one of two names:
- __init__ - this originated back when dataclassy used __new__ for initialisation. It is recommended if you are comfortable with "magic" - see the note about dataclassy turning a class into a different thing. From this point of view, a data class (in contrast to a regular class) happens to perform special logic before __init__ is called. You must call super().__post_init__ instead of super().__init__ to prevent ambiguity.
- __post_init__ - compatible with dataclasses. Will not be called if init=False (like dataclasses) or if the class has no fields.
This logic can include, for example, calculating new fields based on the values of others. This is demonstrated in the following example:
@dataclass
class CustomInit:
    a: int
    b: int

    def __init__(self):
        self.c = self.a / self.b
In this example, when the class is instantiated with CustomInit(1, 2), the field c is calculated as 0.5.
Like with any class, your __init__ can also take parameters which exist only in the context of __init__. These can be used for arguments to the class that you do not want to store as fields. A parameter cannot have the name of a class field; this again is to prevent ambiguity.
Default values
Default values for fields work exactly as default arguments to functions (and in fact this is how they are implemented), with one difference: for copyable defaults, a copy is automatically created for each class instance. This means that a new copy of the list field foods in Pet above will be created each time it is instantiated, so that appending to that attribute in one instance will not affect other instances. A "copyable default" is defined as any object implementing a copy method, which includes all the built-in mutable collections (including defaultdict).
If you want to create new instances of objects which do not have a copy method, do so in __init__:
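The "copyable default" rule can be sketched in a few lines of plain Python. This is an illustration of the rule as described above, not dataclassy's own code, and fresh_default is a hypothetical helper name.

```python
from collections import defaultdict


def fresh_default(value):
    """Return value.copy() when the object has a copy method, else the object itself."""
    copy_method = getattr(value, "copy", None)
    return copy_method() if callable(copy_method) else value


shared = ["kibble"]
first, second = fresh_default(shared), fresh_default(shared)
first.append("tuna")
print(shared, second)  # ['kibble'] ['kibble']

counts = defaultdict(int)
print(fresh_default(counts).default_factory is int)  # True
print(fresh_default(42))                             # 42
```

Note that copy methods on the built-in collections are shallow, so nested mutable objects are still shared between copies.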
@dataclass
class CustomInit2:
    m: MyClass = None

    def __init__(self):
        self.m = MyClass()
API
Decorator
@dataclass(init=True, repr=True, eq=True, iter=False, frozen=False, kwargs=False, slots=False, hide_internals=True, meta=DataClassMeta)
The decorator used to signify that a class definition should become a data class. The decorator returns a new data class with generated methods as detailed below. If the class already defines a particular method, it will not be replaced with a generated one.
Without arguments, its behaviour is, superficially, almost identical to its equivalent in the built-in module. However, dataclassy's decorator only needs to be applied once, and all subclasses will become data classes with the same parameters. The decorator can still be reapplied to subclasses in order to change parameters.
A data class' fields are defined using Python's type annotations syntax. To change the type or default value of a field in a subclass, simply redeclare it.
This decorator takes advantage of two equally important features added in Python 3.6: variable annotations and dictionaries being ordered. (The latter is technically an implementation detail of Python 3.6, only becoming standardised in Python 3.7, but is the case for all current implementations of Python 3.6, i.e. CPython and PyPy.)
Decorator options
The term "field", as used in this section, refers to a class-level variable with a type annotation. For more information, see the documentation for fields() below.
init
If true (the default), generate an __init__ method that has as parameters all fields up its inheritance chain. These are ordered in definition order, with all fields with default values placed towards the end, following all fields without them. The method initialises the class by applying these parameters to the class as attributes.
This ordering is an important distinction from dataclasses, where all fields are simply ordered in definition order, and is what allows dataclassy's data classes to be far more flexible in terms of inheritance.
You can verify the signature of the generated initialiser for any class using signature from the inspect module. For example, print(inspect.signature(Pet)) will output (name: str, age: int, species: str, fluffy: bool, foods: List[str] = []).
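The ordering rule amounts to a stable partition of the fields: those without defaults first, those with defaults last, each group keeping its definition order. The sketch below reproduces that rule with the standard inspect module. It is an assumption-laden illustration, not the library's code, and ordered_signature is a hypothetical helper.

```python
import inspect
from typing import Any, Dict, List


def ordered_signature(annotations: Dict[str, Any], defaults: Dict[str, Any]) -> inspect.Signature:
    """Build a signature with default-less parameters first, preserving definition order."""
    required = [n for n in annotations if n not in defaults]
    optional = [n for n in annotations if n in defaults]
    parameters = [
        inspect.Parameter(
            n,
            inspect.Parameter.POSITIONAL_OR_KEYWORD,
            default=defaults.get(n, inspect.Parameter.empty),
            annotation=annotations[n],
        )
        for n in required + optional
    ]
    return inspect.Signature(parameters)


# The fields of Pet, in the order they are declared in the class body.
pet_fields = {"name": str, "age": int, "species": str, "foods": List[str], "fluffy": bool}
sig = ordered_signature(pet_fields, {"foods": []})
print(list(sig.parameters))  # ['name', 'age', 'species', 'fluffy', 'foods']
```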
A shallow copy will be created for mutable arguments (defined as those defining a copy method). This means that default field values that are mutable (e.g. a list) will not be mutated between instances.
repr
If true (the default), generate a __repr__ method that displays all fields (or if hide_internals is true, all fields excluding internal ones) of the data class instance and their values.
eq
If true (the default), generate an __eq__ method that compares this data class to another of the same type as if they were tuples created by as_tuple.
frozen
If true, instances are nominally immutable: fields cannot be overwritten or deleted after initialisation in __init__. Attempting to do so will raise an AttributeError. Warning: incurs a significant initialisation performance penalty.
unsafe_hash
If true, force the generation of a __hash__ method that attempts to hash the class as if it were a tuple of its hashable fields. If unsafe_hash is false, __hash__ will only be generated if eq and frozen are both true.
order
If true, a __lt__ method is generated, making the class orderable. If eq is also true, all other comparison methods are also generated. These methods compare this data class to another of the same type (or a subclass) as if they were tuples created by as_tuple. The normal rules of lexicographical comparison apply.
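For contrast, the comparison methods generated by the standard library's dataclasses only accept operands of exactly the same class, so a mixed list of a class and its subclass cannot be sorted directly. The sketch below shows that limitation and a key-based workaround, using only dataclasses. Shape and Circle are illustrative names.

```python
from dataclasses import astuple, dataclass


@dataclass(order=True)
class Shape:
    area: float


@dataclass(order=True)
class Circle(Shape):
    pass


shapes = [Circle(3.0), Shape(1.0), Circle(2.0)]

try:
    sorted(shapes)
    mixed_sort_ok = True
except TypeError:  # stdlib comparisons return NotImplemented across classes
    mixed_sort_ok = False
print(mixed_sort_ok)  # False

# Comparing the field tuples instead works for any mix of classes.
by_fields = sorted(shapes, key=astuple)
print([s.area for s in by_fields])  # [1.0, 2.0, 3.0]
```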
iter
If true, generate an __iter__ method that returns the values of the class's fields, in order of definition. This can be used to destructure a data class instance, as with a Scala case class or a Python namedtuple.
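Destructuring relies only on the iteration protocol, so the effect can be shown with a hand-written class. This is a sketch of the behaviour described above rather than the generated method itself, and Point is a hypothetical class.

```python
class Point:
    x: int
    y: int

    def __init__(self, x: int, y: int) -> None:
        self.x = x
        self.y = y

    def __iter__(self):
        # Yield field values in the order the annotations were declared.
        return (getattr(self, name) for name in type(self).__annotations__)


x, y = Point(3, 4)  # tuple-style unpacking works on any iterable
print(x, y)         # 3 4
```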
kwargs
If true, add **kwargs to the end of the parameter list for __init__. This simplifies data class instantiation from dictionaries that may have keys in addition to the fields of the data class (i.e. SomeDataClass(**some_dict)).
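To show the problem this option addresses, the sketch below uses the standard library's dataclasses, where an unexpected key raises TypeError and the dictionary has to be filtered by hand. User and record are illustrative names.

```python
from dataclasses import dataclass, fields


@dataclass
class User:
    name: str
    age: int


record = {"name": "ada", "age": 36, "extra": True}

try:
    User(**record)
    accepted = True
except TypeError:  # unexpected keyword argument 'extra'
    accepted = False
print(accepted)  # False

# Manual workaround: keep only the keys that correspond to fields.
known = {f.name for f in fields(User)}
user = User(**{key: value for key, value in record.items() if key in known})
print(user.name, user.age)  # ada 36
```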
slots
If true, generate a __slots__ attribute for the class. This reduces the memory footprint of instances and attribute lookup overhead. However, __slots__ come with a few restrictions (for example, multiple inheritance becomes tricky) that you should be aware of.
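The effect of __slots__ can be seen on a hand-written class: instances have no per-instance __dict__, and attributes that are not listed cannot be added. A small plain-Python sketch (Slotted is an illustrative name):

```python
class Slotted:
    __slots__ = ("x", "y")  # the only attributes instances may have

    def __init__(self, x, y):
        self.x = x
        self.y = y


point = Slotted(1, 2)
print(hasattr(point, "__dict__"))  # False

try:
    point.z = 3
    extended = True
except AttributeError:  # 'z' is not declared in __slots__
    extended = False
print(extended)  # False
```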
hide_internals
If true (the default), internal fields are not included in the generated __repr__.
meta
Set this parameter to use a metaclass other than dataclassy's own. This metaclass must subclass dataclassy.dataclass.DataClassMeta.
DataClassMeta is best considered less stable than the parts of the library available in the root namespace. Only use a custom metaclass if absolutely necessary.
Functions
is_dataclass(obj)
Returns True if obj is a data class as implemented in this module.
is_dataclass_instance(obj)
Returns True if obj is an instance of a data class as implemented in this module.
fields(dataclass, internals=False)
Return a dict of dataclass's fields and their types. internals selects whether to include internal fields. dataclass can be either a data class or an instance of a data class.
A field is defined as a class-level variable with a type annotation. Variables defined in the class without type annotations are completely excluded from dataclassy's consideration. Class variables and constants can therefore be indicated by the absence of type annotations.
values(dataclass, internals=False)
Return a dict of dataclass's fields and their values. internals selects whether to include internal fields. dataclass must be an instance of a data class.
as_dict(dataclass, dict_factory=dict)
Recursively create a dict of a data class instance's fields and their values.
This function is recursively called on data classes, named tuples and iterables.
as_tuple(dataclass)
Recursively create a tuple of the values of a data class instance's fields, in definition order.
This function is recursively called on data classes, named tuples and iterables.
make_dataclass(name, fields, defaults, bases=(), **options)
Dynamically create a data class with name name, fields fields, default field values defaults and inheriting from bases.
replace(dataclass, **changes)
Return a new copy of dataclass with field values replaced as specified in changes.
Type hints
Internal
The Internal type wrapper marks a field as being "internal" to the data class. Fields which begin with the "internal use" idiomatic indicator _ or the private field interpreter indicator __ are automatically treated as internal fields. The Internal type wrapper therefore serves as an alternative method of indicating that a field is internal for situations where you are unable to name your fields in this way.
DataClass
Use this type hint to indicate that a variable, parameter or field should be a generic data class instance. For example, dataclassy uses these in the signatures of as_dict, as_tuple and values to show that these functions should be called on data class instances.