A plug-and-play JIT implementation for Marshmallow to speed up data serialization and deserialization
🔥 Deep-Fried Marshmallow – Makes Marshmallow a Chicken Nugget
I need to be honest with you — I have no idea how to compare the speed of a marshmallow and the speed of a chicken nugget. I really liked that headline, though, so let's just assume that a nugget is indeed faster than a marshmallow. So is this project, Deep-Fried Marshmallow, faster than vanilla Marshmallow. Or, to be precise, it makes Marshmallow faster.
Deep-Fried Marshmallow implements a JIT for Marshmallow that speeds up dumping objects 3-5x (depending on your schema). Deep-Fried Marshmallow allows you to have the great API that Marshmallow provides without having to sacrifice performance.
Benchmark Result:
Original Dump Time: 220.50 usec/dump
Original Load Time: 536.51 usec/load
Optimized Dump Time: 58.67 usec/dump
Optimized Load Time: 118.44 usec/load
Speed up for dump: 3.76x
Speed up for load: 4.53x
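The speed-up figures are the ratio of original to optimized time. A quick check that uses only the timings reported above (the `speedup` helper is just an illustrative name, not part of the library):

```python
# Timings copied from the benchmark output above (microseconds per call).
original = {"dump": 220.50, "load": 536.51}
optimized = {"dump": 58.67, "load": 118.44}


def speedup(before_usec: float, after_usec: float) -> float:
    """Return how many times faster the optimized run is."""
    return before_usec / after_usec


# Round to two decimals, matching the precision of the benchmark report.
speedups = {op: round(speedup(original[op], optimized[op]), 2) for op in original}

for op, factor in speedups.items():
    print(f"Speed up for {op}: {factor:.2f}x")
```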
Deep-Fried Marshmallow is a fork of the great Toasted Marshmallow project that, sadly, has been abandoned for years. Deep-Fried Marshmallow introduces many changes that make it compatible with all recent versions of Marshmallow (3.13+). It also changes the way the library interacts with Marshmallow, which means that Marshmallow's code doesn't need to be forked and modified for the JIT magic to work. That's a whole new level of magic!
Installing Deep-Fried Marshmallow
pip install DeepFriedMarshmallow
# or, if your project uses Poetry:
poetry add DeepFriedMarshmallow
If your project doesn't have vanilla Marshmallow specified in its requirements, the latest version of it will be installed alongside Deep-Fried Marshmallow. You are free to pin any version of it that you need, as long as it's v3.13 or newer.
Enabling Deep-Fried Marshmallow
Enabling Deep-Fried Marshmallow on an existing schema takes a single code change: make your schemas inherit from the JitSchema class in the deepfriedmarshmallow package instead of Schema from marshmallow.
For example, this block of code:
from marshmallow import Schema, fields

class ArtistSchema(Schema):
    name = fields.Str()

class AlbumSchema(Schema):
    title = fields.Str()
    release_date = fields.Date()
    artist = fields.Nested(ArtistSchema())

schema = AlbumSchema()
Should become this:
from marshmallow import fields
from deepfriedmarshmallow import JitSchema

class ArtistSchema(JitSchema):
    name = fields.Str()

class AlbumSchema(JitSchema):
    title = fields.Str()
    release_date = fields.Date()
    artist = fields.Nested(ArtistSchema())

schema = AlbumSchema()
And that's it!
Auto-patching all Marshmallow schemas
If you want to automatically patch all Marshmallow schemas in your project, Deep-Fried Marshmallow provides a helper function for that. Just call deepfriedmarshmallow.deep_fry_marshmallow() before you start using Marshmallow schemas, and you're all set. The topmost __init__.py file of your project is a good place to do that.
# your_package/__init__.py
from deepfriedmarshmallow import deep_fry_marshmallow
deep_fry_marshmallow()
All imports of marshmallow.Schema will be automatically replaced with deepfriedmarshmallow.Schema with no other changes to your code. Isn't that sweet? Or rather, extra crispy?
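Why the call has to come before your schemas are imported can be illustrated with a toy. The sketch below is not Deep-Fried Marshmallow's implementation; it assumes, purely for illustration, a patcher that rebinds a module attribute. `fakelib`, `FriedSchema`, and `patch` are hypothetical names. It relies on one plain Python fact: `from module import Name` captures whatever object the name points to at import time.

```python
import types

# A stand-in for a library module; "fakelib" and both classes are hypothetical.
fakelib = types.ModuleType("fakelib")


class Schema:
    flavor = "vanilla"


class FriedSchema(Schema):
    flavor = "deep-fried"


fakelib.Schema = Schema

# Equivalent to `from fakelib import Schema` executed BEFORE patching.
early_binding = fakelib.Schema


def patch(module):
    """Toy patcher (an assumption): rebind the module attribute to the fast class."""
    module.Schema = FriedSchema


patch(fakelib)

# Equivalent to `from fakelib import Schema` executed AFTER patching.
late_binding = fakelib.Schema

print(early_binding.flavor)  # the name bound before the patch still sees the old class
print(late_binding.flavor)   # the name bound after the patch sees the replacement
```

Under that assumption, code that grabbed the class before the patch keeps the unpatched one, which is why a project's topmost `__init__.py` is a convenient place for the call.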
Custom Schema classes
Deep-Fried Marshmallow also provides a mixin class that you can use to create or extend custom Schema classes. To use it, just inherit from JitSchemaMixin.
Let's take a look at the following example:
from marshmallow import fields

class ClockSchema(MyAwesomeBaseSchema):
    time = fields.DateTime(data_key="Time")
If you want to make this schema JIT-compatible, and don't want to modify the MyAwesomeBaseSchema class to inherit from deepfriedmarshmallow.Schema, you can do the following:
from marshmallow import fields
from deepfriedmarshmallow import JitSchemaMixin

class ClockSchema(JitSchemaMixin, MyAwesomeBaseSchema):
    time = fields.DateTime(data_key="Time")
Patcher functions
If all of the above wasn't enough, Deep-Fried Marshmallow also provides two more ways to patch Marshmallow schemas. Both of them are functions that you can call to patch either a Schema class, or a Schema instance. Let's take a look at the following example:
from marshmallow import Schema, fields
from deepfriedmarshmallow import deep_fry_schema

class ArtistSchema(Schema):
    name = fields.Str()

deep_fry_schema(ArtistSchema)
schema = ArtistSchema()
The deep_fry_schema function will patch the ArtistSchema class, and all instances of it will be JIT-compatible. If you want to patch a specific instance of a schema, you can use the deep_fry_schema_object function:
from marshmallow import Schema, fields
from deepfriedmarshmallow import deep_fry_schema_object

class ArtistSchema(Schema):
    name = fields.Str()

schema = ArtistSchema()
deep_fry_schema_object(schema)
This function will patch the schema object, and all dumps and loads will be JIT-compatible. This function is useful if you want to patch a schema that you don't have control over, for example, a schema that is provided by a third-party library.
How it works
Deep-Fried Marshmallow works by generating code at runtime to optimize dumping objects without going through layers and layers of reflection. The generated code optimistically assumes the objects being passed in are schematically valid, falling back to the original Marshmallow code on failure.
For example, taking AlbumSchema from above, Deep-Fried Marshmallow will generate the following methods:
def InstanceSerializer(obj):
    res = {}
    value = obj.title; value = value() if callable(value) else value; value = str(value) if value is not None else None; res["title"] = value
    value = obj.release_date; value = value() if callable(value) else value; res["release_date"] = _field_release_date__serialize(value, "release_date", obj)
    value = obj.artist; value = value() if callable(value) else value; res["artist"] = _field_artist__serialize(value, "artist", obj)
    return res

def DictSerializer(obj):
    res = {}
    if "title" in obj:
        value = obj["title"]; value = value() if callable(value) else value; value = str(value) if value is not None else None; res["title"] = value
    if "release_date" in obj:
        value = obj["release_date"]; value = value() if callable(value) else value; res["release_date"] = _field_release_date__serialize(value, "release_date", obj)
    if "artist" in obj:
        value = obj["artist"]; value = value() if callable(value) else value; res["artist"] = _field_artist__serialize(value, "artist", obj)
    return res

def HybridSerializer(obj):
    res = {}
    try:
        value = obj["title"]
    except (KeyError, AttributeError, IndexError, TypeError):
        value = obj.title
    value = value; value = value() if callable(value) else value; value = str(value) if value is not None else None; res["title"] = value
    try:
        value = obj["release_date"]
    except (KeyError, AttributeError, IndexError, TypeError):
        value = obj.release_date
    value = value; value = value() if callable(value) else value; res["release_date"] = _field_release_date__serialize(value, "release_date", obj)
    try:
        value = obj["artist"]
    except (KeyError, AttributeError, IndexError, TypeError):
        value = obj.artist
    value = value; value = value() if callable(value) else value; res["artist"] = _field_artist__serialize(value, "artist", obj)
    return res
Deep-Fried Marshmallow will invoke the proper serializer based upon the input.
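To make the idea concrete, here is a minimal, standard-library-only sketch of the same technique: build serializer source code from a list of field names, compile it with `exec`, pick a serializer based on the input type, and fall back to a slower generic path if the fast one fails. Every name here (`build_serializers`, `generic_dump`, `make_dump`) is hypothetical, and the sketch ignores field types, nesting, and validation, so treat it as an illustration rather than what the library actually does.

```python
from types import SimpleNamespace


def build_serializers(field_names):
    """Generate attribute-based and key-based serializers as source, then compile them."""
    lines = ["def instance_serializer(obj):", "    res = {}"]
    for name in field_names:
        lines.append(f"    res[{name!r}] = obj.{name}")
    lines.append("    return res")
    lines += ["def dict_serializer(obj):", "    res = {}"]
    for name in field_names:
        lines.append(f"    if {name!r} in obj:")
        lines.append(f"        res[{name!r}] = obj[{name!r}]")
    lines.append("    return res")
    namespace = {}
    exec("\n".join(lines), namespace)  # compile the generated source once
    return namespace["instance_serializer"], namespace["dict_serializer"]


def generic_dump(obj, field_names):
    """Stand-in for the slower, reflection-style path used as a fallback."""
    res = {}
    for name in field_names:
        if isinstance(obj, dict):
            if name in obj:
                res[name] = obj[name]
        elif hasattr(obj, name):
            res[name] = getattr(obj, name)
    return res


def make_dump(field_names):
    """Return a dump function that dispatches on input type and falls back on failure."""
    instance_serializer, dict_serializer = build_serializers(field_names)

    def dump(obj):
        fast = dict_serializer if isinstance(obj, dict) else instance_serializer
        try:
            return fast(obj)  # optimistic fast path
        except Exception:
            return generic_dump(obj, field_names)

    return dump


dump = make_dump(["title", "release_date"])
from_dict = dump({"title": "Example Album", "release_date": "2020-01-01"})
from_obj = dump(SimpleNamespace(title="Example Album", release_date="2020-01-01"))
# A missing attribute makes the generated code raise, so the fallback handles it.
partial = dump(SimpleNamespace(title="Example Album"))
```

The generated functions contain no loops or lookups over field definitions at call time, which is where the savings come from in this toy version.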
Since Deep-Fried Marshmallow generates code at runtime, it's critical you re-use Schema objects. If you're creating a new Schema object every time you serialize or deserialize an object, you're likely to experience much worse performance.
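The cost of ignoring this advice can be shown with a toy. `ToySchema` below is a hypothetical stand-in that compiles a serializer in its constructor; when exactly Deep-Fried Marshmallow compiles is an assumption here, but the point holds as long as generated code belongs to a schema instance.

```python
class ToySchema:
    """Hypothetical schema that generates and compiles a serializer per instance."""

    compile_count = 0  # how many times code generation has run

    def __init__(self, field_names):
        body = ", ".join(f"{name!r}: obj[{name!r}]" for name in field_names)
        namespace = {}
        exec(f"def serialize(obj):\n    return {{{body}}}", namespace)
        self._serialize = namespace["serialize"]
        ToySchema.compile_count += 1

    def dump(self, obj):
        return self._serialize(obj)


records = [{"name": f"artist-{i}"} for i in range(1000)]

# Anti-pattern: a new schema, and therefore a fresh compile, for every record.
for record in records:
    ToySchema(["name"]).dump(record)
per_call_compiles = ToySchema.compile_count

# Preferred: build the schema once and reuse it for every record.
ToySchema.compile_count = 0
shared = ToySchema(["name"])
results = [shared.dump(record) for record in records]
reused_compiles = ToySchema.compile_count

print(per_call_compiles, reused_compiles)
```

In practice this usually means creating schemas at module level, or caching them, instead of instantiating them inside request handlers or loops.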
Special thanks to
- @rowillia/@lyft — for creating Toasted Marshmallow
- @taion — for a PoC of injecting the JIT compiler by replacing the marshaller
- @Kalepa — for needing improved Marshmallow performance so that I could actually work on this project 😅
License
See LICENSE for details.
Contributing
Contributions, issues and feature requests are welcome!
Feel free to check existing issues before reporting a new one.
Show your support
Give this repository a ⭐️ if this project helped you!