Marshmallow Schema generator for pandas and numpy

marshmallow-dataframe

marshmallow-dataframe is a library that helps you generate marshmallow Schemas for Pandas DataFrames.

Usage

Let's start by creating an example dataframe for which we want to create a Schema. This dataframe has four columns: two of them are of string type, one is a float, and the last one is an integer.

import pandas as pd
import numpy as np
from marshmallow_dataframe import SplitDataFrameSchema

animal_df = pd.DataFrame(
    [
        ("falcon", "bird", 389.0, 2),
        ("parrot", "bird", 24.0, 2),
        ("lion", "mammal", 80.5, 4),
        ("monkey", "mammal", np.nan, 4),
    ],
    columns=["name", "class", "max_speed", "num_legs"],
)

You can then create a marshmallow schema that validates and loads dataframes that have the same structure as the one above and that were serialized with DataFrame.to_json using orient="split":

class AnimalSchema(SplitDataFrameSchema):
    """Automatically generated schema for animal dataframe"""

    dtypes = animal_df.dtypes
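
For reference, the shape of an orient="split" payload can be inspected with pandas alone. This is a minimal sketch: `small_df` and `payload` are hypothetical names introduced for illustration, and the snippet shows only what pandas emits, not anything specific to marshmallow-dataframe.

```python
import json

import pandas as pd

# Hypothetical two-row frame, used only to show the payload shape.
small_df = pd.DataFrame(
    [("falcon", "bird", 389.0, 2), ("lion", "mammal", 80.5, 4)],
    columns=["name", "class", "max_speed", "num_legs"],
)

# orient="split" serializes column names, index labels and row data
# under three separate top-level keys.
payload = json.loads(small_df.to_json(orient="split"))

print(sorted(payload))     # the three top-level keys
print(payload["columns"])  # column names, in order
print(payload["data"][0])  # first row as a plain list
```

The three keys match the "data", "columns" and "index" entries in the payloads used in the rest of this section.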

When passing a valid payload for a new animal, this schema will validate it and build a dataframe:

animal_schema = AnimalSchema()

new_animal = {
    "data": [("leopard", "mammal", 58.0, 4), ("ant", "insect", 0.288, 6)],
    "columns": ["name", "class", "max_speed", "num_legs"],
    "index": [0, 1],
}

new_animal_df = animal_schema.load(new_animal)

print(type(new_animal_df))
# <class 'pandas.core.frame.DataFrame'>
print(new_animal_df)
#       name   class  max_speed  num_legs
# 0  leopard  mammal     58.000         4
# 1      ant  insect      0.288         6

However, if we pass a payload that doesn't conform to the schema, load raises a marshmallow ValidationError with an informative message describing the errors:

invalid_animal = {
    "data": [("leopard", "mammal", 58.0, "four")],  # num_legs is not an int
    "columns": ["name", "class", "num_legs"],  # missing max_speed column
    "index": [0],
}

animal_schema.load(invalid_animal)

# Raises:
# marshmallow.exceptions.ValidationError: {
#     'columns': ["Must be equal to ['name', 'class', 'max_speed', 'num_legs']."],
#     'data': {0: {3: ['Not a valid integer.']}}
# }

marshmallow-dataframe can also generate Schemas for the orient=records format: follow the steps above, but use marshmallow_dataframe.RecordsDataFrameSchema as the superclass of AnimalSchema.
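
To see how the records orientation differs from split, you can again look at what pandas produces. This is a sketch using pandas only, with `small_df` and `records` as hypothetical names. How RecordsDataFrameSchema expects the list of records to be wrapped in the payload is not shown here and should be checked against the library itself.

```python
import json

import pandas as pd

# Hypothetical two-row frame, used only to show the payload shape.
small_df = pd.DataFrame(
    [("falcon", "bird", 389.0, 2), ("lion", "mammal", 80.5, 4)],
    columns=["name", "class", "max_speed", "num_legs"],
)

# orient="records" serializes each row as its own {column: value} mapping;
# the index is not included.
records = json.loads(small_df.to_json(orient="records"))

print(len(records))
print(records[0])
```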

Installation

marshmallow-dataframe requires Python >= 3.6 and marshmallow >= 3.0. You can install it with pip:

pip install marshmallow-dataframe

Contributing

Contributions are welcome!

You can report a problem or feature request in the issue tracker. If you feel that you can fix it or implement it, please submit a pull request referencing the issues it solves.

Unit tests written using the pytest framework are in the tests directory, and are run using tox on Python 3.6 and 3.7. You can run the tests by installing tox:

pip install tox

Then run tox to execute the linters and tests for all Python versions, or target a specific Python version with:

tox -e py36

We format the code with black. You can format your checkout of the code before committing it by running:

tox -e black -- .

Download files

Source Distribution

marshmallow-dataframe-0.0.3.tar.gz (11.6 kB)

Uploaded Source

Built Distribution

marshmallow_dataframe-0.0.3-py3-none-any.whl (8.8 kB)

Uploaded Python 3
