Marshmallow Schema generator for pandas dataframes

marshmallow-dataframe

marshmallow-dataframe is a library that helps you generate marshmallow Schemas for Pandas DataFrames.

Usage

Let's start by creating an example dataframe for which we want to create a Schema. This dataframe has four columns: two of them are of string type, one is a float, and the last one is an integer.

import pandas as pd
import numpy as np
from marshmallow_dataframe import SplitDataFrameSchema

animal_df = pd.DataFrame(
    [
        ("falcon", "bird", 389.0, 2),
        ("parrot", "bird", 24.0, 2),
        ("lion", "mammal", 80.5, 4),
        ("monkey", "mammal", np.nan, 4),
    ],
    columns=["name", "class", "max_speed", "num_legs"],
)

You can then create a marshmallow schema that validates and loads dataframes with the same structure as the one above, provided they have been serialized with DataFrame.to_json using orient="split". The dtypes attribute of the Meta class is required; other marshmallow Schema options can also be passed as attributes of Meta:

class AnimalSchema(SplitDataFrameSchema):
    """Automatically generated schema for animal dataframe"""

    class Meta:
        dtypes = animal_df.dtypes

When passing a valid payload for a new animal, this schema will validate it and build a dataframe:

animal_schema = AnimalSchema()

new_animal = {
    "data": [("leopard", "mammal", 58.0, 4), ("ant", "insect", 0.288, 6)],
    "columns": ["name", "class", "max_speed", "num_legs"],
    "index": [0, 1],
}

new_animal_df = animal_schema.load(new_animal)

print(type(new_animal_df))
# <class 'pandas.core.frame.DataFrame'>
print(new_animal_df)
#       name   class  max_speed  num_legs
# 0  leopard  mammal     58.000         4
# 1      ant  insect      0.288         6

However, if we pass a payload that doesn't conform to the schema, load raises a marshmallow ValidationError with an informative message about the errors:

invalid_animal = {
    "data": [("leopard", "mammal", 58.0, "four")],  # num_legs is not an int
    "columns": ["name", "class", "num_legs"],  # missing max_speed column
    "index": [0],
}

animal_schema.load(invalid_animal)

# Raises:
# marshmallow.exceptions.ValidationError: {
#     'columns': ["Must be equal to ['name', 'class', 'max_speed', 'num_legs']."],
#     'data': {0: {3: ['Not a valid integer.']}}
# }

marshmallow_dataframe can also generate Schemas for the orient=records format by following the above steps but using marshmallow_dataframe.RecordsDataFrameSchema as the superclass for AnimalSchema.
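
To see how the records orientation differs from split, the pandas output can again be inspected directly. This is a sketch using pandas only, with illustrative variable names; how RecordsDataFrameSchema expects the records to be wrapped in a payload is not shown here and should be checked against the library itself.

```python
import json

import pandas as pd

df = pd.DataFrame(
    [("falcon", "bird", 389.0, 2), ("lion", "mammal", 80.5, 4)],
    columns=["name", "class", "max_speed", "num_legs"],
)

# With orient="records", each row becomes one JSON object keyed by column name.
records = json.loads(df.to_json(orient="records"))

print(len(records))  # 2
print(records[0])
# {'name': 'falcon', 'class': 'bird', 'max_speed': 389.0, 'num_legs': 2}
```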

Installation

marshmallow-dataframe requires Python >= 3.6 and marshmallow >= 3.0. You can install it with pip:

pip install marshmallow-dataframe

Contributing

Contributions are welcome!

You can report a problem or feature request in the issue tracker. If you feel that you can fix it or implement it, please submit a pull request referencing the issues it solves.

Unit tests written using the pytest framework are in the tests directory, and are run using tox on Python 3.6 and 3.7. You can run the tests by installing tox:

pip install tox

and then running tox to execute the linters and tests for all Python versions, or, for a specific Python version, running:

tox -e py36

We format the code with black. You can format your checkout of the code before committing it by running:

tox -e black -- .
