Marshmallow Schema generator for pandas dataframes
marshmallow-dataframe
marshmallow-dataframe is a library that helps you generate marshmallow Schemas for Pandas DataFrames.
Usage
Let's start by creating an example dataframe for which we want to create a Schema. This dataframe has four columns: two of them are of string type, one is a float, and the last one is an integer.
import pandas as pd
import numpy as np
from marshmallow_dataframe import SplitDataFrameSchema

animal_df = pd.DataFrame(
    [
        ("falcon", "bird", 389.0, 2),
        ("parrot", "bird", 24.0, 2),
        ("lion", "mammal", 80.5, 4),
        ("monkey", "mammal", np.nan, 4),
    ],
    columns=["name", "class", "max_speed", "num_legs"],
)
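Since the schema is later derived from the dataframe's dtypes, it can help to look at them first. The following is a minimal sketch that uses only pandas and numpy on a shortened copy of the example dataframe; it does not call marshmallow-dataframe itself.

```python
import numpy as np
import pandas as pd

# Shortened copy of the example dataframe from above.
animal_df = pd.DataFrame(
    [
        ("falcon", "bird", 389.0, 2),
        ("monkey", "mammal", np.nan, 4),
    ],
    columns=["name", "class", "max_speed", "num_legs"],
)

# One dtype per column; this Series is what gets assigned to Meta.dtypes.
print(animal_df.dtypes)

# The missing value (np.nan) keeps max_speed a float column,
# while num_legs is inferred as a 64-bit integer column.
max_speed_dtype = str(animal_df.dtypes["max_speed"])
num_legs_dtype = str(animal_df.dtypes["num_legs"])
```

The printed dtype of the two string columns depends on the installed pandas version, so it is not shown here.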
You can then create a marshmallow schema that will validate and load dataframes that follow the same structure as the one above and that have been serialized with DataFrame.to_json with the orient=split format.
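To make the orient=split format concrete, here is a small sketch that serializes a shortened copy of the example dataframe with pandas and inspects the result with the standard json module. It only illustrates the payload shape and does not involve marshmallow-dataframe.

```python
import json

import numpy as np
import pandas as pd

animal_df = pd.DataFrame(
    [
        ("falcon", "bird", 389.0, 2),
        ("monkey", "mammal", np.nan, 4),
    ],
    columns=["name", "class", "max_speed", "num_legs"],
)

# orient="split" produces one JSON object with three keys.
payload = json.loads(animal_df.to_json(orient="split"))

print(sorted(payload))
# ['columns', 'data', 'index']
print(payload["columns"])
# ['name', 'class', 'max_speed', 'num_legs']
print(payload["data"][1])
# ['monkey', 'mammal', None, 4]  (NaN is serialized as JSON null)
```

A dictionary of this shape is the kind of payload the schema described here is meant to load.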
The dtypes attribute of the Meta class is required, and other marshmallow Schema options can also be passed as attributes of Meta:
class AnimalSchema(SplitDataFrameSchema):
    """Automatically generated schema for animal dataframe"""

    class Meta:
        dtypes = animal_df.dtypes
When passing a valid payload for a new animal, this schema will validate it and build a dataframe:
animal_schema = AnimalSchema()

new_animal = {
    "data": [("leopard", "mammal", 58.0, 4), ("ant", "insect", 0.288, 6)],
    "columns": ["name", "class", "max_speed", "num_legs"],
    "index": [0, 1],
}

new_animal_df = animal_schema.load(new_animal)

print(type(new_animal_df))
# <class 'pandas.core.frame.DataFrame'>

print(new_animal_df)
#       name   class  max_speed  num_legs
# 0  leopard  mammal     58.000         4
# 1      ant  insect      0.288         6
However, if we pass a payload that doesn't conform to the schema, it will raise a marshmallow ValidationError exception with an informative message about the errors:
invalid_animal = {
    "data": [("leopard", "mammal", 58.0, "four")],  # num_legs is not an int
    "columns": ["name", "class", "num_legs"],  # missing max_speed column
    "index": [0],
}

animal_schema.load(invalid_animal)
# Raises:
# marshmallow.exceptions.ValidationError: {
#     'columns': ["Must be equal to ['name', 'class', 'max_speed', 'num_legs']."],
#     'data': {0: {3: ['Not a valid integer.']}}
# }
marshmallow_dataframe can also generate Schemas for the orient=records format by following the above steps but using marshmallow_dataframe.RecordsDataFrameSchema as the superclass for AnimalSchema.
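For comparison with the split format, the following sketch shows what pandas produces with orient=records: a list with one object per row, keyed by column name. As before, it uses only pandas and the standard json module on a shortened copy of the example dataframe.

```python
import json

import numpy as np
import pandas as pd

animal_df = pd.DataFrame(
    [
        ("falcon", "bird", 389.0, 2),
        ("monkey", "mammal", np.nan, 4),
    ],
    columns=["name", "class", "max_speed", "num_legs"],
)

# orient="records" produces a JSON array of row objects; the index is dropped.
records = json.loads(animal_df.to_json(orient="records"))

print(len(records))
# 2
print(records[0])
# {'name': 'falcon', 'class': 'bird', 'max_speed': 389.0, 'num_legs': 2}
```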
Installation
marshmallow-dataframe requires Python >= 3.6 and marshmallow >= 3.0. You can install it with pip:
pip install marshmallow-dataframe
Contributing
Contributions are welcome!
You can report a problem or feature request in the issue tracker. If you feel that you can fix it or implement it, please submit a pull request referencing the issues it solves.
Unit tests written using the pytest framework are in the tests directory, and are run using tox on Python 3.6 and 3.7. You can run the tests by installing tox:
pip install tox
and running the linters and tests for all Python versions by running tox, or for a specific Python version by running:
tox -e py36
We format the code with black, and you can format your checkout of the code before committing it by running:
tox -e black -- .