Static type checking of pandas DataFrames
I love Pandas! But in production code I’m always a bit wary when I see:
import pandas as pd

def foo(df: pd.DataFrame) -> pd.DataFrame:
    # do stuff
    return df
Because… How do I know which columns are supposed to be in df?
Using strictly_typed_pandas, we can be more explicit about what these data should look like.
from strictly_typed_pandas import DataSet

class Schema:
    id: int
    name: str

def foo(df: DataSet[Schema]) -> DataSet[Schema]:
    # do stuff
    return df
Where DataSet:
– is a subclass of pd.DataFrame and hence has the same functionality as DataFrame.
– validates whether the data adheres to the provided schema upon its initialization.
– is immutable, so its schema cannot be changed through in-place modifications.
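To make "validates whether the data adheres to the provided schema" concrete, here is a minimal sketch of such a check written with plain pandas. It is not how strictly_typed_pandas implements validation: the helper check_schema and the DTYPE_CHECKS mapping are hypothetical names introduced only for illustration.

```python
from typing import get_type_hints

import pandas as pd
from pandas.api import types as pdt

# Hypothetical mapping from a schema annotation to a pandas dtype check.
DTYPE_CHECKS = {
    int: pdt.is_integer_dtype,
    float: pdt.is_float_dtype,
    bool: pdt.is_bool_dtype,
    str: pdt.is_string_dtype,
}


class Schema:
    id: int
    name: str


def check_schema(df: pd.DataFrame, schema: type) -> None:
    """Raise TypeError if df's columns or dtypes do not match the schema."""
    expected = get_type_hints(schema)
    # First check: the column names must match the annotated attribute names.
    if set(df.columns) != set(expected):
        raise TypeError(
            f"expected columns {sorted(expected)}, got {sorted(df.columns)}"
        )
    # Second check: each column's dtype must fit its annotation.
    for column, annotation in expected.items():
        if not DTYPE_CHECKS[annotation](df[column]):
            raise TypeError(
                f"column {column!r} has dtype {df[column].dtype}, "
                f"expected {annotation.__name__}"
            )


good = pd.DataFrame({"id": [1, 2], "name": ["John", "Jane"]})
check_schema(good, Schema)  # passes silently

bad = pd.DataFrame({"id": [1, 2], "job": ["Engineer", "Doctor"]})
try:
    check_schema(bad, Schema)
except TypeError as error:
    print(f"rejected: {error}")
```

Running a check like this once, when the object is created, is what lets every function downstream rely on the annotation instead of re-inspecting the columns.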
The DataSet[Schema] annotations are compatible with:
– mypy for type checking during linting-time (i.e. while you write your code).
– typeguard for type checking during run-time (i.e. while you run your unit tests).
To get the most out of strictly_typed_pandas, be sure to:
– set up mypy in your IDE.
– run your unit tests with pytest --typeguard-packages=foo.bar (where foo.bar is your package name).
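Why can a static checker tell two schemas apart at all? The sketch below uses only the standard library to show the general mechanism: a class parametrized by a schema type. TypedFrame, Person, Job and count_people are hypothetical stand-ins, not part of strictly_typed_pandas, and the class does no validation of its own.

```python
from typing import Dict, Generic, List, TypeVar

T = TypeVar("T")


class TypedFrame(Generic[T]):
    """Hypothetical schema-parametrized container (a stand-in, not DataSet)."""

    def __init__(self, rows: List[Dict[str, object]]) -> None:
        self.rows = rows


class Person:
    id: int
    name: str


class Job:
    id: int
    title: str


def count_people(df: TypedFrame[Person]) -> int:
    return len(df.rows)


people: TypedFrame[Person] = TypedFrame([{"id": 1, "name": "John"}])
jobs: TypedFrame[Job] = TypedFrame([{"id": 1, "title": "Engineer"}])

count_people(people)  # accepted by mypy
# count_people(jobs)  # mypy reports an incompatible argument type on this call
```

The commented-out call would run without error in plain Python, because nothing inspects the type parameter at run time. That gap is the reason to pair mypy with a run-time checker such as typeguard in your unit tests.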
Installation
For now, please install strictly_typed_pandas directly from GitHub.
pip install git+https://github.com/nanne-aben/strictly_typed_pandas
Documentation
For example notebooks and API documentation, please see our ReadTheDocs.
FAQ
How is this different from Dataenforce / Pandera?
The main difference: strictly_typed_pandas works really well with mypy, allowing you to catch many errors during linting-time (i.e. while you're coding), rather than during run-time.
Why use Python if you want static typing?
There are just so many good packages for data science in Python. Rather than sacrificing all of that by moving to a different language, I’d like to make the Pythonverse a little bit better.
I found a bug! What should I do?
Great! Contact me and I’ll look into it.
I have a great idea to improve strictly_typed_pandas! How can we make this work?
Awesome, drop me a line!