Typed DataFrames
Pandas DataFrame subclasses that enforce structure and can self-organize. Because your functions can’t exactly accept any DataFrame.
The subclassed DataFrames can have required and/or optional columns and indices, and support custom requirements.
Columns are automatically turned into indices, which means `read_csv` and `to_csv` are always inverses: reading a file written with `mydf.to_csv(path)` back in with `MyDf.read_csv(path)` gives you `mydf` again.
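To see why this matters, here is a minimal sketch in plain pandas (not the typeddfs implementation). A plain CSV round trip demotes a named index to an ordinary column; re-promoting the known index columns after reading restores the original frame. The helper `read_with_index` is hypothetical and exists only for illustration.

```python
import io

import pandas as pd

# A frame indexed by "key", as a typed KeyValue frame would be
df = pd.DataFrame({"key": ["abc", "def"], "value": [123, 456]}).set_index("key")

# Plain pandas: to_csv writes the index out as a column named "key" ...
text = df.to_csv()
# ... but read_csv brings it back as an ordinary column with a fresh RangeIndex
plain = pd.read_csv(io.StringIO(text))
print(list(plain.columns))  # ['key', 'value']


def read_with_index(csv_text: str, index_cols: list[str]) -> pd.DataFrame:
    """Hypothetical helper: read CSV text, then move known index columns back into the index."""
    out = pd.read_csv(io.StringIO(csv_text), dtype={c: str for c in index_cols})
    return out.set_index(index_cols)


restored = read_with_index(text, ["key"])
print(restored.equals(df))  # True
```

Knowing which columns belong in the index is exactly the information a typed DataFrame class carries, which is what lets its `read_csv` undo `to_csv`.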
The DataFrames will display nicely in Jupyter notebooks, and a few convenience methods are added, such as `sort_natural` and `drop_cols`.
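Natural sorting orders embedded numbers by numeric value instead of character by character. typeddfs delegates this to natsort; the sketch below instead uses a small hand-written key function (`natural_key`, hypothetical) built only on `re`, applied to a DataFrame index to show the idea. It is not a reimplementation of `sort_natural`.

```python
import re

import pandas as pd


def natural_key(s: str) -> list:
    """Split s into text and digit runs; digit runs compare as integers."""
    # With a capturing group, re.split alternates text, digits, text, ...
    # so odd positions are always digit runs and element types line up across keys.
    parts = re.split(r"(\d+)", s)
    return [int(p) if i % 2 else p.casefold() for i, p in enumerate(parts)]


df = pd.DataFrame({"value": [10, 2, 1]}, index=["file10", "file2", "file1"])

print(sorted(df.index))  # ['file1', 'file10', 'file2'] (plain lexicographic order)
by_natural = df.loc[sorted(df.index, key=natural_key)]
print(list(by_natural.index))  # ['file1', 'file2', 'file10']
```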
See the docs for more information.
Run `pip install typeddfs[hdf5]` to install.
Please note that HDF5 support via PyTables is unavailable for Python 3.9 on Windows as of 2021-02-03.
A simple example, for a CSV like this:
| key | value | note |
|-----|-------|------|
| abc | 123   | ?    |
```python
from typeddfs import TypedDfs

# Build me a Key-Value-Note class!
KeyValue = (
    TypedDfs.typed("KeyValue")  # typed means enforced requirements
    .require("key", dtype=str, index=True)  # automagically make this an index
    .require("value")  # required
    .reserve("note")  # permitted but not required
    .strict()  # don't allow other columns
).build()

# This will self-organize and use "key" as the index
# ("example.csv" holds the table above):
df = KeyValue.read_csv("example.csv")

# For fun, let's write it and read it back:
df.to_csv("remake.csv")
df = KeyValue.read_csv("remake.csv")
print(df.index_names(), df.column_names())  # ["key"], ["value", "note"]


# And now, we can type a function to require a KeyValue,
# and let it raise an `InvalidDfError` (here, a `MissingColumnError`
# if a required column is absent):
def my_special_function(df: KeyValue) -> float:
    return KeyValue(df)["value"].sum()
```
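Conceptually, the builder calls above describe a check-and-reorganize step. The following is a minimal plain-pandas sketch of what such a step could look like for the `KeyValue` spec (required `key` and `value`, reserved `note`, strict, `key` as a string index). It is an assumption-based illustration, not how typeddfs is implemented; `conform_key_value`, `MissingColumnSketchError`, and `UnexpectedColumnSketchError` are hypothetical stand-ins for the library's own machinery and errors.

```python
import pandas as pd

REQUIRED = ["key", "value"]  # must be present
RESERVED = ["note"]  # allowed but optional
INDEX = ["key"]  # moved into the index


class MissingColumnSketchError(ValueError):
    """Hypothetical stand-in for a missing-column error."""


class UnexpectedColumnSketchError(ValueError):
    """Hypothetical stand-in for the error a strict spec raises on extra columns."""


def conform_key_value(df: pd.DataFrame) -> pd.DataFrame:
    # Flatten any named index levels so every field is checked as a column
    if any(name is not None for name in df.index.names):
        df = df.reset_index()
    missing = [c for c in REQUIRED if c not in df.columns]
    if missing:
        raise MissingColumnSketchError(f"missing required columns: {missing}")
    extra = [c for c in df.columns if c not in REQUIRED + RESERVED]
    if extra:  # strict: nothing outside the required and reserved names
        raise UnexpectedColumnSketchError(f"unexpected columns: {extra}")
    # Enforce the dtype for "key", then self-organize it into the index
    return df.astype({"key": str}).set_index(INDEX)


raw = pd.DataFrame({"key": ["abc"], "value": [123], "note": ["?"]})
kv = conform_key_value(raw)
print(kv.index.name, list(kv.columns))  # key ['value', 'note']
```

Flattening the index first makes the step idempotent: passing an already-conformed frame through it again yields the same frame.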
All of the normal DataFrame methods are available.
Use `.untyped()` or `.vanilla()` to make a detyped copy that doesn't enforce requirements.
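The difference between a typed frame and a detyped copy comes down to its class. Here is a minimal sketch using pandas' subclassing hook `_constructor`; the class `TypedSketch` and its method `vanilla_sketch` are hypothetical and are not typeddfs's code.

```python
import pandas as pd


class TypedSketch(pd.DataFrame):
    """Hypothetical DataFrame subclass standing in for a typed frame."""

    @property
    def _constructor(self):
        # Tells pandas to return TypedSketch from methods like head() or copy()
        return TypedSketch

    def vanilla_sketch(self) -> pd.DataFrame:
        # Calling the base constructor directly yields a plain DataFrame copy
        return pd.DataFrame(self, copy=True)


t = TypedSketch({"value": [1, 2, 3]})
print(type(t.head(2)).__name__)  # TypedSketch
print(type(t.vanilla_sketch()).__name__)  # DataFrame
```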
A small note of caution: as of version 0.5, natsort is no longer pinned to a specific major version, because it receives fairly frequent major updates.
This means that the output of `sort_natural` could change when a new major version of natsort is released.
If that matters to you, pin natsort to a specific major version yourself; e.g. `natsort = "^7"` with Poetry.
Typed-Dfs is licensed under the Apache License, version 2.0.
New issues and pull requests are welcome.
Please refer to the contributing guide.
Generated with Tyrannosaurus.