Typed DataFrames
Pandas DataFrame subclasses that enforce structure and can self-organize. Because your functions can’t exactly accept any DataFrame.
The subclassed DataFrames can have required and/or optional columns and indices,
and support custom requirements.
Columns are automatically turned into indices,
which means read_csv and to_csv are always inverses.
MyDf.read_csv(mydf.to_csv()) is just mydf.
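To see why the round trip matters, here is a minimal sketch in plain pandas (not typeddfs itself). It shows that a naive read loses the index, and that promoting a known index column after reading restores it. The helper `read_organized` and the sample data are hypothetical and only illustrate the idea; they are not how the library is implemented.

```python
import io

import pandas as pd

# Hypothetical data matching the key/value/note example.
original = pd.DataFrame(
    {"key": ["abc"], "value": [123], "note": ["?"]}
).set_index("key")

# Plain pandas writes the index as an ordinary column ...
text = original.to_csv()
# ... and reads it back as an ordinary column, so the index is lost.
naive = pd.read_csv(io.StringIO(text))


def read_organized(csv_text: str, index_cols: list) -> pd.DataFrame:
    """Hypothetical helper: read a CSV, then promote the known index columns."""
    df = pd.read_csv(io.StringIO(csv_text))
    return df.set_index(index_cols)


restored = read_organized(text, ["key"])
print(list(naive.columns))     # 'key' is a plain column here
print(list(restored.columns))  # 'key' is the index again
```

A typed subclass knows its index columns up front, so the caller never has to pass something like `index_cols` by hand.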
The DataFrames will display nicely in Jupyter notebooks,
and a few convenience methods are added, such as sort_natural and drop_cols.
See the docs for more information.
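As a rough illustration of what natural ordering means, the sketch below sorts a column so that item2 comes before item10. It assumes sort_natural refers to natural ordering in this usual sense. The helpers `natural_key` and `sort_naturally` are hypothetical and written in plain pandas; they are not the library's method or its signature.

```python
import re

import pandas as pd


def natural_key(text: str) -> list:
    """Split into non-digit and digit runs; digit runs compare as integers."""
    parts = re.split(r"(\d+)", text)
    # re.split with one capture group alternates: text, digits, text, ...
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


def sort_naturally(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: reorder rows by the natural order of one column."""
    values = df[column].astype(str).tolist()
    order = sorted(range(len(values)), key=lambda i: natural_key(values[i]))
    return df.iloc[order]


df = pd.DataFrame({"name": ["item10", "item2", "item1"], "n": [10, 2, 1]})
lexicographic = df.sort_values("name")
natural = sort_naturally(df, "name")
print(list(lexicographic["name"]))
print(list(natural["name"]))
```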
Simple example for a CSV like this:
| key | value | note |
|---|---|---|
| abc | 123 | ? |
from typeddfs import TypedDfs
# Build me a Key-Value-Note class!
KeyValue = (
    TypedDfs.typed('KeyValue')     # typed means enforced requirements
    .require('key', index=True)    # automagically make this an index
    .require('value')              # required
    .reserve('note')               # permitted but not required
    .strict()                      # don't allow other columns
).build()
# This will self-organize and use 'key' as the index:
df = KeyValue.read_csv('example.csv')
# For fun, let's write it and read it back:
df.to_csv('remake.csv')
df = KeyValue.read_csv('remake.csv')
print(df.index_names(), df.column_names()) # ['key'], ['value', 'note']
# And now, we can type a function to require a KeyValue,
# and let it raise an `InvalidDfError` (here, a `MissingColumnError`):
def my_special_function(df: KeyValue) -> float:
    return KeyValue(df)['value'].sum()
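The checks that the builder above declares (required, reserved, strict, index) can be pictured in plain pandas roughly as follows. This is a sketch under assumptions, not the typeddfs implementation: the function `organize` and the error class `SketchInvalidDfError` are hypothetical names introduced only for illustration.

```python
import pandas as pd


class SketchInvalidDfError(ValueError):
    """Hypothetical error for this sketch; not a typeddfs class."""


def organize(df, required, reserved=(), index=(), strict=True):
    """Hypothetical helper: validate the columns, then set the index."""
    # Move any named index levels back into columns before checking.
    named = [name for name in df.index.names if name is not None]
    flat = df.reset_index(level=named) if named else df
    missing = [col for col in required if col not in flat.columns]
    if missing:
        raise SketchInvalidDfError(f"missing required columns: {missing}")
    allowed = set(required) | set(reserved)
    extra = [col for col in flat.columns if col not in allowed]
    if strict and extra:
        raise SketchInvalidDfError(f"unexpected columns: {extra}")
    return flat.set_index(list(index)) if index else flat


raw = pd.DataFrame({"key": ["abc"], "value": [123], "note": ["?"]})
organized = organize(
    raw, required=["key", "value"], reserved=["note"], index=["key"]
)

# A frame without the required 'value' column is rejected.
try:
    organize(raw.drop(columns="value"), required=["key", "value"], reserved=["note"])
except SketchInvalidDfError as err:
    message = str(err)
    print(message)
```

Calling a class constructor on an incoming frame, as my_special_function does, puts this kind of check at the function boundary, so a malformed frame fails early instead of deep inside the computation.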
All of the normal DataFrame methods are available.
Use .untyped() or .vanilla() to make a detyped copy that doesn't enforce requirements.
New issues and pull requests are welcome. Please refer to the contributing guide. Generated with Tyrannosaurus.