Pandas DataFrame subclasses that enforce structure and can self-organize.

Typed DataFrames


Pandas DataFrame subclasses that enforce structure and can self-organize. Because your functions can’t exactly accept any DataFrame.

The subclassed DataFrames can have required and/or optional columns and indices, and support custom requirements. Columns declared as indices are automatically converted to indices when data is read, which means read_csv and to_csv are always inverses: MyDf.read_csv(mydf.to_csv()) is just mydf.
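
For contrast, here is a minimal sketch of the same round trip in plain pandas, where the caller has to remember which column was the index. It uses only standard pandas calls and does not show how typeddfs works internally; the frame contents are made up to mirror the example CSV further down.

```python
import io

import pandas as pd

# A small frame indexed by "key", mirroring the example CSV.
df = pd.DataFrame({"key": ["abc"], "value": [123], "note": ["?"]}).set_index("key")

# to_csv writes the index as an ordinary first column.
text = df.to_csv()

# Plain read_csv cannot know "key" was an index, so it comes back as a column.
naive = pd.read_csv(io.StringIO(text))

# Naming the index column by hand restores the original layout.
restored = pd.read_csv(io.StringIO(text), index_col="key")

print(list(naive.columns))
print(restored.index.name, list(restored.columns))
```

A typed DataFrame class carries the index declaration itself, so that index_col bookkeeping does not need to be repeated at every call site.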

The DataFrames will display nicely in Jupyter notebooks, and a few convenience methods are added, such as sort_natural and drop_cols. See the docs for more information.

To install, run pip install typeddfs[hdf5].

Please note that HDF5 via pytables is unsupported in Python 3.9 on Windows as of 2021-02-03.

Simple example for a CSV like this:

key  value  note
abc  123    ?

from typeddfs import TypedDfs

# Build me a Key-Value-Note class!
KeyValue = (
    TypedDfs.typed("KeyValue")        # typed means enforced requirements
    .require("key", dtype=str, index=True)  # automagically make this an index
    .require("value")                 # required
    .reserve("note")                  # permitted but not required
    .strict()                         # don’t allow other columns
).build()

# This will self-organize and use "key" as the index:
df = KeyValue.read_csv("example.csv")

# For fun, let's write it and read it back:
df.to_csv("remake.csv")
df = KeyValue.read_csv("remake.csv")
print(df.index_names(), df.column_names())  # ["key"], ["value", "note"]

# And now, we can type a function to require a KeyValue,
# and let it raise an `InvalidDfError` (here, a `MissingColumnError`):
def my_special_function(df: KeyValue) -> float:
    return KeyValue(df)["value"].sum()
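
The require, reserve, and strict rules in the example above can be pictured with a small standard-library sketch. This is only an illustration of the semantics as described here, not the library's implementation: check_columns is a hypothetical helper, and it raises a plain ValueError rather than the InvalidDfError subclasses that typeddfs uses.

```python
from typing import Iterable, Sequence


def check_columns(
    columns: Iterable[str],
    required: Sequence[str],
    reserved: Sequence[str] = (),
    strict: bool = False,
) -> None:
    """Hypothetical helper: raise ValueError if the column names break the rules."""
    present = set(columns)
    # Every required name must be present.
    missing = [name for name in required if name not in present]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    # In strict mode, only required and reserved names are allowed.
    if strict:
        allowed = set(required) | set(reserved)
        extra = sorted(present - allowed)
        if extra:
            raise ValueError(f"unexpected columns: {extra}")


# "key" and "value" are required, "note" is optional, nothing else is allowed.
rules = dict(required=["key", "value"], reserved=["note"], strict=True)
check_columns(["key", "value"], **rules)          # passes: "note" may be absent
check_columns(["key", "value", "note"], **rules)  # passes: "note" is reserved
```

Dropping "value" or adding an unlisted column makes the same call raise, which is the kind of failure the typed function above is meant to surface early.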

All of the normal DataFrame methods are available. Use .untyped() or .vanilla() to make a detyped copy that doesn’t enforce requirements.

A small note of caution: natsort is no longer pinned to a specific major version as of version 0.5 because it receives somewhat frequent major updates. This means that the result of Typed-Dfs’s sort_natural could change between natsort releases. You can pin natsort to a specific major version; e.g. natsort = "^7" with Poetry.
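
To make "natural" ordering concrete, here is a rough standard-library sketch of the idea. It is not natsort's algorithm and covers only simple cases; natural_key is a hypothetical helper, and the sample names are made up.

```python
import re


def natural_key(text: str) -> list:
    """Split into digit and non-digit runs so that numbers compare numerically."""
    parts = re.split(r"(\d+)", text)
    # With a capturing group, odd positions hold the digit runs.
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]


names = ["sample10", "sample2", "sample1"]
plain = sorted(names)                      # character-by-character comparison
natural = sorted(names, key=natural_key)   # digit runs compared as integers

print(plain)    # ['sample1', 'sample10', 'sample2']
print(natural)  # ['sample1', 'sample2', 'sample10']
```

Real natural-sort libraries handle many more cases (signs, decimals, locale), which is why results can shift between major versions.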

Typed-Dfs is licensed under the Apache License, version 2.0. New issues and pull requests are welcome. Please refer to the contributing guide.
Generated with Tyrannosaurus.
