Schematized pipeline operations on dataframes
Project description
dfbridge
A schematized dataframe formatter.
We often need to reformat a base dataframe into a new dataframe that follows a schema. That can mean renaming some columns, applying functions to others, and running groupby/transform operations. Writing these steps by hand introduces a lot of boilerplate; here they can be specified as a dictionary schema instead. The original dataframe is left unchanged, and every operation reads from the original dataframe.
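For comparison, here is the kind of hand-written pandas boilerplate such a schema replaces. This is a plain pandas sketch, not dfbridge code, and the column names (`name`, `price`, `qty`, `store`, `item`, `revenue`, `store_mean_price`) are hypothetical. It performs a rename, a row-wise apply and a groupby/transform into a fresh dataframe, so the input is never modified.

```python
import pandas as pd

# Hypothetical input data.
df = pd.DataFrame({
    "name": ["a", "b", "c", "d"],
    "price": [1.0, 2.0, 3.0, 4.0],
    "qty": [10, 20, 30, 40],
    "store": ["x", "x", "y", "y"],
})

# Build a new dataframe on the same index so df itself is never modified.
out = pd.DataFrame(index=df.index)
# Rename: copy an input column under a new name.
out["item"] = df["name"]
# Apply: compute one value from each whole row.
out["revenue"] = df.apply(lambda row: row["price"] * row["qty"], axis=1)
# Groupby/transform: per-group statistic broadcast back to every row.
out["store_mean_price"] = df.groupby("store")["price"].transform("mean")

print(out)
```

Every new column repeats the same pattern, which is the repetition a declarative schema removes.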
Let's say we want the output dataframe to have columns `final_name1`, `final_name2`, and `final_name3`: one a simple rename of an input column, one the result of some function applied to the input dataframe, and one the result of a groupby/transform operation.
We can even remap values to other values in the process.
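The remapping described by the `remap_dict` and `strict_remap` schema keys can be sketched in plain pandas as follows. `remap` is a hypothetical helper written for illustration, not part of dfbridge's API; it follows the documented rule that strict mode turns unmapped values into `pd.NA` while non-strict mode passes them through.

```python
import pandas as pd


def remap(series, remap_dict, strict=True):
    """Hypothetical helper: map each value of series through remap_dict."""
    if strict:
        # Values not in remap_dict become pd.NA.
        return series.map(lambda v: remap_dict.get(v, pd.NA))
    # Values not in remap_dict pass through unchanged.
    return series.map(lambda v: remap_dict.get(v, v))


s = pd.Series(["orig_val", "other", "orig_val"])
strict_result = remap(s, {"orig_val": "new_val"}, strict=True)
loose_result = remap(s, {"orig_val": "new_val"}, strict=False)
```

Here `strict_result` holds `"new_val"`, `pd.NA`, `"new_val"`, while `loose_result` keeps `"other"` as-is.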
Setting `fill_missing` to `True` lets the column be added anyway, filled with pandas NA values (`pd.NA`), when its source data is missing from the input dataframe.
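A minimal sketch of that fallback, assuming `fill_missing` applies when the source column is absent. `source_column` is a hypothetical helper, not dfbridge's API: it returns the column if present, an all-`pd.NA` column aligned to the input's index if not, and otherwise raises `KeyError`.

```python
import pandas as pd


def source_column(df, column, fill_missing=True):
    """Hypothetical helper: return df[column], or an all-NA column if it is absent."""
    if column in df.columns:
        return df[column]
    if fill_missing:
        # Same index as df, every value pd.NA.
        return pd.Series(pd.NA, index=df.index, dtype="object")
    raise KeyError(column)


df = pd.DataFrame({"a": [1, 2, 3]})
filled = source_column(df, "missing_col")
```

Using `object` dtype keeps `pd.NA` as the missing marker instead of letting pandas pick a float column full of `NaN`.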
The schema for these three columns looks like this:
```python
def function(row):
    # Placeholder for your own function (hypothetical): it receives a whole
    # row of the original dataframe, so access columns as row["col"].
    return row["original_name"]


schema = {
    "final_name1": {
        "type": "rename",
        "from": "original_name",
        "fill_missing": True,
        "column_type": None,
        "remap_dict": {"orig_val": "new_val"},  # Remaps elements equal to "orig_val" to "new_val". Set to None or omit to disable.
        "strict_remap": True,  # If True, values not in remap_dict become pd.NA; otherwise they pass through unchanged.
    },
    "final_name2": {
        "type": "apply",
        "func": function,  # Receives a whole row of the original dataframe, so use row["col"] style access.
        "fill_missing": True,
        "column_type": None,
        "remap_dict": None,  # Remaps elements with an original value to a new value. Set to None or omit to disable.
    },
    "final_name3": {
        "type": "transform",
        "groupby": "groupby_column",
        "column": "return_column",
        "action": "mean",  # Or anything that works in df.groupby(...).transform(...).
        "fill_missing": True,
        "column_type": None,
    },
}
```
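As a rough illustration of how a schema like this could be interpreted, here is a minimal plain-pandas sketch. It is not dfbridge's implementation, and `apply_schema` and `_remap` are hypothetical names. It also assumes that a missing input column surfaces as a `KeyError` that `fill_missing` catches, that `column_type` is passed to `Series.astype`, and that `strict_remap` defaults to `False` when omitted.

```python
import pandas as pd


def _remap(series, spec):
    """Apply spec["remap_dict"] to series, honouring spec["strict_remap"]."""
    remap_dict = spec.get("remap_dict")
    if not remap_dict:  # None, omitted, or empty: no remapping
        return series
    # Assumption: strict_remap defaults to False when omitted.
    if spec.get("strict_remap", False):
        return series.map(lambda v: remap_dict.get(v, pd.NA))
    return series.map(lambda v: remap_dict.get(v, v))


def apply_schema(df, schema):
    """Hypothetical: build a new dataframe from df according to schema (df is not modified)."""
    out = pd.DataFrame(index=df.index)
    for name, spec in schema.items():
        kind = spec["type"]
        try:
            if kind == "rename":
                col = df[spec["from"]]
            elif kind == "apply":
                # func receives one whole row of df at a time.
                col = df.apply(spec["func"], axis=1)
            elif kind == "transform":
                col = df.groupby(spec["groupby"])[spec["column"]].transform(spec["action"])
            else:
                raise ValueError(f"unknown operation type: {kind!r}")
        except KeyError:
            # Assumption: a missing input column raises KeyError, and
            # fill_missing replaces the result with an all-NA column.
            if not spec.get("fill_missing", False):
                raise
            col = pd.Series(pd.NA, index=df.index, dtype="object")
        col = _remap(col, spec)
        # Assumption: column_type is any dtype accepted by Series.astype.
        if spec.get("column_type") is not None:
            col = col.astype(spec["column_type"])
        out[name] = col
    return out


# Usage with hypothetical data shaped like the schema above.
df = pd.DataFrame({
    "original_name": ["orig_val", "x", "orig_val"],
    "groupby_column": ["g1", "g1", "g2"],
    "return_column": [1.0, 3.0, 5.0],
})
schema = {
    "final_name1": {"type": "rename", "from": "original_name",
                    "remap_dict": {"orig_val": "new_val"}, "strict_remap": True},
    "final_name2": {"type": "apply", "func": lambda row: row["return_column"] * 2},
    "final_name3": {"type": "transform", "groupby": "groupby_column",
                    "column": "return_column", "action": "mean"},
    "final_name4": {"type": "rename", "from": "not_there", "fill_missing": True},
}
out = apply_schema(df, schema)
print(out)
```

Each output column is computed from `df` alone, matching the description that all operations read from the original dataframe. Catching `KeyError` broadly is a simplification: it would also swallow a `KeyError` raised inside a user `func`.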
Download files
Source Distribution
File details
Details for the file dfbridge-0.0.2.tar.gz.
File metadata
- Download URL: dfbridge-0.0.2.tar.gz
- Upload date:
- Size: 5.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.3.0 pkginfo/1.7.0 requests/2.25.1 setuptools/60.2.0 requests-toolbelt/0.9.1 tqdm/4.60.0 CPython/3.8.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 2f1e9f187b9fc187222b912c932fc1b79c54c49acc78ba674c8aae50281cbffc
MD5 | ccf8fb17080a42bb1f068ea93bcd309b
BLAKE2b-256 | 2a107680d63b0c6e533ccc622f1531a30a883e6a7d952e8542c52fed64473c6a
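To check a downloaded copy of the source distribution against the SHA256 digest in the table, a short standard-library sketch is enough. `sha256_of` is a hypothetical helper name, and the filename in the usage comment assumes the sdist was saved to the current directory.

```python
import hashlib

# SHA256 digest listed above for dfbridge-0.0.2.tar.gz.
EXPECTED_SHA256 = "2f1e9f187b9fc187222b912c932fc1b79c54c49acc78ba674c8aae50281cbffc"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (after downloading the sdist into the current directory):
# sha256_of("dfbridge-0.0.2.tar.gz") == EXPECTED_SHA256
```

Reading in chunks keeps memory use flat regardless of file size, although for a 5.5 kB archive reading it in one go would work just as well.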