🗄️ df-and-order
Yeah, it's just like Law & Order, but Dataframe & Order!
pip install df_and_order
With df-and-order, your interactions with dataframes become clean and predictable.
- Tired of absolute file paths to data in shared notebooks in your repository?
- Can't remember how your datasets were generated?
- Want to have safe and reproducible data transformations?
- Like declarative config-based solutions?
Good news for you!
How does it look in code?
Imagine a world where reading a dataframe takes just a few lines:
reader = MagicDfReader()
df = reader.read(df_id='user_activity_may_2020')
Maybe you are interested in some transformed version of that dataframe? No problem!
reader = MagicDfReader()
# ready to fit a model on!
model_input_df = reader.read(df_id='user_activity_may_2020', transform_id='model_input')
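To make the idea concrete, here is a minimal, dependency-free sketch of what an identifier-based reader could look like. It is not df-and-order's implementation: `ToyDfReader`, the `<base_dir>/<df_id>/<df_id>.csv` folder layout, and the use of lists of dicts in place of real dataframes are all assumptions made for illustration.

```python
import csv
import tempfile
from pathlib import Path


class ToyDfReader:
    """Hypothetical reader: resolves a dataset by id instead of by absolute path."""

    def __init__(self, base_dir):
        self.base_dir = Path(base_dir)

    def path_for(self, df_id):
        # Assumed layout (not the library's): <base_dir>/<df_id>/<df_id>.csv
        return self.base_dir / df_id / f"{df_id}.csv"

    def read(self, df_id):
        # Rows come back as a list of dicts, standing in for a dataframe.
        with open(self.path_for(df_id), newline="") as f:
            return list(csv.DictReader(f))


def demo():
    # Build a throwaway dataset so the example runs anywhere.
    with tempfile.TemporaryDirectory() as tmp:
        df_id = "user_activity_may_2020"
        folder = Path(tmp) / df_id
        folder.mkdir()
        (folder / f"{df_id}.csv").write_text("user,clicks\nann,3\nbob,5\n")
        reader = ToyDfReader(base_dir=tmp)
        return reader.read(df_id=df_id)


rows = demo()
```

The point of the sketch is that a notebook only needs to know the base directory once. Every later read refers to the dataset by its identifier.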
Wow. Is it really magic?
df-and-order works with YAML configs. Every config contains metadata about a dataset as well as all desired transformations. Here's an example:
df_id: user_activity_may_2020  # here's the dataframe identifier
initial_df_format: csv
metadata:  # this section contains some useful information about the dataset
  author: Data Man
  data_collection_date: 2020-05-01
transforms:
  model_input:  # here's the transform identifier
    df_format: csv
    in_memory:  # means we want to perform the transformations in memory every time we call it; permanent transforms are supported as well
      - module_path: df_and_order.steps.pd.DropColsTransformStep  # where to find the class describing a transformation; this one drops columns
        params:  # init params for the transformation class
          cols:
            - redundant_col
      - module_path: df_and_order.steps.DatesTransformStep  # another transformation that converts str to datetime
        params:
          cols:
            - date_col
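The `module_path` and `params` keys suggest that each step class is imported by its dotted path and constructed with the listed parameters. The snippet below is a rough sketch of that pattern under stated assumptions, not the library's own loader: `load_class` and `build_steps` are hypothetical helpers, and the stand-in config points at standard-library classes so the example runs without df-and-order installed.

```python
import importlib


def load_class(module_path):
    """Import the class named by a dotted path such as 'package.module.ClassName'."""
    module_name, _, class_name = module_path.rpartition(".")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build_steps(transform_config):
    """Instantiate every entry listed under 'in_memory' with its 'params'."""
    steps = []
    for step_config in transform_config.get("in_memory", []):
        step_class = load_class(step_config["module_path"])
        steps.append(step_class(**step_config.get("params", {})))
    return steps


# Stand-in for one parsed transform section. The classes are from the
# standard library and are NOT real transform steps; they only show the
# "dotted path + init params" mechanics.
toy_transform = {
    "df_format": "csv",
    "in_memory": [
        {"module_path": "fractions.Fraction", "params": {"numerator": 1, "denominator": 2}},
        {"module_path": "datetime.timedelta", "params": {"days": 1}},
    ],
}

steps = build_steps(toy_transform)
```

Because the class is looked up at run time, adding a new kind of step only requires a new class and a new config entry. The code that reads the config stays the same.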
Okay, what exactly is a df-and-order transform?
A transformation changes the initial dataset in some way.
A transformation is made of one or many steps. Each step represents some operation. Here are examples of such operations:
- dropping cols
- adding cols
- transforming existing cols
- etc
df-and-order uses subclasses of DfTransformStepConfig to describe a step. It's possible, and highly recommended, to declare init parameters for any step in the config.
Following the Single Responsibility Principle gives us granular control over the entire transformation.
Just by looking at the config you can say how the transformed dataframe was created.
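As an illustration of single-responsibility steps, here is a small sketch in plain Python. All names in it (`ToyStep`, `DropColsStep`, `ParseDatesStep`, `apply_steps`) are hypothetical and do not come from df-and-order, and lists of dicts stand in for dataframes so that the snippet has no dependencies.

```python
from datetime import date


class ToyStep:
    """Hypothetical base class: one step performs exactly one operation."""

    def transform(self, rows):
        raise NotImplementedError


class DropColsStep(ToyStep):
    def __init__(self, cols):
        self.cols = set(cols)

    def transform(self, rows):
        # Return new rows without the unwanted columns.
        return [{k: v for k, v in row.items() if k not in self.cols} for row in rows]


class ParseDatesStep(ToyStep):
    def __init__(self, cols):
        self.cols = set(cols)

    def transform(self, rows):
        # Convert ISO-formatted strings (YYYY-MM-DD) to datetime.date objects.
        return [
            {k: date.fromisoformat(v) if k in self.cols else v for k, v in row.items()}
            for row in rows
        ]


def apply_steps(rows, steps):
    """Run the steps in order, feeding each step's output into the next."""
    for step in steps:
        rows = step.transform(rows)
    return rows


rows = [{"date_col": "2020-05-01", "clicks": 3, "redundant_col": "x"}]
result = apply_steps(
    rows,
    [DropColsStep(cols=["redundant_col"]), ParseDatesStep(cols=["date_col"])],
)
```

Each class takes its settings through `__init__`, which mirrors the `params` section of the config, and the order of the list is the order of the transformation.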
Take a look at the more detailed overview to find more exciting stuff.
I also wrote an article describing the benefits. Check it out! There are lemurs and stuff.
Hope the lib helps somebody boost their productivity.