

🗄️ df-and-order

Yeah, it's just like Law & Order, but Dataframe & Order!

pip install df_and_order

Using df-and-order your interactions with dataframes become very clean and predictable.

  • Tired of absolute file paths to data in shared notebooks in your repository?
  • Can't remember how your datasets were generated?
  • Want to have safe and reproducible data transformations?
  • Like declarative config-based solutions?

Good news for you!

How does it look in code?

Imagine a world where reading a dataframe takes just a few lines:

reader = MagicDfReader()
df = reader.read(df_id='user_activity_may_2020')

Maybe you are interested in some transformed version of that dataframe? No problem!

reader = MagicDfReader()
# ready to fit a model on!
model_input_df = reader.read(df_id='user_activity_may_2020', transform_id='model_input')
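To make reading by identifier concrete, here is a minimal, dependency-free sketch of what a reader could do behind `read(df_id=...)`: resolve the identifier against a configured data directory instead of an absolute path. `ToyDfReader` and the `<data_dir>/<df_id>.csv` layout are hypothetical illustrations, not df-and-order's actual API, and rows are plain dicts standing in for a pandas DataFrame.

```python
import csv
from pathlib import Path
from typing import Dict, List


class ToyDfReader:
    """Hypothetical reader: resolves a dataset id to a CSV file under data_dir.

    Not df-and-order's API; it only illustrates reading by identifier
    instead of by absolute file path.
    """

    def __init__(self, data_dir: str) -> None:
        self.data_dir = Path(data_dir)

    def path_for(self, df_id: str) -> Path:
        # Assumed (hypothetical) layout: <data_dir>/<df_id>.csv
        return self.data_dir / f"{df_id}.csv"

    def read(self, df_id: str) -> List[Dict[str, str]]:
        path = self.path_for(df_id)
        if not path.exists():
            raise FileNotFoundError(f"no data for df_id={df_id!r} at {path}")
        with path.open(newline="") as f:
            # Each CSV row becomes a dict keyed by the header columns.
            return list(csv.DictReader(f))
```

Only the data directory has to be configured once (for example, relative to the repository root), so shared notebooks can refer to datasets by id alone.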

Wow. Is it really magic?

df-and-order works with YAML configs. Every config contains metadata about a dataset as well as all desired transformations. Here's an example:

df_id: user_activity_may_2020  # here's the dataframe identifier
initial_df_format: csv
metadata:  # this section contains some useful information about the dataset
  author: Data Man
  data_collection_date: 2020-05-01
transforms:
  model_input:  # here's the transform identifier
    df_format: csv
    in_memory:  # the transformations are performed in memory every time this transform is requested; permanent transforms are supported as well
    - module_path: df_and_order.steps.pd.DropColsTransformStep  # import path of the class describing the transformation; this one drops columns
      params:  # init params for the transformation class
        cols:
        - redundant_col
    - module_path: df_and_order.steps.DatesTransformStep  # another transformation that converts str to datetime
      params:
        cols:
        - date_col
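A loader for the `in_memory` list could work roughly like this: split each `module_path` into a module and a class name, import the module, and call the class with `params` as keyword arguments. This is a hedged sketch of the general technique (dynamic import via `importlib`), not df-and-order's own loader; `resolve_class` and `build_steps` are hypothetical names, and the config is assumed to be already parsed from YAML into plain Python dicts.

```python
import importlib
from typing import Any, Dict, List


def resolve_class(module_path: str) -> Any:
    """Turn 'package.module.ClassName' into the class object it names."""
    module_name, _, class_name = module_path.rpartition(".")
    if not module_name:
        raise ValueError(f"expected a dotted path, got {module_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


def build_steps(step_configs: List[Dict[str, Any]]) -> List[Any]:
    """Instantiate each step from its module_path and optional params."""
    steps = []
    for cfg in step_configs:
        cls = resolve_class(cfg["module_path"])
        # A step without a params section is constructed with no arguments.
        params = cfg.get("params") or {}
        steps.append(cls(**params))
    return steps
```

To stay runnable without df-and-order installed, the sketch can be exercised with a standard-library class: `build_steps([{"module_path": "datetime.timedelta", "params": {"days": 2}}])` returns `[datetime.timedelta(days=2)]`.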

Okay, what exactly is a df-and-order transform?

Every transformation changes the initial dataset in some way.

A transformation is made of one or more steps. Each step represents a single operation. Here are examples of such operations:

  • dropping cols
  • adding cols
  • transforming existing cols
  • etc.

df-and-order uses subclasses of DfTransformStepConfig to describe a step. Declaring a step's init parameters in the config is possible and highly recommended. Following the Single Responsibility Principle, each step does one thing, which gives granular control over the entire transformation.
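The step idea can be sketched without pandas: each step is a small class configured through its init parameters and exposing a single `transform` method, and a transform is just the steps applied in order. `TransformStep`, `DropCols`, `ParseDates` and `run_steps` are hypothetical names that loosely mirror the config example above; they are not the library's `DfTransformStepConfig` subclasses, and rows are plain dicts standing in for a DataFrame.

```python
from abc import ABC, abstractmethod
from datetime import date
from typing import Dict, List, Sequence

Row = Dict[str, object]


class TransformStep(ABC):
    """Hypothetical base class: one step performs exactly one operation."""

    @abstractmethod
    def transform(self, rows: List[Row]) -> List[Row]:
        ...


class DropCols(TransformStep):
    def __init__(self, cols: Sequence[str]) -> None:
        self.cols = set(cols)

    def transform(self, rows: List[Row]) -> List[Row]:
        # Build new rows without the listed columns; the input is not mutated.
        return [{k: v for k, v in row.items() if k not in self.cols} for row in rows]


class ParseDates(TransformStep):
    def __init__(self, cols: Sequence[str]) -> None:
        self.cols = list(cols)

    def transform(self, rows: List[Row]) -> List[Row]:
        out = []
        for row in rows:
            new_row = dict(row)
            for col in self.cols:
                # ISO strings such as '2020-05-01' become datetime.date objects.
                new_row[col] = date.fromisoformat(str(new_row[col]))
            out.append(new_row)
        return out


def run_steps(rows: List[Row], steps: Sequence[TransformStep]) -> List[Row]:
    # Apply steps in config order; each one sees the previous step's output.
    for step in steps:
        rows = step.transform(rows)
    return rows
```

For example, `run_steps(rows, [DropCols(cols=["redundant_col"]), ParseDates(cols=["date_col"])])` drops `redundant_col` and then parses `date_col`. Because every step does one thing and receives its settings through `__init__`, the same classes could be built directly from the `module_path`/`params` entries of a config.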

Just by looking at the config you can say how the transformed dataframe was created.

Take a look at the more detailed overview to find more exciting stuff.

I also wrote an article describing the benefits; check it out! There are lemurs and stuff.

I hope the library helps somebody boost their productivity.

Download files

Source Distribution

df-and-order-0.2.5.tar.gz (41.8 kB)

Uploaded Source

Built Distribution

df_and_order-0.2.5-py3-none-any.whl (17.0 kB)

Uploaded Python 3

File details

Details for the file df-and-order-0.2.5.tar.gz.

File metadata

  • Download URL: df-and-order-0.2.5.tar.gz
  • Upload date:
  • Size: 41.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.5

File hashes

Hashes for df-and-order-0.2.5.tar.gz
Algorithm Hash digest
SHA256 4be9dedb502c60e7fe57dc5a2559a44ea55717b24a5cf053c5b9718c93213f75
MD5 b66440b0533d0ecfef1977b90ba8aacd
BLAKE2b-256 bd555249fbcfa6ea5f31875ccc8dc8ccd896231c28c39ba2ac28c7404432318f

File details

Details for the file df_and_order-0.2.5-py3-none-any.whl.

File metadata

  • Download URL: df_and_order-0.2.5-py3-none-any.whl
  • Upload date:
  • Size: 17.0 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.8.5

File hashes

Hashes for df_and_order-0.2.5-py3-none-any.whl
Algorithm Hash digest
SHA256 119c373e2f80f024fb8be85f0b04406147aca94653a0705604f52d5cefbc65b1
MD5 3936a741e91242d514e56fd2a7a633c8
BLAKE2b-256 f05d28e0e7a632d43b193bbbffcc64990bca2ec3d678b9922478265ece7086b9