A convenient data flow to preprocess data using metadata.

dproc

Install

pip install dproc

How to use

Import

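import pandas as pd  # needed for pd.read_excel below
import dproc         # binds the dproc name used in dproc.meta below
from dproc import *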

Load the definition file

Load the data definition from the location of your choice (local disk, server, or cloud).

dproc.meta.definition = pd.read_excel('your-data-definition-file')

This file contains all meta information such as

To generate a specific entity definition ...

dproc.meta.entity = 'your-entity'
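As a rough illustration of selecting one entity from a definition table, here is a minimal pandas-only sketch. The column names (`entity`, `raw_name`, `name`, `dtype`) and the helper `entity_definition` are hypothetical; they are assumptions made for this example and not dproc's documented schema or API.

```python
import pandas as pd

# Hypothetical definition table; the column names are assumptions,
# not dproc's documented schema.
definition = pd.DataFrame(
    {
        "entity": ["customer", "customer", "order"],
        "raw_name": ["Cust ID", "Created At", "Order ID"],
        "name": ["customer_id", "created", "order_id"],
        "dtype": ["string", "datetime64[ns]", "string"],
    }
)


def entity_definition(definition: pd.DataFrame, entity: str) -> pd.DataFrame:
    """Return only the definition rows that describe one entity (hypothetical helper)."""
    rows = definition[definition["entity"] == entity]
    return rows.reset_index(drop=True)


customer_definition = entity_definition(definition, "customer")
print(customer_definition)
```

In practice the table would come from `pd.read_excel` rather than being built inline.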

and then you can apply the dataflow steps:

entity_cleaned = (entity_raw
                  .step_rename_cols()
                  .step_replace_missing_with_nan()
                  .step_remove_not_needed_cols()
                  .step_remove_rows_with_missing_ids()
                  .step_remove_duplicate_rows()
                  .step_format_dates(cols=['created'])
                  .step_format_dates(cols=['modified'])
                  .step_format_round_numeric_cols(cols=['rating'], decimal_places=2)
                  .step_change_dtypes()
                 )
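For readers who want to see what a chain like this does to a table, the following is a plain-pandas sketch of a comparable flow built with `DataFrame.pipe`. It is not dproc's implementation: the helper functions, the `RENAME`, `KEEP` and `MISSING_MARKERS` constants, and the sample data are all hypothetical stand-ins for what the definition file would supply.

```python
import pandas as pd

# Hypothetical metadata for one entity; illustrative only, not dproc's format.
RENAME = {"Cust ID": "id", "Created At": "created", "Rating": "rating", "Notes": "notes"}
KEEP = ["id", "created", "rating"]
MISSING_MARKERS = ["", "N/A", "null"]


def rename_cols(df):
    # Map raw column names to the cleaned names.
    return df.rename(columns=RENAME)


def replace_missing_with_nan(df):
    # Turn placeholder strings into real missing values.
    return df.mask(df.isin(MISSING_MARKERS))


def keep_needed_cols(df):
    # Drop every column that is not listed in KEEP.
    return df[KEEP]


def drop_rows_without_id(df):
    # Remove rows whose id is missing.
    return df.dropna(subset=["id"])


def format_dates(df, cols):
    # Parse the given columns as datetimes.
    return df.assign(**{c: pd.to_datetime(df[c]) for c in cols})


def round_numeric(df, cols, decimal_places):
    # Convert the given columns to numbers and round them.
    return df.assign(**{c: pd.to_numeric(df[c]).round(decimal_places) for c in cols})


raw = pd.DataFrame(
    {
        "Cust ID": ["a1", "a1", "N/A", "b2"],
        "Created At": ["2021-01-05", "2021-01-05", "2021-02-01", "2021-03-10"],
        "Rating": ["4.456", "4.456", "3.2", ""],
        "Notes": ["x", "x", "y", "z"],
    }
)

cleaned = (raw
           .pipe(rename_cols)
           .pipe(replace_missing_with_nan)
           .pipe(keep_needed_cols)
           .pipe(drop_rows_without_id)
           .drop_duplicates()
           .pipe(format_dates, cols=["created"])
           .pipe(round_numeric, cols=["rating"], decimal_places=2)
           .reset_index(drop=True)
          )
print(cleaned)
```

Each helper returns a new DataFrame, so the steps can be reordered or removed without touching the others.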
