Waterfall statistic logging for data quality or filtering steps.


Waterfall-logging

Waterfall-logging is a Python package that logs column counts of a Pandas DataFrame across data quality or filtering steps, exports the log as a Markdown table, and plots it as a waterfall statistics figure.

Documentation with examples can be found here.

Developed by Louis de Bruijn, https://louisdebruijn.com.

Installation

Install to use

Install Waterfall-logging from PyPI:

pip install waterfall-logging

Install to contribute

git clone https://github.com/LouisdeBruijn/waterfall-logging
cd waterfall-logging
python -m pip install -e .

pre-commit install --hook-type pre-commit --hook-type pre-push

Documentation

The documentation can be built and served locally with

mkdocs serve

Usage

Instructions are provided in the documentation's how-to guides.

import pandas as pd
from waterfall_logging.log import PandasWaterfall

# Example data: one row per bicycle ride
bicycle_rides = pd.DataFrame(
    data=[
        ['Shimano', 'race', 28, '2023-02-13', 1],
        ['Gazelle', 'comfort', 31, '2023-02-15', 1],
        ['Shimano', 'race', 31, '2023-02-16', 2],
        ['Batavia', 'comfort', 30, '2023-02-17', 3],
    ],
    columns=['brand', 'ride_type', 'wheel_size', 'date', 'bike_id'],
)

# Track value counts for `columns` and distinct value counts for `distinct_columns`
bicycle_rides_log = PandasWaterfall(
    table_name='rides',
    columns=['brand', 'ride_type', 'wheel_size'],
    distinct_columns=['bike_id'],
)

# Log the starting point before any filtering
bicycle_rides_log.log(table=bicycle_rides, reason='Logging initial column values', configuration_flag='')

# Keep only rides with a wheel size above 30, then log the filtered table
bicycle_rides = bicycle_rides.loc[lambda df: df['wheel_size'] > 30]
bicycle_rides_log.log(table=bicycle_rides, reason='Remove small wheels', configuration_flag='small_wheel=False')

# Export the logged steps as a Markdown table
print(bicycle_rides_log.to_markdown())

| Table   |   brand |   Δ brand |   ride_type |   Δ ride_type |   wheel_size |   Δ wheel_size |   bike_id |   Δ bike_id |   Rows |   Δ Rows | Reason                        | Configurations flag   |
|:--------|--------:|----------:|------------:|--------------:|-------------:|---------------:|----------:|------------:|-------:|---------:|:------------------------------|:----------------------|
| rides   |       4 |         0 |           4 |             0 |            4 |              0 |         3 |           0 |      4 |        0 | Logging initial column values |                       |
| rides   |       2 |        -2 |           2 |            -2 |            2 |             -2 |         2 |          -1 |      2 |       -2 | Remove small wheels           | small_wheel=False     |
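
To make the numbers in the table easier to verify, here is a minimal sketch that reproduces them with plain pandas. It is not how Waterfall-logging is implemented. It assumes, based on the output above, that `columns` are counted as non-null values, that `distinct_columns` are counted as unique values, and that each Δ is the difference from the previous logged step. The helper `count_step` is hypothetical and not part of the package.

```python
import pandas as pd

bicycle_rides = pd.DataFrame(
    data=[
        ['Shimano', 'race', 28, '2023-02-13', 1],
        ['Gazelle', 'comfort', 31, '2023-02-15', 1],
        ['Shimano', 'race', 31, '2023-02-16', 2],
        ['Batavia', 'comfort', 30, '2023-02-17', 3],
    ],
    columns=['brand', 'ride_type', 'wheel_size', 'date', 'bike_id'],
)


def count_step(table, columns, distinct_columns):
    """Hypothetical helper (not part of waterfall-logging): counts for one step."""
    # Assumption: regular columns are counted as non-null values
    counts = {col: int(table[col].count()) for col in columns}
    # Assumption: distinct columns are counted as unique values
    counts.update({col: int(table[col].nunique()) for col in distinct_columns})
    counts['Rows'] = len(table)
    return counts


columns = ['brand', 'ride_type', 'wheel_size']
distinct_columns = ['bike_id']

# Step 1: counts before filtering
steps = [count_step(bicycle_rides, columns, distinct_columns)]

# Step 2: counts after keeping only wheel sizes above 30
filtered = bicycle_rides.loc[bicycle_rides['wheel_size'] > 30]
steps.append(count_step(filtered, columns, distinct_columns))

# One row per step; deltas are differences from the previous step (0 for the first)
waterfall = pd.DataFrame(steps)
deltas = waterfall.diff().fillna(0).astype(int).add_prefix('Δ ')
print(waterfall.join(deltas))
```

Under these assumptions the sketch yields the same counts and deltas as the two `rides` rows above: `bike_id` drops from 3 to 2 distinct values, while every other count drops from 4 to 2.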
