Waterfall statistic logging for data quality or filtering steps.

Project description

Waterfall-logging

Waterfall-logging is a Python package that logs column counts in a pandas DataFrame, exports them as a Markdown table, and plots a waterfall statistics figure.

Documentation with examples can be found here.

Developed by Louis de Bruijn, https://louisdebruijn.com.

Installation

Install to use

Install Waterfall-logging from PyPI:

pip install waterfall-logging

Install to contribute

git clone https://github.com/LouisdeBruijn/waterfall-logging
cd waterfall-logging
python -m pip install -e .

pre-commit install --hook-type pre-commit --hook-type pre-push

Documentation

Build and preview the documentation locally with:

mkdocs serve

Usage

Instructions are provided in the documentation's how-to-guides.

import pandas as pd
from waterfall_logging.log import PandasWaterfall

# Sample data: four rides on three distinct bikes.
bicycle_rides = pd.DataFrame(
    data=[
        ['Shimano', 'race', 28, '2023-02-13', 1],
        ['Gazelle', 'comfort', 31, '2023-02-15', 1],
        ['Shimano', 'race', 31, '2023-02-16', 2],
        ['Batavia', 'comfort', 30, '2023-02-17', 3],
    ],
    columns=['brand', 'ride_type', 'wheel_size', 'date', 'bike_id'],
)

# Count entries in `columns` and distinct values in `distinct_columns`.
bicycle_rides_log = PandasWaterfall(
    table_name='rides',
    columns=['brand', 'ride_type', 'wheel_size'],
    distinct_columns=['bike_id'],
)
bicycle_rides_log.log(table=bicycle_rides, reason='Logging initial column values', configuration_flag='')

# The callable passed to .loc receives the whole DataFrame, not a single row.
bicycle_rides = bicycle_rides.loc[lambda df: df['wheel_size'] > 30]
bicycle_rides_log.log(table=bicycle_rides, reason='Remove small wheels', configuration_flag='small_wheel=False')

print(bicycle_rides_log.to_markdown())

| Table   |   brand |   Δ brand |   ride_type |   Δ ride_type |   wheel_size |   Δ wheel_size |   bike_id |   Δ bike_id |   Rows |   Δ Rows | Reason                        | Configurations flag   |
|:--------|--------:|----------:|------------:|--------------:|-------------:|---------------:|----------:|------------:|-------:|---------:|:------------------------------|:----------------------|
| rides   |       4 |         0 |           4 |             0 |            4 |              0 |         3 |           0 |      4 |        0 | Logging initial column values |                       |
| rides   |       2 |        -2 |           2 |            -2 |            2 |             -2 |         2 |          -1 |      2 |       -2 | Remove small wheels           | small_wheel=False     |
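
The numbers in the table can be checked by hand. The sketch below is not Waterfall-logging's implementation. It is plain Python with no dependencies that reproduces the arithmetic under one assumption: entries in `columns` are counted as non-missing values, while entries in `distinct_columns` are counted as unique values. The sample data has no missing values, so the table itself cannot confirm how missing values are treated. The helper names `snapshot` and `delta` are hypothetical.

```python
# Illustrative only: reproduces the counts in the table above with plain Python.
# `snapshot` and `delta` are hypothetical helpers, not part of Waterfall-logging.
COLUMNS = ["brand", "ride_type", "wheel_size", "date", "bike_id"]
RIDES = [
    ["Shimano", "race", 28, "2023-02-13", 1],
    ["Gazelle", "comfort", 31, "2023-02-15", 1],
    ["Shimano", "race", 31, "2023-02-16", 2],
    ["Batavia", "comfort", 30, "2023-02-17", 3],
]


def snapshot(rows, columns, distinct_columns):
    """Count non-missing entries, unique entries, and rows for one filtering step."""
    # Assumption: plain columns are counted as non-missing entries.
    counts = {col: sum(row[col] is not None for row in rows) for col in columns}
    # Distinct columns are counted as the number of unique non-missing values.
    for col in distinct_columns:
        counts[col] = len({row[col] for row in rows if row[col] is not None})
    counts["Rows"] = len(rows)
    return counts


def delta(previous, current):
    """Difference between two snapshots, matching the Δ columns."""
    return {key: current[key] - previous[key] for key in current}


rides = [dict(zip(COLUMNS, values)) for values in RIDES]
plain = ["brand", "ride_type", "wheel_size"]
distinct = ["bike_id"]

before = snapshot(rides, plain, distinct)
after = snapshot([r for r in rides if r["wheel_size"] > 30], plain, distinct)

print(before)
print(after)
print(delta(before, after))
```

The `bike_id` column drops by only 1 because the two remaining rides belong to bikes 1 and 2, whereas every plain column drops by 2 along with the row count.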
