Waterfall statistic logging for data quality or filtering steps.
Waterfall-logging
Waterfall-logging is a Python package that lets you log column counts of pandas DataFrames across processing steps, export the log as a Markdown table, and plot a waterfall statistics figure.
Documentation with examples is available on the project's documentation site.
Developed by Louis de Bruijn, https://louisdebruijn.com.
Installation
Install to use
Install Waterfall-logging from PyPI with pip:
pip install waterfall-logging
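To confirm that the install succeeded, the standard library can report the installed version. This is a minimal sketch: `installed_version` is a hypothetical helper written for this example, and the distribution name is assumed to be the one used in the pip command above.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


# "waterfall-logging" is the distribution name from the pip command (assumption).
version = installed_version("waterfall-logging")
print(version if version else "waterfall-logging is not installed")
```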
Install to contribute
git clone https://github.com/LouisdeBruijn/waterfall-logging
cd waterfall-logging
python -m pip install -e .
pre-commit install --hook-type pre-commit --hook-type pre-push
Documentation
Build and serve the documentation locally with:
mkdocs serve
Usage
Instructions are provided in the documentation's how-to-guides.
import pandas as pd
from waterfall_logging.log import PandasWaterfall

# Example data: one row per bicycle ride.
bicycle_rides = pd.DataFrame(
    data=[
        ['Shimano', 'race', 28, '2023-02-13', 1],
        ['Gazelle', 'comfort', 31, '2023-02-15', 1],
        ['Shimano', 'race', 31, '2023-02-16', 2],
        ['Batavia', 'comfort', 30, '2023-02-17', 3],
    ],
    columns=['brand', 'ride_type', 'wheel_size', 'date', 'bike_id'],
)

# Track counts for three columns and distinct counts for bike_id.
bicycle_rides_log = PandasWaterfall(
    table_name='rides',
    columns=['brand', 'ride_type', 'wheel_size'],
    distinct_columns=['bike_id'],
)
bicycle_rides_log.log(table=bicycle_rides, reason='Logging initial column values', configuration_flag='')

# Filter step: keep only rides with a wheel size above 30, then log again.
bicycle_rides = bicycle_rides.loc[lambda df: df['wheel_size'] > 30]
bicycle_rides_log.log(table=bicycle_rides, reason='Remove small wheels', configuration_flag='small_wheel=False')

print(bicycle_rides_log.to_markdown())
| Table | brand | Δ brand | ride_type | Δ ride_type | wheel_size | Δ wheel_size | bike_id | Δ bike_id | Rows | Δ Rows | Reason | Configurations flag |
|:--------|--------:|----------:|------------:|--------------:|-------------:|---------------:|----------:|------------:|-------:|---------:|:------------------------------|:----------------------|
| rides | 4 | 0 | 4 | 0 | 4 | 0 | 3 | 0 | 4 | 0 | Logging initial column values | |
| rides | 2 | -2 | 2 | -2 | 2 | -2 | 2 | -1 | 2 | -2 | Remove small wheels | small_wheel=False |
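The numbers in the table can be reproduced with plain pandas. The sketch below is inferred from the table alone, not from the package's source: it assumes that regular columns report non-null counts, that distinct columns report the number of unique values, and that each Δ is the change relative to the previously logged step. `summarize` and `with_deltas` are hypothetical helpers and are not part of Waterfall-logging.

```python
import pandas as pd


def summarize(table, columns, distinct_columns):
    """Collect the statistics for one logging step (assumed semantics)."""
    stats = {col: int(table[col].count()) for col in columns}  # non-null counts
    stats.update({col: int(table[col].nunique()) for col in distinct_columns})  # unique counts
    stats["Rows"] = len(table)
    return stats


def with_deltas(steps):
    """Add one Δ column per statistic: the change since the previous step."""
    log = pd.DataFrame(steps)
    deltas = log.diff().fillna(0).astype(int).add_prefix("Δ ")  # first step has Δ = 0
    return pd.concat([log, deltas], axis=1)


rides = pd.DataFrame(
    data=[
        ["Shimano", "race", 28, "2023-02-13", 1],
        ["Gazelle", "comfort", 31, "2023-02-15", 1],
        ["Shimano", "race", 31, "2023-02-16", 2],
        ["Batavia", "comfort", 30, "2023-02-17", 3],
    ],
    columns=["brand", "ride_type", "wheel_size", "date", "bike_id"],
)
filtered = rides.loc[rides["wheel_size"] > 30]

columns = ["brand", "ride_type", "wheel_size"]
distinct_columns = ["bike_id"]
log = with_deltas([
    summarize(rides, columns, distinct_columns),
    summarize(filtered, columns, distinct_columns),
])
print(log)
```

Under these assumptions, `bike_id` drops from 3 to 2 while every other statistic drops from 4 to 2, which matches the two rows of the table above.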