Waterfall statistic logging for data quality or filtering steps.
Waterfall-logging
Waterfall-logging is a Python package that logs column counts of a pandas DataFrame, exports the log as a Markdown table, and plots a waterfall statistics figure.
Documentation with examples can be found here.
Developed by Louis de Bruijn, https://louisdebruijn.com.
Installation
Install to use
Install Waterfall-logging from PyPI:
pip install waterfall-logging
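To confirm that the install succeeded, the installed version can be looked up with the standard library. This is a minimal sketch; the helper name `installed_version` is hypothetical and not part of Waterfall-logging.

```python
from importlib import metadata


def installed_version(distribution="waterfall-logging"):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        # The distribution is not installed in the current environment.
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version())
```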
Install to contribute
git clone https://github.com/LouisdeBruijn/waterfall-logging
cd waterfall-logging
python -m pip install -e .
pre-commit install --hook-type pre-commit --hook-type pre-push
Documentation
Build and serve the documentation locally with:
mkdocs serve
Usage
Instructions are provided in the documentation's how-to guides. The example below logs a DataFrame before and after a filtering step and prints the resulting log as a Markdown table.
import pandas as pd
from waterfall_logging.log import PandasWaterfall

bicycle_rides = pd.DataFrame(
    data=[
        ['Shimano', 'race', 28, '2023-02-13', 1],
        ['Gazelle', 'comfort', 31, '2023-02-15', 1],
        ['Shimano', 'race', 31, '2023-02-16', 2],
        ['Batavia', 'comfort', 30, '2023-02-17', 3],
    ],
    columns=['brand', 'ride_type', 'wheel_size', 'date', 'bike_id'],
)

# Count values in `columns` and distinct values in `distinct_columns`.
bicycle_rides_log = PandasWaterfall(
    table_name='rides',
    columns=['brand', 'ride_type', 'wheel_size'],
    distinct_columns=['bike_id'],
)
bicycle_rides_log.log(table=bicycle_rides, reason='Logging initial column values', configuration_flag='')

# Keep only rides with a wheel size above 30, then log the filtered table.
bicycle_rides = bicycle_rides.loc[lambda df: df['wheel_size'] > 30]
bicycle_rides_log.log(table=bicycle_rides, reason='Remove small wheels', configuration_flag='small_wheel=False')

print(bicycle_rides_log.to_markdown())
| Table | brand | Δ brand | ride_type | Δ ride_type | wheel_size | Δ wheel_size | bike_id | Δ bike_id | Rows | Δ Rows | Reason | Configurations flag |
|:--------|--------:|----------:|------------:|--------------:|-------------:|---------------:|----------:|------------:|-------:|---------:|:------------------------------|:----------------------|
| rides | 4 | 0 | 4 | 0 | 4 | 0 | 3 | 0 | 4 | 0 | Logging initial column values | |
| rides | 2 | -2 | 2 | -2 | 2 | -2 | 2 | -1 | 2 | -2 | Remove small wheels | small_wheel=False |
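The numbers in the table can be reproduced with plain pandas, which may help when reading the log. The sketch below assumes that entries in `columns` are non-null counts and entries in `distinct_columns` are distinct-value counts; it is an illustration only, not the package's implementation, and the helper `count_stats` is hypothetical.

```python
import pandas as pd


def count_stats(table, columns, distinct_columns):
    """Collect non-null counts, distinct counts and the row count for one step."""
    stats = {col: int(table[col].count()) for col in columns}
    stats.update({col: int(table[col].nunique()) for col in distinct_columns})
    stats["Rows"] = len(table)
    return stats


rides = pd.DataFrame(
    data=[
        ["Shimano", "race", 28, "2023-02-13", 1],
        ["Gazelle", "comfort", 31, "2023-02-15", 1],
        ["Shimano", "race", 31, "2023-02-16", 2],
        ["Batavia", "comfort", 30, "2023-02-17", 3],
    ],
    columns=["brand", "ride_type", "wheel_size", "date", "bike_id"],
)

columns = ["brand", "ride_type", "wheel_size"]
distinct_columns = ["bike_id"]

# Statistics before and after the same filter used in the example above.
before = count_stats(rides, columns, distinct_columns)
after = count_stats(rides[rides["wheel_size"] > 30], columns, distinct_columns)

# Each delta column is the change relative to the previous logged step.
deltas = {name: after[name] - before[name] for name in before}

print(before)
print(after)
print(deltas)
```

The `bike_id` delta is -1 rather than -2 because two of the four original rides share the same `bike_id`, so only one distinct bike disappears after filtering.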
File details
Details for the file waterfall_logging-0.1.0.tar.gz.
File metadata
- Download URL: waterfall_logging-0.1.0.tar.gz
- Upload date:
- Size: 20.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.10.9 Darwin/22.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3e024e6f42dd723080f072d6375b19125c48c2e7008d36c2d8e66aac0f685950 |
| MD5 | 4f27f9c4c8a3ee7a2a5d963040bf481e |
| BLAKE2b-256 | 2b00fbf115dcacc8d80f67852236a9274298e22bcf67ba5ae5665732a61df24e |
File details
Details for the file waterfall_logging-0.1.0-py3-none-any.whl.
File metadata
- Download URL: waterfall_logging-0.1.0-py3-none-any.whl
- Upload date:
- Size: 19.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.3.2 CPython/3.10.9 Darwin/22.3.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a9c5b1bb4dd02d8a64e7fd06002376ec3d9384326969d7016b0f75937d10f0f6 |
| MD5 | c349f9dd342416e1cdeb58ca611e9e05 |
| BLAKE2b-256 | 330b68f1fbb7dd0ff7228575f4e48d878c219ce15b72790e462c38e0c080a059 |