PiperRider CLI
Project description
Data reliability tool for profiling and testing your data
Docs | Roadmap | Discord | Blog
Data Reliability = Profiling + Testing
PipeRider is a CLI tool that lets you build data profiles and write assertion tests, so you can evaluate and track your data's reliability over time.
Core Concepts
- Profile Your Data to explore and understand what kind of dataset you're dealing with, e.g. completeness, duplicates, missing values, distributions
- Test Your Data to verify that your data is within an acceptable range and formatted correctly
- Observe & Monitor Your Data to keep an eye on how that data changes over time
Key Features
- SQL-based (additionally supports CSV)
- Data Profiling Characteristics
- Provides rich data profiling metrics, e.g. missing, uniqueness, duplicate_rows, quantiles, histogram
- Test datasets with a mix of custom and built-in assertion definitions
- Auto-generates recommended assertions based on your single-run profiles
- Generates single-run reports to visualize your data profile and assertion test results (example)
- Generates comparison reports to visualize how your data has changed over time (example)
- Supported Datasources: Snowflake, BigQuery, Redshift, Postgres, SQLite, DuckDB, CSV, Parquet.
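To make the metric names in the feature list concrete, here is a minimal Python sketch that computes missing, distinct, and duplicate-row counts on an in-memory SQLite table. The table `orders`, the helpers `profile_column` and `duplicate_rows`, and the metric definitions themselves are assumptions for illustration; this is not PipeRider's implementation, and its actual definitions may differ.

```python
import sqlite3


def profile_column(conn, table, column):
    """Return simple counts for one column (illustrative definitions only)."""
    # Identifiers are interpolated directly, so pass trusted names only.
    total, non_null, distinct = conn.execute(
        f"SELECT COUNT(*), COUNT({column}), COUNT(DISTINCT {column}) FROM {table}"
    ).fetchone()
    return {
        "total": total,
        "missing": total - non_null,  # rows where the column is NULL
        "distinct": distinct,         # distinct non-NULL values
    }


def duplicate_rows(conn, table):
    """Count rows that share all column values with at least one other row."""
    columns = [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]
    group_by = ", ".join(columns)
    query = (
        f"SELECT COALESCE(SUM(n), 0) FROM "
        f"(SELECT COUNT(*) AS n FROM {table} GROUP BY {group_by} HAVING COUNT(*) > 1)"
    )
    return conn.execute(query).fetchone()[0]


# Hypothetical sample data: one fully duplicated row and two missing amounts.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (2, None), (2, None), (3, 30.0)],
)

profile = profile_column(conn, "orders", "amount")
print(profile)
print(duplicate_rows(conn, "orders"))
```

Pushing the aggregation into SQL keeps the sketch close to how a SQL-based profiler would work: the data never leaves the database, only the summary counts do.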
Quickstart
Installation
pip install piperider
By default, PipeRider includes a built-in SQLite connector; extra connectors are available:
connectors | install |
---|---|
snowflake | pip install 'piperider[snowflake]' |
postgres | pip install 'piperider[postgres]' |
bigquery | pip install 'piperider[bigquery]' |
redshift | pip install 'piperider[redshift]' |
parquet | pip install 'piperider[parquet]' |
csv | pip install 'piperider[csv]' |
duckdb | pip install 'piperider[duckdb]' |
Use commas to install multiple connectors in one command:
pip install 'piperider[postgres,snowflake]'
Initialize Project & Diagnose Settings
Once installed, initialize a new project and verify its settings with the following commands.
piperider init # initializes project config
piperider diagnose # verifies your data source connection & project config
Profiling and Testing Your Data
Next, execute piperider run, which will do a number of things:
- Create a single-run profile of your data source
- Auto-generate recommended or template assertion files (first run only)
- Test that single-run profile against any available assertions, including custom and/or recommended assertions
- Generate a static HTML report, which helps visualize the single-run profile and its assertion results.
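The "test the profile against assertions" step can be pictured with a small Python sketch. Everything here is hypothetical: the dictionary layout of `profile`, the `assertions` list, and the helper `run_assertions` are invented for illustration and do not reflect PipeRider's assertion file format or internals.

```python
def run_assertions(profile, assertions):
    """Check each assertion's metric value against an inclusive [min, max] range."""
    results = []
    for assertion in assertions:
        value = profile[assertion["column"]][assertion["metric"]]
        passed = assertion["min"] <= value <= assertion["max"]
        results.append(
            {
                "name": f'{assertion["column"]}.{assertion["metric"]}',
                "value": value,
                "passed": passed,
            }
        )
    return results


# Hypothetical single-run profile: per-column metric values.
profile = {"amount": {"missing": 2, "distinct": 2}}

# Hypothetical assertions: acceptable ranges for those metrics.
assertions = [
    {"column": "amount", "metric": "missing", "min": 0, "max": 0},
    {"column": "amount", "metric": "distinct", "min": 1, "max": 10},
]

results = run_assertions(profile, assertions)
for result in results:
    status = "PASS" if result["passed"] else "FAIL"
    print(f'{status} {result["name"]} = {result["value"]}')
```

Running assertions against the stored profile, rather than against the database, means the same profile can be re-tested after the assertions are edited without profiling again.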
Common Usages/Tips:
piperider run # profile all tables in the data source.
piperider run --table $TABLENAME # profile a specific table
piperider generate-report -o $PATHNAME # Specify the output location of the generated report
piperider generate-assertions # To re-generate the recommended assertions after the first-run
Comparing Your Data Profiles
With at least two runs completed, you can then run piperider compare-reports, which will generate a comparison report that presents the changes between them (e.g. schema changes, column renaming, distributions).
Common Usages/Tips:
piperider compare-reports --last # Compare the last two reports automatically using --last
For more details on the generated report, see the doc
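As a rough picture of what a comparison between two runs involves, the following Python sketch diffs two profile dictionaries. The profile layout and the helper `compare_profiles` are assumptions made for this example; PipeRider's real report format and comparison logic are not shown here.

```python
def compare_profiles(base, target):
    """Report added/removed columns and changed metrics between two profiles."""
    base_cols, target_cols = set(base), set(target)
    changes = {
        "added_columns": sorted(target_cols - base_cols),
        "removed_columns": sorted(base_cols - target_cols),
        "metric_changes": {},
    }
    # Only columns present in both runs can have metric changes.
    for column in sorted(base_cols & target_cols):
        for metric, old in base[column].items():
            new = target[column].get(metric)
            if new != old:
                changes["metric_changes"][f"{column}.{metric}"] = (old, new)
    return changes


# Hypothetical profiles from two runs of the same data source.
base_run = {"id": {"missing": 0}, "amount": {"missing": 2}}
target_run = {
    "id": {"missing": 0},
    "amount": {"missing": 0},
    "currency": {"missing": 1},
}

diff = compare_profiles(base_run, target_run)
print(diff)
```

Note that this sketch reports a renamed column as one removal plus one addition; detecting renames would need extra heuristics.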
Example Report Demo
See Generated Single-Run Report
Development
See setup dev environment and the contributing guidelines to get started.
We're in an early stage, so let us know if you have any questions, feedback, or need help trying out PipeRider! :heart:
Source Distribution
Built Distribution
File details
Details for the file piperider-nightly-0.18.0.20230118.tar.gz.
File metadata
- Download URL: piperider-nightly-0.18.0.20230118.tar.gz
- Upload date:
- Size: 3.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.15
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 8bb092f8ba079008b44141b96ff85046847091cab8ce13f786e19483f083cff1 |
MD5 | 7daf782640e06f2ba6887d3d34d9d983 |
BLAKE2b-256 | d3a959141a5324be264437c3acd3473e10e36903bd8f5b2b528bf72e7622ee74 |
File details
Details for the file piperider_nightly-0.18.0.20230118-py3-none-any.whl.
File metadata
- Download URL: piperider_nightly-0.18.0.20230118-py3-none-any.whl
- Upload date:
- Size: 3.7 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.8.15
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | bddd36136a13987725308a930f2714a131623d76a410cfd4b2b3c61e3b446353 |
MD5 | ca96a3700512b170a5c41a75ad996bdb |
BLAKE2b-256 | 566e66e02b465a64050fb452f68eb9cedbffc375e3530190fe75f31991bf3d42 |