PipeRider CLI

Project description

PipeRider

Data reliability tool for profiling and testing your data

Docs | Roadmap | Discord | Blog

Data Reliability = Profiling + Testing

PipeRider is a CLI tool that lets you build data profiles and write assertion tests to evaluate and track your data's reliability over time.

Core Concepts

  1. Profile Your Data to explore/understand what kind of dataset you're dealing with e.g. completeness, duplicates, missing values, distributions
  2. Test Your Data to verify that your data is within an acceptable range and formatted correctly
  3. Observe & Monitor Your Data to keep an eye on how that data changes over time

Key Features

  • SQL-based (additionally supports CSV)
  • Data Profiling Characteristics
    • Provides rich data profiling metrics
    • e.g. missing, uniqueness, duplicate_rows, quantiles, histogram
  • Test datasets with a mix of custom and built-in assertion definitions
  • Auto-generates recommended assertions based on your single-run profiles
  • Generates single-run reports to visualize your data profile and assertion test results (example)
  • Generates comparison reports to visualize how your data has changed over time (example)
  • Supported Datasources: Snowflake, BigQuery, Redshift, Postgres, SQLite, DuckDB, CSV, Parquet.

Quickstart

Installation

pip install piperider

By default, PipeRider supports the built-in SQLite connector. Extra connectors are available:

Connector    Install command
snowflake    pip install 'piperider[snowflake]'
postgres     pip install 'piperider[postgres]'
bigquery     pip install 'piperider[bigquery]'
redshift     pip install 'piperider[redshift]'
parquet      pip install 'piperider[parquet]'
csv          pip install 'piperider[csv]'
duckdb       pip install 'piperider[duckdb]'

Use a comma-separated list to install multiple connectors in one line:

pip install 'piperider[postgres,snowflake]'
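
If the connector list comes from a script or CI configuration, the comma-separated form can be assembled programmatically. The following is a minimal sketch; piperider_requirement is a hypothetical helper written for illustration and is not part of PipeRider.

```shell
# Hypothetical helper (not part of PipeRider): join connector names into a
# pip "extras" requirement such as piperider[postgres,snowflake].
piperider_requirement() {
    if [ "$#" -eq 0 ]; then
        # No extras requested: plain package with the built-in SQLite connector.
        printf '%s\n' 'piperider'
        return 0
    fi
    extras=$1
    shift
    # Append the remaining connector names, separated by commas.
    for connector in "$@"; do
        extras="$extras,$connector"
    done
    printf 'piperider[%s]\n' "$extras"
}

# Usage: pip install "$(piperider_requirement postgres snowflake)"
```

Quoting the command substitution keeps the square brackets from being interpreted by the shell as a glob pattern.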

Initialize Project & Diagnose Settings

Once installed, initialize a new project and verify its settings with the following commands.

piperider init        # initializes project config
piperider diagnose    # verifies your data source connection & project config
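
In a setup script, the two commands can be chained so that diagnosis only runs after initialization succeeds. This is a sketch under assumptions: setup_project and the PIPERIDER_BIN variable are hypothetical names introduced here so the sequence can be dry-run, and it assumes both commands report failure through a non-zero exit status.

```shell
# Hypothetical wrapper (not part of PipeRider).
# PIPERIDER_BIN is an assumed override for dry runs, e.g. PIPERIDER_BIN=echo;
# by default the real piperider CLI is called.
setup_project() {
    bin=${PIPERIDER_BIN:-piperider}
    # Create the project config first; stop if initialization fails.
    "$bin" init || return 1
    # Then verify the data source connection and project config.
    "$bin" diagnose || return 1
}

# Usage: setup_project
```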

Profiling and Testing Your Data

Next, execute piperider run, which will do a number of things:

  1. Create a single-run profile of your data source
  2. Auto-generate recommended or template assertion files (first-run only)
  3. Test that single-run profile against any available assertions, including custom and/or recommended assertions
  4. Generate a static HTML report, which helps visualize the single-run profile and its assertion results.

Common Usages/Tips:

piperider run                           # profile all tables in the data source.

piperider run --table $TABLENAME        # profile a specific table

piperider generate-report -o $PATHNAME  # Specify the output location of the generated report

piperider generate-assertions           # To re-generate the recommended assertions after the first-run
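
To profile several specific tables in one go, the --table form above can be wrapped in a loop. This is a minimal sketch: profile_tables and PIPERIDER_BIN are hypothetical names introduced for illustration, and the table names in the usage line are placeholders.

```shell
# Hypothetical wrapper (not part of PipeRider).
# PIPERIDER_BIN is an assumed override for dry runs, e.g. PIPERIDER_BIN=echo.
profile_tables() {
    bin=${PIPERIDER_BIN:-piperider}
    if [ "$#" -eq 0 ]; then
        # No table names given: profile all tables in the data source.
        "$bin" run
        return
    fi
    # One run per named table; stop at the first failing run.
    for table in "$@"; do
        "$bin" run --table "$table" || return 1
    done
}

# Usage (placeholder table names): profile_tables orders customers
```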

Comparing Your Data Profiles

With at least two runs completed, you can then run piperider compare-reports, which will generate a comparison report that presents the changes between them (e.g. schema changes, column renaming, distributions).

Common Usages/Tips:

piperider compare-reports --last        # Compare the last two reports automatically
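
A recurring job might create a fresh profile and then compare it against the previous one. The sketch below assumes at least one earlier run already exists in the project; profile_then_compare and PIPERIDER_BIN are hypothetical names introduced for illustration.

```shell
# Hypothetical wrapper (not part of PipeRider).
# PIPERIDER_BIN is an assumed override for dry runs, e.g. PIPERIDER_BIN=echo.
profile_then_compare() {
    bin=${PIPERIDER_BIN:-piperider}
    # Create a new single-run profile; skip the comparison if the run fails.
    "$bin" run || return 1
    # Compare the two most recent runs (assumes an earlier run exists).
    "$bin" compare-reports --last
}

# Usage: profile_then_compare
```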

For more details on the generated report, see the documentation.

Example Report Demo

See Generated Single-Run Report

See Comparison Report

Development

See setup dev environment and the contributing guidelines to get started.

We're in an early stage, so let us know if you have any questions, feedback, or need help trying out PipeRider! :heart:
