Profile and monitor your ML data pipeline end-to-end
The open standard for data logging
What is whylogs
whylogs is an open-source library for logging any kind of data. With whylogs, you can generate summaries of your datasets (called whylogs profiles) and use them to:
- Track changes in their dataset
- Create data constraints to know whether their data looks the way it should
- Quickly visualize key summary statistics about their datasets
These three capabilities enable a variety of use cases for data scientists, machine learning engineers, and data engineers:
- Detect data drift in model input features
- Detect training-serving skew, concept drift, and model performance degradation
- Validate data quality in model inputs or in a data pipeline
- Perform exploratory data analysis of massive datasets
- Track data distributions & data quality for ML experiments
- Enable data auditing and governance across the organization
- Standardize data documentation practices across the organization
- And more
Quickstart
Install whylogs using the pip package manager in a terminal by running:
pip install whylogs
Then you can log data in Python as simply as this:
import pandas as pd
import whylogs as why
# Load the data to profile.
df = pd.read_csv("path/to/file.csv")
# Profile the DataFrame; the result holds the whylogs profile.
results = why.log(df)
And voilà, you now have a whylogs profile. To learn more about what a whylogs profile is and what you can do with it, check out our docs and our examples.