A comprehensive assertion-and-validation toolkit for ML workflows.

ml-assert

A lightweight, chainable assertion toolkit for validating data and models in ML workflows.

ml-assert is a Python library that provides a fluent, expressive API to act as a guardrail in your automated ML pipelines. It doesn't just calculate metrics; it asserts that your data and models meet specific, mission-critical criteria. If an assertion fails, it fails loudly and immediately, stopping the pipeline to prevent bad models or corrupt data from moving downstream.

This makes it suited to production ML systems where data quality, model performance, and artifact integrity must be enforced rather than merely reported.

Core Features

DataFrame Assertions: Validate pandas DataFrame properties like schema, null values, column uniqueness, value ranges, and set membership.
Statistical Drift Detection: Use low-level statistical measures (Kolmogorov-Smirnov and Chi-Squared tests, Wasserstein distance) or the high-level assert_no_drift function to detect distribution changes between datasets.
Model Performance Assertions: Chain assertions for key classification metrics (Accuracy, Precision, Recall, F1, ROC AUC) to ensure your model meets performance targets.
Extensible Plugin System: Leverage built-in plugins (file_exists, dvc_check) or create your own to add custom checks.
Declarative CLI: Define your assertion suite in a single config.yaml and run it from the command line, generating JSON and HTML reports.

Installation

pip install ml-assert

How It Works: Assertion vs. Calculation

A typical metrics library might calculate an accuracy of 75% and let the pipeline continue. ml-assert asserts that accuracy must be >= 80%. If it's 75%, it raises an AssertionError, halting execution.

This shift from passive calculation to active assertion is what makes ml-assert useful as a gate in MLOps pipelines.
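
The difference can be sketched in plain Python. The helpers below (`accuracy` and `assert_min_accuracy`) are hypothetical illustrations, not part of ml-assert's API; they only show the calculate-then-raise pattern, using the same 75% versus 80% numbers as the example above.

```python
def accuracy(y_true, y_pred):
    """Passive calculation: return the fraction of correct predictions."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def assert_min_accuracy(y_true, y_pred, min_score):
    """Active assertion: raise AssertionError if accuracy is below min_score."""
    score = accuracy(y_true, y_pred)
    if score < min_score:
        raise AssertionError(
            f"accuracy {score:.2f} is below the required {min_score:.2f}"
        )
    return score


y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]  # 6 of 8 predictions are correct

# Calculation alone reports the number and lets the pipeline continue.
print(accuracy(y_true, y_pred))

# Assertion turns the same number into a hard stop.
try:
    assert_min_accuracy(y_true, y_pred, min_score=0.80)
except AssertionError as exc:
    print(f"Pipeline halted: {exc}")
```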

Usage Examples

1. DataFrameAssertion DSL

Chain assertions to validate a pandas DataFrame. The chain stops at the first failure.

import pandas as pd
import numpy as np
from ml_assert import Assertion, schema

# DataFrame with a column full of nulls and an out-of-range value
data = {
    'user_id': list(range(100, 110)),
    'age': [25, 30, 99, 45, 30, 50, 60, 22, 33, 41], # 99 is out of range
    'plan_type': ['basic', 'premium', 'basic', 'premium', 'premium', 'basic', 'free', 'free', 'premium', 'basic'],
    'empty_col': [np.nan] * 10
}
df = pd.DataFrame(data)

# This check will FAIL because `age` has a value > 70
try:
    s = schema()
    s.col("user_id").is_unique()
    s.col("age").in_range(18, 70)
    s.col("plan_type").is_type("object")

    Assertion(df).satisfies(s).no_nulls().validate()
except AssertionError as e:
    print(f"As expected, validation failed: {e}")

# This check will PASS because we only check specific columns
s2 = schema()
s2.col("user_id").is_unique()
Assertion(df).satisfies(s2).no_nulls(['user_id', 'age', 'plan_type']).validate()

print("Partial validation passed!")
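
To make the first-failure behaviour concrete, here is a rough pure-Python sketch of equivalent checks over a dict of columns. The names `is_null`, `check_unique`, `check_in_range`, `check_no_nulls`, and `run_checks` are hypothetical and do not reflect ml-assert's internals; the library itself operates on pandas DataFrames.

```python
import math


def is_null(value):
    """Treat None and float NaN as null."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def check_unique(columns, name):
    values = columns[name]
    if len(set(values)) != len(values):
        raise AssertionError(f"column '{name}' contains duplicate values")


def check_in_range(columns, name, low, high):
    bad = [v for v in columns[name] if not (low <= v <= high)]
    if bad:
        raise AssertionError(
            f"column '{name}' has values outside [{low}, {high}]: {bad}"
        )


def check_no_nulls(columns, names=None):
    # With no explicit list, every column is checked.
    for name in (names or list(columns)):
        if any(is_null(v) for v in columns[name]):
            raise AssertionError(f"column '{name}' contains nulls")


def run_checks(checks):
    """Run checks in order; the first AssertionError propagates and skips the rest."""
    for check in checks:
        check()


columns = {
    "user_id": [100, 101, 102, 103],
    "age": [25, 30, 99, 45],          # 99 is out of range
    "empty_col": [float("nan")] * 4,  # entirely null
}

try:
    run_checks([
        lambda: check_unique(columns, "user_id"),
        lambda: check_in_range(columns, "age", 18, 70),
        lambda: check_no_nulls(columns),  # not reached: the range check fails first
    ])
except AssertionError as exc:
    print(f"Validation failed: {exc}")

# Restricting the null check to selected columns passes.
run_checks([
    lambda: check_unique(columns, "user_id"),
    lambda: check_no_nulls(columns, ["user_id", "age"]),
])
print("Partial validation passed")
```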

2. High-Level Drift Detection

Detect distributional drift between a reference (training) dataset and a current (inference) dataset. assert_no_drift applies a KS test to each numeric column and a Chi-Squared test to each categorical column.

import pandas as pd
import numpy as np
from ml_assert.stats.drift import assert_no_drift

# Reference dataset
df_ref = pd.DataFrame({
    'temperature': np.random.normal(20, 5, 500),
    'city': np.random.choice(['NY', 'LA', 'SF'], 500, p=[0.5, 0.3, 0.2])
})

# Current dataset with a deliberate drift
df_cur = pd.DataFrame({
    'temperature': np.random.normal(30, 5, 500), # Mean shifted by +10
    'city': np.random.choice(['NY', 'LA', 'SF'], 500, p=[0.2, 0.3, 0.5]) # Proportions changed
})

# This will FAIL: 'temperature' has drifted (mean shifted by +10), and the 'city' proportions changed as well.
try:
    assert_no_drift(df_ref, df_cur, alpha=0.05)
except AssertionError as e:
    print(f"As expected, drift was detected: {e}")

# This will PASS because the data is identical.
assert_no_drift(df_ref, df_ref.copy(), alpha=0.05)
print("No drift detected in identical datasets.")
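
For intuition about what the two tests measure, the following pure-Python sketch computes the two-sample KS statistic for numeric columns and the Chi-Squared statistic for categorical ones, choosing between them per column. It is an assumption-laden illustration, not ml-assert's implementation: the function names are hypothetical, and it stops at the raw statistics. Converting them to p-values for comparison against alpha is usually delegated to a statistics library (for example, SciPy's `ks_2samp` and `chi2_contingency`).

```python
from bisect import bisect_right
from collections import Counter


def ks_statistic(ref, cur):
    """Largest gap between the two empirical CDFs (two-sample KS statistic)."""
    ref_sorted, cur_sorted = sorted(ref), sorted(cur)
    n, m = len(ref_sorted), len(cur_sorted)
    gap = 0.0
    for x in ref_sorted + cur_sorted:
        cdf_ref = bisect_right(ref_sorted, x) / n
        cdf_cur = bisect_right(cur_sorted, x) / m
        gap = max(gap, abs(cdf_ref - cdf_cur))
    return gap


def chi2_statistic(ref, cur):
    """Pearson Chi-Squared statistic for a 2 x k table of category counts."""
    ref_counts, cur_counts = Counter(ref), Counter(cur)
    n_ref, n_cur = len(ref), len(cur)
    total = n_ref + n_cur
    stat = 0.0
    for cat in set(ref_counts) | set(cur_counts):
        col_total = ref_counts[cat] + cur_counts[cat]
        for observed, row_total in ((ref_counts[cat], n_ref), (cur_counts[cat], n_cur)):
            expected = row_total * col_total / total
            stat += (observed - expected) ** 2 / expected
    return stat


def column_statistics(ref, cur):
    """Pick a statistic per column: KS for numeric values, Chi-Squared otherwise."""
    stats = {}
    for name in ref:
        numeric = all(isinstance(v, (int, float)) for v in ref[name] + cur[name])
        if numeric:
            stats[name] = ("ks", ks_statistic(ref[name], cur[name]))
        else:
            stats[name] = ("chi2", chi2_statistic(ref[name], cur[name]))
    return stats


ref = {
    "temperature": [18, 19, 20, 21, 22],
    "city": ["NY"] * 5 + ["LA"] * 3 + ["SF"] * 2,
}
cur = {
    "temperature": [28, 29, 30, 31, 32],           # shifted by +10
    "city": ["NY"] * 2 + ["LA"] * 3 + ["SF"] * 5,  # proportions changed
}

for name, (test, value) in column_statistics(ref, cur).items():
    print(f"{name}: {test} statistic = {value:.3f}")
```

A KS statistic of 0 means the two empirical distributions coincide, and 1 means the samples do not overlap at all.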

3. Model Performance Assertions

Ensure your model's predictions meet your minimum quality bar.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from ml_assert import assert_model

# Generate data and train a simple model
X, y = make_classification(random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
model = LogisticRegression().fit(X_train, y_train)
y_pred = model.predict(X_test)
y_scores = model.predict_proba(X_test)[:, 1]

# Chain assertions for key metrics
# This will PASS if all metrics meet their thresholds.
assert_model(y_test, y_pred, y_scores) \
    .accuracy(min_score=0.80) \
    .precision(min_score=0.80) \
    .recall(min_score=0.80) \
    .f1(min_score=0.80) \
    .roc_auc(min_score=0.90) \
    .validate()

print("All model performance metrics passed!")
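
The chaining style above is a fluent-builder pattern: each metric method returns the object itself, and validate() raises on the first threshold that is not met. The class below, `MetricChain`, is a hypothetical sketch of that pattern for binary labels, not ml-assert's code. In particular, deferring evaluation until validate() is an assumption of this sketch.

```python
class MetricChain:
    """Hypothetical fluent builder: queue metric checks, evaluate them in validate()."""

    def __init__(self, y_true, y_pred):
        if len(y_true) != len(y_pred):
            raise ValueError("y_true and y_pred must have the same length")
        self._pairs = list(zip(y_true, y_pred))
        self._checks = []  # (metric name, metric function, minimum score)

    def _count(self, true_label, pred_label):
        return sum(t == true_label and p == pred_label for t, p in self._pairs)

    def _accuracy(self):
        return sum(t == p for t, p in self._pairs) / len(self._pairs)

    def _precision(self):
        tp, fp = self._count(1, 1), self._count(0, 1)
        return tp / (tp + fp) if tp + fp else 0.0

    def _recall(self):
        tp, fn = self._count(1, 1), self._count(1, 0)
        return tp / (tp + fn) if tp + fn else 0.0

    def _queue(self, name, metric, min_score):
        self._checks.append((name, metric, min_score))
        return self  # returning self is what makes the calls chainable

    def accuracy(self, min_score):
        return self._queue("accuracy", self._accuracy, min_score)

    def precision(self, min_score):
        return self._queue("precision", self._precision, min_score)

    def recall(self, min_score):
        return self._queue("recall", self._recall, min_score)

    def validate(self):
        for name, metric, min_score in self._checks:
            score = metric()
            if score < min_score:
                raise AssertionError(
                    f"{name} {score:.3f} is below the required {min_score:.3f}"
                )
        return self


y_true = [1, 1, 1, 0, 0, 0, 1, 0]
y_pred = [1, 1, 0, 0, 0, 1, 1, 0]

# All three metrics clear a 0.70 threshold on this toy data.
MetricChain(y_true, y_pred).accuracy(0.70).precision(0.70).recall(0.70).validate()

try:
    MetricChain(y_true, y_pred).accuracy(0.80).validate()
except AssertionError as exc:
    print(f"Model rejected: {exc}")
```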

4. CLI for Automated Runs

Define a suite of checks in a YAML file and execute it with the ml_assert CLI. This is well suited to CI/CD pipelines.

config.yaml

steps:
  - type: drift
    train: 'ref.csv'
    test: 'cur.csv'
    alpha: 0.05
    # The CLI run will fail on this step due to drift

  - type: model_performance
    y_true: 'y_true.csv'
    y_pred: 'y_pred.csv'
    y_scores: 'y_scores.csv'
    assertions:
      accuracy: 0.75
      roc_auc: 0.80

  - type: file_exists
    path: 'my_model.pkl'

  - type: dvc_check
    path: 'model_data.csv'
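
One way to picture how such a file could be executed is a registry that maps each step's type to a check function. The sketch below is hypothetical and is not how ml-assert is known to be implemented: the steps are written as Python dicts rather than parsed from YAML, only a file_exists-style check is shown, and the choice to record a failure and continue with the next step is an assumption of the sketch.

```python
import tempfile
from pathlib import Path


def check_file_exists(step):
    """Fail if the path named by the step is not an existing file."""
    path = Path(step["path"])
    if not path.is_file():
        raise AssertionError(f"file not found: {path}")


# Hypothetical registry mapping a step's `type` to the function that runs it.
CHECKS = {"file_exists": check_file_exists}


def run_steps(steps):
    """Run every step in order and return one result record per step."""
    results = []
    for step in steps:
        record = {"type": step["type"], "passed": True, "message": ""}
        check = CHECKS.get(step["type"])
        if check is None:
            record.update(passed=False, message="unknown step type")
        else:
            try:
                check(step)
            except AssertionError as exc:
                record.update(passed=False, message=str(exc))
        results.append(record)
    return results


with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "my_model.pkl"
    model_path.write_bytes(b"placeholder")  # stand-in for a real model artifact
    steps = [
        {"type": "file_exists", "path": str(model_path)},
        {"type": "file_exists", "path": str(Path(tmp) / "missing.pkl")},
    ]
    results = run_steps(steps)

for record in results:
    print(record)
```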

Run from your terminal:

# With Poetry: poetry run ml_assert run config.yaml
# With the example config above, the command fails on the drift step and generates the reports.
ml_assert run config.yaml

This command generates two reports:

config.report.json: A machine-readable summary.
config.report.html: A human-friendly HTML report.
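
In a CI job written in Python, the command can be wrapped so that a failed run blocks later stages. The helper below, `run_gate`, is a hypothetical sketch that assumes the CLI signals failure through a non-zero exit status. The stand-in commands let it run without ml-assert installed.

```python
import subprocess
import sys


def run_gate(command):
    """Run a command and return True only if it exited with status 0."""
    completed = subprocess.run(command, capture_output=True, text=True)
    if completed.returncode != 0:
        print(f"gate failed (exit status {completed.returncode})", file=sys.stderr)
        return False
    return True


# In a real pipeline the call would be:
#     run_gate(["ml_assert", "run", "config.yaml"])
# Stand-in commands that simply exit with a chosen status:
passing = [sys.executable, "-c", "raise SystemExit(0)"]
failing = [sys.executable, "-c", "raise SystemExit(1)"]

print(run_gate(passing))
print(run_gate(failing))
```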

Contributing

Contributions are welcome! Please see CONTRIBUTING.md for details on how to get started.

License

This project is licensed under the Apache 2.0 License - see the LICENSE file for details.

Project details

Release history

1.0.5

Jun 11, 2025

This version

1.0.4

Jun 11, 2025

1.0.3

Jun 8, 2025

Download files

Source Distribution

ml_assert-1.0.4.tar.gz (27.7 kB)

Uploaded Jun 11, 2025 Source

Built Distribution

ml_assert-1.0.4-py3-none-any.whl (31.3 kB)

Uploaded Jun 11, 2025 Python 3

File details

Details for the file ml_assert-1.0.4.tar.gz.

File metadata

Download URL: ml_assert-1.0.4.tar.gz
Upload date: Jun 11, 2025
Size: 27.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ml_assert-1.0.4.tar.gz
Algorithm	Hash digest
SHA256	`c19cce966b6870197e75ff7d9ac7ecf56192e6640bfaa3e4a29ff430fea28155`
MD5	`e8de6fe47fba7ad3808f5586d36cc4a8`
BLAKE2b-256	`c17ccba6030899bbee56e37307eba5d91bd0c222a78de1d78bf07d21408d4a99`

Provenance

The following attestation bundles were made for ml_assert-1.0.4.tar.gz:

Publisher: ci.yml on HeyShinde/ml-assert

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ml_assert-1.0.4.tar.gz
- Subject digest: c19cce966b6870197e75ff7d9ac7ecf56192e6640bfaa3e4a29ff430fea28155
- Sigstore transparency entry: 234860779
- Sigstore integration time: Jun 11, 2025
Source repository:
- Permalink: HeyShinde/ml-assert@d3552bba1de5e789344ed04397ef43adf5e95550
- Branch / Tag: refs/tags/v1.0.4
- Owner: https://github.com/HeyShinde
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@d3552bba1de5e789344ed04397ef43adf5e95550
- Trigger Event: push

File details

Details for the file ml_assert-1.0.4-py3-none-any.whl.

File metadata

Download URL: ml_assert-1.0.4-py3-none-any.whl
Upload date: Jun 11, 2025
Size: 31.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for ml_assert-1.0.4-py3-none-any.whl
Algorithm	Hash digest
SHA256	`68328f9a7926aa3b99442e691b0c60e86dff91730629df7a779fccde78503eb3`
MD5	`7333e0cbad6d6ee3e9039e5768877bf4`
BLAKE2b-256	`0f167c5c3618128c5da5e56d590915893fd9d14302b0546dc76cbe1087d88c3b`

Provenance

The following attestation bundles were made for ml_assert-1.0.4-py3-none-any.whl:

Publisher: ci.yml on HeyShinde/ml-assert

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: ml_assert-1.0.4-py3-none-any.whl
- Subject digest: 68328f9a7926aa3b99442e691b0c60e86dff91730629df7a779fccde78503eb3
- Sigstore transparency entry: 234860794
- Sigstore integration time: Jun 11, 2025
Source repository:
- Permalink: HeyShinde/ml-assert@d3552bba1de5e789344ed04397ef43adf5e95550
- Branch / Tag: refs/tags/v1.0.4
- Owner: https://github.com/HeyShinde
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: ci.yml@d3552bba1de5e789344ed04397ef43adf5e95550
- Trigger Event: push
