A declarative, polars-native data frame validation library

dataframely — A declarative, 🐻‍❄️-native data frame validation library

📖 Introduction

Dataframely is a Python package to validate the schema and content of polars data frames. Its purpose is to make data pipelines more robust by ensuring that data meets expectations and more readable by adding schema information to data frame type hints.

💿 Installation

You can install dataframely using your favorite package manager, e.g., pixi or pip:

pixi add dataframely
pip install dataframely

🎯 Usage

Defining a data frame schema

import dataframely as dy
import polars as pl

class HouseSchema(dy.Schema):
    zip_code = dy.String(nullable=False, min_length=3)
    num_bedrooms = dy.UInt8(nullable=False)
    num_bathrooms = dy.UInt8(nullable=False)
    price = dy.Float64(nullable=False)

    @dy.rule()
    def reasonable_bathroom_to_bedroom_ratio() -> pl.Expr:
        ratio = pl.col("num_bathrooms") / pl.col("num_bedrooms")
        return (ratio >= 1 / 3) & (ratio <= 3)

    @dy.rule(group_by=["zip_code"])
    def minimum_zip_code_count() -> pl.Expr:
        return pl.len() >= 2

Validating data against the schema

import polars as pl

df = pl.DataFrame({
    "zip_code": ["01234", "01234", "1", "213", "123", "213"],
    "num_bedrooms": [2, 2, 1, None, None, 2],
    "num_bathrooms": [1, 2, 1, 1, 0, 8],
    "price": [100_000, 110_000, 50_000, 80_000, 60_000, 160_000]
})

# Validate the data and cast columns to expected types
validated_df: dy.DataFrame[HouseSchema] = HouseSchema.validate(df, cast=True)
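
As a rough cross-check of what the schema above demands, the following plain-Python sketch re-implements the declared constraints by hand and applies them to the sample columns. It does not use dataframely and is not how the library evaluates rules. The helper `failed_checks` and its check labels are hypothetical, and the sketch assumes that a row with a missing bedroom or bathroom count is reported for nullability only, not for the ratio rule.

```python
from collections import Counter

# Sample columns copied from the data frame above (price has no rule attached).
zip_codes = ["01234", "01234", "1", "213", "123", "213"]
bedrooms = [2, 2, 1, None, None, 2]
bathrooms = [1, 2, 1, 1, 0, 8]

# Group sizes needed for the group-wise rule on zip_code.
zip_counts = Counter(zip_codes)


def failed_checks(zip_code, n_bed, n_bath):
    """Return hypothetical labels for every constraint this row violates."""
    failed = []
    if len(zip_code) < 3:  # min_length=3
        failed.append("zip_code min_length")
    if n_bed is None:  # nullable=False
        failed.append("num_bedrooms nullable")
    if n_bath is None:  # nullable=False
        failed.append("num_bathrooms nullable")
    if n_bed is not None and n_bath is not None:
        # Assumption: the ratio rule is only judged when both counts exist.
        # A zero bedroom count is treated as failing to avoid dividing by zero.
        if n_bed == 0 or not (1 / 3 <= n_bath / n_bed <= 3):
            failed.append("reasonable_bathroom_to_bedroom_ratio")
    if zip_counts[zip_code] < 2:  # at least two rows per zip code
        failed.append("minimum_zip_code_count")
    return failed


report = [failed_checks(*row) for row in zip(zip_codes, bedrooms, bathrooms)]
valid_rows = [i for i, failed in enumerate(report) if not failed]
print(valid_rows)  # → [0, 1]
```

Under these assumptions only the first two rows pass every check, so the sample data is useful for exercising failure handling as well as the happy path.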

See more advanced usage examples in the documentation.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dataframely-1.7.3.tar.gz (326.3 kB)

Tags: Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

dataframely-1.7.3-pp310-pypy310_pp73-macosx_10_12_x86_64.whl (509.3 kB)

Tags: PyPy, macOS 10.12+ x86-64

dataframely-1.7.3-cp310-abi3-win_amd64.whl (414.4 kB)

Tags: CPython 3.10+, Windows x86-64

dataframely-1.7.3-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (546.3 kB)

Tags: CPython 3.10+, manylinux: glibc 2.17+ x86-64

dataframely-1.7.3-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (539.1 kB)

Tags: CPython 3.10+, manylinux: glibc 2.17+ ARM64

dataframely-1.7.3-cp310-abi3-macosx_11_0_arm64.whl (493.3 kB)

Tags: CPython 3.10+, macOS 11.0+ ARM64
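
The wheel names listed above follow the standard wheel naming scheme, `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl`. As a minimal sketch, a name can be split into those parts with the standard library alone. The helper `split_wheel_filename` is made up for illustration; real tooling would more likely rely on the `packaging` library.

```python
def split_wheel_filename(filename: str) -> dict:
    """Split a wheel file name into its dash-separated components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    # Five parts without a build tag, six with one.
    if len(parts) not in (5, 6):
        raise ValueError(f"unexpected number of components: {filename!r}")
    return {
        "distribution": parts[0],
        "version": parts[1],
        "build_tag": parts[2] if len(parts) == 6 else None,
        "python_tag": parts[-3],
        "abi_tag": parts[-2],
        "platform_tag": parts[-1],
    }


info = split_wheel_filename(
    "dataframely-1.7.3-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info["python_tag"], info["abi_tag"], info["platform_tag"])
# prints: cp310 abi3 manylinux_2_17_x86_64.manylinux2014_x86_64
```

Here `cp310` together with `abi3` corresponds to the "CPython 3.10+" label shown for these files.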

File details

Details for the file dataframely-1.7.3.tar.gz.

File metadata

  • Download URL: dataframely-1.7.3.tar.gz
  • Upload date:
  • Size: 326.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for dataframely-1.7.3.tar.gz
Algorithm Hash digest
SHA256 1e768b12c6db8774513062541aeb9211bd1bdb6d72a38ff274b73c4e89fe016a
MD5 ef803c4ac298a49a0b7fc55ee885999c
BLAKE2b-256 81b6c18c60812912c2885b9c05ef3e5aa2a9bdb9c73d360c5a667201cf338a86

See more details on using hashes here.
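
One way to use the published digests is to compute the SHA256 of a downloaded file and compare it with the value listed above. The following is a minimal sketch with the standard library. The helper names `sha256_of_file` and `matches_published_hash` and the local file path are assumptions, and the comparison only runs if the archive is present in the working directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for dataframely-1.7.3.tar.gz.
EXPECTED_SHA256 = "1e768b12c6db8774513062541aeb9211bd1bdb6d72a38ff274b73c4e89fe016a"


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare the file's digest with a published hex digest, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Hypothetical location of the downloaded source distribution.
archive = Path("dataframely-1.7.3.tar.gz")
if archive.exists():
    print("hash matches:", matches_published_hash(archive, EXPECTED_SHA256))
```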

Provenance

The following attestation bundles were made for dataframely-1.7.3.tar.gz:

Publisher: build.yml on Quantco/dataframely

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dataframely-1.7.3-pp310-pypy310_pp73-macosx_10_12_x86_64.whl.

File hashes

Hashes for dataframely-1.7.3-pp310-pypy310_pp73-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 3cf03732508152ce3f63a4c0cac91ef09fbbdfd5c620fba63a4cc1f9b40c6517
MD5 d079b8de233355ecce1b80f233650b61
BLAKE2b-256 762c7cf8a918b2086f59d7ccd4a1f366f773afa6456516f72ad1b309450bfc41

Provenance

The following attestation bundles were made for dataframely-1.7.3-pp310-pypy310_pp73-macosx_10_12_x86_64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-1.7.3-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: dataframely-1.7.3-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 414.4 kB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for dataframely-1.7.3-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 0bc18a21dd216ae36b1de88ace07eddc643a7788e02cdd0f3e1b04a6e458a297
MD5 5bd961f3d97fe2ee9c4cfee9d984fcc1
BLAKE2b-256 0ad562eda5c15a343c8b6b23869441b2b35e81aeb20557ecfac3c91d6f6cdc3a

Provenance

The following attestation bundles were made for dataframely-1.7.3-cp310-abi3-win_amd64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-1.7.3-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File metadata

File hashes

Hashes for dataframely-1.7.3-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 d783a80a1a7bdfa0578379df84b07735a87000c9810e99fa01d51ba7d39ab541
MD5 4ccea089c1f05f1e0661dc26f1c1b77a
BLAKE2b-256 bd6ff35874641b54a00d1fa403c8665359d35c08f6cbbbf11c4043246edd01bf

Provenance

The following attestation bundles were made for dataframely-1.7.3-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-1.7.3-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File metadata

File hashes

Hashes for dataframely-1.7.3-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 45ce0163964093bca3323784044ff99ce94235bf95d01453164291a4c356cdc0
MD5 295188c0362b7ab3293f3ea9090bb205
BLAKE2b-256 38962786913bd4bdb86b1f2f1def1f39cde6de1e3a58cb5c40fd73f65913f369

Provenance

The following attestation bundles were made for dataframely-1.7.3-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-1.7.3-cp310-abi3-macosx_11_0_arm64.whl.

File metadata

File hashes

Hashes for dataframely-1.7.3-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0b23fd5d7d4721faad339a81b1e06cbf425fde99bc284b3cd57a5ec05341e12b
MD5 d91fbffcf0c3ed496a69523533d0b25d
BLAKE2b-256 21fb96f6fa996ca5148fe8a4372e04ef0e9401fdf512bbc5fc86623fae9400cd

Provenance

The following attestation bundles were made for dataframely-1.7.3-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: build.yml on Quantco/dataframely
