A declarative, polars-native data frame validation library

Project description


dataframely — A declarative, 🐻‍❄️-native data frame validation library

📖 Introduction

Dataframely is a Python package to validate the schema and content of polars data frames. Its purpose is to make data pipelines more robust by ensuring that data meets expectations and more readable by adding schema information to data frame type hints.

💿 Installation

You can install dataframely using your favorite package manager, e.g., pixi or pip:

pixi add dataframely
pip install dataframely

🎯 Usage

Defining a data frame schema

import dataframely as dy
import polars as pl

class HouseSchema(dy.Schema):
    zip_code = dy.String(nullable=False, min_length=3)
    num_bedrooms = dy.UInt8(nullable=False)
    num_bathrooms = dy.UInt8(nullable=False)
    price = dy.Float64(nullable=False)

    @dy.rule()
    def reasonable_bathroom_to_bedroom_ratio(cls) -> pl.Expr:
        ratio = pl.col("num_bathrooms") / pl.col("num_bedrooms")
        return (ratio >= 1 / 3) & (ratio <= 3)

    @dy.rule(group_by=["zip_code"])
    def minimum_zip_code_count(cls) -> pl.Expr:
        return pl.len() >= 2

Validating data against the schema

import dataframely as dy
import polars as pl

# HouseSchema is the schema class defined in the previous snippet.

df = pl.DataFrame({
    "zip_code": ["01234", "01234", "1", "213", "123", "213"],
    "num_bedrooms": [2, 2, 1, None, None, 2],
    "num_bathrooms": [1, 2, 1, 1, 0, 8],
    "price": [100_000, 110_000, 50_000, 80_000, 60_000, 160_000]
})

# Validate the data and cast columns to expected types
validated_df: dy.DataFrame[HouseSchema] = HouseSchema.validate(df, cast=True)
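
Several of the sample rows do not satisfy the constraints declared in HouseSchema. The plain-Python sketch below re-checks the rows by hand to show which constraint each row violates. It does not use dataframely and is not how the library evaluates rules: the helper name `check_rows`, the constraint labels, and the choice to skip the ratio check when a value is missing or the bedroom count is zero are assumptions made for illustration.

```python
from collections import Counter

# Sample columns copied from the data frame above (price is unconstrained here).
zip_codes = ["01234", "01234", "1", "213", "123", "213"]
bedrooms = [2, 2, 1, None, None, 2]
bathrooms = [1, 2, 1, 1, 0, 8]


def check_rows(zip_codes, bedrooms, bathrooms):
    """Hypothetical helper: map row index -> list of violated constraints."""
    # Group-level rule: count rows per zip code across the whole input.
    zip_counts = Counter(zip_codes)
    violations = {}
    for i, (zc, bed, bath) in enumerate(zip(zip_codes, bedrooms, bathrooms)):
        failed = []
        if len(zc) < 3:
            failed.append("zip_code min_length")
        if bed is None:
            failed.append("num_bedrooms not null")
        if bath is None:
            failed.append("num_bathrooms not null")
        # Assumption: only check the ratio when it can be computed.
        if bed is not None and bath is not None and bed != 0:
            ratio = bath / bed
            if not (1 / 3 <= ratio <= 3):
                failed.append("bathroom_to_bedroom_ratio")
        if zip_counts[zc] < 2:
            failed.append("minimum_zip_code_count")
        if failed:
            violations[i] = failed
    return violations


violations = check_rows(zip_codes, bedrooms, bathrooms)
for index, failed in violations.items():
    print(index, failed)
# Rows 0 and 1 pass every check; rows 2 to 5 each violate at least one.
```

Under these assumptions, only the first two rows satisfy every constraint, so the sample is useful mainly for seeing how invalid rows are reported.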

See more advanced usage examples in the documentation.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dataframely-2.4.0.tar.gz (380.8 kB view details)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

dataframely-2.4.0-cp310-abi3-win_amd64.whl (5.3 MB view details)

Uploaded CPython 3.10+Windows x86-64

dataframely-2.4.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.3 MB view details)

Uploaded CPython 3.10+manylinux: glibc 2.17+ x86-64

dataframely-2.4.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (5.0 MB view details)

Uploaded CPython 3.10+manylinux: glibc 2.17+ ARM64

dataframely-2.4.0-cp310-abi3-macosx_11_0_arm64.whl (4.8 MB view details)

Uploaded CPython 3.10+macOS 11.0+ ARM64

dataframely-2.4.0-cp310-abi3-macosx_10_12_x86_64.whl (5.2 MB view details)

Uploaded CPython 3.10+macOS 10.12+ x86-64
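
The wheel file names above follow the standard `name-version-python tag-abi tag-platform tag` layout, where a platform tag may list several alternatives separated by dots. As a rough sketch of how to read them, the hypothetical helper `parse_wheel_filename` below splits a file name into those components. It is a simplified illustration and not a replacement for a real parser such as the one in the `packaging` library.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Hypothetical helper: split a wheel file name into its components."""
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")
    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        name, version, python_tag, abi_tag, platform_tag = parts
        build = None
    elif len(parts) == 6:
        # An optional build tag sits between the version and the Python tag.
        name, version, build, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of components in {filename!r}")
    return {
        "name": name,
        "version": version,
        "build": build,
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        # Compressed tag sets list alternatives separated by dots.
        "platform_tags": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "dataframely-2.4.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl"
)
print(info["python_tag"], info["abi_tag"], info["platform_tags"])
```

The `cp310` and `abi3` tags together correspond to the "CPython 3.10+" label shown for each wheel: the wheel targets the stable ABI starting with CPython 3.10.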

File details

Details for the file dataframely-2.4.0.tar.gz.

File metadata

  • Download URL: dataframely-2.4.0.tar.gz
  • Upload date:
  • Size: 380.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dataframely-2.4.0.tar.gz
Algorithm Hash digest
SHA256 9dc6fb2b7d1202ce87245b26597d0c003631bb6f088e1bb82becd072b2e5155b
MD5 83db1b8e9030a6bf12d233a7ea709c28
BLAKE2b-256 2c7ca5bbc7d24de5b0a5361bc2e3ed4178759b57f32fdd97f8e3ab786881c7dd

See more details on using hashes here.
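
One way to use these digests is to recompute the SHA256 of a downloaded file and compare it with the value listed above. The sketch below uses only the standard library. The helper names `sha256_of_file` and `matches_expected` are made up for illustration, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with an expected hex string."""
    return sha256_of_file(path) == expected_hex.strip().lower()
```

For example, assuming the source distribution was saved as `dataframely-2.4.0.tar.gz` in the current directory, `matches_expected("dataframely-2.4.0.tar.gz", "9dc6fb2b7d1202ce87245b26597d0c003631bb6f088e1bb82becd072b2e5155b")` should return `True` for an intact download.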

Provenance

The following attestation bundles were made for dataframely-2.4.0.tar.gz:

Publisher: build.yml on Quantco/dataframely

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dataframely-2.4.0-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: dataframely-2.4.0-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 5.3 MB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dataframely-2.4.0-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 5881250b8998d01dd01a38695b0b7094ead2688c6084dc489ca7ac75f1cd0369
MD5 57bb6f6ebe9551a1872b984ef9f3ca86
BLAKE2b-256 8c4a458776c85e8ecd88c3b78dc43dbd6d792f29726276ab4910633cb56fd177

Provenance

The following attestation bundles were made for dataframely-2.4.0-cp310-abi3-win_amd64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-2.4.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dataframely-2.4.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 260762dccf719aac66d91d8f054c056f53cf4c28408ab43d5dbc84e1a0264f16
MD5 582d8997f8e2c4b3becca167a139bc15
BLAKE2b-256 8941a1bec597bcca12028fba70c53d000784a83f0e28a0a8c85452c4a549565f

Provenance

The following attestation bundles were made for dataframely-2.4.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-2.4.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for dataframely-2.4.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 079cae35f496160b9fd3733347767bffc58c794d4a6d0333152335ddb92e1873
MD5 0efd56acc773c2f9a16d2f5b23668b9f
BLAKE2b-256 b1e79b798279740a682023ee82d7c0bf08bc3e2882539de396f6baac2fb1d834

Provenance

The following attestation bundles were made for dataframely-2.4.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-2.4.0-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for dataframely-2.4.0-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 3eedd4eedb546b6784cca486afedc0d5ebb728b6a371a17f12a687dd14c90dbb
MD5 8f8d709c438667e7676a2e5aab4a887d
BLAKE2b-256 5c32426735de7b5f05787d68611192569dd47d7cf9a54e2447c5eca828b2db0d

Provenance

The following attestation bundles were made for dataframely-2.4.0-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-2.4.0-cp310-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for dataframely-2.4.0-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 19ca4cf51c94dae8f6a4e0c6b6e80cb2d48019958d80c1c9874864e03a32c78b
MD5 46575325810fb1a9ecb0530569380cb9
BLAKE2b-256 405d49ff6b71e46e8c3645a39967390d34aa721aa3c2bbcaa3554447a570e4c1

Provenance

The following attestation bundles were made for dataframely-2.4.0-cp310-abi3-macosx_10_12_x86_64.whl:

Publisher: build.yml on Quantco/dataframely
