A declarative, polars-native data frame validation library

dataframely — A declarative, 🐻‍❄️-native data frame validation library

📖 Introduction

Dataframely is a Python package to validate the schema and content of polars data frames. Its purpose is to make data pipelines more robust by ensuring that data meets expectations and more readable by adding schema information to data frame type hints.

💿 Installation

You can install dataframely using your favorite package manager, e.g., pixi or pip:

pixi add dataframely
pip install dataframely

🎯 Usage

Defining a data frame schema

import dataframely as dy
import polars as pl

class HouseSchema(dy.Schema):
    zip_code = dy.String(nullable=False, min_length=3)
    num_bedrooms = dy.UInt8(nullable=False)
    num_bathrooms = dy.UInt8(nullable=False)
    price = dy.Float64(nullable=False)

    @dy.rule()
    def reasonable_bathroom_to_bedroom_ratio() -> pl.Expr:
        # Row-level rule: the expression is checked for every row.
        ratio = pl.col("num_bathrooms") / pl.col("num_bedrooms")
        return (ratio >= 1 / 3) & (ratio <= 3)

    @dy.rule(group_by=["zip_code"])
    def minimum_zip_code_count() -> pl.Expr:
        # Group-level rule: the expression is evaluated once per zip code.
        return pl.len() >= 2

Validating data against the schema

import polars as pl

df = pl.DataFrame({
    "zip_code": ["01234", "01234", "1", "213", "123", "213"],
    "num_bedrooms": [2, 2, 1, None, None, 2],
    "num_bathrooms": [1, 2, 1, 1, 0, 8],
    "price": [100_000, 110_000, 50_000, 80_000, 60_000, 160_000]
})

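# Validate the data and cast columns to the expected types.
# Note: the sample data above violates the schema (e.g., a too-short zip code
# and null bedroom counts), so this call raises a validation error.
validated_df: dy.DataFrame[HouseSchema] = HouseSchema.validate(df, cast=True)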

See more advanced usage examples in the documentation.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dataframely-1.12.0.tar.gz (307.2 kB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

dataframely-1.12.0-pp310-pypy310_pp73-macosx_10_12_x86_64.whl (523.8 kB)

Uploaded PyPy, macOS 10.12+ x86-64

dataframely-1.12.0-cp310-abi3-win_amd64.whl (427.7 kB)

Uploaded CPython 3.10+, Windows x86-64

dataframely-1.12.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (560.8 kB)

Uploaded CPython 3.10+, manylinux: glibc 2.17+ x86-64

dataframely-1.12.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (552.0 kB)

Uploaded CPython 3.10+, manylinux: glibc 2.17+ ARM64

dataframely-1.12.0-cp310-abi3-macosx_11_0_arm64.whl (506.4 kB)

Uploaded CPython 3.10+, macOS 11.0+ ARM64

File details

Details for the file dataframely-1.12.0.tar.gz.

File metadata

  • Download URL: dataframely-1.12.0.tar.gz
  • Upload date:
  • Size: 307.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for dataframely-1.12.0.tar.gz
Algorithm Hash digest
SHA256 f1defdc52f94cdd3e8f6ae324c992ee017e6fa86128240c6143fbade0987920c
MD5 60e838243b09a6faa0d37fbf1304522d
BLAKE2b-256 3dde7b82a179a3b8d418a47434fd074a562640040b4cb608e0423ca2378247a4

See more details on using hashes here.

Provenance

The following attestation bundles were made for dataframely-1.12.0.tar.gz:

Publisher: build.yml on Quantco/dataframely

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dataframely-1.12.0-pp310-pypy310_pp73-macosx_10_12_x86_64.whl.

File hashes

Hashes for dataframely-1.12.0-pp310-pypy310_pp73-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 911080cbf367e7a99ec93b90953c63888693453d2ea96ef33de1bbf3e0c406c5
MD5 1b909aeace7f928ddb8211cf13c0b96a
BLAKE2b-256 51843ce7b234d93443d981306cd8876e8bdbfd58c348cd738961af87ab5583d2

Provenance

The following attestation bundles were made for dataframely-1.12.0-pp310-pypy310_pp73-macosx_10_12_x86_64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-1.12.0-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: dataframely-1.12.0-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 427.7 kB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for dataframely-1.12.0-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 dc28f6ecbbfdfa79aae0307d5c254a33f0545c6dcdbf4d6ddfb2656053d4abe0
MD5 1c1c48b8ad0512ca82dc1753a0571e60
BLAKE2b-256 6adff1bee09884d7093eb82216e39d728718655c5fb32578ffb4438464f3fd33

Provenance

The following attestation bundles were made for dataframely-1.12.0-cp310-abi3-win_amd64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-1.12.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dataframely-1.12.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b5d4e7fb18c889bc6378c5ecad1132c6e85a9b208e7eeee86a54badc4c9ef2b1
MD5 6854c67248b87e20ccee0add8413467a
BLAKE2b-256 193b70afa9a67dcff78fd14c1432e15d2f9e56ebe91505b451e361bbd64336c2

Provenance

The following attestation bundles were made for dataframely-1.12.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-1.12.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for dataframely-1.12.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 8fabb3a446f2fb801102030f804b289cb08c5dde9793c55a6270e08c735855f3
MD5 5cdaf324939b405b933d3d79031d6439
BLAKE2b-256 5f80f1c30b441429756c3071bb496834be5dc968520294b42f0bf99fd28e76b3

Provenance

The following attestation bundles were made for dataframely-1.12.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-1.12.0-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for dataframely-1.12.0-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 68217ca721b3b6a9c717fea1471fe3c23f83bcd8f975106c0f3c0c78f4c39029
MD5 2c8bbb5feddd6f03b1e653061dfe115d
BLAKE2b-256 38121a0cadbb0dd8c84c454c6e9629f1ffcd36f63c6e3bc44161bbbf4a11dce3

Provenance

The following attestation bundles were made for dataframely-1.12.0-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: build.yml on Quantco/dataframely
