A declarative, polars-native data frame validation library

dataframely — A declarative, 🐻‍❄️-native data frame validation library

📖 Introduction

Dataframely is a Python package for validating the schema and content of polars data frames. It aims to make data pipelines more robust, by ensuring that data meets expectations, and more readable, by adding schema information to data frame type hints.

💿 Installation

You can install dataframely using your favorite package manager, e.g., pixi or pip:

pixi add dataframely
pip install dataframely
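
To confirm that the installation succeeded, one option is to query the installed version from Python. The following is a minimal sketch using only the standard library; the helper name `installed_version` is hypothetical and not part of dataframely.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "dataframely") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints a version string if the package is installed, otherwise None.
print(installed_version())
```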

🎯 Usage

Defining a data frame schema

import dataframely as dy
import polars as pl

class HouseSchema(dy.Schema):
    # Column definitions: data type plus per-column constraints.
    zip_code = dy.String(nullable=False, min_length=3)
    num_bedrooms = dy.UInt8(nullable=False)
    num_bathrooms = dy.UInt8(nullable=False)
    price = dy.Float64(nullable=False)

    # Row-level rule: the expression is evaluated for every row.
    @dy.rule()
    def reasonable_bathroom_to_bedroom_ratio() -> pl.Expr:
        ratio = pl.col("num_bathrooms") / pl.col("num_bedrooms")
        return (ratio >= 1 / 3) & (ratio <= 3)

    # Group-level rule: the expression is evaluated once per zip code.
    @dy.rule(group_by=["zip_code"])
    def minimum_zip_code_count() -> pl.Expr:
        return pl.len() >= 2

Validating data against the schema

import polars as pl

df = pl.DataFrame({
    "zip_code": ["01234", "01234", "1", "213", "123", "213"],
    "num_bedrooms": [2, 2, 1, None, None, 2],
    "num_bathrooms": [1, 2, 1, 1, 0, 8],
    "price": [100_000, 110_000, 50_000, 80_000, 60_000, 160_000]
})

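# Validate the data and cast columns to the expected types.
# Note: validate() raises an error when any row violates the schema, as several rows in this sample do.
validated_df: dy.DataFrame[HouseSchema] = HouseSchema.validate(df, cast=True)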
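
To make the intent of the schema concrete, the sketch below re-implements the same checks in plain Python, without dataframely or polars, and reports which sample rows violate which check. It is an illustration of the rule logic only, not of how the library works internally. The function name `failed_checks` and the check labels for the column constraints are hypothetical, and the sketch assumes the ratio rule is skipped when a count is missing or the bedroom count is zero; how dataframely itself treats such rows is not covered here.

```python
from collections import Counter

# Sample rows copied from the example data frame.
rows = [
    {"zip_code": "01234", "num_bedrooms": 2, "num_bathrooms": 1, "price": 100_000},
    {"zip_code": "01234", "num_bedrooms": 2, "num_bathrooms": 2, "price": 110_000},
    {"zip_code": "1", "num_bedrooms": 1, "num_bathrooms": 1, "price": 50_000},
    {"zip_code": "213", "num_bedrooms": None, "num_bathrooms": 1, "price": 80_000},
    {"zip_code": "123", "num_bedrooms": None, "num_bathrooms": 0, "price": 60_000},
    {"zip_code": "213", "num_bedrooms": 2, "num_bathrooms": 8, "price": 160_000},
]


def failed_checks(rows):
    """Map each failing row index to the list of checks that row violates."""
    # Group-level information: how many rows share each zip code.
    zip_counts = Counter(row["zip_code"] for row in rows)
    failures = {}
    for index, row in enumerate(rows):
        failed = []
        # Column constraint: zip_code must have at least 3 characters.
        if len(row["zip_code"]) < 3:
            failed.append("zip_code_min_length")
        # Column constraint: num_bedrooms must not be missing.
        if row["num_bedrooms"] is None:
            failed.append("num_bedrooms_not_null")
        # Row-level rule (assumption: skipped when a count is missing or zero).
        beds, baths = row["num_bedrooms"], row["num_bathrooms"]
        if beds and baths is not None:
            ratio = baths / beds
            if not (1 / 3 <= ratio <= 3):
                failed.append("reasonable_bathroom_to_bedroom_ratio")
        # Group-level rule: each zip code must occur at least twice.
        if zip_counts[row["zip_code"]] < 2:
            failed.append("minimum_zip_code_count")
        if failed:
            failures[index] = failed
    return failures


for index, checks in failed_checks(rows).items():
    print(index, checks)
```

In this sketch, rows 0 and 1 pass every check, while each of the remaining four rows fails at least one.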

See more advanced usage examples in the documentation.

Download files

Download the file for your platform.

Source distribution

dataframely-1.8.0.tar.gz (300.0 kB)

Built distributions

dataframely-1.8.0-pp310-pypy310_pp73-macosx_10_12_x86_64.whl (512.1 kB): PyPy, macOS 10.12+ x86-64
dataframely-1.8.0-cp310-abi3-win_amd64.whl (417.2 kB): CPython 3.10+, Windows x86-64
dataframely-1.8.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (548.4 kB): CPython 3.10+, manylinux (glibc 2.17+) x86-64
dataframely-1.8.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (539.7 kB): CPython 3.10+, manylinux (glibc 2.17+) ARM64
dataframely-1.8.0-cp310-abi3-macosx_11_0_arm64.whl (496.0 kB): CPython 3.10+, macOS 11.0+ ARM64
