A declarative, polars-native data frame validation library

dataframely — A declarative, 🐻‍❄️-native data frame validation library

📖 Introduction

Dataframely is a Python package for validating the schema and content of polars data frames. It makes data pipelines more robust by ensuring that data meets expectations, and more readable by adding schema information to data frame type hints.

💿 Installation

You can install dataframely using your favorite package manager, e.g., pixi or pip:

pixi add dataframely
pip install dataframely

🎯 Usage

Defining a data frame schema

import dataframely as dy
import polars as pl

class HouseSchema(dy.Schema):
    zip_code = dy.String(nullable=False, min_length=3)
    num_bedrooms = dy.UInt8(nullable=False)
    num_bathrooms = dy.UInt8(nullable=False)
    price = dy.Float64(nullable=False)

    @dy.rule()
    def reasonable_bathroom_to_bedroom_ratio() -> pl.Expr:
        ratio = pl.col("num_bathrooms") / pl.col("num_bedrooms")
        return (ratio >= 1 / 3) & (ratio <= 3)

    @dy.rule(group_by=["zip_code"])
    def minimum_zip_code_count() -> pl.Expr:
        return pl.len() >= 2

Validating data against schema

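import dataframely as dy
import polars as pl

# Sample data; several rows deliberately violate HouseSchema
# (short zip code, null num_bedrooms, extreme bathroom ratio).
df = pl.DataFrame({
    "zip_code": ["01234", "01234", "1", "213", "123", "213"],
    "num_bedrooms": [2, 2, 1, None, None, 2],
    "num_bathrooms": [1, 2, 1, 1, 0, 8],
    "price": [100_000, 110_000, 50_000, 80_000, 60_000, 160_000]
})

# Validate the data and cast columns to expected types.
# validate() raises if any row violates the schema, so expect an error for this sample.
validated_df: dy.DataFrame[HouseSchema] = HouseSchema.validate(df, cast=True)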
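
To make the effect of these rules concrete, here is a plain-Python sketch that applies the same checks to the sample rows by hand. It does not use dataframely or polars and is not how the library is implemented. The function name `failed_rules` and the rule labels are hypothetical, and skipping the ratio check when the bedroom count is missing or zero is an assumption of this sketch, not documented library behavior.

```python
from collections import Counter

# Sample rows copied from the example above (price omitted: no rule reads it).
rows = [
    {"zip_code": "01234", "num_bedrooms": 2, "num_bathrooms": 1},
    {"zip_code": "01234", "num_bedrooms": 2, "num_bathrooms": 2},
    {"zip_code": "1", "num_bedrooms": 1, "num_bathrooms": 1},
    {"zip_code": "213", "num_bedrooms": None, "num_bathrooms": 1},
    {"zip_code": "123", "num_bedrooms": None, "num_bathrooms": 0},
    {"zip_code": "213", "num_bedrooms": 2, "num_bathrooms": 8},
]


def failed_rules(rows):
    """Return, for each row, the list of (hypothetical) rule labels it violates."""
    # Group-level check: count rows per zip code across the whole input.
    zip_counts = Counter(row["zip_code"] for row in rows)
    report = []
    for row in rows:
        failed = []
        beds, baths = row["num_bedrooms"], row["num_bathrooms"]
        if len(row["zip_code"]) < 3:
            failed.append("zip_code_min_length")
        if beds is None:
            failed.append("num_bedrooms_not_null")
        if baths is None:
            failed.append("num_bathrooms_not_null")
        # Assumption: the ratio is only checked when it is well defined.
        if beds is not None and beds != 0 and baths is not None:
            ratio = baths / beds
            if not (1 / 3 <= ratio <= 3):
                failed.append("reasonable_bathroom_to_bedroom_ratio")
        if zip_counts[row["zip_code"]] < 2:
            failed.append("minimum_zip_code_count")
        report.append(failed)
    return report


report = failed_rules(rows)
valid_indices = [i for i, failed in enumerate(report) if not failed]
```

In this sketch only the first two rows pass every check. Each of the other rows breaks at least one column constraint or rule, which is why a strict validation of the sample data is expected to fail.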

See more advanced usage examples in the documentation.

Download files

Source Distribution

dataframely-1.7.6.tar.gz (326.4 kB)

Uploaded: Source

Built Distributions

dataframely-1.7.6-pp310-pypy310_pp73-macosx_10_12_x86_64.whl (509.7 kB)

Uploaded: PyPy, macOS 10.12+ x86-64

dataframely-1.7.6-cp310-abi3-win_amd64.whl (414.7 kB)

Uploaded: CPython 3.10+, Windows x86-64

dataframely-1.7.6-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (546.8 kB)

Uploaded: CPython 3.10+, manylinux: glibc 2.17+ x86-64

dataframely-1.7.6-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (539.3 kB)

Uploaded: CPython 3.10+, manylinux: glibc 2.17+ ARM64

dataframely-1.7.6-cp310-abi3-macosx_11_0_arm64.whl (493.6 kB)

Uploaded: CPython 3.10+, macOS 11.0+ ARM64

File details

Details for the file dataframely-1.7.6.tar.gz.

File metadata

  • Download URL: dataframely-1.7.6.tar.gz
  • Upload date:
  • Size: 326.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for dataframely-1.7.6.tar.gz
Algorithm Hash digest
SHA256 184b57868a8ce2ba880022eea2fbcc82c03450c397c0643c054de46e54207180
MD5 0c9ee32938500f7b4aac03999240bca3
BLAKE2b-256 ef106ebad85222c00b9d6785a61095f78e2657ef14889085e60cf09a26e3e956

See more details on using hashes here.
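
As a minimal sketch of how a downloaded file can be checked against a published digest, the following uses only Python's standard `hashlib` module. The helper names `sha256_of` and `matches_published_digest` are hypothetical, and the sketch assumes the file has already been downloaded to a local path.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Return True if the file's SHA256 digest equals the expected hex string."""
    # Normalize whitespace and case so a copy-pasted digest compares cleanly.
    return sha256_of(path) == expected_hex.strip().lower()
```

For the source distribution, one would call `matches_published_digest("dataframely-1.7.6.tar.gz", "184b57868a8ce2ba880022eea2fbcc82c03450c397c0643c054de46e54207180")`, assuming the archive sits in the current working directory.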

Provenance

The following attestation bundles were made for dataframely-1.7.6.tar.gz:

Publisher: build.yml on Quantco/dataframely

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dataframely-1.7.6-pp310-pypy310_pp73-macosx_10_12_x86_64.whl.

File hashes

Hashes for dataframely-1.7.6-pp310-pypy310_pp73-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 ba13eb32210e8a873590602ef78d21851ce4276930d3978e606a9b8f1946d78b
MD5 07f6ab1ceaea1277ecf1132d17b26fc4
BLAKE2b-256 a4c15f09361f777d1dc2c6b482b465bdcf8cce2e6cdd33edaf2c474a020b6809

Provenance

The following attestation bundles were made for dataframely-1.7.6-pp310-pypy310_pp73-macosx_10_12_x86_64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-1.7.6-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: dataframely-1.7.6-cp310-abi3-win_amd64.whl
  • Upload date:
  • Size: 414.7 kB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for dataframely-1.7.6-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 6e269d6b08f703c3e1fbc51de56900940e085dea6a92cfc0fa8228cf1d5de7fb
MD5 c34dad92c9c66c33b1d3981f0860a6dc
BLAKE2b-256 ad84b2ef7d5e202e536b351be4333e8f9029da39a78f243f0627652b3bd720e0

Provenance

The following attestation bundles were made for dataframely-1.7.6-cp310-abi3-win_amd64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-1.7.6-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dataframely-1.7.6-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 4a9a40e1dc278cf252f4a8a42d15fb02e4bcbefa26257bbc7d6f0a4d85e1be76
MD5 0ebd679257663240b8ce5c94c006dc2f
BLAKE2b-256 3b388c80a3bb3256cb9e64a60236c43698a69b50b8e122d017668e6345269211

Provenance

The following attestation bundles were made for dataframely-1.7.6-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-1.7.6-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for dataframely-1.7.6-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 45f7f7c2356db1c4b91bd14b61bc2ac8dbf28da892774f4d30d0735f3d4fd7fe
MD5 ced82443e54a31dd5b261b541bbcd786
BLAKE2b-256 74b26d04da4e77ed417f0f23cac98415fe088fdaa9f998c6228d4340261a3393

Provenance

The following attestation bundles were made for dataframely-1.7.6-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-1.7.6-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for dataframely-1.7.6-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 f556a50ae82db33bc657ef7b58b333879bc735a3c131950d71b6cae73bbd38d1
MD5 270b1f58806f1a88f2f207b3133447e2
BLAKE2b-256 07b3119d6958b02b6200aec3b8d5256aef56a8c8e5600f28fd9d303472b33bb1

Provenance

The following attestation bundles were made for dataframely-1.7.6-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: build.yml on Quantco/dataframely
