A declarative, polars-native data frame validation library

Project description


dataframely — A declarative, 🐻‍❄️-native data frame validation library

📖 Introduction

Dataframely is a Python package to validate the schema and content of polars data frames. Its purpose is to make data pipelines more robust, by ensuring that data meets expectations, and more readable, by adding schema information to data frame type hints.

💿 Installation

You can install dataframely using your favorite package manager, e.g., pixi or pip:

pixi add dataframely
pip install dataframely

🎯 Usage

Defining a data frame schema

import dataframely as dy
import polars as pl

class HouseSchema(dy.Schema):
    zip_code = dy.String(nullable=False, min_length=3)
    num_bedrooms = dy.UInt8(nullable=False)
    num_bathrooms = dy.UInt8(nullable=False)
    price = dy.Float64(nullable=False)

    @dy.rule()
    def reasonable_bathroom_to_bedroom_ratio(cls) -> pl.Expr:
        ratio = pl.col("num_bathrooms") / pl.col("num_bedrooms")
        return (ratio >= 1 / 3) & (ratio <= 3)

    @dy.rule(group_by=["zip_code"])
    def minimum_zip_code_count(cls) -> pl.Expr:
        return pl.len() >= 2
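
A rule declared with `group_by` is evaluated once per group instead of once per row. The following is a minimal plain-Python sketch of that idea. It assumes that a group whose rule evaluates to false invalidates every row in that group. The function `group_rule_passes` is hypothetical and is neither part of dataframely nor its actual implementation.

```python
from collections import Counter


# Hypothetical plain-Python mirror of a group rule such as
# @dy.rule(group_by=["zip_code"]) returning pl.len() >= 2.
# Not dataframely's implementation: it only shows that the check runs per
# group and that each group's result is then applied to all of its rows.
def group_rule_passes(keys, min_count=2):
    counts = Counter(keys)  # number of rows per group key
    group_ok = {key: n >= min_count for key, n in counts.items()}
    # Broadcast each group's result back to the rows of that group.
    return [group_ok[key] for key in keys]


zip_codes = ["01234", "01234", "1", "213", "123", "213"]
print(group_rule_passes(zip_codes))
# [True, True, False, True, False, True]
```

The zip codes here are the ones used in the validation example. The groups `"1"` and `"123"` each contain a single row, so under this reading those two rows fail `minimum_zip_code_count`.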

Validating data against a schema

import dataframely as dy
import polars as pl

df = pl.DataFrame({
    "zip_code": ["01234", "01234", "1", "213", "123", "213"],
    "num_bedrooms": [2, 2, 1, None, None, 2],
    "num_bathrooms": [1, 2, 1, 1, 0, 8],
    "price": [100_000, 110_000, 50_000, 80_000, 60_000, 160_000]
})

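# Validate the data and cast columns to the types declared in the schema.
# Note: this sample violates HouseSchema (e.g., the zip code "1" is shorter
# than three characters and two rows have no num_bedrooms), so validate
# raises an error here instead of returning a data frame.
validated_df: dy.DataFrame[HouseSchema] = HouseSchema.validate(df, cast=True)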
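
The sample frame does not pass `HouseSchema`. The sketch below does not use dataframely. It re-checks each sample row in plain Python against the `min_length` and `nullable` constraints and the row-level ratio rule to show which rows fail. The helper `row_failures` and its failure labels are hypothetical. The helper assumes `num_bedrooms > 0` whenever the value is present. It does not model how dataframely treats a null rule result, because rows with a missing `num_bedrooms` already fail the nullability constraint.

```python
# Hypothetical plain-Python re-check of the sample rows against some of
# HouseSchema's constraints; it does not call dataframely.
rows = {
    "zip_code": ["01234", "01234", "1", "213", "123", "213"],
    "num_bedrooms": [2, 2, 1, None, None, 2],
    "num_bathrooms": [1, 2, 1, 1, 0, 8],
}


def row_failures(i):
    """Return the names of the constraints violated by row i."""
    failures = []
    zip_code = rows["zip_code"][i]
    bedrooms = rows["num_bedrooms"][i]
    bathrooms = rows["num_bathrooms"][i]
    if len(zip_code) < 3:  # dy.String(min_length=3)
        failures.append("zip_code min_length")
    if bedrooms is None:  # dy.UInt8(nullable=False)
        failures.append("num_bedrooms nullable")
    # Row-level rule with inclusive bounds; assumes bedrooms > 0.
    elif not (1 / 3 <= bathrooms / bedrooms <= 3):
        failures.append("reasonable_bathroom_to_bedroom_ratio")
    return failures


for i in range(len(rows["zip_code"])):
    print(i, row_failures(i))
```

Only the first two rows pass these checks. The last row fails because its ratio is 8 / 2 = 4, which exceeds the upper bound of 3.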

See more advanced usage examples in the documentation.

Project details


Download files

Download the file for your platform.

Source Distribution

dataframely-2.2.0.tar.gz (377.7 kB)

Uploaded Source

Built Distributions

dataframely-2.2.0-cp310-abi3-win_amd64.whl (5.3 MB)

Uploaded CPython 3.10+, Windows x86-64

dataframely-2.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (5.3 MB)

Uploaded CPython 3.10+, manylinux: glibc 2.17+ x86-64

dataframely-2.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (5.0 MB)

Uploaded CPython 3.10+, manylinux: glibc 2.17+ ARM64

dataframely-2.2.0-cp310-abi3-macosx_11_0_arm64.whl (4.8 MB)

Uploaded CPython 3.10+, macOS 11.0+ ARM64

dataframely-2.2.0-cp310-abi3-macosx_10_12_x86_64.whl (5.2 MB)

Uploaded CPython 3.10+, macOS 10.12+ x86-64

File details

Details for the file dataframely-2.2.0.tar.gz.

File metadata

  • Download URL: dataframely-2.2.0.tar.gz
  • Size: 377.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dataframely-2.2.0.tar.gz
Algorithm Hash digest
SHA256 7d415c2a2871f5d559c7fa805b7381b9438081f17a7687abab115b4f4bdc8d98
MD5 0a4dc878f5db2dd6e9c6cfe41f7e28b3
BLAKE2b-256 fd60dff5110653aa2a5fb44e10c5d59595f7bc57a5a6adc98f832649642f492b
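
The digests above let you confirm that a downloaded file is intact. This is a minimal sketch that uses only Python's standard library (`hashlib`) and assumes the sdist has been saved locally. The helper names `sha256_of` and `matches_published_hash` are illustrative and are not part of any tool.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for dataframely-2.2.0.tar.gz (from the table above).
EXPECTED_SHA256 = "7d415c2a2871f5d559c7fa805b7381b9438081f17a7687abab115b4f4bdc8d98"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Compare a local file's SHA256 digest with the published one."""
    return sha256_of(path) == expected.lower()
```

For an unmodified download, `matches_published_hash("dataframely-2.2.0.tar.gz")` should return `True`. Alternatively, pip can enforce published hashes at install time through its hash-checking mode (`--require-hashes`).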

Provenance

The following attestation bundles were made for dataframely-2.2.0.tar.gz:

Publisher: build.yml on Quantco/dataframely

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file dataframely-2.2.0-cp310-abi3-win_amd64.whl.

File metadata

  • Download URL: dataframely-2.2.0-cp310-abi3-win_amd64.whl
  • Size: 5.3 MB
  • Tags: CPython 3.10+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for dataframely-2.2.0-cp310-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 96cc62b391edc904615849d579f3698ddb72329a115f39cec5ce831bde53eb27
MD5 f011bd70a71c6d702dcb9f29e2058ad5
BLAKE2b-256 51e5b28c403dbb16b716477aeeb6ecf323d545422ee1407fc99e260bee73c600

Provenance

The following attestation bundles were made for dataframely-2.2.0-cp310-abi3-win_amd64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-2.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dataframely-2.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 fd3f17d9e2f5be817ada07b9e4fe92cb5492a70def769cde66691121dd35ce1b
MD5 a680ee8c33665a248ea46b6b880df60e
BLAKE2b-256 ac6c724217c571f0e6869294761499c30cce0f1847add290e0ac255bce210f55

Provenance

The following attestation bundles were made for dataframely-2.2.0-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-2.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.

File hashes

Hashes for dataframely-2.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
Algorithm Hash digest
SHA256 fe4d7dd809d93aed059dc6225f027a696f5fb5fdf2c17a5c7ff77601d2bbf154
MD5 4093ce4f608e100282f3e5322117158f
BLAKE2b-256 f5ae621e45c9bc9fbdee95722dea393ade43fc5a4cbfe5dd1e72f8b600157163

Provenance

The following attestation bundles were made for dataframely-2.2.0-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-2.2.0-cp310-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for dataframely-2.2.0-cp310-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 57063e933c57246f02c6819cfa7d398faa4f5dee26ab42c9d20a45a39bc754d6
MD5 9f752b518838ad8fc36127cf3b1ce7e6
BLAKE2b-256 5418bb53575fe146a39b192fdc65f02fd41639eca257dbb291984c7676d7e8f8

Provenance

The following attestation bundles were made for dataframely-2.2.0-cp310-abi3-macosx_11_0_arm64.whl:

Publisher: build.yml on Quantco/dataframely

File details

Details for the file dataframely-2.2.0-cp310-abi3-macosx_10_12_x86_64.whl.

File hashes

Hashes for dataframely-2.2.0-cp310-abi3-macosx_10_12_x86_64.whl
Algorithm Hash digest
SHA256 cb03f465a3f186ead8e72d1d004cc937eeb49a3995a7a5ecb39fc1320544a579
MD5 ea42501b7c5c81b49f2245d907daa685
BLAKE2b-256 d8b9116e41129432d8a3be30d382ebf30ecaec097297d5406629d90f20b43b15

Provenance

The following attestation bundles were made for dataframely-2.2.0-cp310-abi3-macosx_10_12_x86_64.whl:

Publisher: build.yml on Quantco/dataframely