A declarative, polars-native data frame validation library

dataframely — A declarative, 🐻‍❄️-native data frame validation library

📖 Introduction

Dataframely is a Python package for validating the schema and content of polars data frames. It makes data pipelines more robust by ensuring that data meets expectations, and more readable by adding schema information to data frame type hints.

💿 Installation

You can install dataframely using your favorite package manager, e.g., pixi or pip:

pixi add dataframely
pip install dataframely

🎯 Usage

Defining a data frame schema

import dataframely as dy
import polars as pl

class HouseSchema(dy.Schema):
    zip_code = dy.String(nullable=False, min_length=3)
    num_bedrooms = dy.UInt8(nullable=False)
    num_bathrooms = dy.UInt8(nullable=False)
    price = dy.Float64(nullable=False)

    @dy.rule()
    def reasonable_bathroom_to_bedroom_ratio() -> pl.Expr:
        ratio = pl.col("num_bathrooms") / pl.col("num_bedrooms")
        return (ratio >= 1 / 3) & (ratio <= 3)

    @dy.rule(group_by=["zip_code"])
    def minimum_zip_code_count() -> pl.Expr:
        return pl.len() >= 2

Validating data against schema

import dataframely as dy
import polars as pl

df = pl.DataFrame({
    "zip_code": ["01234", "01234", "1", "213", "123", "213"],
    "num_bedrooms": [2, 2, 1, None, None, 2],
    "num_bathrooms": [1, 2, 1, 1, 0, 8],
    "price": [100_000, 110_000, 50_000, 80_000, 60_000, 160_000]
})

# Validate the data and cast columns to expected types.
# Validation raises an exception if any row violates the schema.
validated_df: dy.DataFrame[HouseSchema] = HouseSchema.validate(df, cast=True)
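
To see which rows of the sample data meet the schema's constraints, the following plain-Python sketch re-checks them by hand. It is an illustration only, not dataframely's implementation: the check labels are hypothetical, and skipping the ratio check for a missing bedroom count (and failing it for a zero bedroom count) are assumptions made here.

```python
from collections import Counter

# The sample data from above, one dict per row.
rows = [
    {"zip_code": "01234", "num_bedrooms": 2, "num_bathrooms": 1},
    {"zip_code": "01234", "num_bedrooms": 2, "num_bathrooms": 2},
    {"zip_code": "1", "num_bedrooms": 1, "num_bathrooms": 1},
    {"zip_code": "213", "num_bedrooms": None, "num_bathrooms": 1},
    {"zip_code": "123", "num_bedrooms": None, "num_bathrooms": 0},
    {"zip_code": "213", "num_bedrooms": 2, "num_bathrooms": 8},
]

# Number of rows per zip code, needed for the group-level check.
zip_counts = Counter(row["zip_code"] for row in rows)


def failed_checks(row: dict) -> list[str]:
    """Return hypothetical labels for every constraint the row violates."""
    failures = []
    if len(row["zip_code"]) < 3:
        failures.append("zip_code too short")
    bedrooms = row["num_bedrooms"]
    if bedrooms is None:
        # Assumption: the ratio check is skipped when the count is missing.
        failures.append("num_bedrooms is null")
    elif bedrooms == 0 or not (1 / 3 <= row["num_bathrooms"] / bedrooms <= 3):
        failures.append("unreasonable bathroom-to-bedroom ratio")
    if zip_counts[row["zip_code"]] < 2:
        failures.append("zip code occurs fewer than 2 times")
    return failures


report = [failed_checks(row) for row in rows]
valid_rows = [i for i, failures in enumerate(report) if not failures]

for i, failures in enumerate(report):
    print(i, failures or "ok")
```

Under these assumptions only the first two rows pass every check, so the sample data as a whole does not satisfy the schema.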

See more advanced usage examples in the documentation.

Download files

Source Distribution

  • dataframely-1.5.0.tar.gz (252.8 kB): Source

Built Distributions

  • dataframely-1.5.0-cp311-abi3-win_amd64.whl (403.8 kB): CPython 3.11+, Windows x86-64
  • dataframely-1.5.0-cp311-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (538.5 kB): CPython 3.11+, manylinux (glibc 2.17+), x86-64
  • dataframely-1.5.0-cp311-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (531.0 kB): CPython 3.11+, manylinux (glibc 2.17+), ARM64
  • dataframely-1.5.0-cp311-abi3-macosx_11_0_arm64.whl (482.0 kB): CPython 3.11+, macOS 11.0+, ARM64
  • dataframely-1.5.0-cp311-abi3-macosx_10_12_x86_64.whl (498.6 kB): CPython 3.11+, macOS 10.12+, x86-64
