
Efficiently parse and validate all columns in a pandas DataFrame.


Vrame

Vrame is a Python library for efficiently parsing and validating every column of a pandas DataFrame. It uses pandas' vectorized operations, which act on a whole column at once rather than looping over rows in Python, to speed up validation. This makes it well suited to checking data in data science and machine learning projects.
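To make "vectorized" concrete, here is a minimal plain-pandas sketch of the idea. It is not Vrame's implementation; it only mirrors the bounds of the Integer(lower=-1, upper=6, nullable=True) column from the Usage section, the sample data is made up, and the names check_row, ok and ok_loop are hypothetical. It also only checks numeric bounds, not that values are whole numbers.

```python
import numpy as np
import pandas as pd

# Hypothetical column: ints, numeric strings, an out-of-range value,
# a missing value and an unparseable string.
s = pd.Series([1, "2", 3, 40, np.nan, "x"], dtype="object")

# Vectorized: coerce the whole column in one call; unparseable entries become NaN.
parsed = pd.to_numeric(s, errors="coerce")

# Remember which cells were missing *before* coercion, so "x" is not treated as null.
originally_null = s.isna()

# Vectorized bounds check (inclusive), one boolean per row.
in_range = parsed.between(-1, 6)

# nullable=True: missing cells pass, everything else must be in range.
ok = originally_null | in_range


# Row-by-row equivalent for comparison: a Python-level function called per cell.
def check_row(value, lower=-1, upper=6):
    if pd.isna(value):
        return True
    try:
        return lower <= float(value) <= upper
    except (TypeError, ValueError):
        return False


ok_loop = s.map(check_row)
print(ok.tolist())  # [True, True, True, False, True, False]
```

Both versions give the same answer; the vectorized one delegates the per-cell work to pandas and NumPy instead of the Python interpreter, which is where the speedup on large frames comes from.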

Features

  • Vectorized Validation: Uses pandas' vectorized operations to validate whole columns at once.
  • Similar Syntax to Pydantic: Offers a familiar API for anyone who has used Pydantic, making it easy to adopt.
  • Custom Validators: Lets you define custom validation rules for specific data requirements.
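This page does not show Vrame's API for registering custom validators, so the sketch below illustrates the general idea in plain pandas only: a custom rule written as a function that takes a whole column and returns a boolean mask. The helper failing_rows, the rule short_alnum and the sample data are all hypothetical.

```python
import pandas as pd


def failing_rows(series: pd.Series, predicate) -> list:
    """Return the index labels where a vectorized predicate is False.

    Hypothetical helper: `predicate` receives the whole Series and returns a
    boolean Series of the same length, so the rule runs over the column at once.
    """
    mask = predicate(series)
    return series[~mask].index.tolist()


def short_alnum(s: pd.Series) -> pd.Series:
    """Hypothetical custom rule: non-empty, alphanumeric, at most 5 characters."""
    return s.str.isalnum() & s.str.len().between(1, 5)


codes = pd.Series(["str1", "ab-c", "", "12345", "toolong"])
print(failing_rows(codes, short_alnum))  # [1, 2, 4]
```

Row 1 fails on the hyphen, row 2 is empty, and row 4 is too long.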

Installation

To install Vrame, use pip:

pip install vrame

Usage

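First, define a schema for your DataFrame by subclassing BaseModel and declaring one column type per column, using syntax similar to Pydantic's. Then pass a DataFrame to the model and call parse_and_validate():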

import pandas as pd
import numpy as np
from vrame.basemodel import BaseModel
from vrame.column_types import (
    Integer,
    Float,
    Boolean,
    Datetime,
    String,
    List,
    Tuple,
    Dictionary,
    Set,
    Object
)


# Each attribute declares one column; the attribute names match the DataFrame's column names.
class Model(BaseModel):
    integer = Integer(lower=-1, upper=6, nullable=True)
    float = Float(lower=-1.0, upper=6.0, nullable=True)
    bool = Boolean(nullable=True)
    datetime = Datetime(lower="2024-03-20", upper="2024-03-21", nullable=True)
    list = List(nullable=True, min_items=1, max_items=3)
    tuple = Tuple(nullable=True, min_items=1, max_items=3)
    dictionary = Dictionary(nullable=True, min_items=1, max_items=3)
    set = Set(nullable=True, min_items=1, max_items=3)
    string = String(min_length=0, max_length=5, nullable=True)
    object = Object(nullable=True)


if __name__ == "__main__":
    # Raw data: several cells hold string representations (e.g. "2", "[3, 4]") of the declared type.
    df = pd.DataFrame(
        {
            'integer': [1, "2", 3, 4, np.nan],
            'float': [1.0, "2.0", 3, "4", "5"],
            'bool': [True, "False", "True", False, False],
            'datetime': [
                "2024-03-20",
                "2024-03-21",
                "2024-03-21",
                "2024-03-21",
                "2024-03-21"
            ],
            'list': [[1, 2], "[3, 4]", [5, 6], [7, 8], [9, 10]],
            'tuple': [(1, 2), "(3, 4)", (5, 6), (7, 8), (9, 10)],
            'dictionary': [
                {'a': 1, 'b': 2.1},
                "{'e': 3, 'f': 4.0}",
                {'a': 1, 'b': 2.1},
                {'a': 1, 'b': 2.1},
                {'a': 1, 'b': 2.1}
            ],
            'set': [{1, 2}, "{1, 2}", {1, 2}, {1, 2}, {1, 2}],
            'string': ["str1", "str2", "", "12345", "I"],
            'object': [1, 2.0, False, np.nan, None]
        }
    )

    # Wrap the DataFrame in the schema, then parse and validate every column.
    m = Model(df)
    df = m.parse_and_validate()
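The example includes string-encoded containers such as "[3, 4]" in the list column, but this page does not document how parse_and_validate() handles them or what it reports on failure. The sketch below is a plain pandas illustration of one plausible approach, using ast.literal_eval followed by a length check that mirrors List(nullable=True, min_items=1, max_items=3). It is not Vrame's implementation, and the names parse_literal, lengths and valid are hypothetical.

```python
import ast

import numpy as np
import pandas as pd


def parse_literal(value):
    """Turn strings like "[3, 4]" into Python objects; leave other values as-is."""
    if isinstance(value, str):
        try:
            return ast.literal_eval(value)
        except (ValueError, SyntaxError):
            return value  # keep malformed strings so the type check below rejects them
    return value


# Hypothetical column: real lists, string literals, a null and a too-long list.
raw = pd.Series([[1, 2], "[3, 4]", None, "[1, 2, 3, 4]"])
parsed = raw.map(parse_literal)

# List length per cell; non-lists (including nulls) get NaN.
lengths = parsed.map(lambda v: len(v) if isinstance(v, list) else np.nan)

# nullable=True: nulls pass; otherwise require 1 <= len <= 3 (NaN compares False).
valid = parsed.isna() | lengths.between(1, 3)
print(valid.tolist())  # [True, True, True, False]
```

Parsing has to call ast.literal_eval cell by cell, but once the lengths are computed the bounds check itself is a single vectorized between() call.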


Download files


Source Distribution

Vrame-1.0.3.tar.gz (9.9 kB)

Built Distribution

Vrame-1.0.3-py3-none-any.whl (11.5 kB)
