
A data-designer plugin for creating columns via custom Python functions


Data Designer Lambda Column Plugin

A plugin for data-designer that allows you to define columns using custom Python functions. This enables you to inject logic, transformations, and computations directly into your data generation pipeline.

Features

  • Row-wise Operations: Apply a function to each row (similar to pandas.DataFrame.apply(axis=1)).
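  • Full DataFrame Operations: Apply transformations that need the entire DataFrame at once (e.g., aggregations, ranking, exploding lists, filtering, pivoting). Operations that change the number of rows are subject to the warning in the Usage section.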
  • Dependency Management: Explicitly declare required columns to ensure execution order.
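The difference between the two operation types can be tried out locally with plain pandas. The sketch below is a hypothetical helper, `apply_lambda_column`, that only mimics the behavior described in this README (row functions receive a Series, full functions receive and return the DataFrame, and required_cols must exist). It is not the plugin's actual implementation.

```python
from typing import Callable, List, Literal, Optional

import pandas as pd


def apply_lambda_column(
    df: pd.DataFrame,
    name: str,
    column_function: Callable,
    required_cols: Optional[List[str]] = None,
    operation_type: Literal["row", "full"] = "row",
) -> pd.DataFrame:
    """Hypothetical local stand-in for the plugin's row/full semantics."""
    # Fail early if a declared dependency is missing (the role required_cols plays).
    missing = [c for c in (required_cols or []) if c not in df.columns]
    if missing:
        raise KeyError(f"missing required columns: {missing}")

    if operation_type == "row":
        # Called once per row with a pandas Series; the return value becomes the cell.
        out = df.copy()
        out[name] = df.apply(column_function, axis=1)
        return out

    if operation_type == "full":
        # Called once with the whole DataFrame; must return a DataFrame containing `name`.
        out = column_function(df.copy())
        if name not in out.columns:
            raise ValueError(f"full operation did not produce column {name!r}")
        return out

    raise ValueError(f"unknown operation_type: {operation_type!r}")
```

Because the helper copies its input, the caller's DataFrame is never modified, which makes it convenient for quick experiments before registering a function with data-designer.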

Installation

This plugin is designed to be used with data-designer.

pip install data-designer-lambda-column

Usage

Basic Row-wise Transformation

Use operation_type="row" (default) to calculate values based on other columns in the same row.

from data_designer_lambda_column.plugin import LambdaColumnConfig
from data_designer.essentials import DataDesignerConfigBuilder, SamplerColumnConfig, CategorySamplerParams

builder = DataDesignerConfigBuilder()

# 1. Add some base data
builder.add_column(
    SamplerColumnConfig(
        name="quantity",
        sampler_type="category",
        params=CategorySamplerParams(values=[10, 20, 30]),
    )
)

builder.add_column(
    SamplerColumnConfig(
        name="price",
        sampler_type="category",
        params=CategorySamplerParams(values=[5.0, 10.0]),
    )
)

# 2. Add a computed column using a lambda function
builder.add_column(
    LambdaColumnConfig(
        name="total_cost",
        required_cols=["quantity", "price"],
        operation_type="row",  # default
        column_function=lambda row: row["quantity"] * row["price"]
    )
)
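Before wiring a row function into the builder, it can be checked on a small hand-made DataFrame. This sketch assumes only pandas; the sample rows are made up from the category values above, and `DataFrame.apply(axis=1)` stands in for the plugin's row-wise call.

```python
import pandas as pd


def total_cost(row: pd.Series) -> float:
    # Same computation as the lambda passed as column_function above.
    return row["quantity"] * row["price"]


# Hand-made sample; in a real run these values come from the samplers.
sample = pd.DataFrame({"quantity": [10, 20, 30], "price": [5.0, 10.0, 5.0]})

# Row-wise semantics: the function is called once per row.
sample["total_cost"] = sample.apply(total_cost, axis=1)
print(sample["total_cost"].tolist())  # [50.0, 200.0, 150.0]
```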

Advanced Full DataFrame Transformation

Use operation_type="full" when your logic needs the whole DataFrame at once, for example to change its shape (explode, melt) or to compute values that depend on other rows.

Note: When using operation_type="full", your function receives the entire DataFrame and must return the modified DataFrame.

Warning: In the current version, operations that change the number of rows (such as explode) may not work as expected because of the validation checks data_designer performs when updating records.
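Given this caveat, a full-DataFrame function that adds a column without changing the row count is the safer pattern. The sketch below ranks prices, which needs every row at once; the column name price_rank is illustrative, and the rank is relative to whatever DataFrame the plugin passes in (which may be a batch rather than the full dataset, an assumption not confirmed here).

```python
import pandas as pd


def add_price_rank(df: pd.DataFrame) -> pd.DataFrame:
    """Full-DataFrame function that adds a column and keeps the row count."""
    out = df.copy()
    # Dense rank: equal prices share a rank and ranks have no gaps.
    out["price_rank"] = out["price"].rank(method="dense").astype(int)
    return out
```

It would be registered like the other examples, with name="price_rank", required_cols=["price"], operation_type="full", and column_function=add_price_rank.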

from data_designer_lambda_column.plugin import LambdaColumnConfig
from data_designer.essentials import DataDesignerConfigBuilder

builder = DataDesignerConfigBuilder()
# Assumes an 'items_list' column containing lists of items
# (e.g., ['apple', 'banana'] and ['orange']) has already been added to this builder.

# Define a function to explode a list column
def explode_items(df):
    # Explode the list so each item gets its own row.
    # This increases the row count (see the warning above).
    expanded_df = df.explode("items_list")

    # The returned DataFrame must contain the column named in the config ('single_item')
    expanded_df["single_item"] = expanded_df["items_list"]

    return expanded_df

builder.add_column(
    LambdaColumnConfig(
        name="single_item",
        required_cols=["items_list"],
        operation_type="full",
        column_function=explode_items,
    )
)
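Running the explode function on a plain pandas DataFrame shows why the warning applies: the output has more rows than the input, and the original index labels repeat. The sketch repeats the explode logic so it runs on its own; the sample items are made up.

```python
import pandas as pd


def explode_items(df: pd.DataFrame) -> pd.DataFrame:
    # Same logic as the example above: one output row per list element.
    expanded_df = df.explode("items_list")
    expanded_df["single_item"] = expanded_df["items_list"]
    return expanded_df


sample = pd.DataFrame({"items_list": [["apple", "banana"], ["orange"]]})
result = explode_items(sample)

# Two input rows become three output rows, and index label 0 appears twice.
print(len(sample), "->", len(result))  # 2 -> 3
print(result.index.tolist())           # [0, 0, 1]
```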

Configuration

LambdaColumnConfig accepts the following parameters:

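  • name (str, required): The name of the column to generate.
  • column_function (callable, required): The Python function to execute. With "row", it receives one row as a pandas Series and returns the value for that row; with "full", it receives the entire DataFrame and must return the modified DataFrame.
  • required_cols (list[str], default []): Names of columns that must exist before this column is generated; they determine execution order.
  • operation_type (Literal["row", "full"], default "row"): Whether the function is called once per row ("row") or once on the whole DataFrame ("full").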
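Since column_function only needs to be callable, a parameterized function can be adapted to the one-argument row contract with functools.partial. This is a sketch under the assumption that the plugin simply calls the object (and does not, for example, need to serialize it); scaled_total and tax_rate are hypothetical names used for illustration.

```python
from functools import partial

import pandas as pd


def scaled_total(row: pd.Series, tax_rate: float) -> float:
    """Hypothetical row function with an extra parameter."""
    return row["quantity"] * row["price"] * (1 + tax_rate)


# Binding tax_rate leaves a callable that takes only `row`, matching the "row" contract.
with_tax = partial(scaled_total, tax_rate=0.25)

# Local check with pandas before passing with_tax as column_function.
sample = pd.DataFrame({"quantity": [10, 20], "price": [5.0, 10.0]})
sample["total_with_tax"] = sample.apply(with_tax, axis=1)
print(sample["total_with_tax"].tolist())  # [62.5, 250.0]
```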

Plugin Registration

This package exposes a standard data_designer plugin entry point:

  • Entry Point: data_designer.plugins
  • Name: lambda-column
  • Impl: data_designer_lambda_column.plugin.LambdaColumnGenerator

It will be automatically discovered by data-designer when installed in the same environment.

Download files

Source distribution: data_designer_lambda_column-0.1.0.tar.gz (151.4 kB)

Built distribution: data_designer_lambda_column-0.1.0-py3-none-any.whl (4.5 kB, Python 3)

File hashes

data_designer_lambda_column-0.1.0.tar.gz
  • SHA256: d49b18b41031cca039c19106fa4da2aa52aa60aff9dc5c63a38143d538964cdb
  • MD5: 4389efa1fc0ed9088df1c8a49de02b87
  • BLAKE2b-256: 041a0f9685a1e7a7836ab34f2daa33c6e511d6042f5773cb66906e129712297b

data_designer_lambda_column-0.1.0-py3-none-any.whl
  • SHA256: 3e68f6c661c5d2f6c96b0614afd4b48efeb11eadb9750791e57fdacb27ee3c62
  • MD5: c6726c04aaa96a274da24c8ed03eafba
  • BLAKE2b-256: 5751ceb5e6e8c65c854312e97880e212caa8b394e5bc35cb4f61c7e469581383
