Safe, reproducible data transformations with built-in auditing and validation

TransformPlan: Auditable Data Transformation Pipelines

Requires Python 3.10+

Features

  • Declarative transformations: Build transformation pipelines using method chaining
  • Schema validation: Validate operations before execution with dry-run capability
  • Audit trails: Generate complete audit protocols with deterministic DataFrame hashing
  • Multi-backend support: Polars (default) and DuckDB backends with a pluggable Backend ABC
  • Serializable pipelines: Save and load transformation plans as JSON
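
The validation idea in the list above can be illustrated without the library. The sketch below is a hypothetical, stdlib-only model and not TransformPlan's implementation: each step is checked against a column schema, and the schema is updated as if the step had run, so a bad column reference surfaces before any data is touched. The names `dry_run` and `Step`, and the tuple layout of a step, are assumptions made for illustration.

```python
from typing import Any

# Hypothetical step representation: (operation name, parameters).
Step = tuple[str, dict[str, Any]]


def dry_run(schema: dict[str, str], steps: list[Step]) -> list[str]:
    """Check steps against a schema without touching any data.

    Returns a list of error messages; an empty list means the plan is valid.
    """
    current = dict(schema)  # work on a copy so the caller's schema is unchanged
    errors: list[str] = []
    for index, (op, params) in enumerate(steps, start=1):
        column = params["column"]
        if column not in current:
            errors.append(f"step {index} ({op}): unknown column {column!r}")
            continue
        # Simulate the effect of the step on the schema only.
        if op == "col_rename":
            current[params["new_name"]] = current.pop(column)
        elif op == "col_drop":
            del current[column]
    return errors


schema = {"PatientID": "str", "DOB": "date"}
steps: list[Step] = [
    ("col_rename", {"column": "PatientID", "new_name": "patient_id"}),
    ("col_drop", {"column": "PatientID"}),  # invalid: the column was renamed above
]
errors = dry_run(schema, steps)
print(errors)
```

Because only the schema is simulated, a check like this costs the same for ten rows or ten billion.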

Quick Example

from transformplan import TransformPlan, Col

# Build readable pipelines with 88 chainable operations
plan = (
    TransformPlan()
    # Standardize column names
    .col_rename(column="PatientID", new_name="patient_id")
    .col_rename(column="DOB", new_name="date_of_birth")
    .str_strip(column="patient_id")

    # Calculate derived values
    .dt_age_years(column="date_of_birth", new_column="age")
    .math_clamp(column="age", min_value=0, max_value=120)

    # Categorize patients by age
    .map_discretize(
        column="age",
        bins=[18, 40, 65],
        labels=["young", "adult", "senior"],
        new_column="age_group",
    )

    # Filter and clean
    .rows_filter(Col("age") >= 18)
    .rows_drop_nulls(columns=["patient_id", "age"])
    .col_drop(column="date_of_birth")
)

# Execute with schema validation to catch errors before they hit production
# (df is an existing Polars DataFrame with PatientID and DOB columns)
df_result, protocol = plan.process(df, validate=True)

# Serialize pipelines to JSON — version control your transformations
plan.to_json("patient_transform.json")

# Reload and reapply for reproducible results across environments
# (new_data is another DataFrame with the same input schema)
plan = TransformPlan.from_json("patient_transform.json")
df_result, protocol = plan.process(new_data)
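
Serialization is possible because a plan is data rather than code. As a rough, library-independent sketch, a plan can be modeled as an ordered list of operation names and parameters that survives a JSON round trip unchanged. The JSON layout used here (`version`, `steps`, `op`, `params`) is an assumption for illustration and is not TransformPlan's actual file format.

```python
import json

# Hypothetical plan layout: an ordered list of {"op": ..., "params": ...} records.
plan_steps = [
    {"op": "col_rename", "params": {"column": "PatientID", "new_name": "patient_id"}},
    {"op": "str_strip", "params": {"column": "patient_id"}},
    {"op": "math_clamp", "params": {"column": "age", "min_value": 0, "max_value": 120}},
]

# sort_keys gives a stable key order, so the file diffs cleanly in version control.
serialized = json.dumps({"version": 1, "steps": plan_steps}, sort_keys=True, indent=2)

# Loading the text back yields the same steps in the same order.
restored = json.loads(serialized)["steps"]
print(restored == plan_steps)  # True
```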

Full Audit Trail — Every Step Tracked and Hashed

protocol.print(show_params=False)
======================================================================
TRANSFORM PROTOCOL
======================================================================
Input:  1000 rows × 5 cols  [a4f8b2c1]
Output: 847 rows × 6 cols   [e7d3f9a2]
Total time: 0.0247s
----------------------------------------------------------------------

#    Operation            Rows         Cols         Time       Hash
----------------------------------------------------------------------
0    input                1000         5            -          a4f8b2c1
1    col_rename           1000         5            0.0012s    b2e4a7f3
2    col_rename           1000         5            0.0008s    c9d1e5b8
3    str_strip            1000         5            0.0013s    c9d1e5b8        ○
4    dt_age_years         1000         6 (+1)       0.0041s    d4f2c8a1
5    math_clamp           1000         6            0.0015s    e1b7d3f9
6    map_discretize       1000         7 (+1)       0.0028s    f8a4c2e6
7    rows_filter          858 (-142)   7            0.0037s    a2e9f4b7
8    rows_drop_nulls      847 (-11)    7            0.0019s    b5c1d8e3
9    col_drop             847          6 (-1)       0.0006s    e7d3f9a2
======================================================================
○ = no effect (steps 3 did not change data)
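
The hash column is what makes a no-op step visible: step 3 carries the same hash as step 2, so it left the data unchanged. A minimal sketch of the idea follows, assuming a SHA-256 digest over a canonical JSON rendering, truncated to eight hex characters. The library's actual hashing scheme is not described here and may differ; `frame_hash` is a hypothetical helper.

```python
import hashlib
import json


def frame_hash(columns: dict[str, list]) -> str:
    """Return a short deterministic hash of column-oriented data."""
    # Column order is part of the identity, so keep the dict's insertion order.
    canonical = json.dumps(list(columns.items()), separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:8]


before = {"patient_id": ["A1", "B2"], "age": [34, 71]}

# Stripping whitespace from already-clean strings changes nothing.
stripped = {**before, "patient_id": [s.strip() for s in before["patient_id"]]}

# Clamping to a narrower range (hypothetical bounds) alters one value.
clamped = {**before, "age": [min(max(a, 0), 65) for a in before["age"]]}

print(frame_hash(before) == frame_hash(stripped))  # True: same data, same hash
print(frame_hash(before) == frame_hash(clamped))
```

Comparing the hashes of consecutive steps is enough to flag steps that had no effect, without diffing the data itself.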

DuckDB Backend

Run the same pipelines on DuckDB for SQL-based execution and native large-file handling:

import duckdb
from transformplan import TransformPlan, Col
from transformplan.backends.duckdb import DuckDBBackend

con = duckdb.connect()
rel = con.sql("SELECT * FROM 'patients.parquet'")

# Same plan — backend chosen at execution time
plan = (
    TransformPlan()
    .col_rename(column="PatientID", new_name="patient_id")
    .rows_filter(Col("age") >= 18)
    .math_round(column="score", decimals=2)
)

result, protocol = plan.process(rel, backend=DuckDBBackend(con))
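
Choosing the backend at execution time reflects a common design: the plan records what to do, and the backend decides how to do it. The sketch below illustrates that split with Python's `abc` module. The class and method names (`Backend`, `filter_ge`, `ListBackend`, `SqlTextBackend`, `run`) are hypothetical and do not reproduce TransformPlan's Backend ABC.

```python
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical interface: one method per primitive operation."""

    @abstractmethod
    def filter_ge(self, data, column: str, value):
        """Keep rows where column >= value."""


class ListBackend(Backend):
    """Eager backend operating on a list of row dicts."""

    def filter_ge(self, data, column, value):
        return [row for row in data if row[column] >= value]


class SqlTextBackend(Backend):
    """Lazy backend that builds a SQL string instead of touching rows."""

    def filter_ge(self, data, column, value):
        # Illustration only: real code should bind parameters, not format them.
        return f'SELECT * FROM ({data}) WHERE "{column}" >= {value!r}'


def run(steps, data, backend: Backend):
    """Apply the same recorded steps through whichever backend is supplied."""
    for method, kwargs in steps:
        data = getattr(backend, method)(data, **kwargs)
    return data


steps = [("filter_ge", {"column": "age", "value": 18})]

rows = [{"age": 12}, {"age": 30}]
print(run(steps, rows, ListBackend()))
print(run(steps, "SELECT * FROM patients", SqlTextBackend()))
```

The `steps` list never changes between the two calls; only the object that interprets it does.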

Available Operations

Category  Description                Examples
col_      Column operations          col_rename, col_drop, col_cast, col_add, col_select
math_     Arithmetic & scaling       math_add, math_multiply, math_standardize, math_minmax, math_clamp
rows_     Row filtering & reshaping  rows_filter, rows_drop_nulls, rows_sort, rows_unique, rows_pivot
str_      String operations          str_lower, str_upper, str_strip, str_replace, str_split
dt_       Datetime operations        dt_year, dt_month, dt_parse, dt_age_years, dt_diff_days
map_      Value mapping & encoding   map_values, map_discretize, map_onehot, map_ordinal
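
The names `math_standardize`, `math_minmax`, and `math_clamp` suggest the usual scaling formulas. As a reference sketch in plain Python, the functions below assume z-score standardization with the population standard deviation and min-max scaling to the range 0 to 1. This README does not state which conventions the library uses, so treat the helpers as illustrations of the concepts rather than as the library's behavior.

```python
from statistics import fmean, pstdev


def standardize(values: list[float]) -> list[float]:
    """z = (x - mean) / std, using the population standard deviation."""
    mean, std = fmean(values), pstdev(values)
    # A constant column has std == 0 and would need special handling.
    return [(x - mean) / std for x in values]


def minmax(values: list[float]) -> list[float]:
    """Rescale linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]


def clamp(values: list[float], min_value: float, max_value: float) -> list[float]:
    """Limit every value to the closed interval [min_value, max_value]."""
    return [min(max(x, min_value), max_value) for x in values]


ages = [-3.0, 20.0, 40.0, 130.0]
print(clamp(ages, 0.0, 120.0))     # [0.0, 20.0, 40.0, 120.0]
print(minmax([10.0, 20.0, 30.0]))  # [0.0, 0.5, 1.0]
print(standardize([2.0, 4.0, 6.0]))
```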

Installation

pip install transformplan

Or with uv:

uv add transformplan

Development Setup

make install-dev   # Install with dev dependencies
make test          # Run the test suite
make lint          # Run ruff linting and pyright type checking
make format        # Fix import sorting and format code

License

MIT License - see LICENSE for details.
