
Safe, reproducible data transformations with built-in auditing and validation

TransformPlan: Auditable Data Transformation Pipelines

Requires Python 3.10+

Features

  • Declarative transformations: Build transformation pipelines using method chaining
  • Schema validation: Validate operations before execution with dry-run capability
  • Audit trails: Generate complete audit protocols with deterministic DataFrame hashing
  • Multi-backend support: Polars (default) and DuckDB backends with a pluggable Backend ABC
  • Serializable pipelines: Save and load transformation plans as JSON
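The README does not say how the deterministic DataFrame hash is computed. As a rough illustration only, the sketch below hashes a column-oriented table (a plain dict of lists standing in for a DataFrame) by serializing it canonically and truncating a SHA-256 digest to 8 hex characters, the width that appears in the protocol output. The function name `frame_hash` and the serialization scheme are assumptions, not TransformPlan's implementation.

```python
import hashlib
import json


def frame_hash(columns: dict[str, list], length: int = 8) -> str:
    """Hash a column-oriented table deterministically.

    Column names, column order, row order and values all feed into the
    digest, so any change to data or schema yields a different hash.
    """
    # Keys are NOT sorted, so column order stays significant;
    # default=str lets non-JSON values such as dates serialize.
    payload = json.dumps(
        [[name, values] for name, values in columns.items()],
        separators=(",", ":"),
        default=str,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]


before = {"patient_id": ["P001", "P002"], "age": [34, 71]}
after = {"patient_id": ["P001", "P002"], "age": [34, 71]}
changed = {"patient_id": ["P001", "P002"], "age": [34, 70]}

print(frame_hash(before) == frame_hash(after))    # True: identical content
print(frame_hash(before) == frame_hash(changed))  # False: one value differs
```

Because the same input always produces the same digest, two runs of a pipeline can be compared step by step just by comparing hashes.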

Quick Example

import polars as pl
from transformplan import TransformPlan, Col

# Raw patient extract (illustrative file name; needs PatientID and DOB columns)
df = pl.read_parquet("patients.parquet")

# Build readable pipelines with 88 chainable operations
plan = (
    TransformPlan()
    # Standardize column names
    .col_rename(column="PatientID", new_name="patient_id")
    .col_rename(column="DOB", new_name="date_of_birth")
    .str_strip(column="patient_id")

    # Calculate derived values
    .dt_age_years(column="date_of_birth", new_column="age")
    .math_clamp(column="age", min_value=0, max_value=120)

    # Categorize patients by age
    .map_discretize(column="age", bins=[18, 40, 65], labels=["young", "adult", "senior"], new_column="age_group")

    # Filter and clean
    .rows_filter(Col("age") >= 18)
    .rows_drop_nulls(columns=["patient_id", "age"])
    .col_drop(column="date_of_birth")
)

# Execute with schema validation: catch errors before they hit production
df_result, protocol = plan.process(df, validate=True)

# Serialize the pipeline to JSON so your transformations can live in version control
plan.to_json("patient_transform.json")

# Reload and reapply for reproducible results across environments
plan = TransformPlan.from_json("patient_transform.json")
new_data = pl.read_parquet("patients_new.parquet")  # hypothetical later batch, same raw schema
df_result, protocol = plan.process(new_data)
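To show why chained, declarative steps serialize cleanly, here is a toy stand-in in plain Python: each method appends an `{"op", "params"}` record and returns `self`, and a `Col` comparison yields a data-only predicate rather than a closure, so the whole plan survives a JSON round-trip. `MiniPlan` and this `Col` are hypothetical and only mimic the shape of the `TransformPlan` API above; the real classes may work differently.

```python
import json
from dataclasses import dataclass

# Comparison operators a serialized predicate may name (toy subset)
OPS = {">=": lambda a, b: a >= b, "<": lambda a, b: a < b}


@dataclass(frozen=True)
class Col:
    """Hypothetical column reference; comparisons return JSON-friendly dicts."""

    name: str

    def __ge__(self, value):
        return {"col": self.name, "op": ">=", "value": value}

    def __lt__(self, value):
        return {"col": self.name, "op": "<", "value": value}


class MiniPlan:
    """Toy chainable plan: each method records a step and returns self."""

    def __init__(self, steps=None):
        self.steps = list(steps or [])

    def _add(self, op, **params):
        self.steps.append({"op": op, "params": params})
        return self

    def col_rename(self, column, new_name):
        return self._add("col_rename", column=column, new_name=new_name)

    def rows_filter(self, predicate):
        return self._add("rows_filter", predicate=predicate)

    def to_json(self):
        return json.dumps({"steps": self.steps}, indent=2)

    @classmethod
    def from_json(cls, text):
        return cls(json.loads(text)["steps"])

    def process(self, rows):
        # Replay the recorded steps against a list of dict rows
        for step in self.steps:
            p = step["params"]
            if step["op"] == "col_rename":
                rows = [{(p["new_name"] if k == p["column"] else k): v
                         for k, v in row.items()} for row in rows]
            elif step["op"] == "rows_filter":
                pred = p["predicate"]
                compare = OPS[pred["op"]]
                rows = [row for row in rows if compare(row[pred["col"]], pred["value"])]
            else:
                raise ValueError(f"unknown op {step['op']!r}")
        return rows


plan = (
    MiniPlan()
    .col_rename(column="PatientID", new_name="patient_id")
    .rows_filter(Col("age") >= 18)
)
restored = MiniPlan.from_json(plan.to_json())
data = [{"PatientID": "P001", "age": 34}, {"PatientID": "P002", "age": 12}]
print(restored.process(data))  # [{'patient_id': 'P001', 'age': 34}]
```

Keeping predicates as data instead of lambdas is what makes the JSON file a complete description of the pipeline.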

Full Audit Trail — Every Step Tracked and Hashed

protocol.print(show_params=False)
======================================================================
TRANSFORM PROTOCOL
======================================================================
Input:  1000 rows × 5 cols  [a4f8b2c1]
Output: 847 rows × 6 cols   [e7d3f9a2]
Total time: 0.0247s
----------------------------------------------------------------------

#    Operation            Rows         Cols         Time       Hash
----------------------------------------------------------------------
0    input                1000         5            -          a4f8b2c1
1    col_rename           1000         5            0.0012s    b2e4a7f3
2    col_rename           1000         5            0.0008s    c9d1e5b8
3    str_strip            1000         5            0.0013s    c9d1e5b8        ○
4    dt_age_years         1000         6 (+1)       0.0041s    d4f2c8a1
5    math_clamp           1000         6            0.0015s    e1b7d3f9
6    map_discretize       1000         7 (+1)       0.0028s    f8a4c2e6
7    rows_filter          858 (-142)   7            0.0037s    a2e9f4b7
8    rows_drop_nulls      847 (-11)    7            0.0019s    b5c1d8e3
9    col_drop             847          6 (-1)       0.0006s    e7d3f9a2
======================================================================
○ = no effect (step 3 did not change the data)
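One way a protocol like the one above can flag the ○ steps is to hash the data after every step and compare it with the previous hash: equal digests mean the step changed nothing. The sketch below assumes rows are plain dicts and steps are `(name, function)` pairs; `table_hash` and `run_with_protocol` are illustrative names, not part of TransformPlan.

```python
import hashlib
import json


def table_hash(rows):
    """Short deterministic digest; row order, key order and values all count."""
    payload = json.dumps(rows, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]


def run_with_protocol(rows, steps):
    """Apply (name, fn) steps, recording row count, hash and a no-effect flag."""
    protocol = [{"step": 0, "op": "input", "rows": len(rows),
                 "hash": table_hash(rows), "no_effect": False}]
    for i, (name, fn) in enumerate(steps, start=1):
        rows = fn(rows)
        digest = table_hash(rows)
        protocol.append({
            "step": i,
            "op": name,
            "rows": len(rows),
            "hash": digest,
            # Same digest as the previous entry means the data did not change
            "no_effect": digest == protocol[-1]["hash"],
        })
    return rows, protocol


data = [{"patient_id": "P001", "age": 34}, {"patient_id": "P002", "age": 12}]
steps = [
    ("str_strip", lambda rs: [{**r, "patient_id": r["patient_id"].strip()} for r in rs]),
    ("rows_filter", lambda rs: [r for r in rs if r["age"] >= 18]),
]
result, protocol = run_with_protocol(data, steps)
for entry in protocol:
    flag = "○" if entry["no_effect"] else ""
    print(entry["step"], entry["op"], entry["rows"], entry["hash"], flag)
```

Here `str_strip` is flagged because the IDs were already clean, the same situation as step 3 in the protocol above.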

DuckDB Backend

Pipelines built with the same API can run on DuckDB for SQL-based execution and native handling of large files:

import duckdb
from transformplan import TransformPlan, Col
from transformplan.backends.duckdb import DuckDBBackend

con = duckdb.connect()
rel = con.sql("SELECT * FROM 'patients.parquet'")

plan = (
    TransformPlan(backend=DuckDBBackend(con))
    .col_rename(column="PatientID", new_name="patient_id")
    .rows_filter(Col("age") >= 18)
    .math_round(column="score", decimals=2)
)

result, protocol = plan.process(rel)
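The features list mentions a pluggable Backend ABC, but its real method names are not shown here. The following is a hypothetical sketch of the pattern: the plan only talks to an abstract interface, and each backend decides how to execute. SQLite from the standard library stands in for DuckDB so the example runs without extra packages; the SQL backend composes queries lazily and executes only when counting rows. `Backend`, `filter_ge`, `row_count` and `MiniPlan` are all illustrative names.

```python
import sqlite3
from abc import ABC, abstractmethod


class Backend(ABC):
    """Hypothetical engine interface used by the plan."""

    @abstractmethod
    def filter_ge(self, data, column, value):
        """Keep rows where column >= value."""

    @abstractmethod
    def row_count(self, data) -> int:
        """Materialize (if needed) and count rows."""


class ListBackend(Backend):
    """Eager execution on a list of dict rows."""

    def filter_ge(self, data, column, value):
        return [r for r in data if r[column] >= value]

    def row_count(self, data) -> int:
        return len(data)


class SQLiteBackend(Backend):
    """Lazy execution: data is a SQL query string wrapped step by step."""

    def __init__(self, con: sqlite3.Connection):
        self.con = con

    def filter_ge(self, data, column, value):
        if not isinstance(value, (int, float)):
            raise TypeError("toy backend only inlines numeric literals")
        return f'SELECT * FROM ({data}) WHERE "{column}" >= {value!r}'

    def row_count(self, data) -> int:
        return self.con.execute(f"SELECT COUNT(*) FROM ({data})").fetchone()[0]


class MiniPlan:
    """Records steps once; the chosen backend executes them."""

    def __init__(self, backend: Backend):
        self.backend = backend
        self.steps = []

    def rows_filter_ge(self, column, value):
        self.steps.append((column, value))
        return self

    def process(self, data):
        for column, value in self.steps:
            data = self.backend.filter_ge(data, column, value)
        return data, self.backend.row_count(data)


rows = [{"age": 34}, {"age": 12}, {"age": 70}]
_, n_list = MiniPlan(ListBackend()).rows_filter_ge("age", 18).process(rows)

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE patients (age INTEGER)")
con.executemany("INSERT INTO patients VALUES (?)", [(34,), (12,), (70,)])
_, n_sql = MiniPlan(SQLiteBackend(con)).rows_filter_ge("age", 18).process("SELECT * FROM patients")
print(n_list, n_sql)  # 2 2
```

Numeric values are inlined into the SQL only because this is a toy; a production backend would bind parameters instead.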

Available Operations

Category  Description                 Examples
col_      Column operations           col_rename, col_drop, col_cast, col_add, col_select
math_     Arithmetic & scaling        math_add, math_multiply, math_standardize, math_minmax, math_clamp
rows_     Row filtering & reshaping   rows_filter, rows_drop_nulls, rows_sort, rows_unique, rows_pivot
str_      String operations           str_lower, str_upper, str_strip, str_replace, str_split
dt_       Datetime operations         dt_year, dt_month, dt_parse, dt_age_years, dt_diff_days
map_      Value mapping & encoding    map_values, map_discretize, map_onehot, map_ordinal

Installation

pip install transformplan

Or with uv:

uv add transformplan

Development Setup

make install-dev   # Install with dev dependencies
make test          # Run the test suite
make lint          # Run ruff linting and pyright type checking
make format        # Fix import sorting and format code

License

MIT License - see LICENSE for details.

Download files

Download the file for your platform.

Source Distribution

transformplan-0.1.2.tar.gz (110.4 kB)

Uploaded Source

Built Distribution

transformplan-0.1.2-py3-none-any.whl (69.5 kB)

Uploaded Python 3

File details

Details for the file transformplan-0.1.2.tar.gz.

File metadata

  • Download URL: transformplan-0.1.2.tar.gz
  • Upload date:
  • Size: 110.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.11 {"installer":{"name":"uv","version":"0.10.11","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for transformplan-0.1.2.tar.gz
Algorithm Hash digest
SHA256 6631e48c08156302c79043d35c5d40a2057f8d74e937edd1a002d23478149bfc
MD5 8143862a1eb5333a9936dd5baa4dcb76
BLAKE2b-256 05992bb0b6ee664571c3df82e47f3f0c7aa487b871a1604ecbff97c40ef78dbd

File details

Details for the file transformplan-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: transformplan-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 69.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.10.11 {"installer":{"name":"uv","version":"0.10.11","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for transformplan-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 7274c75e701762fb8babaf692e7d25d0d39f73d4c901cfc2bbb8b50ed9ea10b7
MD5 460715d0e5853bcc576c14c8b23c954b
BLAKE2b-256 de05f4aaf73f82732ddd08ab72ef484e053e415f2766fad709168d2ae5c5b215
