Safe, reproducible data transformations with built-in auditing and validation

TransformPlan: Auditable Data Transformation Pipelines

Python 3.10+

Features

  • Declarative transformations: Build transformation pipelines using method chaining
  • Schema validation: Validate operations before execution with dry-run capability
  • Audit trails: Generate complete audit protocols with deterministic DataFrame hashing
  • Multi-backend support: Works with both Polars (primary) and Pandas DataFrames
  • Serializable pipelines: Save and load transformation plans as JSON
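
To make the dry-run idea concrete, here is a minimal, library-independent sketch of validating a pipeline against a schema before any data is touched. It is not TransformPlan's implementation: `SchemaError`, `rename`, `drop`, and `dry_run` are hypothetical names, and the schema is assumed to be a plain mapping from column name to a dtype label.

```python
# Conceptual sketch only: every name here is hypothetical, not TransformPlan's API.
Schema = dict[str, str]  # column name -> dtype label


class SchemaError(Exception):
    """Raised when a step cannot be applied to the current schema."""


def rename(column: str, new_name: str):
    """Build a schema-level check for a rename step."""
    def apply(schema: Schema) -> Schema:
        if column not in schema:
            raise SchemaError(f"rename: column {column!r} not found")
        if new_name in schema:
            raise SchemaError(f"rename: column {new_name!r} already exists")
        # Rebuild the mapping so the renamed column keeps its position.
        return {(new_name if k == column else k): v for k, v in schema.items()}
    return apply


def drop(column: str):
    """Build a schema-level check for a drop step."""
    def apply(schema: Schema) -> Schema:
        if column not in schema:
            raise SchemaError(f"drop: column {column!r} not found")
        return {k: v for k, v in schema.items() if k != column}
    return apply


def dry_run(schema: Schema, steps) -> tuple[Schema, list[str]]:
    """Apply each step to the schema only, collecting errors instead of raising."""
    errors: list[str] = []
    for index, step in enumerate(steps, start=1):
        try:
            schema = step(schema)
        except SchemaError as exc:
            errors.append(f"step {index}: {exc}")
    return schema, errors


final_schema, errors = dry_run(
    {"PatientID": "str", "DOB": "date"},
    [rename("PatientID", "patient_id"), drop("dob")],  # "dob" is a deliberate typo
)
print(final_schema)
print(errors)
```

Because only the schema is walked, a misspelled column name surfaces before any rows are processed, which is the property the dry-run feature describes.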

Quick Example

from transformplan import TransformPlan, Col

# Build readable pipelines with 75+ chainable operations
plan = (
    TransformPlan()
    # Standardize column names
    .col_rename(column="PatientID", new_name="patient_id")
    .col_rename(column="DOB", new_name="date_of_birth")
    .str_strip(column="patient_id")

    # Calculate derived values
    .dt_age_years(column="date_of_birth", new_column="age")
    .math_clamp(column="age", min_value=0, max_value=120)

    # Categorize patient age
    .map_discretize(column="age", bins=[18, 40, 65], labels=["young", "adult", "senior"], new_column="age_group")

    # Filter and clean
    .rows_filter(Col("age") >= 18)
    .rows_drop_nulls(columns=["patient_id", "age"])
    .col_drop(column="date_of_birth")
)

# Execute with schema validation to catch errors before they reach production
# (df is an existing Polars or Pandas DataFrame)
df_result, protocol = plan.process(df, validate=True)

# Serialize pipelines to JSON — version control your transformations
plan.to_json("patient_transform.json")

# Reload and reapply — reproducible results across environments
plan = TransformPlan.from_json("patient_transform.json")
df_result, protocol = plan.process(new_data)
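
As an illustration of why a declarative plan serializes cleanly, the sketch below round-trips a list of steps through JSON using only the standard library. The on-disk layout (a `version` field plus a `steps` list of operation names and parameters) and the helper names `dump_plan` and `load_plan` are assumptions made for this example; the actual format written by `to_json` may differ.

```python
import json

PLAN_FORMAT_VERSION = 1  # hypothetical version tag for this sketch


def dump_plan(steps: list[dict]) -> str:
    """Serialize steps to JSON; sorted keys keep version-control diffs stable."""
    document = {"version": PLAN_FORMAT_VERSION, "steps": steps}
    return json.dumps(document, indent=2, sort_keys=True)


def load_plan(text: str) -> list[dict]:
    """Parse a serialized plan and reject unknown format versions."""
    document = json.loads(text)
    if document.get("version") != PLAN_FORMAT_VERSION:
        raise ValueError(f"unsupported plan version: {document.get('version')!r}")
    return document["steps"]


# Each step is an operation name plus its keyword arguments (assumed layout).
steps = [
    {"op": "col_rename", "params": {"column": "PatientID", "new_name": "patient_id"}},
    {"op": "str_strip", "params": {"column": "patient_id"}},
]

text = dump_plan(steps)
restored = load_plan(text)
print(text)
```

Since every step is plain data rather than a closure, the serialized plan can be reviewed in a pull request and reloaded in another environment without re-running the code that built it.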

Full Audit Trail — Every Step Tracked and Hashed

protocol.print(show_params=False)
======================================================================
TRANSFORM PROTOCOL
======================================================================
Input:  1000 rows × 5 cols  [a4f8b2c1]
Output: 847 rows × 6 cols   [e7d3f9a2]
Total time: 0.0247s
----------------------------------------------------------------------

#    Operation            Rows         Cols         Time       Hash
----------------------------------------------------------------------
0    input                1000         5            -          a4f8b2c1
1    col_rename           1000         5            0.0012s    b2e4a7f3
2    col_rename           1000         5            0.0008s    c9d1e5b8
3    str_strip            1000         5            0.0013s    c9d1e5b8        ○
4    dt_age_years         1000         6 (+1)       0.0041s    d4f2c8a1
5    math_clamp           1000         6            0.0015s    e1b7d3f9
6    map_discretize       1000         7 (+1)       0.0028s    f8a4c2e6
7    rows_filter          858 (-142)   7            0.0037s    a2e9f4b7
8    rows_drop_nulls      847 (-11)    7            0.0019s    b5c1d8e3
9    col_drop             847          6 (-1)       0.0006s    e7d3f9a2
======================================================================
○ = no effect (step 3 did not change data)
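
The no-effect marker follows from comparing consecutive hashes: if a step leaves the data unchanged, its hash equals the previous one. The sketch below shows that idea with the standard library on a column-oriented dict. It is an assumption-laden illustration, not the library's hashing scheme; `table_hash`, `strip_column`, and `run_with_audit` are hypothetical helpers.

```python
import hashlib
import json


def table_hash(columns: dict[str, list]) -> str:
    """Short deterministic digest of a column-oriented table."""
    # Canonical JSON plus SHA-256 is stable across runs and machines,
    # unlike Python's built-in hash(), which is salted per process for strings.
    payload = json.dumps(columns, separators=(",", ":"), default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:8]


def strip_column(columns: dict[str, list], name: str) -> dict[str, list]:
    """Return a copy of the table with whitespace stripped from one string column."""
    return {
        key: ([value.strip() for value in values] if key == name else values)
        for key, values in columns.items()
    }


def run_with_audit(columns: dict[str, list], steps):
    """Apply steps in order, logging (label, hash, no_effect) after each one."""
    log = [("input", table_hash(columns), False)]
    for label, step in steps:
        columns = step(columns)
        digest = table_hash(columns)
        # A step had no effect if the hash matches the previous entry.
        log.append((label, digest, digest == log[-1][1]))
    return columns, log


data = {"patient_id": ["a1", "b2"], "age": [34, 51]}  # already stripped
steps = [
    ("str_strip", lambda cols: strip_column(cols, "patient_id")),
    ("col_drop", lambda cols: {k: v for k, v in cols.items() if k != "age"}),
]
result, log = run_with_audit(data, steps)
for label, digest, no_effect in log:
    print(label, digest, "(no effect)" if no_effect else "")
```

In this sketch column order is part of the hash, so reordering columns counts as a change; a real implementation would also need to decide how dtypes and null values enter the digest.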

Available Operations

| Category | Description | Examples |
|----------|-------------|----------|
| col_ | Column operations | col_rename, col_drop, col_cast, col_add, col_select |
| math_ | Arithmetic & scaling | math_add, math_multiply, math_standardize, math_minmax, math_clamp |
| rows_ | Row filtering & reshaping | rows_filter, rows_drop_nulls, rows_sort, rows_unique, rows_pivot |
| str_ | String operations | str_lower, str_upper, str_strip, str_replace, str_split |
| dt_ | Datetime operations | dt_year, dt_month, dt_parse, dt_age_years, dt_diff_days |
| map_ | Value mapping & encoding | map_values, map_discretize, map_onehot, map_ordinal |

Installation

pip install transformplan

Or with uv:

uv add transformplan

Development Setup

make install-dev   # Install with dev dependencies
make test          # Run the test suite
make lint          # Run ruff linting and pyright type checking
make format        # Fix import sorting and format code

License

MIT License - see LICENSE for details.

