Safe, reproducible data transformations with built-in auditing and validation

TransformPlan: Auditable Data Transformation Pipelines

Python 3.10+

Features

  • Declarative transformations: Build transformation pipelines using method chaining
  • Schema validation: Validate operations before execution with dry-run capability
  • Audit trails: Generate complete audit protocols with deterministic DataFrame hashing
  • Multi-backend support: Works with both Polars (primary) and Pandas DataFrames
  • Serializable pipelines: Save and load transformation plans as JSON
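
To illustrate what validating operations before execution can look like, here is a minimal sketch. It is not TransformPlan's implementation: the step format and the `dry_run` helper are hypothetical. The idea is to track only column names and types through the steps, so a mistake such as dropping a missing column surfaces before any data is touched.

```python
# Hypothetical sketch of dry-run schema validation; not TransformPlan's internals.
from typing import Any, Dict, List

Schema = Dict[str, str]  # column name -> type name


def dry_run(schema: Schema, steps: List[Dict[str, Any]]) -> Schema:
    """Apply each step to the schema only, raising on the first invalid step."""
    current = dict(schema)
    for index, step in enumerate(steps, start=1):
        op = step["op"]
        column = step["column"]
        if column not in current:
            raise KeyError(f"step {index} ({op}): unknown column {column!r}")
        if op == "col_rename":
            # Rebuild the dict so the renamed column keeps its position
            current = {
                (step["new_name"] if name == column else name): dtype
                for name, dtype in current.items()
            }
        elif op == "col_drop":
            del current[column]
        elif op == "str_strip":
            if current[column] != "str":
                raise TypeError(f"step {index} ({op}): {column!r} is not a string column")
        else:
            raise ValueError(f"step {index}: unsupported operation {op!r}")
    return current


steps = [
    {"op": "col_rename", "column": "PatientID", "new_name": "patient_id"},
    {"op": "str_strip", "column": "patient_id"},
    {"op": "col_drop", "column": "DOB"},
]
result = dry_run({"PatientID": "str", "DOB": "date"}, steps)
print(result)  # {'patient_id': 'str'}
```

Because only the schema is walked, a check like this costs the same on ten rows as on ten million.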

Quick Example

from transformplan import TransformPlan, Col

# Build readable pipelines with 75+ chainable operations
plan = (
    TransformPlan()
    # Standardize column names
    .col_rename(column="PatientID", new_name="patient_id")
    .col_rename(column="DOB", new_name="date_of_birth")
    .str_strip(column="patient_id")

    # Calculate derived values
    .dt_age_years(column="date_of_birth", new_column="age")
    .math_clamp(column="age", min_value=0, max_value=120)

    # Categorize patient age into groups
    .map_discretize(column="age", bins=[18, 40, 65], labels=["young", "adult", "senior"], new_column="age_group")

    # Filter and clean
    .rows_filter(Col("age") >= 18)
    .rows_drop_nulls(columns=["patient_id", "age"])
    .col_drop(column="date_of_birth")
)

# Execute with schema validation to catch errors before they reach production
# (df is an existing DataFrame that has PatientID and DOB columns)
df_result, protocol = plan.process(df, validate=True)

# Serialize pipelines to JSON — version control your transformations
plan.to_json("patient_transform.json")

# Reload and reapply for reproducible results across environments
# (new_data is another DataFrame with the same input columns as df)
plan = TransformPlan.from_json("patient_transform.json")
df_result, protocol = plan.process(new_data)
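
The save-and-reload step works because a declarative plan is just data: an ordered list of operation names and parameters. The sketch below shows that idea with a hypothetical `MiniPlan` class. The JSON layout here (a `version` key and a `steps` list) is an assumption for illustration, and the file format TransformPlan writes may differ.

```python
# Hypothetical sketch of a JSON-serializable plan; the real file format may differ.
import json
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class MiniPlan:
    """Records operations as plain data so the plan can be saved and reloaded."""

    steps: List[Dict[str, Any]] = field(default_factory=list)

    def add(self, op: str, **params: Any) -> "MiniPlan":
        # Return self so calls can be chained
        self.steps.append({"op": op, "params": params})
        return self

    def dumps(self) -> str:
        # sort_keys and a fixed indent keep version-control diffs stable
        return json.dumps({"version": 1, "steps": self.steps}, indent=2, sort_keys=True)

    @classmethod
    def loads(cls, text: str) -> "MiniPlan":
        return cls(steps=json.loads(text)["steps"])


mini = (
    MiniPlan()
    .add("col_rename", column="DOB", new_name="date_of_birth")
    .add("math_clamp", column="age", min_value=0, max_value=120)
)
restored = MiniPlan.loads(mini.dumps())
print(restored == mini)  # True
```

Storing parameters as plain JSON values is what makes the round trip lossless in this sketch; anything that cannot be expressed as JSON would need its own encoding.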

Full Audit Trail — Every Step Tracked and Hashed

protocol.print(show_params=False)
======================================================================
TRANSFORM PROTOCOL
======================================================================
Input:  1000 rows × 5 cols  [a4f8b2c1]
Output: 847 rows × 6 cols   [e7d3f9a2]
Total time: 0.0247s
----------------------------------------------------------------------

#    Operation            Rows         Cols         Time       Hash
----------------------------------------------------------------------
0    input                1000         5            -          a4f8b2c1
1    col_rename           1000         5            0.0012s    b2e4a7f3
2    col_rename           1000         5            0.0008s    c9d1e5b8
3    str_strip            1000         5            0.0013s    c9d1e5b8        ○
4    dt_age_years         1000         6 (+1)       0.0041s    d4f2c8a1
5    math_clamp           1000         6            0.0015s    e1b7d3f9
6    map_discretize       1000         7 (+1)       0.0028s    f8a4c2e6
7    rows_filter          858 (-142)   7            0.0037s    a2e9f4b7
8    rows_drop_nulls      847 (-11)    7            0.0019s    b5c1d8e3
9    col_drop             847          6 (-1)       0.0006s    e7d3f9a2
======================================================================
○ = no effect (step 3 did not change data)
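
In the protocol above, step 3 carries the same hash as step 2, which is consistent with it being flagged as having no effect. The sketch below shows one way such a deterministic hash could be computed and compared. The `table_hash` and `str_strip` helpers are hypothetical and use plain dictionaries rather than DataFrames; TransformPlan's actual hashing algorithm is not described here.

```python
# Hypothetical sketch of deterministic table hashing; not TransformPlan's algorithm.
import hashlib
import json
from typing import Dict

Table = Dict[str, list]  # column name -> values


def table_hash(table: Table, length: int = 8) -> str:
    """Hash column names, column order, and values into a short hex digest."""
    # A canonical JSON encoding gives the same bytes for the same content
    payload = json.dumps(list(table.items()), separators=(",", ":"), ensure_ascii=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:length]


def str_strip(table: Table, column: str) -> Table:
    """Return a copy of the table with whitespace stripped from one column."""
    return {
        name: [v.strip() for v in values] if name == column else list(values)
        for name, values in table.items()
    }


clean = {"patient_id": ["A1", "B2"], "age": [34, 71]}
padded = {"patient_id": [" A1", "B2 "], "age": [34, 71]}

# Stripping already-clean data leaves the hash unchanged: a "no effect" step
print(table_hash(str_strip(clean, "patient_id")) == table_hash(clean))    # True
# Stripping padded data changes the hash
print(table_hash(str_strip(padded, "patient_id")) == table_hash(padded))  # False
```

Comparing the hash before and after each step is enough to flag no-op steps without diffing the data itself.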

Available Operations

Category   Description                 Examples
col_       Column operations           col_rename, col_drop, col_cast, col_add, col_select
math_      Arithmetic & scaling        math_add, math_multiply, math_standardize, math_minmax, math_clamp
rows_      Row filtering & reshaping   rows_filter, rows_drop_nulls, rows_sort, rows_unique, rows_pivot
str_       String operations           str_lower, str_upper, str_strip, str_replace, str_split
dt_        Datetime operations         dt_year, dt_month, dt_parse, dt_age_years, dt_diff_days
map_       Value mapping & encoding    map_values, map_discretize, map_onehot, map_ordinal
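
For readers unfamiliar with the scaling terms, the sketch below shows the usual definitions behind three of the `math_` names using only the standard library. It assumes `math_standardize` means z-scoring, `math_minmax` means rescaling to the range 0 to 1, and `math_clamp` means limiting values to a closed range. TransformPlan's exact conventions (for example population versus sample standard deviation) are not stated here and may differ.

```python
# Sketch of the usual definitions behind three math_ operation names.
# Assumption: these match common usage; TransformPlan's conventions may differ.
from statistics import mean, pstdev
from typing import List


def clamp(values: List[float], min_value: float, max_value: float) -> List[float]:
    """Limit every value to the closed range [min_value, max_value]."""
    return [max(min_value, min(max_value, v)) for v in values]


def minmax(values: List[float]) -> List[float]:
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: avoid dividing by zero
    return [(v - lo) / (hi - lo) for v in values]


def standardize(values: List[float]) -> List[float]:
    """Subtract the mean and divide by the population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


ages = [-3, 20, 40, 60, 150]
print(clamp(ages, 0, 120))   # [0, 20, 40, 60, 120]
print(minmax([20, 40, 60]))  # [0.0, 0.5, 1.0]
```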

Installation

pip install transformplan

Or with uv:

uv add transformplan
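
After installing, a quick check like the following can confirm the environment. This is a generic sketch built on the standard library's `importlib.metadata`; the `installed_version` helper is hypothetical and not part of TransformPlan.

```python
# Generic post-install check using only the standard library.
import sys
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of a package, or None if it is missing."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print("Python 3.10 or newer:", sys.version_info >= (3, 10))
print("transformplan:", installed_version("transformplan") or "not installed")
```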

Development Setup

make install-dev   # Install with dev dependencies
make test          # Run the test suite
make lint          # Run ruff linting and pyright type checking
make format        # Fix import sorting and format code

License

MIT License - see LICENSE for details.
