Pipe based dataframe manipulation library that can also transform data on SQL databases

pydiverse.transform

This is an early-stage 0.x version; however, it is already used in real projects. We are happy to receive your feedback as issues on the GitHub repo. Feel free to comment on existing issues to extend them to your needs or to suggest solutions.

Usage

pydiverse.transform can be installed either from PyPI with pip install pydiverse-transform or from conda-forge with conda install pydiverse-transform -c conda-forge. We recommend pixi, which is also based on conda-forge:

mkdir my_project
cd my_project
pixi init
pixi add pydiverse-transform

With pixi, you run python like this:

pixi run python -c 'import pydiverse.transform'

or this:

pixi run python my_script.py
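
As a quick check that the installation worked, a script started with pixi run python my_script.py could look up the installed version. This is a minimal sketch using only the Python standard library. The helper name installed_version is hypothetical, and pydiverse-transform is the distribution name taken from the install commands above.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="pydiverse-transform"):
    """Return the installed version string, or None if the distribution is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # The distribution is not installed in the active environment.
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("pydiverse-transform is not installed in this environment")
    else:
        print(f"pydiverse-transform {found} is installed")
```

Querying the package metadata avoids importing the library itself, so the check stays cheap and reports a missing installation without a traceback.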

Example

This code illustrates how to use pydiverse.transform with pandas and SQL:

from pydiverse.transform import Table
from pydiverse.transform.lazy import SQLTableImpl
from pydiverse.transform.eager import PandasTableImpl
from pydiverse.transform.core.verbs import *
import pandas as pd
import sqlalchemy as sqa


def main():
    dfA = pd.DataFrame(
        {
            "x": [1],
            "y": [2],
        }
    )
    dfB = pd.DataFrame(
        {
            "a": [2, 1, 0, 1],
            "x": [1, 1, 2, 2],
        }
    )

    input1 = Table(PandasTableImpl("dfA", dfA))
    input2 = Table(PandasTableImpl("dfB", dfB))

    transform = (
        input1
        >> left_join(input2 >> select(), input1.x == input2.x)
        >> mutate(x5=input1.x * 5, a=input2.a)
    )
    out1 = transform >> collect()
    print("\nPandas based result:")
    print(out1)

    # Load the same two DataFrames into an in-memory SQLite database
    engine = sqa.create_engine("sqlite:///:memory:")
    dfA.to_sql("dfA", engine, index=False, if_exists="replace")
    dfB.to_sql("dfB", engine, index=False, if_exists="replace")
    input1 = Table(SQLTableImpl(engine, "dfA"))
    input2 = Table(SQLTableImpl(engine, "dfB"))
    transform = (
        input1
        >> left_join(input2 >> select(), input1.x == input2.x)
        >> mutate(x5=input1.x * 5, a=input2.a)
    )
    out2 = transform >> collect()
    print("\nSQL query:")
    print(transform >> build_query())
    print("\nSQL based result:")
    print(out2)

    # Normalize row order before comparing the pandas and SQL results
    out1 = out1.sort_values("a").reset_index(drop=True)
    out2 = out2.sort_values("a").reset_index(drop=True)
    assert len(out1.compare(out2)) == 0


if __name__ == "__main__":
    main()
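
To make the intent of the pipeline above easier to follow, here is a hand-written SQL equivalent run through the standard-library sqlite3 module. It is a sketch under assumptions, not the query that build_query() emits: the function name run_equivalent_query is hypothetical, and the column order of the real output may differ.

```python
import sqlite3


def run_equivalent_query():
    """Left join dfA with dfB on x, then add x5 = x * 5 and column a."""
    con = sqlite3.connect(":memory:")
    con.execute("CREATE TABLE dfA (x INTEGER, y INTEGER)")
    con.execute("CREATE TABLE dfB (a INTEGER, x INTEGER)")
    # Same data as the dfA and dfB DataFrames in the example
    con.executemany("INSERT INTO dfA VALUES (?, ?)", [(1, 2)])
    con.executemany(
        "INSERT INTO dfB VALUES (?, ?)",
        [(2, 1), (1, 1), (0, 2), (1, 2)],
    )
    rows = con.execute(
        """
        SELECT dfA.x, dfA.y, dfA.x * 5 AS x5, dfB.a
        FROM dfA
        LEFT JOIN dfB ON dfA.x = dfB.x
        ORDER BY dfB.a
        """
    ).fetchall()
    con.close()
    return rows


if __name__ == "__main__":
    for row in run_equivalent_query():
        print(row)
```

Only the two dfB rows with x equal to 1 match the single dfA row, so the result has two rows, each with x5 equal to 5.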
