Pipe-based dataframe manipulation library that can also transform data on SQL databases

pydiverse.transform

This is an early-stage 0.x version that lacks documentation. Please contact https://github.com/orgs/pydiverse/teams/code-owners if you would like to become an early adopter or to contribute early-stage usage examples.

Usage

pydiverse.transform can be installed either from PyPI with pip install pydiverse-transform or from conda-forge with conda install pydiverse-transform -c conda-forge.

Example

This code illustrates how to use pydiverse.transform with pandas and SQL:

from pydiverse.transform import Table
from pydiverse.transform.lazy import SQLTableImpl
from pydiverse.transform.eager import PandasTableImpl
from pydiverse.transform.core.verbs import *
import pandas as pd
import sqlalchemy as sqa


def main():
    dfA = pd.DataFrame(
        {
            "x": [1],
            "y": [2],
        }
    )
    dfB = pd.DataFrame(
        {
            "a": [2, 1, 0, 1],
            "x": [1, 1, 2, 2],
        }
    )

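    # Wrap the DataFrames so the verbs below run eagerly on pandas
    input1 = Table(PandasTableImpl("dfA", dfA))
    input2 = Table(PandasTableImpl("dfB", dfB))

    # select() without arguments keeps none of input2's columns in the join;
    # input2.a is then added back explicitly by mutate
    transform = (
        input1
        >> left_join(input2 >> select(), input1.x == input2.x)
        >> mutate(x5=input1.x * 5, a=input2.a)
    )
    # collect() materializes the result as a pandas DataFrame
    out1 = transform >> collect()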
    print("\nPandas based result:")
    print(out1)

    # Load the same data into an in-memory SQLite database
    engine = sqa.create_engine("sqlite:///:memory:")
    dfA.to_sql("dfA", engine, index=False, if_exists="replace")
    dfB.to_sql("dfB", engine, index=False, if_exists="replace")
    input1 = Table(SQLTableImpl(engine, "dfA"))
    input2 = Table(SQLTableImpl(engine, "dfB"))
    # Same pipeline as above, now built against the SQL tables
    transform = (
        input1
        >> left_join(input2 >> select(), input1.x == input2.x)
        >> mutate(x5=input1.x * 5, a=input2.a)
    )
    out2 = transform >> collect()
    print("\nSQL query:")
    # build_query() yields the SQL generated for the pipeline
    print(transform >> build_query())
    print("\nSQL based result:")
    print(out2)

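    # Sort both results the same way so row order does not affect the comparison
    out1 = out1.sort_values("a").reset_index(drop=True)
    out2 = out2.sort_values("a").reset_index(drop=True)
    # compare() returns an empty DataFrame when both results are identical
    assert len(out1.compare(out2)) == 0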


if __name__ == "__main__":
    main()
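
The `>>` chaining used in the example can be understood through Python's reflected-operator protocol. Below is a minimal, standard-library-only sketch of that idea, operating on plain lists of dicts. The names `Verb` and `verb`, and the toy `left_join`, `mutate`, and `collect` functions, are hypothetical illustrations; they are not pydiverse.transform's implementation or API, and they only mimic the shape of the pipeline above.

```python
from functools import wraps


class Verb:
    """A deferred operation that runs when a table is piped into it."""

    def __init__(self, func, args, kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __rrshift__(self, table):
        # Python calls this for `table >> verb` when `table` itself
        # does not implement `>>` for this operand.
        return self.func(table, *self.args, **self.kwargs)


def verb(func):
    """Turn func(table, *args) into a factory usable as `table >> func(*args)`."""

    @wraps(func)
    def factory(*args, **kwargs):
        return Verb(func, args, kwargs)

    return factory


@verb
def left_join(left, right, on):
    out = []
    for left_row in left:
        matches = [r for r in right if r[on] == left_row[on]]
        # Unmatched left rows are kept, without any right-hand columns.
        for right_row in matches or [{}]:
            out.append({**right_row, **left_row})
    return out


@verb
def mutate(rows, **new_columns):
    # Each keyword maps a new column name to a function of the row.
    return [
        {**row, **{name: fn(row) for name, fn in new_columns.items()}}
        for row in rows
    ]


@verb
def collect(rows):
    return list(rows)


# Toy data shaped like dfA and dfB in the example above.
rows_a = [{"x": 1, "y": 2}]
rows_b = [
    {"a": 2, "x": 1},
    {"a": 1, "x": 1},
    {"a": 0, "x": 2},
    {"a": 1, "x": 2},
]

result = (
    rows_a
    >> left_join(rows_b, on="x")
    >> mutate(x5=lambda row: row["x"] * 5)
    >> collect()
)
print(result)
```

Because `>>` is left-associative, each step receives the result of the previous one, which is what lets a pipeline read top to bottom. In this sketch every verb runs immediately; a lazy backend would record the steps instead and only execute them on the final verb.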

