pydiverse.transform
Pipe-based dataframe manipulation library that can also transform data in SQL databases
This is an early-stage 0.x release, but it is already used in real projects. We are happy to receive your feedback as issues on the GitHub repo. Feel free to comment on existing issues as well, either to extend them to your needs or to suggest solutions.
Usage
pydiverse.transform can be installed from PyPI with pip install pydiverse-transform or from conda-forge with conda install pydiverse-transform -c conda-forge. We recommend using pixi, which is also based on conda-forge:
mkdir my_project
cd my_project
pixi init
pixi add pydiverse-transform
With pixi, you run Python like this:
pixi run python -c 'import pydiverse.transform'
or this:
pixi run python my_script.py
Example
This code illustrates how to use pydiverse.transform with pandas and SQL:
from pydiverse.transform import Table
from pydiverse.transform.lazy import SQLTableImpl
from pydiverse.transform.eager import PandasTableImpl
from pydiverse.transform.core.verbs import *
import pandas as pd
import sqlalchemy as sqa


def main():
    # Two small input tables that share the column "x".
    dfA = pd.DataFrame(
        {
            "x": [1],
            "y": [2],
        }
    )
    dfB = pd.DataFrame(
        {
            "a": [2, 1, 0, 1],
            "x": [1, 1, 2, 2],
        }
    )

    # Eager variant: the pipeline runs on pandas dataframes.
    input1 = Table(PandasTableImpl("dfA", dfA))
    input2 = Table(PandasTableImpl("dfB", dfB))

    transform = (
        input1
        >> left_join(input2 >> select(), input1.x == input2.x)
        >> mutate(x5=input1.x * 5, a=input2.a)
    )
    out1 = transform >> collect()
    print("\nPandas based result:")
    print(out1)

    # Lazy variant: the same pipeline is translated to SQL and run in SQLite.
    engine = sqa.create_engine("sqlite:///:memory:")
    dfA.to_sql("dfA", engine, index=False, if_exists="replace")
    dfB.to_sql("dfB", engine, index=False, if_exists="replace")
    input1 = Table(SQLTableImpl(engine, "dfA"))
    input2 = Table(SQLTableImpl(engine, "dfB"))

    transform = (
        input1
        >> left_join(input2 >> select(), input1.x == input2.x)
        >> mutate(x5=input1.x * 5, a=input2.a)
    )
    out2 = transform >> collect()
    print("\nSQL query:")
    print(transform >> build_query())
    print("\nSQL based result:")
    print(out2)

    # Both backends must produce the same rows (row order may differ).
    out1 = out1.sort_values("a").reset_index(drop=True)
    out2 = out2.sort_values("a").reset_index(drop=True)
    assert len(out1.compare(out2)) == 0


if __name__ == "__main__":
    main()
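The `>>` pipe syntax used in the example can be built on Python's reflected right-shift operator. The following is a minimal sketch of that general technique on plain lists of dicts; it is not pydiverse.transform's actual implementation. The names `Verb` and `verb` are hypothetical, and the toy `left_join` and `mutate` only mirror the library's verbs by name, with simplified signatures.

```python
class Verb:
    """A pipeable step: `data >> Verb(fn, ...)` calls `fn(data, ...)`."""

    def __init__(self, fn, *args, **kwargs):
        self.fn, self.args, self.kwargs = fn, args, kwargs

    def __rrshift__(self, data):
        # `data >> verb` falls back to `verb.__rrshift__(data)` because
        # plain lists do not implement `>>` themselves.
        return self.fn(data, *self.args, **self.kwargs)


def verb(fn):
    """Hypothetical decorator: turn `fn(data, ...)` into a pipeable factory."""
    def factory(*args, **kwargs):
        return Verb(fn, *args, **kwargs)
    return factory


@verb
def left_join(left, right, on):
    # Toy left join on a single shared column name.
    right_cols = {key for row in right for key in row}
    out = []
    for lrow in left:
        matches = [rrow for rrow in right if rrow[on] == lrow[on]]
        if not matches:
            # Keep unmatched left rows and fill right-hand columns with None.
            matches = [{key: None for key in right_cols}]
        out.extend({**rrow, **lrow} for rrow in matches)
    return out


@verb
def mutate(rows, **new_cols):
    # Each keyword maps a new column name to a function of the row.
    return [{**row, **{k: f(row) for k, f in new_cols.items()}} for row in rows]


rows_a = [{"x": 1, "y": 2}]
rows_b = [{"a": 2, "x": 1}, {"a": 1, "x": 1}, {"a": 0, "x": 2}, {"a": 1, "x": 2}]

result = rows_a >> left_join(rows_b, on="x") >> mutate(x5=lambda r: r["x"] * 5)
print(result)
```

Because each verb call only returns a deferred step, the same chain could in principle be handed to different backends, which is the idea the pandas and SQL variants above demonstrate with the real library.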
File details
Details for the file pydiverse_transform-0.7.0.tar.gz.
File metadata
- Download URL: pydiverse_transform-0.7.0.tar.gz
- Upload date:
- Size: 308.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e5f90805f637e0af8e8cc5ace5fb5196512716a3da3dd746941c5736fe3a2dc7 |
| MD5 | 0b9760a24a89e23aadea46dd1aae669f |
| BLAKE2b-256 | 3a8dc9da3acbffb88cf46952cb3a6fa7eb79a2c012cfce31739353805562e0af |
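A downloaded file can be checked against the SHA256 digest listed above with Python's standard `hashlib` module. This is a minimal sketch: the helper names `sha256_of` and `verify` are hypothetical, and the local path is an assumption about where the file was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest of pydiverse_transform-0.7.0.tar.gz, copied from the table above.
EXPECTED_SHA256 = "e5f90805f637e0af8e8cc5ace5fb5196512716a3da3dd746941c5736fe3a2dc7"


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    # Assumed download location; adjust to wherever the file was saved.
    sdist = Path("pydiverse_transform-0.7.0.tar.gz")
    if sdist.exists():
        print("OK" if verify(sdist) else "MISMATCH")
```

The same function works for the wheel by passing its own SHA256 digest as `expected`.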
Provenance
The following attestation bundles were made for pydiverse_transform-0.7.0.tar.gz:
Publisher: release.yml on pydiverse/pydiverse.transform
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: pydiverse_transform-0.7.0.tar.gz
  - Subject digest: e5f90805f637e0af8e8cc5ace5fb5196512716a3da3dd746941c5736fe3a2dc7
- Sigstore transparency entry: 805054530
- Sigstore integration time:
- Permalink: pydiverse/pydiverse.transform@2be867d3acf7c7d52387fb3135c0a981dc4e4538
- Branch / Tag: refs/tags/0.7.0
- Owner: https://github.com/pydiverse
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2be867d3acf7c7d52387fb3135c0a981dc4e4538
- Trigger Event: push
File details
Details for the file pydiverse_transform-0.7.0-py3-none-any.whl.
File metadata
- Download URL: pydiverse_transform-0.7.0-py3-none-any.whl
- Upload date:
- Size: 112.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 69dbcea153c5b4a7531a6619d728580339727be517c6c05121645f0716f686c9 |
| MD5 | 92e3c21f5173d3931aa9ae4ba589c34c |
| BLAKE2b-256 | cc45d31419ce10ee86c154ae189731d14b3589abed15ef18b9772fd9b2d03a8c |
Provenance
The following attestation bundles were made for pydiverse_transform-0.7.0-py3-none-any.whl:
Publisher: release.yml on pydiverse/pydiverse.transform
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: pydiverse_transform-0.7.0-py3-none-any.whl
  - Subject digest: 69dbcea153c5b4a7531a6619d728580339727be517c6c05121645f0716f686c9
- Sigstore transparency entry: 805054534
- Sigstore integration time:
- Permalink: pydiverse/pydiverse.transform@2be867d3acf7c7d52387fb3135c0a981dc4e4538
- Branch / Tag: refs/tags/0.7.0
- Owner: https://github.com/pydiverse
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@2be867d3acf7c7d52387fb3135c0a981dc4e4538
- Trigger Event: push