AutoMapper for Spark

Development Status
- 4 - Beta
License
- OSI Approved :: Apache Software License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

SparkAutoMapper

Fluent API to map data from one view to another in Spark.

Uses native Spark functions underneath, so it is just as fast as hand-writing the transformations.

Since this is just Python, you can use any Python editor. Because everything is annotated with Python type hints, most editors will auto-complete and warn you when you do something wrong.

Usage

pip install sparkautomapper
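
After installing, it can be useful to confirm which version is actually on the path. The following is a minimal sketch using only the standard library; the helper name `installed_version` is illustrative and is not part of SparkAutoMapper.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "sparkautomapper") -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the version string, or a hint if the package is missing
print(installed_version() or "sparkautomapper is not installed")
```

Returning None rather than raising keeps the check usable in setup scripts that want to decide whether to install.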

Documentation

https://icanbwell.github.io/SparkAutoMapper/

SparkAutoMapper input and output

You can pass either a dataframe to SparkAutoMapper or specify the name of a Spark view to read from.

You can receive the result as a dataframe or (optionally) pass in the name of a view where you want the result.

Dynamic Typing Examples

Set a column in destination to a text value (read from a passed-in dataframe and return the result in a new dataframe)

from spark_auto_mapper.automappers.automapper import AutoMapper

mapper = AutoMapper(
    keys=["member_id"]
).columns(
    dst1="hello"
)

Set a column in destination to a text value (read from a Spark view and put result in another Spark view)

from spark_auto_mapper.automappers.automapper import AutoMapper

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst1="hello"
)

Set a column in destination to an int value

from spark_auto_mapper.automappers.automapper import AutoMapper

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst1=1050
)

Copy a column (src1) from source_view to destination view column (dst1)

from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst1=A.column("src1")
)

Or you can use the shortcut for specifying a column (wrap column name in [])

from spark_auto_mapper.automappers.automapper import AutoMapper

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst1="[src1]"
)
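
To make the shortcut concrete, here is a plain-Python sketch of the rule as described above: a string wrapped in [] is read as a column reference, and anything else is read as a text literal. `classify_value` is a hypothetical helper written for illustration; it is not the library's own parser and may differ from it in edge cases.

```python
from typing import Tuple


def classify_value(value: str) -> Tuple[str, str]:
    """Classify a mapping value as a column reference or a text literal.

    Hypothetical illustration of the "[column]" shortcut; not SparkAutoMapper code.
    """
    if len(value) > 2 and value.startswith("[") and value.endswith("]"):
        return ("column", value[1:-1])  # strip the surrounding brackets
    return ("literal", value)


print(classify_value("[src1]"))  # ('column', 'src1')
print(classify_value("hello"))   # ('literal', 'hello')
```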

Convert data type for a column (or string literal)

from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    birthDate=A.date(A.column("date_of_birth"))
)

Use a Spark SQL Expression (Any valid Spark SQL expression can be used)

from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    gender=A.expression(
    """
    CASE
        WHEN `Member Sex` = 'F' THEN 'female'
        WHEN `Member Sex` = 'M' THEN 'male'
        ELSE 'other'
    END
    """
    )
)
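
For readers less familiar with SQL CASE, the expression above maps each row's `Member Sex` value in the same way as this plain-Python equivalent. `map_gender` is an illustrative name only; Spark evaluates the SQL itself, and this function is not part of the mapper.

```python
from typing import Optional


def map_gender(member_sex: Optional[str]) -> str:
    """Plain-Python equivalent of the CASE expression, for illustration only."""
    if member_sex == "F":
        return "female"
    if member_sex == "M":
        return "male"
    # Everything else, including a missing value, falls through to ELSE
    return "other"


print([map_gender(value) for value in ["F", "M", "X", None]])
```

As in SQL, a missing value matches neither WHEN branch, so it ends up as 'other'.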

Specify multiple transformations

from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst1="[src1]",
    birthDate=A.date("[date_of_birth]"),
    gender=A.expression(
    """
    CASE
        WHEN `Member Sex` = 'F' THEN 'female'
        WHEN `Member Sex` = 'M' THEN 'male'
        ELSE 'other'
    END
    """
    )
)

Use variables or parameters

from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

def mapping(parameters: dict):
    mapper = AutoMapper(
        view="members",
        source_view="patients",
        keys=["member_id"]
    ).columns(
        dst1=A.column(parameters["my_column_name"])
    )
    # Return the mapper so the caller can run it
    return mapper

Use conditional logic

from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A

def mapping(parameters: dict):
    mapper = AutoMapper(
        view="members",
        source_view="patients",
        keys=["member_id"]
    ).columns(
        dst1=A.column(parameters["my_column_name"])
    )
    
    if parameters["customer"] == "Microsoft":
        mapper = mapper.columns(
            important_customer=1,
            customer_name=parameters["customer"]
        )
    return mapper
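
The pattern above works because `.columns()` returns a mapper that accepts further `.columns()` calls. The sketch below imitates that fluent style with a hypothetical `ToyMapper` class (a stand-in, not the real `AutoMapper`) so the control flow can be run without Spark.

```python
from typing import Any, Dict


class ToyMapper:
    """Hypothetical stand-in for AutoMapper that only records column definitions."""

    def __init__(self) -> None:
        self.column_specs: Dict[str, Any] = {}

    def columns(self, **kwargs: Any) -> "ToyMapper":
        self.column_specs.update(kwargs)
        return self  # returning the mapper is what allows chaining


def mapping(parameters: Dict[str, Any]) -> ToyMapper:
    mapper = ToyMapper().columns(dst1=f"[{parameters['my_column_name']}]")
    # Extra columns are added only for one specific customer
    if parameters["customer"] == "Microsoft":
        mapper = mapper.columns(
            important_customer=1,
            customer_name=parameters["customer"],
        )
    return mapper


print(mapping({"my_column_name": "src1", "customer": "Microsoft"}).column_specs)
print(mapping({"my_column_name": "src1", "customer": "Other"}).column_specs)
```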

Using nested array columns

from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A
mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst2=A.list(
        [
            "address1",
            "address2"
        ]
    )
)

Using nested struct columns

from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A
mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst2=A.complex(
        use="usual",
        family="imran"
    )
)

Using lists of structs

from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A
mapper = AutoMapper(
    view="members",
    source_view="patients",
    keys=["member_id"]
).columns(
    dst2=A.list(
        [
            A.complex(
                use="usual",
                family="imran"
            ),
            A.complex(
                use="usual",
                family="[last_name]"
            )
        ]
    )
)
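
To picture the shape of `dst2`, the plain-Python sketch below builds the equivalent nested value for one source row. It assumes that "[last_name]" resolves to that row's last_name column, as the [] shortcut described earlier suggests. `build_dst2` is a hypothetical helper, and lists and dicts only approximate Spark's array-of-struct type.

```python
from typing import Dict, List


def build_dst2(row: Dict[str, str]) -> List[Dict[str, str]]:
    """Approximate the list-of-structs value for a single source row."""
    return [
        # First struct: both fields are text literals
        {"use": "usual", "family": "imran"},
        # Second struct: family is read from the row's last_name column
        {"use": "usual", "family": row["last_name"]},
    ]


print(build_dst2({"member_id": "1", "last_name": "Qureshi", "first_name": "Imran"}))
```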

Executing the AutoMapper

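from pyspark.sql import DataFrame, SparkSession

# Reuse the active Spark session, or start one if none exists
spark = SparkSession.builder.getOrCreate()

# Register the source data as the "patients" view
spark.createDataFrame(
    [
        (1, 'Qureshi', 'Imran'),
        (2, 'Vidal', 'Michael'),
    ],
    ['member_id', 'last_name', 'first_name']
).createOrReplaceTempView("patients")

source_df: DataFrame = spark.table("patients")

# Seed the destination view with the key column
df = source_df.select("member_id")
df.createOrReplaceTempView("members")

# Run a mapper defined as in the examples above
result_df: DataFrame = mapper.transform(df=df)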

Statically Typed Examples

To improve the auto-complete and syntax checking even more, you can define Complex types:

Define a custom data type:

from spark_auto_mapper.type_definitions.automapper_defined_types import AutoMapperTextInputType
from spark_auto_mapper.helpers.automapper_value_parser import AutoMapperValueParser
from spark_auto_mapper.data_types.date import AutoMapperDateDataType
from spark_auto_mapper.data_types.list import AutoMapperList
from spark_auto_mapper_fhir.fhir_types.automapper_fhir_data_type_complex_base import AutoMapperFhirDataTypeComplexBase


class AutoMapperFhirDataTypePatient(AutoMapperFhirDataTypeComplexBase):
    # noinspection PyPep8Naming
    def __init__(self,
                 id_: AutoMapperTextInputType,
                 birthDate: AutoMapperDateDataType,
                 name: AutoMapperList,
                 gender: AutoMapperTextInputType
                 ) -> None:
        super().__init__()
        self.value = dict(
            id=AutoMapperValueParser.parse_value(id_),
            birthDate=AutoMapperValueParser.parse_value(birthDate),
            name=AutoMapperValueParser.parse_value(name),
            gender=AutoMapperValueParser.parse_value(gender)
        )

Now you get auto-complete and syntax checking:

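from spark_auto_mapper.automappers.automapper import AutoMapper
from spark_auto_mapper.helpers.automapper_helpers import AutoMapperHelpers as A
# AutoMapperFhir and F (the FHIR helper functions) are not imported in this example;
# they are assumed to come from the companion spark_auto_mapper_fhir package.

mapper = AutoMapperFhir(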
    view="members",
    source_view="patients",
    keys=["member_id"]
).withResource(
    resource=F.patient(
        id_=A.column("a.member_id"),
        birthDate=A.date(
            A.column("date_of_birth")
        ),
        name=A.list(
            F.human_name(
                use="usual",
                family=A.column("last_name")
            )
        ),
        gender="female"
    )
)
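
One reason explicit constructor signatures help: a misspelled field name fails immediately instead of silently becoming a new column, and editors can flag it before the code runs. The stdlib-only sketch below shows the runtime side of that with a hypothetical `PatientSketch` class, which is a simplified stand-in and not the FHIR class defined above.

```python
from typing import Any, Dict


class PatientSketch:
    """Hypothetical typed wrapper with explicit parameters instead of **kwargs."""

    def __init__(self, id_: str, gender: str) -> None:
        self.value: Dict[str, Any] = dict(id=id_, gender=gender)


def try_build(**kwargs: Any) -> str:
    """Report whether the keyword arguments match the constructor signature."""
    try:
        PatientSketch(**kwargs)
        return "ok"
    except TypeError as error:
        return f"rejected: {error}"


print(try_build(id_="1", gender="female"))
print(try_build(id_="1", gendr="female"))  # misspelled keyword is rejected
```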

Publishing a new package

1. Edit VERSION to increment the version.
2. Create a new release.
3. The GitHub Action should automatically kick in and publish the package.
4. You can see the status in the Actions tab.
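
The first step can be scripted. This is a minimal sketch that assumes VERSION holds a plain MAJOR.MINOR.PATCH string; the file's exact format is not shown here, and `bump_patch` and `bump_version_file` are illustrative helpers rather than part of the project's tooling.

```python
from pathlib import Path


def bump_patch(version: str) -> str:
    """Increment the PATCH component of a MAJOR.MINOR.PATCH version string."""
    major, minor, patch = version.strip().split(".")
    return f"{major}.{minor}.{int(patch) + 1}"


def bump_version_file(path: Path) -> str:
    """Rewrite the version file in place and return the new version."""
    new_version = bump_patch(path.read_text())
    path.write_text(new_version + "\n")
    return new_version


print(bump_patch("1.0.11"))  # 1.0.12
```

Pre-release suffixes such as `a1` or `b1` are not handled by this sketch and would need extra parsing.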




sparkautomapper 1.0.11
