sefazetllib is a library that provides a simplified and abstracted way to construct ETL/ELT pipelines.

sefazetllib


Documentation: https://main.d32to2oidohzrl.amplifyapp.com/

Source code: AWS CodeCommit


Features

  • Easy to use and understand library for constructing ETL/ELT pipelines.
  • Compatibility with popular data processing frameworks, such as pandas and PySpark.
  • Support for file formats such as CSV and Parquet.
  • Provides the ability to extract, transform and load data with customizable configurations.
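
As a rough illustration of the extract, transform and load stages named above, the sketch below uses only the Python standard library on an in-memory CSV. It does not use sefazetllib's API; the function names extract, transform and load are hypothetical and chosen only for this example.

```python
import csv
import io
from typing import Dict, List


def extract(source: str) -> List[Dict[str, str]]:
    """Extract: parse CSV text into a list of row dictionaries."""
    return list(csv.DictReader(io.StringIO(source)))


def transform(rows: List[Dict[str, str]]) -> List[Dict[str, object]]:
    """Transform: cast Age to int and keep only rows with Age >= 14."""
    typed = [{"Name": row["Name"], "Age": int(row["Age"])} for row in rows]
    return [row for row in typed if row["Age"] >= 14]


def load(rows: List[Dict[str, object]]) -> str:
    """Load: serialize rows to CSV text (a stand-in for a file or table)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["Name", "Age"], lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


raw = "Name,Age\ntom,10\nnick,15\njuli,14\n"
result = load(transform(extract(raw)))
print(result)
```

Keeping each stage as a separate function makes it possible to swap the source, the logic or the destination independently, which is the general idea a pipeline library builds on.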

Requirements

sefazetllib requires the following to run:

Installation

Use pip to install sefazetllib:

pip install sefazetllib

Usage

Here is an example of how to use sefazetllib:

from typing import Tuple

from pandas import DataFrame

from sefazetllib import Builder
from sefazetllib.etl import ETL
from sefazetllib.extract import ExtractLocal
from sefazetllib.factory.platform import PlatformFactory
from sefazetllib.load import LoadLocal
from sefazetllib.transform import Transform
from sefazetllib.utils.key import SurrogateKey


# Transform step that returns a reference name and a small DataFrame.
@Builder
class TestingDataFrame(Transform):
    def execute(self) -> Tuple[str, DataFrame]:
        return (
            "dataframe",
            DataFrame(
                [["tom", 10], ["nick", 15], ["juli", 14]], columns=["Name", "Age"]
            ),
        )


(
    ETL()
    # Run the pipeline on the pandas platform.
    .setPlatform(PlatformFactory("Pandas").create(name="test_pandas"))
    .transform(TestingDataFrame)
    # Write the "dataframe" reference to the "load_test" entity as Parquet.
    .load(
        LoadLocal()
        .setFileFormat("parquet")
        .setEntity("load_test")
        .setMode("overwrite")
        .setReference("dataframe")
        .setDuplicates(True)
        .setKey(SurrogateKey().setColumns(["Name", "Age"]).setDistribute(False))
    )
    # Read the Parquet file back under the reference "extract_test".
    .extract(
        ExtractLocal()
        .setFileFormat("parquet")
        .setUrl("load_test.parquet")
        .setReference("extract_test")
    )
)
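
The chained set... calls in the example follow a fluent style, where each setter is called on the result of the previous one. A minimal sketch of that pattern is shown below, assuming each setter simply stores a value and returns the same object. LocalTarget and its attributes are hypothetical names used for illustration only; they are not sefazetllib classes, and the library's actual implementation may differ.

```python
from typing import Optional


class LocalTarget:
    """Hypothetical configuration object; not part of sefazetllib."""

    def __init__(self) -> None:
        self.file_format: Optional[str] = None
        self.entity: Optional[str] = None
        self.mode: Optional[str] = None

    def setFileFormat(self, file_format: str) -> "LocalTarget":
        self.file_format = file_format
        return self  # returning self is what makes chaining possible

    def setEntity(self, entity: str) -> "LocalTarget":
        self.entity = entity
        return self

    def setMode(self, mode: str) -> "LocalTarget":
        self.mode = mode
        return self


# Each call returns the same object, so the calls can be chained.
target = (
    LocalTarget()
    .setFileFormat("parquet")
    .setEntity("load_test")
    .setMode("overwrite")
)
print(target.file_format, target.entity, target.mode)
```

Wrapping the chain in parentheses, as the usage example does, lets each call sit on its own line without backslash continuations.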

Testing

To run the unit tests, run the following command (py is the Python launcher for Windows; on other systems, use python or python3 instead):

py -m unittest tests/main.py -v

License

sefazetllib is released under the Apache-2.0 license.
