
Pypelines-ETL

Simple library to make pipelines or ETL

Installation

$ pip install pypelines-etl

Usage

pypelines allows you to build ETL pipelines. To do so, you simply combine an Extractor, some Transformers or Filters, and a Loader.

Extractor

Making an extractor is fairly easy. Simply decorate a function that returns the data with Extractor:

import pandas
from pypelines import Extractor

@Extractor
def read_iris_dataset(filepath: str) -> pandas.DataFrame:
    return pandas.read_csv(filepath)

Transformer or Filter

The Transformer and Filter decorators are equivalent.

Making a Transformer or a Filter is even easier:

import pandas
from pypelines import Filter, Transformer

@Filter
def keep_setosa(df: pandas.DataFrame) -> pandas.DataFrame:
    return df[df['class'] == 'Iris-setosa']


@Filter
def keep_petal_length(df: pandas.DataFrame) -> pandas.Series:
    return df['petallength']


@Transformer
def mean(series: pandas.Series) -> float:
    return series.mean()

Note that Transformers and Filters can be combined with the | operator to shorten the pipeline syntax. For example:

new_transformer = keep_setosa | keep_petal_length | mean
pipeline = read_iris_dataset('filepath.csv') | new_transformer
print(pipeline.value)
# 1.464
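This pipe-style composition can be sketched with a small, self-contained Transformer class whose `__or__` chains functions. This is a hypothetical illustration of the technique, not the actual pypelines implementation:

```python
class Transformer:
    """Hypothetical sketch of a composable transformer:
    `a | b` yields a transformer that applies a, then b."""

    def __init__(self, func):
        self.func = func

    def __call__(self, data):
        # Apply the wrapped function to the data.
        return self.func(data)

    def __or__(self, other):
        # Chain: feed this transformer's output into the next one.
        return Transformer(lambda data: other(self(data)))


@Transformer
def double(x):
    return x * 2


@Transformer
def increment(x):
    return x + 1


combined = double | increment
# combined(3) == 7  (3 * 2, then + 1)
```

The key point is that `|` builds a new callable without executing anything; evaluation only happens when data is finally passed through.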

Loader

To build a Loader, it suffices to decorate a function that takes at least one data parameter.

import json
from pypelines import Loader

@Loader
def write_to_json(output_filepath: str, data: float) -> None:
    with open(output_filepath, 'w') as file:
        json.dump({'mean-petal-length': {'value': data, 'units': 'cm'}}, file)

A Loader can be called without the data parameter, in which case it stores its arguments (like a URL or a path) for later. For example, calling write_to_json('output.json') will not execute the function, but will store the output_filepath argument until the Loader is executed in a pipeline. The standard execution of the function (with the data argument) is still possible: write_to_json('output.json', data=1.464).
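This deferred behavior works like partial application. A minimal, self-contained sketch of such a Loader decorator (hypothetical, using only the standard library, not the actual pypelines code):

```python
import functools


class Loader:
    """Hypothetical sketch of a Loader-style decorator: calling the
    wrapped function without `data` stores the other arguments and
    defers execution until `data` is supplied."""

    def __init__(self, func):
        self.func = func
        functools.update_wrapper(self, func)

    def __call__(self, *args, **kwargs):
        if 'data' in kwargs:
            # Standard execution: data supplied, run immediately.
            return self.func(*args, **kwargs)
        # Deferred execution: remember the arguments for later.
        return functools.partial(self.func, *args, **kwargs)


@Loader
def write_to_dict(store: dict, data: float) -> None:
    store['value'] = data


store = {}
deferred = write_to_dict(store)  # nothing written yet
deferred(data=1.464)             # executes now
# store == {'value': 1.464}
```

In a pipeline, the deferred callable is what gets invoked once the upstream Transformers have produced the final value.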

ETL pipeline

To make and run the pipeline, simply combine the Extractor with the Transformers, Filters, and the Loader:

read_iris_dataset('filepath.csv') | keep_setosa | keep_petal_length | mean | write_to_json('output.json')
