
Pypelines-ETL

Simple library to make pipelines or ETL

Installation

$ pip install pypelines-etl

Usage

pypelines allows you to build ETL pipelines. For that, you simply need to combine an Extractor, some Transformers or Filters, and a Loader.

Extractor

Making an extractor is fairly easy. Simply decorate a function that returns the data with Extractor:

import pandas
from pypelines import Extractor

@Extractor
def read_iris_dataset(filepath: str) -> pandas.DataFrame:
    return pandas.read_csv(filepath)
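
The decorated function is called exactly like the original one, and the call becomes the entry point of a pipeline. A minimal sketch, assuming a hypothetical iris.csv file and that the extracted data is exposed through the value attribute, as in the pipeline example further below:

# 'iris.csv' is a hypothetical path used for illustration
iris = read_iris_dataset('iris.csv')
# Assumption: the extracted DataFrame is available on the value attribute
print(iris.value.head())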

Transformer or Filter

The Transformer and Filter decorators are equivalent.

Making a Transformer or a Filter is even easier:

import pandas
from pypelines import Filter, Transformer

@Filter
def keep_setosa(df: pandas.DataFrame) -> pandas.DataFrame:
    return df[df['class'] == 'Iris-setosa']


@Filter
def keep_petal_length(df: pandas.DataFrame) -> pandas.Series:
    return df['petallength']


@Transformer
def mean(series: pandas.Series) -> float:
    return series.mean()

Note that Transformers and Filters can be combined with the | operator to shorten the pipeline syntax. For example:

new_transformer = keep_setosa | keep_petal_length | mean
pipeline = read_iris_dataset('filepath.csv') | new_transformer
print(pipeline.value)
# 1.464

Loader

To build a Loader, simply decorate a function that takes at least a data parameter:

import json
from pypelines import Loader

@Loader
def write_to_json(output_filepath: str, data: float) -> None:
    with open(output_filepath, 'w') as file:
        json.dump({'mean-petal-length': {'value': data, 'units': 'cm'}}, file)

A Loader can be called without the data parameter, in which case it only stores the other arguments (like a URL or a path). For example, calling write_to_json('output.json') will not execute the function, but will store the output_filepath argument until the Loader executes in a pipeline. The standard execution of the function (with the data argument) nevertheless remains available: write_to_json('output.json', data=1.464).
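
For instance, both calling styles below work with the write_to_json loader defined above; the first defers execution, the second runs immediately:

# Deferred: stores output_filepath until the Loader executes in a pipeline
deferred_loader = write_to_json('output.json')

# Standard: both arguments are provided, so the function runs right away
write_to_json('output.json', data=1.464)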

ETL pipeline

To make and run the pipeline, simply combine the Extractor with the Transformers, Filters, and the Loader:

read_iris_dataset('filepath.csv') | keep_setosa | keep_petal_length | mean | write_to_json('output.json')
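
Given the mean computed in the earlier example, output.json should then contain something like the following (a quick check, reusing the 1.464 value from above):

with open('output.json') as file:
    print(file.read())
# {"mean-petal-length": {"value": 1.464, "units": "cm"}}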
