# Pypelines-ETL

Simple library to make pipelines or ETL
## Installation

```shell
$ pip install pypelines-etl
```
## Usage

`pypelines` allows you to build an ETL pipeline. For that, you simply need
the combination of an `Extractor`, some `Transformer`s or `Filter`s, and a
`Loader`.
### Extractor

Making an extractor is fairly easy. Simply decorate a function that returns
the data with `Extractor`:

```python
import pandas

from pypelines import Extractor


@Extractor
def read_iris_dataset(filepath: str) -> pandas.DataFrame:
    return pandas.read_csv(filepath)
```
### Transformer or Filter

The `Transformer` and `Filter` decorators are equivalent. Making a
`Transformer` or a `Filter` is even easier:

```python
import pandas

from pypelines import Filter, Transformer


@Filter
def keep_setosa(df: pandas.DataFrame) -> pandas.DataFrame:
    return df[df['class'] == 'Iris-setosa']


@Filter
def keep_petal_length(df: pandas.DataFrame) -> pandas.Series:
    return df['petallength']


@Transformer
def mean(series: pandas.Series) -> float:
    return series.mean()
```
Note that it is possible to combine `Transformer`s and `Filter`s to shorten
the pipeline syntax. For example:

```python
new_transformer = keep_setosa | keep_petal_length | mean
pipeline = read_iris_dataset('filepath.csv') | new_transformer
print(pipeline.value)
# 1.464
```
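The `|` chaining above relies on Python operator overloading. As a rough
illustration of the idea (this is a generic sketch, not pypelines' actual
implementation; the `Stage` class and example functions are hypothetical),
each stage can define `__or__` so that piping two stages yields a new stage
that feeds the output of the first into the second:

```python
# Illustrative sketch of pipe-style composition via __or__.
# `Stage` is a hypothetical name, not part of pypelines.
class Stage:
    def __init__(self, func):
        self.func = func

    def __call__(self, value):
        return self.func(value)

    def __or__(self, other):
        # Compose left to right: this stage's output becomes the next input.
        return Stage(lambda value: other(self.func(value)))


double = Stage(lambda x: x * 2)
increment = Stage(lambda x: x + 1)

combined = double | increment  # analogous to keep_setosa | keep_petal_length | mean
print(combined(10))  # 21
```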
### Loader

In order to build a `Loader`, it suffices to decorate a function that takes
at least one `data` parameter:

```python
import json

from pypelines import Loader


@Loader
def write_to_json(output_filepath: str, data: float) -> None:
    with open(output_filepath, 'w') as file:
        json.dump({'mean-petal-length': {'value': data, 'units': 'cm'}}, file)
```
A `Loader` can be called without the `data` parameter, which pre-loads the
other arguments (like a URL or a path). For example, calling
`write_to_json('output.json')` will not execute the function, but will store
the `output_filepath` argument until the `Loader` is executed in a pipeline.
The standard execution of the function (with the `data` argument) is still
possible: `write_to_json('output.json', data=1.464)`.
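This deferred-call behaviour can be sketched with a plain decorator: run the
function immediately when `data` is supplied, otherwise capture the other
arguments and wait. This is an assumption about how such a decorator could
work, not pypelines' actual source; the `loader` decorator and `describe`
function below are hypothetical:

```python
import functools


def loader(func):
    """Run `func` immediately if `data` is given; otherwise store the
    remaining arguments and return a callable awaiting the data."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if 'data' in kwargs:
            return func(*args, **kwargs)  # standard execution
        # Deferred execution: remember args until the data arrives.
        return lambda data: func(*args, data=data, **kwargs)
    return wrapper


@loader
def describe(label: str, data: float) -> str:
    return f'{label}={data}'


pending = describe('mean')          # nothing executed yet
print(pending(1.464))               # executes once data is supplied
print(describe('mean', data=1.464)) # standard call still works
```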
### ETL pipeline

To make and run the pipeline, simply combine the `Extractor` with the
`Transformer`s, `Filter`s and the `Loader`:

```python
read_iris_dataset('filepath.csv') | keep_setosa | keep_petal_length | mean | write_to_json('output.json')
```