Pypelines-ETL
A simple library for building pipelines or ETL workflows.
Installation
$ pip install pypelines-etl
Usage
pypelines allows you to build an ETL pipeline. For that, you simply need the combination of an Extractor, some Transformers or Filters, and a Loader.
Extractor
Making an extractor is fairly easy. Simply decorate a function that returns the data with Extractor:
import pandas
from pypelines import Extractor
@Extractor
def read_iris_dataset(filepath: str) -> pandas.DataFrame:
    return pandas.read_csv(filepath)
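Calling the decorated function does not hand back the raw data; it starts a pipeline. Judging by the .value attribute used later in this README, the extracted DataFrame should be reachable as follows (a minimal, hypothetical sketch; the filepath is a placeholder):

# Hypothetical check: the Extractor call is assumed to wrap the result
# in a pipeline object that exposes the current data through `.value`.
pipeline = read_iris_dataset('iris.csv')
print(pipeline.value.head())  # first rows of the extracted DataFrame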
Transformer or Filter
The Transformer and Filter decorators are equivalent. Making a Transformer or a Filter is even easier:
import pandas
from pypelines import Filter, Transformer
@Filter
def keep_setosa(df: pandas.DataFrame) -> pandas.DataFrame:
    return df[df['class'] == 'Iris-setosa']

@Filter
def keep_petal_length(df: pandas.DataFrame) -> pandas.Series:
    return df['petallength']

@Transformer
def mean(series: pandas.Series) -> float:
    return series.mean()
Note that it is possible to combine Transformers and Filters to shorten the pipeline syntax. For example:
new_transformer = keep_setosa | keep_petal_length | mean
pipeline = read_iris_dataset('filepath.csv') | new_transformer
print(pipeline.value)
# 1.464
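Under the hood, the | operator presumably composes the wrapped functions. A rough, purely illustrative sketch of such an overload (not the library's actual implementation):

class Transformer:
    def __init__(self, func):
        self.func = func

    def __call__(self, data):
        return self.func(data)

    def __or__(self, other):
        # Compose: feed this transformer's output into the next stage.
        return Transformer(lambda data: other(self(data)))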
Loader
In order to build a Loader, it suffices to decorate a function that takes at least one data parameter.
import json
from pypelines import Loader
@Loader
def write_to_json(output_filepath: str, data: float) -> None:
    with open(output_filepath, 'w') as file:
        json.dump({'mean-petal-length': {'value': data, 'units': 'cm'}}, file)
A Loader can be called without the data parameter, which pre-loads the remaining arguments (like a URL or a path). For example, calling write_to_json('output.json') will not execute the function, but will store the output_filepath argument until the Loader executes in a pipeline.
The standard execution of the function (with the data argument) is however still available: write_to_json('output.json', data=1.464).
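Both calling styles side by side (the variable name pending is illustrative):

# Partial call: output_filepath is stored, nothing is written yet.
pending = write_to_json('output.json')

# Standard call: executes immediately, since data is provided.
write_to_json('output.json', data=1.464)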
ETL pipeline
To make and run the pipeline, simply combine the Extractor with the Transformers, the Filters, and the Loader:
read_iris_dataset('filepath.csv') | keep_setosa | keep_petal_length | mean | write_to_json('output.json')
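Running this pipeline writes the result to output.json. Given the mean of 1.464 computed earlier, reading the file back should show the stored value (a hypothetical check):

import json

with open('output.json') as file:
    print(json.load(file))
# {'mean-petal-length': {'value': 1.464, 'units': 'cm'}}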
Project details
Download files
Download the file for your platform.
Source Distribution: pypelines-etl-0.1.0.tar.gz
Built Distribution: pypelines_etl-0.1.0-py3-none-any.whl
File details
Details for the file pypelines-etl-0.1.0.tar.gz.
File metadata
- Download URL: pypelines-etl-0.1.0.tar.gz
- Upload date:
- Size: 3.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.9 CPython/3.7.5 Linux/5.8.15-201.fc32.x86_64
File hashes
Algorithm | Hash digest
---|---
SHA256 | 1087bf86f7916a2906af0d150fa1be1b98578db31d45f82594542b46fdc5d7e3
MD5 | 52f20f3ba7ad4d1fbc719dfc9e68311f
BLAKE2b-256 | be158edab0f202804119b5724b98357a9e51b2f3c2766a33c1088fe6e149be4e
File details
Details for the file pypelines_etl-0.1.0-py3-none-any.whl.
File metadata
- Download URL: pypelines_etl-0.1.0-py3-none-any.whl
- Upload date:
- Size: 4.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.0.9 CPython/3.7.5 Linux/5.8.15-201.fc32.x86_64
File hashes
Algorithm | Hash digest
---|---
SHA256 | ba111ae86b316103a019c06058739d6e06b42369829db873560e1dd87879d7c1
MD5 | d2ae6be4ab7ea7995ca53bdf997512a8
BLAKE2b-256 | 2b660056dbc12c0f40550166091a46b7b19b920684660c4a00c92a4acffef302