Pypelines-ETL
Simple library to make pipelines or ETL
Installation
$ pip install pypelines-etl
Usage
pypelines lets you build ETL pipelines. To do so, you combine an Extractor,
some Transformers or Filters, and a Loader.
Extractor
Making an extractor is fairly easy. Simply decorate a function that returns
the data with Extractor:
import pandas
from pypelines import Extractor

@Extractor
def read_iris_dataset(filepath: str) -> pandas.DataFrame:
    # Read the CSV file into a DataFrame that feeds the pipeline
    return pandas.read_csv(filepath)
Transformer or Filter
The Transformer and Filter decorators are equivalent.
Making a Transformer or a Filter is even easier:
import pandas
from pypelines import Filter, Transformer

@Filter
def keep_setosa(df: pandas.DataFrame) -> pandas.DataFrame:
    return df[df['class'] == 'Iris-setosa']

@Filter
def keep_petal_length(df: pandas.DataFrame) -> pandas.Series:
    return df['petallength']

@Transformer
def mean(series: pandas.Series) -> float:
    return series.mean()
Note that Transformers and Filters can be combined with the | operator
into a single step, which shortens the pipeline syntax. For example:
new_transformer = keep_setosa | keep_petal_length | mean
pipeline = read_iris_dataset('filepath.csv') | new_transformer
print(pipeline.value)
# 1.464
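One way such a composition could work is by overloading Python's `|` operator through `__or__`. The following is only a minimal sketch of that idea, not the pypelines implementation: the `Step` class and the `keep_positive` function are hypothetical names, and plain lists stand in for pandas objects.

```python
class Step:
    """Hypothetical stand-in for a Transformer or Filter (not the pypelines class)."""

    def __init__(self, func):
        self.func = func

    def __call__(self, data):
        return self.func(data)

    def __or__(self, other):
        # Compose left to right: (self | other)(x) == other(self(x))
        return Step(lambda data: other(self(data)))


@Step
def keep_positive(values):
    return [v for v in values if v > 0]


@Step
def mean(values):
    return sum(values) / len(values)


# The combined step behaves like a single reusable transformer
positive_mean = keep_positive | mean
print(positive_mean([-1.0, 1.0, 3.0]))  # 2.0
```

Because `__or__` returns another `Step`, the result can itself be combined again, which is the property the `new_transformer` example relies on.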
Loader
To build a Loader, decorate a function that takes at least a data parameter:
import json
from pypelines import Loader

@Loader
def write_to_json(output_filepath: str, data: float) -> None:
    # Write the value and its unit to a JSON file
    with open(output_filepath, 'w') as file:
        json.dump({'mean-petal-length': {'value': data, 'units': 'cm'}}, file)
A Loader can be called without the data parameter; in that case it only stores
its other arguments (such as a URL or a path). For example, calling write_to_json('output.json')
does not execute the function, but stores the output_filepath argument until the Loader is executed in a pipeline.
The function can still be executed directly by passing the data argument: write_to_json('output.json', data=1.464).
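The deferred call described above can be approximated with `functools.partial`. This is a sketch under stated assumptions rather than the library's code: `deferred_loader`, `save`, and `stored` are hypothetical names, and the sketch only recognizes `data` when it is passed as a keyword argument.

```python
import functools


def deferred_loader(func):
    """Hypothetical decorator: postpone the call until `data` is supplied."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        if 'data' in kwargs:
            # Data is available: run the function immediately
            return func(*args, **kwargs)
        # No data yet: remember the other arguments and wait
        return functools.partial(func, *args, **kwargs)

    return wrapper


stored = []


@deferred_loader
def save(label, data):
    stored.append((label, data))


pending = save('mean')   # nothing is stored yet
print(stored)            # []
pending(data=1.464)      # the stored argument is reused here
print(stored)            # [('mean', 1.464)]
```

A pipeline would then only need to call the pending object with the data produced by the previous step.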
ETL pipeline
To make and run the pipeline, simply combine the Extractor with the Transformers, the Filters and the Loader:
read_iris_dataset('filepath.csv') | keep_setosa | keep_petal_length | mean | write_to_json('output.json')
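To illustrate how data might flow through such a chain, here is a small sketch that assumes each `|` applies the next step eagerly to the current value. The `Pipe` class and the `extract`, `mean`, and `load` functions are hypothetical and use only the standard library; pypelines itself may work differently.

```python
class Pipe:
    """Hypothetical container for the data currently flowing through a pipeline."""

    def __init__(self, value):
        self.value = value

    def __or__(self, step):
        # Apply the next step right away and wrap the result so chaining can continue
        return Pipe(step(self.value))


def extract():
    # Stand-in for an extractor: returns the wrapped source data
    return Pipe([1.0, 2.0, 6.0])


def mean(values):
    return sum(values) / len(values)


written = {}


def load(data):
    # Stand-in for a loader: records the final value instead of writing a file
    written['mean'] = data


extract() | mean | load
print(written)  # {'mean': 3.0}
```

Stopping the chain before the loader leaves the intermediate result available on `.value`, mirroring the `pipeline.value` example shown earlier.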