Firelink
Firelink builds on the scikit-learn pipeline and adds the ability to save a pipeline to a .yaml or .ember file for production use.
Quickstart
Installation
pip install firelink
Basic Usage
import pandas as pd
from pandas.testing import assert_frame_equal

from firelink.pandas_transform import Drop_duplicates, Filter
from firelink.pipeline import FirePipeline

# Sample data: three numeric columns and two string columns.
df = pd.DataFrame(
    {
        "a": range(10),
        "b": range(10, 20),
        "c": range(20, 30),
        "d": ["a", "n", "d", "f", "g", "h", "h", "j", "q", "w"],
        "e": ["a", "d", "a", "d", "e", "e", "a", "a", "d", "d"],
    }
)

# Two named steps: filter on columns "a" and "e", then drop duplicates in "e".
trans_1 = Filter(["a", "e"])
trans_2 = Drop_duplicates(["e"], keep="first")
pipe_1 = FirePipeline(
    [("filter column a and e", trans_1), ("drop duplicate for column e", trans_2)]
)

# Save the pipeline to an .ember file, then load it back as a second pipeline.
pipe_1.save_fire("pipe_1.ember", file_type="ember")
pipe_2 = FirePipeline.link_fire("pipe_1.ember")

# The reloaded pipeline should produce the same output as the original.
df1 = pipe_1.fit_transform(df)
df2 = pipe_2.fit_transform(df)
assert_frame_equal(df1, df2)
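To make the save-and-reload round trip above more concrete, here is a minimal sketch of the general idea using only the standard library. It is not Firelink's implementation or file format. It assumes a pipeline can be described as an ordered list of named steps with parameters, uses JSON as a stand-in for .yaml, and works on plain lists of row dictionaries instead of DataFrames. Every name in it (REGISTRY, select_columns, drop_duplicates, save_steps, load_steps, run_steps) is hypothetical.

```python
import json
import os
import tempfile


def select_columns(rows, columns):
    """Keep only the named columns in each row (hypothetical helper)."""
    return [{c: row[c] for c in columns} for row in rows]


def drop_duplicates(rows, subset):
    """Keep the first row for each distinct combination of `subset` values."""
    seen = set()
    kept = []
    for row in rows:
        key = tuple(row[c] for c in subset)
        if key not in seen:
            seen.add(key)
            kept.append(row)
    return kept


# Maps a serializable key to the callable that implements the step.
REGISTRY = {"select_columns": select_columns, "drop_duplicates": drop_duplicates}


def save_steps(steps, path):
    """Write the step descriptions to a JSON file."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(steps, fh, indent=2)


def load_steps(path):
    """Read the step descriptions back from a JSON file."""
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def run_steps(steps, rows):
    """Apply each step in order, looking up its callable in the registry."""
    for step in steps:
        rows = REGISTRY[step["transform"]](rows, **step["params"])
    return rows


steps = [
    {"name": "filter column a and e", "transform": "select_columns",
     "params": {"columns": ["a", "e"]}},
    {"name": "drop duplicate for column e", "transform": "drop_duplicates",
     "params": {"subset": ["e"]}},
]

# Same "a" and "e" values as the DataFrame in the example above.
rows = [{"a": a, "b": a + 10, "e": e} for a, e in zip(range(10), "adadeeaadd")]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "pipe_1.json")
    save_steps(steps, path)
    reloaded = load_steps(path)

result_original = run_steps(steps, rows)
result_reloaded = run_steps(reloaded, rows)
assert result_original == result_reloaded
```

Because the file stores only step names, registry keys, and parameters, the reloaded description rebuilds the same sequence of operations, which is why both runs agree.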
Spark Usage
import pandas as pd
from pandas.testing import assert_frame_equal
from pyspark.sql import SparkSession, functions as F

from firelink.pandas_transform import Assign
from firelink.pipeline import FirePipeline
from firelink.spark_transform import WithColumn

spark = SparkSession.builder.appName("spark_session").enableHiveSupport().getOrCreate()

# The same small table as a pandas DataFrame and as a Spark DataFrame.
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
sdf = spark.createDataFrame(df)

# Spark pipeline: each WithColumn step receives its column expression as a string.
add1 = WithColumn("Country", "F.lit('Canada')")
add2 = WithColumn("City", "F.lit('Toronto')")
spark_pipe = FirePipeline([("Add Country", add1), ("Add City", add2)])

# Optional, in a notebook: choose how the pipeline is rendered
# (set_config is provided by scikit-learn: from sklearn import set_config).
# set_config(display="diagram")
# set_config(display="text")
spark_pipe

sdf = spark_pipe.fit_transform(sdf)
sdf.show()

# Equivalent pandas pipeline built from Assign steps.
add1 = Assign({"Country": "Canada"})
add2 = Assign({"City": "Toronto"})
pandas_pipe = FirePipeline([("Add Country", add1), ("Add City", add2)])
pandas_pipe.fit_transform(df)

# Both pipelines should yield the same table.
assert_frame_equal(sdf.toPandas(), pandas_pipe.fit_transform(df))
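For reference, the table that both pipelines above are expected to produce can be sketched in plain pandas. This assumes that Assign behaves like pandas' own DataFrame.assign and that WithColumn with F.lit adds a constant column. Those are assumptions about Firelink, not statements taken from its documentation.

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})

# Add one constant column per step, mirroring "Add Country" then "Add City".
expected = df.assign(Country="Canada").assign(City="Toronto")
print(expected)
```

DataFrame.assign returns a new frame, so the original df keeps only col1 and col2.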
Pipeline Example Structure Visualization
Detailed Documentation
For detailed documentation, see the project's documentation portal.