Firelink builds on the scikit-learn pipeline and adds the ability to store a pipeline in a `.yaml` or `.ember` file for production use.
Project description
Firelink
Quickstart
Installation
pip install firelink
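After installing, it can be useful to confirm which version ended up in the environment. The sketch below uses only the standard library (`importlib.metadata`); the helper name `installed_version` is hypothetical and is not part of Firelink.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed.

    Hypothetical helper for illustration; not part of Firelink.
    """
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Prints the version string if firelink is installed, otherwise None.
    print(installed_version("firelink"))
```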
Basic Usage
import pandas as pd
from pandas.testing import assert_frame_equal

from firelink.pandas_transform import Drop_duplicates, Filter
from firelink.pipeline import FirePipeline

# Sample data: three numeric columns and two string columns.
df = pd.DataFrame(
    {
        "a": range(10),
        "b": range(10, 20),
        "c": range(20, 30),
        "d": ["a", "n", "d", "f", "g", "h", "h", "j", "q", "w"],
        "e": ["a", "d", "a", "d", "e", "e", "a", "a", "d", "d"],
    }
)

# Build a two-step pipeline: filter on columns "a" and "e", then drop duplicates on "e".
trans_1 = Filter(["a", "e"])
trans_2 = Drop_duplicates(["e"], keep="first")
pipe_1 = FirePipeline(
    [("filter column a and e", trans_1), ("drop duplicate for column e", trans_2)]
)

# Save the pipeline to an .ember file, then load it back as a second pipeline.
pipe_1.save_fire("pipe_1.ember", file_type="ember")
pipe_2 = FirePipeline.link_fire("pipe_1.ember")

# The reloaded pipeline should produce the same output as the original.
df1 = pipe_1.fit_transform(df)
df2 = pipe_2.fit_transform(df)
assert_frame_equal(df1, df2)
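For orientation, the two steps in this example look like thin wrappers around `DataFrame.filter` and `DataFrame.drop_duplicates`. That mapping is an assumption based on the transform names and step labels, not something this page confirms. Under that assumption, a plain-pandas sketch of the same computation, without Firelink, would be:

```python
import pandas as pd

df = pd.DataFrame(
    {
        "a": range(10),
        "b": range(10, 20),
        "c": range(20, 30),
        "d": ["a", "n", "d", "f", "g", "h", "h", "j", "q", "w"],
        "e": ["a", "d", "a", "d", "e", "e", "a", "a", "d", "d"],
    }
)

# Assumed equivalent of Filter(["a", "e"]): keep only columns "a" and "e".
step_1 = df.filter(items=["a", "e"])

# Assumed equivalent of Drop_duplicates(["e"], keep="first"):
# keep the first row for each distinct value in column "e".
step_2 = step_1.drop_duplicates(subset=["e"], keep="first")

# Column "e" has three distinct values, so three rows remain (index 0, 1, 4).
print(step_2)
```

If the assumption holds, `df1` from the pipeline above should match `step_2`. The benefit of the pipeline form is that the same steps can be saved to a file and reloaded.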
Spark Usage
import pandas as pd
from pandas.testing import assert_frame_equal
from pyspark.sql import SparkSession, functions as F

from firelink.pandas_transform import Assign
from firelink.pipeline import FirePipeline
from firelink.spark_transform import WithColumn

spark = SparkSession.builder.appName("spark_session").enableHiveSupport().getOrCreate()
df = pd.DataFrame({"col1": [1, 2, 3], "col2": ["a", "b", "c"]})
sdf = spark.createDataFrame(df)

# Spark pipeline: the column expressions are passed as strings.
add1 = WithColumn("Country", "F.lit('Canada')")
add2 = WithColumn("City", "F.lit('Toronto')")
spark_pipe = FirePipeline([("Add Country", add1), ("Add City", add2)])

# Optional display settings (requires `from sklearn import set_config`).
# set_config(display="diagram")
# set_config(display="text")
spark_pipe  # in a notebook, this displays the pipeline structure

sdf = spark_pipe.fit_transform(sdf)
sdf.show()

# The same two steps as a pandas pipeline.
add1 = Assign({"Country": "Canada"})
add2 = Assign({"City": "Toronto"})
pandas_pipe = FirePipeline([("Add Country", add1), ("Add City", add2)])
pandas_pipe.fit_transform(df)

# Both pipelines should produce the same result.
assert_frame_equal(sdf.toPandas(), pandas_pipe.fit_transform(df))
Pipeline Example Structure Visualization
Detailed Documentation
For detailed documentation, please see this portal.
File details
Details for the file firelink-0.1.3.tar.gz.
File metadata
- Download URL: firelink-0.1.3.tar.gz
- Upload date:
- Size: 21.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 827aac359b06de7781d36e6bab25c3228a165a3743ab80f9f659f6e2278ae3f7 |
| MD5 | 03b5c382e9776d62b0bdbbd9415c614d |
| BLAKE2b-256 | 5fca8253a0739329862c018dad0cc5d5d988ca8899fddd13a602d1aa2e13efd1 |
File details
Details for the file firelink-0.1.3-py3-none-any.whl.
File metadata
- Download URL: firelink-0.1.3-py3-none-any.whl
- Upload date:
- Size: 13.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3cc62819e96ad379140c3fbdd3011c79cfb902a4386d66e489050728d4da1ca2 |
| MD5 | 4d1c0f3d4bc1afab9a31a62c9d8755ec |
| BLAKE2b-256 | 7a579429e6d85d0993150d97cbc1ff2ac17d99fa32b2d0621040051edbeee868 |
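The published SHA256 digests can be used to check a manually downloaded file before installing it. The following is a minimal sketch using only the standard library. The helper names `sha256_of` and `matches_published` are hypothetical, and it assumes the files sit in the current working directory under the names listed above.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file details on this page.
PUBLISHED_SHA256 = {
    "firelink-0.1.3.tar.gz": (
        "827aac359b06de7781d36e6bab25c3228a165a3743ab80f9f659f6e2278ae3f7"
    ),
    "firelink-0.1.3-py3-none-any.whl": (
        "3cc62819e96ad379140c3fbdd3011c79cfb902a4386d66e489050728d4da1ca2"
    ),
}


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Return True if the file's SHA256 digest equals `expected_hex`."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    for name, expected in PUBLISHED_SHA256.items():
        if not Path(name).exists():
            print(f"{name}: not found, skipping")
        elif matches_published(name, expected):
            print(f"{name}: OK")
        else:
            print(f"{name}: MISMATCH")
```

Reading in chunks keeps memory use constant regardless of file size. `pip` can also enforce digests itself through its hash-checking mode (`--require-hashes` with a requirements file).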