peperoncino: A library for easy data processing for pandas
Install
$ pip install peperoncino
How to use
Processing DataFrame
import peperoncino as pp

pipeline = pp.Pipeline(
    # query data
    pp.Query("bar <= 3"),
    # assign new feature
    pp.Assign(hoge="foo * bar"),
    # generate combination feature
    pp.Combinations(["foo", "baz"], ["*", "/"]),
    # target encoding
    pp.TargetEncoding(["baz"], "y", ref=0),
    # select features
    pp.Select(
        ["hoge", "*_foo_baz", "TARGET_ENC_baz_BY_y", "y"],
        lackable_cols=["y"],
    )
)

# execute the processing
train_df, val_df, test_df = \
    pipeline.process([train_df, val_df, test_df])
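To give a feel for what the first two steps do, the sketch below reproduces them with plain pandas. It assumes that pp.Query behaves like DataFrame.query and that pp.Assign evaluates its formula the way DataFrame.eval does; the source does not confirm either, and the column values are invented for illustration.

```python
import pandas as pd

# Toy data: column names follow the pipeline above, values are made up.
df = pd.DataFrame({"foo": [1, 2, 3, 4], "bar": [1, 2, 3, 4], "baz": [2, 2, 5, 5]})

# Assumed equivalent of pp.Query("bar <= 3"): keep rows matching the condition.
filtered = df.query("bar <= 3")

# Assumed equivalent of pp.Assign(hoge="foo * bar"): add a column from a formula.
result = filtered.assign(hoge=filtered.eval("foo * bar"))

print(result)
```

Only the three rows with bar <= 3 survive, and hoge holds the product of foo and bar for each of them.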
Predefined processings
name | description |
---|---|
ApplyColumn | Apply a function to a column. |
AsCategory | Assign category dtype to columns. |
Assign | Assign a feature by a formula. |
Combinations | Create combination features. |
DropColumns | Drop columns. |
DropDuplicates | Drop duplicate rows. |
Pipeline | Chain processings. |
Query | Query rows by a given condition. |
RenameColumns | Rename columns. |
Select | Select columns. |
StatsEncoding | Encode columns by statistical values of another column. |
TargetEncoding | Target Encoding with smoothing. |
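The table does not say which smoothing formula TargetEncoding uses. As a rough illustration of the general idea only, the sketch below applies one common variant in plain pandas: each category mean is blended with the global mean, weighted by how many rows the category has. The helper smoothed_target_encode and its weight parameter are hypothetical names, not part of peperoncino.

```python
import pandas as pd


def smoothed_target_encode(df, col, target, weight=10.0):
    """Return smoothed per-category target means (hypothetical helper)."""
    global_mean = df[target].mean()
    stats = df.groupby(col)[target].agg(["mean", "count"])
    # Blend each category mean with the global mean; categories with few
    # rows lean more heavily on the global mean.
    smoothed = (stats["count"] * stats["mean"] + weight * global_mean) / (
        stats["count"] + weight
    )
    return df[col].map(smoothed)


# Toy data, invented for illustration.
train = pd.DataFrame({"baz": ["a", "a", "b", "b", "b"], "y": [1, 0, 1, 1, 1]})
train["TARGET_ENC_baz_BY_y"] = smoothed_target_encode(train, "baz", "y", weight=2.0)
print(train)
```

A larger weight pulls every category closer to the global mean, which limits overfitting on rare categories at the cost of a weaker signal.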
Define processing
All processings are subclasses of pp.BaseProcessing.
All you need to do is define the _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame] method.
from typing import List
import pandas as pd

class ExampleProcessing(pp.BaseProcessing):
    def _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        return [df + 1 for df in dfs]
If your processing handles each data frame independently of the others, use pp.SeparatedProcessing and define sep_process.
class ExampleProcessing(pp.SeparatedProcessing):
    def sep_process(self, df: pd.DataFrame) -> pd.DataFrame:
        return df * 2
If you need to merge all data frames and then apply your processing, use pp.MergedProcessing and define simul_process.
class ExampleProcessing(pp.MergedProcessing):
    def simul_process(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.assign(col1_mean=df['col1'].mean())
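To make the merged variant more concrete, here is a minimal stand-in that does not use peperoncino itself. It assumes that a merged processing concatenates the data frames, applies simul_process once, and splits the result back into the original pieces; the real implementation may differ. MergedSketch and AddCol1Mean are hypothetical names.

```python
from typing import List

import pandas as pd


class MergedSketch:
    """Hypothetical stand-in for a merge-apply-split processing."""

    def simul_process(self, df: pd.DataFrame) -> pd.DataFrame:
        raise NotImplementedError

    def process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        # Stack all frames, tagging each row with the position of its source frame.
        merged = pd.concat(dfs, keys=list(range(len(dfs))))
        processed = self.simul_process(merged)
        # Split back into one frame per input, restoring the original index.
        return [processed.xs(i, level=0) for i in range(len(dfs))]


class AddCol1Mean(MergedSketch):
    def simul_process(self, df: pd.DataFrame) -> pd.DataFrame:
        # The mean is computed over all frames together, not per frame.
        return df.assign(col1_mean=df["col1"].mean())


# Toy data, invented for illustration.
train_df = pd.DataFrame({"col1": [1.0, 2.0]})
test_df = pd.DataFrame({"col1": [6.0]})
train_out, test_out = AddCol1Mean().process([train_df, test_df])
print(train_out)
print(test_out)
```

Both outputs carry the same col1_mean because the statistic is computed once on the merged frame. A separated processing would instead compute it per data frame.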