peperoncino: A library for easy data processing for pandas
Install
$ pip install peperoncino
How to use
Processing DataFrame
    import peperoncino as pp

    pipeline = pp.Pipeline(
        # query data
        pp.Query("bar <= 3"),
        # assign new feature
        pp.Assign(hoge="foo * bar"),
        # generate combination feature
        pp.Combinations(["foo", "baz"], ["*", "/"]),
        # target encoding
        pp.TargetEncoding(["baz"], "y", ref=0),
        # select features
        pp.Select(
            ["hoge", "*_foo_baz", "TARGET_ENC_baz_BY_y", "y"],
            lackable_cols=["y"],
        )
    )

    # execute the processing
    train_df, val_df, test_df = \
        pipeline.process([train_df, val_df, test_df])
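For readers who want a feel for what the first two steps do, here is a rough plain-pandas sketch. It assumes that pp.Query and pp.Assign behave like DataFrame.query and DataFrame.eval; the source does not confirm this, and the helper name query_then_assign and the sample data are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data using the column names from the pipeline above.
train_df = pd.DataFrame(
    {"foo": [1, 2, 3, 4], "bar": [1, 2, 3, 4], "baz": [2, 2, 4, 4], "y": [0, 1, 0, 1]}
)


def query_then_assign(df: pd.DataFrame) -> pd.DataFrame:
    """Approximate Query("bar <= 3") followed by Assign(hoge="foo * bar").

    Assumption: the library steps act like DataFrame.query / DataFrame.eval.
    """
    out = df.query("bar <= 3").copy()    # keep only rows matching the condition
    out["hoge"] = out.eval("foo * bar")  # evaluate the formula as a new column
    return out


result = query_then_assign(train_df)
print(result[["foo", "bar", "hoge"]])
```

The input frame is left untouched because the filtered rows are copied before the new column is added.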
Predefined processings
name | description |
---|---|
ApplyColumn | Apply a function to a column. |
AsCategory | Assign category dtype to columns. |
Assign | Assign a feature by a formula. |
Combinations | Create combination features. |
DropColumns | Drop columns. |
DropDuplicates | Drop duplicate rows. |
Pipeline | Chain processings. |
Query | Query rows by a given condition. |
RenameColumns | Rename columns. |
Select | Select columns. |
StatsEncoding | Encode columns by statistical values of another column. |
TargetEncoding | Target Encoding with smoothing. |
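The table describes TargetEncoding only as "Target Encoding with smoothing" and does not give the formula. As an illustration of the general idea, the sketch below uses one common additive-smoothing variant in plain pandas. The function name smoothed_target_encode and the parameter m are hypothetical, and the library's own formula may differ.

```python
import pandas as pd


def smoothed_target_encode(
    train: pd.DataFrame, col: str, target: str, m: float = 1.0
) -> pd.Series:
    """Return one smoothed target mean per category of `col`.

    Assumed formula (not taken from the library):
        (count * category_mean + m * global_mean) / (count + m)
    Larger m pulls rare categories more strongly toward the global mean.
    """
    global_mean = train[target].mean()
    stats = train.groupby(col)[target].agg(["mean", "count"])
    return (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)


# Hypothetical data: category "a" appears twice, "b" once.
train = pd.DataFrame({"baz": ["a", "a", "b"], "y": [1, 0, 1]})
encoding = smoothed_target_encode(train, "baz", "y", m=1.0)

# Map the per-category values back onto the rows.
encoded = train["baz"].map(encoding)
print(encoded)
```

Fitting the encoding on the training frame only and then mapping it onto validation and test frames avoids leaking their targets into the feature.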
Define processing
All processings are subclasses of pp.BaseProcessing.
All you need to do is define the _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame] method.
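    from typing import List

    import pandas as pd
    import peperoncino as pp

    class ExampleProcessing(pp.BaseProcessing):
        # receives every data frame at once and returns the processed list
        def _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
            return [df + 1 for df in dfs]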
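Receiving the whole list is useful when one frame's statistics have to be applied to the others. The sketch below centers a column in every frame using the mean of the first frame. To stay runnable without the library it uses BaseProcessingStandIn, a hypothetical stand-in that only mimics the process / _process contract described above; it is not the library's implementation.

```python
from typing import List

import pandas as pd


class BaseProcessingStandIn:
    """Hypothetical stand-in for pp.BaseProcessing (assumption for this sketch)."""

    def process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        return self._process(dfs)

    def _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        raise NotImplementedError


class CenterByFirstFrame(BaseProcessingStandIn):
    """Subtract the first frame's column mean from that column in every frame."""

    def __init__(self, col: str) -> None:
        self.col = col

    def _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        mean = dfs[0][self.col].mean()  # statistic taken from the first frame only
        return [df.assign(**{self.col: df[self.col] - mean}) for df in dfs]


train = pd.DataFrame({"x": [1.0, 2.0, 3.0]})
val = pd.DataFrame({"x": [4.0]})
out_train, out_val = CenterByFirstFrame("x").process([train, val])
print(out_train["x"].tolist(), out_val["x"].tolist())
```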
If your processing handles each data frame independently of the others, use pp.SeparatedProcessing.
    class ExampleProcessing(pp.SeparatedProcessing):
        def sep_process(self, df: pd.DataFrame) -> pd.DataFrame:
            return df * 2
If you need to merge all data frames and then apply your processing, use pp.MergedProcessing.
    class ExampleProcessing(pp.MergedProcessing):
        def simul_process(self, df: pd.DataFrame) -> pd.DataFrame:
            return df.assign(col1_mean=df['col1'].mean())
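The difference between the two modes shows up as soon as a statistic such as a mean is involved. The following plain-pandas sketch contrasts them; apply_separately and apply_merged are hypothetical helpers written for this illustration, not the library's code, and the merged variant assumes the function keeps the row count and row order unchanged.

```python
from typing import Callable, List

import pandas as pd

Func = Callable[[pd.DataFrame], pd.DataFrame]


def apply_separately(dfs: List[pd.DataFrame], func: Func) -> List[pd.DataFrame]:
    """Apply func to each frame on its own."""
    return [func(df) for df in dfs]


def apply_merged(dfs: List[pd.DataFrame], func: Func) -> List[pd.DataFrame]:
    """Concatenate all frames, apply func once, then split back by length."""
    processed = func(pd.concat(dfs, ignore_index=True))
    out, start = [], 0
    for df in dfs:
        part = processed.iloc[start:start + len(df)].copy()
        part.index = df.index  # restore the original index of this frame
        out.append(part)
        start += len(df)
    return out


def add_mean(df: pd.DataFrame) -> pd.DataFrame:
    return df.assign(col1_mean=df["col1"].mean())


train = pd.DataFrame({"col1": [1.0, 2.0]})
test = pd.DataFrame({"col1": [3.0, 5.0]})

sep_train, sep_test = apply_separately([train, test], add_mean)
mrg_train, mrg_test = apply_merged([train, test], add_mean)
print(sep_train["col1_mean"].iloc[0], sep_test["col1_mean"].iloc[0])
print(mrg_train["col1_mean"].iloc[0], mrg_test["col1_mean"].iloc[0])
```

Applied separately, each frame gets its own mean; applied after merging, every frame shares the mean over all rows.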
Download files
Filename, size | File type | Python version |
---|---|---|
peperoncino-0.0.5-py3-none-any.whl (15.4 kB) | Wheel | py3 |
peperoncino-0.0.5.tar.gz (10.1 kB) | Source | None |
Hashes for peperoncino-0.0.5-py3-none-any.whl
Algorithm | Hash digest |
---|---|
SHA256 | c019a0422a371b38de126d1de072f805a547fe676e82b38f7605e8172aa92fc4 |
MD5 | 3e4b6c587f33b561d68e35713fdc4ca6 |
BLAKE2-256 | 170db8c96148f06d30dee8861225ea6204d112cbea45fbd880a4312e709033af |