peperoncino: A library for easy data processing for pandas
Install
$ pip install peperoncino
How to use
Processing DataFrame
import peperoncino as pp

pipeline = pp.Pipeline(
    # query data
    pp.Query("bar <= 3"),
    # assign new feature
    pp.Assign(hoge="foo * bar"),
    # generate combination feature
    pp.Combinations(["foo", "baz"], ["*", "/"]),
    # target encoding
    pp.TargetEncoding(["baz"], "y", ref=0),
    # select features
    pp.Select(
        ["hoge", "*_foo_baz", "TARGET_ENC_baz_BY_y", "y"],
        lackable_cols=["y"],
    )
)

# execute the processing
train_df, val_df, test_df = \
    pipeline.process([train_df, val_df, test_df])
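For readers who think in plain pandas, the first two steps of the pipeline above can be approximated as follows. This is only a sketch: it assumes that pp.Query and pp.Assign behave like DataFrame.query and DataFrame.eval applied to every frame in the list, which the step names suggest but this description does not confirm. The helper query_and_assign and the toy data are made up for illustration and are not part of peperoncino.

```python
import pandas as pd


def query_and_assign(dfs):
    """Hypothetical stand-in: keep rows with bar <= 3, then add hoge = foo * bar."""
    out = []
    for df in dfs:
        filtered = df.query("bar <= 3")
        # eval() computes the formula on the filtered rows only
        out.append(filtered.assign(hoge=filtered.eval("foo * bar")))
    return out


# Toy frames standing in for train/test data
train_df = pd.DataFrame({"foo": [1, 2, 3], "bar": [2, 3, 4]})
test_df = pd.DataFrame({"foo": [4, 5], "bar": [1, 5]})

train_out, test_out = query_and_assign([train_df, test_df])
print(train_out)
print(test_out)
```

The same steps are applied to every frame in the list, and the input frames are left unmodified.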
Predefined processings
| name | description |
|---|---|
| ApplyColumn | Apply a function to a column. |
| AsCategory | Assign category dtype to columns. |
| Assign | Assign a feature by a formula. |
| Combinations | Create combination features. |
| DropColumns | Drop columns. |
| DropDuplicates | Drop duplicate rows. |
| Pipeline | Chain processings. |
| Query | Query rows by a given condition. |
| RenameColumns | Rename columns. |
| Select | Select columns. |
| StatsEncoding | Encode columns by statistical values of another column. |
| TargetEncoding | Target Encoding with smoothing. |
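To make "target encoding with smoothing" concrete, here is a minimal plain-pandas sketch. It uses one common formulation, in which each category mean is shrunk toward the global mean with a weight m, and fits the statistics on a single reference frame. Both choices are assumptions: the exact formula used by TargetEncoding and the meaning of its ref argument are not stated here. The function smoothed_target_encode and the parameter m are hypothetical names.

```python
import pandas as pd


def smoothed_target_encode(ref_df, dfs, col, target, m=10.0):
    """Encode `col` by a smoothed mean of `target`, fitted on ref_df only."""
    global_mean = ref_df[target].mean()
    stats = ref_df.groupby(col)[target].agg(["mean", "count"])
    # Shrink each category mean toward the global mean; larger m means more shrinkage
    smoothed = (stats["count"] * stats["mean"] + m * global_mean) / (stats["count"] + m)
    name = f"TARGET_ENC_{col}_BY_{target}"
    # Categories not seen in ref_df fall back to the global mean
    return [
        df.assign(**{name: df[col].map(smoothed).fillna(global_mean)})
        for df in dfs
    ]


train_df = pd.DataFrame({"baz": ["a", "a", "b"], "y": [1, 0, 1]})
test_df = pd.DataFrame({"baz": ["a", "c"]})

train_enc, test_enc = smoothed_target_encode(
    train_df, [train_df, test_df], "baz", "y", m=1.0
)
print(train_enc)
print(test_enc)
```

Fitting on the reference frame only keeps target information from leaking out of validation or test frames, which usually have no target column at all.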
Define processing
All processings are subclasses of pp.BaseProcessing.
To define your own, implement the _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame] method.
from typing import List
import pandas as pd
import peperoncino as pp

class ExampleProcessing(pp.BaseProcessing):
    def _process(self, dfs: List[pd.DataFrame]) -> List[pd.DataFrame]:
        return [df + 1 for df in dfs]
If your processing handles each data frame independently of the others, use pp.SeparatedProcessing.
class ExampleProcessing(pp.SeparatedProcessing):
    def sep_process(self, df: pd.DataFrame) -> pd.DataFrame:
        return df * 2
If you need to merge all dataframes and then apply your processing, use pp.MergedProcessing.
class ExampleProcessing(pp.MergedProcessing):
    def simul_process(self, df: pd.DataFrame) -> pd.DataFrame:
        return df.assign(col1_mean=df['col1'].mean())
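The merge-then-apply idea can be pictured in plain pandas as "concatenate, process once, split back". The sketch below is not the library's implementation. The helper merged_apply is a hypothetical name, and the sketch assumes the applied function keeps the number and order of rows so the result can be cut back into the original pieces.

```python
import pandas as pd


def merged_apply(dfs, func):
    """Concatenate all frames, apply func once, then split the result back."""
    merged = pd.concat(dfs, ignore_index=True)
    processed = func(merged)
    out, start = [], 0
    for df in dfs:
        stop = start + len(df)
        part = processed.iloc[start:stop].copy()
        part.index = df.index  # restore the original row labels
        out.append(part)
        start = stop
    return out


train_df = pd.DataFrame({"col1": [1, 2]})
test_df = pd.DataFrame({"col1": [3]})

train_out, test_out = merged_apply(
    [train_df, test_df],
    lambda df: df.assign(col1_mean=df["col1"].mean()),
)
print(train_out)
print(test_out)
```

Here col1_mean is 2.0 in both outputs because the mean is taken over all three rows. Applying the same function to each frame separately would give 1.5 and 3.0 instead, which is the practical difference between the merged and the separated style.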