routine pandas method chain links wrapped with scikit-lego
Purpose
- A collection of pandas DataFrame methods and functions I commonly use, decorated with scikit-lego's `@log_step` to enable logging within method chains
- I wanted to reuse the logging of common pandas methods across projects
- Inspired by Vincent D. Warmerdam's PyData Eindhoven 2019 talk titled "Untitled12.ipynb"
- See the scikit-lego docs on pandas pipelines for the general use case for `@log_step`
- See `examples/demo.ipynb` for a few workflows and links to other relevant SE questions
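To make the idea of a logged chain link concrete, here is a minimal sketch of the pattern. It does not use scikit-lego: `log_shape` is a hypothetical, simplified stand-in for a step-logging decorator, and the two wrapped functions and the `pipeline_sketch` logger name are illustrative assumptions, not this package's actual code.

```python
import functools
import logging
import time

import pandas as pd

# Hypothetical logger name, used only for this sketch.
logger = logging.getLogger("pipeline_sketch")


def log_shape(func):
    """Simplified stand-in for a step-logging decorator (not scikit-lego's code).

    Logs the step name, its duration, and the shape of the returned DataFrame.
    """

    @functools.wraps(func)
    def wrapper(dataf, *args, **kwargs):
        start = time.perf_counter()
        result = func(dataf, *args, **kwargs)
        elapsed = time.perf_counter() - start
        logger.info(
            "[%s] time=%.6fs n_obs=%d, n_col=%d",
            func.__name__,
            elapsed,
            result.shape[0],
            result.shape[1],
        )
        return result

    return wrapper


@log_shape
def start_pipeline(dataf):
    # Copy first so later steps never mutate the caller's DataFrame.
    return dataf.copy()


@log_shape
def drop_duplicates(dataf, **kwargs):
    # Thin wrapper so the plain pandas method becomes a logged chain link.
    return dataf.drop_duplicates(**kwargs)


logger.addHandler(logging.StreamHandler())
logger.setLevel(logging.INFO)

example = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})
cleaned = example.pipe(start_pipeline).pipe(drop_duplicates)
```

Each wrapped function takes a DataFrame as its first argument and returns a DataFrame, which is what lets it slot into `.pipe(...)` while the decorator reports on every step.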
In [1]: import pandas as pd; import numpy as np; import logging
In [2]: import pandas_pipe
In [3]: stream_handler = logging.StreamHandler()
In [4]: pandas_pipe.logger.addHandler(stream_handler)
In [5]: dataf_input = pd.DataFrame(
   ...:     {
   ...:         "aaa": {0: 10, 1: 11, 2: 12, 3: 13},
   ...:         "myid": {0: 1, 1: 2, 2: 3, 3: 4},
   ...:         "num": {0: "1, 2, 3", 1: np.nan, 2: "1, 2", 3: np.nan},
   ...:         "text": {0: "aa, bb, cc", 1: np.nan, 2: "cc, dd", 3: "ee"},
   ...:         "states": {
   ...:             0: "Wyoming; Illinois; New Hampshire",
   ...:             1: "Pennsylvania",
   ...:             2: "New York",
   ...:             3: "Pennsylvania",
   ...:         },
   ...:     }
   ...: )
In [6]: dataf_input
Out[6]:
   aaa  myid      num        text                            states
0   10     1  1, 2, 3  aa, bb, cc  Wyoming; Illinois; New Hampshire
1   11     2      NaN         NaN                      Pennsylvania
2   12     3     1, 2      cc, dd                          New York
3   13     4      NaN          ee                      Pennsylvania
In [7]: (
   ...:     dataf_input.pipe(pandas_pipe.start_pipeline)
   ...:     .pipe(pandas_pipe.explode_setup, columns=["num", "text"], delimiter=",")
   ...:     .pipe(pandas_pipe.explode_setup, columns=["states"], delimiter=";")
   ...:     .pipe(pandas_pipe.explode, column="num", ignore_index=False)
   ...:     .pipe(pandas_pipe.explode, column="text", ignore_index=False)
   ...:     .pipe(pandas_pipe.explode, column="states", ignore_index=False)
   ...:     .pipe(pandas_pipe.drop_duplicates)
   ...: )
[start_pipeline(df)] time=0:00:00.000152 n_obs=4, n_col=5 names=['aaa', 'myid', 'num', 'text', 'states']
[explode_setup(df, columns = ['num', 'text'], delimiter = ',')] time=0:00:00.001107 n_obs=4, n_col=5 delta=(0, 0)
[explode_setup(df, columns = ['states'], delimiter = ';')] time=0:00:00.000466 n_obs=4, n_col=5 delta=(0, 0)
[explode(df, column = 'num', ignore_index = False)] time=0:00:00.003697 n_obs=7, n_col=5 delta=(+3, 0)
[explode(df, column = 'text', ignore_index = False)] time=0:00:00.002642 n_obs=15, n_col=5 delta=(+8, 0)
[explode(df, column = 'states', ignore_index = False)] time=0:00:00.002184 n_obs=33, n_col=5 delta=(+18, 0)
[drop_duplicates(df)] time=0:00:00.001671 n_obs=33, n_col=5 delta=(0, 0)
Out[7]:
   aaa  myid  num  text         states
0   10     1    1    aa        Wyoming
0   10     1    1    aa       Illinois
0   10     1    1    aa  New Hampshire
0   10     1    1    bb        Wyoming
0   10     1    1    bb       Illinois
0   10     1    1    bb  New Hampshire
0   10     1    1    cc        Wyoming
0   10     1    1    cc       Illinois
0   10     1    1    cc  New Hampshire
0   10     1    2    aa        Wyoming
0   10     1    2    aa       Illinois
0   10     1    2    aa  New Hampshire
0   10     1    2    bb        Wyoming
0   10     1    2    bb       Illinois
0   10     1    2    bb  New Hampshire
0   10     1    2    cc        Wyoming
0   10     1    2    cc       Illinois
0   10     1    2    cc  New Hampshire
0   10     1    3    aa        Wyoming
0   10     1    3    aa       Illinois
0   10     1    3    aa  New Hampshire
0   10     1    3    bb        Wyoming
0   10     1    3    bb       Illinois
0   10     1    3    bb  New Hampshire
0   10     1    3    cc        Wyoming
0   10     1    3    cc       Illinois
0   10     1    3    cc  New Hampshire
1   11     2  NaN   NaN   Pennsylvania
2   12     3    1    cc       New York
2   12     3    1    dd       New York
2   12     3    2    cc       New York
2   12     3    2    dd       New York
3   13     4  NaN    ee   Pennsylvania
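The row counts in the log (4, 7, 15, 33) follow from exploding one column at a time: each source row ends up with the product of its list lengths, so row 0 yields 3 × 3 × 3 = 27 rows, row 2 yields 2 × 2 × 1 = 4, and rows 1 and 3 stay single rows. The sketch below reproduces those counts with plain pandas only. `split_column` is a hypothetical helper that assumes `explode_setup` splits each string on the delimiter and strips whitespace; the package's real implementation may differ.

```python
import numpy as np
import pandas as pd

dataf_input = pd.DataFrame(
    {
        "aaa": {0: 10, 1: 11, 2: 12, 3: 13},
        "myid": {0: 1, 1: 2, 2: 3, 3: 4},
        "num": {0: "1, 2, 3", 1: np.nan, 2: "1, 2", 3: np.nan},
        "text": {0: "aa, bb, cc", 1: np.nan, 2: "cc, dd", 3: "ee"},
        "states": {
            0: "Wyoming; Illinois; New Hampshire",
            1: "Pennsylvania",
            2: "New York",
            3: "Pennsylvania",
        },
    }
)


def split_column(dataf, column, delimiter):
    """Hypothetical helper: assumed behaviour of explode_setup for one column.

    Splits each string into a list of stripped parts; missing values stay NaN.
    """
    out = dataf.copy()
    out[column] = out[column].str.split(delimiter).apply(
        lambda parts: [p.strip() for p in parts] if isinstance(parts, list) else parts
    )
    return out


result = dataf_input
for column, delimiter in [("num", ","), ("text", ","), ("states", ";")]:
    result = split_column(result, column, delimiter)

# Explode one column at a time and record the row count after each step.
row_counts = [len(result)]
for column in ["num", "text", "states"]:
    result = result.explode(column)
    row_counts.append(len(result))

print(row_counts)  # [4, 7, 15, 33]
```

Because every combination produced this way is distinct, the final `drop_duplicates` step removes nothing, which matches the `delta=(0, 0)` in the log.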