Simple Smart Pipe Operator
Simple Smart Pipe
SSPipe is a Python productivity tool for rapid data manipulation.
It helps you break up any complicated expression into a sequence of simple transformations, increasing human-readability and decreasing the need for matching parentheses!
If you're familiar with the | operator of Unix, the %>% operator of R's magrittr, or the DataFrame.pipe method of the pandas library, sspipe provides the same functionality for any object in Python.
Installation and Usage
Install sspipe using pip:
pip install --upgrade sspipe
Then import it in your scripts.
from sspipe import p, px
The whole functionality of this library is exposed by two objects: p (a wrapper for functions to be called on the piped object) and px (a placeholder for the piped object).
Examples
Description | Python expression using p and px | Equivalent python code |
---|---|---|
Simple function call | "hello world!" \| p(print) | X = "hello world!"; print(X) |
Function call with extra args | "hello" \| p(print, "world", end='!') | X = "hello"; print(X, "world", end='!') |
Explicitly positioning piped argument with px placeholder | "world" \| p(print, "hello", px, "!") | X = "world"; print("hello", X, "!") |
Chaining pipes | 5 \| px + 2 \| px ** 5 + px \| p(print) | X = 5; X = X + 2; X = X ** 5 + X; print(X) |
Tailored behavior for builtin map and filter | ( range(5) \| p(filter, px % 2 == 0) \| p(map, px + 10) \| p(list) \| p(print) ) | X = range(5); X = filter((lambda x: x % 2 == 0), X); X = map((lambda x: x + 10), X); print(list(X)) |
NumPy expressions | range(10) \| np.sin(px)+1 \| p(plt.plot) | X = range(10); X = np.sin(X) + 1; plt.plot(X) |
Pandas support | people_df \| px.loc[px.age > 10, 'name'] | X = people_df; X.loc[X.age > 10, 'name'] |
Assignment | people_df['name'] \|= px.str.upper() | X = people_df['name']; X = X.str.upper(); people_df['name'] = X |
Builtin Data Structures | 2 \| p({px-1: p([px, p((px+1, 4))])}) | X = 2; X = {X-1: [X, (X+1, 4)]} |
Introduction
Suppose we want to generate a dict that maps the names of the 5 biggest files in the current directory to their sizes in bytes, like below:
{'README.md': 3732, 'setup.py': 1642, '.gitignore': 1203, 'LICENSE': 1068, 'deploy.sh': 89}
One approach is to use os.listdir() to list the files and directories in the current working directory, filter those that are files, map each to a (name, size) pair, sort them by size, take the first 5 items, make a dict, and print it.
Writing a whole script as a single expression, without intermediate variables, is not good practice. The exaggerated example below does so only for demonstration purposes:
import os
print(
    dict(
        sorted(
            map(
                lambda x: [x, os.path.getsize(x)],
                filter(os.path.isfile, os.listdir('.'))
            ), key=lambda x: x[1], reverse=True
        )[:5]
    )
)
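For comparison, the same steps written with intermediate variables might look like the sketch below. The function name biggest_files and its parameters directory and n are illustrative choices, not part of sspipe or of the original example.

```python
import os


def biggest_files(directory=".", n=5):
    """Map the names of the n biggest files in `directory` to their sizes in bytes."""
    names = os.listdir(directory)
    # Pair each name with its full path so the function also works
    # for directories other than the current one.
    paths = [(name, os.path.join(directory, name)) for name in names]
    files = [(name, path) for name, path in paths if os.path.isfile(path)]
    sizes = [(name, os.path.getsize(path)) for name, path in files]
    biggest = sorted(sizes, key=lambda pair: pair[1], reverse=True)[:n]
    return dict(biggest)


print(biggest_files("."))
```

Each intermediate name corresponds to one stage of the data flow, which is the structure a pipe expression keeps while dropping the temporary variables.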
Using sspipe's p operator, the nested expression can be written as a more human-readable flow of sequential transformations:
import os
from sspipe import p
(
    os.listdir('.')
    | p(filter, os.path.isfile)
    | p(map, lambda x: [x, os.path.getsize(x)])
    | p(sorted, key=lambda x: x[1], reverse=True)[:5]
    | p(dict)
    | p(print)
)
As you can see, the expression is decomposed into a sequence starting with the initial data, os.listdir('.'), followed by multiple | p(...) stages.
Each | p(...) stage describes a transformation that is applied to the left-hand side of |.
The first argument of p() defines the function that is applied to the data. For example, x | p(f1) | p(f2) | p(f3) is equivalent to f3(f2(f1(x))).
The remaining arguments of p() are passed to the transforming function of each stage. For example, x | p(f1, y) | p(f2, k=z) is equivalent to f2(f1(x, y), k=z).
Advanced Guide
The px helper
TODO: explain.
px is implemented by: px = p(lambda x: x)
px is similar to, but not the same as, magrittr's dot (.) placeholder:
x | p(f, px+1, y, px+2) is equivalent to f(x+1, y, x+2)
A+1 | f(px, px[2](px.y)) is equivalent to f(A+1, (A+1)[2]((A+1).y))
px can be used to avoid adding parentheses:
x+1 | px * 2 | np.log(px)+3 is equivalent to np.log((x+1) * 2) + 3
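To make the placeholder idea concrete, here is a toy sketch of how an object like px could defer arithmetic until a value is piped into it. The class Placeholder and the name px_toy are hypothetical and written only for illustration. This is not sspipe's actual implementation, and it covers only three operators, while the examples above also rely on attribute access, indexing, and comparisons.

```python
import operator


class Placeholder:
    """Toy deferred expression wrapping a one-argument function (not sspipe's code)."""

    def __init__(self, func=None):
        # With no function given, the placeholder stands for the piped value itself.
        self._func = func if func is not None else (lambda value: value)

    def _binary(self, op, other):
        # Build a new deferred expression; `other` may itself be a Placeholder.
        def deferred(value):
            left = self._func(value)
            right = other._func(value) if isinstance(other, Placeholder) else other
            return op(left, right)
        return Placeholder(deferred)

    def __add__(self, other):
        return self._binary(operator.add, other)

    def __mul__(self, other):
        return self._binary(operator.mul, other)

    def __pow__(self, other):
        return self._binary(operator.pow, other)

    def __ror__(self, value):
        # `value | placeholder` evaluates the deferred expression on `value`.
        return self._func(value)


px_toy = Placeholder()

# `|` binds more loosely than `+` and `**`, so this reads as
# (5 | (px_toy + 2)) | ((px_toy ** 5) + px_toy).
result = 5 | px_toy + 2 | px_toy ** 5 + px_toy
print(result)  # 16814, i.e. 7 ** 5 + 7
```

Because | has lower precedence than the arithmetic operators, each stage is built in full before the value on the left is piped into it, which is why no extra parentheses are needed.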
Integration with Numpy, Pandas, Pytorch
TODO: explain.
p and px are compatible with NumPy, pandas, and PyTorch.
[1,2] | p(pd.Series) | px[px ** 2 < np.log(px) + 1] is equivalent to x=pd.Series([1, 2]); x[x**2 < np.log(x)+1]
Compatibility with JulienPalard/Pipe
This library is inspired by, and depends on, the intelligent and concise work of JulienPalard/Pipe. If you want a single pipe.py script or a lightweight library that implements the core functionality and logic of SSPipe, Pipe is perfect.
SSPipe focuses on making pipes easier to use by integrating with popular libraries, introducing the px concept, and overriding Python operators to make the pipe a first-class citizen.
Every existing pipe implemented by the JulienPalard/Pipe library is accessible through p.<original_name> and is compatible with SSPipe.
SSPipe does not implement any specific pipe function and delegates the implementation and naming of pipe functions to JulienPalard/Pipe.
For example, JulienPalard/Pipe's example for solving "Find the sum of all the even-valued terms in Fibonacci which do not exceed four million." can be re-written using sspipe:
def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b
from sspipe import p, px
euler2 = (fib() | p.where(lambda x: x % 2 == 0)
                | p.take_while(lambda x: x < 4000000)
                | p.add())
You can also pass px shorthands to the JulienPalard/Pipe API:
euler2 = (fib() | p.where(px % 2 == 0)
                | p.take_while(px < 4000000)
                | p.add())
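As a reference point, the same computation can be written with only the standard library, which gives a value to check the piped versions against. This is a plain-Python sketch that does not use sspipe or Pipe, and the names even_terms and euler2_reference are illustrative.

```python
from itertools import takewhile


def fib():
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b


# Keep the even terms, stop at the first one that reaches four million,
# and add up what is left.
even_terms = (x for x in fib() if x % 2 == 0)
euler2_reference = sum(takewhile(lambda x: x < 4000000, even_terms))
print(euler2_reference)  # 4613732
```

The three steps (filter, take while, sum) appear in the same order as the where, take_while, and add stages of the piped expression.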
Internals
TODO: explain.
p is a class that overrides the __ror__ (|) operator to apply the function to the operand.
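The mechanism can be sketched in a few lines. The class MiniPipe and the helper join_words below are hypothetical stand-ins written for illustration. They are not sspipe's source code, and they ignore left-hand operands that define | themselves (NumPy arrays, for example), which this toy does not handle.

```python
class MiniPipe:
    """Hypothetical stand-in for p: stores a function plus extra arguments."""

    def __init__(self, func, *args, **kwargs):
        self.func = func
        self.args = args
        self.kwargs = kwargs

    def __ror__(self, left):
        # Python falls back to `pipe.__ror__(left)` for `left | pipe` when
        # type(left) does not know how to combine itself with a MiniPipe.
        # The piped value becomes the first positional argument.
        return self.func(left, *self.args, **self.kwargs)


def join_words(first, second, sep=" "):
    return first + sep + second


greeting = "hello" | MiniPipe(join_words, "world", sep=", ")
total = [3, 1, 2] | MiniPipe(sorted, reverse=True) | MiniPipe(sum)

print(greeting)  # hello, world
print(total)     # 6
```

Since | is left-associative, each stage receives the result of everything to its left, which is what makes x | p(f1) | p(f2) behave like f2(f1(x)).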