Framework for building pipelines for data processing
pipedata
Chained operations in Python, applied to data processing.
Installation
pip install pipedata
An Example
Core Framework
The core framework provides the building blocks for chaining operations.
Running a stream:
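from pipedata.core import StreamStart

result = (
    StreamStart(range(10))
    .filter(lambda x: x % 2 == 0)  # keep even numbers: 0, 2, 4, 6, 8
    .map(lambda x: x ^ 2)  # bitwise XOR with 2 (not squaring): 2, 0, 6, 4, 10
    .map_tuple(lambda x: x, 2)  # group items into tuples of up to 2
    .to_list()
)
print(result)
#> [(2, 0), (6, 4), (10,)]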
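To make the semantics of each step concrete, here is a minimal sketch of the same computation using only the standard library. It does not use pipedata and is not how pipedata is implemented; `batched_tuples` is a hypothetical helper written for this illustration, and the grouping behaviour (a shorter final tuple) is inferred from the output shown above.

```python
from itertools import islice


def batched_tuples(items, n):
    """Yield consecutive tuples of up to n items; the last tuple may be shorter."""
    it = iter(items)
    while True:
        batch = tuple(islice(it, n))
        if not batch:
            return
        yield batch


evens = (x for x in range(10) if x % 2 == 0)  # 0, 2, 4, 6, 8
xored = (x ^ 2 for x in evens)  # bitwise XOR with 2: 2, 0, 6, 4, 10
plain_result = list(batched_tuples(xored, 2))
print(plain_result)
#> [(2, 0), (6, 4), (10,)]
```

Note that `^` is Python's bitwise XOR operator, which is why 0 maps to 2 and 2 maps to 0; exponentiation would be written `x ** 2`.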
Creating a chain and then using it:
import json

from pipedata.core import ChainStart, Stream, StreamStart

chain = (
    ChainStart()
    .filter(lambda x: x % 2 == 0)
    .map(lambda x: x ^ 2)  # bitwise XOR with 2, not squaring
    .map_tuple(lambda x: sum(x), 2)  # sum each tuple of up to 2 items
)
print(Stream(range(10), chain).to_list())
#> [2, 10, 10]
print(json.dumps(chain.get_counts(), indent=4))
#> [
#>     {
#>         "name": "_identity",
#>         "inputs": 10,
#>         "outputs": 10
#>     },
#>     {
#>         "name": "<lambda>",
#>         "inputs": 10,
#>         "outputs": 5
#>     },
#>     {
#>         "name": "<lambda>",
#>         "inputs": 5,
#>         "outputs": 5
#>     },
#>     {
#>         "name": "<lambda>",
#>         "inputs": 5,
#>         "outputs": 3
#>     }
#> ]
print(StreamStart(range(10)).flat_map(chain).to_list())
#> [2, 10, 10]
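The per-step counts reported above (10 in and 5 out for the filter, 5 in and 5 out for the map, 5 in and 3 out for the tuple step) can be reproduced by hand. The following is a rough sketch of one way such counting could work with plain generators. It is an assumption made for illustration, not pipedata's implementation: `CountedStep`, `keep_even`, `xor_two`, and `sum_pairs` are hypothetical names, and the `_identity` entry is not modelled.

```python
from itertools import islice


class CountedStep:
    """Wrap a generator-transforming function and count items in and out."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.inputs = 0
        self.outputs = 0

    def _count_inputs(self, items):
        # Count each item as the wrapped function pulls it.
        for item in items:
            self.inputs += 1
            yield item

    def __call__(self, items):
        # Count each item the wrapped function produces.
        for item in self.func(self._count_inputs(items)):
            self.outputs += 1
            yield item


def keep_even(items):
    return (x for x in items if x % 2 == 0)


def xor_two(items):
    return (x ^ 2 for x in items)  # bitwise XOR with 2


def sum_pairs(items):
    # Sum consecutive groups of up to 2 items.
    it = iter(items)
    while True:
        batch = tuple(islice(it, 2))
        if not batch:
            return
        yield sum(batch)


steps = [
    CountedStep("keep_even", keep_even),
    CountedStep("xor_two", xor_two),
    CountedStep("sum_pairs", sum_pairs),
]

stream = iter(range(10))
for step in steps:
    stream = step(stream)  # chain the steps lazily

totals = list(stream)
counts = [
    {"name": s.name, "inputs": s.inputs, "outputs": s.outputs} for s in steps
]
print(totals)
#> [2, 10, 10]
```

After the stream is consumed, `counts` holds one dictionary per step with the same input and output numbers as the three `<lambda>` entries in the output above.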