Experimental in-memory data flow pipelines.
Experiments in data flow programming.
After some experimentation, Apache Beam’s Python SDK got the API right. Use that instead.
Standard Word Count Example
Grab the 5 most common words in LICENSE.txt:

```python
from tinyflow.serial import ops, Pipeline

pipe = Pipeline() \
    | "Split line into words" >> ops.flatmap(lambda x: x.lower().split()) \
    | "Remove empty lines" >> ops.filter(bool) \
    | "Produce the 5 most common words" >> ops.counter(5) \
    | "Sort by frequency desc" >> ops.sort(key=lambda x: x[1], reverse=True)

with open('LICENSE.txt') as f:
    results = dict(pipe(f))
```
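The `Pipeline() | "label" >> op` syntax above relies on Python operator overloading. The sketch below is a hypothetical, minimal reimplementation of that pattern for illustration only. It is not tinyflow's actual code, and the `Op` class, the `stages` and `labels` attributes, and the contents of the `ops` namespace are all assumptions. Because `>>` binds more tightly than `|`, each `"label" >> op` is evaluated first (`str` defines no `>>` for these objects, so Python falls back to `Op.__rrshift__`), and `Pipeline.__or__` then appends the labeled stage to a new pipeline.

```python
from collections import Counter
from types import SimpleNamespace


class Op:
    """One pipeline stage: wraps a function that maps an iterable to an iterable."""

    def __init__(self, func, label=None):
        self.func = func
        self.label = label

    def __rrshift__(self, label):
        # Called for `"description" >> op`, because str has no >> for Op.
        return Op(self.func, label=label)

    def __call__(self, stream):
        return self.func(stream)


class Pipeline:
    """An immutable sequence of stages applied left to right."""

    def __init__(self, stages=()):
        self.stages = tuple(stages)

    def __or__(self, op):
        # Return a new Pipeline so `pipe | op` never mutates `pipe`.
        return Pipeline(self.stages + (op,))

    @property
    def labels(self):
        return tuple(op.label for op in self.stages)

    def __call__(self, stream):
        # Feed each stage's output into the next one. Generator-based stages
        # stay lazy until a later stage (or the caller) consumes them.
        for op in self.stages:
            stream = op(stream)
        return stream


def _flatmap(func):
    return Op(lambda stream: (item for x in stream for item in func(x)))


def _filter(pred):
    return Op(lambda stream: (x for x in stream if pred(x)))


def _counter(n):
    # Consumes the whole stream, then keeps the n most frequent items.
    return Op(lambda stream: iter(Counter(stream).most_common(n)))


def _sort(key=None, reverse=False):
    return Op(lambda stream: iter(sorted(stream, key=key, reverse=reverse)))


# Hypothetical stand-in for `tinyflow.serial.ops`.
ops = SimpleNamespace(flatmap=_flatmap, filter=_filter, counter=_counter, sort=_sort)

# Same shape as the README pipeline, run over an in-memory list of lines.
pipe = Pipeline() \
    | "Split line into words" >> ops.flatmap(lambda x: x.lower().split()) \
    | "Remove empty strings" >> ops.filter(bool) \
    | "Produce the 5 most common words" >> ops.counter(5) \
    | "Sort by frequency desc" >> ops.sort(key=lambda x: x[1], reverse=True)

print(dict(pipe(["the cat", "The dog", "", "the end"])))
```

Returning a new `Pipeline` from `|` keeps partially built pipelines reusable, and the stored labels are available for introspection or logging.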
Using only Python’s standard library:

```python
from collections import Counter
import itertools as it

with open('LICENSE.txt') as f:
    lines = (line.lower().split() for line in f)
    words = it.chain.from_iterable(lines)
    count = Counter(words)

results = dict(count.most_common(5))
```
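Wrapping the standard-library version in a function that accepts any iterable of lines makes it easy to try without `LICENSE.txt` on disk. This is a minimal sketch: `top_words` is a hypothetical helper name, and `io.StringIO` stands in for the open file. `Counter.most_common(n)` already returns pairs ordered by descending count, so no separate sort step is needed.

```python
import io
import itertools as it
from collections import Counter


def top_words(lines, n=5):
    """Return the n most common lowercase words in an iterable of lines."""
    # str.split() with no arguments never yields empty strings,
    # so no separate filtering step is needed.
    words = it.chain.from_iterable(line.lower().split() for line in lines)
    # most_common() orders (word, count) pairs by descending count.
    return dict(Counter(words).most_common(n))


# io.StringIO behaves like an open text file, yielding one line per iteration.
sample = io.StringIO("the cat\nThe dog\n\nthe end\n")
print(top_words(sample, n=2))
```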
Developing
```console
$ git clone https://github.com/geowurster/tinyflow.git
$ cd tinyflow
$ pip install -e .\[all\]
$ pytest --cov tinyflow --cov-report term-missing
```
License
See LICENSE.txt
Changelog
See CHANGES.md
File details
Details for the file tinyflow-0.1.macosx-10.12-x86_64.tar.gz.
File metadata
- Download URL: tinyflow-0.1.macosx-10.12-x86_64.tar.gz
- Upload date:
- Size: 13.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes

Algorithm | Hash digest
---|---
SHA256 | 5cdbf7f936f867abb44814d197d262e9429d786c25d66b8f1743cb4160fcb51f
MD5 | e9b473ed24f4d464b466f58a80d5d83e
BLAKE2b-256 | ccb758adbe143cabeb24162a487b974339d8f33c2594d67490c7dca1db05372a

File details
Details for the file tinyflow-0.1-py2.py3-none-any.whl.
File metadata
- Download URL: tinyflow-0.1-py2.py3-none-any.whl
- Upload date:
- Size: 10.9 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
File hashes

Algorithm | Hash digest
---|---
SHA256 | 90c71e8d245725b418ee3528b455d1f3626b7432d1560e937205f3a70e150f1c
MD5 | f11c7b3d63cadf1686de8df06f414a69
BLAKE2b-256 | 122f2fc055de29348a25111b6fce588a9ea013af8f5becb84aa606dbf3f47700