A pipeline library for Python that cuts down your boilerplate code.

Tubo is a library that provides a simple pipeline system for Python.

The Unix pipe system is an excellent example of separation of responsibility: each utility does a single thing well. This improves the readability, maintainability, and reuse of code. Tubo aims to bring this abstraction to Python.

Installation

pip install tubo

Usage

You have a source of iterable items and you want to perform various operations on them. In a Unix-like system you would write something like this:

cat foo.txt | op1 | op2 | op3

With Tubo, you would write something like this in Python:

>>> output = tubo.pipeline(open('foo.txt'), op1, op2, op3)

The output is then available for you to print or to transform further. The advantage is that the operations are written in Python, which gives you a lot of flexibility.

Create a pipeline

The central part of Tubo is the function tubo.pipeline. It accepts an arbitrary number of arguments: the first is a data source, and the rest are operations on iterable data, defined as Python generators.

Each operation should `yield` something, so that the following operation can work.
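
To make the idea concrete, here is a minimal sketch of how a function of this shape could chain generators. It is not Tubo's actual implementation; simple_pipeline, double and only_multiples_of_four are hypothetical names used only for illustration.

```python
from functools import reduce


def simple_pipeline(source, *operations):
    """Hypothetical stand-in for tubo.pipeline.

    Feeds `source` into the first operation, the result into the
    second one, and so on, returning the last iterable.
    """
    return reduce(lambda data, operation: operation(data), operations, source)


def double(numbers):
    # Each operation is a generator: it yields items for the next stage.
    for n in numbers:
        yield n * 2


def only_multiples_of_four(numbers):
    for n in numbers:
        if n % 4 == 0:
            yield n


result = list(simple_pipeline(range(5), double, only_multiples_of_four))
print(result)  # [0, 4, 8]
```

Because every stage is a generator, items flow through the chain one at a time and nothing is computed until the result is iterated.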

Example: capitalize words that contain the letter i.

import tubo

text = ['italy', 'germany', 'brazil', 'france', 'england',
    'argentina', 'peru', 'united states', 'australia',
    'sweden', 'china', 'poland', 'portugal']

def capitalize(lines):
    for line in lines:
        for word in line.split(","):
            yield word.capitalize()

def filter_wordwith_i(words):
    for word in words:
        if 'i' in word:
            yield word

output = tubo.pipeline(
    text,
    filter_wordwith_i,
    capitalize,
)

At this point, output is an iterable: we can loop over it, print its items, or transform it further.
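
As a rough check of what the example produces, the sketch below applies the same two operations by nesting the generator calls directly. This assumes the pipeline simply applies the operations in the order given; it does not call Tubo itself.

```python
text = ['italy', 'germany', 'brazil', 'france', 'england',
        'argentina', 'peru', 'united states', 'australia',
        'sweden', 'china', 'poland', 'portugal']


def capitalize(lines):
    for line in lines:
        for word in line.split(","):
            yield word.capitalize()


def filter_wordwith_i(words):
    for word in words:
        if 'i' in word:
            yield word


# Assumption: equivalent to pipeline(text, filter_wordwith_i, capitalize).
words = list(capitalize(filter_wordwith_i(text)))
print(words)
# ['Italy', 'Brazil', 'Argentina', 'United states', 'Australia', 'China']
```

Note that str.capitalize only upper-cases the first character of the whole string, which is why the result contains 'United states'.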

Merge two or more iterables

Sometimes you need an operation that takes two or more inputs and processes them together. In this case, write an operation that accepts a sequence of iterables.

Example: interleave lines from two or more files (like the paste utility).

def interleave(listoflines):
    # zip is lazy on Python 3; itertools.izip and file() exist only on Python 2
    for lines in zip(*listoflines):
        yield ''.join(lines)

output = tubo.pipeline(
    (open('file1.txt'), open('file2.txt')),
    interleave
)
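
The behaviour of such a multi-input operation can be tried without files. The sketch below calls interleave directly on two in-memory lists; left and right are hypothetical sample data.

```python
def interleave(listoflines):
    # zip pairs up the n-th item of every input and stops at the shortest one.
    for lines in zip(*listoflines):
        yield ''.join(lines)


left = ['a', 'b', 'c']
right = ['1', '2']

joined = list(interleave((left, right)))
print(joined)  # ['a1', 'b2']
```

Since zip stops at the shortest input, the trailing 'c' is dropped. If every line must be kept, itertools.zip_longest with fillvalue='' is one alternative.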

Consume iterators at C-speed

Once you have your pipeline, it’s time to consume it.

tubo.consume(output)

# Equivalent to:
#
# for element in output:
#     pass

This consumes the iterator at C speed, using the consume recipe from the itertools documentation.
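
For reference, a common way to exhaust an iterator without a Python-level loop is to feed it to a zero-length deque. The sketch below shows that technique; whether tubo.consume is written exactly this way is an assumption, and record is a hypothetical helper used to make the side effect visible.

```python
import collections


def consume(iterator):
    """Exhaust an iterator, discarding every item."""
    # A deque with maxlen=0 stores nothing, and the loop runs inside C code.
    collections.deque(iterator, maxlen=0)


seen = []


def record(items):
    # An operation with a side effect, so we can observe that it ran.
    for item in items:
        seen.append(item)
        yield item


lazy = record(range(3))
print(seen)  # [] : generators do nothing until they are consumed
consume(lazy)
print(seen)  # [0, 1, 2]
```

Consuming matters mostly for pipelines whose operations have side effects, such as writing to a file, where the yielded values themselves are not needed.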

Examples

Reverse text of unique lines, append the number of lines

def uniq(lines):
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line

def reverse_string(lines):
    for line in lines:
        yield ''.join(reversed(line))

def append_nlines(lines):
    nlines = 0  # stays 0 if the input is empty
    for nlines, line in enumerate(lines, 1):
        yield line
    yield "\nTotal Number of lines: {}".format(nlines)

output = tubo.pipeline(
    open(filename),
    uniq,
    reverse_string,
    append_nlines,
)
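
To see what this example yields, the sketch below runs the same three operations on a small in-memory list by nesting the calls, assuming the pipeline applies them in the order given. The sample input is hypothetical, and append_nlines starts its count at 1 so that it also works on empty input.

```python
def uniq(lines):
    seen = set()
    for line in lines:
        if line not in seen:
            seen.add(line)
            yield line


def reverse_string(lines):
    for line in lines:
        yield ''.join(reversed(line))


def append_nlines(lines):
    nlines = 0  # stays 0 if there are no lines at all
    for nlines, line in enumerate(lines, 1):
        yield line
    yield "\nTotal Number of lines: {}".format(nlines)


sample = ['abc', 'abc', 'xyz']
result = list(append_nlines(reverse_string(uniq(sample))))
print(result)  # ['cba', 'zyx', '\nTotal Number of lines: 2']
```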

Concatenate words from two files

When we need to merge two inputs, or the results of two different pipelines, we use the functions merge and merge_longest.

import functools

def select_Nth_word(N, lines):
    for line in lines:
        yield line.split(' ')[N]

select_first_word = functools.partial(select_Nth_word, 0)
select_second_word = functools.partial(select_Nth_word, 1)

def concatenate(words):
    for word1, word2 in words:
        yield "{} {}".format(word1, word2)

pipeline1 = tubo.pipeline(
    open(fname1),
    select_first_word,
)
pipeline2 = tubo.pipeline(
    open(fname2),
    select_second_word,
)
output = tubo.pipeline(
    tubo.merge(
        pipeline1,
        pipeline2,
    ),
    concatenate
)
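
The data flow of this example can be sketched with the built-in zip standing in for the merge step. That tubo.merge pairs items the way zip does is an assumption, inferred from the fact that concatenate unpacks two-item tuples; lines1 and lines2 are hypothetical sample inputs.

```python
import functools


def select_nth_word(n, lines):
    for line in lines:
        yield line.split(' ')[n]


select_first_word = functools.partial(select_nth_word, 0)
select_second_word = functools.partial(select_nth_word, 1)


def concatenate(words):
    # Each item is a pair: one word from each merged input.
    for word1, word2 in words:
        yield "{} {}".format(word1, word2)


lines1 = ["alpha beta", "gamma delta"]
lines2 = ["one two", "three four"]

# Assumption: zip plays the role of the merge step here.
merged = zip(select_first_word(lines1), select_second_word(lines2))
result = list(concatenate(merged))
print(result)  # ['alpha two', 'gamma four']
```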

Credits

The library was inspired by a post by Christoph Rauch.

History

0.1.1 (2014-08-02)

Now works with Python 3. Wheel added.

0.1.0 (2014-08-02)

Initial concept.
