Distributed Pipeline Framework for Python

Lineup: Distributed Pipeline Framework for Python

Lineup is a Redis-based pipeline framework that makes horizontal scaling seamless.

It currently provides parallelism through Python threads, which makes it well suited to systems whose workers do a lot of network I/O.

It scales horizontally: you can simply run more workers on as many machines as you want.
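
As a rough illustration of why threads suit I/O-bound workers, the sketch below runs several worker threads that pull jobs from a shared queue. It uses only the standard library: `queue.Queue` stands in for the Redis-backed queue, and `fake_fetch`, `worker` and `run_workers` are hypothetical names that are not part of Lineup.

```python
import queue
import threading
import time


def fake_fetch(url):
    # Stand-in for a network call (an assumption for this sketch).
    # Sleeping releases the GIL, much like blocking socket I/O does.
    time.sleep(0.05)
    return {'url': url, 'status': 200}


def worker(jobs, results):
    # Each worker loops until it receives the None sentinel.
    while True:
        url = jobs.get()
        if url is None:
            break
        results.put(fake_fetch(url))


def run_workers(urls, worker_count=4):
    jobs, results = queue.Queue(), queue.Queue()
    threads = [
        threading.Thread(target=worker, args=(jobs, results))
        for _ in range(worker_count)
    ]
    for thread in threads:
        thread.start()
    for url in urls:
        jobs.put(url)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    # Every job has finished, so exactly len(urls) results are waiting.
    return [results.get() for _ in range(len(urls))]
```

Because the workers spend most of their time waiting, adding threads (or, in Lineup's case, more worker processes on more machines) shortens the total run time roughly in proportion, up to the limits of the remote service.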

Installing

easy_install curdling
curd install lineup

Philosophy

Lineup focuses on:

  1. Simplicity: easy to create otherwise complex pipelines
  2. Easy-scale: just run more workers and you’re good.

Defining steps

Steps must always implement the method consume(self, instructions) and always call self.produce() with their portion of work.

# myapp/tasks.py

import re

import requests

from lineup import Step


class Scraper(Step):
    def consume(self, instructions):
        url = instructions['url']
        response = requests.get(url)

        # pretend you parsed the response into a
        # list of things or general metadata

        results = [
            'https://2.gravatar.com/avatar/666e2844d622f8714e8ccf8ffb48a47c',
            'https://1.gravatar.com/avatar/b9aa05d9efc6a3c8eda50f7763ad0715',
            'https://0.gravatar.com/avatar/605d445205b61ec11185a28dc4ab9323',
            'https://0.gravatar.com/avatar/666e2844d622f8714e8ccf8ffb48a47c',
            'https://1.gravatar.com/avatar/666e2844d622f8714e8ccf8ffb48a47c',
            'https://0.gravatar.com/avatar/29701ae503ec7d9e670edaf095503067',
            'https://2.gravatar.com/avatar/605d445205b61ec11185a28dc4ab9323',
            'https://2.gravatar.com/avatar/68edef29d4c6826af22d6fcbbf8f1084',
        ]

        # hand the results over to the next step
        self.produce({
            'images-to-download': results,
        })


class Downloader(Step):
    def make_filename(self, index):
        original_url = self.payload['initial']['url']
        # collapse every run of non-word characters into a hyphen
        slug = re.sub(r'\W+', '-', original_url)
        # index is an int, so format it instead of joining strings
        return '{0}.{1}.png'.format(slug.strip('-'), index)

    def consume(self, instructions):
        images_to_download = instructions['images-to-download']
        filenames = []
        for index, image in enumerate(images_to_download):
            filename = self.make_filename(index)
            response = requests.get(image)

            with open(filename, 'wb') as f:
                f.write(response.content)
                filenames.append(filename)

        self.produce({
            'filenames': filenames
        })
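
The filename logic can be tried on its own, outside a step. The sketch below is a standalone function (not a Lineup API) that takes the original URL as an argument rather than reading it from the step's payload, and shows the slug it builds from the URL used in the "Running" section.

```python
import re


def make_filename(original_url, index):
    # Collapse every run of non-word characters into a single hyphen.
    slug = re.sub(r'\W+', '-', original_url).strip('-')
    return '{0}.{1}.png'.format(slug, index)


print(make_filename('https://github.com/trending/developers', 0))
# prints: https-github-com-trending-developers.0.png
```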

Defining pipelines

# myapp/pipelines.py

from lineup import Pipeline
from myapp.tasks import Scraper, Downloader

class GravatarScraping(Pipeline):
    name = 'gravatars-from-github'

    steps = [
        Scraper,
        Downloader
    ]
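
To make the data flow concrete, here is a minimal in-memory sketch of how a list of steps chains together: whatever one step produces becomes the instructions for the next. It is not how Lineup is implemented (Lineup moves payloads through Redis and runs steps in worker threads); `MiniStep`, `Double`, `AddOne` and `run_pipeline` are hypothetical names used only for illustration.

```python
class MiniStep(object):
    """Hypothetical stand-in for a step; not Lineup's Step class."""

    def __init__(self):
        self.produced = []

    def produce(self, payload):
        # Collect output so the runner can hand it to the next step.
        self.produced.append(payload)

    def consume(self, instructions):
        raise NotImplementedError


class Double(MiniStep):
    def consume(self, instructions):
        self.produce({'value': instructions['value'] * 2})


class AddOne(MiniStep):
    def consume(self, instructions):
        self.produce({'value': instructions['value'] + 1})


def run_pipeline(step_classes, initial):
    # Feed every payload produced by one step into the next step.
    payloads = [initial]
    for step_class in step_classes:
        step = step_class()
        for payload in payloads:
            step.consume(payload)
        payloads = step.produced
    return payloads


print(run_pipeline([Double, AddOne], {'value': 20}))
# prints: [{'value': 41}]
```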

Running

lineup gravatars-from-github "{'url': 'https://github.com/trending/developers'}"

Contributing

Install Dependencies

curd install -r development.txt

Run tests

make test
