Skip to main content

a scalable, distributed task execution system

Project description

Rezina
=======

rezina is a scalable, distributed and easy to use system for executing python code in parallel across multiple processors or many machines. It provides a simple way to make parallel programming in python more easier, flexible and scalable and shipped with features like periodically schedule, load balance, fail tolerate, dynamically add tasks and save result to backend.

why use rezina?
================

Consider this task, I want to gather weather data of ten cities from yahoo weather api. I wrote a scirpt `cityweather.py` to do this.
::
liufujiaodeMacBook-Pro:workspace liufujiao$ time python cityweather.py
{'city': 'Beijing', 'weather': u'Mostly Sunny'}
{'city': 'Berlin', 'weather': u'Partly Cloudy'}
{'city': 'New York', 'weather': u'Mostly Clear'}
{'city': 'London', 'weather': u'Mostly Cloudy'}
{'city': 'Tokyo', 'weather': u'Mostly Sunny'}
{'city': 'Paris', 'weather': u'Mostly Cloudy'}
{'city': 'Chicago', 'weather': u'Partly Cloudy'}
{'city': 'washington', 'weather': u'Clear'}
{'city': 'Venice', 'weather': u'Breezy'}
{'city': 'Houston', 'weather': u'Breezy'}

real 0m18.726s
user 0m0.080s
sys 0m0.034s

The whole process of fetching data take 20s and we could think one city takes 2s on average.

Now here is the problem, what if I want to get the ten weather data every 10 seconds so I could always know the least weather? this requires all data should be fetched in 10s.

One simple way to do this is we could start more threads (or processes) to fetch data in parallel and implement logic to run periodically.

Problem again, now I want to get weather data of 50,000 cities so I can build a weather website to provide weather conditions for my user, how to do it?

One server could not launch 10,000 threads. maybe we could split all cites into several parts and run every part with threads on different servers, but this kind of solution need consider the following problems.

* what if one or more servers down?
* what if we want to add or remove some cites?
* what if we want to launch more processes for time consuming logic?
* what if we want to change the fetch interval from 10s to 5s.
* what if we want to save data in another backend?
* …

We need a easier and flexible way to execute tasks in parallel without pay much attentions on these problems and make us focus only on our business logic. this is the motivation of rezina project.

See Quick Guide for how rezina do this.

Support
==========
**Python**

* python2.7
* python3.5

**OS**

* OS X
* Linux

Dependencies
=============

rezina requires `pyzmq <https://github.com/zeromq/pyzmq>`_ for message transports.

pyzmq is python bindings for `ØMQ <http://zeromq.org/>`_. ØMQ is a lightweight and fast messaging implementation.

Installaion
=============


You can install reizna either via the Python Package Index (PyPI) or from source.

To install using pip:

``pip install rezina``

Downloading and installing from source
---------------------------------------

Before install rezina, `building-and-installation <https://github.com/zeromq/pyzmq#building-and-installation>`_
pyzmq first.

After pyzmq installed, download the latest version of rezina from PyPI:

http://pypi.python.org/pypi/rezina

and install it by doing the following:

``pip install /path/to/rezina-0.x.y.tar.gz``

or

``tar xvfz rezina-0.x.y.tar.gz``

``cd rezina-0.x.y``

``python setup.py install``

Quick Guide
==============

Start rezina
---------------

Once rezina is installed, you can run

``rezina-cli runmaster -H master_ip``

to start rezina master, and run

``rezina-cli runworker -H master_ip -WIP worker_ip``

to start rezina worker

.. note:: we could use `-D` to run master as daemon and `-W` to specify a new workspace. please see documentation for starting rezina correctly.

Example cityweather
--------------------

cityweather code
^^^^^^^^^^^^^^^^^

script name: ``cityweather.py``

put this script into rezina workspace (``~/rezina/workspace`` by default, use -W /path/to/your/workspace when starting master if you want to change it)

::

#!/usr/bin/evn python

import urllib2
import urllib
import json


def get_cities():
cities = ['Beijing', 'Berlin', 'New York', 'London', 'Tokyo', 'Paris',
'Chicago', 'washington', 'Venice', 'Houston']
return cities


# get city weather data from yahoo weather api
def get_city_weather(city):
baseurl = "https://query.yahooapis.com/v1/public/yql?"
yql_query = "select item.condition.text from weather.forecast \
where woeid in (select woeid from geo.places(1) \
where text='%s')" % (city)
yql_url = baseurl + urllib.urlencode({'q': yql_query}) + "&format=json"
result = urllib2.urlopen(yql_url).read()
data = json.loads(result)
# because resule from yahoo api does not include the city name, we add it.
data['city'] = city
return data


# process diffrent output and convert data to a simple format
def one_word_conditions_for_city(city_weather_result):
simple_format_data = {}
simple_format_data['city'] = city_weather_result['city']
if city_weather_result['query']['results'] is not None:
weather = city_weather_result['query']['results']['channel']['item']['condition']['text']
else:
weather = "Unkonw" # simplely set unkonw when result is not avaliable
simple_format_data['weather'] = weather
return simple_format_data

if __name__ == "__main__":
for city in get_cities():
print one_word_conditions_for_city(get_city_weather(city))



Build a typology to run cityweather with rezina
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

script name: ``weathertypo.py``

put it info rezina workspace

.. hint:: you could regard hydrant, notch, bocca as input, filter, output respectively for now. A typology looks like `input | filter1 | filter2 | output` in shell. check the Documentation for more info.


run ``get_city_weather`` function with 2 processes and every process run 5 threads and each thread fetch one city.

run ``one_word_conditions_for_city`` function with 1 process with 1 thread because it is not time consuming one.

::

#!/usr/bin/env python

from rezina.utils.network import get_ip
from rezina import TypologyBuilder
from rezina.backends import Stdout

from cityweather import get_cities, get_city_weather, one_word_conditions_for_city

ip = get_ip() # get master_ip
tb = TypologyBuilder(ip, 12345, 'weather_typo2')
tb.add_hydrant(get_cities)
tb.add_notch(get_city_weather, 2, 5)
tb.add_notch(one_word_conditions_for_city, 1, 1)
tb.add_bocca(Stdout, persistent_mode='stream')

if __name__ == "__main__":
tb.restart(interval=10)


run typology
^^^^^^^^^^^^

rezina typology file is just a python script, run it with

``python weathertypo.py`` or ``./weathertypo.py``` and you get the results.

you could also save the results of your typology to another backend rather than print them.

See documentation for more details.

rezina console
----------------

rezina provides command line tool and web console to manage master, workers, typologies.

you could run

```rezina-cli runconsole -H master_ip``

to start cml or access ``master_ip:31218`` to see web console.

Documentation
================

See http://rezina.readthedocs.io/en/latest/ for more info.


Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

rezina-0.1.1.tar.gz (2.6 MB view hashes)

Uploaded Source

Built Distribution

rezina-0.1.1-py2.py3-none-any.whl (2.5 MB view hashes)

Uploaded Python 2 Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page