
A pure-Python, highly distributed MapReduce cluster.

Project Description

Having just finished reading the original Google MapReduce paper, I obviously felt the need to try to implement such a system in Python.

My goals are to implement enough of the functionality described in the paper to be usable, though I strongly warn against ever using this code for anything real.

Since one of the goals (see Goals, below) is simplicity from an end-user standpoint, I am following some of Kenneth Reitz’s advice and starting with a readme and documentation.


The canonical word-count example:

from pluribus import job

def emit_words(key, value):
    # key: document name
    # value: document contents
    for word in value.split():
        yield word, 1

def sum_occurrences(key, values):
    # key: a word
    # values: a list of counts
    return sum(values)
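Conceptually, any MapReduce runtime has to do three things with these two functions: call the map function on every input, group the emitted pairs by key (the "shuffle"), and call the reduce function once per distinct key. Here is a minimal single-process sketch of that flow; the `run_local` driver is illustrative only, not pluribus's actual internals:

```python
from collections import defaultdict

def emit_words(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield word, 1

def sum_occurrences(key, values):
    # key: a word, values: a list of counts
    return sum(values)

def run_local(inputs, mapper, reducer):
    # Map + shuffle: collect every emitted (key, value) pair, grouped by key.
    groups = defaultdict(list)
    for key, value in inputs.items():
        for out_key, out_value in mapper(key, value):
            groups[out_key].append(out_value)
    # Reduce: one reducer call per distinct key.
    return {key: reducer(key, values) for key, values in groups.items()}

counts = run_local({"doc1": "to be or not to be"}, emit_words, sum_occurrences)
# counts == {"to": 2, "be": 2, "or": 1, "not": 1}
```

In the real cluster the shuffle is the expensive part, since the grouped pairs have to move between worker machines; the contract with user code is the same, though.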

Assuming you’re running everything on one host, you can ignore the network connection information.

Start a pluribus master:

$ pluribus master

Start a pluribus worker (or several hundred):

$ pluribus worker

On the master or on another machine that can talk to the master:

$ pluribus job myjob
# ... wait


Explicit goals are:

  • Simple to use, both as an administrator and end-user.
  • Well-documented.
  • Robust to worker failure.
  • Fast enough.
  • Use only the Python (2.7+) standard library (at least to run).

Explicit non-goals are:

  • Be a filesystem.
  • Robust to master failure.
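One common way to be robust to worker failure without being robust to master failure (exactly the split above) is for the single master to track in-flight tasks and re-queue any task whose worker misses a heartbeat deadline. A toy sketch of that bookkeeping follows; all class and method names here are illustrative assumptions, not pluribus API:

```python
import time

class TaskTracker:
    """Toy master-side bookkeeping: re-queue tasks whose workers go silent."""

    def __init__(self, tasks, timeout=30.0):
        self.pending = list(tasks)   # tasks not yet handed to a worker
        self.in_flight = {}          # task -> (worker_id, last_heartbeat)
        self.done = set()
        self.timeout = timeout

    def assign(self, worker_id, now=None):
        # Hand the next pending task to a worker, noting when we last heard from it.
        if not self.pending:
            return None
        task = self.pending.pop(0)
        self.in_flight[task] = (worker_id, time.monotonic() if now is None else now)
        return task

    def heartbeat(self, task, now=None):
        # Worker checked in: refresh the task's last-heard-from timestamp.
        if task in self.in_flight:
            worker_id, _ = self.in_flight[task]
            self.in_flight[task] = (worker_id, time.monotonic() if now is None else now)

    def complete(self, task):
        self.in_flight.pop(task, None)
        self.done.add(task)

    def reap(self, now=None):
        # Any task whose worker missed the deadline goes back on the queue.
        now = time.monotonic() if now is None else now
        dead = [t for t, (_, beat) in self.in_flight.items() if now - beat > self.timeout]
        for task in dead:
            del self.in_flight[task]
            self.pending.append(task)
        return dead
```

Because the master is the only place this state lives, losing a worker costs at most one timeout plus a re-execution, while losing the master loses everything; that is the trade the goals list is making.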
