Because mapping and reducing isn’t supposed to be hard.

What is reductio?

Reductio is a minimalistic map-reduce framework for Python. It runs on top of Fabric and setuptools, which you might already use to get your code onto other machines.

It has no database. It has no distributed filesystem. It uses no server other than sshd. Because of this, it has essentially no memory requirement!

Reductio is designed for big data tasks that are disk-bound, as many of them are. If the pieces you need to map-reduce fit entirely in the RAM of your worker computers, you are paying a huge premium for that RAM. And if they don’t, a system that tries to buffer things in RAM is wasting all its effort. At some point, you won’t see your data again unless you write it to the damn disk, and that’s what is going to take most of the time.

I created reductio out of absolute necessity, so I could start crunching my data. You might notice its documentation is practically nonexistent at the moment.

What Reductio does

Reductio extends Fabric (http://fabfile.org). It is meant to support the following approximate process:

  • Set up the appropriate Python environment on all your worker machines, including required packages, a place to keep the data, and an up-to-date version of the code you want to run.
  • Tell each worker machine how to contact all the other machines, so they can send their data onward (“scatter”) when they’re done with it.
  • Give each worker machine some Python functions to run over all the data. These will often be “maps” or “reduces”.
  • Collect results and group the things that belong together using Unix’s extremely optimized sort command.
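
To make the last two steps concrete, here is a minimal single-machine sketch of the map, sort, reduce pattern. It is not reductio's API: the function names (map_words, sort_records, reduce_counts, count_words) are made up for illustration, and it assumes a Unix sort binary on the PATH. A real job would stream files on disk through sort rather than pass strings through memory.

```python
import itertools
import os
import subprocess


def map_words(lines):
    """Map step: emit one tab-separated 'key<TAB>1' record per word."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word.lower()


def sort_records(records):
    """Group step: let Unix sort bring equal keys next to each other."""
    # LC_ALL=C forces plain byte order, so every machine sorts the same way.
    env = dict(os.environ, LC_ALL="C")
    result = subprocess.run(
        ["sort"],
        input="".join(record + "\n" for record in records),
        stdout=subprocess.PIPE,
        universal_newlines=True,
        env=env,
        check=True,
    )
    return result.stdout.splitlines()


def reduce_counts(sorted_records):
    """Reduce step: sum the values of each run of identical keys."""
    keyed = (record.split("\t") for record in sorted_records)
    for key, group in itertools.groupby(keyed, key=lambda kv: kv[0]):
        yield key, sum(int(value) for _, value in group)


def count_words(lines):
    """Chain the three steps (hypothetical helper for this sketch)."""
    return list(reduce_counts(sort_records(map_words(lines))))
```

Because sort leaves equal keys adjacent, the reduce step only ever holds one key's running total, which is why the approach needs so little memory.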

Example

reductio/example/fabfile.py is an example of how to count all the letter bigrams (pairs of adjacent letters) in the ubiquitous “words” file, and aggregate them into a table of frequencies.
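
For readers who have not met letter bigrams before, the computation itself can be sketched in plain Python on one machine. This is not the code in reductio/example/fabfile.py; the function names letter_bigrams and count_bigrams are hypothetical, and nothing here is distributed.

```python
from collections import Counter


def letter_bigrams(word):
    """Yield each pair of adjacent letters in a word, lowercased."""
    word = word.strip().lower()
    for first, second in zip(word, word[1:]):
        # Skip pairs that include apostrophes, hyphens, digits and so on.
        if first.isalpha() and second.isalpha():
            yield first + second


def count_bigrams(words):
    """Aggregate the bigrams of every word into a frequency table."""
    table = Counter()
    for word in words:
        table.update(letter_bigrams(word))
    return table
```

Passing an open words file (often /usr/share/dict/words, though the path varies by system) to count_bigrams and calling most_common(10) on the result would list the ten most frequent bigrams.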

Running tasks

If you have a task defined as the function do_stuff in mymodule.py, but first you want to run setup to make sure code and other things are up to date, you would run them both with this command:

fab -f mymodule setup do_stuff

Why not Hadoop?

Face it, if you knew how to configure Hadoop and get your code to run with it, you’d be doing so right now.

Also, Hadoop is written for Java programmers, and Python is distinctly a second-class citizen in its world. Hadoop seems to think all Python code takes the form of standalone scripts with no dependencies, which perhaps says something about what Java programmers think of Python.

Reductio recognizes that none of this is going to work unless you have the right Python setup, so it builds on tools that Python programmers already use to deploy their code.

Why not Disco?

I approve of the Disco project (http://discoproject.org) and its goal of creating a map-reduce ecosystem designed around Python, but I find it too complex and too “magical” at the moment.

It makes it difficult to understand what is going on in its internals, and yet you have to understand its internals when something goes wrong or when you want to do something the designers didn’t expect.

Release History

  • 0.2.1 (this version)
  • 0.1.4
  • 0.1.3
  • 0.1.2
  • 0.1.1
  • 0.1

Download Files

File Name                        File Type   Upload Date
reductio-0.2.1.tar.gz (9.5 kB)   Source      Dec 5, 2011
