
A lightweight MapReduce framework for education.


Madoop: Michigan Hadoop


Michigan Hadoop (madoop) is a lightweight MapReduce framework for education. Madoop implements the Hadoop Streaming interface, is written in Python, and runs on a single machine.

For an in-depth explanation of how to write MapReduce programs in Python for Hadoop Streaming, see our Hadoop Streaming tutorial.

Quick start

Install Madoop.

$ pip install madoop

Create an example MapReduce program with input files.

$ madoop --example
$ tree example
example
├── input
│   ├── input01.txt
│   └── input02.txt
├── map.py
└── reduce.py
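The generated map.py and reduce.py are not reproduced on this page. As a minimal sketch, assuming the usual Hadoop Streaming conventions (input lines arrive on stdin, and each output record is a tab-separated key and value written to stdout), a word count mapper could look like the following. The names map_lines and main are hypothetical; this is not the file that madoop --example creates.

```python
"""Word count mapper sketch in the Hadoop Streaming style.

Hypothetical illustration; not the map.py generated by `madoop --example`.
"""
import sys
from typing import Iterable, Iterator, TextIO


def map_lines(lines: Iterable[str]) -> Iterator[str]:
    """Yield one "word<TAB>1" record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read input lines from stdin and write one record per line to stdout."""
    for record in map_lines(stdin):
        stdout.write(record + "\n")
```

To run it as example/map.py, the file would also need a #!/usr/bin/env python3 shebang, a closing if __name__ == "__main__": main() guard, and the executable bit (chmod +x), since the cat/sort pipeline invokes it as ./map.py.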

Run the example word count MapReduce program.

$ madoop \
  -input example/input \
  -output example/output \
  -mapper example/map.py \
  -reducer example/reduce.py

Concatenate and print the output.

$ cat example/output/part-*
Goodbye 1
Bye 1
Hadoop 2
World 2
Hello 2
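Each output line pairs a word with its total count. Before the reducer runs, the mapper's records are sorted by key, so every record for a given word arrives on consecutive lines. A minimal word count reducer sketch follows, assuming tab-separated word and count fields (the Hadoop Streaming default separator); reduce_lines, parse_record, and main are hypothetical names, and this is not the generated example/reduce.py.

```python
"""Word count reducer sketch in the Hadoop Streaming style.

Hypothetical illustration; not the reduce.py generated by `madoop --example`.
"""
import itertools
import sys
from typing import Iterable, Iterator, TextIO, Tuple


def parse_record(line: str) -> Tuple[str, int]:
    """Split a "key<TAB>value" line into (key, int(value))."""
    key, _, value = line.rstrip("\n").partition("\t")
    return key, int(value)


def reduce_lines(lines: Iterable[str]) -> Iterator[str]:
    """Sum counts per key; assumes sorted input so equal keys are adjacent."""
    records = (parse_record(line) for line in lines if line.strip())
    for key, group in itertools.groupby(records, key=lambda record: record[0]):
        total = sum(count for _, count in group)
        yield f"{key}\t{total}"


def main(stdin: TextIO = sys.stdin, stdout: TextIO = sys.stdout) -> None:
    """Read sorted records from stdin and write one total per key to stdout."""
    for record in reduce_lines(stdin):
        stdout.write(record + "\n")
```

itertools.groupby merges only adjacent records with equal keys, so this reducer would report split totals on unsorted input; that is why the sort step sits between the mapper and the reducer.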

Comparison with Apache Hadoop and CLI

Madoop implements a subset of the Hadoop Streaming interface. You can simulate the Hadoop Streaming interface at the command line with cat and sort.

Here's how to run our example MapReduce program on Apache Hadoop.

$ hadoop \
    jar path/to/hadoop-streaming-X.Y.Z.jar \
    -input example/input \
    -output output \
    -mapper example/map.py \
    -reducer example/reduce.py
$ cat output/part-*

Here's how to run our example MapReduce program at the command line using cat and sort, from inside the example directory.

$ cat input/* | ./map.py | sort | ./reduce.py
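The shell pipeline can also be mirrored inside one Python process, which is convenient for unit-testing mapper and reducer logic before running it under Madoop or Hadoop. The sketch below is a hypothetical helper, not part of Madoop: run_pipeline, word_count_map, and word_count_reduce are illustrative names. Python's sorted() orders strings by code point, which for UTF-8 text matches sort under LC_ALL=C rather than a locale-aware sort.

```python
import itertools
from typing import Callable, Iterable, Iterator, List

# A mapper or reducer stage takes an iterable of lines and yields output lines.
Stage = Callable[[Iterable[str]], Iterator[str]]


def run_pipeline(lines: Iterable[str], mapper: Stage, reducer: Stage) -> List[str]:
    """In-process analogue of: cat input/* | ./map.py | sort | ./reduce.py"""
    mapped = list(mapper(lines))
    # Sorting brings equal keys together, like the `sort` step in the shell.
    shuffled = sorted(mapped)
    return list(reducer(shuffled))


def word_count_map(lines: Iterable[str]) -> Iterator[str]:
    """Emit "word<TAB>1" for every word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def word_count_reduce(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of adjacent records that share a word."""
    pairs = (line.split("\t") for line in lines)
    for word, group in itertools.groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"


# Prints "Bye\t1", "Hello\t1", "World\t2", one per line.
for line in run_pipeline(["Hello World\n", "Bye World\n"], word_count_map, word_count_reduce):
    print(line)
```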

                                Madoop           Hadoop           cat/sort
Hadoop options supported        Some             All              None
Mappers and reducers            Multiple         Multiple         One of each
Machines                        Single machine   Many machines    Single machine
jar hadoop-streaming-X.Y.Z.jar  Ignored          Required         Not used
Order of lines within a group   Sorted           Sorted           Sorted
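Madoop and Hadoop both run multiple reducers, and the example output is read back from part-* files. A common way to divide mapper output among reducers is to hash each record's key, take the result modulo the number of reducers, and sort each partition. The sketch below illustrates that idea under stated assumptions: partition and shuffle are hypothetical names, MD5 is an arbitrary choice, and Madoop's actual partitioning scheme is not described on this page. A stable digest is used instead of Python's built-in hash(), because hash() of a str is randomized per process and a partitioner must always send the same key to the same reducer.

```python
import hashlib
from typing import Dict, Iterable, List


def partition(key: str, num_reducers: int) -> int:
    """Map a key to a reducer index in [0, num_reducers), stably across runs."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_reducers


def shuffle(records: Iterable[str], num_reducers: int) -> Dict[int, List[str]]:
    """Group "key<TAB>value" records by reducer index, then sort each group."""
    groups: Dict[int, List[str]] = {i: [] for i in range(num_reducers)}
    for record in records:
        key = record.split("\t", 1)[0]
        groups[partition(key, num_reducers)].append(record)
    for group in groups.values():
        group.sort()  # equal keys end up adjacent within their single group
    return groups
```

Every record for a key lands in exactly one partition and each partition is sorted, but concatenating the partitions does not yield one alphabetical listing, which is consistent with the unsorted order of the concatenated example output.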

Contributing

Contributions from the community are welcome! Check out the guide for contributing.

Acknowledgments

Michigan Hadoop is written by Andrew DeOrio (awdeorio@umich.edu).



Download files


Source Distribution

madoop-1.3.2.tar.gz (21.5 kB)

Built Distribution


madoop-1.3.2-py3-none-any.whl (11.7 kB)

File details

Details for the file madoop-1.3.2.tar.gz.

File metadata

  • Download URL: madoop-1.3.2.tar.gz
  • Upload date:
  • Size: 21.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for madoop-1.3.2.tar.gz
Algorithm Hash digest
SHA256 5299d1d6c6299cf150ea88fc38df9142dc39a2a0f64a2c4fa2cb713e7d07d2b4
MD5 82982afdf56362312645e8f640f1c449
BLAKE2b-256 f07d7d9cdc3ea0be4bd09ec0e4e4a795266b3b4246545ce16d40b61f28be1dc9


File details

Details for the file madoop-1.3.2-py3-none-any.whl.

File metadata

  • Download URL: madoop-1.3.2-py3-none-any.whl
  • Upload date:
  • Size: 11.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for madoop-1.3.2-py3-none-any.whl
Algorithm Hash digest
SHA256 e5a6f950debed50cc27b34074b717b31c69681f986931690a5594f5848014052
MD5 72448cd5a326ef64f335566ed9412edc
BLAKE2b-256 56d0041b34b603a89449326a25a17c623e157075f0d21d26e2974d541cedfafc

