A lightweight MapReduce framework for education.

Madoop: Michigan Hadoop

Michigan Hadoop (madoop) is a lightweight MapReduce framework for education. Madoop implements the Hadoop Streaming interface. Madoop is implemented in Python and runs on a single machine.

For an in-depth explanation of how to write MapReduce programs in Python for Hadoop Streaming, see our Hadoop Streaming tutorial.

Quick start

Install Madoop.

$ pip install madoop

Create an example MapReduce program with input files.

$ madoop --example
$ tree example
example
├── input
│   ├── input01.txt
│   └── input02.txt
├── map.py
└── reduce.py

Run the example word count MapReduce program.

$ madoop \
  -input example/input \
  -output example/output \
  -mapper example/map.py \
  -reducer example/reduce.py

Concatenate and print the output.

$ cat example/output/part-*
Goodbye 1
Bye 1
Hadoop 2
World 2
Hello 2
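
The word count program above follows the Hadoop Streaming pattern: a mapper emits key-value lines and a reducer sums the values for each key. The sketch below illustrates that pattern in Python. It is not the map.py and reduce.py that `madoop --example` generates; the function names `map_words` and `reduce_counts` and the tab-separated line format are assumptions made for illustration.

```python
"""Word count in the Hadoop Streaming style: a hypothetical sketch."""
import itertools


def map_words(lines):
    """Mapper: emit one "word<TAB>1" line for every word of input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_counts(sorted_lines):
    """Reducer: sum the counts of each run of lines that share a key.

    Assumes the lines arrive sorted, so equal keys are adjacent.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in itertools.groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"
```

In a standalone script, the same logic would read lines from `sys.stdin` and print each result, since streaming mappers and reducers communicate through standard input and standard output.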

Comparison with Apache Hadoop and CLI

Madoop implements a subset of the Hadoop Streaming interface. You can simulate the Hadoop Streaming interface at the command line with cat and sort.

Here's how to run our example MapReduce program on Apache Hadoop.

$ hadoop \
    jar path/to/hadoop-streaming-X.Y.Z.jar \
    -input example/input \
    -output output \
    -mapper example/map.py \
    -reducer example/reduce.py
$ cat output/part-*

Here's how to run our example MapReduce program at the command line using cat and sort, from inside the example directory.

$ cat input/* | ./map.py | sort | ./reduce.py

| Madoop | Hadoop | cat/sort |
| --- | --- | --- |
| Implement some Hadoop options | All Hadoop options | No Hadoop options |
| Multiple mappers and reducers | Multiple mappers and reducers | One mapper, one reducer |
| Single machine | Many machines | Single machine |
| jar hadoop-streaming-X.Y.Z.jar argument ignored | jar hadoop-streaming-X.Y.Z.jar argument required | No arguments |
| Lines within a group are sorted | Lines within a group are sorted | Lines within a group are sorted |
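
The cat/sort pipeline can also be mimicked in plain Python, which makes the three stages (map, sort, reduce) easy to inspect. This is a minimal sketch, not how Madoop or Hadoop is implemented: `simulate_pipeline`, `word_count_mapper`, and `word_count_reducer` are hypothetical names, and the helper groups lines by key for brevity, whereas a real streaming reducer reads the sorted lines from standard input and detects key changes itself. Python sorts strings by code point, so the order may differ from the `sort` command under some locales.

```python
import itertools


def word_count_mapper(line):
    """Hypothetical mapper: emit "word<TAB>1" for each word in one line."""
    return [f"{word}\t1" for word in line.split()]


def word_count_reducer(key, values):
    """Hypothetical reducer: sum the counts collected for one key."""
    return f"{key}\t{sum(int(v) for v in values)}"


def simulate_pipeline(input_lines, mapper, reducer):
    """Mimic `cat | map | sort | reduce` with one mapper and one reducer."""
    # Map stage: every input line flows through the single mapper.
    mapped = [out for line in input_lines for out in mapper(line)]
    # Sort stage: equal keys become adjacent, like the `sort` command.
    mapped.sort()
    # Reduce stage: hand each run of equal keys to the reducer.
    pairs = (line.split("\t", 1) for line in mapped)
    return [
        reducer(key, [value for _, value in group])
        for key, group in itertools.groupby(pairs, key=lambda pair: pair[0])
    ]


result = simulate_pipeline(
    ["Hello World", "Hello Hadoop"], word_count_mapper, word_count_reducer
)
print("\n".join(result))
```

Because everything passes through one sort and one reducer, the result is globally ordered by key, which matches the "One mapper, one reducer" column of the table.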

Contributing

Contributions from the community are welcome! Check out the guide for contributing.

Acknowledgments

Michigan Hadoop is written by Andrew DeOrio <awdeorio@umich.edu>.
