A lightweight MapReduce framework for education.

Madoop: Michigan Hadoop

Michigan Hadoop (madoop) is a lightweight MapReduce framework for education. It implements a subset of the Hadoop Streaming interface, is written in Python, and runs on a single machine.

For an in-depth explanation of how to write MapReduce programs in Python for Hadoop Streaming, see our Hadoop Streaming tutorial.

Quick start

Install Madoop.

$ pip install madoop

Create an example MapReduce program with input files.

$ madoop --example
$ tree example
example
├── input
│   ├── input01.txt
│   └── input02.txt
├── map.py
└── reduce.py

Run the example word count MapReduce program.

$ madoop \
  -input example/input \
  -output example/output \
  -mapper example/map.py \
  -reducer example/reduce.py

Concatenate and print the output.

$ cat example/output/part-*
Goodbye 1
Bye 1
Hadoop 2
World 2
Hello 2
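The example job counts words. A mapper and reducer for that task might look roughly like the sketch below. It is an illustration, not the contents of the generated example/map.py and example/reduce.py. The function names map_lines and reduce_lines are hypothetical, and the sketch assumes tab-separated key/value lines, which is the Hadoop Streaming default.

```python
"""Word count in the Hadoop Streaming style (illustrative sketch only)."""
import itertools


def map_lines(lines):
    """Mapper: emit one "word<TAB>1" line for every word in the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Reducer: sum the counts for each run of identical keys.

    Assumes the lines arrive sorted by key, as they do after the
    sort step that sits between the map and reduce stages.
    """
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in itertools.groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{key}\t{total}"
```

In a standalone streaming script, each function would be fed sys.stdin and its results printed one per line.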

Comparison with Apache Hadoop and CLI

Madoop implements a subset of the Hadoop Streaming interface. You can simulate the Hadoop Streaming interface at the command line with cat and sort.

Here's how to run our example MapReduce program on Apache Hadoop.

$ hadoop \
    jar path/to/hadoop-streaming-X.Y.Z.jar \
    -input example/input \
    -output output \
    -mapper example/map.py \
    -reducer example/reduce.py
$ cat output/part-*

Here's how to run our example MapReduce program at the command line using cat and sort.

$ cat input/* | ./map.py | sort | ./reduce.py

| Madoop | Hadoop | cat/sort |
|---|---|---|
| Implement some Hadoop options | All Hadoop options | No Hadoop options |
| Multiple mappers and reducers | Multiple mappers and reducers | One mapper, one reducer |
| Single machine | Many machines | Single machine |
| jar hadoop-streaming-X.Y.Z.jar argument ignored | jar hadoop-streaming-X.Y.Z.jar argument required | No arguments |
| Lines within a group are sorted | Lines within a group are sorted | Lines within a group are sorted |
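The comparison above says that Madoop and Hadoop run multiple reducers, while the cat/sort pipeline runs one. With several reducers, mapper output has to be split so that every line for a given key reaches the same reducer. The sketch below shows one way to do that, assuming hash partitioning on the key. The names partition and num_reducers are hypothetical, and the hashing scheme is an assumption for illustration, not taken from Madoop's or Hadoop's source.

```python
"""Split mapper output among reducers by key (illustrative sketch only)."""
import hashlib
from collections import defaultdict


def partition(mapped_lines, num_reducers):
    """Return one sorted list of "key<TAB>value" lines per reducer.

    Hashing the key keeps all lines that share a key in the same partition.
    Sorting each partition places equal keys next to each other, which is
    what a streaming reducer expects.
    """
    partitions = defaultdict(list)
    for line in mapped_lines:
        key = line.split("\t", 1)[0]
        digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
        partitions[int(digest, 16) % num_reducers].append(line)
    return [sorted(partitions[index]) for index in range(num_reducers)]
```

Each returned list would then be passed to its own reducer process. Calling partition with num_reducers set to 1 reduces to the single sort step of the cat/sort pipeline, up to differences between Python's code point ordering and the locale used by sort.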

Contributing

Contributions from the community are welcome! Check out the guide for contributing.

Acknowledgments

Michigan Hadoop is written by Andrew DeOrio <awdeorio@umich.edu>.
