A lightweight MapReduce framework for education.

Madoop: Michigan Hadoop

Michigan Hadoop (madoop) is a lightweight MapReduce framework for education. Madoop implements the Hadoop Streaming interface, is written in Python, and runs on a single machine.

For an in-depth explanation of how to write MapReduce programs in Python for Hadoop Streaming, see our Hadoop Streaming tutorial.

Quick start

Install Madoop.

$ pip install madoop

Create an example MapReduce program with input files.

$ madoop --example
$ tree example
example
├── input
│   ├── input01.txt
│   └── input02.txt
├── map.py
└── reduce.py

Run the example word count MapReduce program.

$ madoop \
  -input example/input \
  -output example/output \
  -mapper example/map.py \
  -reducer example/reduce.py

Concatenate and print the output.

$ cat example/output/part-*
Goodbye 1
Bye 1
Hadoop 2
World 2
Hello 2
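
The generated map.py and reduce.py are not reproduced here. As a rough illustration of what a Hadoop Streaming word count can look like, the sketch below writes the mapper and reducer as plain Python functions over lines of text. It assumes the usual streaming convention of one tab-separated key-value pair per line. The function names and the sample input text are hypothetical and were chosen only to match the counts printed above.

```python
"""Word count in the Hadoop Streaming style (illustrative sketch)."""
import itertools


def map_lines(lines):
    """Emit one "word<TAB>1" line per word, as a streaming mapper would."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reduce_lines(sorted_lines):
    """Sum the counts for each key; the input must already be sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for key, group in itertools.groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(value) for _, value in group)
        yield f"{key}\t{total}"


if __name__ == "__main__":
    # Hypothetical input text, not the actual contents of example/input.
    text = ["Hello World", "Hello Hadoop", "Goodbye Hadoop", "Bye World"]
    # sorted() stands in for the sort step between the map and reduce stages.
    for output_line in reduce_lines(sorted(map_lines(text))):
        print(output_line)
```

In a real streaming job the mapper and reducer would be separate executables that read from stdin and write to stdout. Keeping them as functions here only makes the data flow easier to follow.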

Comparison with Apache Hadoop and CLI

Madoop implements a subset of the Hadoop Streaming interface. You can simulate the Hadoop Streaming interface at the command line with cat and sort.

Here's how to run our example MapReduce program on Apache Hadoop.

$ hadoop \
    jar path/to/hadoop-streaming-X.Y.Z.jar \
    -input example/input \
    -output output \
    -mapper example/map.py \
    -reducer example/reduce.py
$ cat output/part-*

Here's how to run our example MapReduce program at the command line using cat and sort.

$ cat input/* | ./map.py | sort | ./reduce.py

| Madoop | Hadoop | cat/sort |
|---|---|---|
| Implement some Hadoop options | All Hadoop options | No Hadoop options |
| Multiple mappers and reducers | Multiple mappers and reducers | One mapper, one reducer |
| Single machine | Many machines | Single machine |
| `jar hadoop-streaming-X.Y.Z.jar` argument ignored | `jar hadoop-streaming-X.Y.Z.jar` argument required | No arguments |
| Lines within a group are sorted | Lines within a group are sorted | Lines within a group are sorted |
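
To make the "multiple reducers" and "lines within a group are sorted" rows more concrete, here is a minimal sketch of a shuffle step that splits mapper output across several reducers. It is not Madoop's or Hadoop's actual implementation. The function name `shuffle`, the CRC32-based partitioning, and the `num_reducers` parameter are all assumptions made for illustration.

```python
"""Sketch of a shuffle step that feeds several reducers (not Madoop's code)."""
import zlib


def shuffle(mapper_output, num_reducers):
    """Split "key<TAB>value" lines into sorted per-reducer partitions."""
    partitions = [[] for _ in range(num_reducers)]
    for line in mapper_output:
        key = line.split("\t", 1)[0]
        # crc32 is stable across runs, unlike Python's built-in str hash.
        index = zlib.crc32(key.encode("utf-8")) % num_reducers
        partitions[index].append(line)
    # Sorting each partition keeps equal keys adjacent and orders the
    # lines inside each key group.
    return [sorted(partition) for partition in partitions]


if __name__ == "__main__":
    lines = ["Hello\t1", "World\t1", "Hello\t1", "Hadoop\t1"]
    for number, partition in enumerate(shuffle(lines, num_reducers=2)):
        print(number, partition)
```

Because every line with the same key hashes to the same partition, each reducer sees all of the values for the keys it owns. The cat/sort pipeline is the special case with a single partition.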

Contributing

Contributions from the community are welcome! Check out the guide for contributing.

Acknowledgments

Michigan Hadoop is written by Andrew DeOrio <awdeorio@umich.edu>.
