Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion with PySpark.

Project description

Optimus is the missing framework for cleaning and pre-processing data in a distributed fashion. It runs on Apache Spark and benefits from Spark's Catalyst query optimizer. It implements several tools for data wrangling and munging. Its main advantages over other public data cleaning libraries are that it runs unchanged on a laptop or on a large cluster, and that it is easy to install, use and understand.

  • Requirements:

      • Apache Spark 2.2.0

      • Python>=3.5

  • Installation:

In your terminal just type:

$ pip install optimuspyspark
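
Before or after installing, it can help to confirm that the interpreter meets the requirements listed above. The following is a minimal sketch, not part of Optimus: the helper name `check_environment` is hypothetical, and it assumes Spark is reached from Python through the `pyspark` package. It only checks that `pyspark` is importable, not that it is version 2.2.0.

```python
import importlib.util
import sys

# Minimum Python version, taken from the Requirements list.
MIN_PYTHON = (3, 5)


def check_environment():
    """Report whether this interpreter meets the stated requirements.

    Hypothetical helper for illustration; it is not an Optimus function.
    """
    return {
        # Compare (major, minor) of the running interpreter to the minimum.
        "python_ok": sys.version_info[:2] >= MIN_PYTHON,
        # find_spec returns None when a top-level package is not installed.
        "pyspark_found": importlib.util.find_spec("pyspark") is not None,
    }
```

Calling `check_environment()` returns a dictionary of booleans, so a setup script could stop early with a clear message when either entry is `False`.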

  • Contributors:

      • Project Manager: Argenis León.

      • Original Developers: Andrea Rosales, Hugo Reyes, Alberto Bonsanto.

      • Principal developer and maintainer: Favio Vázquez.

  • License:

Apache 2.0 © Iron

Download files

Source Distribution

optimuspyspark-1.2.1.tar.gz (37.3 kB view details)

Uploaded Source

Built Distribution

optimuspyspark-1.2.1-py3-none-any.whl (41.7 kB view details)

Uploaded Python 3

File details

Details for the file optimuspyspark-1.2.1.tar.gz.

File hashes

Hashes for optimuspyspark-1.2.1.tar.gz
Algorithm Hash digest
SHA256 5ce3aa3020fde8d8351dde5f7b308551711ac799d4f48baf9c7966af6bb46281
MD5 21c0d849444e7a3947bf820a48818a11
BLAKE2b-256 b9361b69950d98d34917e7d4d96960c28dc7ee4978cdec829345053c9988dcfc

File details

Details for the file optimuspyspark-1.2.1-py3-none-any.whl.

File hashes

Hashes for optimuspyspark-1.2.1-py3-none-any.whl
Algorithm Hash digest
SHA256 e4de2b31f4ab8d9b83534027375ca99c06dca39f6fbda406f73c22398c0c66ca
MD5 b6c84f1c42738cb0f87138699556396a
BLAKE2b-256 562199e280f6644478e239249032324704ece59cbb47675e5d34b00f78f3b5d3
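
The digests above can be used to check a downloaded file before installing it. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of` and `matches_published_hash` are hypothetical, and the expected value is the SHA256 digest listed above for optimuspyspark-1.2.1.tar.gz.

```python
import hashlib

# SHA256 digest listed above for optimuspyspark-1.2.1.tar.gz.
EXPECTED_SHA256 = "5ce3aa3020fde8d8351dde5f7b308551711ac799d4f48baf9c7966af6bb46281"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals `expected` (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `matches_published_hash("optimuspyspark-1.2.1.tar.gz")` should return `True` only for an unmodified copy of the source distribution. To check the wheel instead, pass its SHA256 digest from the table above as `expected`.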
