
The first automated data preparation library powered by deep learning, built to clean and prepare terabytes of data on clusters at scale.


mltrons-auto-data-prep: a toolkit that automates data preparation

What is it?

Mltrons-auto-data-prep is a Python package that provides a flexible, automated way to prepare raw data of any size. It combines machine learning and deep learning techniques with a PySpark back end to clean and prepare terabytes of data on clusters at scale.

Main Features

Here are just a few of the things that Mltrons-auto-data-prep does well:

  • Handles data of any size, even terabytes, using PySpark

  • Filters out features whose share of null values exceeds a threshold

  • Filters out features that have the same value in every row

  • Automatically detects the data type of each feature

  • Automatically detects datetime features and splits them into multiple useful features

  • Automatically detects features containing URLs and removes duplicates

  • Automatically detects skewed features and reduces their skewness
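The two column filters in the list above (too many nulls, a single repeated value) can be illustrated with a small in-memory sketch. This is plain Python, not the package's API: the helper name filter_columns, the list-of-dicts input and the default threshold of 0.5 are all assumptions made for illustration, and treating a column with at most one distinct non-null value as "constant" is one reasonable reading of the feature.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def filter_columns(rows: List[Row], null_threshold: float = 0.5) -> List[str]:
    """Return the names of the columns worth keeping (hypothetical helper).

    A column is dropped when its fraction of None values exceeds
    null_threshold, or when it has at most one distinct non-null value.
    Column names are taken from the first row.
    """
    if not rows:
        return []
    keep = []
    for col in rows[0]:
        values = [row.get(col) for row in rows]
        null_fraction = sum(v is None for v in values) / len(values)
        if null_fraction > null_threshold:
            continue  # too sparse to be useful
        distinct = {v for v in values if v is not None}
        if len(distinct) <= 1:
            continue  # constant column: carries no information
        keep.append(col)
    return keep


rows = [
    {"a": 1, "b": None, "c": 7},
    {"a": 2, "b": None, "c": 7},
    {"a": 3, "b": 5, "c": 7},
]
print(filter_columns(rows))  # ['a']
```

On a Spark DataFrame the same two statistics can be gathered in a single aggregation pass with pyspark.sql.functions.count (which skips nulls) and countDistinct, so the cost stays one scan over the data regardless of how many columns are checked.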
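Automatic data type detection, as listed above, usually means trying the narrowest type first and widening until every value fits. The sketch below shows that idea on raw string values in plain Python; the function name infer_type, the type labels and the order of checks are assumptions, not the package's documented behaviour.

```python
from datetime import datetime
from typing import Callable, List, Optional


def _parses(value: str, parser: Callable[[str], object]) -> bool:
    """Return True if parser accepts value without raising ValueError."""
    try:
        parser(value)
        return True
    except ValueError:
        return False


def infer_type(values: List[Optional[str]]) -> str:
    """Infer a column type from raw strings (hypothetical helper).

    None and blank strings are ignored; the narrowest type that every
    remaining value satisfies wins.
    """
    present = [v.strip() for v in values if v is not None and v.strip() != ""]
    if not present:
        return "null"
    if all(_parses(v, int) for v in present):
        return "integer"
    if all(_parses(v, float) for v in present):
        return "double"
    if all(v.lower() in {"true", "false"} for v in present):
        return "boolean"
    if all(_parses(v, datetime.fromisoformat) for v in present):
        return "timestamp"
    return "string"


print(infer_type(["1", "2", None]))  # integer
print(infer_type(["1", "2.5"]))      # double
```

Checking integers before doubles matters: every integer string also parses as a float, so the reverse order would never report "integer". For comparison, Spark's own CSV reader offers a coarser version of this through the inferSchema option of spark.read.csv.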
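Splitting a datetime feature into several useful features, as the list above describes, typically means extracting calendar parts a model can use directly. The sketch assumes a particular set of parts (year, month, day, hour, weekday, weekend flag) and a hypothetical helper name; the package may derive a different set.

```python
from datetime import datetime
from typing import Dict


def split_datetime(name: str, value: datetime) -> Dict[str, int]:
    """Expand one datetime value into numeric features (hypothetical helper)."""
    weekday = value.weekday()  # Monday == 0 ... Sunday == 6
    return {
        f"{name}_year": value.year,
        f"{name}_month": value.month,
        f"{name}_day": value.day,
        f"{name}_hour": value.hour,
        f"{name}_weekday": weekday,
        f"{name}_is_weekend": int(weekday >= 5),
    }


# 2021-03-06 is a Saturday, so weekday is 5 and is_weekend is 1.
print(split_datetime("order_ts", datetime(2021, 3, 6, 14, 30)))
```

In PySpark the equivalent column expressions are pyspark.sql.functions.year, month, dayofmonth, hour and dayofweek; note that Spark's dayofweek counts from 1 = Sunday, unlike Python's weekday().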
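The URL feature in the list above is described only briefly, so the following is one possible reading, not the package's method: a column counts as a URL column when most of its values parse as http(s) URLs, and "removing duplications" is taken to mean collapsing URLs that differ only in host case, a trailing slash or a fragment. The helper names and the 0.9 cut-off are assumptions.

```python
from typing import List, Optional
from urllib.parse import urlsplit, urlunsplit


def looks_like_url(value: str) -> bool:
    """True for absolute http(s) URLs with a host."""
    parts = urlsplit(value.strip())
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def is_url_column(values: List[Optional[str]], min_fraction: float = 0.9) -> bool:
    """Hypothetical rule: at least min_fraction of non-empty values are URLs."""
    present = [v for v in values if v]
    if not present:
        return False
    return sum(looks_like_url(v) for v in present) / len(present) >= min_fraction


def normalize_url(value: str) -> str:
    """Lowercase the host, drop a trailing slash and the #fragment."""
    parts = urlsplit(value.strip())  # urlsplit already lowercases the scheme
    return urlunsplit(
        (parts.scheme, parts.netloc.lower(), parts.path.rstrip("/"), parts.query, "")
    )


def dedupe_urls(values: List[str]) -> List[str]:
    """Keep the first occurrence of each normalized URL, preserving order."""
    seen = set()
    result = []
    for value in values:
        key = normalize_url(value)
        if key not in seen:
            seen.add(key)
            result.append(key)
    return result


print(dedupe_urls(["https://Example.com/", "https://example.com", "http://example.com/a#top"]))
```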
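For the skewness item in the list above, a common approach is to measure skewness and apply a log transform to strongly right-skewed, non-negative features. The source does not say which measure or transform the package uses; the sketch assumes the Fisher-Pearson coefficient g1, a log1p transform, and a threshold of 1.0, with a hypothetical helper name.

```python
import math
from typing import List


def skewness(xs: List[float]) -> float:
    """Fisher-Pearson coefficient of skewness (g1) of a sample."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    if m2 == 0:
        return 0.0  # constant data has no skew
    return m3 / m2 ** 1.5


def reduce_skew(xs: List[float], threshold: float = 1.0) -> List[float]:
    """Apply log1p to right-skewed, non-negative data (hypothetical helper).

    log1p compresses large values, so it only helps with right skew;
    left-skewed or negative data is returned unchanged.
    """
    if skewness(xs) > threshold and min(xs) >= 0:
        return [math.log1p(x) for x in xs]
    return list(xs)


values = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0, 256.0, 512.0]
print(round(skewness(values), 2), round(skewness(reduce_skew(values)), 2))
```

On a Spark DataFrame, pyspark.sql.functions.skewness computes the statistic as an aggregate and pyspark.sql.functions.log1p applies the transform column-wise.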

Where to get it

The source code is currently hosted on GitHub at: https://github.com/ms8909/mltrons-auto-data-prep

The PyPI project is at: https://pypi.org/project/mltronsAutoDataPrep/

How to install

pip install mltronsAutoDataPrep

Dependencies

Code Architecture



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

mltronsAutoDataPrep-0.0.7-py3-none-any.whl (34.1 kB, Python 3)
