The first automated data preparation library powered by deep learning, built to clean and prepare terabytes of data on clusters at scale.
mltrons-auto-data-prep: a toolkit that automates data preparation
What is it?
Mltrons-auto-data-prep is a Python package that provides a flexible, automated way to prepare raw data of any size. It uses machine learning and deep learning techniques on a PySpark back end to clean and prepare TBs of data on clusters at scale.
Main Features
Here are just a few of the things that Mltrons-auto-data-prep does well:
- Handles data of any size, even TBs, using PySpark
- Filters out features whose share of null values exceeds a threshold
- Filters out features that have the same value in every row
- Automatically detects the data type of each feature
- Automatically detects datetime features and splits them into multiple useful features
- Automatically detects features containing URLs and removes duplicates
- Automatically detects skewed features and minimizes their skewness
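
To make three of the features above concrete (null-threshold filtering, constant-feature filtering, and skew reduction), here is a minimal sketch in plain Python on a small in-memory table. It is not the package's API and does not use PySpark. The function names, the 0.5 null threshold, the skewness limit of 1.0, and the choice of `log1p` as the transform are all assumptions made for illustration.

```python
import math


def null_fraction(values):
    """Fraction of entries in a column that are None."""
    return sum(v is None for v in values) / len(values) if values else 0.0


def drop_sparse_and_constant(table, null_threshold=0.5):
    """Keep columns that are not mostly null and not constant (hypothetical helper)."""
    kept = {}
    for name, values in table.items():
        if null_fraction(values) > null_threshold:
            continue  # share of nulls exceeds the threshold
        if len({v for v in values if v is not None}) <= 1:
            continue  # same value in every row
        kept[name] = values
    return kept


def skewness(values):
    """Population skewness: third central moment over variance ** 1.5."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    return m3 / m2 ** 1.5 if m2 > 0 else 0.0


def reduce_skew(values, limit=1.0):
    """Apply log1p to a non-negative column whose skewness exceeds the limit."""
    if min(values) >= 0 and skewness(values) > limit:
        return [math.log1p(v) for v in values]
    return list(values)


# Toy data, invented for this example.
table = {
    "age": [31, 45, 27, 52, 38, 29, 61, 44],
    "country": ["US"] * 8,                          # constant column
    "notes": [None] * 6 + ["late", "ok"],           # 75% null
    "income": [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0],  # right-skewed
}

cleaned = drop_sparse_and_constant(table, null_threshold=0.5)
cleaned["income"] = reduce_skew(cleaned["income"])
print(sorted(cleaned))  # only 'age' and 'income' remain
```

On a cluster the same per-column statistics would be computed as distributed aggregations rather than Python loops, but the decision rules stay the same.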
Where to get it
The source code is currently hosted on GitHub at: https://github.com/ms8909/mltrons-auto-data-prep
The PyPI project is at: https://pypi.org/project/mltronsAutoDataPrep/
How to install
pip install mltronsAutoDataPrep
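
After installing, one way to confirm that the distribution is present is to query its metadata with the standard library (Python 3.8+). This is a small sketch. The helper name `installed_version` is hypothetical, and it checks only the distribution name given to pip, not any importable module name.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(distribution):
    """Return the installed version string, or None if the distribution is absent."""
    try:
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the version if pip installed the package, otherwise None.
print(installed_version("mltronsAutoDataPrep"))
```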
Dependencies
Code Architecture
Hashes for mltronsAutoDataPrep-0.0.7-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a581266f031e79729574450a1c259a6c08f1b21d3c1f2de6cae48129682b5a8c
MD5 | 658424fecdac0acb2016273752343712
BLAKE2b-256 | 4f3eb8cf9ed74972123e71f0022eab17aca665d8bf3e86995f56fe24b4c9b700
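
A downloaded wheel can be checked against the SHA256 digest listed above using `hashlib` from the standard library. The sketch below is one way to do it. The helper names and the local file path are assumptions, and the final call is left commented out because it needs the wheel to be present on disk.

```python
import hashlib


def stream_digest(chunks, algorithm="sha256"):
    """Hex digest of an iterable of byte chunks."""
    hasher = hashlib.new(algorithm)
    for chunk in chunks:
        hasher.update(chunk)
    return hasher.hexdigest()


def file_chunks(path, size=1 << 20):
    """Yield a file's contents in fixed-size chunks to keep memory use low."""
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(size), b""):
            yield chunk


def matches_published(path, expected_hex, algorithm="sha256"):
    """True if the file's digest equals the published hex digest."""
    return stream_digest(file_chunks(path), algorithm) == expected_hex.lower()


# SHA256 digest copied from the table above.
EXPECTED_SHA256 = "a581266f031e79729574450a1c259a6c08f1b21d3c1f2de6cae48129682b5a8c"

# Hypothetical local path; uncomment once the wheel has been downloaded.
# print(matches_published("mltronsAutoDataPrep-0.0.7-py3-none-any.whl", EXPECTED_SHA256))
```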