The first automated data preparation library powered by deep learning, built to clean and prepare terabytes (TBs) of data on clusters at scale.
mltrons-auto-data-prep: a toolkit that automates data preparation
What is it?
Mltrons-auto-data-prep is a Python package that provides a flexible, automated way to prepare raw data of any size. It uses machine learning and deep learning techniques on a PySpark back end to clean and prepare terabytes of data on clusters at scale.
Main Features
Here are just a few of the things that Mltrons-auto-data-prep does well:
- Handles data of any size, even terabytes, using PySpark
- Filters out features with more null values than a given threshold
- Filters out features that have the same value in every row
- Automatically detects the data type of each feature
- Automatically detects datetime features and splits them into multiple useful features
- Automatically detects features containing URLs and removes duplicates
- Automatically detects skewed features and reduces their skewness
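The two column filters in the feature list (too many nulls, one value in every row) can be illustrated with a minimal plain-Python sketch. It assumes rows are dicts with None marking a missing value and that the threshold is a fraction of rows; `filter_features` and `null_threshold` are hypothetical names, not part of the mltronsAutoDataPrep API, and the package itself runs on PySpark rather than Python loops.

```python
# Plain-Python sketch of two column filters; `filter_features` is a
# hypothetical helper, not part of the mltronsAutoDataPrep API.
def filter_features(rows, null_threshold=0.5):
    """Return the column names that survive both filters.

    rows: list of dicts, one per record (None marks a missing value).
    null_threshold: maximum allowed fraction of None values per column
    (an assumption; the package may express its threshold differently).
    """
    if not rows:
        return []
    # Collect every column name across all rows, preserving first-seen order.
    columns = list(dict.fromkeys(key for row in rows for key in row))
    n_rows = len(rows)
    kept = []
    for col in columns:
        values = [row.get(col) for row in rows]
        null_ratio = sum(v is None for v in values) / n_rows
        if null_ratio > null_threshold:
            continue  # too sparse: drop the column
        # Distinct non-null values (assumes values are hashable scalars).
        distinct = {v for v in values if v is not None}
        if len(distinct) <= 1:
            continue  # constant column carries no information: drop it
        kept.append(col)
    return kept


rows = [
    {"id": 1, "score": None, "country": "PK"},
    {"id": 2, "score": None, "country": "PK"},
    {"id": 3, "score": 7.5, "country": "PK"},
]
print(filter_features(rows))  # ['id']
```

On a cluster, the same per-column null counts and distinct counts would typically be computed with Spark aggregations in a single pass over the DataFrame.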
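Automatic data-type detection usually means trying progressively wider types until every value fits. The sketch below shows that idea for a column of raw strings (as read from a CSV). It is an illustration under that assumption only: `infer_column_type` and its `"integer"`/`"double"`/`"string"` labels are hypothetical and do not describe the package's actual detection rules.

```python
# Plain-Python sketch of type inference for string-valued columns;
# `infer_column_type` is hypothetical, not the package's API.
def infer_column_type(values):
    """Guess 'integer', 'double', or 'string' for a column of raw strings."""
    # Ignore missing entries (None or empty string) when inferring.
    present = [v for v in values if v is not None and v != ""]
    if not present:
        return "string"

    def all_parse(cast):
        # True only if every present value converts without error.
        try:
            for v in present:
                cast(v)
        except (TypeError, ValueError):
            return False
        return True

    # Try the narrowest type first, then widen.
    if all_parse(int):
        return "integer"
    if all_parse(float):
        return "double"
    return "string"


print(infer_column_type(["1", "42", None]))  # integer
print(infer_column_type(["1.5", "2", ""]))   # double
print(infer_column_type(["PK", "3"]))        # string
```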
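Detecting a datetime feature and splitting it into several numeric features can be sketched as follows. The accepted formats in `DATETIME_FORMATS`, the 90% match ratio, and the output feature names are all assumptions chosen for illustration; the helper names are hypothetical and the package's own detection logic is not documented here.

```python
from datetime import datetime

# Hypothetical set of accepted formats; the package's own rules are not documented.
DATETIME_FORMATS = ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d", "%d/%m/%Y")


def parse_datetime(value):
    """Return a datetime if `value` matches one of DATETIME_FORMATS, else None."""
    for fmt in DATETIME_FORMATS:
        try:
            return datetime.strptime(value, fmt)
        except (TypeError, ValueError):
            continue
    return None


def is_datetime_column(values, min_ratio=0.9):
    """Treat a column as datetime if at least `min_ratio` of non-null values parse."""
    present = [v for v in values if v is not None]
    if not present:
        return False
    parsed = sum(parse_datetime(v) is not None for v in present)
    return parsed / len(present) >= min_ratio


def split_datetime(value):
    """Expand one datetime string into separate numeric features."""
    dt = parse_datetime(value)
    if dt is None:
        return None
    return {
        "year": dt.year,
        "month": dt.month,
        "day": dt.day,
        "hour": dt.hour,
        "weekday": dt.weekday(),  # Monday == 0
    }


print(split_datetime("2021-03-15 08:30:00"))
# {'year': 2021, 'month': 3, 'day': 15, 'hour': 8, 'weekday': 0}
```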
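The feature list does not say exactly what "removes duplicates" covers for URL features. One plausible reading, sketched below, is to flag a column as a URL column when most of its values look like http(s) URLs, then normalize the URLs so trivial variants (letter case in the scheme or host, a trailing slash, a `#fragment`) collapse to one value. The regex, the 90% ratio, and the helper names are hypothetical.

```python
import re
from urllib.parse import urlsplit, urlunsplit

# Hypothetical pattern: a value counts as a URL if it starts with http:// or https://.
URL_PATTERN = re.compile(r"^https?://\S+$", re.IGNORECASE)


def is_url_column(values, min_ratio=0.9):
    """Flag a column when at least `min_ratio` of its string values look like URLs."""
    strings = [v for v in values if isinstance(v, str)]
    if not strings:
        return False
    hits = sum(1 for v in strings if URL_PATTERN.match(v.strip()))
    return hits / len(strings) >= min_ratio


def normalize_url(url):
    """Lower-case scheme and network location, drop trailing '/' and the fragment."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/")
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(), path, parts.query, ""))


def dedupe_urls(values):
    """Return the distinct normalized URLs, in first-seen order."""
    seen = set()
    unique = []
    for v in values:
        if not isinstance(v, str):
            continue  # skip missing or non-string entries
        key = normalize_url(v)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


urls = ["https://Example.com/a/", "https://example.com/a", "http://example.com/b#top"]
print(dedupe_urls(urls))  # ['https://example.com/a', 'http://example.com/b']
```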
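For skewness, a common approach is to measure it and, for right-skewed non-negative features, apply a log transform. The sketch below uses the population (Fisher-Pearson) skewness and `math.log1p`; both the transform and the threshold of 1.0 are assumptions for illustration, not the method the package is documented to use, and `reduce_skew` is a hypothetical name.

```python
import math


def skewness(values):
    """Population (Fisher-Pearson) skewness: m3 / m2**1.5."""
    n = len(values)
    if n == 0:
        return 0.0
    mean = sum(values) / n
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    if m2 == 0:
        return 0.0  # constant column: no spread, so define skewness as 0
    return m3 / m2 ** 1.5


def reduce_skew(values, threshold=1.0):
    """Apply log1p when skewness exceeds `threshold` (right skew) and all values are >= 0."""
    if not values:
        return []
    if skewness(values) > threshold and min(values) >= 0:
        return [math.log1p(x) for x in values]
    return list(values)  # leave other columns unchanged


incomes = [1, 1, 1, 1, 2, 2, 3, 50, 100]
print(round(skewness(incomes), 2), round(skewness(reduce_skew(incomes)), 2))
```

`log1p` is used instead of `log` so that zero values stay finite; features with negative values or left skew are returned unchanged because a log transform would not help them.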
Where to get it
The source code is currently hosted on GitHub at: https://github.com/ms8909/mltrons-auto-data-prep
The PyPI project is at: https://pypi.org/project/mltronsAutoDataPrep/
How to install
pip install mltronsAutoDataPrep
Dependencies
Code Architecture
Download files
Built Distribution
Hashes for mltronsAutoDataPrep-0.0.8-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 08bf7f1d1721314846386f36801c2b6392d397f7d97ccbd1964f59777e3d01a2
MD5 | 7c07ccddd97c12ee5cb42f67b5202dee
BLAKE2b-256 | cd4afce5904045d2ffd82e03eef56c4b91a34ce5616cd5d3b3960fa03e8a0e04
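To check that a downloaded copy of the 0.0.8 wheel matches the published SHA256 digest, the file can be hashed with Python's standard `hashlib` module. This is a minimal sketch; the helper names are hypothetical, and the digest only applies to mltronsAutoDataPrep-0.0.8-py3-none-any.whl.

```python
import hashlib

# SHA256 digest published for mltronsAutoDataPrep-0.0.8-py3-none-any.whl.
EXPECTED_SHA256 = "08bf7f1d1721314846386f36801c2b6392d397f7d97ccbd1964f59777e3d01a2"


def sha256_of_chunks(chunks):
    """Hash an iterable of byte chunks incrementally."""
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def sha256_of_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large files need not fit in memory."""
    with open(path, "rb") as f:
        return sha256_of_chunks(iter(lambda: f.read(chunk_size), b""))


def wheel_is_authentic(path, expected=EXPECTED_SHA256):
    """True if the file at `path` has the expected SHA256 digest."""
    return sha256_of_file(path) == expected.strip().lower()


# Usage, after downloading the wheel into the current directory:
# wheel_is_authentic("mltronsAutoDataPrep-0.0.8-py3-none-any.whl")
```

pip can also enforce the digest itself: a requirements line such as `mltronsAutoDataPrep==0.0.8 --hash=sha256:<digest>` puts pip in hash-checking mode, which then requires every dependency to be pinned with a hash as well.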