Data-Purifier
A Python library for automated exploratory data analysis (EDA), automated data cleaning, and automated data preprocessing for machine learning and natural language processing.
Features
- It reports the dataset's shape, the number of categorical and numerical features, a description of the dataset, and the number of null values together with their percentage.
- To help you understand the distribution of the data and extract useful insights, it generates interactive plots: you select a column and the plot is drawn automatically. The available plots are:
  - Count plot
  - Correlation plot
  - Joint plot
  - Pair plot
  - Pie plot
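The overview described in the first feature can be approximated with plain pandas. The following is a minimal sketch for illustration only, not Data-Purifier's implementation; the helper name `summarize` and the toy data are hypothetical.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Hypothetical helper: collect a basic overview of a dataframe."""
    # Split columns into numerical and non-numerical (treated as categorical here).
    numerical = df.select_dtypes(include="number").columns.tolist()
    categorical = df.select_dtypes(exclude="number").columns.tolist()

    # Null values per column, as a count and as a percentage of all rows.
    null_counts = df.isnull().sum()
    null_percent = (null_counts / len(df) * 100).round(2)

    return {
        "shape": df.shape,
        "numerical": numerical,
        "categorical": categorical,
        "null_counts": null_counts.to_dict(),
        "null_percent": null_percent.to_dict(),
    }


# Toy data standing in for a real dataset (assumption for illustration).
toy = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, None, 6.3],
        "species": ["setosa", "setosa", "virginica", None],
    }
)
summary = summarize(toy)
print(summary["shape"])
print(summary["null_percent"])
```

Treating every non-numeric column as categorical is a simplification; a real dataset may also contain dates or free text that deserve separate handling.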
Get Started
Install the packages
pip install data-purifier
python -m spacy download en_core_web_sm
Load the module
import pandas as pd
from datapurifier import Mleda, Nlpeda, Nlpurifier
Load the dataset and let the magic of automated EDA begin
df = pd.read_csv("./datasets/iris.csv")
ae = Mleda(df)
ae
For automated EDA and automated data cleaning of a natural language (NL) dataset, load the dataset and pass the dataframe together with the name of the target column that contains the text.
nlp_df = pd.read_csv("./datasets/twitter16m.csv", header=None, encoding='latin-1')
nlp_df.columns = ["tweets","sentiment"]
Automated EDA
For basic EDA, pass analyse="basic" to the constructor
%%time
eda = Nlpeda(nlp_df, "tweets", analyse="basic")
eda.df
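Which columns the basic analysis adds to eda.df is not listed here. As a rough idea of what per-row text statistics look like, here is a minimal pandas sketch; the helper `basic_text_stats` and the chosen statistics are assumptions, not the library's actual output.

```python
import pandas as pd


def basic_text_stats(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: add simple per-row statistics for a text column."""
    out = df.copy()
    # Treat missing text as an empty string so the counts come out as 0.
    text = out[column].fillna("").astype(str)
    out["char_count"] = text.str.len()
    out["word_count"] = text.str.split().str.len()
    out["hashtag_count"] = text.str.count(r"#\w+")
    return out


# Toy tweets standing in for the real dataset (assumption for illustration).
toy = pd.DataFrame({"tweets": ["I love #python and #pandas", "meh", None]})
stats = basic_text_stats(toy, "tweets")
print(stats[["char_count", "word_count", "hashtag_count"]])
```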
For word-based EDA, pass analyse="word" to the constructor
%%time
eda = Nlpeda(nlp_df, "tweets", analyse="word")
eda.unigram_df  # view the unigram dataframe
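A unigram table is essentially a word-frequency count over the text column. The sketch below shows the idea with the standard library only; the helper `unigram_counts` is hypothetical, and the exact layout of eda.unigram_df is not assumed.

```python
from collections import Counter


def unigram_counts(texts):
    """Hypothetical helper: count lowercase, whitespace-separated tokens."""
    counter = Counter()
    for text in texts:
        counter.update(text.lower().split())
    return counter


# Toy tweets (assumption for illustration).
tweets = ["Good morning world", "good night", "Morning coffee"]
counts = unigram_counts(tweets)
print(counts.most_common(2))
```

Splitting on whitespace keeps the example short; a real pipeline would usually use a proper tokenizer and remove stop words first.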
Automated Data Cleaning
pure = Nlpurifier(nlp_df, "tweets")
View the processed and purified dataframe
pure.df
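The cleaning steps that Nlpurifier applies are not listed here. To show the kind of transformation text cleaning typically involves, here is a minimal sketch using only the standard library; the function `clean_text` and the chosen steps (lowercasing, removing URLs, mentions, and non-letter characters) are assumptions, not the library's pipeline.

```python
import re

# Patterns for things commonly stripped from tweets (assumed cleaning steps).
URL_RE = re.compile(r"https?://\S+|www\.\S+")
MENTION_RE = re.compile(r"@\w+")
NON_ALPHA_RE = re.compile(r"[^a-z\s]")


def clean_text(text: str) -> str:
    """Hypothetical helper: lowercase and strip URLs, mentions, and symbols."""
    text = text.lower()
    text = URL_RE.sub(" ", text)
    text = MENTION_RE.sub(" ", text)
    text = NON_ALPHA_RE.sub(" ", text)
    # Collapse the runs of whitespace left behind by the substitutions.
    return " ".join(text.split())


print(clean_text("@bob Check THIS out!! https://example.com #wow"))
```

With pandas, such a function could be applied to a text column with nlp_df["tweets"].map(clean_text).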
Example: https://colab.research.google.com/drive/1J932G1uzqxUHCMwk2gtbuMQohYZsze8U?usp=sharing
Python Package: https://pypi.org/project/data-purifier/