Simple and easy-to-use Python modules to perform quick Exploratory Data Analysis on any structured dataset!

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Quick-EDA

Getting Started

You will need Python 3 and Jupyter Notebook installed on your local system. Once they are installed, clone this repository to your machine to set up the project structure.

git clone https://github.com/sid-the-coder/QuickDA.git

You will also need to install a few Python package dependencies in your environment to get started. You can do this with:

pip3 install -r requirements.txt

Or you can install the package from the PyPI index using pip:

pip3 install quickda

Data Exploration - explore(data)
- data: pd.DataFrame
- method: string, default="summarize"
  - "summarize" : Generates summary statistics of the dataset
  - "profile" : Generates an HTML report of the dataset profile
- report_name: string, default="Dataset Report"
  - Parameter to customise the generated report name
- is_large_dataset: Boolean, default=False
  - Set to True explicitly to flag a large dataset
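
To make the "summarize" option more concrete, here is a minimal, dependency-free sketch of the kind of per-column summary such a method typically reports. It is not quickda's implementation: the helper name `summarize_column`, the use of `None` for missing values, and the chosen statistics are all assumptions made for illustration. With pandas, `DataFrame.describe()` produces a comparable table.

```python
from statistics import mean, stdev


def summarize_column(values):
    """Return basic summary statistics for one numeric column.

    None marks a missing value (a convention assumed for this sketch).
    """
    present = [v for v in values if v is not None]
    return {
        "count": len(present),
        "missing": len(values) - len(present),
        "mean": mean(present),
        "std": stdev(present),  # sample standard deviation
        "min": min(present),
        "max": max(present),
    }


ages = [22, 35, None, 41, 28]
summary = summarize_column(ages)
print(summary)
```
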
Data Cleaning - clean(data) : [Returns DataFrame]
- data: pd.DataFrame
- method: string, default="default"
  - "default" : Standardizes column names, removes duplicate rows and drops missing values
  - "standardize" : Standardizes column names
  - "dropcols" : Drops columns specified by the user
  - "duplicates" : Removes duplicate rows
  - "replaceval" : Replaces a value in dataframe with new value specified by the user
  - "fillmissing" : Fills missing values in all columns using forward filling
  - "dropmissing" : Drops all rows with missing values
  - "cardinality" : Reduces Cardinality of a column given a threshold
  - "dtypes" : Explicitly converts the Data Types as specified by the user
  - "outliers" : Removes all outliers in data using IQR method
- columns: list/string, default=[]
  - Parameter to specify column names in the DataFrame
- dtype: string, default="numeric"
  - "numeric" : Converts columns dtype to numeric
  - "category" : Converts columns dtype to category
  - "datetime" : Converts columns dtype to datetime
- to_replace: string/integer/regex, default=""
  - Parameter to pass the value to replace in the DataFrame
- value: string/integer/regex, default=np.nan
  - Parameter to pass the new value that replaces the old value in the DataFrame
- threshold: float, default=0
  - Parameter to set threshold in the range of [0,1] for cardinality
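
Two of the cleaning methods listed above, "fillmissing" (forward filling) and "outliers" (IQR method), can be illustrated with a short standard-library sketch. This shows the general techniques only and is not quickda's code: the function names, the `None` convention for missing values, and the conventional 1.5 x IQR fence multiplier are assumptions.

```python
from statistics import quantiles


def forward_fill(values):
    """Replace each None with the most recent non-missing value.

    A leading None has nothing before it, so it stays None.
    """
    filled, last = [], None
    for v in values:
        if v is not None:
            last = v
        filled.append(last)
    return filled


def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if low <= v <= high]


filled = forward_fill([None, 3, None, None, 7, None])
cleaned = remove_outliers_iqr([10, 12, 11, 13, 12, 11, 95])
print(filled)
print(cleaned)
```

In this sample the value 95 lies far above the upper fence and is dropped, while the remaining values are kept in their original order.
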
EDA Numerical Features - eda_num(data)
- data: pd.DataFrame
- method: string, default="default"
  - "default" : Shows all Outlier & Distribution Analysis via BoxPlots & Histograms
  - "correlation" : Gets the correlation matrix between all numerical features
- bins: integer, default=10
  - Parameter to set the number of bins while displaying histograms
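
As a rough illustration of what a correlation matrix between numerical features contains, the sketch below computes pairwise Pearson coefficients with the standard library only. Whether quickda uses Pearson correlation is an assumption, and the names `pearson` and `correlation_matrix` are hypothetical. With pandas, `DataFrame.corr()` returns the same kind of table.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def correlation_matrix(columns):
    """Map every pair of column names to its Pearson coefficient."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


data = {
    "height": [150, 160, 170, 180],
    "weight": [50, 60, 70, 80],   # rises with height
    "rank": [4, 3, 2, 1],         # falls with height
}
matrix = correlation_matrix(data)
for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```
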
EDA Categorical Features - eda_cat(data, x)
- data: pd.DataFrame
- x: string, First Categorical Type Column Name
- y: string, default=None
  - Parameter to pass the Second Categorical Type Column Name
- method: string, default="default"
  - "default" : Shows category count plot & summarizes it in a frequency table
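
The frequency table mentioned for the "default" method can be pictured as category counts with their share of the total. The sketch below is a standard-library illustration, not quickda's output format: the `frequency_table` helper and the percentage column are assumptions.

```python
from collections import Counter


def frequency_table(values):
    """Return (category, count, percent) rows, most frequent first."""
    counts = Counter(values)
    total = sum(counts.values())
    return [
        (category, n, round(100 * n / total, 2))
        for category, n in counts.most_common()
    ]


colors = ["red", "blue", "red", "green", "red", "blue"]
table = frequency_table(colors)
for category, n, percent in table:
    print(f"{category:<6} {n:>3} {percent:>6}%")
```
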
EDA Numerical with Categorical Features - eda_numcat(data, x, y)
- data: pd.DataFrame
- x: string/list, Numeric/Categorical Type Column Name(s)
- y: string/list, Numeric/Categorical Type Column Name(s)
- method: string, default="pps"
  - "pps" : Calculates Predictive Power Score Matrix
  - "relationship" : Shows Scatterplot of given features
  - "comparison" : Shows violin plots to compare categories across numerical features
  - "pivot" : Generates pivot table using column names, values and aggregation function
- hue: string, default=None
  - Parameter to visualise a categorical-type feature within scatterplots
- values: string/list, default=None
  - Parameter to set columns to aggregate on pivot views
- aggfunc: string, default="mean"
  - Parameter to set aggregate functions on pivot tables
  - Example: 'min', 'max', 'mean', 'median', 'sum', 'count'
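
To show how `values` and `aggfunc` interact in a pivot view, here is a small standard-library sketch that groups rows by two categorical keys and aggregates a numeric field. It is an illustration under stated assumptions, not quickda's implementation: the `pivot` helper, its argument names `index` and `columns`, and the row-of-dicts input are hypothetical. The aggregation names mirror the examples listed above.

```python
from collections import defaultdict
from statistics import mean, median

# Aggregation names taken from the example list above.
AGG_FUNCS = {
    "min": min,
    "max": max,
    "mean": mean,
    "median": median,
    "sum": sum,
    "count": len,
}


def pivot(rows, index, columns, values, aggfunc="mean"):
    """Aggregate rows[values] for every (index, columns) combination."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[index], row[columns])].append(row[values])

    aggregate = AGG_FUNCS[aggfunc]
    table = defaultdict(dict)
    for (row_key, col_key), cell_values in groups.items():
        table[row_key][col_key] = aggregate(cell_values)
    return dict(table)


sales = [
    {"region": "north", "product": "a", "sales": 10},
    {"region": "north", "product": "a", "sales": 20},
    {"region": "north", "product": "b", "sales": 5},
    {"region": "south", "product": "a", "sales": 7},
]
mean_table = pivot(sales, "region", "product", "sales", aggfunc="mean")
count_table = pivot(sales, "region", "product", "sales", aggfunc="count")
print(mean_table)
print(count_table)
```
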
EDA Time Series Data - eda_timeseries(data, x, y)
- data: pd.DataFrame
- x: string, Datetime Type Column Name
- y: string, Numeric Type Column Name

Upcoming Work

- Basic Preprocessing for Text Data - Tokenization, Normalization, Noise Removal, Lemmatization
- EDA for Text Data - NGrams, POS tagging, Word Cloud, Sentiment Analysis
- Quick Insight Generation for all EDA steps - Generate easy-to-read textual insights

Release history

- 0.2.2: Nov 24, 2020
- 0.2.1: Nov 20, 2020 (this version)
- 0.2.0: May 26, 2020
- 0.1.9: May 26, 2020
- 0.1.8: May 26, 2020
- 0.1.7: May 26, 2020
- 0.1.6: May 26, 2020
- 0.1.5: May 26, 2020
- 0.1.4: May 24, 2020
- 0.1.3: May 23, 2020
- 0.1.2: May 23, 2020
- 0.1.1: May 23, 2020
- 0.1.0: May 23, 2020

Download files

Source Distribution

quickda-0.2.1.tar.gz (7.4 kB view details)

Uploaded Nov 20, 2020 Source

Built Distribution

quickda-0.2.1-py3-none-any.whl (9.7 kB view details)

Uploaded Nov 20, 2020 Python 3

File details

Details for the file quickda-0.2.1.tar.gz.

File metadata

Download URL: quickda-0.2.1.tar.gz
Upload date: Nov 20, 2020
Size: 7.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.4

File hashes

Hashes for quickda-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`1b2b8047eb47cea0c7daff81a20cb48e5f686b7228978ff222af2c472b47fa8b`
MD5	`49453b637cf8f86fac08169dcaec68cd`
BLAKE2b-256	`347de29e4f3b05b0b79c40448bc73ea31aeffe39cd738f2ac0a4a1fcd61c9db1`
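
A downloaded archive can be checked against the SHA256 digest listed above with Python's `hashlib`. The sketch below is one way to do that; the helper names are hypothetical, and the commented-out call assumes the archive was saved to the current directory under its published file name.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()


# SHA256 digest published above for quickda-0.2.1.tar.gz.
EXPECTED = "1b2b8047eb47cea0c7daff81a20cb48e5f686b7228978ff222af2c472b47fa8b"

# Example call, assuming the file exists locally:
# print(matches_published_hash("quickda-0.2.1.tar.gz", EXPECTED))
```
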

File details

Details for the file quickda-0.2.1-py3-none-any.whl.

File metadata

Download URL: quickda-0.2.1-py3-none-any.whl
Upload date: Nov 20, 2020
Size: 9.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.2.0 pkginfo/1.6.1 requests/2.24.0 setuptools/50.3.2 requests-toolbelt/0.9.1 tqdm/4.48.2 CPython/3.7.4

File hashes

Hashes for quickda-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`0dd801fa2bb4212884a8c7125161bc363a49fd6df252e9719393f2525aa71868`
MD5	`ddd3bf9f7552b435f21a85e024675840`
BLAKE2b-256	`64137fde228dc7e942dbefc9091d3cd5a58827230f83f90027c63446eb0e2a02`
