Simple & easy-to-use Python modules to perform quick Exploratory Data Analysis on any structured dataset!

Quick-EDA

EDA with Python

Getting Started

Prerequisites

You will need Python 3 and Jupyter Notebook installed on your local system. Once they are installed, fork this repository and clone it to your local machine to set up the project structure.

git clone https://github.com/sid-the-coder/Quick-EDA.git

You will also need to install a few Python package dependencies in your environment to get started. You can do this by running:

pip3 install -r requirements.txt

Table of Contents

  1. Data Exploration - explore(data)

    • data: pd.DataFrame
    • method: string, default="summarize"
      • "summarize" : Generates summary statistics of the dataset
      • "profile" : Generates an HTML report of the dataset profile
    • report_name: string, default="Dataset Report"
      • Parameter to customise the generated report name
    • is_large_dataset: Boolean, default=False
      • Set this parameter to True explicitly to flag a large dataset
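As a rough illustration of the kind of output a "summarize" step produces, the sketch below computes a few per-column statistics in plain Python. The helper name `summarize_column`, the list-of-dicts table, and the choice of statistics are all assumptions made for this example. They are not taken from quickda, which works on a pd.DataFrame and may report different fields.

```python
from statistics import mean, median


def summarize_column(rows, column):
    """Return basic summary statistics for one numeric column.

    `rows` is a list of dicts (a hypothetical stand-in for a DataFrame);
    None marks a missing value.
    """
    values = [row[column] for row in rows if row.get(column) is not None]
    return {
        "count": len(values),                 # non-missing entries
        "missing": len(rows) - len(values),   # entries that were None or absent
        "mean": mean(values),
        "median": median(values),
        "min": min(values),
        "max": max(values),
    }


rows = [{"age": 30}, {"age": 40}, {"age": None}, {"age": 50}]
summary = summarize_column(rows, "age")
```

Reporting the missing count next to the statistics is a deliberate choice here: it shows how much of the column the other numbers are based on.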
  2. Data Cleaning - clean(data)

    • data: pd.DataFrame
    • method: string, default="default"
      • "default" : Standardizes column names, removes duplicate rows and drops missing values
      • "standardize" : Standardizes column names
      • "dropcols" : Drops columns specified by the user
      • "dtypes" : Explicitly converts the Data Types as specified by the user
      • "duplicates" : Removes duplicate rows
      • "replaceval" : Replaces a value in the DataFrame with a new value specified by the user
      • "fillmissing" : Fills missing values in all columns using forward filling
      • "dropmissing" : Drops all rows with missing values
      • "outliers" : Removes all outliers in data using IQR method
    • columns: list/string, default=[]
      • Parameter to specify column names in the DataFrame
    • dtype: string, default="numeric"
      • "numeric" : Converts columns dtype to numeric
      • "category" : Converts columns dtype to category
      • "datetime" : Converts columns dtype to datetime
    • to_replace: string/integer/regex, default=""
      • Parameter to pass a value to replace in the DataFrame
    • value: string/integer/regex, default=np.nan
      • Parameter to pass a new value that replaces an old value in the DataFrame
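Two of the cleaning methods above, "fillmissing" (forward filling) and "outliers" (IQR method), are standard techniques that can be shown without any dependencies. The sketch below is a minimal plain-Python version of each, not quickda's own code. The function names, the fence multiplier k=1.5 (the conventional choice) and the quartile interpolation method are assumptions of this example.

```python
from statistics import quantiles


def forward_fill(values):
    """Replace each None with the most recent non-missing value before it."""
    filled, last_seen = [], None
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)  # leading gaps stay None: nothing to carry forward
    return filled


def remove_outliers_iqr(values, k=1.5):
    """Keep only values inside the fences [Q1 - k*IQR, Q3 + k*IQR]."""
    # "inclusive" interpolates linearly between sorted data points.
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if lower <= v <= upper]


readings = [10, 12, 11, 13, 12, 95]
cleaned = remove_outliers_iqr(readings)  # 95 lies above the upper fence
filled = forward_fill([None, 1, None, None, 4, None])
```

For `readings`, Q1 is 11.25 and Q3 is 12.75, so the fences are 9.0 and 15.0 and only 95 is dropped. Forward filling cannot fill a gap at the very start of a column, which is why the first entry of `filled` stays None.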
  3. EDA Numerical Features - eda_num(data)

    • data: pd.DataFrame
    • method: string, default="default"
      • "default" : Shows all Outlier & Distribution Analysis via BoxPlots & Histograms
      • "correlation" : Gets the correlation matrix between all numerical features
    • bins: integer, default=10
      • Parameter to set the number of bins while displaying histograms
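To show what a correlation matrix between numerical features contains, here is a small plain-Python sketch. It assumes the Pearson coefficient, which the text above does not specify, and the helper names and sample columns are invented for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping column name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


columns = {
    "height": [1, 2, 3, 4],
    "weight": [2, 4, 6, 8],   # rises with height
    "score": [4, 3, 2, 1],    # falls as height rises
}
matrix = correlation_matrix(columns)
```

The matrix is symmetric, with values close to 1 on the diagonal. A constant column has zero spread and would cause a division by zero in this sketch, so such columns would need to be filtered out first.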
  4. EDA Categorical Features - eda_cat(data, x)

    • data: pd.DataFrame
    • x: string, First Categorical Type Column Name
    • y: string, default=None
      • Parameter to pass the Second Categorical Type Column Name
    • method: string, default="default"
      • "default" : Shows category count plot & summarizes it in a frequency table
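The frequency table mentioned above is a count of rows per category. The sketch below builds one in plain Python. It is illustrative only: the function name is hypothetical, and treating the optional second column y as a cross-tabulation of (x, y) pairs is an assumption, since the text does not say how quickda uses y.

```python
from collections import Counter


def frequency_table(rows, x, y=None):
    """Count rows per category of x, or per (x, y) pair when y is given."""
    if y is None:
        return Counter(row[x] for row in rows)
    return Counter((row[x], row[y]) for row in rows)


rows = [
    {"plan": "free", "region": "north"},
    {"plan": "free", "region": "south"},
    {"plan": "paid", "region": "north"},
    {"plan": "free", "region": "north"},
]
by_plan = frequency_table(rows, "plan")
by_plan_and_region = frequency_table(rows, "plan", "region")
```

A Counter returns 0 for combinations that never occur, so empty cells of the cross-tabulation can be looked up without extra checks.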
  5. EDA Numerical with Categorical Features - eda_numcat(data, x, y)

    • data: pd.DataFrame
    • x: string/list, Numeric/Categorical Type Column Name(s)
    • y: string/list, Numeric/Categorical Type Column Name(s)
    • method: string, default="pps"
      • "pps" : Calculates Predictive Power Score Matrix
      • "relationship" : Shows Scatterplot of given features
      • "comparison" : Shows violin plots to compare categories across numerical features
      • "pivot" : Generates pivot table using column names, values and aggregation function
    • hue: string, default=None
      • Parameter to visualise a categorical Type feature within scatterplots
    • values: string/list, default=None
      • Parameter to set columns to aggregate on pivot views
    • aggfunc: string, default="mean"
      • Parameter to set aggregate functions on pivot tables
      • Example: 'min', 'max', 'mean', 'median', 'sum', 'count'
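The "pivot" method combines column names, a values column and an aggregation function. The following plain-Python sketch shows how those pieces fit together. It is not quickda's implementation: the function name is hypothetical, and mapping x to the rows and y to the columns of the table is an assumption of this example.

```python
from collections import defaultdict
from statistics import mean, median

# The aggregation names listed in the text, mapped to plain-Python callables.
AGGREGATORS = {
    "min": min,
    "max": max,
    "mean": mean,
    "median": median,
    "sum": sum,
    "count": len,
}


def pivot(rows, x, y, values, aggfunc="mean"):
    """Aggregate `values` for every (x, y) combination found in `rows`."""
    groups = defaultdict(list)
    for row in rows:
        groups[(row[x], row[y])].append(row[values])

    aggregate = AGGREGATORS[aggfunc]
    table = defaultdict(dict)
    for (x_key, y_key), group in groups.items():
        table[x_key][y_key] = aggregate(group)
    return dict(table)


rows = [
    {"region": "north", "product": "a", "sales": 10},
    {"region": "north", "product": "a", "sales": 20},
    {"region": "north", "product": "b", "sales": 5},
    {"region": "south", "product": "a", "sales": 7},
]
mean_sales = pivot(rows, "region", "product", "sales")
total_sales = pivot(rows, "region", "product", "sales", aggfunc="sum")
```

Combinations with no rows, such as south/b here, are simply absent from the result rather than filled with a placeholder.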
  6. EDA Time Series Data - eda_timeseries(data, x, y)

    • data: pd.DataFrame
    • x: string, Datetime Type Column Name
    • y: string, Numeric Type Column Name

Download files

Source Distribution

quickda-0.1.1.tar.gz (5.8 kB view details)

Uploaded Source

Built Distribution

quickda-0.1.1-py3-none-any.whl (8.3 kB view details)

Uploaded Python 3

File details

Details for the file quickda-0.1.1.tar.gz.

File metadata

  • Download URL: quickda-0.1.1.tar.gz
  • Size: 5.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.4

File hashes

Hashes for quickda-0.1.1.tar.gz
Algorithm Hash digest
SHA256 0e390488667911fc65fab0059e6fa44c9bee34e9ae7d2bf9cb7ca30038fa30f3
MD5 07d564d4f290e7199a7af0f113b6e771
BLAKE2b-256 0c05cdacc9605d342928b94308263c62ffedc1dfa49c293fc835084e7ef3209f

File details

Details for the file quickda-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: quickda-0.1.1-py3-none-any.whl
  • Size: 8.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.4.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.4

File hashes

Hashes for quickda-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 56a00122fd3dc74e78526cca38e3b7184ed617fde8d759a9b8e54b70f8f8b24c
MD5 0f40ebabea9cda9fd80abe67249a0b00
BLAKE2b-256 559ac8d1b97b75c55d93805dbb2ce22ac49df3da23e8bb9df6b4bfe5ba437a31
