A Python package that helps data scientists, machine learning engineers, and analysts better understand their data. It gives quick insights into a dataset: general statistics, size and shape, the data types present, the number of numerical and non-numerical columns, a short overview of the dataset, missing-data statistics, a missing-data heatmap, and a way to impute missing data.
datastand
Why datastand? Data + Understand
A Python package that helps data scientists, machine learning engineers, and analysts better understand their data by giving quick insights into a given dataset.
Installation
Run the following command in a terminal to install the package:
pip install datastand
Usage
Code:
from datastand import datastand
import pandas as pd
df = pd.read_csv("path/to/target/dataframe")
datastand(df)
Output:
General stats:
______________
Size of DataFrame: 309200
Shape of DataFrame: (3865, 80)
Number of unique data types : {dtype('int64'), dtype('O'), dtype('float64')}
Number of numerical columns: 79
Number of non-numerical columns: 1
Missing data:
=======================
DataFrame contains 185698 missing values(60.06%) as follows column-wise:
-----------------------------------------------------------------------
galactic year 0
galaxy 0
existence expectancy index 1
existence expectancy at birth 1
Gross income per capita 28
...
Adjusted net savings 2953
Creature Immunodeficiency Disease prevalence, adult (% ages 15-49), total 2924
Private galaxy capital flows (% of GGP) 2991
Gender Inequality Index (GII) 3021
y 0
Length: 80, dtype: int64
-----------------------------------------------------------------------
Do you wish to long-list missing data statistics?(y/n): y
.
.
.
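For readers who want to see where figures like these come from, the sketch below recomputes the same kind of summary with plain pandas. It is an illustration only, not datastand's implementation: the `summarize` function, its dictionary keys, and the toy DataFrame are hypothetical names introduced here.

```python
import numpy as np
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Hypothetical helper: compute summary figures similar to the report above."""
    # Columns whose dtype is numeric (ints, floats); everything else is non-numerical.
    numeric_cols = df.select_dtypes(include="number").columns
    # Count of missing values in each column.
    missing_per_column = df.isna().sum()
    total_missing = int(missing_per_column.sum())
    return {
        "size": df.size,                      # rows * columns
        "shape": df.shape,                    # (rows, columns)
        "dtypes": set(df.dtypes),             # distinct dtypes present
        "n_numerical": len(numeric_cols),
        "n_non_numerical": df.shape[1] - len(numeric_cols),
        "total_missing": total_missing,
        "pct_missing": round(100 * total_missing / df.size, 2) if df.size else 0.0,
        "missing_per_column": missing_per_column,
    }


# Toy data, made up for this example.
toy = pd.DataFrame(
    {
        "galaxy": ["A", "B", "C", "D"],
        "year": [1, 2, 3, 4],
        "income": [1.5, np.nan, np.nan, 4.0],
    }
)
summary = summarize(toy)
print(summary["size"], summary["shape"], summary["total_missing"], summary["pct_missing"])
```

In the sample output above, the size 309200 is the product of the shape (3865 rows by 80 columns), and the 60.06% figure is the 185698 missing values divided by that size.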
Code:
# plot_missing is already available through the DataStand class
# and can also be imported on its own, as done here.
from datastand import plot_missing
plot_missing(df)
Output: (missing data heatmap; image not reproduced here)
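A missing-data heatmap of this kind can be approximated directly with pandas and matplotlib. The sketch below is an assumption about what such a plot might look like, not the code behind `plot_missing`; the `missing_heatmap` function and the toy DataFrame are hypothetical.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd


def missing_heatmap(df: pd.DataFrame, ax=None):
    """Hypothetical helper: draw one cell per value, dark where the value is missing."""
    if ax is None:
        _, ax = plt.subplots(figsize=(8, 4))
    # 1.0 where a value is missing, 0.0 where it is present.
    mask = df.isna().to_numpy().astype(float)
    # gray_r maps 0 to white and 1 to black, so missing cells show up dark.
    ax.imshow(mask, aspect="auto", cmap="gray_r", vmin=0, vmax=1, interpolation="none")
    ax.set_xticks(range(df.shape[1]))
    ax.set_xticklabels(df.columns, rotation=90)
    ax.set_xlabel("column")
    ax.set_ylabel("row")
    ax.set_title("Missing values (dark = missing)")
    return ax


# Toy data, made up for this example.
heatmap_df = pd.DataFrame(
    {
        "galaxy": ["A", "B", None, "D"],
        "income": [1.5, np.nan, np.nan, 4.0],
        "year": [1, 2, 3, 4],
    }
)
heatmap_ax = missing_heatmap(heatmap_df)
```

Call `plt.show()` (or save the figure) afterwards to view the result. Returning the Axes rather than showing the plot keeps the helper usable inside larger figures.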
Code:
from datastand import impute_missing
impute_missing(df)
Output:
Imputing missing data...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:02<00:00, 30.52it/s]
Imputation complete.
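This page does not say which imputation strategy `impute_missing` applies. As a point of comparison only, the sketch below shows one common baseline done by hand with pandas: fill numeric columns with their median and other columns with their most frequent value. The `simple_impute` function and the toy DataFrame are hypothetical and are not part of datastand.

```python
import numpy as np
import pandas as pd


def simple_impute(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical baseline: median for numeric columns, mode for the rest."""
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            fill = out[col].median()
        else:
            modes = out[col].mode(dropna=True)
            fill = modes.iloc[0] if not modes.empty else None
        # Skip columns that are entirely missing: there is no value to fill with.
        if fill is not None and not pd.isna(fill):
            out[col] = out[col].fillna(fill)
    return out


# Toy data, made up for this example.
impute_df = pd.DataFrame(
    {
        "galaxy": ["A", None, "A", "B"],
        "income": [1.0, np.nan, 3.0, 5.0],
    }
)
filled = simple_impute(impute_df)
print(filled)
```

Median and mode fills are easy to reason about but ignore relationships between columns, so they are best treated as a baseline rather than a recommendation.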
Author/Maintainer
Vincent N. [LinkedIn] [Twitter]