A Python package that helps data scientists, machine learning engineers, and analysts better understand their data. It gives quick insights into a given dataset: general statistics, the shape of the dataset, the unique data types present, the number of numerical and non-numerical columns, missing-data statistics, and a missing-data heatmap. It also provides a way to impute missing data.

datastand


Why datastand? Data + Understand
A Python package that helps data scientists, machine learning engineers, and analysts better understand their data by giving quick insights into a given dataset.


Installation

Run the following command in a terminal to install the package:

pip install datastand

Usage

Code:

from datastand import datastand
import pandas as pd

df = pd.read_csv("path/to/target/dataframe")

datastand(df)

Output:

General stats:
==================
Shape of DataFrame: (1202, 13)
Number of unique data types : {dtype('int64'), dtype('O')}
Number of numerical columns: 2
Number of non-numerical columns: 11


Missing data:
=======================
DataFrame contains 2670 missing values (17.09%) as follows column-wise:
-----------------------------------------------------------------------
Gender                 41
Car_Category          372
Subject_Car_Colour    697
Subject_Car_Make      248
LGA_Name              656
State                 656
dtype: int64
-----------------------------------------------------------------------

Do you wish to long-list missing data statistics?(y/n): y
.
.
.
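
For readers who want to see where these figures come from, the same kinds of statistics can be reproduced with plain pandas. The sketch below is not datastand's implementation; `summarize` and the toy DataFrame are hypothetical names and made-up data used only for illustration. It assumes the percentage is the share of missing cells over all cells, which is consistent with the output above (2670 / (1202 x 13) is about 17.09%).

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Collect the kinds of figures shown in the report above, using plain pandas.

    Hypothetical helper for illustration; not part of datastand.
    """
    numeric_cols = df.select_dtypes(include="number").columns
    missing_per_column = df.isna().sum()
    total_missing = int(missing_per_column.sum())
    return {
        "shape": df.shape,
        "dtypes": set(df.dtypes),
        "n_numerical": len(numeric_cols),
        "n_non_numerical": df.shape[1] - len(numeric_cols),
        "total_missing": total_missing,
        # Assumption: percentage of all cells (rows x columns) that are missing.
        "pct_missing": round(100 * total_missing / df.size, 2) if df.size else 0.0,
        # Keep only the columns that actually have gaps.
        "missing_by_column": missing_per_column[missing_per_column > 0],
    }


# Tiny made-up frame, for illustration only.
toy = pd.DataFrame(
    {
        "Age": [30, 41, 25, 38],
        "Gender": ["F", None, "M", None],
        "State": ["Lagos", "Abuja", None, "Kano"],
    }
)
report = summarize(toy)
print(report["shape"], report["total_missing"], report["pct_missing"])
print(report["missing_by_column"])
```

Returning a dictionary instead of printing keeps the numbers available for further checks, for example to fail a pipeline when the missing share exceeds a threshold.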

Code:

# plot_missing is already available in the DataStand class and can also be imported separately.
# Here we run it on its own.
from datastand import plot_missing

plot_missing(df)

Output:

[Figure: missing data heatmap]
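
A missing-data heatmap is essentially a picture of the boolean mask `df.isna()`: one cell per value, shaded where the value is absent. The sketch below shows one way to build and draw such a mask. It is an assumption about the general technique, not a description of how `plot_missing` is written; `missing_mask`, `plot_missing_sketch`, and the output file name are hypothetical, and the plotting part assumes matplotlib is installed.

```python
import numpy as np
import pandas as pd


def missing_mask(df: pd.DataFrame) -> np.ndarray:
    """Return a rows x columns array with 1 where a value is missing, else 0."""
    return df.isna().to_numpy().astype(int)


def plot_missing_sketch(df: pd.DataFrame, path: str = "missing_heatmap.png") -> None:
    """Hypothetical helper: draw the mask as an image and save it to `path`."""
    # Imported here so missing_mask stays usable without matplotlib.
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots(figsize=(8, 4))
    # Missing cells are drawn dark, present cells light.
    ax.imshow(missing_mask(df), aspect="auto", interpolation="nearest", cmap="gray_r")
    ax.set_xticks(range(df.shape[1]))
    ax.set_xticklabels(df.columns, rotation=90)
    ax.set_ylabel("Row")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)


# Tiny made-up frame, for illustration only.
toy = pd.DataFrame({"Gender": ["F", None, "M"], "State": [None, None, "Kano"]})
mask = missing_mask(toy)
print(mask)
```

Calling `plot_missing_sketch(toy)` would write the image to disk; columns with long dark runs are the ones worth inspecting first.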

Code:

from datastand import impute_missing

impute_missing(df)

Output:

Imputing missing data...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:02<00:00, 30.52it/s]
Imputation complete.
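
This page does not state which imputation strategy `impute_missing` applies, or whether it modifies the DataFrame in place. As a point of comparison only, here is a minimal sketch of a common baseline: fill numeric columns with the median and all other columns with the most frequent value. `impute_simple` is a hypothetical name, and this strategy is an assumption, not datastand's documented method.

```python
import pandas as pd


def impute_simple(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with numeric gaps filled by the median, other gaps by the mode.

    Hypothetical baseline for illustration; not datastand's implementation.
    """
    out = df.copy()
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            fill = out[col].median()
        else:
            modes = out[col].mode(dropna=True)
            if modes.empty:  # column is entirely missing; nothing to infer from
                continue
            fill = modes.iloc[0]
        out[col] = out[col].fillna(fill)
    return out


# Tiny made-up frame, for illustration only.
toy = pd.DataFrame(
    {
        "Age": [30.0, None, 50.0, 40.0],
        "Colour": ["Red", None, "Red", "Blue"],
    }
)
filled = impute_simple(toy)
print(filled)
```

The sketch returns a new DataFrame and leaves the input untouched, which makes it easy to compare the data before and after imputation.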

Author/Maintainer

Vincent N.
