A Python package that helps data scientists, machine learning engineers, and analysts better understand their data. It gives quick insights into a dataset: general statistics, the shape of the dataset, the unique data types present, the number of numerical and non-numerical columns, missing-data statistics, and a missing-data heatmap. It also provides a method for imputing missing data.
datastand
Why datastand? Data + Understand
A Python package that helps data scientists, machine learning engineers, and analysts better understand data by giving quick insights into a given dataset.
Installation
Run the following command in a terminal to install the package:
pip install datastand
Usage
Code:
from datastand import datastand
import pandas as pd
df = pd.read_csv("path/to/target/dataframe")
datastand(df)
Output:
General stats:
==================
Shape of DataFrame: (1202, 13)
Number of unique data types : {dtype('int64'), dtype('O')}
Number of numerical columns: 2
Number of non-numerical columns: 11
Missing data:
=======================
DataFrame contains 2670 missing values (17.09%) as follows column-wise:
-----------------------------------------------------------------------
Gender 41
Car_Category 372
Subject_Car_Colour 697
Subject_Car_Make 248
LGA_Name 656
State 656
dtype: int64
-----------------------------------------------------------------------
Do you wish to long-list missing data statistics?(y/n): y
.
.
.
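The figures in this report can be reproduced with plain pandas, which may help when checking what the numbers mean. The sketch below is not datastand's implementation; the function name summarize and the toy DataFrame are made up for illustration. The percentage is taken as missing cells over total cells, which matches the sample above (2670 / (1202 × 13) ≈ 17.09%).

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Compute figures similar to the report above (hypothetical helper, not part of datastand)."""
    n_missing = int(df.isna().sum().sum())          # total missing cells
    n_cells = df.shape[0] * df.shape[1]             # rows x columns
    numeric_cols = df.select_dtypes(include="number").columns
    per_column = df.isna().sum()                    # missing count per column
    return {
        "shape": df.shape,
        "dtypes": set(df.dtypes),
        "n_numeric": len(numeric_cols),
        "n_non_numeric": df.shape[1] - len(numeric_cols),
        "n_missing": n_missing,
        "pct_missing": round(100 * n_missing / n_cells, 2) if n_cells else 0.0,
        # keep only the columns that actually have gaps
        "missing_by_column": per_column[per_column > 0],
    }


# Small made-up frame, for illustration only
toy = pd.DataFrame({
    "Age": [25, 40, 31, 58],
    "Gender": ["F", None, "M", None],
    "State": ["Lagos", "Abuja", None, "Kano"],
})
stats = summarize(toy)
print(stats["shape"], stats["n_missing"], stats["pct_missing"])
print(stats["missing_by_column"])
```

Returning a dictionary rather than printing keeps the numbers available for further checks, such as flagging columns above a chosen missing-data threshold.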
Code:
# plot_missing is available through the DataStand class and can also be imported on its own.
# Here we call it separately.
from datastand import plot_missing
plot_missing(df)
Output: a heatmap of missing values across the DataFrame (image not reproduced here).
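For a rough idea of what a missing-data heatmap looks like, a comparable picture can be drawn directly with pandas and matplotlib. This is only a sketch under assumptions: missing_heatmap is a hypothetical helper, and the colours and layout used by plot_missing may well differ.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def missing_heatmap(df: pd.DataFrame, path=None):
    """Draw one cell per DataFrame value, dark where the value is missing (hypothetical helper)."""
    mask = df.isna().to_numpy().astype(int)  # 1 = missing, 0 = present
    fig, ax = plt.subplots(figsize=(8, 4))
    ax.imshow(mask, aspect="auto", interpolation="nearest", cmap="gray_r")
    ax.set_xticks(range(df.shape[1]))
    ax.set_xticklabels(df.columns, rotation=90)
    ax.set_ylabel("Row index")
    ax.set_title("Missing values (dark = missing)")
    fig.tight_layout()
    if path is not None:
        fig.savefig(path)  # optionally write the figure to disk
    return ax


# Small made-up frame, for illustration only
toy = pd.DataFrame({
    "Age": [25, 40, 31, 58],
    "Gender": ["F", None, "M", None],
    "State": ["Lagos", "Abuja", None, "Kano"],
})
ax = missing_heatmap(toy)
```

Columns with long vertical dark bands are mostly empty, while scattered dark cells suggest sporadic gaps.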
Code:
from datastand import impute_missing
impute_missing(df)
Output:
Imputing missing data...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:02<00:00, 30.52it/s]
Imputation complete.
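This page does not say which strategy impute_missing applies. As a point of comparison only, the sketch below shows one common baseline: fill numeric columns with the median and all other columns with the most frequent value. The function simple_impute is hypothetical and should not be read as the package's method.

```python
import pandas as pd


def simple_impute(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy with gaps filled: median for numeric columns, mode otherwise.

    Hypothetical baseline for illustration; not datastand's algorithm.
    """
    out = df.copy()  # leave the caller's DataFrame untouched
    for col in out.columns:
        if not out[col].isna().any():
            continue  # nothing to fill in this column
        if pd.api.types.is_numeric_dtype(out[col]):
            fill = out[col].median()
        else:
            modes = out[col].mode(dropna=True)
            if modes.empty:
                continue  # column is entirely missing; nothing to infer
            fill = modes.iloc[0]
        out[col] = out[col].fillna(fill)
    return out


# Small made-up frame, for illustration only
toy = pd.DataFrame({
    "Age": [25, None, 31, 40],
    "Gender": ["F", "F", None, "M"],
})
filled = simple_impute(toy)
print(filled)
```

Working on a copy makes it easy to compare the original and imputed frames side by side before committing to the filled values.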
Author/Maintainer
Vincent N. [LinkedIn] [Twitter]