A Python package that helps data scientists, machine learning engineers and analysts better understand their data. It gives quick insights into a dataset: general statistics, the shape of the dataset, the unique data types present, the number of numerical and non-numerical columns, missing data statistics and a missing data heatmap. It also provides a way to impute missing data.

Project description

datastand


Why datastand? Data + Understand
A Python package that helps data scientists, machine learning engineers and analysts better understand their data by giving quick insights into a given dataset.


Installation

Run the following command on the terminal to install the package:

pip install datastand

Usage:

Code:

from datastand import datastand
import pandas as pd

df = pd.read_csv("path/to/target/dataframe")

datastand(df)

Output:

General stats:
==================
Shape of DataFrame: (1202, 13)
Number of unique data types : {dtype('int64'), dtype('O')}
Number of numerical columns: 2
Number of non-numerical columns: 11


Missing data:
=======================
DataFrame contains 2670 missing values (17.09%) as follows column-wise:
-----------------------------------------------------------------------
Gender                 41
Car_Category          372
Subject_Car_Colour    697
Subject_Car_Make      248
LGA_Name              656
State                 656
dtype: int64
-----------------------------------------------------------------------

Do you wish to long-list missing data statistics?(y/n): y
.
.
.
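The figures in this output can be approximated with plain pandas. The sketch below is not datastand's implementation; `summarize` is a hypothetical helper and the sample frame is made up. It assumes the percentage is total missing cells divided by total cells, which matches the output above: 2670 / (1202 × 13) ≈ 17.09%.

```python
import pandas as pd


def summarize(df: pd.DataFrame) -> dict:
    """Hypothetical helper: collect the kinds of figures shown in the output above."""
    numeric_cols = df.select_dtypes(include="number").columns
    missing_per_column = df.isna().sum()
    total_cells = df.shape[0] * df.shape[1]
    total_missing = int(missing_per_column.sum())
    return {
        "shape": df.shape,
        "dtypes": set(df.dtypes),
        "n_numeric": len(numeric_cols),
        "n_non_numeric": df.shape[1] - len(numeric_cols),
        "total_missing": total_missing,
        # Assumption: percentage = missing cells / all cells
        "pct_missing": round(100 * total_missing / total_cells, 2) if total_cells else 0.0,
        # Only columns that actually contain missing values
        "missing_by_column": missing_per_column[missing_per_column > 0],
    }


# Small made-up frame, not the dataset from the output above
sample = pd.DataFrame(
    {
        "Age": [30, 41, 25, 52],
        "Gender": ["F", None, "M", None],
        "State": ["Lagos", "Abuja", None, "Kano"],
    }
)
stats = summarize(sample)
print(stats["shape"], stats["total_missing"], stats["pct_missing"])
print(stats["missing_by_column"])
```

Keeping the per-column counts as a Series makes it easy to print only the columns that need attention, as the output above does.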

Code:

# This function is available through the DataStand class and also as a standalone import.
# Here we call it on its own.
from datastand import plot_missing

plot_missing(df)

Output:

[Image: missing data heatmap]
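A missing data heatmap of this kind can be drawn from the boolean mask returned by `DataFrame.isna()`. The sketch below uses matplotlib directly and is only an illustration of the idea; it is not how `plot_missing` is implemented, and `missing_heatmap`, the file name and the sample frame are all hypothetical.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt
import pandas as pd


def missing_heatmap(df: pd.DataFrame, path: str):
    """Hypothetical helper: save a rows-by-columns image where dark cells are missing."""
    mask = df.isna().to_numpy().astype(int)  # 1 = missing, 0 = present
    fig, ax = plt.subplots(figsize=(6, 4))
    ax.imshow(mask, aspect="auto", cmap="gray_r", vmin=0, vmax=1, interpolation="nearest")
    ax.set_xticks(range(df.shape[1]))
    ax.set_xticklabels(list(df.columns), rotation=90)
    ax.set_ylabel("Row index")
    ax.set_title("Missing values (dark = missing)")
    fig.tight_layout()
    fig.savefig(path)
    plt.close(fig)
    return mask


# Small made-up frame for illustration
demo = pd.DataFrame(
    {
        "Age": [30, 41, 25, 52],
        "Gender": ["F", None, "M", None],
        "State": ["Lagos", "Abuja", None, "Kano"],
    }
)
out_path = os.path.join(tempfile.mkdtemp(), "missing_heatmap.png")
mask = missing_heatmap(demo, out_path)
print("Saved heatmap to", out_path)
```

Columns with long dark runs stand out immediately, which is useful for spotting fields that are missing together, such as LGA_Name and State in the output above (656 each).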

Code:

from datastand import impute_missing

impute_missing(df)

Output:

Imputing missing data...
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 80/80 [00:02<00:00, 30.52it/s]
Imputation complete.
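The description does not say which imputation strategy `impute_missing` applies. As a point of comparison, a very simple baseline can be written in pandas: fill numerical columns with the median and non-numerical columns with the most frequent value. This is an assumption chosen for illustration, and `simple_impute` is a hypothetical name, not part of datastand.

```python
import pandas as pd


def simple_impute(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical baseline: median for numerical columns, mode for the rest."""
    out = df.copy()  # leave the caller's frame untouched
    for col in out.columns:
        if not out[col].isna().any():
            continue
        if pd.api.types.is_numeric_dtype(out[col]):
            fill = out[col].median()
        else:
            modes = out[col].mode(dropna=True)
            if modes.empty:
                continue  # column is entirely missing; nothing to fill with
            fill = modes.iloc[0]
        out[col] = out[col].fillna(fill)
    return out


# Small made-up frame for illustration
raw = pd.DataFrame(
    {
        "Age": [30, None, 25, 52],
        "Gender": ["F", None, "F", "M"],
    }
)
filled = simple_impute(raw)
print(filled)
```

Working on a copy keeps the original frame available for comparing the distributions before and after imputation.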

Author/Maintainer

Vincent N. [LinkedIn] [Twitter]


Download files

Source Distribution

datastand-2.5.0.tar.gz (4.5 kB)

Uploaded Source

Built Distribution

datastand-2.5.0-py3-none-any.whl (5.2 kB)

Uploaded Python 3
