Utility package for generating insights for datasets

data-understand

Motivation

As data scientists and machine learning engineers, we routinely perform the same tasks on every new dataset: loading it into a pandas DataFrame, inspecting its columns and rows, visualizing the distribution of values, finding feature correlations, and checking whether the dataset is imbalanced. These tasks are repetitive and usually end up spread across multiple Jupyter notebooks, each managed separately and each holding its own path to the input dataset. What if a single tool could take the location of your dataset and generate this boilerplate logic for you, so that you can run it and learn the same insights about your dataset? All you need to do is install the tool in your local Python environment and run it from the command line.

Installation

You can install the data-understand package from PyPI using the following command:

pip install data-understand

Usage

Once you have installed the tool locally, you can view the options of the CLI tool:

data_understand -h
========================================================================================================================
========================================================================================================================
usage: data_understand [-h] [-f FILE_NAME] [-t TARGET_COLUMN] [-p] [-j]

data.understand CLI

options:
  -h, --help            show this help message and exit
  -f FILE_NAME, --file_name FILE_NAME
                        Directory path to CSV file
  -t TARGET_COLUMN, --target_column TARGET_COLUMN
                        Target column name
  -p, --generate_pdf    Generate PDF file for understanding of data
  -j, --generate_jupyter_notebook
                        Generate jupyter notebook file for understanding of data
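
If you want to call the tool from a Python script rather than a shell, the options above can be assembled into an argument list. The following is a minimal sketch, not part of data-understand itself: the helper names build_command and run_data_understand are hypothetical, and only the flags shown in the help text are used.

```python
import shlex
import subprocess
from typing import List, Optional


def build_command(file_name: str,
                  target_column: Optional[str] = None,
                  generate_pdf: bool = False,
                  generate_notebook: bool = False) -> List[str]:
    """Assemble an argument list from the flags listed in the help text.

    Hypothetical helper: it only builds the command, it does not run it.
    """
    cmd = ["data_understand", "--file_name", file_name]
    if target_column is not None:
        cmd += ["--target_column", target_column]
    if generate_pdf:
        cmd.append("--generate_pdf")
    if generate_notebook:
        cmd.append("--generate_jupyter_notebook")
    return cmd


def run_data_understand(cmd: List[str]) -> int:
    """Run the CLI and return its exit code.

    Assumes data-understand is installed and on PATH.
    """
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    command = build_command("adult_dataset.csv", "income", True, True)
    # Print the equivalent shell command instead of executing it.
    print(shlex.join(command))
```

Passing the arguments as a list avoids shell quoting problems when the file path or column name contains spaces.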

Notebook and PDF report generation

To generate both the PDF report and the Jupyter notebook, run the following CLI command:

data_understand --file_name adult_dataset.csv --target_column income --generate_pdf --generate_jupyter_notebook
========================================================================================================================
========================================================================================================================
The parsed arguments are:- 
file_name: adult_dataset.csv
target_column: income
generate_pdf: True
generate_jupyter_notebook: True
Time taken: 0.0 min 0.0012356000000863787 sec
========================================================================================================================
Generating PDF report and jupyter notebook
========================================================================================================================
Generating PDF report for the dataset in adult_dataset.csv
Successfully generated PDF report for the dataset in adult_dataset.csv at adult_dataset.csv.pdf
Time taken: 0.0 min 7.363417799999979 sec
========================================================================================================================
========================================================================================================================
Generating jupyter notebook for the dataset in adult_dataset.csv
Successfully generated jupyter notebook for the dataset in adult_dataset.csv at adult_dataset.csv.ipynb
Time taken: 0.0 min 0.053841799999986506 sec
========================================================================================================================
Successfully generated PDF report and jupyter notebook
Time taken: 0.0 min 7.485209299999951 sec
========================================================================================================================

This generates the Jupyter notebook and the PDF report in the same directory as your dataset. You can execute the cells in the notebook to generate insights and graphs on the fly, or read through the PDF report to learn about various aspects of your dataset.
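
After a run, a script may want to confirm that both files were written. The sketch below infers the output names from the sample log above, where the suffixes .pdf and .ipynb are appended to the full CSV file name (adult_dataset.csv.pdf, adult_dataset.csv.ipynb). That naming is an assumption drawn from the log, and the helpers expected_outputs and missing_outputs are hypothetical names, not part of the package.

```python
from pathlib import Path
from typing import Dict, List


def expected_outputs(file_name: str) -> Dict[str, Path]:
    """Return the output paths implied by the sample log.

    Assumption: the suffix is appended to the full CSV name,
    e.g. adult_dataset.csv -> adult_dataset.csv.pdf
    """
    return {
        "pdf": Path(file_name + ".pdf"),
        "notebook": Path(file_name + ".ipynb"),
    }


def missing_outputs(file_name: str) -> List[str]:
    """List the artifacts ("pdf", "notebook") that do not exist on disk."""
    return [kind for kind, path in expected_outputs(file_name).items()
            if not path.exists()]


if __name__ == "__main__":
    missing = missing_outputs("adult_dataset.csv")
    if missing:
        print("Missing outputs:", ", ".join(missing))
    else:
        print("Both the PDF report and the notebook are present.")
```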

Repos using data-understand to generate notebooks and PDF reports
