The quickstatandeda package aims to generate visuals and a main HTML file for any Python pandas DataFrame.

Project description

quickstatandeda

quickstatandeda is a Python library for quick, automatic exploratory data analysis (EDA) and preliminary statistical analysis. The main function, edaFeatures(), produces a folder of visualizations and an HTML file that contains all of the analyses. The library is built on mainstream libraries such as numpy, pandas, scipy, statsmodels, matplotlib, and seaborn.

Make sure the columns of your input DataFrame have the correct data types before running the analysis! Use pd.to_datetime() and astype() to convert them. Here is a simple example:

import pandas as pd

x = pd.read_csv('xxx.csv')

# Text, numeric, boolean, and categorical columns
x['string_column'] = x['string_column'].astype('string')
x['int_column'] = x['int_column'].astype('int')  # raises if the column has missing values; use 'Int64' instead
x['float_column'] = x['float_column'].astype('float')
# read_csv usually parses True/False automatically; this handles columns left as text
x['binary_column'] = x['binary_column'].replace({'True':True, 'False':False}).astype('bool')
x['categorical_column'] = x['categorical_column'].astype('category')

# Dates and datetimes
x['date_time_column'] = pd.to_datetime(x['date_time_column'])
x['date_column'] = pd.to_datetime(x['date_column'])
x['datetime_column'] = pd.to_datetime(x['datetime_column'])
# Timezone-aware copy of datetime_column, assuming its naive timestamps are in UTC
x['datetime_tz_column'] = x['datetime_column'].dt.tz_localize('UTC')
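Since xxx.csv is only a placeholder, here is a self-contained sketch of the same conversions on a small, made-up in-memory DataFrame, followed by a look at the resulting dtypes. The sample data is hypothetical. It uses Series.map() for the boolean column, which converts 'True'/'False' strings without relying on replace() downcasting.

```python
import pandas as pd

# Hypothetical sample data standing in for pd.read_csv('xxx.csv')
x = pd.DataFrame({
    'string_column': ['u', 'v', 'w'],
    'int_column': ['1', '2', '3'],
    'float_column': ['1.5', '2.0', '3.25'],
    'binary_column': ['True', 'False', 'True'],
    'categorical_column': ['low', 'high', 'low'],
    'datetime_column': ['2024-01-01 08:00', '2024-01-02 09:30', '2024-01-03 12:15'],
})

x['string_column'] = x['string_column'].astype('string')
x['int_column'] = x['int_column'].astype('int')
x['float_column'] = x['float_column'].astype('float')
# Values other than 'True'/'False' would become NaN, which astype('bool') turns into True
x['binary_column'] = x['binary_column'].map({'True': True, 'False': False}).astype('bool')
x['categorical_column'] = x['categorical_column'].astype('category')
x['datetime_column'] = pd.to_datetime(x['datetime_column'])
# Assumption: the naive timestamps are in UTC
x['datetime_tz_column'] = x['datetime_column'].dt.tz_localize('UTC')

# Inspect the final dtypes before passing x to the analysis
print(x.dtypes)
```

Checking x.dtypes (or x.info()) before the analysis is a cheap way to catch columns that were silently left as generic object columns.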

Note that t tests are run only for binary variables, that is, columns with the object data type that have exactly two unique values. If a categorical variable has more than two unique values, use pd.get_dummies() and loc[] to convert it into binary columns. Here is a simple example:

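import pandas as pd

df = pd.DataFrame({
    'a': ['a', 'b', 'c']
    })

# One 0/1 indicator column per category: a_a, a_b, a_c
df = pd.get_dummies(data=df, dtype=int)

# Turn the indicator for category 'a' into a two-valued object column
df['a_a'] = df['a_a'].astype('object')
df.loc[df['a_a'] == 1, 'a_a'] = 'a'
df.loc[df['a_a'] == 0, 'a_a'] = 'not a'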
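To convert every category at once, the sketch below builds one "category vs. not category" object column per level and checks each against the binary rule stated above (object dtype, exactly two unique values). Both helpers, one_vs_rest and is_binary_object, are hypothetical illustrations and not part of quickstatandeda; the '<column>_<category>' naming simply mimics pd.get_dummies().

```python
import pandas as pd


def is_binary_object(s: pd.Series) -> bool:
    """The rule described above: object dtype with exactly two unique values."""
    return s.dtype == object and s.nunique(dropna=True) == 2


def one_vs_rest(df: pd.DataFrame, column: str) -> pd.DataFrame:
    """Hypothetical helper: one '<category>' / 'not <category>' column per level.

    New columns are named '<column>_<category>'. Missing values end up in the
    'not <category>' group because NaN never equals a category.
    """
    out = pd.DataFrame(index=df.index)
    for category in df[column].dropna().unique():
        flag = df[column] == category
        labels = flag.map({True: str(category), False: f'not {category}'})
        out[f'{column}_{category}'] = labels.astype(object)
    return out


df = pd.DataFrame({'a': ['a', 'b', 'c', 'a']})
binary = one_vs_rest(df, 'a')
print(binary)
print({col: is_binary_object(binary[col]) for col in binary})
```

The resulting columns can be joined back onto the original DataFrame (for example with df.join(binary)) before running the analysis.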

Installation

Use the package manager pip to install quickstatandeda.

python3 -m pip install quickstatandeda

If you run into version conflicts, try installing into a fresh virtual environment, or run pip install --upgrade <package_name> to upgrade the conflicting package.
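When pip reports version conflicts, it helps to first see what is actually installed in the active environment. Here is a small standard-library sketch using importlib.metadata (Python 3.8+). The package list is taken from the libraries named at the top of this page, not from quickstatandeda's own requirements metadata, and installed_versions is a hypothetical helper.

```python
from importlib.metadata import PackageNotFoundError, version

# Libraries this page says quickstatandeda builds on (assumed distribution names)
REQUIRED = ['numpy', 'pandas', 'scipy', 'statsmodels', 'matplotlib', 'seaborn']


def installed_versions(names):
    """Map each distribution name to its installed version, or None if missing."""
    found = {}
    for name in names:
        try:
            found[name] = version(name)
        except PackageNotFoundError:
            found[name] = None
    return found


for name, ver in installed_versions(REQUIRED + ['quickstatandeda']).items():
    print(f'{name:16} {ver or "not installed"}')
```

Run it with the same interpreter you use for python3 -m pip so that both look at the same environment.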

Usage

Here is a simple example to generate an analysis report using the edaFeatures function:

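import pandas as pd
from quickstatandeda import edaFeatures

x = pd.read_csv('xxx.csv')                     # input DataFrame with correctly converted dtypes
y = 'target_column'                            # name of the target column
id_col = 'id_column_for_paired_t_test'         # ID column for paired t tests (avoids shadowing built-in id())
save_path = 'path_to_save_the_output_files'    # directory for the HTML file and the visuals folder
significant_level = 0.05                       # significance level for the statistical tests
file_name = 'name_of_the_output_html_file'     # name of the HTML report

edaFeatures(x, y, id_col, save_path, significant_level, file_name)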
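Because a full run can take a while, it may be worth validating the arguments before calling edaFeatures(). The check_eda_inputs helper below is hypothetical, not part of quickstatandeda, and only encodes assumptions drawn from this page: y and the ID column should be columns of x, significant_level is a probability, and the .html extension appears to be added to file_name automatically (per the output tree). Allowing the ID column to be None is also an assumption.

```python
import pandas as pd


def check_eda_inputs(x, y, id_col, save_path, significant_level, file_name):
    """Hypothetical pre-flight check for the arguments passed to edaFeatures().

    Returns a list of problems; an empty list means nothing obvious is wrong.
    save_path is accepted for signature parity but not checked here.
    """
    problems = []
    if not isinstance(x, pd.DataFrame):
        problems.append('x should be a pandas DataFrame')
    else:
        for label, col in (('y', y), ('id', id_col)):
            # Assumption: a None ID column means "no paired t test"
            if col is not None and col not in x.columns:
                problems.append(f'{label} column {col!r} not found in x')
    if not 0 < significant_level < 1:
        problems.append('significant_level should be strictly between 0 and 1')
    if file_name.endswith('.html'):
        # Assumption from the output tree: '.html' is appended to file_name
        problems.append('file_name probably should not include the .html extension')
    return problems


# Hypothetical toy input
x = pd.DataFrame({
    'target_column': [1.0, 2.5, 3.1],
    'id_column_for_paired_t_test': [1, 2, 3],
})
print(check_eda_inputs(x, 'target_column', 'id_column_for_paired_t_test',
                       'out', 0.05, 'report'))  # → []
```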

The output is structured as follows:

├── <file_name>.html
├── _visuals
│   ├── <plot1>.png
│   ├── <plot2>.png
│   ├── <plot3>.png
│   └── ...

The visuals folder is created automatically to store every plot used in the HTML output file. Both the HTML file and the visuals folder are written to the directory given by the save_path parameter.
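To pick up the report and plots programmatically after a run, here is a small pathlib sketch. It assumes the layout in the tree above (<save_path>/<file_name>.html plus a _visuals folder of PNG files). collect_outputs is a hypothetical helper, not part of the package, and the demo writes a fake output folder to a temporary directory instead of running edaFeatures().

```python
import tempfile
from pathlib import Path


def collect_outputs(save_path, file_name, visuals_dir='_visuals'):
    """Return (html_path, sorted PNG paths) for a finished run.

    Assumes <save_path>/<file_name>.html and <save_path>/<visuals_dir>/*.png.
    """
    root = Path(save_path)
    html = root / f'{file_name}.html'
    if not html.is_file():
        raise FileNotFoundError(html)
    pngs = sorted((root / visuals_dir).glob('*.png'))
    return html, pngs


# Demo on a fake output folder (hypothetical file names)
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, 'report.html').write_text('<html></html>')
    (Path(tmp) / '_visuals').mkdir()
    for name in ('hist.png', 'box.png'):
        (Path(tmp) / '_visuals' / name).write_bytes(b'')
    html, pngs = collect_outputs(tmp, 'report')
    names = (html.name, [p.name for p in pngs])

print(names)  # → ('report.html', ['box.png', 'hist.png'])
```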

Contributing

If you find a bug 🐛 or want to make a change, major or minor, please open an issue in the GitHub repository to discuss it. You are also more than welcome to contact me directly. For major changes, feel free to fork the project, make your changes, and submit a pull request.

A simple test file is provided in the test folder. After making changes, run pytest test/ from the repository root to test the package. The test run can take more than 8 minutes.

License

MIT


Download files

Source Distribution

quickstatandeda-0.1.11.tar.gz (15.3 kB)

Uploaded Source

Built Distribution

quickstatandeda-0.1.11-py3-none-any.whl (14.1 kB)

Uploaded Python 3

File details

Details for the file quickstatandeda-0.1.11.tar.gz.

File metadata

  • Download URL: quickstatandeda-0.1.11.tar.gz
  • Size: 15.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.9.19

File hashes

Hashes for quickstatandeda-0.1.11.tar.gz
Algorithm Hash digest
SHA256 42570dbc1adc0835eb19e527ef132c3df6c5936a1048b7935161dde6e79d62a6
MD5 8567eab0922f33d5c3f45ba51187ddef
BLAKE2b-256 2104e06e2564d87aa0aa07c196f202542070d6ef0ef2d037020be1a952291d78
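To check a downloaded file against the SHA256 digests listed here, a short hashlib sketch follows. It reads the file in chunks so large archives do not need to fit in memory; the local file path in the commented usage line is an assumption about where you saved the download.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 16):
    """Compute the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_hex):
    """True if the file's SHA-256 digest matches the published one."""
    return sha256_of(path) == expected_hex.lower()


# SHA256 for the source distribution, copied from the table above
EXPECTED_SDIST = '42570dbc1adc0835eb19e527ef132c3df6c5936a1048b7935161dde6e79d62a6'

# Assumes the sdist was downloaded into the current directory:
# print(verify('quickstatandeda-0.1.11.tar.gz', EXPECTED_SDIST))
```

pip can also enforce digests itself when they are pinned in a requirements file with --hash and installed with --require-hashes.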

File details

Details for the file quickstatandeda-0.1.11-py3-none-any.whl.

File hashes

Hashes for quickstatandeda-0.1.11-py3-none-any.whl
Algorithm Hash digest
SHA256 090260283a6c3ad796039eaea6a3ee6928b968792b223f771317856a638c433a
MD5 0a8420f3198885ad0f8ac43ee7d15857
BLAKE2b-256 b6e0fcfb19e0dc61b4ab0c97bd98415886e53d6022c673c72352c3f9f7e4c950

