clean_assist is a simple library designed to help data scientists observe a descriptive summary of their DataFrame

Project description

Clean Assist

Clean Assist is a simple library designed to help data scientists observe a summary of any DataFrame they would like to clean. It also displays charts that show how closely each variable approximates a normal distribution.

Clean Assist is composed of two functions:

clean_assist.table(df, n_rows, n_round)

Displays a summary of each variable to help with data cleaning and analysis.

Parameters
df : DataFrame you would like to analyze
n_rows : Number of variables to display
n_round : Number of decimals to round calculations
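
To make the summary columns concrete, here is a minimal sketch of how per-variable figures such as NULLS, COUNT, MEAN, MEDIAN and UNIQUES can be computed. It is not the library's implementation: the function name summarize, the toy data, and the use of plain Python lists (with None standing in for a null) instead of a DataFrame are all assumptions made to keep the example dependency-free.

```python
from statistics import mean, median

def summarize(columns, n_round=1):
    """Return one summary row per column of a {name: list_of_values} mapping.

    Hypothetical helper: None stands in for a null value in this sketch.
    """
    rows = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        # Only compute mean and median when every non-null value is a number.
        numeric = bool(present) and all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in present
        )
        rows.append({
            "VARIABLES": name,
            "NULLS": len(values) - len(present),
            "COUNT": len(present),
            "MEAN": round(mean(present), n_round) if numeric else None,
            "MEDIAN": round(median(present), n_round) if numeric else None,
            "UNIQUES": len(set(present)),
        })
    return rows

# Hypothetical toy data, not taken from the library's example dataset.
toy = {
    "rating": [3, 3, None, 2, 4],
    "revenue": [1880.0, 1495.0, 2572.5, 1647.0, 1923.0],
}
for row in summarize(toy, n_round=1):
    print(row)
```

In this sketch n_round plays the same role as the n_round parameter described above: it only affects how the calculated values are rounded for display.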

clean_assist.normality(df, list_var, print_img, size_x, size_y, font_size)

Displays histograms to compare your variables to a normal distribution.

Parameters
df        : DataFrame you would like to analyze
list_var  : Names of the columns to analyze, in a list format
print_img : Input 'y' to print the image or 'n' to not print it
size_x    : Width of the image output
size_y    : Height of the image output
font_size : Font size of the titles and headers
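
The idea of comparing a histogram to a normal distribution can be sketched without any plotting. The example below is an illustration only, not the library's code: the function name normal_comparison and the choice of five bins are assumptions. It counts the observed values per bin and sets them next to the counts that a normal distribution with the same mean and standard deviation would predict.

```python
from statistics import NormalDist, mean, stdev

def normal_comparison(values, n_bins=5):
    """Return (start, end, observed, expected) for each histogram bin.

    Hypothetical helper. Assumes the values are not all identical,
    because the fitted standard deviation must be greater than zero.
    """
    fitted = NormalDist(mean(values), stdev(values))
    low, high = min(values), max(values)
    width = (high - low) / n_bins
    # Compute the bin edges once so every value falls into exactly one bin.
    edges = [low + i * width for i in range(n_bins)] + [high]
    rows = []
    for i in range(n_bins):
        start, end = edges[i], edges[i + 1]
        last = i == n_bins - 1
        # Bins are half-open [start, end); the last bin also includes `high`.
        observed = sum(
            1 for v in values if start <= v < end or (last and v == end)
        )
        expected = len(values) * (fitted.cdf(end) - fitted.cdf(start))
        rows.append((start, end, observed, expected))
    return rows

# Ten values copied from the SAMPLE column of AVG_CLICKS_PER_VISIT in the
# example table of this description; this is not the full column.
clicks = [11, 13, 12, 13, 13, 17, 10, 13, 12, 12]
for start, end, observed, expected in normal_comparison(clicks):
    print(f"{start:5.1f} to {end:5.1f}  observed={observed}  expected={expected:.1f}")
```

Large gaps between the observed and expected counts in a bin are what a histogram overlaid with a normal curve makes visible at a glance.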

To import the library, paste the following code into your Python code:

# Download the library source from GitHub and run it in the current session
import requests
url = 'https://raw.githubusercontent.com/juanduranc/Clean-Assist/master/library'
exec(requests.get(url).text)
# Show the built-in documentation for clean_assist
help(clean_assist)

Example of library usage and interpretation:

1. The following table is a sample output from the function clean_assist.table(df, n_rows, n_round):

VARIABLES	NULLS	COUNT	TYPES	MEAN	MEDIAN	UNIQUES	SAMPLE_________________________________	Outliers	pval(Norm)
AVG_CLICKS_PER_VISIT	0	1946	int64	13.5	13.0	15	[11, 13, 12, 13, 13, 17, 10, 13, 12, 12]	[6,0]	0.03
MEDIAN_MEAL_RATING	47	1899	int64	2.8	3.0	5	[3, 3, 3, 3, 3, 2, 4, 3, 3, 3]	[0,13]	3e-06
REVENUE	0	1946	float64	2107.3	1740.0	859	[1880, 1495, 2572.5, 1647, 1923, 1250]	[0,82]	1e-21
TOTAL_PHOTOS_VIEWED	0	1946	int64	106.4	0.0	371	[0, 90, 0, 0, 253, 0, 705, 0, 0, 0]	[0,120]	5e-90
CROSS_SELL_SUCCESS	0	1946	int64	0.7	1.0	2	[1, 1, 1, 0, 1, 1, 0, 1, 1, 1]		1e-159

Examples of findings:

AVG_CLICKS_PER_VISIT has a similar mean and median, roughly approximates a normal distribution, and has 6 lower outliers.
MEDIAN_MEAL_RATING has 47 nulls, which need imputation.
REVENUE is the only float variable; the rest are integers.
TOTAL_PHOTOS_VIEWED has a median of 0 and 120 upper outliers. A median of 0 means at least half of the observations have no photo views.
CROSS_SELL_SUCCESS has 2 unique values. The SAMPLE column shows only ones and zeros, so this is a binary (boolean) column.
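
The Outliers column reads as a [lower, upper] pair of counts, judging by the findings above. The description does not say which rule the library uses to flag an outlier, so the sketch below assumes the common 1.5 × IQR fences; the function name count_outliers and the toy data are hypothetical.

```python
from statistics import quantiles

def count_outliers(values, k=1.5):
    """Return [lower, upper] outlier counts using k * IQR fences.

    Assumption: the 1.5 * IQR rule is a stand-in, since the rule used
    by clean_assist is not documented in this description.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low_fence = q1 - k * iqr
    high_fence = q3 + k * iqr
    lower = sum(1 for v in values if v < low_fence)
    upper = sum(1 for v in values if v > high_fence)
    return [lower, upper]

# Hypothetical toy data with one unusually large value.
toy = [10, 11, 12, 13, 14, 15, 16, 17, 18, 100]
print(count_outliers(toy))
```

A different rule (for example a z-score threshold) would give different counts, so treat the numbers from this sketch as illustrative rather than as a reproduction of the table.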

2. Next, a sample output from the function: clean_assist.normality(df, list_var, print_img, size_x, size_y, font_size)

Interpretation of the histograms:

MEDIAN_MEAL_RATING has integer values and mimics a normal distribution.
AVG_CLICKS_PER_VISIT is the closest variable to a normal distribution, with a p-value of 0.03.
REVENUE is right skewed with 82 upper outliers.
TOTAL_PHOTOS_VIEWED has too many zero values. It is also right skewed and far from being a normal distribution.
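
Right skew can also be checked numerically. As a rough sketch (not part of clean_assist; the function name skewness is an assumption), the moment-based skewness coefficient is positive for a right-skewed variable, and the mean sits above the median. The ten values below are the ones shown in the SAMPLE column for TOTAL_PHOTOS_VIEWED, not the full column.

```python
from statistics import mean, median

def skewness(values):
    """Moment-based skewness: third central moment over variance ** 1.5.

    Hypothetical helper. Assumes the values are not all identical.
    """
    m = mean(values)
    n = len(values)
    m2 = sum((v - m) ** 2 for v in values) / n
    m3 = sum((v - m) ** 3 for v in values) / n
    return m3 / m2 ** 1.5

# Values copied from the SAMPLE column of TOTAL_PHOTOS_VIEWED.
photos = [0, 90, 0, 0, 253, 0, 705, 0, 0, 0]
print("mean:", mean(photos))
print("median:", median(photos))
print("skewness is positive:", skewness(photos) > 0)
```

A value near zero suggests a roughly symmetric variable, while a clearly positive value matches the long right tail seen in the histogram.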

Project details

Release history

This version: 1.3.4 (Feb 18, 2020)

Download files

Source Distribution

cleanassist-1.3.4.tar.gz (5.2 kB)

Uploaded Feb 18, 2020 Source

File details

Details for the file cleanassist-1.3.4.tar.gz.

File metadata

Download URL: cleanassist-1.3.4.tar.gz
Upload date: Feb 18, 2020
Size: 5.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.22.0 setuptools/45.1.0.post20200127 requests-toolbelt/0.9.1 tqdm/4.42.0 CPython/3.7.6

File hashes

Hashes for cleanassist-1.3.4.tar.gz
Algorithm	Hash digest
SHA256	`ace10a0d3b3d5191502289899f3831476c24331d1b394aaa777e4e74037d9d7e`
MD5	`911ec0f6026f93fef59a1fe7c49836a9`
BLAKE2b-256	`e03c5b9dbf213d7bf58c96607fa14ba0f6c198eecee4b3f7fa88dcf4bce10c7b`
