A Python package to handle missing values in the dataset

Project description

Project MISSING VALUES

Name Kriti Pandey

Roll no 101703292

Group 3COE13

DESCRIPTION

Data can have missing values for a number of reasons, such as observations that were not recorded or data corruption. Handling missing data is important because many machine learning algorithms do not support data with missing values.

Some typical reasons why data is missing:

  1. User forgot to fill in a field.

  2. Data was lost while transferring manually from a legacy database.

  3. There was a programming error.

  4. Users chose not to fill out a field because of their beliefs about how the results would be used or interpreted.

Handling missing data takes two steps:

  1. Mark invalid or corrupt values as missing in your dataset.

  2. Impute the missing values with the mean value of their column.
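The two steps above can be sketched with pandas. This is a minimal illustration, not the package's own code: it assumes pandas and NumPy are installed, and the toy columns "a" and "b" and the choice of 0 as the invalid marker are hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data: 0 stands in for an invalid reading.
df = pd.DataFrame({
    "a": [1.0, 0.0, 3.0, 5.0],
    "b": [10.0, 20.0, 0.0, 0.0],
})

# Step 1: mark invalid or corrupt values as missing (NaN).
marked = df.replace(0.0, np.nan)

# Step 2: impute each missing value with the mean of its column.
# DataFrame.mean() skips NaN by default, so each mean is taken
# over the observed values only.
imputed = marked.fillna(marked.mean())

print(imputed)
```

Here column "a" is filled with 3.0 (the mean of 1, 3 and 5) and column "b" with 15.0 (the mean of 10 and 20).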

Installation

Use the package manager pip to install MissingValues_101703292.

pip install MissingValues_101703292

Usage

Pass the CSV filename, including the .csv extension, as the argument:

MissingValues_101703292 data.csv 
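How the package parses its argument is not documented here. As a rough sketch only, a command of this shape could validate its single filename argument as follows; the function name parse_csv_argument and the error message are hypothetical.

```python
import argparse
from pathlib import Path


def parse_csv_argument(argv):
    """Parse one positional filename and check that it ends in .csv."""
    parser = argparse.ArgumentParser(
        description="Fill missing values in a CSV file."
    )
    parser.add_argument(
        "filename", type=Path, help="input file, including the .csv extension"
    )
    args = parser.parse_args(argv)
    if args.filename.suffix.lower() != ".csv":
        # parser.error() prints a usage message and exits with status 2.
        parser.error(f"expected a .csv file, got: {args.filename}")
    return args.filename


print(parse_csv_argument(["data.csv"]))  # data.csv
```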

Sample dataset

0    1      2     3     4      5     6      7   8  9
0    6  148.0  72.0  35.0    NaN  33.6  0.627  50  1
1    1   85.0  66.0  29.0    NaN  26.6  0.351  31  0
2    8  183.0  64.0   NaN    NaN  23.3  0.672  32  1
3    1   89.0  66.0  23.0   94.0  28.1  0.167  21  0
4    0  137.0  40.0  35.0  168.0  43.1  2.288  33  1
5    5  116.0  74.0   NaN    NaN  25.6  0.201  30  0
6    3   78.0  50.0  32.0   88.0  31.0  0.248  26  1
7   10  115.0   NaN   NaN    NaN  35.3  0.134  29  0
8    2  197.0  70.0  45.0  543.0  30.5  0.158  53  1
9    8  125.0  96.0   NaN    NaN   NaN  0.232  54  1
10   4  110.0  92.0   NaN    NaN  37.6  0.191  30  0
11  10  168.0  74.0   NaN    NaN  38.0  0.537  34  1
12  10  139.0  80.0   NaN    NaN  27.1  1.441  57  0
13   1  189.0  60.0  23.0  846.0  30.1  0.398  59  1
14   5  166.0  72.0  19.0  175.0  25.8  0.587  51  1
15   7  100.0   NaN   NaN    NaN  30.0  0.484  32  1
16   0  118.0  84.0  47.0  230.0  45.8  0.551  31  1
17   7  107.0  74.0   NaN    NaN  29.6  0.254  31  1
18   1  103.0  30.0  38.0   83.0  43.3  0.183  33  0
19   1  115.0  70.0  30.0   96.0  34.6  0.529  32  1

Input

MissingValues_101703292 Sampledata.csv

Result

REPLACED MISSING VALUES WITH MEAN

    S No.   1    2     3     4       5      6      7   8  9
0       0   6  148  72.0  35.0  116.15  33.60  0.627  50  1
1       1   1   85  66.0  29.0  116.15  26.60  0.351  31  0
2       2   8  183  64.0  17.8  116.15  23.30  0.672  32  1
3       3   1   89  66.0  23.0   94.00  28.10  0.167  21  0
4       4   0  137  40.0  35.0  168.00  43.10  2.288  33  1
5       5   5  116  74.0  17.8  116.15  25.60  0.201  30  0
6       6   3   78  50.0  32.0   88.00  31.00  0.248  26  1
7       7  10  115  61.7  17.8  116.15  35.30  0.134  29  0
8       8   2  197  70.0  45.0  543.00  30.50  0.158  53  1
9       9   8  125  96.0  17.8  116.15  30.95  0.232  54  1
10     10   4  110  92.0  17.8  116.15  37.60  0.191  30  0
11     11  10  168  74.0  17.8  116.15  38.00  0.537  34  1
12     12  10  139  80.0  17.8  116.15  27.10  1.441  57  0
13     13   1  189  60.0  23.0  846.00  30.10  0.398  59  1
14     14   5  166  72.0  19.0  175.00  25.80  0.587  51  1
15     15   7  100  61.7  17.8  116.15  30.00  0.484  32  1
16     16   0  118  84.0  47.0  230.00  45.80  0.551  31  1
17     17   7  107  74.0  17.8  116.15  29.60  0.254  31  1
18     18   1  103  30.0  38.0   83.00  43.30  0.183  33  0
19     19   1  115  70.0  30.0   96.00  34.60  0.529  32  1
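The fill values in the Result table can be checked by hand. Taking column 5 as an example, the sketch below reproduces 116.15 as the sum of the nine observed values divided by all 20 rows; a mean taken over the observed values alone would be about 258.11. This is an observation drawn from the published numbers, not a statement about the package's source code.

```python
import math

nan = math.nan

# Column 5 of the sample dataset, copied from the table (NaN = missing).
column_5 = [nan, nan, nan, 94.0, 168.0, nan, 88.0, nan, 543.0, nan,
            nan, nan, nan, 846.0, 175.0, nan, 230.0, nan, 83.0, 96.0]

observed = [v for v in column_5 if not math.isnan(v)]
total = sum(observed)

fill_over_all_rows = total / len(column_5)  # divides by all 20 rows
mean_of_observed = total / len(observed)    # divides by the 9 observed rows

print(round(fill_over_all_rows, 2))  # 116.15, the fill value shown in the Result table
print(round(mean_of_observed, 2))    # 258.11
```

The fill values for columns 3, 4 and 6 (61.7, 17.8 and 30.95) follow the same pattern: the sum of the observed values divided by 20.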

Constraint

Your CSV file should not contain categorical data.
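One way to check this constraint before running the command is to look for non-numeric columns with pandas. This is a sketch under assumptions: pandas is installed, and the helper name non_numeric_columns and the toy frame are hypothetical.

```python
import pandas as pd


def non_numeric_columns(df: pd.DataFrame) -> list:
    """Return the labels of columns whose dtype is not numeric."""
    return list(df.select_dtypes(exclude="number").columns)


# Hypothetical frame: "city" is categorical, the other columns are numeric.
df = pd.DataFrame({
    "age": [31.0, None, 54.0],
    "city": ["Pune", "Delhi", None],
    "label": [0, 1, 1],
})

offending = non_numeric_columns(df)
print(offending)  # ['city']
```

Columns reported here would need to be dropped or encoded as numbers before imputing with the mean.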

License

MIT

Download files

Source Distribution

MissingValues_101703292-1.0.3.tar.gz (3.6 kB)

Built Distribution

MissingValues_101703292-1.0.3-py3-none-any.whl (4.6 kB)
