
A Python package to remove outliers from a dataset

Project description

Project: OUTLIER DETECTION AND REMOVAL

Name: Kriti Pandey

Roll no: 101703292

Group: 3COE13

DESCRIPTION

Outliers are extreme values that deviate from other observations in the data. They may indicate variability in a measurement, an experimental error or a novelty. Outliers can be of two kinds: univariate and multivariate. Univariate outliers can be found by looking at the distribution of values in a single feature space. Multivariate outliers can be found in an n-dimensional space (of n features): a point may look ordinary in every individual feature yet be unusual in combination.

Outliers can also come in different flavours, depending on the environment: point outliers, contextual outliers or collective outliers. Point outliers are single data points that lie far from the rest of the distribution. Contextual outliers can be noise in data, such as punctuation symbols when performing text analysis or background noise when doing speech recognition. Collective outliers can be subsets of novelties in data, such as a signal that may indicate the discovery of a new phenomenon.
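To make the univariate/multivariate distinction concrete, here is a minimal sketch using only the Python standard library. The function names `zscores` and `mahalanobis_2d` are hypothetical and not part of this package, and Mahalanobis distance is just one common multivariate measure chosen for illustration; the text above does not name a specific method. The point (2, 8) is unremarkable in each feature on its own, but it breaks the strong x-y correlation of the other points.

```python
import statistics

def zscores(values):
    """Univariate view: how many standard deviations each value is from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [(v - mu) / sigma for v in values]

def mahalanobis_2d(points):
    """Multivariate view: Mahalanobis distance of each (x, y) point from the centroid."""
    n = len(points)
    mx = statistics.mean(p[0] for p in points)
    my = statistics.mean(p[1] for p in points)
    # Population covariance matrix [[sxx, sxy], [sxy, syy]]
    sxx = sum((x - mx) ** 2 for x, _ in points) / n
    syy = sum((y - my) ** 2 for _, y in points) / n
    sxy = sum((x - mx) * (y - my) for x, y in points) / n
    det = sxx * syy - sxy * sxy
    # Entries of the inverse covariance matrix (closed form for a 2x2 matrix)
    ixx, iyy, ixy = syy / det, sxx / det, -sxy / det
    distances = []
    for x, y in points:
        dx, dy = x - mx, y - my
        distances.append((ixx * dx * dx + 2 * ixy * dx * dy + iyy * dy * dy) ** 0.5)
    return distances

# Nine points on the line y = x, plus (2, 8), which breaks the correlation.
points = [(i, i) for i in range(1, 10)] + [(2, 8)]
xs = [x for x, _ in points]
ys = [y for _, y in points]

# No single feature looks extreme: every |z| stays below 2.
print(max(abs(z) for z in zscores(xs)), max(abs(z) for z in zscores(ys)))

# Jointly, (2, 8) is the point farthest from the centroid.
dists = mahalanobis_2d(points)
print(points[dists.index(max(dists))])  # (2, 8)
```

A per-feature check would keep (2, 8), while the joint distance singles it out; that is the practical difference between univariate and multivariate outliers.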

The most common causes of outliers in a dataset:

  1. Data entry errors (human errors)

  2. Measurement errors (instrument errors)

  3. Experimental errors (data extraction or experiment planning/executing errors)

  4. Intentional (dummy outliers made to test detection methods)

  5. Data processing errors (data manipulation or unintended mutations of the dataset)

  6. Sampling errors (extracting or mixing data from wrong or various sources)

  7. Natural (not an error, novelties in data)

Ways of finding an outlier:

  1. Box plot

  2. Scatter plot

  3. Interquartile Range

  4. Z-score
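The two statistical methods in the list, the interquartile range and the z-score, can be sketched with the standard library as follows. This is an illustrative sketch, not necessarily how OUTLIER_101703292 detects outliers; the function names and the default cut-offs (k = 1.5 for the IQR fences, a threshold of 3 for z-scores) are conventional choices assumed here. The 1.5 * IQR fences are the same rule commonly used to draw box-plot whiskers.

```python
import statistics

def iqr_outliers(values, k=1.5):
    """Values outside the Tukey fences [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles, default 'exclusive' method
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]

def zscore_outliers(values, threshold=3.0):
    """Values more than `threshold` population standard deviations from the mean."""
    mu = statistics.mean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mu) / sigma > threshold]

values = [10, 12, 11, 13, 12, 11, 10, 14, 12, 95]
print(iqr_outliers(values))                    # [95]
print(zscore_outliers(values))                 # []
print(zscore_outliers(values, threshold=2.0))  # [95]
```

The last two lines show a known weakness of the z-score: with only ten points, the extreme value 95 inflates both the mean and the standard deviation, so its z-score (about 2.997) stays just under 3. The quartile-based IQR rule is much less affected by the outlier itself.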

Installation

Use the package manager pip to install OUTLIER_101703292.

pip install OUTLIER_101703292

Usage

Pass the name of your CSV file, including the .csv extension, as the argument:

OUTLIER_101703292 data.csv

Constraint

Your CSV file should not contain categorical (non-numeric) data.
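The usage and constraint above suggest a pipeline along these lines: read a CSV file, reject categorical cells, and drop the rows that contain outliers. This is a hypothetical sketch, not the package's code; it assumes the IQR rule applied per column and row-wise removal, and the helper names `load_numeric_csv`, `remove_outlier_rows` and `to_csv` are invented for illustration. The package's actual detection method, thresholds and output format are not documented here.

```python
import csv
import io
import statistics

def load_numeric_csv(text):
    """Parse CSV text with a header row; raise ValueError on any non-numeric cell."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = []
    for line_no, row in enumerate(reader, start=2):
        try:
            rows.append([float(cell) for cell in row])
        except ValueError:
            raise ValueError(f"line {line_no}: non-numeric (categorical?) value in {row}") from None
    return header, rows

def remove_outlier_rows(rows, k=1.5):
    """Keep only rows whose every value lies inside its column's IQR fences."""
    fences = []
    for column in zip(*rows):
        q1, _, q3 = statistics.quantiles(column, n=4)
        iqr = q3 - q1
        fences.append((q1 - k * iqr, q3 + k * iqr))
    return [row for row in rows
            if all(low <= v <= high for v, (low, high) in zip(row, fences))]

def to_csv(header, rows):
    """Serialize the cleaned rows back to CSV text."""
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)
    return out.getvalue()

data = "a,b\n1,10\n2,11\n3,12\n2,10\n3,11\n1,12\n2,300\n"
header, rows = load_numeric_csv(data)
cleaned = remove_outlier_rows(rows)
print(len(rows), "->", len(cleaned))  # 7 -> 6
print(to_csv(header, cleaned))
```

Rejecting non-numeric cells up front is one way to enforce the constraint above: a column such as "red,green,blue" has no meaningful quartiles, so the sketch fails early with the offending line number instead of producing a misleading result.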

License

MIT

Download files

Files for OUTLIER-101703292, version 1.0.1:

  1. OUTLIER_101703292-1.0.1-py3-none-any.whl (Wheel, Python py3, 5.0 kB)

  2. OUTLIER_101703292-1.0.1.tar.gz (Source, 3.2 kB)
