Python package for removing outliers from multivariate data

outlier_remover_101703283

For: Project-2 (UCS633)
Submitted by: Katinder Kaur
Roll no: 101703283
Group: 3COE13

outlier_remover_101703283 is a Python library for dealing with anomalies, or outliers, in a dataset. Outliers are very common, especially in raw data. Removing them is an important preprocessing step because they can significantly degrade a model's performance and prediction accuracy. There are several methods to detect and remove outliers; this script uses the interquartile range (IQR) to detect anomalous data.
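To make the IQR idea concrete, here is a minimal sketch in plain Python. It is not the package's actual code: the function names `iqr_bounds` and `remove_outlier_rows`, the fence multiplier `k=1.5`, and the "inclusive" quartile method are all assumptions chosen for illustration. A row is dropped if any of its checked columns falls outside Q1 - k*IQR or Q3 + k*IQR.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the conventional multiplier; the package may use another value.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outlier_rows(rows, skip_columns=()):
    """Drop every row that has a value outside the IQR fences of its column."""
    if not rows:
        return []
    skipped = set(skip_columns)
    checked = [c for c in range(len(rows[0])) if c not in skipped]
    # Compute the fences once per column, over all rows.
    bounds = {c: iqr_bounds([row[c] for row in rows]) for c in checked}
    return [
        row
        for row in rows
        if all(bounds[c][0] <= row[c] <= bounds[c][1] for c in checked)
    ]


rows = [
    [1, 10.0],
    [2, 12.0],
    [3, 11.0],
    [4, 13.0],
    [5, 95.0],  # the second value is far from the rest of its column
]
cleaned = remove_outlier_rows(rows, skip_columns=[0])
print(f"Removed {len(rows) - len(cleaned)} row(s).")
```

In this toy data the second column has Q1 = 11 and Q3 = 13, so the fences are 8 and 16 and only the last row is removed.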

Installation

Use the package manager pip to install outlier_remover_101703283.

pip install outlier_remover_101703283

Usage

For command prompt:

usage: outlier_remover [-h] [-o OUTPUTDATAFILE] [-c COLUMNSTOSKIP]
                       InputDataFile

positional arguments:
  InputDataFile         Enter the name of input CSV file with .csv extention

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUTDATAFILE, --OutputDataFile OUTPUTDATAFILE
                        Enter the name of output CSV file with .csv extention
  -c COLUMNSTOSKIP, --ColumnsToSkip COLUMNSTOSKIP
                        Enter the columns to be left out of analysis

Enter the input CSV filename, including the .csv extension:

outlier_remover sample_inputfile.csv

After the records with anomalous values are removed, the resulting data is stored by default in sample_inputfile_sansOutliers.csv (i.e. <InputFileName>_sansOutliers.csv).
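The <InputFileName>_sansOutliers.csv naming pattern can be sketched as follows. The helper `default_output_name` is hypothetical and only mirrors the pattern described above; how the package builds the name internally is not shown in this README.

```python
import os


def default_output_name(input_path, suffix="_sansOutliers"):
    """Insert the suffix between the file name and its extension."""
    root, ext = os.path.splitext(input_path)
    return f"{root}{suffix}{ext}"


print(default_output_name("data.csv"))
```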

Custom output file name:

The destination output file name can be provided explicitly with the -o flag:

outlier_remover sample_inputfile.csv -o my_outputfile.csv

The output data in this case will be stored in a CSV file named my_outputfile.csv.

Skipping out columns:

In some cases you may want to leave some features out of the analysis (for example, categorical data or index columns). Use the -c flag to list the columns to skip:

outlier_remover sample_inputfile.csv -c 0,2,8

or

outlier_remover sample_inputfile.csv -c "0,2,8"

Note: Column numbers start from 0.
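A comma-separated value such as 0,2,8 can be turned into a list of zero-based column indices along these lines. This is a sketch only; `parse_columns_to_skip` is a hypothetical helper, not a function exported by the package.

```python
def parse_columns_to_skip(raw):
    """Convert a string like "0,2,8" into [0, 2, 8]; empty input gives []."""
    if not raw:
        return []
    return [int(part) for part in raw.split(",") if part.strip()]


skip = parse_columns_to_skip("0,2,8")
# With 10 columns (indices 0 to 9), these are the ones left for analysis.
analysed = [c for c in range(10) if c not in skip]
print(skip, analysed)
```

A list in this form matches what the Python call shown under "For Python IDLE" takes as its third argument.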

View help

To view usage help, use

outlier_remover -h
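For readers curious how such a command-line interface is typically built, the usage text above can be reproduced with the standard argparse module roughly as follows. This is an illustrative reconstruction from the help output, not the package's source; `build_parser` is a hypothetical name and the help strings are paraphrased.

```python
import argparse


def build_parser():
    """Build a parser with the same arguments as the usage text."""
    parser = argparse.ArgumentParser(prog="outlier_remover")
    parser.add_argument(
        "InputDataFile",
        help="name of the input CSV file, with .csv extension",
    )
    parser.add_argument(
        "-o", "--OutputDataFile",
        help="name of the output CSV file, with .csv extension",
    )
    parser.add_argument(
        "-c", "--ColumnsToSkip",
        help="columns to be left out of analysis, e.g. 0,2,8",
    )
    return parser


args = build_parser().parse_args(
    ["sample_inputfile.csv", "-o", "my_outputfile.csv", "-c", "0,2,8"]
)
print(args.InputDataFile, args.OutputDataFile, args.ColumnsToSkip)
```

Both optional flags default to None when omitted, which leaves room for a default output name and an empty skip list.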

For Python IDLE:

>>> from outlier_remover.outlier_remover import outlier_remover
>>> list_of_columns_to_skip=[]
>>> outlier_remover('inputfile.csv','outputfile.csv',list_of_columns_to_skip)
Removed 2 row(s) successfully.
Save successful!
Check outputfile.csv for results


>>> from outlier_remover.outlier_remover import outlier_removerfn
>>> outlier_removerfn('sample2.csv')
Removed 1 row(s) successfully.
Save successful!
Check sans_outliers.csv for results

License

MIT

Download files

Files for outlier-remover-101703283, version 0.0.0:

Filename                                          Size    File type  Python version
outlier_remover_101703283-0.0.0-py3-none-any.whl  5.4 kB  Wheel      py3
outlier_remover_101703283-0.0.0.tar.gz            3.9 kB  Source     None
