
A Python package for removing outliers from multivariate data

Project description

outlier_remover_101703283

For: Project-2 (UCS633)
Submitted by: Katinder Kaur
Roll no: 101703283
Group: 3COE13

outlier_remover_101703283 is a Python library for dealing with anomalies, or outliers, in a dataset. Outliers are very common, especially in raw data, and they can significantly degrade a model's performance and prediction accuracy, which makes outlier removal an important preprocessing step. Of the several methods available for detecting and removing outliers, this package uses the interquartile range (IQR) to detect anomalous data.
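The IQR method can be sketched in plain Python as follows. This is a minimal illustration, not the package's source: it assumes the common Tukey fences (a value is an outlier if it lies below Q1 - 1.5 * IQR or above Q3 + 1.5 * IQR) and assumes a row is dropped when any analysed column falls outside its fences. The 1.5 multiplier, the quartile interpolation rule and the helper names `iqr_bounds` and `remove_outliers` are assumptions made for this example.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) Tukey fences for a sequence of numbers.

    k=1.5 is the conventional multiplier (an assumption, not taken from the package).
    """
    # method="inclusive" interpolates linearly between sorted data points,
    # the same rule pandas and NumPy use by default for quantiles.
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outliers(rows, skip=()):
    """Drop every row with an out-of-range value in any analysed column.

    rows: list of equal-length lists of numbers (analysed columns must be numeric).
    skip: 0-based column indices left out of the analysis.
    Returns (kept_rows, number_of_removed_rows).
    """
    n_cols = len(rows[0])
    # Fences are computed once per analysed column over the full dataset.
    bounds = {
        col: iqr_bounds([row[col] for row in rows])
        for col in range(n_cols)
        if col not in skip
    }
    kept = [
        row for row in rows
        if all(lo <= row[col] <= hi for col, (lo, hi) in bounds.items())
    ]
    return kept, len(rows) - len(kept)


data = [[1, 10], [2, 11], [3, 12], [4, 13], [100, 14]]
kept, removed = remove_outliers(data)
print(removed)  # 1
print(kept)     # [[1, 10], [2, 11], [3, 12], [4, 13]]
```

In column 0, Q1 = 2 and Q3 = 4, so the fences are -1 and 7 and the row containing 100 is removed; column 1 has no outliers.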

Installation

Use the package manager pip to install outlier_remover_101703283.

pip install outlier_remover_101703283

Usage

From the command line:

usage: outlier_remover [-h] [-o OUTPUTDATAFILE] [-c COLUMNSTOSKIP]
                       InputDataFile

positional arguments:
  InputDataFile         Enter the name of input CSV file with .csv extension

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUTDATAFILE, --OutputDataFile OUTPUTDATAFILE
                        Enter the name of output CSV file with .csv extension
  -c COLUMNSTOSKIP, --ColumnsToSkip COLUMNSTOSKIP
                        Enter the columns to be left out of analysis
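The help text above has the shape of output produced by Python's argparse module. A hypothetical parser that reproduces the documented interface might look like this; it is a reconstruction from the help text, not the package's actual code, and `build_parser` is an illustrative name.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented outlier_remover CLI."""
    parser = argparse.ArgumentParser(prog="outlier_remover")
    # Positional argument: the CSV file to clean.
    parser.add_argument("InputDataFile",
                        help="Enter the name of input CSV file with .csv extension")
    # argparse derives the metavar OUTPUTDATAFILE from the dest name.
    parser.add_argument("-o", "--OutputDataFile",
                        help="Enter the name of output CSV file with .csv extension")
    # Kept as a raw string such as "0,2,8"; parsed into indices later.
    parser.add_argument("-c", "--ColumnsToSkip",
                        help="Enter the columns to be left out of analysis")
    return parser


args = build_parser().parse_args(
    ["sample_inputfile.csv", "-o", "my_outputfile.csv", "-c", "0,2,8"]
)
print(args.InputDataFile, args.OutputDataFile, args.ColumnsToSkip)
# sample_inputfile.csv my_outputfile.csv 0,2,8
```

Options that are not given default to None, which is where a default output name would be filled in.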

Pass the input CSV file name, including the .csv extension:

outlier_remover sample_inputfile.csv

After the records with anomalous values are removed, the resulting data is saved by default to sample_inputfile_sansOutliers.csv (i.e. the input file name with _sansOutliers appended before the .csv extension).
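The default output name can be derived with a small helper like the one below. It is a sketch that assumes the suffix is appended to the input file name minus its extension; `default_output_name` is a hypothetical name, not part of the package.

```python
import os


def default_output_name(input_file, suffix="_sansOutliers"):
    """Hypothetical helper: turn '<name>.csv' into '<name>_sansOutliers.csv'."""
    # splitext keeps any directory part in the stem and splits off ".csv".
    stem, _ext = os.path.splitext(input_file)
    return stem + suffix + ".csv"


print(default_output_name("sample_inputfile.csv"))
# sample_inputfile_sansOutliers.csv
```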

Custom output file name:

The output file name can be given explicitly with the -o flag:

outlier_remover sample_inputfile.csv -o my_outputfile.csv

In this case the output data is written to a CSV file named my_outputfile.csv.

Skipping columns:

In some cases you may want to leave certain features out of the analysis (for example categorical data or index columns). Pass their column numbers, separated by commas, with the -c flag:

outlier_remover sample_inputfile.csv -c 0,2,8

or

outlier_remover sample_inputfile.csv -c "0,2,8"

Note: column numbers start from 0.
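Both forms of the -c argument reach the program as the same string, because the shell strips the quotes. A minimal sketch of turning that string into 0-based indices and working out which columns remain in the analysis is shown below; `parse_columns_to_skip` and the 10-column example are hypothetical.

```python
def parse_columns_to_skip(value):
    """Hypothetical helper: turn a -c value such as '0,2,8' into [0, 2, 8].

    Returns an empty list when no columns are skipped (value is None or "").
    """
    if not value:
        return []
    # int() tolerates surrounding spaces, so "0, 2, 8" also works.
    return [int(part) for part in value.split(",") if part.strip()]


skip = parse_columns_to_skip("0,2,8")
n_cols = 10  # example width of the input data
analysed = [c for c in range(n_cols) if c not in skip]
print(analysed)  # [1, 3, 4, 5, 6, 7, 9]
```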

View help

To view the usage help, run:

outlier_remover -h

From Python (e.g. in IDLE):

>>> from outlier_remover.outlier_remover import outlier_remover
>>> list_of_columns_to_skip=[]
>>> outlier_remover('inputfile.csv','outputfile.csv',list_of_columns_to_skip)
Removed 2 row(s) successfully.
Save successful!
Check outputfile.csv for results


>>> from outlier_remover.outlier_remover import outlier_removerfn
>>> outlier_removerfn('sample2.csv')
Removed 1 row(s) successfully.
Save successful!
Check sans_outliers.csv for results

License

MIT



Download files


Source Distribution

outlier_remover_101703283-0.0.0.tar.gz (3.9 kB)

Built Distribution

outlier_remover_101703283-0.0.0-py3-none-any.whl (5.4 kB)
