Python package for removing outliers from multivariate data
Project description
outlier_remover_101703283
For : Project-2 (UCS633)
Submitted by : Katinder Kaur
Roll no : 101703283
Group : 3COE13
outlier_remover_101703283 is a Python library for dealing with anomalies, or outliers, in a dataset. Outliers are very common, especially in raw data. Removing them is an important preprocessing step, since their presence can significantly degrade a model's performance and prediction accuracy. There are several methods for detecting and removing outliers; this script uses the interquartile range (IQR) to detect anomalous data.
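To make the IQR idea concrete, here is a minimal standard-library sketch of row-wise outlier removal. It is not the package's own code: the function names `iqr_bounds` and `remove_outlier_rows` are hypothetical, and both the conventional 1.5 multiplier and the "inclusive" quartile method are assumptions, since the description above does not say which values the package uses.

```python
import statistics


def iqr_bounds(values, k=1.5):
    """Return the (lower, upper) fences for a list of numbers.

    k=1.5 is the conventional multiplier; it is an assumption here,
    not a value taken from the package.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def remove_outlier_rows(rows, k=1.5):
    """Drop every row holding a value outside the fences of its column."""
    columns = list(zip(*rows))  # transpose rows into columns
    bounds = [iqr_bounds(col, k) for col in columns]
    return [
        row
        for row in rows
        if all(lo <= value <= hi for value, (lo, hi) in zip(row, bounds))
    ]


# Made-up illustrative data: the last row has an extreme first value.
data = [
    (1.0, 10.0),
    (2.0, 11.0),
    (3.0, 12.0),
    (4.0, 13.0),
    (100.0, 14.0),
]
cleaned = remove_outlier_rows(data)
```

In this sketch a whole record is dropped as soon as any one of its values falls outside the fences of its column, which matches the "Removed N row(s)" style of message the package prints.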
Installation
Use the package manager pip to install outlier_remover_101703283.
pip install outlier_remover_101703283
Usage
For command prompt:
usage: outlier_remover [-h] [-o OUTPUTDATAFILE] [-c COLUMNSTOSKIP]
                       InputDataFile

positional arguments:
  InputDataFile         Enter the name of input CSV file with .csv extention

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUTDATAFILE, --OutputDataFile OUTPUTDATAFILE
                        Enter the name of output CSV file with .csv extention
  -c COLUMNSTOSKIP, --ColumnsToSkip COLUMNSTOSKIP
                        Enter the columns to be left out of analysis
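For readers curious how a command-line interface of this shape can be declared, the following is a rough `argparse` sketch that accepts the same arguments. It is an illustration only, not the package's source: `build_parser` is a hypothetical name, and the help strings are copied verbatim from the help text above.

```python
import argparse


def build_parser():
    """Build a parser mirroring the options listed in the help text (sketch)."""
    parser = argparse.ArgumentParser(prog="outlier_remover")
    parser.add_argument(
        "InputDataFile",
        help="Enter the name of input CSV file with .csv extention",
    )
    parser.add_argument(
        "-o", "--OutputDataFile",
        help="Enter the name of output CSV file with .csv extention",
    )
    parser.add_argument(
        "-c", "--ColumnsToSkip",
        help="Enter the columns to be left out of analysis",
    )
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["sample_inputfile.csv", "-c", "0,2,8"])
```

Options that are not supplied, such as `-o` here, are left as `None`, which gives a program a simple way to fall back to a default output file name.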
Enter the input CSV file name, including the .csv extension:
outlier_remover sample_inputfile.csv
After the records with anomalous values are removed, the resulting data is stored by default in sample_input_sansOutliers.csv (i.e. a file whose name ends in _sansOutliers.csv).
Custom output file name:
A destination output file name can be provided explicitly with the -o flag:
outlier_remover sample_inputfile.csv -o my_outputfile.csv
The output data in this case will be stored in a CSV file named my_outputfile.csv.
Skipping out columns:
In some cases you may want to leave some features out of the analysis (for example, categorical data or indices). This can be done with the -c flag:
outlier_remover sample_inputfile.csv -c 0,2,8
or
outlier_remover sample_inputfile.csv -c "0,2,8"
Note : Column numbers start from 0.
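As an illustration of how a value such as "0,2,8" can be turned into zero-based column indices, here is a small sketch. The helper names `parse_columns_to_skip` and `columns_to_analyse` are hypothetical and are not part of the package; the package's actual parsing may differ.

```python
def parse_columns_to_skip(text):
    """Turn a string such as "0,2,8" into a sorted list of unique ints."""
    if not text:
        return []
    return sorted({int(part) for part in text.split(",") if part.strip()})


def columns_to_analyse(n_columns, skip):
    """Return the zero-based indices of the columns that are not skipped."""
    skipped = set(skip)
    return [index for index in range(n_columns) if index not in skipped]


skip = parse_columns_to_skip("0,2,8")
kept = columns_to_analyse(10, skip)  # assumes a 10-column file for the demo
```

Because the quoted and unquoted forms reach the program as the same string, a single parsing helper of this kind would cover both invocations shown above.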
View help
To view usage help, use
outlier_remover -h
For Python IDLE:
>>> from outlier_remover.outlier_remover import outlier_remover
>>> list_of_columns_to_skip=[]
>>> outlier_remover('inputfile.csv','outputfile.csv',list_of_columns_to_skip)
Removed 2 row(s) successfully.
Save successful!
Check outputfile.csv for results
>>> from outlier_remover.outlier_remover import outlier_removerfn
>>> outlier_removerfn('sample2.csv')
Removed 1 row(s) successfully.
Save successful!
Check sans_outliers.csv for results
License
Source Distribution
Hashes for outlier_remover_101703283-0.0.0.tar.gz
Algorithm | Hash digest
---|---
SHA256 | 3cbed9e31505904265b476ae4a27c25e633fd18b4811ac80a41d2882cf6a9af3
MD5 | 9b6598a1758cc2410ed85ec7fe883ac9
BLAKE2b-256 | 2ff0deb5283b10cb023df95eb934bf9678b89980b738ecc91d342f132f9e17d0
Built Distribution
Hashes for outlier_remover_101703283-0.0.0-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | 20427b93ee286f0616648c914c69c4d51023ef171051697a5c6990916770fb68
MD5 | 72e9e8b39f6bcf4eb19a932bb9c8f4ef
BLAKE2b-256 | fc2674f8df3ddfcc8b190ea17d829ce86db45d0e354ab7a615dc5cf03a67e68f