Python package for removing outliers from multivariate data

outlier_remover_101703283

For : Project-2 (UCS633)
Submitted by : Katinder Kaur
Roll no : 101703283
Group : 3COE13

outlier_remover_101703283 is a Python library for dealing with anomalies, or outliers, in a dataset. Outliers are very common, especially in raw data. Removing them is an important preprocessing step, since their presence can significantly degrade a model's performance and prediction accuracy. There are several methods for detecting and removing outliers; this script uses the interquartile range (IQR) to detect anomalous data.
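To make the IQR idea concrete, here is a minimal standard-library sketch of row-wise outlier removal. It is not the package's own code: the function names `iqr_bounds` and `remove_outlier_rows`, the conventional 1.5 × IQR fences, and the "inclusive" quartile method are all assumptions made for illustration, and the package may compute its thresholds differently.

```python
import statistics

def iqr_bounds(values, k=1.5):
    """Return (lower, upper) fences: Q1 - k*IQR and Q3 + k*IQR.

    k=1.5 is the common textbook choice, assumed here.
    """
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outlier_rows(rows, skip_columns=()):
    """Keep only rows whose value in every analysed column is within the fences."""
    if not rows:
        return []
    # Columns listed in skip_columns (0-based) are left out of the analysis.
    columns = [c for c in range(len(rows[0])) if c not in skip_columns]
    bounds = {c: iqr_bounds([row[c] for row in rows]) for c in columns}
    return [
        row for row in rows
        if all(bounds[c][0] <= row[c] <= bounds[c][1] for c in columns)
    ]

# Made-up sample data: column 0 is an index, columns 1 and 2 are features.
rows = [
    [0, 10.0, 1.0],
    [1, 12.0, 1.2],
    [2, 11.0, 0.9],
    [3, 13.0, 1.1],
    [4, 95.0, 1.0],  # 95.0 lies far outside the fences for column 1
]
cleaned = remove_outlier_rows(rows, skip_columns={0})
print(f"Removed {len(rows) - len(cleaned)} row(s).")
```

A row is dropped as soon as any analysed column falls outside its fences, which matches the idea of removing whole records rather than individual values.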

Installation

Use the package manager pip to install outlier_remover_101703283.

pip install outlier_remover_101703283

Usage

For command prompt:

usage: outlier_remover [-h] [-o OUTPUTDATAFILE] [-c COLUMNSTOSKIP]
                       InputDataFile

positional arguments:
  InputDataFile         Enter the name of input CSV file with .csv extention

optional arguments:
  -h, --help            show this help message and exit
  -o OUTPUTDATAFILE, --OutputDataFile OUTPUTDATAFILE
                        Enter the name of output CSV file with .csv extention
  -c COLUMNSTOSKIP, --ColumnsToSkip COLUMNSTOSKIP
                        Enter the columns to be left out of analysis

Enter the input CSV file name, including the .csv extension:

outlier_remover sample_inputfile.csv

After the records with anomalous values are removed, the resulting data is stored by default in sample_input_sansOutliers.csv (i.e. a file whose name ends in _sansOutliers.csv).

Custom output file name:

The output file name can be given explicitly with the -o flag:

outlier_remover sample_inputfile.csv -o my_outputfile.csv

In this case the output data is stored in a CSV file named my_outputfile.csv.

Skipping out columns:

In some cases you may want to leave certain features out of the analysis (for example, categorical data or indices). Use the -c flag to do so:

outlier_remover sample_inputfile.csv -c 0,2,8

or

outlier_remover sample_inputfile.csv -c "0,2,8"

Note : Column numbers start from 0.
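As an illustration of how a command line like the ones above could be parsed, here is a small argparse sketch that mirrors the documented usage text. The helper names `parse_columns` and `build_parser` are hypothetical, and the package's actual parser may be written differently.

```python
import argparse

def parse_columns(text):
    """Turn a string such as "0,2,8" into [0, 2, 8] (0-based column numbers)."""
    return [int(part) for part in text.split(",") if part.strip()]

def build_parser():
    # Argument names follow the usage text; the help strings are paraphrased.
    parser = argparse.ArgumentParser(prog="outlier_remover")
    parser.add_argument("InputDataFile", help="name of the input CSV file")
    parser.add_argument("-o", "--OutputDataFile", help="name of the output CSV file")
    parser.add_argument(
        "-c", "--ColumnsToSkip",
        type=parse_columns,
        default=[],
        help="comma-separated columns to leave out of the analysis",
    )
    return parser

# Equivalent to: outlier_remover sample_inputfile.csv -c 0,2,8
args = build_parser().parse_args(["sample_inputfile.csv", "-c", "0,2,8"])
print(args.InputDataFile, args.ColumnsToSkip)
```

Because the shell strips the quotes before the program sees its arguments, `-c 0,2,8` and `-c "0,2,8"` reach the parser as the same string.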

View help

To view usage help, use

outlier_remover -h

For Python IDLE:

>>> from outlier_remover.outlier_remover import outlier_remover
>>> list_of_columns_to_skip=[]
>>> outlier_remover('inputfile.csv','outputfile.csv',list_of_columns_to_skip)
Removed 2 row(s) successfully.
Save successful!
Check outputfile.csv for results


>>> from outlier_remover.outlier_remover import outlier_removerfn
>>> outlier_removerfn('sample2.csv')
Removed 1 row(s) successfully.
Save successful!
Check sans_outliers.csv for results

License

MIT
