
Outlier detection using supervised methods in an unsupervised context


Pseudo-supervised outlier detection

A highly performant alternative to purely unsupervised approaches.

PSOD applies supervised methods to identify outliers in unsupervised contexts. For the outliers with the highest scores it is more accurate than other models, while its performance on the whole dataset remains comparable.
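The internals of PSOD are not described here, so the following is only a minimal sketch of the general pseudo-supervised idea. It assumes that each feature is predicted from the other features and that the accumulated prediction error serves as the outlier score. The helper names `fit_line` and `pseudo_supervised_scores` and the choice of a simple least-squares line are illustrative assumptions, not PSOD's actual implementation.

```python
from statistics import mean


def fit_line(x, y):
    """Fit y = slope * x + intercept by ordinary least squares."""
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    slope = sxy / sxx
    return slope, my - slope * mx


def pseudo_supervised_scores(columns):
    """Hypothetical scoring: treat every column once as the target,
    predict it from each other column, and sum the absolute errors."""
    n = len(columns[0])
    scores = [0.0] * n
    for t, target in enumerate(columns):
        for p, predictor in enumerate(columns):
            if p == t:
                continue
            slope, intercept = fit_line(predictor, target)
            for i in range(n):
                prediction = slope * predictor[i] + intercept
                scores[i] += abs(target[i] - prediction)
    return scores


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.0, 4.0, 6.0, 8.0, 10.0, 2.0]  # the last row breaks the pattern y = 2x

scores = pseudo_supervised_scores([x, y])
top = max(range(len(scores)), key=scores.__getitem__)  # row with highest score
```

Rows that are hard to predict from the rest of the data accumulate large errors, which is why the last row ends up with the highest score in this toy example.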

The usage is simple.

1.) Install the package:

pip install psod

2.) Import the package:

from psod.outlier_detection.psod import PSOD

3.) Instantiate the class:

iso_class = PSOD()

The class accepts multiple arguments. If labels from earlier data exist, they can be used for hyperparameter tuning.

4.) Recommended: Normalize the data. PSOD offers preprocessing functions: it can downcast all columns to reduce the memory footprint substantially (by up to 75%), and it can scale the data. For convenience, both steps can be called together using:

from psod.preprocessing.full_preprocessing import auto_preprocess

scaled = auto_preprocess(treatment_data)

However, each step can also be called individually on demand.
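To make the memory figure more tangible, here is a small standard-library sketch of what downcasting means. It does not use PSOD's preprocessing functions; `smallest_int_typecode` is a hypothetical helper written for this illustration. Storing values in the narrowest integer type that still holds them turns 8 bytes per value into 2 bytes in this example, which is the kind of saving the 75% figure refers to. In pandas, a comparable operation is `pandas.to_numeric` with its `downcast` argument.

```python
from array import array


def smallest_int_typecode(values):
    """Return the narrowest signed array typecode that holds every value."""
    lo, hi = min(values), max(values)
    for code in ("b", "h", "i", "q"):  # from narrowest to widest
        bits = array(code).itemsize * 8
        if -(2 ** (bits - 1)) <= lo and hi <= 2 ** (bits - 1) - 1:
            return code
    raise OverflowError("values do not fit into a 64-bit signed integer")


values = [120, -3500, 29000, 42]

wide = array("q", values)  # 8 bytes per value
narrow = array(smallest_int_typecode(values), values)  # 2 bytes per value here

# Fraction of memory saved by storing the same values in the narrower type.
saved = 1 - (narrow.itemsize * len(narrow)) / (wide.itemsize * len(wide))
print(f"memory saved: {saved:.0%}")
```

The values themselves are unchanged by the conversion; only the storage type shrinks. The saving depends on the value range of each column, so 75% is an upper bound rather than a typical result.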

5.) Fit and predict:

full_res = iso_class.fit_predict(scaled, return_class=True)

6.) Predict on new data:

full_res = iso_class.predict(scaled, return_class=True, use_trained_stats=True)

The parameter use_trained_stats is a boolean that controls how outlier scores are converted into outlier classes. If True, the mean and standard deviation of the prediction errors obtained during training are used. If False, the prediction errors of the provided dataset are treated as a new distribution, and its own mean and standard deviation serve as classification thresholds.
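The difference between the two settings can be illustrated with a small sketch. It assumes a simple rule in which an error counts as an outlier when it exceeds mean + k * std. The function `classify`, the factor `k`, and the sample numbers are hypothetical; the exact rule PSOD applies is not stated here.

```python
from statistics import mean, pstdev


def classify(errors, mu, sigma, k=1.5):
    """Flag an error as outlier (1) when it exceeds mu + k * sigma."""
    threshold = mu + k * sigma
    return [1 if e > threshold else 0 for e in errors]


# Made-up prediction errors, for illustration only.
train_errors = [0.1, 0.2, 0.15, 0.12, 0.18, 0.11, 0.16, 0.14]
new_errors = [5.0, 5.2, 4.9, 5.1, 9.0]

# Analogue of use_trained_stats=True: thresholds come from the training errors.
with_trained = classify(new_errors, mean(train_errors), pstdev(train_errors))

# Analogue of use_trained_stats=False: thresholds come from the new errors.
with_new = classify(new_errors, mean(new_errors), pstdev(new_errors))

print(with_trained)
print(with_new)
```

With the training statistics, every row of the shifted new batch is flagged, because all of its errors are large relative to what was seen during training. With statistics recomputed on the new batch, only the row that stands out within that batch is flagged. Which behavior is preferable depends on whether the new data is expected to follow the training distribution.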

Classes and outlier scores can always be accessed from the class instance via:

iso_class.scores  # get the outlier scores
iso_class.outlier_classes  # get the outlier classes

Many parameters can be tuned. Detailed descriptions of the parameters are available via:

help(iso_class)

Printing the class instance shows the current settings:

print(iso_class)

The repo contains example notebooks. Please note that the example notebooks do not always reflect the newest version; the file psod.py is always the most up to date. See the full article.

Release History

  • 1.3.0
    • Widen dependencies for all libraries to support higher versions.
  • 1.2.1
    • Make typing import compatible with Python 3.7
  • 1.2.0
    • Added use_trained_stats to predict function
    • Added doc strings to main functions
    • Fixed a bug where PSOD tried to drop categorical data in the absence of categorical data
  • 1.1.0
    • Add correlation based feature selection
  • 1.0.0
    • Some bug fixes
    • Added yeo-johnson to numerical transformation options and changed the parameter name and type
    • Added preprocessing functionality (scaling and memory footprint reduction)
    • Added warnings to flag risky input params
    • Changed default of numerical preprocessing to None (previously logarithmic)
    • Suppressed Pandas Future and CopySettings warnings
    • Enhanced Readme
  • 0.0.4
    • First version with bare capabilities

Meta

Creator: Thomas Meißner – LinkedIn

PSOD GitHub repository
