
Minimal tool for outlier detection on small sample sets


Outlier Detector toolkit


This project provides a set of tools for outlier detection that mark or filter out samples as they arrive in your Python analysis code.

Most of the tools rely on the two-tailed Dixon's Q test (https://en.wikipedia.org/wiki/Dixon%27s_Q_test).
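To make the test concrete, here is a minimal sketch of a two-tailed Dixon's Q test written from its textbook definition; it is not the library's implementation. For sorted values, Q is the gap between an extreme value and its nearest neighbour divided by the full range, and the extreme is flagged when Q exceeds a tabulated critical value. The function name `dixon_q_outliers` is hypothetical, and the sketch assumes 95% confidence, using the critical values for 3 to 10 samples from the linked Wikipedia table.

```python
# Hypothetical helper: a minimal two-tailed Dixon's Q test, not the library's code.
# Critical values of Q at 95% confidence for n = 3..10 (Wikipedia table).
Q95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def dixon_q_outliers(values):
    """Return the set of extreme values flagged as outliers by a two-tailed Q test."""
    data = sorted(values)
    n = len(data)
    if n not in Q95:
        raise ValueError("this sketch only tabulates critical values for 3 <= n <= 10")
    spread = data[-1] - data[0]
    if spread == 0:
        return set()  # all values identical: nothing can stand out
    q_crit = Q95[n]
    flagged = set()
    # Low tail: gap between the smallest value and its nearest neighbour.
    if (data[1] - data[0]) / spread > q_crit:
        flagged.add(data[0])
    # High tail: gap between the largest value and its nearest neighbour.
    if (data[-1] - data[-2]) / spread > q_crit:
        flagged.add(data[-1])
    return flagged


print(dixon_q_outliers([0.1, 1.1, 4.78, 2.0, 7.2, 5.3, -14.5]))  # {-14.5}
```

Testing both tails is what makes the test "two-tailed": a single anomalously low or high value can be caught in the same pass.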

Installation

pip install outlier-detector

TL;DR

I have a sample and a known data distribution: is the sample an outlier?
from outlier_detector.functions import is_outlier

sample = -14.5
distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3]
print(is_outlier(distribution, sample))
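As a worked check of this example, assume (this README does not say so) that `is_outlier` adds the sample to the distribution and runs the two-tailed Q test at 95% confidence. The seven sorted values start at $x_{(1)} = -14.5$ and $x_{(2)} = 0.1$ and end at $x_{(7)} = 7.2$, so the low-tail statistic is:

```latex
\begin{aligned}
Q &= \frac{x_{(2)} - x_{(1)}}{x_{(7)} - x_{(1)}}
   = \frac{0.1 - (-14.5)}{7.2 - (-14.5)}
   = \frac{14.6}{21.7} \approx 0.673 \\
Q &> Q_{95}(n = 7) = 0.568 \;\Rightarrow\; -14.5 \text{ would be flagged as an outlier}
\end{aligned}
```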
I have a distribution and I iterate over it: is the n-th sample an outlier?
from outlier_detector.detectors import OutlierDetector

distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]
od = OutlierDetector(buffer_samples=5)
for sample in distribution:
    print(od.is_outlier(sample))
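One way a buffered detector like this could work is sketched below; it is an illustration under stated assumptions, not the library's `OutlierDetector`. The class `BufferedDetector` and helper `q_test_flags` are hypothetical. The sketch assumes that samples are accepted unconditionally while the buffer fills, that each later sample is Q-tested (95% confidence) against the buffered ones, and that only accepted samples enter the sliding window.

```python
from collections import deque

# Critical values of Q at 95% confidence for n = 3..10 (Wikipedia table).
Q95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def q_test_flags(sample, reference):
    """Hypothetical helper: is `sample` an outlier with respect to `reference`?"""
    data = sorted(list(reference) + [sample])
    n = len(data)
    if n not in Q95:
        raise ValueError("critical values are only tabulated for 3 <= n <= 10")
    spread = data[-1] - data[0]
    if spread == 0:
        return False
    if sample == data[0]:
        gap = data[1] - data[0]  # sample sits on the low tail
    elif sample == data[-1]:
        gap = data[-1] - data[-2]  # sample sits on the high tail
    else:
        return False  # an interior value can never be flagged by the Q test
    return gap / spread > Q95[n]


class BufferedDetector:
    """Hypothetical buffered detector, not the library's OutlierDetector."""

    def __init__(self, buffer_samples=5):
        if not 2 <= buffer_samples <= 9:
            raise ValueError("buffer_samples must be between 2 and 9 in this sketch")
        # A bounded deque keeps only the most recent accepted samples.
        self.buffer = deque(maxlen=buffer_samples)

    def is_outlier(self, sample):
        # Accept samples unconditionally while the buffer is still filling up.
        if len(self.buffer) < self.buffer.maxlen:
            self.buffer.append(sample)
            return False
        outlier = q_test_flags(sample, self.buffer)
        if not outlier:
            self.buffer.append(sample)  # only accepted samples enter the window
        return outlier


distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]
detector = BufferedDetector(buffer_samples=5)
flags = [detector.is_outlier(sample) for sample in distribution]
print(flags)  # only -14.1 (the eighth sample) is flagged
```

Keeping outliers out of the window is a design choice of this sketch: it stops a single extreme value from widening the range and masking later outliers.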
I have an object from which I pop samples, and I want only the valid ones: how can I reject outliers?
from outlier_detector.filters import filter_outlier

distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]

class MyGen:
    def __init__(self):
        self.cursor = -1

    @filter_outlier()
    def pop(self):
        self.cursor += 1
        return distribution[self.cursor]

g = MyGen()
while True:
    try:
        r = g.pop()
        print(r)
    except IndexError:
        print('No more data')
        break
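The decorator pattern can be sketched as follows, assuming it re-invokes the wrapped method until a sample passes the Q test; this is an illustration, not the library's `filter_outlier`. The names `skip_outliers` and `q_test_flags` are hypothetical, and the sketch uses the same assumptions as a buffered detector: a window of `buffer_samples` accepted values, 95% confidence, and no testing while the window fills. Exceptions raised by the wrapped method, such as `IndexError`, propagate unchanged.

```python
import functools
from collections import deque

# Critical values of Q at 95% confidence for n = 3..10 (Wikipedia table).
Q95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def q_test_flags(sample, reference):
    """Hypothetical helper: two-tailed Q test of `sample` against `reference`."""
    data = sorted(list(reference) + [sample])
    spread = data[-1] - data[0]
    if spread == 0:
        return False
    if sample == data[0]:
        gap = data[1] - data[0]
    elif sample == data[-1]:
        gap = data[-1] - data[-2]
    else:
        return False  # interior values are never flagged
    return gap / spread > Q95[len(data)]


def skip_outliers(buffer_samples=5):
    """Hypothetical decorator factory, not the library's filter_outlier."""
    def decorator(func):
        buffer = deque(maxlen=buffer_samples)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            while True:
                sample = func(*args, **kwargs)  # IndexError etc. propagate to the caller
                if len(buffer) < buffer.maxlen or not q_test_flags(sample, buffer):
                    buffer.append(sample)
                    return sample
                # Outlier: discard it and pop the next sample instead.
        return wrapper
    return decorator


distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]


class MyGen:
    def __init__(self):
        self.cursor = -1

    @skip_outliers(buffer_samples=5)
    def pop(self):
        self.cursor += 1
        return distribution[self.cursor]


g = MyGen()
accepted = []
while True:
    try:
        accepted.append(g.pop())
    except IndexError:
        break
print(accepted)  # -14.1 is skipped
```

Because the buffer is created once per decorated function, all instances of `MyGen` would share a single window in this sketch; a per-instance window would need the buffer stored on `self`.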
I have an object from which I pop samples, and I want to iterate over the valid ones only: how can I reject outliers and get an iterator?
from outlier_detector.filters import OutlierFilter

distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]

class MyGen:
    def __init__(self):
        self.cursor = -1

    def pop(self):
        self.cursor += 1
        return distribution[self.cursor]

g = MyGen()
of = OutlierFilter()
try:
    for sample in of.filter(g.pop):
        print(sample)
except IndexError:
    print('No more data')
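An iterator-style filter can be sketched as a plain generator function that keeps calling a sample source and yields only the values that pass the Q test. This is not the library's `OutlierFilter.filter`; the names `filter_samples` and `q_test_flags` are hypothetical, and the sketch assumes a 95% confidence level and a window of accepted samples that is not tested while it fills. Unlike the example above, where `IndexError` escapes the loop, this sketch treats a configurable exception (`stop_on`) as the end of the data and finishes iteration cleanly.

```python
from collections import deque

# Critical values of Q at 95% confidence for n = 3..10 (Wikipedia table).
Q95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625, 7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def q_test_flags(sample, reference):
    """Hypothetical helper: two-tailed Q test of `sample` against `reference`."""
    data = sorted(list(reference) + [sample])
    spread = data[-1] - data[0]
    if spread == 0:
        return False
    if sample == data[0]:
        gap = data[1] - data[0]
    elif sample == data[-1]:
        gap = data[-1] - data[-2]
    else:
        return False  # interior values are never flagged
    return gap / spread > Q95[len(data)]


def filter_samples(pop, buffer_samples=5, stop_on=IndexError):
    """Hypothetical generator: yield only the samples from `pop` that pass the Q test."""
    buffer = deque(maxlen=buffer_samples)
    while True:
        try:
            sample = pop()
        except stop_on:
            return  # the source is exhausted: end the iteration cleanly
        if len(buffer) < buffer.maxlen or not q_test_flags(sample, buffer):
            buffer.append(sample)
            yield sample
        # Outliers are silently skipped.


distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]


class MyGen:
    def __init__(self):
        self.cursor = -1

    def pop(self):
        self.cursor += 1
        return distribution[self.cursor]


g = MyGen()
clean = list(filter_samples(g.pop, buffer_samples=5))
print(clean)  # -14.1 is dropped
```

Since `filter_samples` is a generator, it works directly in a `for` loop or with `list()`, and samples are pulled from the source lazily, one at a time.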

Documentation

The toolkit is organized so that you can use whichever of the following patterns fits your code most easily: functions for static analysis, detectors for objects with internal buffers, and filters that wrap sample-producing callables, either as decorators or as iterators.

For the full documentation, see the doc file.



Download files


Source Distribution

outlier_detector-0.0.3.tar.gz (10.9 kB)


File details

Details for the file outlier_detector-0.0.3.tar.gz.

File metadata

  • Download URL: outlier_detector-0.0.3.tar.gz
  • Upload date:
  • Size: 10.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.23.0 setuptools/46.1.3 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.5.6

File hashes

Hashes for outlier_detector-0.0.3.tar.gz
Algorithm     Hash digest
SHA256        5966e26339bfc7f6da32c68da163073a6ea22be2f2172c6816bea143470a99c0
MD5           7264ada4f20de00e7dcd959733998b6a
BLAKE2b-256   60ef035c637063fa7576a85c5a12a7e1176fd28b4a6f3df297123d804a14c73e

