
Minimal tool for outlier detection on small sample sets

Project description

Outlier Detector toolkit

License: MIT. Code style: black.

This project provides a set of tools for outlier detection that mark or filter out samples as they arrive in your Python analysis code.

Most of the tools rely on the two-tailed Dixon's Q-test (https://en.wikipedia.org/wiki/Dixon%27s_Q_test).
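As a rough illustration of the test itself (a standalone sketch, not the package's own implementation), the Q statistic is the gap between the suspect value and its nearest neighbour, divided by the range of the data. The value is flagged when Q exceeds a tabulated critical value for the sample size. The names `Q95`, `dixon_q`, and `q_test_is_outlier` are hypothetical, and the table holds the commonly tabulated two-tailed critical values at 95% confidence.

```python
# Two-tailed critical values of Dixon's Q at 95% confidence, keyed by sample size.
Q95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
       7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def dixon_q(values, candidate):
    """Return Q = gap / range for `candidate` (hypothetical helper).

    Only the smallest or largest value can be tested; anything else gets Q = 0.
    """
    ordered = sorted(values)
    data_range = ordered[-1] - ordered[0]
    if data_range == 0:
        return 0.0  # all values identical: nothing stands out
    if candidate == ordered[0]:
        gap = ordered[1] - ordered[0]
    elif candidate == ordered[-1]:
        gap = ordered[-1] - ordered[-2]
    else:
        return 0.0  # interior point
    return gap / data_range


def q_test_is_outlier(values, candidate):
    """Flag `candidate` when its Q exceeds the 95% critical value."""
    n = len(values)
    if n not in Q95:
        raise ValueError("this sketch only tabulates sample sizes 3 to 10")
    return dixon_q(values, candidate) > Q95[n]


data = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, -14.5]
print(round(dixon_q(data, -14.5), 3))   # 0.673, above the n=7 threshold of 0.568
print(q_test_is_outlier(data, -14.5))   # True
print(q_test_is_outlier(data, 7.2))     # False
```

The small table is why the test suits small samples: critical values are only tabulated for a limited range of sample sizes.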

Installation

pip install outlier-detector

TL;DR

I have a sample and a known data distribution: is the sample an outlier?
sample = -14.5
distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3]

from outlier_detector.functions import is_outlier
print(is_outlier(distribution, sample))
I have a distribution and I iterate over it: is the n-th sample an outlier?
distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]
from outlier_detector.detectors import OutlierDetector
od = OutlierDetector(buffer_samples=5)
for sample in distribution:
    print(od.is_outlier(sample))
I have a generating object from which I pop samples, and I want only valid samples: how can I reject outliers?
distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]
from outlier_detector.filters import filter_outlier

class MyGen:
    def __init__(self):
        self.cursor = -1

    @filter_outlier()
    def pop(self):
        self.cursor += 1
        return distribution[self.cursor]

g = MyGen()
while True:
    try:
        r = g.pop()
        print(r)
    except IndexError:
        print('No more data')
        break
I have a generating object from which I pop samples, and I want to iterate only over valid samples: how can I reject outliers and get an iterator?
distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]
from outlier_detector.filters import OutlierFilter

class MyGen:
    def __init__(self):
        self.cursor = -1

    def pop(self):
        self.cursor += 1
        return distribution[self.cursor]

g = MyGen()
of = OutlierFilter()
try:
    for sample in of.filter(g.pop):
        print(sample)
except IndexError:
    print('No more data')
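To make the iterator pattern above more concrete, here is a minimal sketch of how such a filter could be built as a plain generator. It is an assumption about the general shape, not the code behind `OutlierFilter.filter`. The function `filter_stream` and the threshold predicate are hypothetical stand-ins; the predicate replaces the real statistical test only to keep the example short.

```python
def filter_stream(pop, is_outlier):
    """Yield values returned by `pop()` for which `is_outlier(value)` is false.

    `pop` is any zero-argument callable; `is_outlier` is any predicate.
    """
    while True:
        value = pop()  # an exception raised here propagates to the consumer
        if not is_outlier(value):
            yield value


distribution = [0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4]


class MyGen:
    def __init__(self):
        self.cursor = -1

    def pop(self):
        self.cursor += 1
        return distribution[self.cursor]  # IndexError once the data runs out


g = MyGen()
kept = []
try:
    # Stand-in predicate: a fixed threshold instead of a Q-test.
    for sample in filter_stream(g.pop, lambda v: v < -10):
        kept.append(sample)
except IndexError:
    print('No more data')
print(kept)
```

Because the generator never ends on its own, the `IndexError` raised by `pop` is what stops the loop, which is why the `for` statement sits inside `try`/`except`. A source that signalled exhaustion with `StopIteration` would need different handling, since Python turns a `StopIteration` raised inside a generator body into a `RuntimeError`.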

Documentation

The toolkit is organized so that you can use one of the following patterns as easily as possible: functions for static analysis, detectors for objects with internal buffers, and filters for decorators and iterators.
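The sketch below shows how these patterns can layer on one another: a function holds the test, a detector adds a rolling buffer, and a filter wraps the detector in a decorator. It is an illustration under stated assumptions, not the package's source. All names ending in `_sketch` or `Sketch` are hypothetical, the 95% critical values are the commonly tabulated two-tailed ones, and keeping only accepted samples in the buffer is a design choice made here.

```python
from collections import deque
from functools import wraps

# Two-tailed 95% critical values for Dixon's Q, keyed by sample size.
Q95 = {3: 0.970, 4: 0.829, 5: 0.710, 6: 0.625,
       7: 0.568, 8: 0.526, 9: 0.493, 10: 0.466}


def is_outlier_sketch(values, sample):
    """Function pattern: static check of `sample` against known `values`."""
    ordered = sorted(list(values) + [sample])
    n, spread = len(ordered), ordered[-1] - ordered[0]
    if n not in Q95 or spread == 0:
        return False  # untabulated size or no spread: do not flag
    if sample == ordered[0]:
        gap = ordered[1] - ordered[0]
    elif sample == ordered[-1]:
        gap = ordered[-1] - ordered[-2]
    else:
        return False  # interior values are never flagged
    return gap / spread > Q95[n]


class DetectorSketch:
    """Detector pattern: remembers the last `buffer_samples` accepted values."""

    def __init__(self, buffer_samples=5):
        self.buffer = deque(maxlen=buffer_samples)

    def is_outlier(self, sample):
        flagged = is_outlier_sketch(self.buffer, sample)
        if not flagged:
            self.buffer.append(sample)  # assumption: outliers are not buffered
        return flagged


def filter_outlier_sketch(buffer_samples=5):
    """Filter pattern: decorator that retries until a non-outlier is returned."""
    def decorator(func):
        detector = DetectorSketch(buffer_samples)

        @wraps(func)
        def wrapper(*args, **kwargs):
            while True:
                sample = func(*args, **kwargs)
                if not detector.is_outlier(sample):
                    return sample
        return wrapper
    return decorator


data = iter([0.1, 1.1, 4.78, 2.0, 7.2, 5.3, 8.1, -14.1, 5.4])


@filter_outlier_sketch()
def pop():
    return next(data)  # raises StopIteration when the data is exhausted


accepted = []
try:
    while True:
        accepted.append(pop())
except StopIteration:
    pass
print(accepted)  # -14.1 is rejected; the other eight values pass
```

With fewer than three values available the sketch cannot apply the test and accepts the sample, so the first readings always pass through.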

For documentation, see the doc file.

Source Distribution

outlier_detector-0.0.3.tar.gz (10.9 kB)
