Robustats is a Python library for high-performance computation of robust statistical estimators.

Robustats

The functions that compute the robust estimators are implemented in C for speed and called from Python.

Estimators implemented in the library:

• Weighted Median (time complexity: O(n)) [1, 2, 3]
• Medcouple (time complexity: O(n * log(n))) [4, 5, 6, 7]
• Mode (time complexity: O(n * log(n))) [8]
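
To make the first estimator in the list concrete, here is a minimal pure-Python sketch of a weighted median. It is not the library's implementation: it sorts the data, so it runs in O(n * log(n)) rather than the O(n) listed above. The function name weighted_median_reference and the tie-handling rule (return the smallest value whose cumulative weight reaches half of the total weight) are assumptions made for illustration.

```python
from typing import Sequence


def weighted_median_reference(values: Sequence[float],
                              weights: Sequence[float]) -> float:
    """Sort-based weighted median, for illustration only.

    Assumes non-negative weights with a positive sum. Hypothetical helper,
    not part of the Robustats API.
    """
    if len(values) != len(weights) or len(values) == 0:
        raise ValueError("values and weights must be non-empty and of equal length")

    # Sort the (value, weight) pairs by value.
    pairs = sorted(zip(values, weights))
    half_total = sum(weights) / 2.0

    # Walk up the sorted values until half of the total weight is covered.
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative >= half_total:
            return value

    # Only reachable through floating-point rounding; fall back to the maximum.
    return pairs[-1][0]


x = [1.1, 5.3, 3.7, 2.1, 7.0, 9.9]
w = [1.1, 0.4, 2.1, 3.5, 1.2, 0.8]
print(weighted_median_reference(x, w))  # prints 2.1
```

The data are the same as in the How to Use section, and the sketch returns the same value, 2.1, for them.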

How to Install

This library requires Python 3.

You can install the library using pip.

pip install robustats

You can also install the library directly from GitHub using the following command.

pip install -e 'git+https://github.com/FilippoBovo/robustats.git#egg=robustats'

Alternatively, you can clone the repository, then install and test the Robustats package as follows.

git clone https://github.com/FilippoBovo/robustats.git
cd robustats
pip install -e .
python -m unittest
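
After installing, a quick way to confirm that the package is visible to the current Python environment is to ask the import system for it. The sketch below uses only the standard library; the helper name is_installed is hypothetical and not part of Robustats.

```python
import importlib.util


def is_installed(package_name: str) -> bool:
    """Return True if a top-level package with this name can be found for import."""
    return importlib.util.find_spec(package_name) is not None


if is_installed("robustats"):
    print("robustats is available in this environment")
else:
    print("robustats is not installed in this environment")
```

This only checks that the package can be located; running the unit tests as shown above remains the way to check that it works.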

How to Use

This is an example of how to use the Robustats library in Python.

import numpy as np
import robustats

# Weighted Median
x = np.array([1.1, 5.3, 3.7, 2.1, 7.0, 9.9])
weights = np.array([1.1, 0.4, 2.1, 3.5, 1.2, 0.8])

weighted_median = robustats.weighted_median(x, weights)

print("The weighted median is {}".format(weighted_median))
# Output: The weighted median is 2.1

# Medcouple
x = np.array([0.2, 0.17, 0.08, 0.16, 0.88, 0.86, 0.09, 0.54, 0.27, 0.14])

medcouple = robustats.medcouple(x)

print("The medcouple is {}".format(medcouple))
# Output: The medcouple is 0.7692307692307692

# Mode
x = np.array([1., 2., 2., 3., 3., 3., 4., 4., 5.])

mode = robustats.mode(x)

print("The mode is {}".format(mode))
# Output: The mode is 3.0
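
As a reminder of why robust estimators are useful, the following sketch compares the mean and the median of a small sample before and after adding one outlier. It uses Python's standard statistics module rather than Robustats, and the sample values are made up for illustration.

```python
import statistics

# Hypothetical measurements clustered around 2.0.
clean = [2.0, 2.1, 1.9, 2.2, 2.0]

# The same measurements with a single gross outlier appended.
contaminated = clean + [200.0]

# How far each estimator moves when the outlier is added.
mean_shift = statistics.mean(contaminated) - statistics.mean(clean)
median_shift = statistics.median(contaminated) - statistics.median(clean)

print("Shift in mean:   {:.2f}".format(mean_shift))
print("Shift in median: {:.2f}".format(median_shift))
```

The mean moves by roughly 33 while the median moves by about 0.05, which is the behaviour that makes median-based estimators attractive for noisy data.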

How to Contribute

If you wish to contribute to this library, please follow the patterns and style of the rest of the code.

Tips:

• In C, use malloc to allocate arrays on the heap rather than declaring them on the stack: with large arrays, stack allocation can overflow the stack and cause a segmentation fault.
• Avoid recursion where possible to limit space complexity; use loops instead.
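
The second tip can be illustrated with a selection routine, the kind of building block that median-type estimators rely on. The sketch below is written in Python for readability and is not taken from the library's C code; the name quickselect_iterative is hypothetical. It replaces the usual recursive call with a loop that narrows the search window, so the call stack does not grow with the input size.

```python
import random
from typing import List


def quickselect_iterative(data: List[float], k: int) -> float:
    """Return the k-th smallest element (0-based) using a loop, not recursion."""
    if not 0 <= k < len(data):
        raise IndexError("k is out of range")

    items = list(data)  # work on a copy so the caller's list is untouched
    low, high = 0, len(items) - 1

    while True:
        if low == high:
            return items[low]

        # Pick a random pivot and move it to the end of the window.
        pivot_index = random.randint(low, high)
        pivot = items[pivot_index]
        items[pivot_index], items[high] = items[high], items[pivot_index]

        # Lomuto partition: elements smaller than the pivot go to the left.
        store = low
        for i in range(low, high):
            if items[i] < pivot:
                items[i], items[store] = items[store], items[i]
                store += 1
        items[store], items[high] = items[high], items[store]

        # Instead of recursing, shrink the window towards position k.
        if k == store:
            return items[store]
        if k < store:
            high = store - 1
        else:
            low = store + 1


x = [1.1, 5.3, 3.7, 2.1, 7.0, 9.9]
print(quickselect_iterative(x, 2))  # prints 3.7, the third smallest value
```

A C version would follow the same pattern: keep the window bounds in local variables, update them inside a while loop, and allocate any working buffer with malloc.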