Robustats

Robustats is a Python library for high-performance computation of robust statistical estimators.

The estimators are implemented in C for speed and exposed through a Python interface.

Estimators implemented in the library:

  • Weighted Median (time complexity: O(n)) [1, 2, 3]
  • Medcouple (time complexity: O(n log n)) [4, 5, 6, 7]
  • Mode (time complexity: O(n log n)) [8]

How to Install

This library requires Python 3.

You can install the library from PyPI using pip.

pip install robustats

You can also install the library directly from GitHub using the following command.

pip install -e 'git+https://github.com/FilippoBovo/robustats.git#egg=robustats'

Alternatively, clone the repository, install the package in editable mode, and run the unit tests as follows.

git clone https://github.com/FilippoBovo/robustats.git
cd robustats
pip install -e .
python -m unittest

How to Use

This is an example of how to use the Robustats library in Python.

import numpy as np
import robustats


# Weighted Median
x = np.array([1.1, 5.3, 3.7, 2.1, 7.0, 9.9])
weights = np.array([1.1, 0.4, 2.1, 3.5, 1.2, 0.8])

weighted_median = robustats.weighted_median(x, weights)

print("The weighted median is {}".format(weighted_median))
# Output: The weighted median is 2.1


# Medcouple
x = np.array([0.2, 0.17, 0.08, 0.16, 0.88, 0.86, 0.09, 0.54, 0.27, 0.14])

medcouple = robustats.medcouple(x)

print("The medcouple is {}".format(medcouple))
# Output: The medcouple is 0.7749999999999999


# Mode
x = np.array([1., 2., 2., 3., 3., 3., 4., 4., 5.])

mode = robustats.mode(x)

print("The mode is {}".format(mode))
# Output: The mode is 3.0
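
As a cross-check on the weighted median example above, the following sketch computes the same quantity in pure NumPy by sorting and accumulating the weights. It is not the library's algorithm: it runs in O(n log n) rather than O(n), and the function name `weighted_median_reference` is hypothetical. It assumes the "lower" weighted median convention, that is, the smallest element whose cumulative weight reaches half of the total weight; how the library resolves exact ties at the half-weight point is not stated here.

```python
import numpy as np


def weighted_median_reference(x, weights):
    """Lower weighted median by sorting (illustrative, O(n log n))."""
    x = np.asarray(x, dtype=float)
    weights = np.asarray(weights, dtype=float)

    # Sort the values and carry the weights along in the same order.
    order = np.argsort(x, kind="stable")
    cumulative = np.cumsum(weights[order])

    # First sorted position whose cumulative weight reaches half the total.
    half = 0.5 * cumulative[-1]
    k = np.searchsorted(cumulative, half)
    return float(x[order][k])


x = np.array([1.1, 5.3, 3.7, 2.1, 7.0, 9.9])
weights = np.array([1.1, 0.4, 2.1, 3.5, 1.2, 0.8])

print(weighted_median_reference(x, weights))  # 2.1
```

The sorted values 1.1 and 2.1 carry a cumulative weight of 4.6, which is the first to reach half of the total weight of 9.1, so the result agrees with the output shown above.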

How to Contribute

If you wish to contribute to this library, please follow the patterns and style of the rest of the code.

Also enable the repository's Git hooks by pointing Git at the .githooks directory.

git config core.hooksPath .githooks

Tips:

  • In C, use malloc to allocate arrays on the heap rather than declaring them on the stack. With large arrays, stack allocation can overflow the stack and cause a segmentation fault.
  • Avoid recursion where possible to limit the space complexity of the algorithm. Use loops instead.
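
The second tip can be illustrated with the half-sample mode described in reference [8], which is naturally stated recursively: keep the densest half of the sorted sample, then repeat on that half. The sketch below is written in Python for brevity and uses a loop instead of recursion, so the call stack stays flat regardless of input size. It is not the library's C code; the function name `half_sample_mode` and the tie-breaking choices (first narrowest window, middle point when three points are equally spaced) are assumptions made for illustration.

```python
import math


def half_sample_mode(values):
    """Loop-based half-sample mode (illustrative sketch, not the C code)."""
    x = sorted(float(v) for v in values)
    if not x:
        raise ValueError("half_sample_mode requires at least one value")

    # Repeatedly keep the half-sample with the smallest range.
    while len(x) > 3:
        half = math.ceil(len(x) / 2)
        start = min(
            range(len(x) - half + 1),
            key=lambda j: x[j + half - 1] - x[j],
        )
        x = x[start:start + half]

    # With three points left, keep the closer pair (middle point on a tie).
    if len(x) == 3:
        left_gap, right_gap = x[1] - x[0], x[2] - x[1]
        if left_gap < right_gap:
            x = x[:2]
        elif right_gap < left_gap:
            x = x[1:]
        else:
            return x[1]

    # One or two points remain: return their mean.
    return sum(x) / len(x)


print(half_sample_mode([1., 2., 2., 3., 3., 3., 4., 4., 5.]))  # 3.0
```

Each pass replaces the working slice with a smaller one, so the only state carried between iterations is the current window. A C version of the same loop would track two indices into a heap-allocated, sorted array.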

References

[1] Cormen, Leiserson, Rivest, Stein - Introduction to Algorithms (3rd Edition).

[2] Cormen - Introduction to Algorithms (3rd Edition) - Instructor's Manual.

[3] Weighted median on Wikipedia.

[4] G. Brys; M. Hubert; A. Struyf (November 2004). "A Robust Measure of Skewness". Journal of Computational and Graphical Statistics. 13 (4): 996–1017.

[5] Donald B. Johnson; Tetsuo Mizoguchi (May 1978). "Selecting The Kth Element In X + Y And X1 + X2 +...+ Xm". SIAM Journal on Computing. 7 (2): 147–153.

[6] Medcouple implementation in Python by Jordi Gutiérrez Hermoso.

[7] Medcouple on Wikipedia.

[8] David R. Bickel, Rudolf Frühwirth. "On a fast, robust estimator of the mode: Comparisons to other robust estimators with applications", Computational Statistics & Data Analysis, Volume 50, Issue 12, 2006, Pages 3500-3530, ISSN 0167-9473.
