Skip to main content

This is a Python implementation of the p-square algorithm

Project description

Introduction

This is a python implementation of this paper, which proposes a heuristic algorithm for dynamic calculation of the median and other percentiles. It has the advantage of running in O(1) at each iteration and is hence particularly useful when dealing with continuously (and fastly) incoming data.

Install

  1. From sources

    git clone git@gitlab.octo.com:bdalab/psquare.git
    cd psquare
    python setup.py install
    
  2. Using pip

    pip install psquare
    

Example of use

In the following example, we will estimate the value of the 95th percentile of a N(0,1) distribution using p square algorithm. We will compare our estimates as we continuously draw from the distribution, by comparing with the truth value given by the numpy.percentile function. At the end, we will plot the residuals, and the execution time using psquare and numpy.percentile.

import numpy as np
import matplotlib
import matplotlib.pyplot as plt
import time

from psquare.psquare import PSquare

NB_ITERATIONS = 30000


def random_generator():
    return np.random.normal(0, 1, 1)


def exact_value_for_quantile(values, quantile):
    return np.percentile(values, quantile)


def main():
    values = [random_generator() for _ in range(5)]
    quantile_to_estimate = 95

    psquare = PSquare(quantile_to_estimate)
    exact_quantiles = []
    estimated_quantiles = []
    psquare_exec_time = []
    numpy_exec_time = []

    for val in values:  # p square algorithm necessitates 5 values to start
        psquare.update(val)

    for _ in range(NB_ITERATIONS):
        new_val = random_generator()
        values.append(new_val)

        psquare_start = time.time()
        psquare.update(new_val)
        estimated_quantiles.append(psquare.p_estimate())
        psquare_end = time.time()

        numpy_start = time.time()
        exact_quantiles.append(exact_value_for_quantile(values, quantile_to_estimate))
        numpy_end = time.time()

        psquare_exec_time.append(psquare_end - psquare_start)
        numpy_exec_time.append(numpy_end - numpy_start)

    matplotlib.rc('figure', figsize=(10, 5))
    errors = np.abs(np.array(estimated_quantiles) - np.array(exact_quantiles))
    plt.plot(errors)
    plt.title('Absolute error between p-square predicted value and exact percentile value')
    plt.ylabel('Difference between exact percentile value and p-square estimation')
    plt.xlabel('Size of the dataset')
    plt.rcParams["figure.figsize"] = (10, 5)
    plt.show()

    plt.plot(psquare_exec_time[1:], label='p-square')
    plt.plot(numpy_exec_time[1:], label='numpy percentile')
    plt.title('Execution time to compute percentile on a growing dataset')
    plt.ylabel('Execution time (in seconds)')
    plt.xlabel('Size of the dataset')
    plt.legend()
    plt.rcParams["figure.figsize"] = (10, 5)
    plt.show()


if __name__ == '__main__':
    main()

Error between estimated and exact value

Using previous example we obtain the following figures:

  • Errors between exact and predicted percentile value

  • Execution time between p-square estimations and numpy percentile function:

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

psquare-1.1.tar.gz (3.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

psquare-1.1-py3-none-any.whl (4.9 kB view details)

Uploaded Python 3

File details

Details for the file psquare-1.1.tar.gz.

File metadata

  • Download URL: psquare-1.1.tar.gz
  • Upload date:
  • Size: 3.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.1 requests-toolbelt/0.9.1 tqdm/4.39.0 CPython/3.7.3

File hashes

Hashes for psquare-1.1.tar.gz
Algorithm Hash digest
SHA256 90c98c56606772620c96c2b07bf79433b02cb159c5553943f1fbdddd27bf10aa
MD5 e107b88352998275bc904a6548c0157c
BLAKE2b-256 a31d5be857d68c4eb69d94b81dd96663570c2a685a3d8750ab44878a0f849856

See more details on using hashes here.

File details

Details for the file psquare-1.1-py3-none-any.whl.

File metadata

  • Download URL: psquare-1.1-py3-none-any.whl
  • Upload date:
  • Size: 4.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.1.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/42.0.1 requests-toolbelt/0.9.1 tqdm/4.39.0 CPython/3.7.3

File hashes

Hashes for psquare-1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 ef078241207335e65314ab9968b1f6a24cdfe6356ceb7286ef7f0135c262b19a
MD5 6348ce8bad8e60a0b4980390fd31f115
BLAKE2b-256 16671f0cf584102f762d11d1e7fe3d565a2db4bf2558e08ce8e0b6688dd8f7ca

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page