
A tool for automatic transformation of empirical data distributions


:warning: Still under development

Redistributor is a tool for automatic transformation of empirical data distributions. It is implemented in Python 3 as a scikit-learn transformer.

It allows users to transform data from one arbitrary distribution into another. The source and target distributions can be specified exactly, if known beforehand, or inferred from the data. The transformation is piecewise smooth, monotonic, and invertible, and it can be saved for later use on different data, assuming the same source distribution.

The empirical distribution can be inferred from a 1D array of data. To redistribute multiple slices of your data, use the Redistributor_multi class, which has a low memory footprint and uses parallel computing to apply multiple Redistributor objects.

Installation

:warning: Not yet published on PyPI. Coming soon.

The code is hosted in this GitLab repository. To install the released version from PyPI, use:

pip install redistributor

Or install the bleeding edge directly from git:

pip install git+https://gitlab.com/paloha/redistributor

For development, install the package in editable mode with extra dependencies for documentation and testing:

# Clone the repository
git clone git@gitlab.com:paloha/redistributor.git
cd redistributor

# Use virtual environment [optional]
python3 -m virtualenv .venv
source .venv/bin/activate

# Install with pip in editable mode
pip install -e ".[dev]"

Compatibility

...

Dependencies

Required packages for Redistributor are specified in the install_requires list in the setup.py file.

Extra dependencies for running the tests, compiling the documentation, or running the examples are specified in the extras_require dictionary in the same file.

The full version-locked list of dependencies and subdependencies is frozen in requirements.txt. Installing with pip install -r requirements.txt in a virtual environment should always lead to a fully functional project.

Mathematical description

Assume we are given data \(x\sim S\) distributed according to some source distribution \(S\) on \(\mathbb{R}\) and our goal is to find a transformation \(R\) such that \(R(x)\sim T\) for some target distribution \(T\) on \(\mathbb{R}\).

One can mathematically show that a suitable \(R\colon \mathbb{R} \to \mathbb{R}\) is given by $$ R := F_{T}^{-1} \circ F_{S}, $$ where \(F_S\) and \(F_T\) are the cumulative distribution functions of \(S\) and \(T\), respectively.
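As an illustration of the formula above, here is a minimal sketch for a case where both distributions are known exactly. It does not use Redistributor's own API. The choices of S (standard normal) and T (exponential with rate 1), as well as the names `source`, `target_inv_cdf`, and `redistribute`, are assumptions made only for this example. It relies on the fact that F_S(x) is uniformly distributed on (0, 1) when F_S is continuous.

```python
import math
import random
from statistics import NormalDist

# Assumption for illustration: S is the standard normal distribution.
source = NormalDist(mu=0.0, sigma=1.0)


def target_inv_cdf(u: float) -> float:
    """Inverse CDF of T = Exponential(rate=1): F_T^{-1}(u) = -ln(1 - u)."""
    return -math.log1p(-u)


def redistribute(x: float) -> float:
    """R = F_T^{-1} o F_S: maps a draw from S to a draw from T."""
    return target_inv_cdf(source.cdf(x))


# Draw from S and push the sample through R.
rng = random.Random(0)
xs = [rng.gauss(0.0, 1.0) for _ in range(20_000)]
ys = [redistribute(x) for x in xs]

# The transformed sample should resemble Exponential(1), whose mean is 1.
sample_mean = sum(ys) / len(ys)
```

The median of S maps to the median of T, so `redistribute(0.0)` equals ln 2. For inputs far in the right tail, `source.cdf(x)` rounds to 1.0 and the logarithm is undefined, so a practical version would need to clip the intermediate value.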

If \(S\) and \(T\) are unknown, one can use approximations \(\tilde{F}_S\) and \(\tilde{F}_T\) of the corresponding cumulative distribution functions, obtained by interpolating (partially) sorted data $$ (x_i)_{i=1}^N \ \text{with} \ x_i \sim S $$ $$ (y_i)_{i=1}^M \ \text{with} \ y_i \sim T. $$ Defining $$ \tilde{R} := \tilde{F}_{T}^{-1} \circ \tilde{F}_S, $$ one can, under suitable conditions, show that $$ \tilde{R} \xrightarrow[N,M\to \infty]{} R. $$
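The empirical construction can be sketched in plain Python as follows. This is not the package's implementation: it fully sorts both samples, uses simple piecewise-linear interpolation, and clamps values outside the observed range. The helper names `interp` and `make_redistributor` and the plotting positions `(i + 0.5) / N` are assumptions chosen for this example.

```python
import bisect
import random


def interp(x, xp, fp):
    """Piecewise-linear interpolation through (xp, fp), clamped at both ends.

    xp must be sorted in non-decreasing order.
    """
    if x <= xp[0]:
        return fp[0]
    if x >= xp[-1]:
        return fp[-1]
    j = bisect.bisect_right(xp, x)  # now xp[j - 1] <= x < xp[j]
    x0, x1 = xp[j - 1], xp[j]
    f0, f1 = fp[j - 1], fp[j]
    return f0 + (f1 - f0) * (x - x0) / (x1 - x0)


def make_redistributor(source_sample, target_sample):
    """Build the empirical map F_T^{-1} o F_S and its inverse from two samples."""
    xs = sorted(source_sample)
    ys = sorted(target_sample)
    # Empirical CDF values attached to the sorted points (hypothetical choice).
    px = [(i + 0.5) / len(xs) for i in range(len(xs))]
    py = [(i + 0.5) / len(ys) for i in range(len(ys))]

    def forward(x):
        # Approximate F_S by interpolating (xs, px), then F_T^{-1} via (py, ys).
        return interp(interp(x, xs, px), py, ys)

    def inverse(y):
        # Same construction with the roles of source and target swapped.
        return interp(interp(y, ys, py), px, xs)

    return forward, inverse


rng = random.Random(0)
source_sample = [rng.random() for _ in range(5_000)]         # x_i ~ Uniform(0, 1)
target_sample = [rng.gauss(0.0, 1.0) for _ in range(5_000)]  # y_i ~ Normal(0, 1)
forward, inverse = make_redistributor(source_sample, target_sample)
```

Here `forward(0.5)` should land close to 0, the median of the target, and `inverse(forward(x))` should recover `x` for inputs inside the range of the source sample. The approximation improves as both sample sizes grow, in line with the convergence statement above.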

How to cite

...

License

This project is licensed under the terms of the MIT license. See license.txt for details.

Acknowledgement

This work was supported by the International Mobility of Researchers (program call no.: CZ.02.2.69/0.0/0.0/16027/0008371).


