A tool for automatic transformation of empirical data distributions
:warning: | Still under development |
---|
Redistributor is a tool for automatic transformation of empirical data distributions. It is implemented in Python 3 as a scikit-learn transformer.
It allows the user to transform their data from an arbitrary distribution into another arbitrary distribution. The source and target distributions can be specified exactly, if known beforehand, or can be inferred from the data. The transformation is piecewise smooth, monotonic, and invertible, and can be saved for later use on different data, assuming the same source distribution.
The empirical distribution can be inferred from a 1D array of data. To redistribute multiple slices of your data, use the Redistributor_multi class, which has a low memory footprint and uses parallel computing to apply multiple Redistributor objects.
Installation
:warning: | Not yet published on PyPI. Coming soon. |
---|
The code is hosted in this GitLab repository. To install the released version from PyPI, use:
pip install redistributor
Or install the bleeding edge directly from git:
pip install git+https://gitlab.com/paloha/redistributor
For development, install the package in editable mode with extra dependencies for documentation and testing:
# Clone the repository
git clone git@gitlab.com:paloha/redistributor.git
cd redistributor
# Use virtual environment [optional]
python3 -m virtualenv .venv
source .venv/bin/activate
# Install with pip in editable mode
pip install -e .[dev]
Compatibility
...
Dependencies
Required packages for Redistributor are specified in the install_requires list in the setup.py file.
Extra dependencies for running the tests, compiling the documentation, or running the examples are specified in the extras_require dictionary in the same file.
The full version-locked list of dependencies and subdependencies is frozen in requirements.txt. Installing with pip install -r requirements.txt in a virtual environment should always lead to a fully functional project.
Mathematical description
Assume we are given data $x \sim S$ distributed according to some source distribution $S$ on $\mathbb{R}$, and our goal is to find a transformation $R$ such that $R(x) \sim T$ for some target distribution $T$ on $\mathbb{R}$.
One can show that a suitable $R\colon \mathbb{R} \to \mathbb{R}$ is given by $$ R := F_{T}^{-1} \circ F_{S}, $$ where $F_S$ and $F_T$ are the cumulative distribution functions of $S$ and $T$, respectively.
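To illustrate the formula for the case where both distributions are known exactly, here is a minimal sketch using only the Python standard library. It does not use the Redistributor API; the function names `exponential_cdf` and `redistribute`, the choice of an exponential source and a standard normal target, and the clipping constant `eps` are all assumptions made for this example.

```python
import math
import random
from statistics import NormalDist, mean, stdev


def exponential_cdf(x, rate=1.0):
    """CDF F_S of an exponential source distribution S (illustrative choice)."""
    # -expm1(-t) equals 1 - exp(-t) but is more accurate for small t.
    return -math.expm1(-rate * x) if x > 0 else 0.0


def redistribute(x, source_cdf, target_inv_cdf, eps=1e-12):
    """Apply R = F_T^{-1} o F_S to a single value x."""
    p = source_cdf(x)
    # Keep p strictly inside (0, 1), where the inverse CDF is defined.
    p = min(max(p, eps), 1.0 - eps)
    return target_inv_cdf(p)


target = NormalDist(mu=0.0, sigma=1.0)  # target distribution T

rng = random.Random(0)
samples = [rng.expovariate(1.0) for _ in range(20_000)]  # x ~ S
transformed = [redistribute(x, exponential_cdf, target.inv_cdf) for x in samples]

print(f"mean = {mean(transformed):.3f}, stdev = {stdev(transformed):.3f}")
```

The transformed sample should have a mean close to 0 and a standard deviation close to 1, and the map is monotonic because both $F_S$ and $F_T^{-1}$ are non-decreasing.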
If $S$ and $T$ are unknown, one can use approximations $\tilde{F}_S$ and $\tilde{F}_T$ of the corresponding cumulative distribution functions given by interpolating (partially) sorted data $$ (x_i)_{i=1}^N \ \text{with} \ x_i \sim S $$ $$ (y_i)_{i=1}^M \ \text{with} \ y_i \sim T. $$ Defining $$ \tilde{R} := \tilde{F}_{T}^{-1} \circ \tilde{F}_S, $$ one can, under suitable conditions, show that $$ \tilde{R} \xrightarrow[N,M\to \infty]{} R. $$
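The empirical construction can be sketched in plain Python as follows. This is not the package's implementation: the helper names `_interp` and `fit_redistributor`, the plotting positions $(i + 0.5)/n$, and the clamping at the sample extremes are assumptions chosen to keep the example short. The sketch sorts the data fully, whereas the text above only requires (partially) sorted data.

```python
import bisect
import random
from statistics import mean, stdev


def _interp(x, xs, ys):
    """Piecewise-linear interpolation through (xs, ys); xs must be sorted.

    Values outside [xs[0], xs[-1]] are clamped to the end points.
    """
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    j = bisect.bisect_right(xs, x)  # guarantees xs[j - 1] <= x < xs[j]
    x0, x1 = xs[j - 1], xs[j]
    y0, y1 = ys[j - 1], ys[j]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


def fit_redistributor(source_samples, target_samples):
    """Return an approximation of R built from two empirical samples."""
    xs = sorted(source_samples)
    ys = sorted(target_samples)
    # Probabilities assigned to the sorted samples, strictly inside (0, 1).
    px = [(i + 0.5) / len(xs) for i in range(len(xs))]
    py = [(i + 0.5) / len(ys) for i in range(len(ys))]

    def transform(x):
        p = _interp(x, xs, px)     # approximates F_S(x)
        return _interp(p, py, ys)  # approximates F_T^{-1}(p)

    return transform


rng = random.Random(0)
source = [rng.random() for _ in range(5000)]         # x_i ~ S (uniform)
target = [rng.gauss(5.0, 2.0) for _ in range(5000)]  # y_i ~ T (normal)
R_tilde = fit_redistributor(source, target)

# Apply the fitted transformation to fresh data from the same source.
new_data = [rng.random() for _ in range(5000)]
transformed = [R_tilde(x) for x in new_data]
print(f"mean = {mean(transformed):.2f}, stdev = {stdev(transformed):.2f}")
```

With larger $N$ and $M$ the transformed data should match the target more closely, in line with the convergence statement above. Because of the clamping, inputs outside the observed source range are mapped to the smallest or largest target sample.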
How to cite
...
License
This project is licensed under the terms of the MIT license.
See license.txt for details.
Acknowledgement
This work was supported by the International Mobility of Researchers (program call no.: CZ.02.2.69/0.0/0.0/16027/0008371).