# kditransform
The kernel-density integral transformation, like [min-max scaling](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html) and [quantile transformation](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html), maps continuous features to the range [0, 1]. It achieves a happy balance between these two transforms, preserving the shape of the input distribution like min-max scaling, while nonlinearly attenuating the effect of outliers like quantile transformation. It can also be used to discretize features, offering a data-driven alternative to univariate clustering or [K-bins discretization](https://scikit-learn.org/stable/modules/preprocessing.html#preprocessing-discretization).
You can tune the interpolation parameter $\alpha$ between 0 (quantile transform) and $\infty$ (min-max transform), but a good default is $\alpha=1$, which is equivalent to using `scipy.stats.gaussian_kde(bw_method=1)`. This improves downstream performance on many supervised learning problems; see [classification-plots.ipynb](https://github.com/calvinmccarter/kditransform/blob/master/examples/classification-plots.ipynb) for example code.
<img src="examples/Accuracy-vs-bwf-iris-pca.jpg" alt="drawing" width="300"/><img src="examples/MSE-vs-bwf-cahousing-linr-nolegend.jpg" alt="drawing" width="300"/>
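Since $\alpha=1$ corresponds to a Gaussian KDE with `bw_method=1`, that setting can be sketched directly with SciPy. This is an illustrative reimplementation, not the library's own code: each value is mapped to the CDF of a KDE fit on the feature.

```python
import numpy as np
from scipy.stats import gaussian_kde

# Illustrative sketch of the alpha=1 case (not kditransform's implementation):
# each value maps to the CDF of a Gaussian KDE fit on the feature, so outputs
# lie in [0, 1] and increase monotonically with the input.
rng = np.random.default_rng(0)
x = rng.lognormal(size=500)  # skewed feature with a long right tail

kde = gaussian_kde(x, bw_method=1)
y = np.array([kde.integrate_box_1d(-np.inf, xi) for xi in x])
```

Unlike a raw quantile transform, `y` reflects the density of `x`, so clusters of inliers keep their shape while outliers are pulled in smoothly.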
## Installation
Install from PyPI:

```
pip install kditransform
```

Or install from a source checkout and run the tests:

```
pip install -r requirements.txt
pip install -e .
pytest
```
## Usage
kditransform.KDITransformer is a drop-in replacement for [sklearn.preprocessing.QuantileTransformer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.QuantileTransformer.html). When alpha (which defaults to 1.0) is small, our method behaves like the QuantileTransformer; when alpha is large, it behaves like [sklearn.preprocessing.MinMaxScaler](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.MinMaxScaler.html).
```python
import numpy as np
from kditransform import KDITransformer

X = np.random.uniform(size=(500, 1))
kdt = KDITransformer(alpha=1.)
Y = kdt.fit_transform(X)
```
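For intuition about the two endpoints of the alpha interpolation, here is a hedged sketch using scikit-learn's own transformers rather than kditransform itself: on a skewed feature, min-max scaling preserves the skew, while the quantile transform flattens it toward uniform.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, QuantileTransformer

rng = np.random.default_rng(0)
X = rng.lognormal(size=(500, 1))  # heavily right-skewed feature

mm = MinMaxScaler().fit_transform(X)                        # preserves shape
qt = QuantileTransformer(n_quantiles=500).fit_transform(X)  # ~uniform output

# The skew survives min-max scaling (median stays near 0) but not the
# quantile transform (median lands near 0.5).
print(np.median(mm), np.median(qt))
```

KDITransformer with a moderate alpha sits between these two behaviors.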
kditransform.KDIDiscretizer offers an API based on [sklearn.preprocessing.KBinsDiscretizer](https://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.KBinsDiscretizer.html). It encodes each feature ordinally, similarly to KBinsDiscretizer(encode='ordinal').
```python
import numpy as np
from kditransform import KDIDiscretizer

rng = np.random.default_rng(1)
N = 1000  # total number of samples
x1 = rng.normal(1, 0.75, size=int(0.55 * N))
x2 = rng.normal(4, 1, size=int(0.3 * N))
x3 = rng.uniform(0, 20, size=int(0.15 * N))
X = np.sort(np.r_[x1, x2, x3]).reshape(-1, 1)
kdd = KDIDiscretizer()
T = kdd.fit_transform(X)
```
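The bins are data-driven: intuitively, cut points fall in low-density valleys between clusters. A rough sketch of that idea with SciPy (again, not the library's actual algorithm, and the narrow bandwidth here is an ad-hoc choice for illustration):

```python
import numpy as np
from scipy.signal import argrelmin
from scipy.stats import gaussian_kde

rng = np.random.default_rng(1)
N = 1000
x = np.r_[rng.normal(1, 0.75, int(0.55 * N)),
          rng.normal(4, 1, int(0.3 * N)),
          rng.uniform(0, 20, int(0.15 * N))]

# Evaluate the KDE on a grid and cut at its local minima (density valleys).
grid = np.linspace(x.min(), x.max(), 1000)
density = gaussian_kde(x, bw_method=0.15)(grid)  # ad-hoc narrow bandwidth
cuts = grid[argrelmin(density)[0]]
codes = np.digitize(x, cuts)  # ordinal bin index per sample
```

Unlike fixed-width K-bins discretization, the number and placement of bins here adapt to the shape of the distribution.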
When initialized as KDIDiscretizer(enable_predict_proba=True), it can also output one-hot encodings and probabilistic one-hot encodings of single-feature input data.
```python
kdd = KDIDiscretizer(enable_predict_proba=True).fit(X)
P = kdd.predict(X)        # one-hot encoding
P = kdd.predict_proba(X)  # probabilistic one-hot encoding
```