
Continuous Affine Feature Transformations for feature mapping.


CAFT - Continuous Affine Feature Transformer


A custom transformer package that lets users apply affine/geometric transformations to datasets with respect to a curve that has a well-defined continuous equation.

The transformers attempt to follow the scikit-learn API. However, because they operate on both the X and y variables, they will likely cause issues when used within a scikit-learn pipeline.
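The limitation can be illustrated with a minimal sketch. It assumes the usual scikit-learn convention that a pipeline calls each step as transform(X), without y. The TwoSidedScaler class below is hypothetical and is not part of caft; it only mimics a transformer whose transform needs both arrays.

```python
import numpy as np


class TwoSidedScaler:
    """Hypothetical transformer that rescales both X and y (not part of caft)."""

    def fit(self, X, y):
        # Remember the largest absolute value of each array
        self.x_scale_ = np.abs(X).max()
        self.y_scale_ = np.abs(y).max()
        return self

    def transform(self, X, y):
        # Both arrays are required, unlike the sklearn-style transform(X)
        return X / self.x_scale_, y / self.y_scale_


X = np.array([[1.0], [2.0], [4.0]])
y = np.array([10.0, 20.0, 40.0])

scaler = TwoSidedScaler().fit(X, y)

# Direct use works because we pass both arrays ourselves
Xt, yt = scaler.transform(X, y)

# A pipeline-style call passes X only, which this signature cannot satisfy
try:
    scaler.transform(X)
    pipeline_style_call_ok = True
except TypeError:
    pipeline_style_call_ok = False

print(pipeline_style_call_ok)
```

In practice this suggests calling such a transformer directly, before handing the transformed arrays to any pipeline.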

Installation

Install caft via pip with:

pip install caft

Documentation

There is currently no hosted documentation, but most functions are well documented, with examples.

Alternatively, there is a thorough example in the example.ipynb notebook.

Usage

The main pattern is as follows.

import sympy as sp
import numpy as np
import matplotlib.pyplot as plt

from caft.odr import SympyODRegressor, ODRegressor
from caft.affine import ContinuousAffineFeatureTransformer

np.random.seed(42)

n = 10000

# Generate data with some natural noise (not errors)
X_true = np.linspace(-2, 2, n) + np.random.uniform(-0.5, 0.5, n)

# Add random measurement errors - both small and extreme
errors_in_X = np.random.normal(0, 0.3, n)
errors_in_y = np.random.normal(0, 5, n)
y =  3 * (X_true + errors_in_X) ** 3 + errors_in_y
fx = 3 * X_true ** 3

# Add systematic error
n_errs = 100
X_outliers = -0.5 * np.ones(n_errs) + 0.2 * np.random.uniform(-0.3, 0.5, n_errs)
y_outliers = -30 * np.ones(n_errs) + np.random.normal(0, 3, n_errs)
X = np.hstack([X_true, X_outliers]).reshape(-1, 1)
y = np.hstack([y, y_outliers])

# Plot the noisy data (including outliers) and the noise-free curve in red
plt.scatter(X, y)
plt.scatter(X_true, fx, color="r", s=1)
plt.show()

[Figure: scatter plot of X and y, with the noise-free curve overlaid in red]

The plot shows the scatter of X and y together with the original function $y = f(x)$ without noise. We can now create an affine transformation with respect to the original function (or at least the SympyODRegressor estimate of it).

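# Equation of the curve, with free parameters a and b
eq = "a * x ** 3 + b"

# Rescale the data by its maximum values
X_ = X / X.max()
y_ = y / y.max()

# Nest the regressor inside the transformer, then fit and transform X and y together
sodr = SympyODRegressor(eq, beta0={"a": 0.5, "b": 1})
caft = ContinuousAffineFeatureTransformer(sodr, optimiser="halley")
caft.fit(X_, y_)
Xt, yt = caft.transform(X_, y_)
Xt = Xt.reshape(-1, 1)

# Plot the transformed data
plt.scatter(Xt, yt, s=6)
plt.show()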

[Figure: scatter plot of the transformed data Xt and yt]
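For intuition, the sketch below shows one way a point can be expressed relative to a curve: find its nearest point on $y = a x^3 + b$ with Halley's method, then report the x-coordinate of that foot point together with the signed orthogonal distance. This is an assumption about the general idea, not caft's actual implementation; the function name project_onto_cubic and its parameters are hypothetical.

```python
import numpy as np


def project_onto_cubic(x0, y0, a, b, n_iter=50, tol=1e-12):
    """Hypothetical helper: nearest point on y = a*x**3 + b for each (x0, y0).

    Returns the x-coordinate of the foot point and the signed orthogonal
    distance (positive when the point lies above the curve).
    """
    x0 = np.atleast_1d(np.asarray(x0, dtype=float))
    y0 = np.atleast_1d(np.asarray(y0, dtype=float))
    t = x0.copy()  # initial guess: the point's own x-coordinate

    for _ in range(n_iter):
        f = a * t**3 + b   # curve value
        f1 = 3 * a * t**2  # first derivative
        f2 = 6 * a * t     # second derivative
        f3 = 6 * a         # third derivative
        r = f - y0         # vertical residual

        # g is half the derivative of the squared distance; its root is the foot point
        g = (t - x0) + r * f1
        g1 = 1 + f1**2 + r * f2
        g2 = 3 * f1 * f2 + r * f3

        # Halley's update, skipping entries where the denominator vanishes
        denom = 2 * g1**2 - g * g2
        safe = np.where(denom == 0, 1.0, denom)
        step = np.where(denom == 0, 0.0, 2 * g * g1 / safe)
        t = t - step
        if np.max(np.abs(step)) < tol:
            break

    # Signed distance: offset vector dotted with the unit normal (-f', 1) / norm
    f = a * t**3 + b
    f1 = 3 * a * t**2
    norm = np.sqrt(1 + f1**2)
    d = (-(x0 - t) * f1 + (y0 - f)) / norm
    return t, d


# First point sits 0.1 above the curve y = x**3 at x = 0.5; second lies on the curve
feet, dist = project_onto_cubic([0.44, 1.0], [0.205, 1.0], a=1.0, b=0.0)
print(feet, dist)
```

The iteration starts from each point's own x-coordinate, which works for points near the curve; points far from it may converge to a different stationary point of the distance, so a production version would need a more careful starting guess.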

A more thorough example can be found in the example.ipynb notebook.

Nesting a regressor within a transformer is a somewhat unusual pattern. The benefit is that each component can be used on its own: the regressor for standalone equation regression, or the transformer with a regressor you roll yourself to supply the curve equation.

Development

New versions are deployed to PyPI using GitHub Actions.

Set the version number with __version__ = "X.Y.Z" in caft/__init__.py, then tag the release and push the tag:

git tag -a "vX.Y.Z" -m "deployment message"
git push --tags
