Continuous Affine Feature Transformations for feature mapping.
CAFT - Continuous Affine Feature Transformer
A custom transformer package that allows users to apply affine/geometric transformations to datasets with respect to a curve described by a well-defined continuous equation.
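As a rough illustration of what "with respect to a curve" can mean, the sketch below maps each point (p, q) to coordinates relative to the curve y = f(x): the x-position t of the closest point on the curve, and the signed orthogonal distance to it. This is a hedged sketch, not CAFT's implementation. The helper names `project_onto_cubic` and `curve_coordinates` and the choice of output coordinates are hypothetical. It finds the closest point with Halley's method, the method named by `optimiser="halley"` in the usage example, although how CAFT itself uses that optimiser is not described here. It uses the cubic f(x) = a * x ** 3 + b from that example.

```python
import numpy as np


def project_onto_cubic(p, q, a=3.0, b=0.0, n_iter=20):
    """Return the x-position t of a closest point on y = a*t**3 + b.

    Halley's method is applied to g(t) = (t - p) + (f(t) - q) * f'(t),
    which is half the derivative of the squared distance
    (t - p)**2 + (f(t) - q)**2. It finds a stationary point near the
    starting guess, which is not guaranteed to be the global closest point.
    """
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    t = p.copy()  # initial guess: the point's own x-coordinate
    for _ in range(n_iter):
        f = a * t**3 + b
        f1 = 3 * a * t**2  # f'(t)
        f2 = 6 * a * t     # f''(t)
        f3 = 6 * a         # f'''(t)
        g = (t - p) + (f - q) * f1
        g1 = 1 + f1**2 + (f - q) * f2    # g'(t)
        g2 = 3 * f1 * f2 + (f - q) * f3  # g''(t)
        t = t - 2 * g * g1 / (2 * g1**2 - g * g2)  # Halley update
    return t


def curve_coordinates(p, q, a=3.0, b=0.0):
    """Map points (p, q) to (t, d): foot-point x-position and signed distance."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    t = project_onto_cubic(p, q, a, b)
    residual = q - (a * t**3 + b)  # positive when the point lies above the curve
    d = np.sign(residual) * np.hypot(p - t, residual)
    return t, d


# (1, 3) lies on y = 3x**3; (1, 4) is above it and (1, 2) below it.
t, d = curve_coordinates([1.0, 1.0, 1.0], [3.0, 4.0, 2.0])
print(t, d)
```

In these coordinates, points on the curve have d = 0. Starting from each point's own x-coordinate keeps the sketch short; points far from a strongly curved section may need a better initial guess.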
The transformers attempt to follow the scikit-learn API. However, because they operate on both the X and y variables, they cannot fully satisfy it, and this will likely cause issues when they are used within a scikit-learn pipeline.
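The root of the problem is scikit-learn's transformer contract: inside a Pipeline, each intermediate step's fit_transform(X, y) result is passed on as the next step's X, while y is passed through unchanged. The sketch below uses a hypothetical `XYTransformer` (not part of CAFT) whose transform returns an (X, y) pair, in the spirit of CAFT's `transform(X, y)`. It shows that scikit-learn's default fit_transform never hands y to transform, and that a Pipeline built around such a transformer fails once the tuple reaches the final estimator.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline


class XYTransformer(TransformerMixin, BaseEstimator):
    """Hypothetical transformer that, like CAFT, returns both X and y."""

    def fit(self, X, y=None):
        return self

    def transform(self, X, y=None):
        # Returns a tuple rather than an array; y is None unless passed explicitly.
        return X, y


X = np.linspace(0, 1, 20).reshape(-1, 1)
y = 2 * X.ravel() + 1

# TransformerMixin.fit_transform calls fit(X, y).transform(X), so y is dropped.
Xt, yt = XYTransformer().fit_transform(X, y)
print(yt)  # prints None

# In a Pipeline, the (X, None) tuple reaches LinearRegression as its "X".
pipe = make_pipeline(XYTransformer(), LinearRegression())
try:
    pipe.fit(X, y)
    pipeline_failed = False
except (TypeError, ValueError) as exc:
    pipeline_failed = True
    print(f"Pipeline failed with {type(exc).__name__}")
```

Calling the transformer directly, as in the usage example, avoids this, because you control what is passed to transform.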
Installation
Install caft via pip with:
pip install caft
Documentation
Currently, there is no hosted documentation, but most functions are well documented with examples. There is also a thorough example in the example.ipynb notebook.
Usage
The main pattern is as follows.
import sympy as sp
import numpy as np
import matplotlib.pyplot as plt
from caft.odr import SympyODRegressor, ODRegressor
from caft.affine import ContinuousAffineFeatureTransformer
np.random.seed(42)
n = 10000
# Generate data with some natural noise (not errors)
X_true = np.linspace(-2, 2, n) + np.random.uniform(-0.5, 0.5, n)
# Add random measurement errors - both small and extreme
errors_in_X = np.random.normal(0, 0.3, n)
errors_in_y = np.random.normal(0, 5, n)
y = 3 * (X_true + errors_in_X) ** 3 + errors_in_y
fx = 3 * X_true ** 3
# Add systematic errors: a cluster of outliers near x = -0.5, y = -30
n_errs = 100
X_outliers = -0.5 * np.ones(n_errs) + 0.2 * np.random.uniform(-0.3, 0.5, n_errs)
y_outliers = -30 * np.ones(n_errs) + np.random.normal(0, 3, n_errs)
X = np.hstack([X_true, X_outliers]).reshape(-1, 1)
y = np.hstack([y, y_outliers])
plt.scatter(X, y)  # noisy observations, including the outlier cluster
plt.scatter(X_true, fx, color="r", s=1)  # noise-free curve f(x) = 3x^3
plt.show()
Here we can see the scatter plot of X and y, along with the original function $y = f(x)$ without noise. Now we can create an affine transformation with respect to the original function (or at least the SympyODRegressor estimate of it).
eq = "a * x ** 3 + b"
# Scale both variables by their maximum values
X_ = X / X.max()
y_ = y / y.max()
sodr = SympyODRegressor(eq, beta0={"a": 0.5, "b": 1})
caft = ContinuousAffineFeatureTransformer(sodr, optimiser="halley")
caft.fit(X_, y_)
Xt, yt = caft.transform(X_, y_)
Xt = Xt.reshape(-1, 1)
plt.scatter(Xt, yt, s=6,)
plt.show()
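SympyODRegressor's internals are not described here, but its name points to orthogonal distance regression (ODR), which, unlike ordinary least squares, allows for errors in x as well as in y. As a stand-in, the sketch below fits the same equation a * x ** 3 + b with SciPy's `scipy.odr` module. The synthetic data and parameter values are made up for illustration, and this is not a claim about how CAFT performs the fit.

```python
import numpy as np
from scipy import odr


def cubic(beta, x):
    """Model function in scipy.odr's (beta, x) convention: y = a * x**3 + b."""
    a, b = beta
    return a * x**3 + b


rng = np.random.default_rng(0)
x_true = np.linspace(-1, 1, 200)
# Errors in both variables: the situation ODR is designed for.
x_obs = x_true + rng.normal(0, 0.02, x_true.size)
y_obs = 2.0 * x_true**3 + 0.5 + rng.normal(0, 0.02, x_true.size)

data = odr.RealData(x_obs, y_obs)
model = odr.Model(cubic)
# Starting guesses, analogous to the beta0 dict in the usage example.
result = odr.ODR(data, model, beta0=[0.5, 1.0]).run()
a_hat, b_hat = result.beta
print(f"a = {a_hat:.3f}, b = {b_hat:.3f}")
```

The fitted parameters should land close to the true values a = 2 and b = 0.5 used to generate the data.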
A more thorough example can be found in the example.ipynb notebook.
This is a somewhat unusual pattern: a regressor nested within a transformer. However, the benefit is that each component can be used individually, either on its own for equation regression or by rolling your own regressor to supply the curve equation.
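The nesting pattern can be sketched with plain NumPy. Both classes below are hypothetical stand-ins, not CAFT classes: `CubicLeastSquares` is an ordinary least-squares fit of y = a*x**3 + b, and `ResidualTransformer` accepts any object with `fit(X, y)` and `predict(X)`. For brevity the transformer uses the vertical residual y - f(x) rather than an orthogonal projection, so it only illustrates the composition, not CAFT's transformation.

```python
import numpy as np


class CubicLeastSquares:
    """Hypothetical stand-in regressor: least squares for y = a*x**3 + b."""

    def fit(self, X, y):
        x = np.ravel(X)
        design = np.column_stack([x**3, np.ones_like(x)])
        self.coef_, *_ = np.linalg.lstsq(design, y, rcond=None)
        return self

    def predict(self, X):
        a, b = self.coef_
        return a * np.ravel(X) ** 3 + b


class ResidualTransformer:
    """Hypothetical transformer that delegates curve fitting to a nested regressor."""

    def __init__(self, regressor):
        self.regressor = regressor  # any object with fit(X, y) and predict(X)

    def fit(self, X, y):
        self.regressor.fit(X, y)
        return self

    def transform(self, X, y):
        # Vertical residual w.r.t. the fitted curve (a simplification).
        return X, y - self.regressor.predict(X)


X = np.linspace(-1, 1, 50).reshape(-1, 1)
y = 2.0 * X.ravel() ** 3 + 1.0

# The regressor works on its own...
reg = CubicLeastSquares().fit(X, y)
print(reg.coef_)  # approximately [2. 1.]

# ...or nested inside the transformer.
Xt, yt = ResidualTransformer(CubicLeastSquares()).fit(X, y).transform(X, y)
print(np.abs(yt).max())  # close to 0 for this noise-free data
```

Swapping in a different regressor only requires matching the fit/predict interface; the transformer does not need to know how the curve was estimated.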
Development
Deploy new versions to PyPI using GitHub Actions: change the version number to __version__ = "X.Y.Z" in caft/__init__.py, then run
git tag -a "vX.Y.Z" -m "deployment message"
git push --tags
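To avoid pushing a tag that disagrees with the version in caft/__init__.py, both can be derived from a single source. The sketch below is a hypothetical helper, not part of the repository. It reads __version__ from the text of caft/__init__.py and builds the matching git commands, following the v-prefixed tag convention shown above.

```python
import re
from pathlib import Path

# Matches a line such as: __version__ = "1.2.3"
VERSION_RE = re.compile(r"""^__version__\s*=\s*["']([^"']+)["']""", re.MULTILINE)


def read_version(init_text: str) -> str:
    """Extract the version string from the contents of caft/__init__.py."""
    match = VERSION_RE.search(init_text)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def release_commands(init_text: str, message: str = "deployment message") -> list[str]:
    """Build the git commands that tag and push the current version."""
    tag = f"v{read_version(init_text)}"
    return [f'git tag -a "{tag}" -m "{message}"', "git push --tags"]


def release_commands_for(path: str = "caft/__init__.py") -> list[str]:
    """Read the version file (run from the repository root) and build the commands."""
    return release_commands(Path(path).read_text())
```

Printing the output of release_commands_for() before running it gives a quick check that the tag matches the version being published.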