Continuous Affine Feature Transformations for feature mapping.

CAFT - Continuous Affine Feature Transformer

A custom transformer package for applying affine (geometric) transformations to datasets with respect to a curve that has a well-defined, continuous equation.

The transformers attempt to follow the scikit-learn API. However, because they operate on both X and y, they are likely to cause issues when used within a scikit-learn pipeline.

Installation

Install caft via pip with

pip install caft

Documentation

There is currently no hosted documentation, but most functions are well documented, with examples.

Alternatively, there is a thorough example in the example.ipynb notebook.

Usage

The main pattern is as follows.

import sympy as sp
import numpy as np
import matplotlib.pyplot as plt

from caft.odr import SympyODRegressor, ODRegressor
from caft.affine import ContinuousAffineFeatureTransformer

np.random.seed(42)

n = 10000

# Generate data with some natural noise (not errors)
X_true = np.linspace(-2, 2, n) + np.random.uniform(-0.5, 0.5, n)

# Add random measurement errors - both small and extreme
errors_in_X = np.random.normal(0, 0.3, n)
errors_in_y = np.random.normal(0, 5, n)
y = 3 * (X_true + errors_in_X) ** 3 + errors_in_y
fx = 3 * X_true ** 3

# Add systematic error: a cluster of outliers near X = -0.5, y = -30
n_errs = 100
X_outliers = -0.5 * np.ones(n_errs) + 0.2 * np.random.uniform(-0.3, 0.5, n_errs)
y_outliers = -30 * np.ones(n_errs) + np.random.normal(0, 3, n_errs)
X = np.hstack([X_true, X_outliers]).reshape(-1, 1)
y = np.hstack([y, y_outliers])

plt.scatter(X, y)
plt.scatter(X_true, fx, color="r", s=1)

[Figure: scatter plot of X and y with the noise-free function overlaid in red]

Here we can see the scatter plot of X and y, together with the original function $y = f(x)$ without noise. Now we can create an affine transformation with respect to the original function (or at least the SympyODRegressor estimate of it).

eq = "a * x ** 3 + b"

X_ = X / X.max()
y_ = y / y.max()

sodr = SympyODRegressor(eq, beta0={"a": 0.5, "b": 1})
caft = ContinuousAffineFeatureTransformer(sodr, optimiser="halley")
caft.fit(X_, y_)
Xt, yt = caft.transform(X_, y_)
Xt = Xt.reshape(-1, 1)

plt.scatter(Xt, yt, s=6)
plt.show()

[Figure: scatter plot of the transformed data Xt and yt]
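
The optimiser="halley" argument suggests a root-finding step for each point. As an illustration only, and assuming the geometry involved is the perpendicular distance from a point to the fitted curve (this README does not state what caft computes internally), the sketch below uses Halley's method to find the nearest point on a cubic y = a * t ** 3 + b. The function name project_onto_cubic and its parameters are hypothetical and are not part of caft.

```python
import math


def project_onto_cubic(x0, y0, a, b, tol=1e-12, max_iter=50):
    """Hypothetical helper: find a stationary point of the squared distance
    from (x0, y0) to the curve y = a * t**3 + b with Halley's method,
    starting from t = x0. Returns (t, distance)."""
    t = x0
    for _ in range(max_iter):
        g = a * t**3 + b       # curve value at t
        g1 = 3 * a * t**2      # first derivative of the curve
        g2 = 6 * a * t         # second derivative
        g3 = 6 * a             # third derivative
        r = g - y0             # vertical residual at t
        # h is half the derivative of the squared distance; its root is
        # where the line from the point to the curve is perpendicular.
        h = (t - x0) + r * g1
        h1 = 1 + g1**2 + r * g2
        h2 = 3 * g1 * g2 + r * g3
        denom = 2 * h1**2 - h * h2
        if denom == 0:
            break
        step = 2 * h * h1 / denom  # Halley update
        t -= step
        if abs(step) < tol:
            break
    distance = math.hypot(t - x0, a * t**3 + b - y0)
    return t, distance


# Place a point 0.1 away from (1, 1) along the normal of y = t**3.
nx, ny = -3 / math.sqrt(10), 1 / math.sqrt(10)
t, d = project_onto_cubic(1 + 0.1 * nx, 1 + 0.1 * ny, a=1.0, b=0.0)
print(t, d)  # t should be close to 1 and d close to 0.1
```

Halley's method only finds a nearby stationary point, so for strongly curved functions the result depends on the starting value and may be a local rather than global minimum of the distance.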

A more thorough example can be found in the example.ipynb notebook.

Nesting a regressor within a transformer is a somewhat unusual pattern. The benefit is that each component can be used on its own, either to fit an individual equation or as a starting point for rolling your own regressor to supply the curve equation.
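
As a rough illustration of how an equation string such as "a * x ** 3 + b" can be turned into something a regressor can evaluate, the sketch below uses SymPy's public sympify, diff and lambdify functions. Whether SympyODRegressor works this way internally is an assumption; the variable names here are illustrative only.

```python
import sympy as sp

eq = "a * x ** 3 + b"

# Parse the string into a symbolic expression.
expr = sp.sympify(eq)
x = sp.Symbol("x")

# Treat every symbol other than x as a fit parameter, in name order: [a, b].
params = sorted(expr.free_symbols - {x}, key=lambda s: s.name)

# Build plain-Python callables for the curve and its derivative in x.
f = sp.lambdify([x, *params], expr, modules="math")
dfdx = sp.lambdify([x, *params], sp.diff(expr, x), modules="math")

print([p.name for p in params])  # parameter names found in the equation
print(f(2.0, 3.0, 1.0))          # curve value at x=2 with a=3, b=1
print(dfdx(2.0, 3.0, 1.0))       # slope at the same point
```

Keeping the equation symbolic means derivatives come for free, which is convenient for derivative-based optimisers.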

Development

Deploy new versions to PyPI using GitHub Actions:

Set the version number, __version__ = "X.Y.Z", in caft/__init__.py, then tag the release and push the tag:

git tag -a "vX.Y.Z" -m "deployment message"
git push --tags
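
Because the version is set in one place and the tag in another, the two can drift apart. The following is a minimal sketch of a check that could be run before tagging; the helper names read_version and tag_matches_version are hypothetical and are not part of this project.

```python
import re


def read_version(init_source: str) -> str:
    """Extract the value of __version__ from the text of an __init__.py."""
    match = re.search(
        r'^__version__\s*=\s*["\']([^"\']+)["\']', init_source, re.MULTILINE
    )
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def tag_matches_version(tag: str, init_source: str) -> bool:
    """Return True if a tag of the form vX.Y.Z matches __version__."""
    return tag == f"v{read_version(init_source)}"


# Example with an in-memory file body; in practice the text could be read
# with pathlib.Path("caft/__init__.py").read_text().
sample = '__version__ = "0.1.9"\n'
print(tag_matches_version("v0.1.9", sample))
```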
