Skip to main content

Feature engineering library that helps you keep track of feature dependencies, documentation and schema

Project description

featureclass

Feature engineering library that helps you keep track of feature dependencies, documentation and schema

Installation

Using pip

pip install featureclass

Motivation

This library helps define a featureclass.
featureclass is inspired by dataclass, and is meant to provide alternative way to define features engineering classes.

I have noticed that the below code is pretty common when doing feature engineering:

from statistics import variance
from math import sqrt
class MyFeatures:
    def calc_all(self, datapoint):
        out = {}
        out['var'] = self.calc_var(datapoint),
        out['stdev'] = self.calc_std(out['var'])
        return out
        
    def calc_var(self, data) -> float:
        return variance(data)

    def calc_stdev(self, var) -> float:
        return sqrt(var)

Some things were missing for me from this type of implementation:

  1. Implicit dependencies between features
  2. No simple schema
  3. No documentation for features
  4. Duplicate declaration of the same feature - once as a function and one as a dict key

This is why I created this library.
I turned the above code into this:

from featureclass import feature, featureclass, feature_names, feature_annotations, asdict, as_dataclass
from statistics import variance
from math import sqrt

@featureclass
class MyFeatures:
    def __init__(self, datapoint):
        self.datapoint = datapoint
    
    @feature()
    def var(self) -> float:
        """Calc variance"""
        return variance(self.datapoint)

    @feature()
    def stdev(self) -> float:
        """Calc stdev"""
        return sqrt(self.var)

print(feature_names(MyFeatures)) # ('var', 'stdev')
print(feature_annotations(MyFeatures)) # {'var': float, 'stdev': float}
print(asdict(MyFeatures([1,2,3,4,5]))) # {'var': 2.5, 'stdev': 1.5811388300841898}
print(as_dataclass(MyFeatures([1,2,3,4,5]))) # MyFeatures(stdev=1.5811388300841898, var=2.5)

The feature decorator is using cached_property to cache the feature calculation,
making sure that each feature is calculated once per datapoint

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

featureclass-0.3.0.tar.gz (8.4 kB view hashes)

Uploaded Source

Built Distribution

featureclass-0.3.0-py3-none-any.whl (9.0 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page