Feature engineering library that helps you keep track of feature dependencies, documentation and schema

Project description

featureclass

Installation

Using pip

pip install featureclass

Motivation

This library helps you define a featureclass.
featureclass is inspired by dataclass, and is meant to provide an alternative way to define feature engineering classes.

I have noticed that code like the following is pretty common when doing feature engineering:

from math import sqrt
from statistics import variance


class MyFeatures:
    def calc_all(self, datapoint):
        out = {}
        out['var'] = self.calc_var(datapoint)
        # stdev silently depends on var having been computed first
        out['stdev'] = self.calc_stdev(out['var'])
        return out

    def calc_var(self, data) -> float:
        return variance(data)

    def calc_stdev(self, var) -> float:
        return sqrt(var)

For me, this type of implementation has several shortcomings:

  1. Dependencies between features are implicit
  2. There is no simple schema
  3. There is no documentation for features
  4. Each feature is declared twice: once as a function and once as a dict key

This is why I created this library.
I turned the above code into this:

from featureclass import feature, featureclass, feature_names, feature_annotations, asdict, as_dataclass
from statistics import variance
from math import sqrt

@featureclass
class MyFeatures:
    def __init__(self, datapoint):
        self.datapoint = datapoint
    
    @feature()
    def var(self) -> float:
        """Calc variance"""
        return variance(self.datapoint)

    @feature()
    def stdev(self) -> float:
        """Calc stdev"""
        return sqrt(self.var)

print(feature_names(MyFeatures)) # ('var', 'stdev')
print(feature_annotations(MyFeatures)) # {'var': float, 'stdev': float}
print(asdict(MyFeatures([1,2,3,4,5]))) # {'var': 2.5, 'stdev': 1.5811388300841898}
print(as_dataclass(MyFeatures([1,2,3,4,5]))) # MyFeatures(stdev=1.5811388300841898, var=2.5)
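
To illustrate how a schema like the one printed above can be derived from the class definition alone, here is a minimal standard-library sketch. It is not featureclass's actual implementation: the helpers sketch_feature_names and sketch_feature_annotations and the class SketchFeatures are hypothetical names, and the sketch assumes features are plain functools.cached_property attributes.

```python
from functools import cached_property
from math import sqrt
from statistics import variance
from typing import get_type_hints


def sketch_feature_names(cls):
    """Hypothetical helper: names of cached_property attributes, in definition order."""
    return tuple(
        name for name, attr in vars(cls).items()
        if isinstance(attr, cached_property)
    )


def sketch_feature_annotations(cls):
    """Hypothetical helper: map each feature name to its function's return annotation."""
    return {
        name: get_type_hints(vars(cls)[name].func).get("return")
        for name in sketch_feature_names(cls)
    }


class SketchFeatures:
    """Stand-in for a featureclass, using only the standard library."""

    def __init__(self, datapoint):
        self.datapoint = datapoint

    @cached_property
    def var(self) -> float:
        """Calc variance"""
        return variance(self.datapoint)

    @cached_property
    def stdev(self) -> float:
        """Calc stdev"""
        return sqrt(self.var)


print(sketch_feature_names(SketchFeatures))
print(sketch_feature_annotations(SketchFeatures))
```

Because each feature is declared exactly once, as a method, its name, return type and docstring all live in one place and can be read back by introspection.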

The feature decorator uses cached_property to cache the feature calculation,
making sure that each feature is calculated at most once per datapoint, even when other features depend on it.
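
The caching behaviour can be seen with the standard library's functools.cached_property on its own. The sketch below does not use featureclass; the class CachedFeatures and its var_calls counter are hypothetical, added only to make the number of computations visible.

```python
from functools import cached_property
from math import sqrt
from statistics import variance


class CachedFeatures:
    """Hypothetical stand-in that counts how often var is actually computed."""

    def __init__(self, datapoint):
        self.datapoint = datapoint
        self.var_calls = 0

    @cached_property
    def var(self) -> float:
        self.var_calls += 1  # runs only on the first access
        return variance(self.datapoint)

    @cached_property
    def stdev(self) -> float:
        # Accessing self.var here triggers (or reuses) the cached computation
        return sqrt(self.var)


features = CachedFeatures([1, 2, 3, 4, 5])
features.stdev  # computes var once as a dependency
features.var    # served from the cache
features.var    # served from the cache
print(features.var_calls)  # 1
```

The cache lives on the instance, so a new object created for a different datapoint starts with an empty cache.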
