Feature engineering library that helps you keep track of feature dependencies, documentation and schema
featureclass
Installation
Using pip
pip install featureclass
Motivation
This library helps define a featureclass.
featureclass is inspired by dataclass and is meant to provide an alternative way to define feature engineering classes.
I have noticed that code like the following is pretty common when doing feature engineering:
from math import sqrt
from statistics import variance


class MyFeatures:
    def calc_all(self, datapoint):
        out = {}
        out['var'] = self.calc_var(datapoint)
        # 'stdev' silently depends on 'var' having been computed first
        out['stdev'] = self.calc_stdev(out['var'])
        return out

    def calc_var(self, data) -> float:
        return variance(data)

    def calc_stdev(self, var) -> float:
        return sqrt(var)
For me, this type of implementation has several shortcomings:
- Dependencies between features are implicit
- There is no simple schema
- There is no documentation for features
- Each feature is declared twice: once as a function and once as a dict key
This is why I created this library.
I turned the above code into this:
from math import sqrt
from statistics import variance

from featureclass import feature, featureclass, feature_names, feature_annotations, asdict, as_dataclass


@featureclass
class MyFeatures:
    def __init__(self, datapoint):
        self.datapoint = datapoint

    @feature()
    def var(self) -> float:
        """Calc variance"""
        return variance(self.datapoint)

    @feature()
    def stdev(self) -> float:
        """Calc stdev"""
        return sqrt(self.var)


print(feature_names(MyFeatures))  # ('var', 'stdev')
print(feature_annotations(MyFeatures))  # {'var': float, 'stdev': float}
print(asdict(MyFeatures([1,2,3,4,5])))  # {'var': 2.5, 'stdev': 1.5811388300841898}
print(as_dataclass(MyFeatures([1,2,3,4,5])))  # MyFeatures(stdev=1.5811388300841898, var=2.5)
The feature decorator uses cached_property to cache each feature's calculation,
so every feature is calculated at most once per datapoint, even when other features depend on it.
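To illustrate the caching behaviour, here is a minimal sketch that uses functools.cached_property from the standard library directly. It is not the featureclass implementation: the class name CachedFeatures and the var_calls counter are made up for this example, and the only assumption is that a cached feature behaves like a plain cached_property on the instance.

```python
from functools import cached_property
from math import sqrt
from statistics import variance


class CachedFeatures:
    """Hypothetical stand-in for a featureclass; standard library only."""

    def __init__(self, datapoint):
        self.datapoint = datapoint
        self.var_calls = 0  # counts how often the variance is actually computed

    @cached_property
    def var(self) -> float:
        self.var_calls += 1
        return variance(self.datapoint)

    @cached_property
    def stdev(self) -> float:
        # Reading self.var states the dependency and reuses the cached value.
        return sqrt(self.var)


features = CachedFeatures([1, 2, 3, 4, 5])
features.stdev  # computes var once, then stdev
features.var    # served from the cache
features.stdev  # served from the cache
print(features.var_calls)  # 1

other = CachedFeatures([2, 4, 6])
other.var
print(other.var_calls)  # 1, because the cache lives on each instance
```

Since the cached value is stored on the instance, creating a new object for each datapoint gives a fresh calculation, which matches the "once per datapoint" behaviour described above.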