
Open-source Python library designed to improve engineering practices and transparency in feature engineering.


Feature Fabrica

Feature Fabrica is an open-source Python library designed to improve engineering practices and transparency in feature engineering. It allows users to define features declaratively using YAML, manage dependencies between features, and apply complex transformations in a scalable and convenient manner.

By providing a structured approach to feature engineering, Feature Fabrica aims to save time, reduce errors, and enhance the transparency and reproducibility of your machine learning workflows. Whether you’re a data scientist working on small projects or an engineer managing large-scale pipelines, Feature Fabrica is designed to meet your needs.

Introduction

In machine learning and data science, feature engineering plays a crucial role in building effective models. However, managing complex feature dependencies and transformations can be challenging. Feature Fabrica aims to simplify and streamline this process by providing a structured way to define, manage, and transform features.

With Feature Fabrica, you can:

  • Define features declaratively using YAML.
  • Manage dependencies between features automatically.
  • Apply and chain transformations to compute derived features.
  • Validate feature values using Pydantic.

Key Features

  • Declarative Feature Definitions: Define features, data types, and dependencies using a simple YAML configuration.
  • Transformations: Apply custom transformations to raw features to derive new features.
  • Dependency Management: Automatically handle dependencies between features.
  • Pydantic Validation: Ensure data types and values conform to expected formats.
  • Scalability: Designed to scale from small projects to large machine learning pipelines.
  • Hydra Integration: Leverage Hydra for configuration management, enabling flexible and dynamic configuration of transformations.
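The Hydra integration relies on Hydra's `_target_` convention: a config node names a dotted import path, and the remaining keys become constructor arguments. Hydra itself does this with `hydra.utils.instantiate`. The sketch below re-implements only the core idea with `importlib`, to show what happens to a node such as `scale_feature` in the Quick Start YAML. The helper name `instantiate_node` is hypothetical, and it ignores Hydra features such as recursive instantiation and interpolation.

```python
import importlib
from typing import Any


def instantiate_node(node: dict[str, Any]) -> Any:
    """Hypothetical, minimal stand-in for Hydra's `_target_` instantiation.

    Imports the class or function named by `_target_` and calls it with the
    remaining keys as keyword arguments.
    """
    params = dict(node)  # copy so the caller's config is not mutated
    target_path = params.pop("_target_")
    module_name, _, attr_name = target_path.rpartition(".")
    target = getattr(importlib.import_module(module_name), attr_name)
    return target(**params)


# Uses a standard-library target so the example runs without feature_fabrica installed.
delta = instantiate_node({"_target_": "datetime.timedelta", "minutes": 90})
print(delta)  # 1:30:00
```

With Feature Fabrica installed, a node like `{"_target_": "feature_fabrica.transform.ScaleFeature", "factor": 0.5}` would resolve the same way, yielding a `ScaleFeature` instance.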

Quick Start

Defining Features in YAML

Features are defined in a YAML file. Here’s an example:

feature_a:
  description: "Raw feature A"
  data_type: "float32"

feature_b:
  description: "Raw feature B"
  data_type: "float32"

feature_c:
  description: "Derived feature C"
  data_type: "float32"
  dependencies: ["feature_a", "feature_b"]
  transformation:
    sum_fn:
      _target_: feature_fabrica.transform.SumFn
      iterable: ["feature_a", "feature_b"]
    scale_feature:
      _target_: feature_fabrica.transform.ScaleFeature
      factor: 0.5
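Conceptually, computing `feature_c` means resolving its dependencies first and then passing the value through the listed transformations in order: `sum_fn` yields 10 + 20 = 30, and `scale_feature` multiplies that by 0.5 to give 15. The sketch below reproduces that logic in plain Python with `graphlib.TopologicalSorter`. The `FEATURES` dictionary mirrors the YAML; the `TRANSFORMS` registry, the step functions, and `compute` are hypothetical stand-ins, not Feature Fabrica's internals.

```python
from graphlib import TopologicalSorter
from typing import Callable

# Mirrors the YAML example above (descriptions and data types omitted).
FEATURES: dict[str, dict] = {
    "feature_a": {},
    "feature_b": {},
    "feature_c": {
        "dependencies": ["feature_a", "feature_b"],
        # Each step is (hypothetical transform name, its parameters), applied in order.
        "transformation": [
            ("sum_fn", {"iterable": ["feature_a", "feature_b"]}),
            ("scale_feature", {"factor": 0.5}),
        ],
    },
}


def sum_fn(_value, values, iterable):
    # Ignores the running value; sums the named, already-computed features.
    return sum(values[name] for name in iterable)


def scale_feature(value, _values, factor):
    # Scales the output of the previous step.
    return value * factor


TRANSFORMS: dict[str, Callable] = {"sum_fn": sum_fn, "scale_feature": scale_feature}


def compute(features: dict[str, dict], raw: dict[str, float]) -> dict[str, float]:
    """Hypothetical resolver: evaluate derived features in dependency order."""
    graph = {name: spec.get("dependencies", []) for name, spec in features.items()}
    values = dict(raw)
    for name in TopologicalSorter(graph).static_order():
        if name in values:  # raw feature supplied by the caller
            continue
        spec = features[name]
        if "transformation" not in spec:
            raise KeyError(f"raw feature {name!r} was not provided")
        value = None
        for step, params in spec["transformation"]:
            value = TRANSFORMS[step](value, values, **params)
        values[name] = value
    return values


print(compute(FEATURES, {"feature_a": 10.0, "feature_b": 20.0})["feature_c"])  # 15.0
```

Topological ordering guarantees that every feature is computed only after everything it depends on; `graphlib` raises `CycleError` if the declared dependencies form a cycle.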

Creating and Using Transformations

You can define custom transformations by subclassing the Transformation class and implementing its execute method:

from typing import Union
import numpy as np
from beartype import beartype
from numpy.typing import NDArray
from feature_fabrica.transform import Transformation

NumericArray = Union[NDArray[np.floating], NDArray[np.int_]]
NumericValue = Union[np.floating, np.int_, float, int]


class ScaleFeature(Transformation):
    def __init__(self, factor: float):
        super().__init__()
        self.factor = factor

    @beartype
    def execute(self, data: NumericArray | NumericValue) -> NumericArray | NumericValue:
        return np.multiply(data, self.factor)
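To try the same pattern without installing Feature Fabrica, the sketch below defines a minimal stand-in for the `Transformation` base class (an assumption: it only requires subclasses to implement `execute`) and a hypothetical `ClipFeature` transformation that bounds values with `np.clip`. With the real library you would subclass `feature_fabrica.transform.Transformation` instead, as in the example above.

```python
from __future__ import annotations

from abc import ABC, abstractmethod

import numpy as np
from numpy.typing import NDArray


class Transformation(ABC):
    """Hypothetical stand-in for feature_fabrica.transform.Transformation."""

    @abstractmethod
    def execute(self, data):
        ...


class ClipFeature(Transformation):
    """Hypothetical transformation: limit values to the range [lower, upper]."""

    def __init__(self, lower: float, upper: float):
        super().__init__()
        if lower > upper:
            raise ValueError("lower must not exceed upper")
        self.lower = lower
        self.upper = upper

    def execute(self, data: NDArray[np.floating] | float) -> NDArray[np.floating] | float:
        # np.clip works element-wise on arrays and also accepts scalars.
        return np.clip(data, self.lower, self.upper)


clip = ClipFeature(lower=0.0, upper=1.0)
print(clip.execute(np.array([-0.5, 0.25, 2.0])).tolist())  # [0.0, 0.25, 1.0]
```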

Compiling and Executing Features

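To compile the feature definitions and compute their values, create a FeatureManager that points at the YAML configuration (this example assumes it is saved as basic_features.yaml in ../examples) and pass it the raw feature values: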

import numpy as np
from feature_fabrica.core import FeatureManager

data = {
    "feature_a": np.array([10.0], dtype=np.float32),
    "feature_b": np.array([20.0], dtype=np.float32),
}
feature_manager = FeatureManager(
    config_path="../examples", config_name="basic_features"
)
results = feature_manager.compute_features(data)
print(results["feature_c"])  # dictionary-style access: 0.5 * (10 + 20) = 15.0
print(results.feature_c)  # attribute-style access returns the same value
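The results object supports both `results["feature_c"]` and `results.feature_c`. How Feature Fabrica implements this is not shown here; one common way to get dual access is a `dict` subclass that forwards failed attribute lookups to its keys, sketched below with a hypothetical `AttrDict` class.

```python
from typing import Any


class AttrDict(dict):
    """Hypothetical dict whose keys can also be read as attributes."""

    def __getattr__(self, name: str) -> Any:
        # Only called when normal attribute lookup fails, so dict methods still work.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


results = AttrDict(feature_c=15.0)
print(results["feature_c"], results.feature_c)  # 15.0 15.0
```

Raising `AttributeError` rather than `KeyError` keeps `hasattr` and `getattr(obj, name, default)` behaving as expected.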

Visualize Features and Dependencies

Track and Trace Transformation Chains

import numpy as np
from feature_fabrica.core import FeatureManager

data = {
    "feature_a": np.array([10.0], dtype=np.float32),
    "feature_b": np.array([20.0], dtype=np.float32),
}
feature_manager = FeatureManager(
    config_path="../examples", config_name="basic_features"
)
results = feature_manager.compute_features(data)
print(feature_manager.features.feature_c.get_transformation_chain())
# Transformation Chain: (Transformation: sum_fn, Value: 30.0 Time taken: 9.5367431640625e-07 seconds) -> (Transformation: scale_feature, Value: 15.0, Time taken:  9.5367431640625e-07 seconds)
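The chain printed above records, for each step, the transformation name, its output value, and the time it took. The sketch below shows one way such a trace could be collected, timing each step with `time.perf_counter`. The `TraceStep` record and `run_chain` function are hypothetical and do not reproduce Feature Fabrica's own output format.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class TraceStep:
    name: str
    value: Any
    seconds: float


def run_chain(value: Any, steps: list[tuple[str, Callable[[Any], Any]]]) -> tuple[Any, list[TraceStep]]:
    """Apply each named step in order, recording its output and wall-clock duration."""
    trace: list[TraceStep] = []
    for name, fn in steps:
        start = time.perf_counter()
        value = fn(value)
        trace.append(TraceStep(name, value, time.perf_counter() - start))
    return value, trace


result, trace = run_chain(
    [10.0, 20.0],
    [("sum_fn", sum), ("scale_feature", lambda x: x * 0.5)],
)
print(" -> ".join(f"({s.name}: {s.value})" for s in trace))
# (sum_fn: 30.0) -> (scale_feature: 15.0)
```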

Visualize Dependencies

from feature_fabrica.core import FeatureManager

feature_manager = FeatureManager(
    config_path="../examples", config_name="basic_features"
)
feature_manager.get_visual_dependency_graph()
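`get_visual_dependency_graph()` produces a visual graph of feature dependencies. If you want a plain-text view instead, for example to diff in code review, the dependencies can be written out as Graphviz DOT with a few lines of Python. The `to_dot` helper below is hypothetical and works on a dictionary mirroring the `dependencies` keys of the YAML example.

```python
def to_dot(dependencies: dict[str, list[str]]) -> str:
    """Hypothetical helper: emit a Graphviz DOT digraph.

    Edges point from each input feature to the feature derived from it.
    """
    lines = ["digraph features {"]
    for feature, deps in dependencies.items():
        lines.append(f'    "{feature}";')
        for dep in deps:
            lines.append(f'    "{dep}" -> "{feature}";')
    lines.append("}")
    return "\n".join(lines)


print(to_dot({"feature_a": [], "feature_b": [], "feature_c": ["feature_a", "feature_b"]}))
```

Saving the output to a file and running Graphviz's `dot -Tpng features.dot -o features.png` renders it as an image.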


Contributing

We welcome contributions! If you have ideas for improvements or want to report issues, feel free to open a pull request or an issue on GitHub.

How to Contribute

  1. Fork the repository.
  2. Create a new branch (git checkout -b feature/your-feature-name).
  3. Commit your changes (git commit -m 'Add some feature').
  4. Push to the branch (git push origin feature/your-feature-name).
  5. Open a pull request.



Download files


Source Distribution

feature_fabrica-0.1.7.tar.gz (12.4 kB)

Built Distribution


feature_fabrica-0.1.7-py3-none-any.whl (13.7 kB, Python 3)

File details

Details for the file feature_fabrica-0.1.7.tar.gz.

File metadata

  • Download URL: feature_fabrica-0.1.7.tar.gz
  • Size: 12.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for feature_fabrica-0.1.7.tar.gz
Algorithm    Hash digest
SHA256       40724624349f7249058127777c2c899f3692503ccbc4565562cb9ce161388cba
MD5          8fc14c70a4ee3e41b8b44aa0d1a31e9e
BLAKE2b-256  2771cc72006c0f4e31f98ee1d0b2647d6d37620655040368f8e05ca1316ba944


File details

Details for the file feature_fabrica-0.1.7-py3-none-any.whl.


File hashes

Hashes for feature_fabrica-0.1.7-py3-none-any.whl
Algorithm    Hash digest
SHA256       1e849280f3d9e501baeb2403aca3919316a41fa2f616ea9ca6fa221473076cc4
MD5          e2531085ff25c4ed69d62151f51eca74
BLAKE2b-256  34d2ac0daf3f6db0852934b109973ad39265e6a9a6b0128356cb9256243de926

