
Open-source Python library designed to improve engineering practices and transparency in feature engineering.


⚙️ The Framework to Simplify and Scale Feature Engineering ⚙️


For data scientists, ML engineers, and AI researchers who want to simplify feature engineering, manage complex dependencies, and boost productivity.


Introduction

Feature Fabrica is an open-source Python library designed to improve engineering practices and transparency in feature engineering. It allows users to define features declaratively using YAML, manage dependencies between features, and apply complex transformations in a scalable and convenient manner.

By providing a structured approach to feature engineering, Feature Fabrica aims to save time, reduce errors, and enhance the transparency and reproducibility of your machine learning workflows. Whether you're working on small projects or managing large-scale pipelines, Feature Fabrica is designed to meet your needs.

Key Features

  • 📝 Declarative Feature Definitions: Define features, data types, and dependencies using a simple YAML configuration.
  • 🔄 Transformations: Apply custom transformations to raw features to derive new features.
  • 🔗 Dependency Management: Automatically handle dependencies between features.
  • ✔️ Pydantic Validation: Ensure data types and values conform to expected formats.
  • 🛡️ Fail-Fast with Beartype: Catch type-related errors instantly during development, ensuring your transformations are robust.
  • 🚀 Scalability: Designed to scale from small projects to large machine learning pipelines.
  • 🔧 Hydra Integration: Leverage Hydra for configuration management, enabling flexible and dynamic configuration of transformations.
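To show what "fail-fast" type checking means, here is a toy, standard-library-only decorator. It is not beartype and not how Feature Fabrica integrates it. check_types and scale are hypothetical names, and the checker only understands plain classes such as float. Real tools like beartype also handle unions, generics, and NumPy array types.

```python
import functools
import inspect


def check_types(fn):
    """Toy fail-fast checker (hypothetical, NOT beartype): before calling fn,
    raise TypeError if an argument annotated with a plain class does not match.
    Return values, unions, and generics are not checked."""
    sig = inspect.signature(fn)

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        bound = sig.bind(*args, **kwargs)
        for name, value in bound.arguments.items():
            expected = sig.parameters[name].annotation
            # Only plain classes (e.g. float) are checked in this sketch.
            if isinstance(expected, type) and not isinstance(value, expected):
                raise TypeError(
                    f"{fn.__name__}(): argument {name!r} expected "
                    f"{expected.__name__}, got {type(value).__name__}"
                )
        return fn(*args, **kwargs)

    return wrapper


@check_types
def scale(value: float, factor: float) -> float:
    return value * factor
```

Calling scale("30", 0.5) fails immediately with a TypeError naming the bad argument, instead of producing a confusing error (or a silently wrong value) further down a pipeline.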

🛠️ Quick Start

Installation

To install Feature Fabrica, simply run:

pip install feature-fabrica

Defining Features in YAML

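Features are defined in a YAML file. In the example below, feature_a and feature_b are raw inputs, and feature_c is derived from them by first summing the two (sum_fn) and then scaling the sum by 0.5 (scale_feature). Following Hydra's instantiate convention, _target_ gives the import path of the class to build, and the remaining keys are passed to its constructor: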

feature_a:
  description: "Raw feature A"
  data_type: "float32"

feature_b:
  description: "Raw feature B"
  data_type: "float32"

feature_c:
  description: "Derived feature C"
  data_type: "float32"
  dependencies: ["feature_a", "feature_b"]
  transformation:
    sum_fn:
      _target_: feature_fabrica.transform.SumReduce
      iterable: ["feature_a", "feature_b"]
    scale_feature:
      _target_: feature_fabrica.transform.ScaleFeature
      factor: 0.5
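The dependencies list is what allows feature_c to be computed only after its inputs exist. The following minimal sketch shows how such an order can be derived, assuming a topological sort over the declared dependencies using Python's standard graphlib (3.9+). compute_order is a hypothetical helper, not Feature Fabrica's API.

```python
from graphlib import TopologicalSorter

# The YAML above, transcribed as a plain dict (only the keys needed here).
FEATURES = {
    "feature_a": {"data_type": "float32"},
    "feature_b": {"data_type": "float32"},
    "feature_c": {
        "data_type": "float32",
        "dependencies": ["feature_a", "feature_b"],
    },
}


def compute_order(features: dict) -> list[str]:
    """Return feature names so that every feature comes after its dependencies."""
    # Fail early if a feature depends on something that was never declared.
    unknown = {
        dep
        for spec in features.values()
        for dep in spec.get("dependencies", [])
        if dep not in features
    }
    if unknown:
        raise KeyError(f"undeclared dependencies: {sorted(unknown)}")
    graph = {name: spec.get("dependencies", []) for name, spec in features.items()}
    # static_order() raises graphlib.CycleError (a ValueError) on circular dependencies.
    return list(TopologicalSorter(graph).static_order())


order = compute_order(FEATURES)
```

Both raw features come before feature_c in the result, so a runner can simply walk the list in order.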

Creating and Using Transformations

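You can define custom transformations by subclassing the Transformation class and implementing its execute method. The example below mirrors the ScaleFeature transformation referenced in the YAML above: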

from typing import Union

import numpy as np
from beartype import beartype
from feature_fabrica.transform import Transformation
from feature_fabrica.transform.utils import NumericArray, NumericValue


class ScaleFeature(Transformation):
    """Multiply the input (array or scalar) by a constant factor."""

    def __init__(self, factor: float):
        super().__init__()
        self.factor = factor

    # beartype checks argument and return types at call time (fail fast).
    @beartype
    def execute(
        self, data: Union[NumericArray, NumericValue]
    ) -> Union[NumericArray, NumericValue]:
        # np.multiply broadcasts, so scalars and arrays are handled alike.
        return np.multiply(data, self.factor)
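To see how a subclass like this fits into the chain declared in YAML without installing anything, here is a framework-free sketch. Transformation, SumReduce, and ScaleFeature below are stand-ins that only mirror the names used above. The rule that the first step receives the raw inputs and each later step receives the previous step's output is inferred from the YAML example, not taken from Feature Fabrica's source.

```python
from abc import ABC, abstractmethod


class Transformation(ABC):
    """Hypothetical stand-in for feature_fabrica.transform.Transformation."""

    @abstractmethod
    def execute(self, data):
        ...


class SumReduce(Transformation):
    # Stand-in: adds up the named input features.
    def __init__(self, iterable: list[str]):
        self.iterable = iterable

    def execute(self, data: dict) -> float:
        return sum(data[name] for name in self.iterable)


class ScaleFeature(Transformation):
    # Stand-in: multiplies the previous step's output by a constant factor.
    def __init__(self, factor: float):
        self.factor = factor

    def execute(self, data: float) -> float:
        return data * self.factor


def run_chain(steps: list[Transformation], inputs):
    """Feed each step's output into the next, in declaration order."""
    value = inputs
    for step in steps:
        value = step.execute(value)
    return value


feature_c = run_chain(
    [SumReduce(["feature_a", "feature_b"]), ScaleFeature(factor=0.5)],
    {"feature_a": 10.0, "feature_b": 20.0},
)
```

Because each step only implements execute, steps can be tested in isolation and reordered in configuration without touching code.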

Compiling and Executing Features

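To compile and execute features, create a FeatureManager for the basic_features configuration in ../examples and pass it the raw input data: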

import numpy as np
from feature_fabrica.core import FeatureManager

data = {
    "feature_a": np.array([10.0], dtype=np.float32),
    "feature_b": np.array([20.0], dtype=np.float32),
}
feature_manager = FeatureManager(
    config_path="../examples", config_name="basic_features"
)
results = feature_manager.compute_features(data)
print(results["feature_c"])  # 0.5 * (10 + 20) = 15.0
print(results.feature_c)  # 0.5 * (10 + 20) = 15.0
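The two print calls show that the results can be read either by key or by attribute. One common way to get that behavior is a dict subclass with __getattr__. The sketch below assumes that mechanism for illustration; AttrDict is a hypothetical name, not necessarily Feature Fabrica's actual results type.

```python
class AttrDict(dict):
    """dict that also exposes its keys as attributes (results.feature_c)."""

    def __getattr__(self, name):
        # Called only when normal attribute lookup fails, so dict methods
        # such as .keys() and .items() keep working.
        try:
            return self[name]
        except KeyError:
            raise AttributeError(name) from None


results = AttrDict(feature_c=15.0)
```

One trade-off of this design is that a feature named like a dict method (for example "keys") is only reachable by key, since the method wins attribute lookup.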

Visualize Features and Dependencies

Track & trace Transformation Chains

import numpy as np
from feature_fabrica.core import FeatureManager

data = {
    "feature_a": np.array([10.0], dtype=np.float32),
    "feature_b": np.array([20.0], dtype=np.float32),
}
feature_manager = FeatureManager(
    config_path="../examples", config_name="basic_features"
)
results = feature_manager.compute_features(data)
print(feature_manager.features.feature_c.get_transformation_chain())
# Transformation Chain: (Transformation: sum_fn, Value: 30.0 Time taken: 9.5367431640625e-07 seconds) -> (Transformation: scale_feature, Value: 15.0, Time taken:  9.5367431640625e-07 seconds)
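The output above suggests that each feature keeps a record of every step's name, intermediate value, and elapsed time. Here is a minimal sketch of that kind of tracing, assuming wall-clock timing with time.perf_counter. traced_chain and format_trace are hypothetical helpers that only imitate the printed format.

```python
import time


def traced_chain(steps, value):
    """Apply (name, fn) steps in order, recording each intermediate value
    and how long the step took in seconds (via time.perf_counter)."""
    trace = []
    for name, fn in steps:
        start = time.perf_counter()
        value = fn(value)
        trace.append((name, value, time.perf_counter() - start))
    return value, trace


def format_trace(trace) -> str:
    """Render the recorded steps as a single 'a -> b' chain string."""
    return "Transformation Chain: " + " -> ".join(
        f"(Transformation: {name}, Value: {value}, Time taken: {elapsed:.2e} seconds)"
        for name, value, elapsed in trace
    )


result, trace = traced_chain(
    [("sum_fn", sum), ("scale_feature", lambda x: x * 0.5)],
    [10.0, 20.0],
)
print(format_trace(trace))
```

Keeping intermediate values alongside timings makes it possible to see both where a wrong value first appears and which step dominates runtime.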

Visualize Dependencies

from feature_fabrica.core import FeatureManager

feature_manager = FeatureManager(
    config_path="../examples", config_name="basic_features"
)
feature_manager.get_visual_dependency_graph()
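If you want the same dependency picture outside a notebook, the graph can be expressed as Graphviz DOT text. This standard-library-only sketch is an illustration, not how get_visual_dependency_graph is implemented. to_dot is a hypothetical helper, and rendering the text (for example with Graphviz's dot -Tpng) is left to you.

```python
def to_dot(dependencies: dict[str, list[str]]) -> str:
    """Render a feature -> dependencies mapping as Graphviz DOT text,
    with edges pointing from each input to the feature derived from it."""
    lines = ["digraph features {"]
    for feature in dependencies:
        lines.append(f'    "{feature}";')
    for feature, deps in dependencies.items():
        for dep in deps:
            lines.append(f'    "{dep}" -> "{feature}";')
    lines.append("}")
    return "\n".join(lines)


dot = to_dot({"feature_a": [], "feature_b": [], "feature_c": ["feature_a", "feature_b"]})
print(dot)
```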


Contributing

We welcome contributions to Feature Fabrica! If you have ideas for new features or improvements, or would like to report a bug, feel free to open an issue or a pull request on GitHub.

How to Contribute

  1. Fork the repository to your own GitHub account.
  2. Clone your fork locally.
  3. Create a new branch for your feature or fix.
  4. Commit your changes with a clear and concise message.
  5. Push the branch to your fork.
  6. Open a pull request from your fork to the original repository.

We look forward to your contributions! 😄



Download files

Download the file for your platform.

Source Distribution

feature_fabrica-1.3.0.tar.gz (21.4 kB)

Uploaded Source

Built Distribution


feature_fabrica-1.3.0-py3-none-any.whl (25.4 kB)

Uploaded Python 3

File details

Details for the file feature_fabrica-1.3.0.tar.gz.

File metadata

  • Download URL: feature_fabrica-1.3.0.tar.gz
  • Upload date:
  • Size: 21.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.11.9

File hashes

Hashes for feature_fabrica-1.3.0.tar.gz
Algorithm Hash digest
SHA256 6817aac0a3a2866555a489210cc75f73dc9685d76ee47bf565399a9ed30487d1
MD5 13f3501d0b0742b831062859cefd286c
BLAKE2b-256 50aeb123a28f96f1ff0a40946572ba956680564cc8560c8ea5a00e7b87b0caa2
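The published digests let you check a download before installing it. Here is a minimal standard-library sketch that recomputes a file's SHA256 and compares it with the value listed above. It assumes the source distribution was saved as feature_fabrica-1.3.0.tar.gz in the current directory; sha256_of and verify are illustrative helpers.

```python
import hashlib
from pathlib import Path

# SHA256 published above for feature_fabrica-1.3.0.tar.gz.
EXPECTED_SHA256 = "6817aac0a3a2866555a489210cc75f73dc9685d76ee47bf565399a9ed30487d1"


def sha256_of(path, chunk_size: int = 1 << 16) -> str:
    """Hash a file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 matches the expected hex digest."""
    return sha256_of(path) == expected.lower()


archive = Path("feature_fabrica-1.3.0.tar.gz")  # assumed download location
if archive.exists():
    print("hash OK" if verify(archive) else "hash MISMATCH")
```

pip can perform the same check at install time through its hash-checking mode (--require-hashes, with --hash=sha256:... entries in a requirements file).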


File details

Details for the file feature_fabrica-1.3.0-py3-none-any.whl.

File hashes

Hashes for feature_fabrica-1.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4a9594dc20484018eda71d65d3363a5b0747c7defe9feb3b2c6924eccd3f22aa
MD5 8ea52616c812159ae8a4b701486d3c17
BLAKE2b-256 1b2b363f7063ee2906a954083ef80ed3881fba1cd53cd0a01c68cc466ba79e5f

