
Yummy - delicious Feast extension

The Yummy project adds the possibility to run Feast on multiple backends: Polars, Dask, Ray, and Spark.

This gives flexibility in setting up the feature store on existing environments and using their capabilities. Moreover, with Yummy you can combine multiple, different data sources in a single historical fetch (as shown in the example below).

Install yummy:

pip install git+https://github.com/qooba/yummy.git

Create a feature repository:

feast init feature_repo
cd feature_repo

Offline store:

Polars

To configure the offline store, edit feature_store.yaml:

project: feature_repo
registry: data/registry.db
provider: local
online_store:
    ...
offline_store:
    type: yummy.YummyOfflineStore
    backend: polars

Dask

To configure the offline store, edit feature_store.yaml:

project: feature_repo
registry: data/registry.db
provider: local
online_store:
    ...
offline_store:
    type: yummy.YummyOfflineStore
    backend: dask

Ray

To configure the offline store, edit feature_store.yaml:

project: feature_repo
registry: data/registry.db
provider: local
online_store:
    ...
offline_store:
    type: yummy.YummyOfflineStore
    backend: ray

Spark

To configure the offline store, edit feature_store.yaml:

project: feature_repo
registry: data/registry.db
provider: local
online_store:
    ...
offline_store:
    type: yummy.YummyOfflineStore
    backend: spark
    spark_conf:
        spark.master: "local[*]"
        spark.ui.enabled: "false"
        spark.eventLog.enabled: "false"
        spark.sql.session.timeZone: "UTC"
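
The spark_conf entries are ordinary Spark properties. As a rough sketch of how such a mapping can be applied to a Spark session (an illustration only, not Yummy's actual internals; the app name is assumed):

from pyspark.sql import SparkSession

# Hypothetical sketch: apply feature_store.yaml's spark_conf entries
# to a SparkSession builder; Yummy's real session setup may differ.
spark_conf = {
    "spark.master": "local[*]",
    "spark.ui.enabled": "false",
    "spark.eventLog.enabled": "false",
    "spark.sql.session.timeZone": "UTC",
}

builder = SparkSession.builder.appName("feature_repo")  # app name assumed
for key, value in spark_conf.items():
    builder = builder.config(key, value)
spark = builder.getOrCreate()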

Features definition

Example features.py:

from google.protobuf.duration_pb2 import Duration
from feast import Entity, Feature, FeatureView, ValueType
from yummy import ParquetDataSource, CsvDataSource, DeltaDataSource

my_stats_parquet = ParquetDataSource(
    path="/home/jovyan/notebooks/ray/dataset/all_data.parquet",
    event_timestamp_column="datetime",
)

my_stats_delta = DeltaDataSource(
    path="dataset/all",
    event_timestamp_column="datetime",
    #range_join=10,
)

my_stats_csv = CsvDataSource(
    path="/home/jovyan/notebooks/ray/dataset/all_data.csv",
    event_timestamp_column="datetime",
)

my_entity = Entity(name="entity_id", value_type=ValueType.INT64, description="entity id",)

mystats_view_parquet = FeatureView(
    name="my_statistics_parquet",
    entities=["entity_id"],
    ttl=Duration(seconds=3600*24*20),
    features=[
        Feature(name="p0", dtype=ValueType.FLOAT),
        Feature(name="p1", dtype=ValueType.FLOAT),
        Feature(name="p2", dtype=ValueType.FLOAT),
        Feature(name="p3", dtype=ValueType.FLOAT),
        Feature(name="p4", dtype=ValueType.FLOAT),
        Feature(name="p5", dtype=ValueType.FLOAT),
        Feature(name="p6", dtype=ValueType.FLOAT),
        Feature(name="p7", dtype=ValueType.FLOAT),
        Feature(name="p8", dtype=ValueType.FLOAT),
        Feature(name="p9", dtype=ValueType.FLOAT),
        Feature(name="y", dtype=ValueType.FLOAT),
    ], online=True, input=my_stats_parquet, tags={},)

mystats_view_delta = FeatureView(
    name="my_statistics_delta",
    entities=["entity_id"],
    ttl=Duration(seconds=3600*24*20),
    features=[
        Feature(name="d0", dtype=ValueType.FLOAT),
        Feature(name="d1", dtype=ValueType.FLOAT),
        Feature(name="d2", dtype=ValueType.FLOAT),
        Feature(name="d3", dtype=ValueType.FLOAT),
        Feature(name="d4", dtype=ValueType.FLOAT),
        Feature(name="d5", dtype=ValueType.FLOAT),
        Feature(name="d6", dtype=ValueType.FLOAT),
        Feature(name="d7", dtype=ValueType.FLOAT),
        Feature(name="d8", dtype=ValueType.FLOAT),
        Feature(name="d9", dtype=ValueType.FLOAT),
    ], online=True, input=my_stats_delta, tags={},)

mystats_view_csv = FeatureView(
    name="my_statistics_csv",
    entities=["entity_id"],
    ttl=Duration(seconds=3600*24*20),
    features=[
        Feature(name="c1", dtype=ValueType.FLOAT),
        Feature(name="c2", dtype=ValueType.FLOAT),
    ], online=True, input=my_stats_csv, tags={},)
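
After defining the features, register them with the Feast registry (standard Feast CLI step, run inside the repository):

feast apply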

Historical fetch

from feast import FeatureStore
import pandas as pd
import time

store = FeatureStore(repo_path=".")

# Example entity dataframe (illustrative ids and timestamps; replace with your own)
entity_df = pd.DataFrame.from_dict(
    {
        "entity_id": [1, 2, 3],
        "event_timestamp": [pd.Timestamp.now(tz="UTC")] * 3,
    }
)

start_time = time.time()
training_df = store.get_historical_features(
    entity_df=entity_df,
    features=[
        'my_statistics_parquet:p1',
        'my_statistics_parquet:p2',
        'my_statistics_delta:d1',
        'my_statistics_delta:d2',
        'my_statistics_csv:c1',
        'my_statistics_csv:c2'
    ],
).to_df()


print("--- %s seconds ---" % (time.time() - start_time))

training_df
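
Because the feature views are declared with online=True, the usual Feast online flow should apply as well. A minimal sketch, assuming standard Feast behavior (features must be materialized first, e.g. with feast materialize-incremental; the entity id below is illustrative):

from feast import FeatureStore

store = FeatureStore(repo_path=".")

# Illustrative online read; assumes features were materialized beforehand.
online_features = store.get_online_features(
    features=[
        "my_statistics_parquet:p1",
        "my_statistics_parquet:p2",
    ],
    entity_rows=[{"entity_id": 1}],
).to_dict()

print(online_features)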

References

This project is based on the Feast project.

I was also inspired by other projects:

feast-spark-offline-store - Spark configuration and session

feast-postgres - parts of Makefiles and GitHub workflows
