Skip to main content

Python SDK for Feast

Project description


PyPI - Downloads GitHub contributors unit-tests integration-tests-and-build linter Docs Latest Python API License GitHub Release

Join us on Slack!

👋👋👋 Come say hi on Slack!

Check out our DeepWiki!

Overview

feast-dev%2Ffeast | Trendshift

Feast (Feature Store) is an open source feature store for machine learning. Feast is the fastest path to manage existing infrastructure to productionize analytic data for model training and online inference.

Feast allows ML platform teams to:

  • Make features consistently available for training and serving by managing an offline store (to process historical data for scale-out batch scoring or model training), a low-latency online store (to power real-time prediction), and a battle-tested feature server (to serve pre-computed features online).
  • Avoid data leakage by generating point-in-time correct feature sets so data scientists can focus on feature engineering rather than debugging error-prone dataset joining logic. This ensure that future feature values do not leak to models during training.
  • Decouple ML from data infrastructure by providing a single data access layer that abstracts feature storage from feature retrieval, ensuring models remain portable as you move from training models to serving models, from batch models to realtime models, and from one data infra system to another.

Please see our documentation for more information about the project.

📐 Architecture

The above architecture is the minimal Feast deployment. Want to run the full Feast on Snowflake/GCP/AWS? Click here.

🐣 Getting Started

1. Install Feast

pip install feast

2. Create a feature repository

feast init my_feature_repo
cd my_feature_repo/feature_repo

3. Register your feature definitions and set up your feature store

feast apply

4. Explore your data in the web UI (experimental)

Web UI

feast ui

5. Build a training dataset

from feast import FeatureStore
import pandas as pd
from datetime import datetime

entity_df = pd.DataFrame.from_dict({
    "driver_id": [1001, 1002, 1003, 1004],
    "event_timestamp": [
        datetime(2021, 4, 12, 10, 59, 42),
        datetime(2021, 4, 12, 8,  12, 10),
        datetime(2021, 4, 12, 16, 40, 26),
        datetime(2021, 4, 12, 15, 1 , 12)
    ]
})

store = FeatureStore(repo_path=".")

training_df = store.get_historical_features(
    entity_df=entity_df,
    features = [
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
).to_df()

print(training_df.head())

# Train model
# model = ml.fit(training_df)
            event_timestamp  driver_id  conv_rate  acc_rate  avg_daily_trips
0 2021-04-12 08:12:10+00:00       1002   0.713465  0.597095              531
1 2021-04-12 10:59:42+00:00       1001   0.072752  0.044344               11
2 2021-04-12 15:01:12+00:00       1004   0.658182  0.079150              220
3 2021-04-12 16:40:26+00:00       1003   0.162092  0.309035              959

6. Load feature values into your online store

Option 1: Incremental materialization (recommended)

CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize-incremental $CURRENT_TIME

Option 2: Full materialization with timestamps

CURRENT_TIME=$(date -u +"%Y-%m-%dT%H:%M:%S")
feast materialize 2021-04-12T00:00:00 $CURRENT_TIME

Option 3: Simple materialization without timestamps

feast materialize --disable-event-timestamp

The --disable-event-timestamp flag allows you to materialize all available feature data using the current datetime as the event timestamp, without needing to specify start and end timestamps. This is useful when your source data lacks proper event timestamp columns.

Materializing feature view driver_hourly_stats from 2021-04-14 to 2021-04-15 done!

7. Read online features at low latency

from pprint import pprint
from feast import FeatureStore

store = FeatureStore(repo_path=".")

feature_vector = store.get_online_features(
    features=[
        'driver_hourly_stats:conv_rate',
        'driver_hourly_stats:acc_rate',
        'driver_hourly_stats:avg_daily_trips'
    ],
    entity_rows=[{"driver_id": 1001}]
).to_dict()

pprint(feature_vector)

# Make prediction
# model.predict(feature_vector)
{
    "driver_id": [1001],
    "driver_hourly_stats__conv_rate": [0.49274],
    "driver_hourly_stats__acc_rate": [0.92743],
    "driver_hourly_stats__avg_daily_trips": [72]
}

📦 Functionality and Roadmap

The list below contains the functionality that contributors are planning to develop for Feast.

🎓 Important Resources

Please refer to the official documentation at Documentation

👋 Contributing

Feast is a community project and is still under active development. Please have a look at our contributing and development guides if you want to contribute to the project:

🌟 GitHub Star History

Star History Chart

✨ Contributors

Thanks goes to these incredible people:

Project details


Release history Release notifications | RSS feed

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

feast-0.64.1.dev11.tar.gz (7.5 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

feast-0.64.1.dev11-py3-none-any.whl (8.9 MB view details)

Uploaded Python 3

File details

Details for the file feast-0.64.1.dev11.tar.gz.

File metadata

  • Download URL: feast-0.64.1.dev11.tar.gz
  • Upload date:
  • Size: 7.5 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for feast-0.64.1.dev11.tar.gz
Algorithm Hash digest
SHA256 597b63fdd9dc2d6eda57875c7ccaffae6f3ae82ad7228b572cd51c6ac4a72062
MD5 ca2c20569d53e06152ecf55a924ce998
BLAKE2b-256 a624ef9e580df3f75bc2bb03594b10f305b553895c7afbd65f2298e2a1ab5cee

See more details on using hashes here.

File details

Details for the file feast-0.64.1.dev11-py3-none-any.whl.

File metadata

  • Download URL: feast-0.64.1.dev11-py3-none-any.whl
  • Upload date:
  • Size: 8.9 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.9.25

File hashes

Hashes for feast-0.64.1.dev11-py3-none-any.whl
Algorithm Hash digest
SHA256 41b1546619f4497b89817bdc5bdb9b5b81c93d059d7c2eb73f74839446b3217c
MD5 05c0d67de0e5d872a21b79ed9d9829d2
BLAKE2b-256 6d1d2b3793973337d9f37be2f9d6abdab99e30b679d2b411a0c9d3902bf4aa21

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page