Trino support for Feast offline store

Feast Trino Support

Trino is not part of the current Feast roadmap; this project adds Trino support as a Feast offline store.

Version compatibilities

The feast-trino plugin is tested against Python 3.7, 3.8, and 3.9.

The table below shows the versions of Feast and Trino that the current feast-trino plugin has been tested against:

Feast-trino    Feast                    Trino
1.0.*          From 0.15.* to 0.18.*    364

Quickstart

Install feast-trino

  • Install the stable version:
pip install feast-trino
  • Install the development version (not stable):
pip install git+https://github.com/shopify/feast-trino.git@main
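
After installing, you can sanity-check that the plugin imports cleanly. A minimal check, using the class paths that appear later in this quickstart:

# Verify that the offline store and source classes are importable.
from feast_trino import TrinoSource
from feast_trino.trino import TrinoOfflineStore

print(TrinoSource, TrinoOfflineStore)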

Create a feature repository

feast init feature_repo

Edit feature_store.yaml

Set the offline_store type to feast_trino.trino.TrinoOfflineStore:

project: feature_repo
registry: data/registry.db
provider: local
offline_store:
    type: feast_trino.trino.TrinoOfflineStore
    host: localhost
    port: 8080
    catalog: memory
    connector:
        type: memory
online_store:
    path: data/online_store.db
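
With the YAML in place, you can load the repository in Python and read the offline store settings back from the parsed config. A small sketch, reusing only the fields shown above:

from feast import FeatureStore

# Load the feature repository configured by feature_store.yaml.
store = FeatureStore(repo_path="feature_repo")

# The offline store settings are exposed on the parsed config object.
print(store.config.offline_store.host)     # localhost
print(store.config.offline_store.port)     # 8080
print(store.config.offline_store.catalog)  # memory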

Create Trino Table

Edit feature_repo/example.py

# This is an example feature definition file
import pandas as pd
from google.protobuf.duration_pb2 import Duration
from feast import Entity, Feature, FeatureView, FileSource, ValueType, FeatureStore

from feast_trino.connectors.upload import upload_pandas_dataframe_to_trino
from feast_trino import TrinoSource
from feast_trino.trino_utils import Trino

store = FeatureStore(repo_path="feature_repo")

client = Trino(
    user="user",
    catalog=store.config.offline_store.catalog,
    host=store.config.offline_store.host,
    port=store.config.offline_store.port,
)
client.execute_query("CREATE SCHEMA IF NOT EXISTS feast")
client.execute_query("DROP TABLE IF EXISTS feast.driver_stats")

input_df = pd.read_parquet("./feature_repo/data/driver_stats.parquet")
upload_pandas_dataframe_to_trino(
    client=client,
    df=input_df,
    table_ref="feast.driver_stats",
    connector_args={"type": "memory"},
)


# Define a Trino source that reads the feast.driver_stats table created above. The
# memory connector is convenient for local development; for production, point the
# source at a table in your own catalog. See the Feast documentation for more info.
driver_hourly_stats = TrinoSource(
    event_timestamp_column="event_timestamp",
    table_ref="feast.driver_stats",
    created_timestamp_column="created",
)

# Define an entity for the driver. You can think of an entity as a primary key used
# to fetch features.
driver = Entity(name="driver_id", value_type=ValueType.INT64, description="driver id",)

# Our parquet file contains sample data that includes a driver_id column, timestamps and
# three feature columns. Here we define a Feature View that will allow us to serve this
# data to our model online.
driver_hourly_stats_view = FeatureView(
    name="driver_hourly_stats",
    entities=["driver_id"],
    ttl=Duration(seconds=86400 * 1),
    features=[
        Feature(name="conv_rate", dtype=ValueType.FLOAT),
        Feature(name="acc_rate", dtype=ValueType.FLOAT),
        Feature(name="avg_daily_trips", dtype=ValueType.INT64),
    ],
    online=True,
    batch_source=driver_hourly_stats,
    tags={},
)
store.apply([driver, driver_hourly_stats_view])

# Run an historical retrieval query
output_df = store.get_historical_features(
    entity_df="""
    SELECT
        1004 AS driver_id,
        TIMESTAMP '2021-11-21 15:00:00+00:00' AS event_timestamp
    """,
    features=["driver_hourly_stats:conv_rate"]
).to_df()
print(output_df.head())

Apply the feature definitions

python feature_repo/example.py
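
Note that the entity dataframe does not have to be a SQL string: Feast's get_historical_features also accepts a pandas DataFrame. A minimal sketch of the same retrieval (not part of example.py, and assuming the feature view above has already been applied):

import pandas as pd
from feast import FeatureStore

store = FeatureStore(repo_path="feature_repo")

# Same retrieval as in example.py, but with a pandas entity dataframe instead of a SQL string.
entity_df = pd.DataFrame(
    {
        "driver_id": [1004],
        "event_timestamp": [pd.Timestamp("2021-11-21 15:00:00", tz="UTC")],
    }
)
output_df = store.get_historical_features(
    entity_df=entity_df,
    features=["driver_hourly_stats:conv_rate"],
).to_df()
print(output_df.head())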

Developing and Testing

Developing

git clone https://github.com/shopify/feast-trino.git
cd feast-trino
# create a virtual environment
python -m venv venv/
source venv/bin/activate

make build

# before commit
make format
make lint

Running unit tests

make start-local-cluster
make test
make kill-local-cluster

Note: You can visit http://localhost:8080/ui/ to access the Trino Web UI, which makes it easy to inspect queries.

Testing against Feast universal suite

make install-feast-submodule
make start-local-cluster
make test-python-universal
make kill-local-cluster

Using different versions of Feast or Trino

The makefile contains the following default values:

  • FEAST_VERSION: v0.15.1
  • TRINO_VERSION: 364

Thus, make install-feast-submodule will automatically build Feast v0.15.1. To try another version, such as v0.14.1, run make install-feast-submodule FEAST_VERSION=v0.14.1

The same applies to TRINO_VERSION when starting the local cluster: make start-local-cluster TRINO_VERSION=XXX
